Using the k6 HAR converter to record sessions

Posted by Robin on Jan 26, 2018

With the new k6 HAR converter, it is dead simple to use a browser to record a user session and then let k6 replay it in a load test. This article describes the HAR converter feature and how you can use it.

Many load testing tools today offer some kind of HTTP recording functionality. You use a browser to, for example, log in to your website, browse around, and do the various things your regular users tend to do on the site. The tool then creates a user scenario script from the actions you took. When you later run a load test using that script, the simulated users in your load test will perform the same actions that you did.

Sometimes, recorders are more or less full web browsers that have been integrated into the load testing tool itself. A disadvantage of this is that you can only use the browser the load testing tool vendor created, and it is likely not very similar to the browsers your real users use when they visit your site. This can lead to a recording that doesn't accurately reflect the behaviour of end-user browsers, or to problems getting the site/app to work properly in the recorder.

Other times, recorders are HTTP proxy-based. This is better in the sense that you can use more or less any browser you like when making the recording. However, this method has its disadvantages too. Working with proxies is kind of painful: you have to keep reconfiguring your browser, and it is easy to forget whether you're currently using the proxy or not. At Load Impact we have used a proxy-based recording method for many years, and I can safely say it has been a mixed experience. Sometimes it is the only recording method that works, but it is always somewhat of a pain to use.

Perhaps the best recording method of all is when the browser itself records what is happening. That means there are no settings to keep fiddling with, and the browser will behave exactly like it does for real users accessing your site. However, most browsers have long lacked any type of recording functionality. Because of this, we at Load Impact created a Chrome plugin that has long been our preferred way of making HTTP recordings. Just install the plugin once and you're ready to make recordings at the press of a button.

So, why not adapt the Chrome plugin to generate k6 scripts? Well, we thought it would be nice to support more browsers than just Chrome, and we saw that several browsers could save session recordings as HAR files. HAR is a standardized format supported by many applications, so it seemed a logical choice to make k6 able to read it. We put out a $500 bounty on Bountysource, asking the community to help create a HAR converter/importer for k6. The resulting pull request was just merged into the k6 master branch and is available in the new v0.19.0 release. If you install or upgrade k6 now, or use the latest k6 Docker image, you will have HAR support built in.

Great, how do you use it then?

Basically, you use Chrome or Firefox (Microsoft Edge can reportedly also do it). Open the developer tools, go to the "Network" tab, right-click on the list of requests/URLs and choose "Save All as HAR". Here is a screenshot from Firefox:

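[Screenshot: the Firefox developer tools "Network" tab, with "Persist logs" checked and the "Save All as HAR" option in the right-click menu]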

Note also the checked "Persist logs" option. It means the network request log will not be cleared when you navigate to a new page. Normally you'll want the whole session on the site to be recorded. By default, however, the network log records each visited page separately and is cleared when you go to a new page.

The Chrome developer tools look and behave very similarly. Just right-click in the area where all the URLs are listed, and choose "Save as HAR with content" from the pop-up menu:

[Screenshot: the Chrome developer tools with the "Save as HAR with content" option in the right-click menu]

This means that you can use Chrome or Firefox (or Edge) to record a session. Open the developer tools and enable the "Preserve log" option ("Persist logs" in Firefox). Then browse around your site doing what your users would do, and finally save the whole session as a .har file.

Then you use k6 itself to convert the HAR file into JavaScript that k6 can execute:

k6 convert myfile.har

If your saved HAR file was called myfile.har, k6 should now have created a new har-script.js file that can be executed directly (k6 run har-script.js).
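A HAR file is just JSON. To illustrate what the converter has to work with, here is a minimal sketch in plain JavaScript that pulls the recorded requests out of a parsed HAR document. It is not k6's own converter code. It relies only on fields defined by the HAR 1.2 format: `log.entries`, and for each entry `startedDateTime`, `request.method` and `request.url`. The function name `listRequests`, the sample URLs and the timestamps are made up for illustration.

```javascript
// Hypothetical helper (not part of k6): list the requests recorded in a parsed
// HAR document, using only HAR 1.2 fields.
function listRequests(har) {
  return har.log.entries
    .map((entry) => ({
      method: entry.request.method,
      url: entry.request.url,
      // startedDateTime is an ISO 8601 timestamp string
      startedMs: Date.parse(entry.startedDateTime),
    }))
    // Exporters usually write entries oldest first, but sort to be safe
    .sort((a, b) => a.startedMs - b.startedMs);
}

// A tiny hand-written HAR document (made-up URLs and timestamps)
const sampleHar = {
  log: {
    version: "1.2",
    entries: [
      {
        startedDateTime: "2018-01-18T12:21:40.600Z",
        request: { method: "GET", url: "http://test.loadimpact.com/style.css" },
      },
      {
        startedDateTime: "2018-01-18T12:21:40.000Z",
        request: { method: "GET", url: "http://test.loadimpact.com/" },
      },
    ],
  },
};

for (const r of listRequests(sampleHar)) {
  console.log(r.method, r.url);
}
```

With a real recording, you would first parse the file. In Node.js, for example, that is `JSON.parse(fs.readFileSync("myfile.har", "utf8"))`.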

A step-by-step example

Let's look a bit closer at the new converter functionality. We'll make an actual recording and examine what happens, step by step.

1. We start Chrome and open a new incognito window:

[Screenshot: a new Chrome incognito window]

You don't have to use an incognito window, but it is often better. Chrome may have lots of saved state (cookies etc.) that could be sent or used during your session. That state could change the behaviour of your site/app. At the very least, it could clutter your HAR data (and the converted k6 script) with lots of headers, cookies etc. that you don't need.

2. Now we open developer tools and check the "Preserve log" checkbox to make sure all data from multiple page loads will be saved in our HAR file:

[Screenshot: Chrome developer tools with "Preserve log" checked]

3. We go to http://test.loadimpact.com and load that page. It is a very simple page and only requests three resources in total: the main HTML, a stylesheet and a PNG image:

[Screenshot: the three requests for test.loadimpact.com in the Network tab]

4. After waiting 30 seconds or so (to simulate user think time, i.e. reading the page), we click the "News" link on the page. This causes Chrome to load news.php (a new page):

[Screenshot: clicking the "News" link]

Then we end up on the "News" page:

[Screenshot: the "News" page]

5. OK, the session is finished now. We right-click on the list showing the four URLs we have loaded and choose "Save as HAR with content". Chrome suggests naming the file "test.loadimpact.com.har", which is fine.

[Screenshot: saving the session as test.loadimpact.com.har]

6. We then fire up a terminal window and run k6 to convert the file:

[Screenshot: converting test.loadimpact.com.har with k6 convert in a terminal]

Done! We can now execute this k6 script immediately, if we want:

[Screenshot: running the converted script with k6 run]

Note that running this script will take about 30 seconds, because we waited that long in our recording session before loading the second page ("News"). When replaying the recording, k6 will also wait the same amount of time between the two page loads.
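To see where those 30 seconds come from, you can lay the recorded requests out on a timeline relative to the first one. The sketch below is a hypothetical illustration in plain JavaScript, not the converter's actual logic. It assumes HAR-style entries with an ISO 8601 `startedDateTime`, already sorted oldest first. The stylesheet and image names and all timestamps are made up to mirror the session above.

```javascript
// Hypothetical helper: offset of each recorded request (in seconds) from the
// start of the session. Assumes entries are sorted by startedDateTime.
// A large jump between offsets corresponds to user think time, which a
// replay has to reproduce with a pause.
function sessionTimeline(entries) {
  const t0 = Date.parse(entries[0].startedDateTime);
  return entries.map((e) => ({
    url: e.request.url,
    offsetSeconds: (Date.parse(e.startedDateTime) - t0) / 1000,
  }));
}

// Made-up entries: three resources for the first page, then news.php ~30 s later
const entries = [
  { startedDateTime: "2018-01-18T12:21:40.000Z", request: { url: "http://test.loadimpact.com/" } },
  { startedDateTime: "2018-01-18T12:21:40.300Z", request: { url: "http://test.loadimpact.com/style.css" } },
  { startedDateTime: "2018-01-18T12:21:40.350Z", request: { url: "http://test.loadimpact.com/logo.png" } },
  { startedDateTime: "2018-01-18T12:22:10.350Z", request: { url: "http://test.loadimpact.com/news.php" } },
];

for (const { url, offsetSeconds } of sessionTimeline(entries)) {
  console.log(offsetSeconds.toFixed(2).padStart(6), url);
}
```

The 30-second jump between the image and news.php is the pause a replay has to keep.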

k6 HAR converter options

To see what you can do with the converter, use the k6 help convert command:

[Screenshot: output of the k6 help convert command]

Let's go through the options:

-O, --output

This should be fairly obvious: it allows you to specify the name of the output file. The default name is "har-script.js".

--only

This option allows you to supply a comma-separated list of domains which are the only ones you want to fetch things from in your k6 test. This means that k6 will filter out any requests that go to domains other than these.

--skip

Allows you to specify domains that you want to exclude from the k6 test. The generated k6 script will then not contain any requests to those domains.
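To make the effect of --only and --skip concrete, here is a small hypothetical sketch in plain JavaScript of that kind of domain filtering. It is not the converter's code. In particular, it assumes hostnames are matched exactly, with no subdomain or wildcard matching, which may differ from how k6 really compares domains. The third-party URLs are made-up examples.

```javascript
// Hypothetical sketch of domain filtering in the spirit of --only / --skip.
// Assumption: hostnames are compared exactly (no subdomain matching).
function filterByDomain(urls, { only = [], skip = [] } = {}) {
  return urls.filter((url) => {
    const host = new URL(url).hostname;
    if (only.length > 0 && !only.includes(host)) return false; // not on the --only list
    if (skip.includes(host)) return false; // on the --skip list
    return true;
  });
}

// Turn a comma-separated option value like "a.com,b.com" into an array
function parseDomainList(value) {
  return value
    .split(",")
    .map((d) => d.trim())
    .filter((d) => d.length > 0);
}

const recordedUrls = [
  "http://test.loadimpact.com/",
  "https://www.google-analytics.com/analytics.js",
  "https://fonts.googleapis.com/css?family=Roboto",
];

// Similar in spirit to --only test.loadimpact.com
console.log(filterByDomain(recordedUrls, { only: parseDomainList("test.loadimpact.com") }));
// Similar in spirit to --skip www.google-analytics.com,fonts.googleapis.com
console.log(filterByDomain(recordedUrls, { skip: parseDomainList("www.google-analytics.com,fonts.googleapis.com") }));
```

In this example, both calls keep only the test.loadimpact.com request. One whitelists the site; the other excludes the two third-party domains.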

--batch-threshold

This option requires a bit of explanation:

Real web browsers will usually download multiple resources in parallel, over multiple TCP connections. The common behaviour is to open a single connection to the target system, download the main HTML for a page, parse that HTML and find out what dependencies there are. Then the browser generates a list of things (CSS files, JS files, images etc) it needs to fetch in order to render the page, and when it sees it needs to fetch many things it will open multiple, concurrent TCP connections to the target host so that it can issue many HTTP requests in parallel, speeding up the transfer of all the files.

Many load testing tools do not emulate this behaviour very well, and instead use only a single connection per simulated virtual user. k6 tries to do better by implementing something called "batch requests". http.batch() is a function in the k6 script API that lets script writers issue multiple HTTP requests in one single call. When k6 encounters such a call, it lets the VU open multiple concurrent TCP connections, just like a real web browser, and issue the requests in parallel over those connections. If the batch requests are correctly constructed, the end result is a much more realistic load on the target system.

What you often want is a k6 script containing a series of batch requests. Each batch request should contain the full list of things the browser knows it needs to fetch at that point in time. Typically, the first batch would contain a single request for "/index.html" or similar, as that is always the one thing the browser knows it needs at first.

After loading and parsing index.html, the browser can create a list of things it needs: some images, CSS files and a JavaScript file or two. It will load these over multiple connections, as fast as possible. The second batch request in the k6 script should therefore contain the URLs for all these resources, so k6 can load them in parallel too.

After loading a certain JavaScript file, for example, the browser may execute it and, as a result, load another bunch of image files. These files are often loaded together, in parallel, but a little while after the JavaScript itself, because it takes a while for the browser to execute the code. This means that when emulating the session with k6, it would be appropriate to group these image requests together in a third batch call.
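The batches described above map naturally onto http.batch() calls. Here is a hypothetical sketch in plain JavaScript, not the real converter. The function takes a list of steps, each a group of GET URLs with an optional pause. It prints the source of a k6 script that makes one http.batch() call per step, with a sleep() for any pause. The example.com URLs are placeholders. http.batch(), with requests given as [method, url] arrays, and sleep() from the "k6" module are part of the k6 script API, but the code the converter actually emits will look different.

```javascript
// Hypothetical generator: build k6 script source with one http.batch() call
// per step. Each step is { urls: [...], pauseSeconds?: number }.
function toK6Script(steps) {
  const lines = [
    'import http from "k6/http";',
    'import { sleep } from "k6";',
    "",
    "export default function () {",
  ];
  for (const { urls, pauseSeconds = 0 } of steps) {
    // Think time before this step, if any
    if (pauseSeconds > 0) lines.push(`  sleep(${pauseSeconds});`);
    // Everything the browser knew it needed at this point, fetched in parallel
    lines.push("  http.batch([");
    for (const url of urls) lines.push(`    ["GET", ${JSON.stringify(url)}],`);
    lines.push("  ]);");
  }
  lines.push("}");
  return lines.join("\n");
}

// The three batches from the example above (placeholder URLs)
const script = toK6Script([
  { urls: ["https://example.com/index.html"] },
  {
    urls: [
      "https://example.com/style.css",
      "https://example.com/app.js",
      "https://example.com/logo.png",
    ],
  },
  { urls: ["https://example.com/img/1.png", "https://example.com/img/2.png"] },
]);
console.log(script);
```

The printed text is an ordinary k6 script. Saved to a file, it could be executed with k6 run.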

So, to avoid having to create all these batch requests manually and decide which individual requests belong to which batch, the HAR converter has some logic that tries to automate this. It looks at the time interval between two consecutive requests. If that interval is short enough (shorter than 500 ms, by default), the requests are grouped inside the same k6 batch request, for concurrent fetching by the VUs.

This threshold (the 500 ms one) is what you can change using the --batch-threshold option. If you type e.g. --batch-threshold 1000, any consecutive requests that happen within 1 second of each other will be put in the same k6 batch request. To put it another way: if there is a delay of more than 1 second between two requests, the current k6 batch request is considered "full" or "done", and a new one is started.
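The grouping rule can be sketched in a few lines. This is a hypothetical plain-JavaScript illustration, not the converter's source. It compares the start times of consecutive requests and starts a new batch whenever the gap is larger than the threshold. Whether a gap exactly equal to the threshold counts as "within" it is an assumption here. The timestamps are made up: the main HTML at 0 ms, a stylesheet and an image about 600 ms later, and news.php 30 seconds after that.

```javascript
// Hypothetical sketch of the --batch-threshold idea. `requests` must be
// sorted by start time; each item is { url, startedMs }.
function groupIntoBatches(requests, thresholdMs = 500) {
  const batches = [];
  let current = [];
  let previousStart = null;
  for (const req of requests) {
    // Assumption: a gap larger than the threshold closes the current batch;
    // a gap equal to or smaller than it keeps the request in the same batch.
    if (previousStart !== null && req.startedMs - previousStart > thresholdMs) {
      batches.push(current);
      current = [];
    }
    current.push(req.url);
    previousStart = req.startedMs;
  }
  if (current.length > 0) batches.push(current);
  return batches;
}

const recorded = [
  { url: "/", startedMs: 0 },
  { url: "/style.css", startedMs: 600 },
  { url: "/logo.png", startedMs: 650 },
  { url: "/news.php", startedMs: 30650 },
];

console.log(groupIntoBatches(recorded)); // default 500 ms threshold
console.log(groupIntoBatches(recorded, 1000)); // like --batch-threshold 1000
```

With the default 500 ms threshold, the 600 ms gap after the HTML splits the first page into two batches. That gives ["/"], then ["/style.css", "/logo.png"], then ["/news.php"]. With a 1000 ms threshold, the HTML, stylesheet and image end up in one batch, followed by ["/news.php"].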

Wrapping up

The HAR converter is very useful because HAR is a format supported by many other tools, whose output can now be used by k6 to control VU behaviour. Here is a list of some useful tools that are able to output HAR files:


Topics: k6, Session recorder, har converter, load testing strategy, Load Testing, ci automation, Continuous Load Test, DevOps
