Load Testing using CircleCI and k6

By Robin. In DevOps. On 2017-07-19

As we’re big believers in dogfooding, we’ve used our open source load testing tool, k6, to set up load testing automation with CircleCI. Here’s how we did it, along with some tips and tricks to make your own load testing automation easier. Let’s go!

Setting up CircleCI and Load Testing with k6

Before you set things up, you’ll need to have taken k6 for a spin and have some basic familiarity with CircleCI configuration.

We’ve set up our CircleCI config file (circle.yml) to make sure k6 is downloaded and installed, and to execute k6 in the test section of the configuration.

Here’s a scaled back version of our CircleCI (circle.yml) file:

# circle.yml (CircleCI 1.0 syntax)
dependencies:
  cache_directories:
    - "~/k6-bin"
  post:
    - mkdir -p ~/k6-bin
    - |
      if [[ ! -f ~/k6-bin/k6 ]]; then
        curl -O -L https://github.com/loadimpact/k6/releases/download/v0.15.0/k6-v0.15.0-linux64.tar.gz;
        tar -xvzf k6-v0.15.0-linux64.tar.gz;
        mv k6-v0.15.0-linux64/k6 ~/k6-bin/k6;
      fi

test:
  override:
    - ~/k6-bin/k6 run -q loadtests/main.js

For more information, see the CircleCI with k6 guide and accompanying GitHub repo. 

When and How Often We Load Test

So, when should a load test be triggered? Well, a load test is different from other tests: it takes more time. A unit or functional test runs more quickly, so it’s likely you want to run those types of tests with every single code change.

Since load tests take longer to run, should you run a load test with every build, or on a less frequent schedule? We recommend running load tests about once a day. We run our load tests as nightly builds so the results are ready when we log on every morning.

For the Load Impact API, for example, we have 700 or so unit and functional tests, each testing a small part of the entire API. They take about 1-2 minutes, which is not super fast, but it’s fast enough that no-one has spent the time to optimize it yet. A full build-test-deploy sequence for our API takes only a few minutes, which is totally acceptable. It’s fast enough for us not to context switch.

Load tests, however, need to collect enough measurements for you to draw meaningful conclusions from them. Thus, at minimum, load tests tend to run for at least 10 minutes, if not much longer. The length of a load test comes with a price, though: if we developers have to wait 30-60 minutes (or more) for results, we’re very likely to have context switched by the time a test returns results.

Some people recommend that you run load tests on every commit. But once your team has grown beyond a couple of developers, you risk running too many tests too often, on top of the context-switching issue. Depending on how your testing or pre-production environment is set up, multiple load tests running too often and at the same time could skew each other’s results, which helps no-one. If you set up a separate environment per feature branch or pull/merge request, concurrent load tests might not be an issue for you.

With that lengthy explanation, we recommend you run your unit and functional tests with every commit. Load tests, as much as we love them, should be run at a lower frequency: about once a day is usually enough.

Setting up Your Load Testing Automation with CircleCI

CircleCI builds are usually triggered by a GitHub event like a pull request or a new commit. What’s missing? There’s no built-in time-based way to trigger a build. (Yes, other similar tools have a scheduler.) So with CircleCI, periodic load testing automation is a little trickier, since you can’t say, for example, “run a load test every night at 2 AM.”

To run our nightly load tests, we have to use an external scheduler and call CircleCI’s API to trigger a build instead. We use cron and curl for this.
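As a sketch, a crontab entry like the following would trigger a build every night at 2 AM through CircleCI’s REST API. The `<org>/<repo>` path and `$CIRCLE_TOKEN` are placeholders, not values from our setup; you’d generate an API token in your CircleCI account settings:

```shell
# Trigger a nightly CircleCI build on the master branch at 2 AM.
# <org>/<repo> and $CIRCLE_TOKEN are placeholders for your project
# and a personal API token.
0 2 * * * curl -s -X POST "https://circleci.com/api/v1.1/project/github/<org>/<repo>/tree/master?circle-token=$CIRCLE_TOKEN"
```

The build that this triggers is an ordinary build, so the k6 step in circle.yml runs exactly as it would for a commit-triggered build.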

But before you can automate the execution of your load tests, you need to set your goals: the criteria k6 uses to tell a pass from a failure, so that it can signal the result to CircleCI. (More on this in a moment.)

If a goal is not met, k6 will exit with a non-zero exit code, causing that build step and the build in general to fail.

Setting up Notification Channels (Slack)

At LoadImpact, we use Slack as our chat tool of choice, so naturally it’s also the place we have pass/fail feedback delivered from the tools and services we use in our development and automation workflows.

Like most developer tools and services these days, CircleCI has a Slack integration, which we make good use of to tell us the overall pass/fail status of a build. We also have more specific integrations set up for more granular pass/fail feedback, like the LoadImpact Slack integration.

In general, we never look at the output from the tools and services in our pipeline unless they scream loudly that something is wrong!

In CircleCI, you set up your notification channels under “Chat Notifications” in the project settings.


Setting Load Testing Goals

We talked earlier about setting goals for your load tests. But how do you set them? What metrics do you choose and why? The short answer is “it depends,” which is exactly the short answer no-one likes to hear. The good news is there are some reasonable rules of thumb you can start with.

For example, for a website, the general wisdom (based on research) is this: if your website takes longer than one second to load, it’s too slow. (Kissmetrics says 40% of people abandon a site that takes more than 3 seconds to load.) If the loading time is over 10 seconds, you can assume that almost no-one is sticking around for that.

Your ideal loading time will be site or system specific depending on what you’re trying to do. For example, if you have an API that downloads a huge amount of data, that GET request will likely take longer than an API that just gets an email address for a user.

We recommend that you first run some baseline tests to see what your overall site, app or API performance seems to be. From this you can establish your baseline for comparison and improvement. Then, set your goals to maintain at least that level of performance at load. Once you begin performance tuning, try setting goals that reflect your target response times, so that you know adding or changing features won’t affect user experience.

To set up those goals in k6, you use thresholds. Here’s a small k6 script example: 

import { check, sleep } from "k6";
import http from "k6/http";

export let options = {
    thresholds: {
        http_req_duration: ["p(95)<500"]
    }
};

export default function() {
    // The target URL here is an example; point it at your own system
    let res = http.get("https://test.loadimpact.com/");
    check(res, {
        "is status 200": (r) => r.status === 200
    });
    sleep(3);
}
The above script has a single goal/threshold, that the response time at the 95th percentile should stay below 500 ms. You can add as many thresholds as you need in a k6 script.
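As a sketch of what multiple goals look like, several threshold expressions can be attached to the same metric in the options block. The numbers below are illustrative, not recommendations:

```javascript
export let options = {
    thresholds: {
        // Three goals on one metric: 95th percentile below 500 ms,
        // 99th percentile below 1 s, and no single request over 2 s.
        // If any expression fails, k6 exits non-zero and the build fails.
        http_req_duration: ["p(95)<500", "p(99)<1000", "max<2000"]
    }
};
```

Each expression is evaluated independently at the end of the test run, so one options block can encode your whole performance contract.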

For more guidance on picking appropriate goals, check out this page at the Nielsen / Norman Group site and Steve Souders’ excellent site as well.

Understanding Your Load Test Results

As we mentioned before, CircleCI just gives us an overall pass/fail notification. You’ll know only that the load test failed: you’ll have to go investigate why it failed.

As with other types of tests, load tests should have the power to fail a build: they're your performance guardians. It’s our strong belief that load testing should be as important and ubiquitous as unit or functional testing. No matter the size of the change or the perceived insignificance of the system component being changed, you should load test before shipping to production. Even small changes might cause large effects you haven’t foreseen. Treat these tests as your safety mechanisms.

Now, if your test fails, first look at what threshold(s) failed and dig from there. If you have set up thresholds for specific URLs you’ll know right away where to go look on the server side (in an APM product like New Relic for example, if you use one of those).

If you have broader thresholds, say on http_req_duration (meaning response times from all URLs are considered), then you’d need to look at the response times of the individual URLs. In general, when interpreting a response time graph from a load test, watch for trends. You want the response times to stay constant no matter what traffic volume you’re sending to the target system. If you see a response time graph trending upwards as traffic volume increases, that’s a good indication the URL is worth digging deeper into, and also a good candidate for getting a threshold set up for subsequent test runs ;)
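One way to set up a per-URL threshold in k6 is to record a specific endpoint’s response time in a custom Trend metric and attach a threshold to that metric. This is a sketch; the URL and metric names are hypothetical:

```javascript
import http from "k6/http";
import { Trend } from "k6/metrics";

// Custom metric tracking only the login endpoint's response times
let loginDuration = new Trend("login_req_duration");

export let options = {
    thresholds: {
        // Fail the test if the login endpoint's 95th percentile
        // response time exceeds 300 ms
        login_req_duration: ["p(95)<300"]
    }
};

export default function() {
    // Placeholder URL; substitute your own endpoint
    let res = http.post("https://example.com/api/login", { user: "test" });
    // Feed this request's duration into the custom metric
    loginDuration.add(res.timings.duration);
}
```

With this in place, a failing build points you straight at the offending endpoint instead of an aggregate number.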

How did your integration of CircleCI and k6 work out? Any tips to add? We’re looking forward to your insights.