We believe load tests are your performance guardians. If the minimum performance levels you’ve set aren’t met, a build should fail. Load testing is as important as unit and functional testing, and we run our load tests with every nightly build. No matter how small our code changes are, we use load tests as safety mechanisms to make sure we didn’t accidentally torpedo our performance with a small change. (Nowadays, systems, apps and dependencies can be so complex that it’s difficult to predict how any change will affect the rest of the system.)
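To make the "a build should fail" idea concrete, here is a minimal sketch of a threshold gate in Python. The threshold values, the metric names (`p95_ms`, `error_rate`) and the function names are all assumptions for illustration, not settings from any particular tool; many load testing tools offer their own built-in threshold features that you would normally use instead.

```python
import math

# Hypothetical minimum performance levels; tune these to your own targets.
THRESHOLDS = {"p95_ms": 500.0, "error_rate": 0.01}


def percentile(values, pct):
    """Nearest-rank percentile of a non-empty list (pct between 0 and 100)."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


def check_thresholds(durations_ms, failures, thresholds=THRESHOLDS):
    """Return a list of threshold violations; an empty list means the run passed."""
    violations = []
    p95 = percentile(durations_ms, 95)
    error_rate = failures / len(durations_ms)
    if p95 > thresholds["p95_ms"]:
        violations.append(
            f"p95 response time {p95:.0f} ms exceeds {thresholds['p95_ms']:.0f} ms"
        )
    if error_rate > thresholds["error_rate"]:
        violations.append(
            f"error rate {error_rate:.2%} exceeds {thresholds['error_rate']:.2%}"
        )
    return violations


def exit_code(violations):
    """Map violations to a process exit code: non-zero fails the build."""
    return 1 if violations else 0
```

In a nightly job, you would feed in the durations recorded by your load test and pass `exit_code(...)` to `sys.exit`, so that any violation marks the build as failed.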
From a load testing perspective, you want your stats to stay constant (or, if you’re performance tuning, to see response or load times decline). If your response time stays constant as traffic and load increase, you know your system is scalable and can handle traffic at those volumes.
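One rough way to check whether response time "stays constant" is to compare the response time at the highest load level with the one at the lowest. The sketch below assumes you have already collected (virtual users, response time) pairs from a ramp-up test; the function names and the 1.2 tolerance are hypothetical choices, not a standard.

```python
def response_time_growth(samples):
    """Ratio of response time at the highest load to that at the lowest load.

    samples: list of (virtual_users, response_ms) pairs.
    """
    ordered = sorted(samples)  # sorts by virtual_users first
    baseline_ms = ordered[0][1]
    peak_ms = ordered[-1][1]
    return peak_ms / baseline_ms


def is_flat(samples, tolerance=1.2):
    """True if response time grew by no more than the (assumed) tolerance factor."""
    return response_time_growth(samples) <= tolerance


# Example data, invented for illustration.
steady = [(10, 200.0), (50, 205.0), (100, 210.0)]
degrading = [(10, 200.0), (50, 340.0), (100, 900.0)]
```

Here `is_flat(steady)` holds because the response time grew by only 5 percent, while `degrading` grows 4.5 times over the same ramp and would be flagged.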
Troubleshooting Performance Trends
When one of our performance tests fails, we first look in our logs. You should, too. Watch for trends: comparing stats from a failed load test with those from previous, successful tests can tell you what’s going wrong. Look for significant changes, not necessarily for specific data points. Ideally, your load times should stay constant no matter what the traffic volume.
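The comparison against previous successful runs can be sketched as follows. This is only an illustration: the metric names, the 20 percent cut-off for "significant", and the `significant_changes` function are assumptions, and it presumes you keep summary stats from earlier runs somewhere you can load them from.

```python
from statistics import mean


def significant_changes(history, current, min_change=0.20):
    """Find metrics in the current run that moved notably from the historical mean.

    history: list of {metric: value} dicts from previous successful runs.
    current: {metric: value} dict from the failed run.
    Returns {metric: relative_change}, e.g. 1.0 means the value doubled.
    """
    changes = {}
    for metric, value in current.items():
        past = [run[metric] for run in history if metric in run]
        if not past:
            continue  # no baseline to compare against
        baseline = mean(past)
        if baseline == 0:
            continue  # relative change is undefined for a zero baseline
        change = (value - baseline) / baseline
        if abs(change) >= min_change:
            changes[metric] = change
    return changes


# Example data, invented for illustration.
previous_runs = [
    {"p95_ms": 400, "requests_per_sec": 100},
    {"p95_ms": 420, "requests_per_sec": 102},
    {"p95_ms": 410, "requests_per_sec": 98},
]
failed_run = {"p95_ms": 820, "requests_per_sec": 99}
```

With this data, only `p95_ms` is reported, since it doubled relative to its historical average while throughput barely moved. That matches the advice above: the size of the change matters more than any single data point.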
Let’s say your load times are not constant and stable. Find out why. If your page or API response times increase, look at your APM metrics or a similar server-side monitoring tool. You’ll likely see a response time graph that ramps up dramatically as traffic does. Check your logs to see what else peaked at that time.
Also, check whether any specific URLs or API endpoints are slowing response. If more than one is, dig deeper into your server monitoring metrics.
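If your load test results can be exported as per-request records, a quick way to spot slow URLs or endpoints is to group by URL and rank by average response time. The sketch below assumes a simple list of (url, duration in ms) pairs; the 500 ms limit, the URLs and the `slow_endpoints` function are made up for the example.

```python
from collections import defaultdict
from statistics import mean


def slow_endpoints(requests, limit_ms=500.0):
    """Return (url, average_ms) pairs above limit_ms, slowest first.

    requests: iterable of (url, duration_ms) pairs from a load test export.
    """
    durations_by_url = defaultdict(list)
    for url, duration_ms in requests:
        durations_by_url[url].append(duration_ms)

    averages = {url: mean(values) for url, values in durations_by_url.items()}
    slow = [(url, avg) for url, avg in averages.items() if avg > limit_ms]
    return sorted(slow, key=lambda item: item[1], reverse=True)


# Example records, invented for illustration.
records = [
    ("/api/cart", 120),
    ("/api/cart", 140),
    ("/api/search", 900),
    ("/api/search", 1100),
    ("/login", 600),
]
```

Calling `slow_endpoints(records)` on this data flags `/api/search` and `/login` but not `/api/cart`. Two slow endpoints at once is a hint that the cause may sit in something they share, which is when the server-side metrics become the next place to look.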
Here’s how we look at it. We think of it as investigating both sides of the performance equation. By default, load testing tools like Load Impact and k6 look at your site, app or API from the user perspective, and rightly so, since we’re tracking user load and user experience. But to see what’s happening when that user-side performance degrades, we have to watch what’s happening on the server side, and that’s where a server agent and an APM come in. We often hear of folks who watch their APM results in their production environment, which is useful for current traffic levels. But what they don’t see is what happens when there’s an increase in traffic or a traffic spike. APM products are definitely our valuable friends, but they’re only half the story.