For the most accurate load test, best practices (and common sense) dictate that your testing environment mirror production, and your simulated traffic mirror real user behavior, as closely as possible. We get it, though! Best practices don’t necessarily reflect the challenge and rush of a real development pipeline. For example, if you’re like us, you likely have much more data in your production environment than in your testing and staging environments. A detailed, extremely realistic testing database is nice, but perhaps not necessary. Let us explain.
The ideal practice is to periodically sync production data to staging and scrub it of any personally identifiable information (PII), like emails and credit card info, before using it for testing. (Your country or region probably has laws about handling PII, so this is important.)
Scrubbing production data can be difficult if you have lots of data and a complex system. If your system consists of, say, 100 different microservices, it can be hard to tell what needs scrubbing.
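To make the scrubbing step a bit more concrete, here is a minimal Python sketch of what it could look like for a single record. Everything in it is an assumption for illustration: the field names `email` and `card_number`, the `SCRUB_KEY` secret, and the helper names are hypothetical, and a real pipeline would need to cover every service and table that holds personal data.

```python
import hashlib
import hmac

# Hypothetical secret; in practice, load it from a secret store, not source code.
SCRUB_KEY = b"replace-me-with-a-real-secret"


def scrub_email(email: str) -> str:
    """Replace an email with a stable, non-reversible placeholder address."""
    digest = hmac.new(SCRUB_KEY, email.lower().encode("utf-8"), hashlib.sha256)
    # Keyed hashing keeps the same user mapped to the same fake address,
    # so joins and uniqueness constraints still behave as in production.
    return f"user_{digest.hexdigest()[:12]}@example.invalid"


def scrub_card(card_number: str) -> str:
    """Replace every digit of a card number with zero, dropping separators."""
    digit_count = sum(1 for ch in card_number if ch.isdigit())
    return "0" * digit_count


def scrub_row(row: dict) -> dict:
    """Return a copy of the row with the assumed sensitive fields scrubbed."""
    clean = dict(row)
    if clean.get("email"):
        clean["email"] = scrub_email(clean["email"])
    if clean.get("card_number"):
        clean["card_number"] = scrub_card(clean["card_number"])
    return clean


if __name__ == "__main__":
    sample = {"id": 1, "email": "Jane@Example.com", "card_number": "4111 1111 1111 1111"}
    print(scrub_row(sample))
```

Keeping the replacement deterministic is a design choice, not a requirement: it preserves relationships between tables at the cost of needing to keep the key secret.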
But that level of realism might not be necessary, and it demands much more time and effort from you, the load tester. It’s more important to test frequently than to worry about exact real-world simulations of user data. Achieving the ideal practice is fine if you want to go the extra mile, but it’s likely unnecessary in the real world.
As with so many things, the 80/20 rule applies: we can use a simpler setup and still get valuable results. For us, agility and efficiency are more important than detailed accuracy. Thus, we test on a sparser test database, which we know likely skews the app’s behavior. We have used it for long enough that we have a good feel for how this test database maps to real-world user experience. If you’re using a similar database, experience will teach you what the right amount of data is, or how your results can be extrapolated to your production system.
If extrapolating like we do doesn’t appeal to you, try a cloned, scrubbed production database. Of course, you could ignore the amount of data entirely and test only the code and infrastructure, trusting that your queries and data storage will behave the same at any size. This is probably unwise. Queries that performed well on a small data set might slow down dramatically past a certain size, and data access can be a major bottleneck. Our advice here, as always, is to test often and regularly to avoid headaches later.
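If you want a quick feel for how a query behaves as data grows, a rough sketch like the one below can help. It uses Python’s built-in `sqlite3` module with an in-memory database; the `orders` table, its columns, and the `time_lookup` helper are all made up for illustration, and the numbers it prints say nothing about your production database. It only shows the shape of the experiment: same query, growing row counts.

```python
import sqlite3
import time


def time_lookup(row_count: int) -> float:
    """Time one unindexed lookup against a hypothetical table of row_count rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL)"
    )
    # Spread rows over 1,000 made-up customers.
    conn.executemany(
        "INSERT INTO orders (customer_id, total) VALUES (?, ?)",
        ((i % 1000, i * 0.5) for i in range(row_count)),
    )
    conn.commit()

    start = time.perf_counter()
    # customer_id has no index, so this lookup has to scan the whole table.
    conn.execute(
        "SELECT COUNT(*), SUM(total) FROM orders WHERE customer_id = ?", (42,)
    ).fetchone()
    elapsed = time.perf_counter() - start

    conn.close()
    return elapsed


if __name__ == "__main__":
    for size in (1_000, 10_000, 100_000):
        print(f"{size:>7} rows: {time_lookup(size) * 1000:.3f} ms")
```

Running the same loop against a copy of your own schema, with and without the indexes you rely on, is one cheap way to spot queries that are fine on a small test database but scale poorly.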