Bandwidth limited websites

What’s holding you back?

When you hit the limits of how much load your website can handle, you almost always want to know what it is that is holding you back. You already know that you’ve reached the limit, but what part needs to be changed in order to go higher?

The more load a web site gets, the more resources it’s going to consume. One of the many types of resources that the server needs to function will run out before the others. Sure, an extremely well balanced server setup will run out of all types of resources at the same time, but that’s probably not very common. To figure out what resource type that is causing the bottle neck, you need to look a different things. Loadimpact.com offers several interesting performance metrics that will reveal what’s holding you back. Then of course, as soon as you fix that, the next bottleneck is going to become visible, but that’s another blog post.

In this post, I’ll share some information about how you can determine that your web site performance is held back by bandwidth issues and a bit about what you can do to solve it.

How do I know it’s the bandwith?

Depending on where you host your web site, you may have access to tools and graphs from the hosting company that can give you a lot of information. But assiming that you don’t have that tool available, let’s look at how you can use Loadimpact.com to tell.

To be able to show you, I created a very simple web site that is very very bandwidth limited. The website contains one single .html file, absolutely no Python, PHP, Java, Perl or anything similar involved at all. The one file is called heavy.html and contains roughly 16Mb of the letter A. When lots of concurrent user requests heavy.html, a lot of bits will have to leave the web server all at the same time. This is the graph from the test (click to enlarge):

bandwidth

The graph reveals two interesting things. First of all, if you didn’t know already, you can add more than the two standard data series to your Loadimpact graphs. By default, Loadimpact will give you number of active clients and average response time. In this case, I’ve added the Bandwidth data series.

Second. The bandwidth graph pinpoints exactly what I was hoping for. That the bandwidth usage actually hits a plateau at roughly 70 Mbit/s. This means that somewhere between the software on my test server and the software on the measuring probe, there’s a bandwidth limitation of about 70 Mbit/s. It’s important to point out that this result doesn’t point out the exact location of that bottleneck, it just tells you it’s there. To make sure that you have a bottleneck in actually in your hosting environment, you should run the same test from different test servers. Loadimpact currently offers 8 different load zones, each load zone is in a different geographic location. Make sure you run tests from different load zones, or even more interesting, add 4-5 load zones into the same test. If you can still see your plateau at the same bandwidth usage, you can be fairly sure that you’ve found your limit.

And don’t worry if you’ve already run a series of tests using Loadimpact and didn’t add bandwidth to the graph. The data is still stored on our servers and you can add bandwidth when looking at older tests as well. So you might already have interesting data to analyze.

Ok, so what do I do about it?

If you are held back by bandwidth limitation, next step is obviously to try and do something about it. There are many potential ways you can bring down the bandwidth need.

Use compression

Make sure you use compression like gzip or deflate. By compressing the content before it’s sent from the server to the browser, you pay with some CPU resources to gain a little bandwidth. It’s safe to enable since the server will only send compressed content if the browser says it can handle it. Check if your web site uses compression with our Page Analyzer service. Enter the URL you want to test and when the result comes back, click the green plus sign to expand:

expand

Check for the Content-Encoding header in the response:
compressed_ok

Minify things

By removing white space from all static content such as javascript and css files, you can gain some bandwidth Minifying is most often done in your web application software rather, so how to do it depends on what type of Web application you are running. Read more about minifying here:
http://wpmu.org/why-minify/

Reduce image quality

It may sound backwards, but a lot of web sites are sending images to the browser in 300 DPI, that’s great if the user want’s to print the image, but most images are just displayed on the actual web site where 72-96 DPI is sufficient. Not that the very term DPI means that much on web pages, but it still. A good text about the why and how is found here
http://www.webdesignerdepot.com/2010/02/the-myth-of-dpi/

Have your cache settings correct.

In a test such as the one in this post, I didn’t want caching to happen because I wanted to illustrate something. If caching had been enabled, the response headers in the image above would have included an ‘Expires’ header. But in real life, you probably want your web server to instruct the browser to cache all static content. Correctly configured caching means that the browser wont download the same logotype, javascript and css files etc. every single time it loads a web page from your server. Google has written a bit about caching as part of their PageSpeed Rules.
https://developers.google.com/speed/docs/best-practices/caching

Use a CDN.

A method that actually covers a lot of the above tips all in one is to use a content delivery network (CDN). A CDN provider will store your static content on their servers and serve it for you. Unless you are one of the bigger Internet companies, chances are that the CDN provider have more bandwidth available. They almost always also have more than one physical location, so that a user from Spain gets the content from a server in or near Spain while a UK user gets his content from a server in the UK. The end result is that the user gets his content faster and your server never have to see the traffic. The better CDN providers can also do some of the things like minifying of even image quality reduction automatically for you. So chances are that you end up saving both time and bandwidth.

That’s it

Opinions? Questions? Tell us what you want think in the comments below.

Load testing tools vs page speed tools

In a recent post, we talked about the difference between load testing tools and site monitoring tools. Another quite common question is what the difference is between Load testing tools and page speed tools. They both measure response time and they both are important when it comes to assess the performance of your web site. So what it the difference and which one do I need?

For most webmasters, the answer is probably that you need both. But before we talk about why, let’s try a car analogy.

Your own limousine business

Let’s say that you are the CEO of a limousine business. Every single day, you get a call from a client that wants to be picked up at the airport and taken to the city hotel. You send out one of your cars to pick the client up at the airport. The car navigates through traffic and safely leaves the client at the hotel he wanted to go to. Pretty easy.

Now, every now and then, your drivers report that the client wants to get to the hotel quicker and if your business can’t handle that, the client will switch to another limousine business. Since you don’t want to lose your customer, you take the feedback very seriously and start looking into what you can do about it. If you have any intention to do your work seriously, you’re probably going to start with measuring how long it actually takes to get the client from the airport to the hotel. A sensible thing to measure would be to look at the time elapsed from the client’s phone call until he is left of at his hotel. You also look into how long time it takes after the call until the  car is heading to the airport, how long time it takes to locate the client at the airport and how long time it takes to drive from the airport to the hotel.

After some careful analysis, you end up with a very good understanding of what takes time and you probably have a good idea about some of the things you can do to speed it up. Perhaps you decide to always have a car ready at the airport. You might want to change your stretch limousine car to a Ferrari or even to a motorcycle (an existing service in Paris among other cities) to make the actual trip a bit faster.  There’s a lot of different things you can do to make your service quicker. At some point, you are happy with the performance improvements and when you are, you have optimized your ability to take one client from one destination to another as fast as possible.

What does a limousine have to do with web pages?

Back to the subject of this article. A lot of web masters have received the same type of feedback as you did as CEO of the limousine service. But instead of complaining about how quick the clients gets to the hotel, they complain that your web page feels slow. And in reality, most of your clients won’t even complain, they’ll simply direct their browser to a different web site and never look back.

So, as a web master, you want to do the equivalent of measuring how fast your service is.  To do this, you want to get hold of a page speed measurement tool, and there are plenty to select from. The two most well known tools are Google PageSpeed Tools and Yahoo! YSlow, they don’t stop at measuring the actual page load times, they also give you a lot of insight into what is considered good enough and what you can do to improve the page load speed.

As you begin to implement various fixes to improve your page load time, you most likely go back to your tool of choice and redo your measurements in an iterative process. At some point, you are happy with the performance improvements and when you are, you have optimized your ability to serve one page to one client as fast as possible.

A more complicated limousine service

In reality, the limousine service has a lot of different clients. Not all of them is a single person that want to go from the airport to the hotel.  Some clients is a single person that want to go from the train station to the hotel and another type of client is a party of 10 people that want to go from the hotel to the airport. And sometimes things gets really complicated, you get 100 clients calling pretty much at the same time wanting to go in all kinds of directions. For most business owners, having a lot of clients calling would be a nice problem, but it’s even more important to be able to keep the service level high, otherwise you just get a lot of disappointed clients, fast.

So, some of the optimizations you made to serve one client really quick will still be valid. Having cars waiting at the airport probably still makes sense, but should you really have a Ferrari or a motorcycle waiting there? Perhaps the stretch limousine that takes 10 passengers was a pretty good thing after all, or a mix? Clearly, this is a much more complicated thing to measure and optimize and to be honest, if I was the CEO of the limousine service, I would have to think hard to even know where to begin.

Load testing tools

The purpose of load testing tools is to help you simulate how your web site performs when you have a lot of clients at the same time. You will find out that some of the optimizations you made to make a single page load really quick makes perfect sense also when you have a lot of concurrent clients. But other optimizations actually makes things worse. An example would be database optimizations,  the very indexes that makes the web page super fast as long as the page only require good read performance may hurt you a lot when some client requests are writing to the same tables at the same time. Another example may be memory consumption. When one single web page is being requested, a script that uses a lot of memory can go unnoticed or even speed things up, but in a high load scenario, high memory consumption would almost certainly hurt performance when the web server starts to run out of memory.

So if I was a web master, I do have a pretty good idea where to begin when optimizing a web site for many concurrent users. I would start with a load testing tool.

Load testing tools vs page speed tools

Back to the original question. What is the difference between load testing tools and page speed tools and which one should I use? Again, the answer is that you probably should use both.

Fast loading web pages is crucial so you should absolutely use one of the page speed tools available. Web users turn their back to slow pages faster than you can type Google PageSpeed Tools in your search bar.  And the bonus is that a lot of the things you do to optimize single page load times are going to help performance also in high load scenarios.

Fast loading web pages that keep working when you have a lot of visitors is perhaps even more crucial. At least if your web business relies on being able to serve users. If you want to know how your web page performs when you have 10, 100 or even 10000 users at the same time, you need to test this with a load testing tool such as loadimpact.com/

Opinions? Questions? Tell us what you want think in the comments below.

On load testing and performance April-13

What are others are saying about load testing and web performance as of now? Well, apparently more people have things to say about how to make things scale rather than how to measure it, but anyway, this is what cought our attention recently:

  • Boundary.com has a two part interview with Todd Hoff, founder of High Scalability and advisor to several start ups. Read the first part here: Facebook Secrets of web performance
  • Another insight in how the big ones are doing it is this 30 minute video from this years Pycon US. Rick Branson of Instagram talks about how they handle their load as well as the Justin Bieber effect.
  • The marketing part of the world have really started to understand how page load times affect sales as well as Google rankings. In this article Online marketing experts portent.com explains all the loops and hoos they went trough to get to sub second page load time. Interesting read indeed.
  • One of my favorite sources for LAMP related performance insights is the MySQL performance blog. In a post from last week, they explain a bit about how to use their tools to analyze high load problems.
What big load testing or web performance news did we miss? Have your say in the comments below.

Load testing tools vs monitoring tools

So, what’s the difference between a load testing tool (such as 
http://loadimpact.com/
) and a site monitoring tool such as Pingdom (
https://www.pingdom.com/
). The answer might seem obvious to all you industry experts out there, but nevertheless it’s a question we sometimes get. They are different tools used for different things, so an explanation is called for.

Load testing tools

With a load testing tool, you create a large amount of traffic to your website and measure what happens to it. The most obvious measurement is to see how the response time differs when the web site is under the load created by the traffic. Generally, you want to find out either how many concurrent users your website can handle or you want to look at the response times for a given amount of concurrent users. Think of it as success simulation: What happens if I have thousands of customers in my web shop at the same time. Will it break for everyone or will I actually sell more? Knowing a bit about how your website reacts under load, you may want to dig deeper and examine why it reacts the way it does. When doing this, you want to keep track of various indicators on the web site itself while it receives a lot of traffic. How much memory is consumed? How much time spent waiting for disk reads and write? What’s the database response time? etc. Load Impact offers server metrics as a way to help you do this. By watching how your webserver (or servers) consume resources, you gradually build better and better understanding about how your web application can be improved to handle more load or just to improve response times under load. Next up, you may want to start using the load testing tool as a development tool. You make changes that you believe will change the characteristics of your web application and then you make another measurement. As you understand more and more about the potential performance problems in your specific web application, you iterate towards more performance.

Monitoring tools

A site monitoring tool, such as Pingdom (
https://www.pingdom.com/
), might be related, but is a rather different creature. A site monitoring tool will send requests to your web site on a regular interval. If your web site doesn’t respond at all or, slightly more advanced, answers with some type of error message, you will be notified. An advanced site monitoring tool can check your web site very often, once every minute for instance. It will also test from various locations around the world to be able to catch network problems between you and your customers. A site monitoring tool should be able to notify you by email and SMS as soon as something happens to your site. You are typically able to set rules for when your are notified and for what events, such as ‘completely down’, ‘slow response time’ or ‘error message on your front page’ In recent years, functionality have gotten more advanced and beside just checking if your web site is up, you can test entire work flows are working, for instance if your customers can place an item in the shopping card and check out. Most site monitoring tools also include reporting so that you can find out what your service level have been like historically. It’s not unusual to find out that the web site you thought had 100% uptime actually has a couple of minutes of down time every month. By proper reporting, you should be able to follow if downtime per month is trending. Sounds like a good tool right? We think it deserves to be mentioned that whenever you detect downtime or slow response time with a site monitoring tool, you typically don’t know why it’s down or slow. But you know you have problems and that’s a very good start.

One or the other?

Having a bit more knowledge about the difference between these types of tools, we also want to shed some light on how these can be used together. First of all, you don’t choose one or the other type of tool, they are simply used for different things. Like measuring tape and a saw, when your building a house you want both. We absolutely recommend that if you depend on your web site being accessible, you should use a site monitoring tool. When fine tuning your web site monitoring tool, you probably want to set a threshold for how long time you allow for a web page to load. If you have conducted a proper load test, you probably know what kind of response times that are acceptable and when the page load times actually indicates that the web server has too much load. Then, when your site monitoring tool suddenly begins to alert you about problems you want to dig down and understand why, that’s when the load testing tool becomes really useful. As long as the reason for your down time can be traced back to a performance problem with the actual web server, a load testing tool can help you a long way. Recently, I had a client that started getting customer service complaints about the web site not working. First step was to set up a web site monitoring tool to get more data in place. Almost directly, the web site monitoring tool in use was giving alerts, the site wasnt always down, but quite often rather slow. The web shop was hosted using standard web hosting package at one of the local companies. I quickly found out that the problem was that the web shop software was just using a lot of server resources and this was very easy to confirm using a load testing tool. Now the client is in the process of moving the site to a Virtual Private Server where resources can be added as we go along. Both types of tools played an important role in solving this problem quickly. Questions? Tell us what you want to know more about in the comments below.

Top 5 ways to improve WordPress under load

WordPress claims that more than 63 million web sites are running the WordPress software. So for a lot of users, understanding how to make WordPress handle load is important. Optimizing WordPress ability to handle load is very closely related to optimizing the general performance, a subject with a lot of opinions out there. We’ve actually talked about this issue before on this blog. Here’s the top 5 things we recommend you do to fix before you write that successful blog post that drives massive amounts of visitors.

#1 – Keep everything clean and up to date

Make sure that everything in WordPress is up to date. While this is not primarily a performance consideration, it’s mostly important for security reasons. But various plugins do gradually become better and better with performance issues, so it’s a good idea to keep WordPress core, all plugins as well as your theme up to date. And do check for updates often, I have 15 active plugins on my blog and I’d say that there’s 4-6 upgrades available per month on average. The other thing to look out for is to keep things clean. Remove all themes and plugins that you don’t currently use. Deactivate and physically delete them. As an example, at the time of writing this. my personal blog had 9 plugins that needed upgrading and I had also let the default WordPress theme in there. I think it’s a pretty common situation so do what I did and make those upgrades.

#2 – Keep the database optimized

There are two ways that WordPress databases can become a performance problem. First is that WordPress stores revisions of all posts and pages automatically. It’s there so that you always can go back to a previous version of a post. As handy as that can be, it also means that the one db table with the most queries gets cluttered. On my blog, I have about 30 posts but 173 rows in the wp_posts table. For any functionality that lists recent posts, related posts and similar, this means that the queries takes longer. Similary, the wp_comments table keeps a copy of all comments that you’ve marked as spam, so the wp_comments table may also gradually grow to become a performance problem. The other way that you can optimize the WordPress database is to have mysql do some internal cleanup. Over time the internal structure of the mysql tables also becomes a cluttered. Mysql provides an internal command for this: ‘optimze table [table_name]‘. Running optimize table can improve query performance a couple of percent with in turn improves the page load performance. Instead of using phpmyadmin to manually delete old post revisions and to run the optimize table command, you should use a plugin to do that, for instance WP Optimize. Installing Wp optimize on my blog, it told me that the current database size was 7.8 Mb and that it could potentially remove 1.3 Mb from it. It also tells me that a few important tables can be optimized, for instance wp_options that is used in every single page request that WordPress will ever handle.

#3 – Use a cache plugin

Probably the single most effective way to improve the amount of traffic your WordPress web site can handle is to use a cache plugin. We’ve tested cache plugins previously on the Load Impact blog, so we feel quite confident about this advice. The plugin that came out on top in our tests a few years ago was W3 Total Cache. Setting up W3 Total Cache requires some attention to details that is well beyond what other WordPress plugins typically requires. My best advice is to read the installation requirements carefully before enabling the page cache functionality since not all features will work on all hosting environments. Read more about various WordPress cache plugins here, but be sure to read the follow up.

#4 – Start using a CDN

By using a CDN (content delivery network), you get two great performance enhancements at once. First of all, web browsers limit the number of concurrent connections to your server, so when downloading all the static content from your WordPress install (css, images, javascripts etc.), they actually queue up since not all of them is downloaded at the same time. By placing as much content as possible on a CDN, you work around this limitation since your static content is now served from a different wen server. The other advantage you get is that CDN typically have more servers than you do and there’s a big chance that (a) one of their servers is closer to the end user than your server and (b) that they have more bandwidth than you do. There are a number of ways you can add CDN to your WordPress install. W3 Total Cache from #3 above handles several CDN providers (Cloudflare, Amazon, Rackspace) or even lets you provide your own. Another great alternative is to use the CloudFlare WordPress plugin that they provide themselves.

#5 Optimize images (and css/js)

Looking at the content that needs to be downloaded, regardless if it’s from a CDN or from your own server, it makes sense to optimize it. For css and js files, a modern CDN provider like CloudFlare can actually minify it for you. And if you don’t go all the way to use an external CDN, the W3 Total Cache plugin can also do it for you. For images you want to keep the downloaded size as low as possible. Yahoo! has an image optimizer called Smush.it that will drastically reduce the file size of an image, while not reducing quality. But rather than dealing with every image individually, you can use a great plugin name WP-Smushit that does this for you as you go along.

Conclusion and next step

There are lots and lots of content online that will help you optimize WordPress performance and I guess it’s no secret that these top 5 tips are not the end of it. In the next post, I will show you how a few of these advises measures up in reality in the LoadImpact test bench.

Announcing the Load Impact API beta

We are pleased to announce the Load Impact API beta!

the developer.loadimpact.com API documentation site

For people who do not know what an API is, or what it is good for, our API allows you to do basically everything you can do when logged in at loadimpact.com, like configure a load test, run a load test, download results data from a load test. But the API can be used by another program, communicating with Load Impact over the Internet. This means that a developer can write an application that will be able to use our API functionality to configure and run load tests on the Load Impact infrastructure – and this can happen completely without human involvement, if the developer chooses it.

The API is very useful for companies with a mature development process, where they e.g. do nightly builds - and run automated tests - on their software. The API allows them to include load tests in their automated test suites, and in that way monitor the performance and scalability of their application while it is being developed. This is useful in order to get an early indication that some piece of newly produced code doesn’t perform well under load/stress. The earlier such problems are detected, the less risk of developers wasting time working on code tracks that don’t meet the performance criteria set up for the application.

The API can also be used by other online services or applications, that want to include load testing functionality as part of the service/product, but where it is preferable to avoid building from scratch a complete load testing solution like Load Impact. They can use our API to integrate load testing functionality as part of their own product, with Load Impact providing that functionality for them.

We have created a whole new documentation section for the API at 
http://developer.loadimpact.com
 where you can find the API reference and some code examples. We will be delighted to hear from you if you are using the API, so don’t hesitate to get in touch with us! Feedback or questions are very welcome!

 

Know your node.js

As part of a follow up to last months column about PHP vs Node.js, I hit some problems with Node under load. As with all technologies, Node.js does have some limitations that may or may not be a problem for your specific use case. If the last column about comparing PHP and Node.js had a deeper message, that message would be that if you want to scale you have to know your stack. To be completely clear, when I say stack I mean the layers of technology used to server http requests. One of the most common stacks out there are simply called LAMP – (L)inux (A)pache2 (M)ySQL (P)HP (or Perl). You now see a lot of references to LNMP, where Apache2 is replaced with Nginx. When building Node.js applications, things can vary a lot since node.js comes with it’s own http server. In my previous text, I used Node.js together with MySQL on a Linux box, so I guess we can dub that the LNM stack if we absolutely need to have a name for it. And when I say Know your stack. I mean that if you want to produce better than average performance numbers, you have to be better than average in understanding how the different parts in your stack works together. There are hundreds of little things that most of us never knew mattered that suddenly becomes important when things come under load. As it happens, watching your application work under load is a great way to force yourself to know your stack a little better.

Background

When testing Apache/PHP against Node.js, I found that the raw performance of Node.js as well as the ability to handle many concurrent clients was excellent. Faster and more scalable than Apache2/PHP. One reader pointed out that the test wasn’t very realistic since there was just one single resource being queried and there was no static content involved. Apache2/PHP could very well relatively better if some of the content was static. So I set up a test to check this and while running this. Node.js crashed. As in stopped working. As in would not server any more http reqeusts without manual intervention. So to keep it shord, Apach2/PHP won that round. But in the spirit of ‘know your stack’, we need to understand why Node.js crashed. The error message I got was this:

Unhandled 'error' event "events.js:71"

First of all, it took a fair amout of googling to figure out what that the error message was really about. Or, rather, the error message was saying that something happened and there’s no error handler for it. So good luck.

Fixing it.

The first indication I got via Google and Stack Overflow was that this may be an issue with Node.js before 0.8.22 and sure enough, I was running 0.8.19. So the first thing I did was upgrade to version 0.8.22. But that did not fix the problem at all (but a later and greater version is of course a nice side effect). With almost all other software involved being up to date, this actually required some structured problem solving.

Back to the drawing board

I eventually managed to trace the error message down to a ‘too many open files’ problem which is Interesting as it answers the crucial question: What went wong? This happened at roughly 250 concurrent users with a test that was accessing 6 different static files. This is what it looks like in LoadImpact:

node_failed_test

So a little depending on timing, and exactly when each request comes in, it would roughly indicate that some 1500 (6 files times 250 users) files can be open at the same time. Give or take. Most Linux systems are, by default, configured to allow relatively small number of open files, e.g. 1024. The Linux command to check this is ulimit:

$ ulimit -n
1024

1024 is the default on a lot of distros, including Ubuntu 12.10 that I was running the tests on. So my machine had 1024 as the limit but it appears that I had 1500 files open at the same time. Does this make any sense? Well, sort of, there are at least 3 factors involved here that would affect the results:

  1. LoadImpact simulates real browsers (simulated browser users or just SBU). A SBU user only opens 4 concurrent connections to the same server even if the script tells it to download 6 resources. The other 2 resources are simply queued.
  2. Each open TCP socket counts as an open file. So each concurrent TCP connection is an open file. Knowing that our limit is 1024, that would indicate that node.js could handle up to 256 concurrent users if each user uses the maximum of 4 open connections.
  3. In our sample, the requests for static resources also opens a file and thereby occupies another file handle. This file is open for less time than the actual connection, but still, under a certain time, a single request can consume 2 open file handles.

So in theory, the limit for concurrent simulated browser users should be 256 or less. But in reality, I saw the number of concurrent users go all the way up to 270 before the Node.js process died on me. The explanation to that is more likely than anything just timing. Not all SBU’s will hit the server at exactly the same time. At the end, hitting problems when running about 250 concurrent users reasons well with the open files limit being the problem. Luckily, the limit of number of open files per process is easy to change:

$ ulimit -n 2048

The next test shows real progress. Here’s the graph:

node_better_test

Problem solved (at least within the limits of this test).

Summary

Understanding what you build upon is important. If you choose to rely on node.js, you probably want to be aware of how that increases your dependency on various per process limitations in the operating system in general and max number of open files in particular. You are more affected by these limitations since everything you do takes place inside a single process. And yes. I know. There are numerous of more or less fantastic ways to work around this particular limitation. Just as there are plenty of ways to work around limitations in any other web development stack. The key thing to remember is that when you select your stack, framework, language or server, you also select all the limitations that comes with it. There’s (still) no silver bullet, even if some bullets are better out of the box than other. Having spent countless of hours with other web development languages, I think I’m in a good position to compare and yes indeed! Node.js delivers some amazing performance. But at present, it comes with a bigger responsibility to ‘Know Your stack’ than a lot of the others.

Node.js vs PHP – using Load Impact to visualize node.js efficiency

It could be said that Node.js is the new darling of web server technology. LinkedIn have had very good results with it and there are places on the Internet that will tell you it can cure cancer. In the mean time, the old work horse language of the Internet, PHP, gets a steady stream of criticism. and among the 14k Google hits for “PHP sucks” (exact term), people will say the most funny terrible things about the language while some of the critique is actually quite well balanced. Node.js introduces at least two new things (for a broader audience). First, the ability to write server side JavaScript code. In theory this could be an advantage since JavaScript is more important than ever on the client side and using the same language on server and browser would have many benefits. That’s at least quite cool. The other thing that makes Node.js different is that it’s completely asynchronous and event driven. Node is based on the realization that a lot of computer code actually just sits idle and wait for I/O most of the time, like waiting for a file to be written to disk or for a MySQL query to return data. To accomplish that, more or less every single function in Node.js is non-blocking. When you ask for node to open a file, you don’t wait for it to return. Instead, you tell node what function to pass the results to and get on with executing other statements. This leads to a dramatically different way to structure your code with deeply nested callbacks and anonymous function and closures. You end up with something  like this:

doSomething(val, function(err,result){
  doSomethingElse(result,function(err,res){
    doAbra();
    doKadabra(err, res, function() {
      ...
      ...
    });
  });
});

It’s quite easy to end up with very deep nesting that in my opinion sometimes affects code readability in a negative way. But compared to what gets said about PHP, that’s very mild critique. And.. oh! The third thing that is quite different is that in Node.js, you don’t have to use a separate http(s) server. It’s quite common to put Node.js behind a Nginx, but that’s not strictly needed. So the heart of a typical Node.js web application is the implementation of the actual web server.

A fair way to compare

So no, it’s not fair to say that we compare Node.js and PHP. What we really compare is Node.js and PHP+Apache2 (or any other http server). For this article, I’ve used Apache2 and mod_php since it’s by far the most common configuration. Some might say that I’d get much better results if I had used Nginx or Lighthttpd as the http server for PHP. That’s most likely very true, but at the end of the day, server side PHP depends on running in multiple separate processes. Regardless if we create those processes with mod_php or fastcgi or any other mechanism. So, I’m sticking with the standard server setup for PHP and I think that makes good sense.

The testing environment

So we’re pitting PHP+Apache2 against a Node.js based application. To keep things reasonable, I’ve created a very (really, very) simple application in both PHP5 and Node.js. The application will get 50 rows of data from a WordPress installation and output it as a json string. That’s it, nothing more. The benefit of keeping it this simple was (a) that I didn’t have to bother about too many implementation details between the two languages and (b) more important that we’re not testing my ability to code, we’re really testing the difference in architecture between the two. The server we’re  using for this test is a virtual server with:

  • 1 x Core Intel(R) Xeon(R) CPU E5-2670 0 @ 2.60GHz
  • 2 Gb RAM.
  • OS is 64 Bit Ubuntu 12.10 installed fresh before running these tests.
  • We installed the Load Impact Server metric agent.

For the tests, we’re using:

  • Apache/2.2.22 and
  • PHP 5.4.6.
  • Node.js version 0.8.18 (built using this script)
  • MySQL is version 5.5.29.
  • The data table in the tests is the options table from a random WordPress blog.
The scripts we’re using:

Node.js (javascript):

// Include http module,
var http = require('http'),
mysql = require("mysql");

// Create the connection.
// Data is default to new mysql installation and should be changed according to your configuration.
var connection = mysql.createConnection({
   user: "wp",
   password: "****",
   database: "random"
});

// Create the http server.
http.createServer(function (request, response) {
   // Attach listener on end event.
   request.on('end', function () {

      // Query the database.
      connection.query('SELECT * FROM wp_options limit 50;', function (error, rows, fields) {
         response.writeHead(200, {
            'Content-Type': 'text/html'
         });
         // Send data as JSON string.
         // Rows variable holds the result of the query.
         response.end(JSON.stringify(rows));
      });
   });
// Listen on the 8080 port.
}).listen(8080);

PHP code:

<!--?php $db = new PDO('mysql:host=localhost;dbname=*****',     'wp',     '*****'); $all= $db--->query('SELECT * FROM wp_options limit 50;')->fetchAll();
echo json_encode($all);

The PHP script is obviously much shorter, but on the other hand it doesn’t have to implement a full http server either.

Running the tests

The Load Impact test configurations are also very simple, these two scripts are after all typical one trick ponies, so there’s not that much of bells and whistles to use here. To be honest, I was surprised how many concurrent users I had to use in order to bring the difference out into the light. The test scripts had the following parameters:

  • I was using Simulated Browser Users (SBU)
  • The ramp up went from 0-500 users in 5 minutes
  • 100% of the traffic comes from one source (Ashburn US)
  • Server metrics agent enabled
The graphics:
On the below images. the lines have the following meanings:
  • Green line: Concurrent users
  • Blue line: Response time
  • Red line: Server CPU usage

Node.js up to 500 users.

The first graph here shows what happens when we load test the Node.js server. The response time (blue) is pretty much constant all through the test. My back of a napkin analysis of the initial outliers is that they have to do with a cold MySQL cache. Now, have a look at the results from the PHP test:

Quite different results. It’s not easy to see on this screen shot, but the blue lines is initially stable at 320 ms response time up to about 340 active concurrent users. After that, we first see a small increase in response time but after additional active concurrent users are added, the response time eventually goes through the roof completely.

So what’s wrong with PHP/Apache?

Ok, so what we’re looking at is not very surprising, it’s the difference in architecture between the two solutions. Let’s think about what goes on in each case.

When Apache2 serves up the PHP page it leaves the PHP execution to a specific child process. That child process can only handle one PHP request at a time so if there are more requests than than, the others have to wait. On this server, there’s a maximum of 256 clients (MaxClients) configured vs 150 that comes standard. Even if it’s possible to increase MaxClients to well beyond 256, that will in turn give you a problem with internal memory (RAM). At the end, you need to find the correct balance between max nr of concurrent requests and available server resources.

But for Node, it’s easier. First of all, in the calm territory, each request is about 30% faster than for PHP, so in pure performance in this extremely basic setup, Node is quicker. Also going for Node is the fact that everything is in one single process on the server. One process with one active request handling thread. So thre’s no inter process communication between different instances and the ‘mother’ process. Also, per request, Node is much more memory efficient. PHP/Apache needs to have a lot of php and process overhead per concurrent worker/client while Node will share most of it’s memory between the requests.

Also note that in both these tests, CPU load was never a problem. Even if CPU loads varies with concurrent users in both tests it stays below 5% (and yes, I did not just rely on the graph, I checked it on the server as well). (I’ll write a follow up on this article at some point when I can include server memory usage as well). So we haven’t loaded this server into oblivion in any way, we’ve just loaded it hard enough for the PHP/Aapache architecture to start showing some of it’s problems.

So if Node.js is so good…

Well of course. There are challenges with Node, both technical and cultural. On the technical side, the core design idea in Node is to have one process with one thread makes it a bit of a challenge to scale up on a multi core server. You may have already noted that the test machine uses only one core which is an unfair advantage to Node. If it had 2 cores, PHP/Apache would have been able to use that, but for Node to do the same, you have to do some tricks.

On the cultural side, PHP is still “everywhere” and Node is not. So if you decide to go with Node, you need to prepare to do a lot more work yourself, there’s simply nowhere near as many coders, web hotels, computer book authors, world leading CMS’es and what have you. With PHP, you never walk alone.

Conclusion

Hopefully, this shows the inherit differences in two different server technologies. One old trusted and one young and trending. Hopefully it’s apparent that your core technical choices will affect your server performance and in the end, how much load you can take. Designing for high load and high scalability begins early in the process, before the first line of code is ever written.

And sure, in real life, there are numerous of tricks available to reduce the effects seen here. In real life, lots of Facebook still runs on PHP.

Server Metrics Tutorial

Here at Load Impact, we’re constantly developing new features to help make our load testing tool even more powerful and easy to use. Our latest feature, Server Metrics, now makes it possible for you to measure performance metrics from your own servers and integrate these metrics with the graphs generated from your Load Impact test. This makes it much easier for you to correlate data from your servers with test result data, while helping you identify the reasons behind possible performance problems in your site. We do this by installing agents on your servers in order to grab server metrics data, which we can later insert into the results graph.

Having been a pure online SaaS testing tool, we don’t like the hassle that downloads and setups bring, so we’ve tried to make it really simple to set this up and use. (After all, we are trying to make testing more efficient, not more frustrating!)

Here’s 5 steps on how you can get Server Metrics up to a flying start:

Step 1 (Optional): Check out where Server Metrics appears in your Test Configuration

Go ahead and Log In to your account, then select the Test Configuration tab and create a new configuration. Alternatively, if you have current test configurations already set up, select one. Below the “User Scenarios” subsection, you should now find a section named “ Server Metrics”.  Click on “Go to Your account page”.
Alternatively, skip this step altogether and head straight to “Access Tokens” under the “Account” tab.

Step 2: Generating a Token

In order to identify our agents when they are talking to Load Impact, we need an identification key. We refer to this as a token. To generate a token, follow the instructions as stated.
(Yes, you only have to click on the “Generate a new token” button. Once :P )

Step 3: Download and install Server Metrics agent

Now comes the hardest part. You’ll need to download and install our Server Metrics agent for your server platform. Installation should be as basic as following the instructions in our Wizard, on in the README file.

Step 4: Run your test!

Once the server metrics agent is configured, you’ll immediately be able to select it in your Test Configuration (see Step 1). We recommend giving a name that describes the name of the server you’re installing it on for easier identification later on. From here, just configure and run your test as per normal.

Step 5: Viewing Server Metrics in your test results

Once your test has started, you should be able to see your Server Metrics results in real time, just as with all of our other result graphs. Simply select the the kind(s) of server metrics you wish to view in the “Add Graph” drop down menu. This will plot the results for the specific server metric that you wish to view.

And that’s it! You’re all set towards easier bottleneck identification :)
For an example of how these graphs are can help make your load testing life easier, take a look at the test results below. These results show your server’s CPU usage as load is being increased on your website.

Don’t think this is simple enough? Email support [at] loadimpact [dot] com to let us know how we can do one better!

21/12/12 – It’s the end of the world… Again!

With all the hype stating that the end of the world would arrive on the 21 Dec, 2012, we decided that we had to do something.

So in honour of our 666,666th test since our launch in 2009, we fired up a quick test on the Vatican. It seems we have been slightly too popular though, so we were placed in queue and only got executed as the 666,668th test :P

Simple load test, nothing heavy. (You can try a free test here as well). But the results did give some of us at Load Impact a little scare. Take a look at the graphs below:

Coincidence that it looked somewhat like the Crucifix, perhaps? But one thing is to be sure – Don’t mess with the universal forces ;)

Load Impact is the world's most popular load testing tool. We help you measure how many users your website can handle to minimize revenue losses and conversion problems as your site grows.

Since 2009, users from over 190 countries have performed more than 750,000 tests using Load Impact. Learn more.

Twitter

Follow

Get every new post delivered to your Inbox.