Node.js vs PHP – using Load Impact to visualize node.js efficiency


It could be said that Node.js is the new darling of web server technology. LinkedIn have had very good results with it and there are places on the Internet that will tell you it can cure cancer.

In the meantime, the old workhorse language of the Internet, PHP, gets a steady stream of criticism. Among the 14k Google hits for “PHP sucks” (exact term), people say the funniest and most terrible things about the language, while some of the critique is actually quite well balanced. Node.js introduces at least two new things (for a broader audience). First, the ability to write server-side JavaScript code. In theory this could be an advantage, since JavaScript is more important than ever on the client side and using the same language on server and browser would have many benefits. That’s at least quite cool.

The other thing that makes Node.js different is that it’s completely asynchronous and event driven. Node is based on the realization that a lot of computer code actually just sits idle and waits for I/O most of the time, like waiting for a file to be written to disk or for a MySQL query to return data. To avoid that idle waiting, more or less every single function in Node.js that does I/O is non-blocking.

When you ask Node to open a file, you don’t wait for it to return. Instead, you tell Node what function to pass the results to and get on with executing other statements. This leads to a dramatically different way to structure your code, with deeply nested callbacks, anonymous functions and closures. You end up with something like this:
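
As a minimal sketch of that nesting (the function name and the file handling are made up for illustration, and only Node’s built-in fs module is used), three dependent I/O steps already sit three callbacks deep:

```javascript
const fs = require('fs');

// Hypothetical helper: read a file, write an upper-cased copy,
// then report the size of the copy in bytes.
// Each step only starts inside the callback of the previous one.
function copyUppercased(src, dest, done) {
  fs.readFile(src, 'utf8', function (err, text) {
    if (err) return done(err);
    fs.writeFile(dest, text.toUpperCase(), function (err) {
      if (err) return done(err);
      fs.stat(dest, function (err, stats) {
        if (err) return done(err);
        done(null, stats.size);
      });
    });
  });
}
```

The call to copyUppercased() itself returns immediately; the result only shows up later, in the done callback.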

 

It’s quite easy to end up with very deep nesting that, in my opinion, sometimes hurts code readability. But compared to what gets said about PHP, that’s very mild critique. And... oh! The third thing that is quite different is that in Node.js, you don’t have to use a separate http(s) server. It’s quite common to put Node.js behind Nginx, but that’s not strictly needed. So the heart of a typical Node.js web application is the implementation of the actual web server.

A fair way to compare

So no, it’s not fair to say that we compare Node.js and PHP. What we really compare is Node.js and PHP+Apache2 (or any other http server). For this article, I’ve used Apache2 and mod_php since it’s by far the most common configuration. Some might say that I’d get much better results if I had used Nginx or Lighttpd as the http server for PHP. That’s most likely very true, but at the end of the day, server-side PHP depends on running in multiple separate processes, regardless of whether we create those processes with mod_php, FastCGI or any other mechanism. So I’m sticking with the standard server setup for PHP, and I think that makes good sense.

The testing environment

So we’re pitting PHP+Apache2 against a Node.js based application. To keep things reasonable, I’ve created a very (really, very) simple application in both PHP5 and Node.js. The application will get 50 rows of data from a WordPress installation and output it as a JSON string. That’s it, nothing more. The benefit of keeping it this simple was (a) that I didn’t have to bother about too many implementation details between the two languages and (b), more importantly, that we’re not testing my ability to code; we’re really testing the difference in architecture between the two. The server we’re using for this test is a virtual server with:

  • 1 x Core Intel(R) Xeon(R) CPU E5-2670 0 @ 2.60GHz
  • 2 GB RAM.
  • OS is 64 Bit Ubuntu 12.10 installed fresh before running these tests.
  • We installed the Load Impact Server metric agent.

For the tests, we’re using:

  • Apache/2.2.22 and
  • PHP 5.4.6.
  • Node.js version 0.8.18 (built using this script)
  • MySQL is version 5.5.29.
  • The data table in the tests is the options table from a random WordPress blog.

The scripts we’re using:

Node.js (javascript):

 

PHP code:

 

The PHP script is obviously much shorter, but on the other hand it doesn’t have to implement a full http server either.
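
To make the comparison concrete, here is a rough sketch of what a Node.js script of this kind might look like. It is not the script used in the test: fetchRows() is a made-up stand-in that fabricates rows instead of querying MySQL, the column names are only illustrative, and the port is up to the caller. What it does show is the part the PHP script never has to deal with, namely being its own http server.

```javascript
const http = require('http');

// Stand-in for the MySQL query (assumption: the real script read the
// WordPress options table). Produces `limit` fake rows asynchronously.
function fetchRows(limit, callback) {
  const rows = [];
  for (let i = 1; i <= limit; i++) {
    rows.push({ option_id: i, option_name: 'option_' + i, option_value: '' });
  }
  setImmediate(function () { callback(null, rows); });
}

// Request handler: fetch 50 rows and send them back as one JSON string.
function handleRequest(req, res) {
  fetchRows(50, function (err, rows) {
    if (err) {
      res.writeHead(500, { 'Content-Type': 'text/plain' });
      res.end('query failed');
      return;
    }
    res.writeHead(200, { 'Content-Type': 'application/json' });
    res.end(JSON.stringify(rows));
  });
}

// The script is its own web server; nothing like Apache sits in front of it.
function startServer(port) {
  return http.createServer(handleRequest).listen(port);
}
```

Calling startServer(8000), for example, would serve the JSON on port 8000 of the local machine.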

Running the tests

The Load Impact test configurations are also very simple; these two scripts are, after all, typical one-trick ponies, so there aren’t many bells and whistles to use here. To be honest, I was surprised how many concurrent users I had to use in order to bring the difference out into the light. The test scripts had the following parameters:

  • The ramp up went from 0-500 users in 5 minutes
  • 100% of the traffic comes from one source (Ashburn US)
  • Server metrics agent enabled

The graphics:

In the images below, the lines have the following meanings:
  • Green line: Concurrent users
  • Blue line: Response time
  • Red line: Server CPU usage

Node.js up to 500 users.

The first graph here shows what happens when we load test the Node.js server. The response time (blue) is pretty much constant all through the test. My back-of-the-napkin analysis of the initial outliers is that they have to do with a cold MySQL cache. Now, have a look at the results from the PHP test:

Quite different results. It’s not easy to see on this screenshot, but the blue line is initially stable at 320 ms response time up to about 340 active concurrent users. After that, we first see a small increase in response time, but as additional active concurrent users are added, the response time eventually goes through the roof completely.

So what’s wrong with PHP/Apache?

OK, so what we’re looking at is not very surprising: it’s the difference in architecture between the two solutions. Let’s think about what goes on in each case.

When Apache2 serves up the PHP page, it leaves the PHP execution to a specific child process. That child process can only handle one PHP request at a time, so if there are more concurrent requests than child processes, the others have to wait. On this server, there’s a maximum of 256 clients (MaxClients) configured, versus the 150 that comes standard. Even if it’s possible to increase MaxClients to well beyond 256, that will in turn give you a problem with memory (RAM). In the end, you need to find the correct balance between the maximum number of concurrent requests and available server resources.
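
That balance can be put into a back-of-the-envelope calculation. The sketch below is only an illustration: the 2 GB of RAM, the 256 MaxClients and the 500 users come from this test, but the 512 MB reserved for the OS and MySQL and the 20 MB per Apache child are made-up figures, and treating every concurrent user as one concurrent request is a simplification.

```javascript
// Rough upper bound on how many Apache child processes fit in memory.
function estimateMaxClients(totalMb, reservedMb, perChildMb) {
  if (perChildMb <= 0) throw new RangeError('perChildMb must be positive');
  return Math.max(0, Math.floor((totalMb - reservedMb) / perChildMb));
}

// Requests that have to wait when every child process is busy.
function waitingRequests(concurrentRequests, maxClients) {
  return Math.max(0, concurrentRequests - maxClients);
}

// 2048 MB is the test server's RAM; 512 MB and 20 MB are hypothetical figures.
console.log(estimateMaxClients(2048, 512, 20)); // 76

// 500 is the peak of the ramp-up, 256 the configured MaxClients.
console.log(waitingRequests(500, 256)); // 244
```

With real numbers measured on the server, the same two functions give a first idea of where MaxClients stops being limited by configuration and starts being limited by RAM.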

But for Node, it’s easier. First of all, in the calm territory, each request is about 30% faster than for PHP, so in pure performance in this extremely basic setup, Node is quicker. Also going for Node is the fact that everything is in one single process on the server: one process with one active request-handling thread. So there’s no inter-process communication between different instances and the ‘mother’ process. Also, per request, Node is much more memory efficient. PHP/Apache needs a lot of PHP and process overhead per concurrent worker/client, while Node shares most of its memory between the requests.
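
How one thread can juggle several requests is easiest to see with a toy example. In the sketch below, fakeQuery() is a made-up stand-in for a database call (it just waits 20 ms on a timer) and runDemo() is a hypothetical driver; nothing here touches a real server or MySQL.

```javascript
// Simulated non-blocking "query": calls back after delayMs
// without blocking the thread in the meantime.
function fakeQuery(delayMs, callback) {
  setTimeout(callback, delayMs);
}

// Start `requestCount` simulated requests on the single thread
// and record the order in which things happen.
function runDemo(requestCount, done) {
  const log = [];
  let pending = requestCount;
  for (let id = 1; id <= requestCount; id++) {
    log.push('start ' + id);
    fakeQuery(20, function () {
      log.push('end ' + id);
      pending -= 1;
      if (pending === 0) done(log);
    });
  }
}

runDemo(3, function (log) {
  console.log(log.join(', '));
});
```

All three requests are started before the first one finishes, so the waiting overlaps instead of adding up. With one PHP child process, the same three requests would be handled strictly one after the other.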

Also note that in both these tests, CPU load was never a problem. Even if CPU load varies with concurrent users in both tests, it stays below 5% (and yes, I did not just rely on the graph, I checked it on the server as well). (I’ll write a follow-up to this article at some point when I can include server memory usage as well.) So we haven’t loaded this server into oblivion in any way; we’ve just loaded it hard enough for the PHP/Apache architecture to start showing some of its problems.

So if Node.js is so good…

Well, of course, there are challenges with Node, both technical and cultural. On the technical side, the core design idea in Node, one process with one thread, makes it a bit of a challenge to scale up on a multi-core server. You may have already noted that the test machine uses only one core, which is an unfair advantage to Node. If it had 2 cores, PHP/Apache would have been able to use them both, but for Node to do the same, you have to do some tricks.
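
One commonly used trick, and not necessarily the one meant here, is Node’s built-in cluster module: a primary process forks one worker per core, and the workers share the same listening port. A minimal sketch, where startClustered() and its parameters are made up for illustration:

```javascript
const cluster = require('cluster');
const http = require('http');
const os = require('os');

// Newer Node.js versions expose cluster.isPrimary; older ones only isMaster.
const isPrimary = cluster.isPrimary !== undefined
  ? cluster.isPrimary
  : cluster.isMaster;

// Hypothetical entry point: the primary forks the workers,
// and each worker runs its own single-threaded http server.
function startClustered(port, workerCount) {
  if (isPrimary) {
    for (let i = 0; i < workerCount; i++) {
      cluster.fork();
    }
  } else {
    http.createServer(function (req, res) {
      res.writeHead(200, { 'Content-Type': 'text/plain' });
      res.end('handled by worker ' + process.pid + '\n');
    }).listen(port);
  }
}

// Example, not run here: startClustered(8000, os.cpus().length);
```

Each worker is still one process with one thread, so anything kept in memory by one worker is invisible to the others. That is part of why this counts as a trick rather than something you get for free.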

On the cultural side, PHP is still “everywhere” and Node is not. So if you decide to go with Node, you need to be prepared to do a lot more work yourself; there are simply nowhere near as many coders, web hotels, computer book authors, world-leading CMSes and what have you. With PHP, you never walk alone.

Conclusion

Hopefully, this shows the inherent differences between two server technologies: one old and trusted, one young and trending. Hopefully it’s apparent that your core technical choices will affect your server performance and, in the end, how much load you can take. Designing for high load and high scalability begins early in the process, before the first line of code is ever written.

And sure, in real life, there are numerous tricks available to reduce the effects seen here. In real life, lots of Facebook still runs on PHP.

21 thoughts on “Node.js vs PHP – using Load Impact to visualize node.js efficiency”

  1. If I understand the plot correctly, you essentially showed that when the number of users goes well beyond the MaxClients, new users have to wait for an old PHP process to complete. And when you have 400 concurrent users, that means waiting for approximately 150 database accesses. In a simple test, php has a startup time of approximately 20ms. So just starting it 150 times takes 3 seconds, and when you exhaust MaxClients, you very likely have to start one after the other.

    I think it’s rather surprising that you only hit the linear time increase at about 400 concurrent users, instead of 256. Maybe the concurrency count is wrong, because Apache can already let the php process die while the request simulator does not have the complete request, yet. Keep in mind, though, that the increase could just be the cumulative startup time of the PHP interpreter for all the waiting users plus the database access and not any measure of real performance.

    Another interesting test would be to run the HipHop PHP compiler against node.js. That would be fairer, because node.js requires more than a standard webhoster provides (more than just LAMP), so PHP should get the benefit of non-standard packages, too. That might change the numbers in the calm territory quite drastically.

    Can you also plot the memory requirements of node.js and Apache+PHP? (free -m should make it easy to measure the total memory needed by many processes)

  2. Hi arnebab,

    There are some more details to take into account. On this server, MaxClients was set to 256. MaxRequestsPerChild is set to zero, which is the Apache2 default. The meaning of that is that each child lives on to handle many requests (serialized). A child doesn’t die until things have calmed down and Apache decides it only needs MinSpareServers children available. So PHP gets loaded into process space once per child, not once per request.

    Also, on this server, the minimum time for any static resource to get pulled from the server all the way to the browser is 15ms (measured via Chromium’s developer panel). That’s including DNS and all other network induced latency. Testing with a simple PHP script, I see roughly the same delay, perhaps 0.5-1 ms slower on average. Ergo, the PHP startup time as can be observed from the browser includes a lot more than the actual work carried out by Apache/mod_php to fire up the php script. So the time that Apache needs to handle each PHP request is in reality much lower than the 20ms you typically see when looking from a browser. If I measure the time *inside* the PHP request (using microtime), I get numbers as low as 3.0E-6, but that’s obviously missing some of the Apache overhead. The number we’re after is somewhere in between.

    I haven’t done all the math (I should, really), but I’m fairly sure that’s why we hit linear time much later than your calculation suggests.

    Regarding memory consumption in the graphs: as things stand right now, the Load Impact server metric agent does report on memory consumption. It’s currently not selectable in the standard graph, but that’s very high on my personal wish list. Glad to hear that you’d also like that feature.

    And I agree. It would be really interesting to see what happens if we take PHP to its limits; perhaps the HipHop compiler is the best option. Sounds like a summer project :-)

    Regards,

    /Erik Torsner

  3. Great article. I think you did a good job at showing the two, as well as the shortcomings of your test (as there will always be something not taken into account).

    I’m curious how the raw interpreters compare to each other (v8 vs PHP’s, not HipHop). I’m guessing V8 has the edge since there has been so much work put into speed optimization.

  4. Each time the PHP script is executed, the MySQL connection needs to be reestablished, whereas the NodeJS server will establish one connection on startup and reuse it for each request.

    This may be skewing the results, as your MySQL server may have connection limits in place.

    A fairer test would remove this difference, either by making PHP reuse a connection or making NodeJS create a new one per request. The latter is the simplest – move mysql.createConnection() inside request.on(‘end’).

  5. @Robert. Thanks for your input to this test. If time permits, I’ll run a new test with your suggested changes. On the other hand, the test was kind of put together to visualize the differences between two application stacks sort of ‘out of the box’, using the most common approach, with all the good and bad parts of both stacks. I think I managed to do that, but your kind of constructive input is very welcome as it helps me highlight what could be done better.

    Connection pooling in PHP/PDO is certainly possible and for a simple read-only test like this, I agree that it makes sense. But regardless of stack, connection pooling/reuse has both pros and cons. I think that the argument in the following StackOverflow post goes a long way to explain why it’s sometimes a bad idea. Even if the article assumes that PHP/PDO is used, everything said about transactions applies regardless of language/platform: http://stackoverflow.com/questions/3332074/what-are-the-disadvantages-of-using-persistent-connection-in-pdo

    So, in a small test like this, I agree with you, connection reuse gives an advantage to Node.js that could also be given to PHP. However, rather than hampering the Node.js version by recreating connections, I think using the PDO::ATTR_PERSISTENT attribute in the PHP version makes more sense.

    Many thanks Robert,

    Erik Torsner

  6. you are not testing PHP here at all, you are testing mostly Apache and MySQL.
    Why is it that all nodeJS people so religiously want to make this comparison while getting the computer science basics all wrong?

  7. @lx. Thanks for your feedback,

    You’re absolutely right and I think that was fairly clear in the article to begin with. Quoting from the 3rd paragraph: “So no, it’s not fair to say that we compare Node.js and PHP. What we really compare is Node.js and PHP+Apache2”. So, we never really set out to compare PHP and Node.js as languages, rather, we wanted to compare how they perform in a standardized but limited real world scenario.

    MySQL usage is the same in both test scripts, even if PHP could have performed better with connection pooling. So the test is to some extent measuring how MySQL performance differs depending on connection method.

    I do think the article is fairly clear that we end up comparing two different application stacks that favour two different application design patterns. Node.js, due to some of its inherent features, will perform better than PHP. Not primarily because PHP is a bad language with inherent performance problems, but because PHP needs to be set up with an http server that lets PHP handle each request in a separate process.

    Again, thanks for your feedback and please come back for more reading, perhaps some of our upcoming articles are more to your liking.

  8. One question – Is PHP more robust with exception handling vs. Node.js? Since each client is essentially a separate thread, an exception in one client wouldn’t bring down your whole server. However, with Node.js, the event loop is all in one thread, which means that if one client has an exception it would bring down that thread, which means all clients are affected. Is this correct?

  9. Although I like all sorts of number crunching and hence enjoyed reading your article I strongly have to agree with what lx said above.
    Yes, you said it’s not a comparison between node and PHP but between node and PHP+Apache.
    However, I still doubt the value of the results very much. Both ‘applications’ do nothing but one query and some string operation. You could as well simply have served a static ‘Hello World’ page. Then at least we would have a somewhat cleaner comparison of firing-up times. But still it would not be saying very much.

    When comparing different web application systems it needs a fully blown real world web application where – well, I don’t need to tell you what is going on in such a beast. And you know better than me that it would take quite an effort to only write comparable systems, let alone benchmark and analyze them.
    There is just more to it than handling simple requests.
    And going down that road – what about load balancing, clustered environments, fail over infrastructures.

    I can very well see some places where node.js can be a very interesting if not outperforming alternative, and node.js (and its siblings) will probably even completely replace PHP/Python/Ruby-Apache/Nginx/Lighttpd setups for a number of smaller tasks.
    I can’t see it for complex high traffic sites anytime soon though.
