Benchmarking Codswallop: NodeJS v PHP

Posted: 2013-11-12
Category: PHP

Sometimes people link me to articles and ask for my opinions. This one was a real doozy.

Oh goody, a framework versus language post. Let's try and chew through this probable linkbait.

"This is more of a benchmark test than example."

Ok so we're benchmarking NodeJS v PHP. Weird, but I'll go along with it.

"External library used for Nodejs was cheerio and PhpQuery for Php."

Well, now we're testing cheerio v PhpQuery which is a bit different, but fine, let's go along with it. These two libraries do essentially the same thing: they let you parse HTML and traverse the DOM. I can see how one might think it's fair, even if the title is already misleading...

"Nodejs took 175.535 sec to complete where as Php took 711.790 sec to complete. Php was four times slower than Nodejs."

Sure it was, because phpQuery uses file_get_contents() which is blocking, meaning each and every single one of those web requests has to be done in turn. PHP is just sitting there waiting for the server to respond, when it could be doing something else. Also where were these tests being run from? The moon?!

We've come a long way from the original title of "NodeJS v PHP", to really asking "cheerio v phpQuery", which is realistically asking "Blocking v Non-Blocking", or "Synchronous v Asynchronous".

Benchmarking to see if "doing multiple things at once" is faster than "doing one thing at a time" almost certainly sounds like a waste of time, but it would at least match the actual code examples being run and therefore be a valid test. Let's just pretend it was worded like that, and have a go at this benchmark ourselves.
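To make the "one thing at a time" versus "multiple things at once" distinction concrete, here is a minimal sketch in modern JavaScript (newer syntax than the Node 0.10 used below). It makes no real HTTP requests: fakeRequest is a hypothetical stand-in that just waits for a fixed delay, and the page numbers and delay are made-up values, so treat the timings as illustrative only.

```javascript
// Hypothetical stand-in for an HTTP request: resolves after delayMs.
// No network is involved; the delay plays the part of server latency.
function fakeRequest(page, delayMs) {
  return new Promise((resolve) => {
    setTimeout(() => resolve("page " + page), delayMs);
  });
}

// Blocking style: wait for each response before starting the next request.
async function fetchInSequence(pages, delayMs) {
  const results = [];
  for (const page of pages) {
    results.push(await fakeRequest(page, delayMs));
  }
  return results;
}

// Non-blocking style: start every request, then wait for all of them.
function fetchAllAtOnce(pages, delayMs) {
  return Promise.all(pages.map((page) => fakeRequest(page, delayMs)));
}

// Run a strategy and report how long it took in milliseconds.
async function timeIt(strategy, pages, delayMs) {
  const start = Date.now();
  const results = await strategy(pages, delayMs);
  return { count: results.length, elapsedMs: Date.now() - start };
}

async function main() {
  const pages = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]; // assumed page numbers
  const delayMs = 50; // assumed latency per request

  const sequential = await timeIt(fetchInSequence, pages, delayMs);
  const concurrent = await timeIt(fetchAllAtOnce, pages, delayMs);

  console.log("in sequence:", sequential.elapsedMs, "ms");
  console.log("all at once:", concurrent.elapsedMs, "ms");
}

main();
```

With ten simulated requests the sequential run should take roughly ten delays and the concurrent run roughly one, whatever language is doing the waiting.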

Setup

I made a repo and shoved a Vagrantfile in there with just the basic Ubuntu 12.10 image. I could have done up a whole Puppet manifest, but this will be a useful learning exercise for people who want to learn how to set up ReactPHP anyway. Vagrant up that box, then ssh in. All the test scripts are in there.

I have no idea what version of PHP he is using because he doesn't actually say, but let's just go with PHP 5.5 ourselves because it is the most recent stable version.

$ sudo add-apt-repository ppa:ondrej/php5
$ sudo apt-get update
$ sudo apt-get install php5-cli

That gets PHP ready.

$ sudo apt-get install -y php5-dev libevent-dev
$ wget http://pecl.php.net/get/libevent-0.0.5.tgz
$ tar -xzf libevent-0.0.5.tgz
$ cd libevent-0.0.5 && phpize && ./configure && make && sudo make install
$ echo "extension=libevent.so" | sudo tee -a /etc/php5/cli/php.ini

That should sort out libevent, so we can let PHP work with event loops.

$ sudo apt-get install -y python-software-properties python g++ make
$ sudo add-apt-repository ppa:chris-lea/node.js
$ sudo apt-get update
$ sudo apt-get install -y nodejs

This will install a version of Node much newer than the 0.6.x Ubuntu's default repo will give you.

$ npm install request
$ npm install cheerio

Now we have the NPM modules for Node to do its thing.

Variables

Bandwidth: 15 Mbps
Vagrant Memory: 1024MB
PHP version: v5.5.5
NodeJS version: v0.10.21

I used phpQuery with the one file download, because they haven't bothered getting it on Composer yet. If they're going to flagrantly ignore PSR-0 and Composer I may as well go with the performantly packaged option.

Run the Tests

$ cd /vagrant
$ chmod +x ./run.sh
$ ./run.sh

This will run the same two examples from the original article first, then run my non-blocking example put together with a little help from Chris Boden, one of the ReactPHP developers.

Results

My async re-do of the original PHP example kicked the fuck out of everything else.

Here are the numbers:

Node v0.10.21 + Cheerio

real 0m45.142s
user 0m8.081s
sys 0m0.888s

PHP 5.5.5 + phpQuery (Blocking)

real 3m33.601s
user 0m8.685s
sys 0m1.212s

PHP 5.5.5 + ReactPHP + phpQuery

real 0m23.877s
user 0m10.237s
sys 0m1.568s
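For anyone who would rather compare those "real" figures as plain seconds, here is a small JavaScript sketch. The parseReal helper is hypothetical (it is not part of run.sh in the repo), and it only handles the minutes-and-seconds shape that time printed above.

```javascript
// Convert a `time`-style value such as "3m33.601s" into seconds.
// Hypothetical helper; only the "<minutes>m<seconds>s" shape is handled.
function parseReal(value) {
  const match = /^(\d+)m(\d+(?:\.\d+)?)s$/.exec(value);
  if (!match) {
    throw new Error("unexpected time format: " + value);
  }
  return Number(match[1]) * 60 + Number(match[2]);
}

// The three `real` values reported above.
const runs = {
  "Node + Cheerio": parseReal("0m45.142s"),
  "PHP + phpQuery (blocking)": parseReal("3m33.601s"),
  "PHP + ReactPHP + phpQuery": parseReal("0m23.877s"),
};

// Express every run relative to the blocking PHP run.
const baseline = runs["PHP + phpQuery (blocking)"];
for (const [name, seconds] of Object.entries(runs)) {
  const ratio = (baseline / seconds).toFixed(1);
  console.log(name + ": " + seconds.toFixed(3) + "s, blocking run took " + ratio + "x as long");
}
```

On these numbers the blocking script took a little under five times as long as the Node script and roughly nine times as long as the ReactPHP one.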

People like pretty graphs:

[Graph: Num. Seconds Passed v Page Number]

Conclusions

The primary conclusion to draw from this is that doing 200 HTTP requests in sequence is slower than making multiple requests at the same time. Shocker that.
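That conclusion can be sanity-checked with a back-of-the-envelope model, sketched here in JavaScript. It assumes every request takes the same time and that requests run in fixed "rounds" of at most maxConcurrent, neither of which is true on a real network. The estimateSeconds function and the 1.07 seconds per request (roughly the 3m33.601s blocking run divided by 200 requests) are illustrative assumptions, not something measured separately.

```javascript
// Crude model: requests are processed in rounds of at most maxConcurrent,
// and every round takes as long as one request. Real networks are messier.
function estimateSeconds(requestCount, secondsPerRequest, maxConcurrent) {
  if (maxConcurrent < 1) {
    throw new RangeError("maxConcurrent must be at least 1");
  }
  const rounds = Math.ceil(requestCount / maxConcurrent);
  return rounds * secondsPerRequest;
}

const requestCount = 200; // number of requests in the benchmark
const secondsPerRequest = 1.07; // assumption: ~213.6s blocking run / 200

// One at a time, five at a time, and everything at once.
for (const maxConcurrent of [1, 5, requestCount]) {
  const seconds = estimateSeconds(requestCount, secondsPerRequest, maxConcurrent);
  console.log(maxConcurrent + " at a time: about " + seconds.toFixed(1) + "s");
}
```

The one-at-a-time estimate matches the blocking run by construction; the interesting part is how quickly the total drops as concurrency goes up. The all-at-once estimate is wildly optimistic, because the model ignores bandwidth, the CPU time spent parsing, and any limits on the server side.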

We can also be pretty confident that the original article was completely wrong about everything. PHP is not as pathetic at async code as the original "benchmark" suggests. It is entirely down to how a package decides to make use of libevent or libev, much like ReactPHP has done.

Both systems can probably go faster somehow, and both systems could probably have their APIs cleaned up some to make this even easier. They both need some fault tolerance because when I cranked up the number to 1000 both systems had problems.

I'm not going to say either system is faster, just that the massive gap in the original article comes down purely to picking a blocking system. Run it yourself, and make your own conclusions. Let's just say that PHP is not sucking as bad as some people would expect.

Update: A few people have mentioned that Node by default will use a maxSockets of 5, and that setting it higher would make NodeJS run much quicker. As I said, I was sure NodeJS could go faster but I would never make assumptions about something I don't know much about. I re-ran the test and the results reflect these suggestions. Removing the blocking PHP approach (because obviously it's slow as shit) and running just the other three scripts looks like this:

[Graph: Num. Seconds Passed v Page Number]

Look, they're the same. At this point it is just a network test. The speed of the two systems for handling this specific task is essentially identical, with both systems taking it in turns to "win" as they swap by about 0.3 seconds. This does not really affect any of the rest of the article, because it was assumed node could be tweaked to be more in line with PHP. I was never trying to suggest PHP was faster than node (even though a bunch of you seemed to think I did). Where did that come from?
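For reference, the tweak people suggested in the update can be sketched like this. http.globalAgent.maxSockets is the documented Node setting that caps concurrent sockets per host on the default agent. The value 64 is simply the figure suggested on Hacker News, and whether a given HTTP library honours the global agent is an assumption you would need to check.

```javascript
const http = require("http");

// Node 0.10 limited the default agent to 5 concurrent sockets per host.
// Raising the cap lets more requests to the same host be in flight at once.
// 64 is an illustrative value, not a tuned recommendation.
http.globalAgent.maxSockets = 64;

console.log("maxSockets is now", http.globalAgent.maxSockets);
```

This needs to run before any requests are made, so it belongs near the top of the script.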

Observations

It is worth noting that the faster the network connection the less the difference is between the two. At 82 Mbps down Jon Sherrard was reporting "PHP 5.5.5 + ReactPHP + phpQuery" running at 15 seconds and "Node + Cheerio" running at 18 seconds.

I asked a few friends to try having a go at improving the speed of the original poster's NodeJS code, and a few alternatives sprang up from Alex Akass. His results have them pegged as only slight speed improvements, while mine had ps4.js clocked at about 9 seconds, which is mental. It did use a lot of child processes and fail when the page count was bumped to 1000 though, which is a useful reminder that none of this is magic and everything has costs.

Thoughts

It seems likely to me that people just assume PHP can't do this stuff, because by default most people arse around PHP with things like MAMP, or on their shitty web-host where it is hard to install things, and as such get used to writing PHP without utilizing many extensions. It is probably exactly this which makes people think PHP just can't do something, when it easily can. It is also probably this that causes package developers to generally ignore depending on functionality that would be extension only, just like PyroCMS often has to do.

This is why the work being done by folks like the ReactPHP project is incredibly important. They're wrapping up things like libevent and libev to provide developers with a simple Composer package to base other code on. Simple dependencies abstracting complicated stuff is exactly what modern development is all about, and PHP is keeping up nicely.

The HTTP Client library I used in this example is a little weak and only works with HTTP 1.0, which is problematic. For this reason Igor Wiedler himself recommends that you don't use it, but there is no reason why a better version could not be built.

Guzzle might get some async love soon too, wrapping up curl multi, as Nils Adermann just finished up a pull request. Great timing!

Summary

The trolls will no doubt say I am only defending PHP (again) because I am just not clever enough to learn other languages, but really I am tired of people making shit up. Once again, people, this is an example, not a specific piece of rage against just one person that wrote one shitty article. This happens a lot, and this should be an example to people who will try it again.

PHP has enough legitimate concerns without people just pretending they're scientists and using bullshit numbers to prove that up is left and cheese is made of potatoes.

Update: 08/11/2013

I am happy that the vast majority of people got the point of this article. It got some amazing attention, reaching about 40,000 hits on Google Analytics, front page on Hacker News for a bit, etc., but the best was several tweets and RTs from the official NodeJS account, who have read it and seem to agree:

So you can be happy or sad about this article, but it is not wrong.

Comments

Khayrattee

2013-11-12

Phil, this is a brilliant analysis and a lot of learning stuff in there (at least for me).

Cheers!

Jan.j

2013-11-12

Really great article and much better test. I really like your blog and it goes to my bookmarks.

Shaun

2013-11-12

Whoop!

Am also getting tired of people repeating how bad PHP is, glad to see people defending it ;)

Bruno Cassol

2013-11-12

Thanks for taking the time to set up and write this, Phil. People love to bash things they don't fully comprehend.

Alex Akass

2013-11-12

I should have put a limit on the child processes on my node example ns4.js!

Simplest way is to divide your set into sub arrays by dividing by the number of threads/cores available, and then using an outer seq loop kick off a sub group at a time and wait for all responses to return before calling this() to move onto the next group (note: not a parallel loop). That should deal with node creating too many child processes at once and limit the errors!

Plus sorry for being a tired and stupid douche creating benchmarks on an already stupidly bogged down machine with many vm's running! School boy error! Shows how much I benchmark stuff! Or engage my brain :)

Rafael Gonzaga

2013-11-12

First up, sorry by my poor english =/, but i'll try my best.

I totally agree with you, i remember that once i have heard one guy talking about differences among, PHP, RoR, Djan - whatever - And he said that PHP first of all was uglier than others, said that PHP have too many methods to the same thing and i agree in some points.

But i think - people doesn't realize that these "frameworks" were made on top of their languages, like Rails or Python and then, they craft a line code command tool that automate everything for them, and the world gets perfect hahaha.

It's not about rage, like you said, it's about people finally understand that PHP has being around for time enough to be over tested, over used and has a huge community who make things as the same way in other languages or frameworks are doing, doing things like Laravel 4, Fuel, Symphony ... and even more, like the examples here above.

I know, i ran out a little here about the context, but i just wanted to say it, PHP has more than MAMP, XAMP or whatever - People have to "learn" about the subject before throw rocks in it - and finaly - You don't throw rocks in trees that doesn't give fruits, which mean that logically, the trees who gives more fruits will take more damage hahahaha, if you know what i mean.

Best regards

Hey Dude

2013-11-12

But... but... some things marketed as cheese indeed contain lots of potato!

Troy Goode

2013-11-12

I found this interesting comment on Hacker News that indicates there may be a simple flaw in your node.js script. Mind making the (one line) change and reporting back on any changes to the results?

https://news.ycombinator.com/item?id=6717938

Tim

2013-11-12

Hey! Don't confuse NodeJS people, set maxSockets to something bigger than the default (5) http://nodejs.org/docs/latest/api/http.html#httpagentmaxsockets

PS I understand that it is not a benchmark

PPS But if it were a benchmark it looks like a network benchmark and not a language comparison

Sam

2013-11-12

I've never understood why people constantly bash one language over another, especially when it's to do with speed. If speed was the main priority everything would be written in ASM. There are times when ASM is the right language to use, but a website isn't it. Writing good code will have far more impact on performance than how fast the language is. Badly written C++ can be 1000x slower than well written PHP.

There is no perfect language, just pick any language and I guarantee someone is bashing it for some reason or another. A good programmer should use whichever language makes the most sense for their project.

If I'm writing something for the web that needs to be easily installable by a lot of people, I use PHP because that's what it's best at. You can't just drag and drop a NodeJS site via FTP, visit a URL, and have it running automatically. Well OK you can, but you need a host that supports that. PHP will do that on any host, including an ultra cheap shared hosting account.

I'm not a PHP fan-boy either, I'm currently writing a site in NodeJS right now because for this particular site it made the most sense, for another site I might use PHP or C# or Java or Python or Ruby and so on. It all depends on which is the best fit.

Blueshifter

2013-11-12

I'm confused about your graph. Either I'm reading it wrong (likely), or PHP+React time travels. At the very least, why is that all squiggly and the other two fairly linear?

Ashish

2013-11-12

Great points. It would actually be interesting to run both scripts through a profiler to see where they were actually spending the bulk of their time. I haven't used phpQuery in a bit but if I remember correctly, the "newDocument" calls are surprisingly expensive.

2013-11-12

Blueshifter: There is just a little more concurrency in the ReactPHP script than in the NodeJS example used because Node is restricted to 5 connections. It will do them generally in pretty much the right order whereas React goes nuts and tries to do the lot at the same time.

When NodeJS is configured to use a maxSockets of about 64 (as somebody kindly pointed out on HackerNews) the results are considerably closer, which better reflects the intentions of this post: that they're basically the fucking same and people should stop the useless language wars. :)

Aleksander Hristov

2013-11-12

I don't get why you are even wasting time on these kinds of people? Just ignore their stupidity.

2013-11-12

Aleksander: "If you ignore a problem it will go away" has historically not been all that effective. Providing links that show off common problems and embarrass trolls into hopefully shutting up for a bit helps keep the internet that little bit calmer.

Reynold Lariza

2013-11-12

Hi Phil,

I've been following you since I discovered CodeIgniter two years ago.

Last year I made a REST HTTP server written in PHP/CodeIgniter. It's stable, and works nicely. But when we used it on a high transaction-based application, it crashed. Then I found out that it was utilizing a lot of resources on the server, particularly memory.

A few months ago, I made a similar generic REST HTTP server in NodeJS. Using the same VM server, it never crashed, nor utilized a lot of memory on the server. We used it on our production applications. So far so good, it's up and running as of this time.

Now for my own benchmark for flat REST http requests. NodeJS async really beats PHP sync, similar outcome to what you have done. But with MySQL database on the backend on both, PHP is faster by around 5% according to my tests. (maybe because, my implementation on NodeJS is the fault)

Now you mentioned ReactPHP. That got me interested again with some prototyping. So I figured I'll check it out. However, based on the site, and a couple of research. It seems it still not mature enough. Now, I maybe saying upfront without doing further research. But the impression it gave me is that, as of the time of this writing, it may not be a recommendable implementation, until I could find some real-world high traffic examples using ReactPHP.

Even though your benchmark with ReactPHP vs NodeJS contains a promising result. I cannot see it being used on production-grade applications (yet).

One more thing. I use both PHP and NodeJS on our production-grade application developments. I use PHP for the front-end and low traffic applications, and NodeJS on serving our web services.

P.S. Forgive me for my language manner, I feel kind of tired from work and sleepy while writing this, I just felt like sharing my opinion :P

Aric Lasry

2013-11-12

Thanks Phil for this great benchmark. I'm spending a lot of time optimizing PHP applications and it feels good to see that nodeJS is not "faster" just because of the layer in between (the language and the operating system...) but because of the libev (event loop library which is implemented by nodeJS). This benchmark clearly shows why it is usually faster and makes me realize that if we try a little bit more we can make PHP very fast.

Chris Ravenscroft

2013-11-12

Phil,

Something is a bit surprising here. Please bear with me.

You wrote: "At this point it is just a network test." So far, I am with you. I expect network performance to be the major piece as it will by definition be slower than even interpreting code. In fact, when comparing two languages based on network performance, I do not expect to learn much about the languages' performance itself.

You then wrote: "It is worth noting that the faster the network connection the less the difference is between the two" Now, this is weird. If network performance is the bottleneck, then any performance gap between both languages should be amplified, not reduced.

I can only surmise that the issue here is latency rather than bandwidth. It could be that the libraries used, in both cases, add too much overhead for the language performance to be measurable (again) The problem could have to do with how they manage their connections pools, linger settings...

I would be very interested in a comparison using only UDP messages to work around these overhead issues.

2013-11-12

Chris: I said: "It is worth noting that the faster the network connection the less the difference is between the two" and you followed it saying "Now, this is weird. If network performance is the bottleneck, then any performance gap between both languages should be amplified, not reduced."

It seems you think that the faster the network the bigger the differences between the two, right?

Well, maybe. But if the code in use is bad at handling concurrency then it's going to have to wait for some HTTP threads, so the slower the network the more waiting is happening. PHP was doing a great job because React has quite a high default number of max connections. Node was limiting to 5 connections, so it had to do a lot more waiting for HTTP requests to complete and therefore had worse performance on slow networks than it would on higher speed networks.

Gggeek

2013-11-13

I know this was never meant to be a serious benchmark in the first place, but it probably would be more interesting to measure separately the networking part and the DOM parsing part - we could then better compare pieces of code offering similar functionality and evaluate approaches for optimizing.

Cyril Mazur

2013-11-13

Nice one. People please stop bitching around with the tech you use, just get it done, your users don't care if your website runs PHP, node or whatever.

Sam Reed

2013-11-13

Some more NodeJS tests:

Node 0.10.22:

maxSockets default

real 0m30.691s
user 0m10.089s
sys 0m0.628s

maxSockets 64

real 0m14.559s
user 0m10.409s
sys 0m0.548s

maxSockets Infinity

real 0m10.290s
user 0m9.133s
sys 0m0.672s

Node 0.11.8 (implicit maxSockets Infinity):

real 0m7.480s
user 0m6.372s
sys 0m0.680s

Weng Fu

2013-11-13

I am waiting to see a benchmark between Nodejs and Microsoft Word.

Saiyine

2013-11-13

Sir, you have won an Internet.

Antonio

2013-11-18

Wow! I just ran this test (PHP only, as Node.js is not possible on my MediaTemple Gridserver), and got these amazing results:

PHP 5.3.27 (cli) (built: Jul 25 2013 17:48:04) + phpQuery

real 6m6.591s
user 0m16.230s
sys 0m0.710s

PHP 5.3.27 (cli) (built: Jul 25 2013 17:48:04) + ReactPHP + phpQuery

real 0m19.127s
user 0m17.840s
sys 0m0.860s

== Complete ==

For me, this is completely new and opens up a whole new world of async PHP. Thanks for taking the time to write this, Phil.

Ravin

2013-11-27

I will say to the people who bash PHP for fun and pettiness, similar to Richard Dawkins quote to people who detest science. "Science is interesting, and if you don't agree, you can f*** off". By the way, very good article, well done.

Rick

2013-11-28

More crappy linkbait that needs someone with authority to put them down. What's even worse is that ONolan's picked it up on the Ghost blog and now claims Ghost is 687% faster than "PHP Alternatives".

http://www.appdynamics.com/blog/2013/10/17/an-example-of-how-node-js-is-faster-than-php/

http://blog.ghost.org/hosted-platform-preview/ (last paragraph)

2013-11-29

Hey Rick,

I welcome having somebody "put me down" - I love to be corrected when I am wrong - but it certainly won't be you.

  1. The original article was fundamentally flawed in its concept; mine was simply showing off the flaw in this concept with a more realistic comparison - instead of using anecdotal evidence you read on CMS product blogs.

  2. The official NodeJS account retweeted my tweets about this and agreed with me.

https://twitter.com/nodejs/status/400295942311534592

  3. Running a CMS is very different to what we are talking about here, which is a very specific benchmark of a very specific task.

  4. Ghost v WordPress is not the same as NodeJS v PHP.

  5. WordPress is the devil.

  6. If WordPress used async code now and then, it would be faster.

  7. I don't think you actually read any of this article before you responded, which is a shame.

  8. I don't think you understand benchmarks.

  9. I think you misunderstood John's claim: "Some tests have already found Ghost to be up to 678% faster than alternatives built on PHP".

I could reach out to John and ask for his feedback, but I have no interest in wasting his time on your ignorance of the facts and lack of logic. Feel free to ask him to comment yourself though.

Rick

2013-11-29

Hey Phil,

My apologies, reading my post back it seems I may have worded it slightly wrong, giving you the wrong impression. I was not calling your article linkbait - I completely agree with your article. I was calling the app dynamics post linkbait.

I actually replied to his blog post prior to replying to yours (disqus username rmwebs), basically saying he may as well have compared any Node.js app to any PHP app. What really annoyed me about his article was how misleading it was, starting out as Node.js vs PHP and turning instantly into Ghost vs Wordpress, which as I'm sure you'll agree isn't going to be even close to a fair assessment if you try to benchmark them, given how vastly different the two are.

If I'm not mistaken John's comments in the ghost article are based on the figures from the app dynamics blog post, which would make them pretty skewed to say the least.

So yeah, sorry if it sounded like I was being a dick. I probably shouldn't write comments when half asleep!!

Seppo Yli-olli

2013-12-11

Good article. Personally I have never heard of performance being called the reason people do not like PHP as a language. It is usually more about how the language itself is structured (builtin function namespacing etc) or how PHP array tries to be a one-thing-does-it-all datastructure and it having fairly little in common with arrays in various other languages like eg C. But yes, there's seriously no reason not to use PHP if you like PHP as a language.
