I decided to hit 2014 running and whipped up another shootout experiment, this time between the popular new node.js and the popular old Ruby on Rails! I had selected node.js for a recent project because of its fabled "event-driven, non-blocking I/O" architecture. In a nutshell, node.js is built from the ground up to make slow calls (like querying a DB or consuming an external API) asynchronously using a callback paradigm. For example, instead of waiting for a DB call to finish, the node.js thread services the next request and gets notified when the first DB call finishes, upon which it invokes the callback you specified (i.e. it finishes processing the earlier request). This architecture is great for speed and scalability when your load is truly I/O-bound and CPU usage is relatively light, as a single node.js thread can handle many incoming requests in this fashion. While the raw concept is not new, it probably became popular in part as the web grew more "AJAX-y": requests became relatively heavier on I/O and lighter on processing, since an entire page isn't being computed per AJAX call.
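For illustration, here is a minimal sketch of that callback style. This is my own sketch, not the test app's code: the route, the table and column names, and the use of Express and the `pg` module are all assumptions.

```javascript
// Minimal callback-style endpoint (hypothetical names; assumes Express and pg).
var express = require('express');
var pg = require('pg');
var app = express();

app.get('/customer/:id/orders', function (req, res) {
  pg.connect(process.env.DATABASE_URL, function (err, client, done) {
    if (err) { return res.status(500).send('connection error'); }
    // Nothing blocks here: while Postgres works, node.js services other
    // requests; this callback fires when the query result arrives.
    client.query('SELECT * FROM orders WHERE customer_id = $1',
                 [req.params.id], function (err, result) {
      done(); // release the client back to the pool
      if (err) { return res.status(500).send('query error'); }
      res.json(result.rows);
    });
  });
});

app.listen(process.env.PORT || 3000);
```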
During the aforementioned project, I indeed felt node.js responding quicker under load, but I wanted to see some numbers. I set up an experiment and decided to simulate I/O-bound, low-CPU traffic with a very simple page that queries the database for a customer and that customer's orders and renders the result.
To populate the DB, I generated 8,000 customer records, and for each customer randomly generated 1-10 orders, on Heroku's free PostgreSQL Hobby Dev plan. Each test app was deployed to Heroku with one dyno. To simulate load, I used the free tier of Blitz, originating requests out of Virginia (closest to the Heroku test apps) with a timeout of 3s, starting and ending with 200 concurrent clients over a 60s interval. The first test was a baseline: Rails deployed on Unicorn with 1 worker. Unsurprisingly, Rails was decent. node.js sailed through the test effortlessly, successfully processing over 10K hits, roughly twice the throughput of Rails. See the results table below.
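As an aside, seeding data like this takes only a small script. Below is a hypothetical sketch of the approach; the table and column names and the use of the `pg` module are my assumptions, not the actual seed code.

```javascript
// Hypothetical seed script: 8,000 customers, each with 1-10 random orders.
var pg = require('pg');

pg.connect(process.env.DATABASE_URL, function (err, client, done) {
  if (err) throw err;

  function seedCustomer(i) {
    if (i > 8000) { done(); return pg.end(); }
    client.query('INSERT INTO customers (name) VALUES ($1) RETURNING id',
                 ['Customer ' + i], function (err, result) {
      if (err) throw err;
      var orders = 1 + Math.floor(Math.random() * 10); // 1-10 orders
      var remaining = orders;
      for (var o = 0; o < orders; o++) {
        client.query('INSERT INTO orders (customer_id, amount) VALUES ($1, $2)',
                     [result.rows[0].id, Math.random() * 100], function (err) {
          if (err) throw err;
          if (--remaining === 0) seedCustomer(i + 1); // next customer
        });
      }
    });
  }

  seedCustomer(1);
});
```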
Now nobody does Rails production deploys (hopefully) with one worker; the rule of thumb is 2-4 Unicorn workers on Heroku. Fundamental to the comparison here is the incumbent parallel/concurrent model (multiple processes) versus the new single-threaded, non-blocking model. So I configured the Rails app for 4 workers (`worker_processes 4` in the Unicorn config), as RAM requirements are low here. Much better! We are successfully serving thousands of hits with no errors now. Still, node.js delivered ~15% higher, and steadier, throughput.
At this point, the results were only semi-satisfying. I expected gains with node.js, but the delta was smaller than I expected. Unicorn uses forked processes while node.js uses one thread, so node.js saves on memory, but for this test every app had 1 dyno = 512MB RAM, which is plenty for what we are doing. What if I combined the power of non-blocking I/O with multi-process concurrency? This is where node-forky comes into play: a node package that makes running cluster mode easy. node-forky automatically forks workers up to the number of compute cores you have and replaces dead workers (see the sketch below). Rumor has it that on Heroku the number of cores is 4.
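Conceptually, node-forky automates something like the following, built on node's own cluster module. This is a minimal sketch of the idea under those assumptions, not node-forky's actual source.

```javascript
// Fork one worker per CPU core and replace any worker that dies,
// which is roughly what node-forky automates (sketch, not its source).
var cluster = require('cluster');
var os = require('os');

if (cluster.isMaster) {
  os.cpus().forEach(function () {
    cluster.fork();
  });
  cluster.on('exit', function (worker) {
    console.log('worker ' + worker.process.pid + ' died; forking a replacement');
    cluster.fork();
  });
} else {
  require('./server'); // each worker runs the ordinary single-threaded app
}
```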
The results were rather shocking: node with node-forky fared worse than single-process node.js. I noticed more timeouts and memory errors, leading me to suspect that node-forky and/or node.js cluster mode is more memory-hungry or somehow less efficient under concurrency than Unicorn, at least for these load characteristics. Perhaps if Heroku had fewer cores and/or more RAM per dyno, node-forky could shine.
At this point, I wondered what Rails would be like implemented in an "asynchronous, non-blocking I/O" way. Projects like em-postgresql-adapter and async-rails have not been maintained for 1-2 years. Like most technology stacks, it seems the callback programming paradigm never really took hold in the Rails world. What DID happen, though, is that the parallel concurrency model was taken to the next level with technology like Puma. Puma is a newer Ruby/Rack server that is both multi-process and multi-threaded. Unicorn only supports forked processes, but Puma does that AND allows multi-threading within each process. The joy of this is that even blocking I/O calls can run concurrently, achieving similar theoretical gains to a single-threaded, non-blocking model without having to code in "callback hell". It was time to throw the cat into the fight. After some experimenting, I optimized Puma for this load at 6 threads and 4 workers (processes), e.g. `puma -w 4 -t 6:6`, for a total of 24 concurrent threads.
The results were enlightening! The performance was about the same as running a single node.js process. I stuck to MRI for Ruby due to time constraints, but note that JRuby or Rubinius should offer even greater performance, since they support true multi-threading on multi-core hardware. Taking a step back, I suppose this isn't too surprising. For all the talk of node.js being single-threaded, it must use a thread pool somewhere; where else would long-running operations get off-loaded to? It turns out node.js DOES use an internal C++ threadpool: libuv's, which defaults to 4 threads (tunable via `UV_THREADPOOL_SIZE`) and handles work like file I/O and DNS lookups, while network sockets are multiplexed on the event loop itself. So conceptually node.js is one process, one JavaScript thread backed by multiple internal child threads, while Puma is multiple processes each backed by multiple threads. If anything, I'd expect Puma to perform better under concurrent loads.
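You can see that threadpool in action with a quick experiment of my own (not from the test apps): `crypto.pbkdf2` is one of the calls libuv offloads to the pool, so with the default pool size of 4, a fifth concurrent call has to wait for a free thread.

```javascript
// Five concurrent pbkdf2 calls against libuv's default 4-thread pool:
// the first four finish at roughly the same time, the fifth noticeably later.
var crypto = require('crypto');

var start = Date.now();
for (var i = 1; i <= 5; i++) {
  (function (n) {
    crypto.pbkdf2('secret', 'salt', 500000, 64, 'sha512', function (err, key) {
      if (err) throw err;
      console.log('call ' + n + ' done after ' + (Date.now() - start) + 'ms');
    });
  })(i);
}
```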
* 200 concurrent clients start to end, 60s interval, 3s timeout, Virginia
* Values are average of 3 trial runs per setup
* Please do not confuse the meaning of "workers" and "threads" across rows below. I am using each term as the specific technology uses it, but a Unicorn "worker" is not the same thing as a Heroku "worker", which is not the same thing as a node-forky "worker".
setup | description | hits | timeouts | min. hits/s | max. hits/s |
---|---|---|---|---|---|
Rails on Unicorn | 1 worker, 1 dyno | 5,702 | 0 | 63 | 112 |
Node | 1 thread, 1 dyno | 10,743 | 1 | 160 | 191 |
Rails on Unicorn | 4 workers, 1 dyno | 9,339 | 0 | 85 | 196 |
Node with node-forky | 4 workers, 1 dyno | 8,853 | 7 | 98 | 181 |
Rails on Puma | 6 threads, 4 workers, 1 dyno | 10,910 | 0 | 161 | 201 |
To gather a bit more info on scalability, I maxed out the concurrent clients on Blitz's free tier at 250. The Rails+Unicorn and Node+node-forky setups were, somewhat expectedly, crushed. Single-threaded node.js and Rails+Puma continued to shine. Rails+Puma demonstrated a bit higher throughput than node.js, but with somewhat more erratic performance.
* 250 concurrent clients start to end, 60s interval, 3s timeout, Virginia
* Values are average of 3 trial runs per setup
setup | description | hits | timeouts | min. hits/s | max. hits/s |
---|---|---|---|---|---|
Rails on Unicorn | 1 worker, 1 dyno | 5,002 | 214 | 18 | 112 |
Node | 1 thread, 1 dyno | 11,998 | 1 | 155 | 228 |
Rails on Unicorn | 4 workers, 1 dyno | 8,407 | 431 | 14 | 227 |
Node with node-forky | 4 workers, 1 dyno | 9,480 | 133 | 60 | 223 |
Rails on Puma | 7 threads, 4 workers, 1 dyno | 12,783 | 13 | 34 | 236 |
Takeaways
* Out of the box, a single node.js process handled roughly twice the throughput of Rails on a single Unicorn worker.
* Properly configured (4 Unicorn workers), Rails closed most of the gap, but node.js still showed higher and steadier throughput.
* Clustering node.js with node-forky actually hurt on a 512MB dyno, producing more timeouts and memory errors.
* Rails on Puma, with multiple processes and multiple threads, matched or edged out single-threaded node.js without any callback-style code.
I hope this shootout was useful or enlightening in making your own decisions. Have a great 2014!