TechHui

Hawaiʻi's Technology Community

node.js versus Ruby on Rails

I decided to hit 2014 running and whipped up another shootout experiment, this time between the popular new node.js and the popular old Ruby on Rails! I had selected node.js for a recent project because of its famously fast "event-driven, non-blocking I/O" architecture. In a nutshell, node.js is built from the ground up to make slow calls (like querying a DB or consuming an external API) asynchronously, using a callback paradigm. For example, instead of waiting for a DB call to finish, the node.js thread services the next request and gets notified when the first DB call finishes, at which point it invokes the callback you specified (i.e., finishes processing the earlier request). This architecture is great for speed and scalability when your load is truly I/O-bound and CPU usage is relatively light, as a single node.js thread can handle many incoming requests in this fashion. While the raw concept is not new, it probably became popular in part as the web grew more "AJAX-y": web requests grew relatively heavier on I/O and lighter on processing, since an entire page isn't being computed per AJAX call.
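To make the callback paradigm concrete, here is a minimal runnable sketch. The `queryDb` function is a hypothetical stand-in for a real async DB driver, with `setTimeout` simulating the I/O wait:

```javascript
// Hypothetical async "DB call": the single thread stays free while we wait.
const handled = [];

function queryDb(sql, callback) {
  // Simulate slow I/O; the event loop services other work in the meantime.
  setTimeout(() => callback(null, { rows: [{ id: 1 }] }), 10);
}

function handleRequest(reqId) {
  // Fire off the slow call, then immediately return to the event loop.
  queryDb('SELECT * FROM customers', (err, result) => {
    // Invoked later, when the "I/O" finishes: finish this request.
    handled.push(reqId);
  });
}

// Two requests arrive back-to-back; neither blocks the single thread.
handleRequest('A');
handleRequest('B');
console.log('both requests accepted; callbacks still pending:', handled.length === 0);
```

Both requests are accepted before either "DB call" completes; the callbacks fire later as the I/O finishes.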

During the aforementioned project, I indeed felt node.js responding quicker under load, but I wanted to see some numbers. I set up an experiment and decided to simulate I/O-bound, low-CPU traffic with a very simple page that:

  1. Finds lowest customer ID in DB.
  2. Finds highest customer ID in DB.
  3. Fetches customer by random ID between lowest and highest ID.
  4. Fetches that customer's list of orders, sorted by date in the DB (a non-indexed column).
  5. Returns an undecorated HTML list of those orders.
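The five steps above can be sketched as follows. This is a self-contained version with the database swapped for an in-memory stub (the real test apps queried PostgreSQL), so the logic is runnable as-is; the data and `renderOrdersPage` name are illustration only:

```javascript
// In-memory stand-ins for the customers and orders tables.
const customers = { 3: 'Ann', 7: 'Bob', 12: 'Cho' };
const orders = {
  3: [{ id: 1, date: '2014-01-02' }, { id: 2, date: '2013-12-30' }],
  7: [{ id: 3, date: '2013-11-11' }],
  12: [],
};

function renderOrdersPage(random = Math.random) {
  const ids = Object.keys(customers).map(Number);
  const lo = Math.min(...ids);               // 1. lowest customer ID
  const hi = Math.max(...ids);               // 2. highest customer ID
  // 3. pick a random ID in [lo, hi]; retry until it hits a real customer
  let id;
  do {
    id = lo + Math.floor(random() * (hi - lo + 1));
  } while (!(id in customers));
  // 4. that customer's orders, sorted by date (no index to help the "DB")
  const sorted = [...orders[id]].sort((a, b) => (a.date < b.date ? -1 : 1));
  // 5. an undecorated HTML list
  return '<ul>' + sorted.map(o => `<li>${o.id}: ${o.date}</li>`).join('') + '</ul>';
}

console.log(renderOrdersPage());
```

Injecting the random source makes the page deterministic for testing; the real endpoint would just call `Math.random` directly.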

To populate the DB, I generated 8,000 customer records, and for each customer I randomly generated 1-10 orders, all on Heroku's free PostgreSQL Hobby Dev plan. Each test app was deployed to Heroku with one dyno. To simulate load, I used the free tier of Blitz, originating requests out of Virginia (closest to the Heroku test apps) with a timeout of 3s, starting and ending with 200 concurrent clients over a 60s interval. The first test was a baseline: Rails deployed with Unicorn using 1 worker. Unsurprisingly, Rails was decent. node.js sailed through the test effortlessly, successfully processing over 10K hits, roughly twice the throughput of Rails. See the results table below.

Now nobody does Rails production deploys (hopefully) with one worker; the rule of thumb is 2-4 Unicorn workers on Heroku. Fundamental to the comparison here is the incumbent parallel/concurrent model (multiple processes or threads) versus the new single-threaded, non-blocking model. So I configured the Rails app for 4 workers, as RAM requirements are low here. Much better! We are now successfully serving thousands of hits with no errors. We still get ~15% higher, and steadier, throughput with node.js though.

At this point, the results were only semi-satisfying. I expected gains with node.js, but the delta was smaller than I expected. Unicorn uses forked processes while node.js uses one thread, so node.js saves on memory, but for this test every app has 1 dyno = 512MB RAM, which is fine for what we are doing. What if I combined the power of non-blocking I/O with concurrency? This is where node-forky comes into play: a node package that makes running cluster mode easy. node-forky will automatically fork workers up to the number of compute cores you have, and replace dead workers. Rumor has it that on Heroku the number of cores is 4.

The results were rather shocking. Node with node-forky fared worse than node.js on a single thread. I noticed more timeouts and memory errors, leading me to suspect that node-forky and/or node.js cluster mode is more memory-hungry or somehow less optimal under concurrency than Unicorn, at least for these load characteristics. Perhaps if Heroku had fewer cores and/or more RAM per dyno, node-forky could shine.

At this point, I wondered what Rails would be like implemented in an "asynchronous, non-blocking I/O" way. Projects like em-postgresql-adapter and async-rails have not been maintained for 1-2 years. As in most technology stacks, it seems the callback programming paradigm never really took hold in the Rails world. What DID happen is that the parallel concurrency model was taken to the next level with technology like Puma. Puma is a newer Ruby/Rack server that is both multi-process and multi-threaded. Unicorn only supports forked processes, but Puma does that AND allows multi-threading within each process. The joy of this is that even blocking I/O calls can run concurrently, achieving similar theoretical gains to a single-threaded, non-blocking model without having to code in "callback hell". It was time to throw this cat into the fight. After some experimenting, I optimized Puma for this load at 6 threads and 4 workers (processes), for a total of 24 concurrent threads.

The results were enlightening! The performance was about the same as running one thread of node.js. I stuck with MRI for Ruby due to time constraints, but note that JRuby or Rubinius should offer even greater performance, since they support true multi-threading on multi-core hardware. Taking a step back, I suppose this isn't too surprising. For all the talk of node.js being single-threaded, it must use a thread pool somewhere; otherwise, where would all the long-running I/O operations be off-loaded to? It turns out node.js DOES use an internal C++ thread pool. So conceptually, node.js is one process with one JavaScript thread backed by multiple internal child threads, while Puma is multiple processes each backed by multiple threads. If anything, I'd expect Puma to perform better under concurrent loads.

 

* 200 concurrent clients start to end, 60s interval, 3s timeout, Virginia

* Values are average of 3 trial runs per setup

* Please do not confuse the meaning of "workers" and "threads" across rows below. I am using the vernacular as the specific technology refers to them, but a Unicorn "worker" is not the same thing as a Heroku "worker" which is not the same thing as a node-forky "worker".

setup                 description                   hits    timeouts  min. hits/s  max. hits/s
Rails on Unicorn      1 worker, 1 dyno               5,702         0          63          112
Node                  1 thread, 1 dyno              10,743         1         160          191
Rails on Unicorn      4 workers, 1 dyno              9,339         0          85          196
Node with node-forky  4 workers, 1 dyno              8,853         7          98          181
Rails on Puma         6 threads, 4 workers, 1 dyno  10,910         0         161          201

 

To gather a bit more info on scalability, I maxed out the concurrent-clients value on Blitz Free at 250. The Rails+Unicorn and Node+node-forky setups were, somewhat expectedly, crushed. Single-thread node.js and Rails+Puma continued to shine. Rails+Puma demonstrated slightly higher throughput than node.js, but with slightly more erratic performance.

 

* 250 concurrent clients start to end, 60s interval, 3s timeout, Virginia

* Values are average of 3 trial runs per setup

setup                 description                   hits    timeouts  min. hits/s  max. hits/s
Rails on Unicorn      1 worker, 1 dyno               5,002       214          18          112
Node                  1 thread, 1 dyno              11,998         1         155          228
Rails on Unicorn      4 workers, 1 dyno              8,407       431          14          227
Node with node-forky  4 workers, 1 dyno              9,480       133          60          223
Rails on Puma         7 threads, 4 workers, 1 dyno  12,783        13          34          236

 

Takeaways

  • Node is fast and scalable, but so is the right, well-tuned setup on whatever stack you are already on.
  • Node.js with node-forky may be compelling in the future as the "cluster mode" implementation improves, giving you the best of non-blocking I/O plus parallel processing, but right now the parallel part may not be as polished as in other technology stacks.
  • You can already achieve similar non-blocking I/O performance by processing blocking I/O calls concurrently across threads. Conceptually, this IS what node.js is doing internally, but in other stacks the application programmer does not have to deal with messy "callback hell" (although the tooling programmer has to deal with hard multi-threaded programming).
  • Tune your servers for your load! Most of the numbers posted above were not optimal out-of-the-box and required configuration tuning.
  • As always, your mileage may vary depending on many variables such as your code quality, traffic patterns, infrastructure choices, work characteristics, and yes, even time of day especially if you are on shared metal.

Hope this shootout was useful or enlightening for you to make your own decisions. Have a great 2014!
