TechHui

Hawaiʻi's Technology and New Media Community

Scalability: Google I/O 2013 Registration

On March 13th at 4:00am HST, registration for the Google I/O 2013 conference opened to the public. Barely coherent, I rolled out of bed at 3:45am, bumping each wall in my apartment hallway as I stumbled towards my MacBook. I checked that my Google+ and Google Wallet accounts were properly set up as required for the registration process, and then I just sat staring at the countdown timer on the registration page.

Promptly at 4:00am, the site refreshed and displayed the registration sign-up page. I immediately typed in my information and clicked the register button. I waited again, watching the loading ring spin as Google was "searching" for a ticket to give me. After about 10 minutes, the Google Wallet checkout modal window popped up with yet another timer, counting down from 5 minutes. After I clicked the checkout button, a smaller loader ring spun as the countdown headed towards zero. After 2 minutes passed, I knew something was wrong, but I remained hopeful that I would get my first I/O ticket ever. The timer eventually reached zero but the loader kept spinning.

Something was absolutely wrong, so I checked my Google Wallet account and the tweets on Twitter with the #io13 hashtag. My account showed a $900 pending charge, and other people in Twitter land were having the same issue. Did I get the ticket? I didn’t get a confirmation, so I debated whether to try again. I had a pending charge and I didn’t want another one. My instincts told me I should try again, so I did. Again, the checkout modal window popped up after another 10 minutes, but this time it gave me an HTTP 500 error. What the #$%@! How can the almighty Google have so many issues with a simple registration process? Forty minutes into the registration process, I got in a third time. This time it finally worked and I got my confirmation email. The first pending charge was canceled, and for the first time ever, I’ll be going to the most sought-after tech conference of the year.

The I/O registration process was a perfect example of how hard scalability is, and of what to do and what not to do when faced with it. The problems seem to have originated in the checkout process with Google Wallet. My theory is that Google used the I/O registration to stress test the fairly new system. They knew thousands of people around the world were going to hit the site simultaneously, and what better test could there be to see if Wallet could handle the load? Google likely had their engineers on standby, expecting Wallet to fail in some way under the heavy traffic, and that’s how they were able to get Wallet functioning perfectly in under an hour. There were thousands of pissed off fanboys and fangirls, but I’m sure all issues were noted and addressed. It is better to have angry fans than angry users who rely on Google Wallet to drive their commerce websites. Hopefully my theory is correct, and if so, this is a prime example of what to do: stress test the %#@$ out of a system whenever possible to find any point of weakness.
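To make the stress-testing advice a little more concrete, here is a minimal sketch in Python. Everything in it is hypothetical: `TicketDesk`, `reserve`, and `stress_test` are illustrative names, an in-memory counter stands in for a real registration and payment backend, and none of it reflects how Google's systems actually work. It only shows the general shape of the idea: fire far more concurrent requests at the system than it has tickets, then check that it neither oversells nor loses requests.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class TicketDesk:
    """Hypothetical stand-in for a registration backend with a fixed ticket supply."""

    def __init__(self, tickets):
        self.tickets = tickets
        self.sold = 0
        self._lock = threading.Lock()

    def reserve(self, user_id):
        # The lock makes the check-and-increment atomic, so two users
        # can never be handed the same last ticket.
        with self._lock:
            if self.sold < self.tickets:
                self.sold += 1
                return True
            return False


def stress_test(desk, users, workers=32):
    """Send `users` concurrent reservation attempts and count the outcomes."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(desk.reserve, range(users)))
    succeeded = sum(results)
    return {"succeeded": succeeded, "rejected": len(results) - succeeded}


if __name__ == "__main__":
    desk = TicketDesk(tickets=100)
    print(stress_test(desk, users=1000))
```

A real stress test would point the same kind of concurrent load at a staging copy of the actual service over the network and would also track latency and error responses, but the pass/fail question is the same: every request gets a definite answer, and the number of tickets sold never exceeds the supply.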

The "what not to do" part of this example is that there wasn’t any feedback or status update from Google throughout the process. During the failures, people with a pending charge were left wondering whether they had a ticket or not. Luckily for me, I kept trying until I received a confirmation. In closing, do not leave users in the dark when the system is failing. Be candid and keep them up to date through some sort of notification channel such as email, Twitter, or a status website. People are generally understanding when they are kept aware of the problems.

Special thanks to Bomberman (http://bomberman.ikayzo.com) for censoring the profanity in this blog.


Tags: bomberman, google, io13, scalability

Comment


Comment by Daniel Leuck on April 1, 2013 at 2:06pm

It's odd, given that in other areas, such as dropped connections in Google Docs, there are immediate visual indicators, information about retries, clear and descriptive error messages, etc.

I'm happy you got your tickets and I especially like the Bomberman plug at the end :-)

Comment by Paul Graydon on March 29, 2013 at 9:35pm

I'm not convinced scalability in this case is such a hard problem. Google have all the tech (Ganeti, Spanner, etc.), infrastructure (datacenters all over the world), and staff to handle this stuff. Time and time again, though, these events occur and they fail to meet demand (see the fun with the Nexus 4 launch, last year's Google I/O, etc.). Numerous companies launch products or events on a larger scale without problems, so Google really can't be that different.

A few possibilities spring to mind straight away:

1) They're clueless (I don't believe that for a second).

2) They're not able to reserve sufficient infrastructure in advance (internal politics? Improving the profit margin?)

3) They don't care because either way they'll still sell all the products / tickets.
