Re: OT: Question/Netflix issues?

Paul Graydon Wed, 23 Mar 2011 18:43:34 -0700

On 03/23/2011 09:41 AM, sillywiz...@rs4668.com wrote:

"Lyndon Nerenberg (VE6BBM/VE7TFX)"<lyn...@orthanc.ca>  wrote:

Guess that move to Amazon EC2 wasn't such a good idea. First reddit,
now netflix.
http://techblog.netflix.com/2010/12/four-reasons-we-choose-amazons-cloud-as.html

FWIW, at $DAYJOB we haven't been able to run out a pool of a couple of
dozen EC2 instances for more than two weeks (since last June) without
at least one of them going down.  The same number of hardware servers
we ran ourselves in Peer1 ran for a couple of years with no unplanned
outages.

Amortized over five years, Peer1 colo + hardware is also cheaper than
the equivalent EC2 cost.

Hey everyone! Join the cloud, and stand in the pissing rain.

--lyndon

Interesting, because we run 120 with almost no issues whatsoever (3 failures over the 
past 12 months, none of which caused downtime). I've never had an EBS volume fail in the 
18 months we've used them. IMHO, the "issues" with the cloud are almost always 
at a layer above the infrastructure.

--L

Reddit has routinely had EBS volumes either outright fail (2 major
outages in the last month/month and a half, both caused by several EBS
volumes vanishing), or show some not insignificant degradation in
performance, and it seems barely a month goes by when I don't hear
someone on Twitter talking about similar problems with their
infrastructure. Most of the problems I've heard about do seem to revolve
around EBS, however, rather than Amazon's other services. It may be just
the nature of people to pick on and shout about the biggest targets, but
I'm reasonably sure almost all the problems I hear about relating to
cloud services revolve around Amazon and rarely their competitors.


http://highscalability.com/blog/2010/12/20/netflix-use-less-chatty-protocols-in-the-cloud-plus-26-fixes.html

When it comes to other layers in the infrastructure, probably one of the
most talked about problems is network latency between instances.
Netflix had to specifically re-engineer their platform because of it
(and other major users talk of similar changes). There is almost
certainly an argument to be made that the outcome of the forced
re-engineering is a good thing, as it's generally boosting resilience,
but that it's been forced on them in such a way should surely also be
some cause for concern.

Reddit seem to be working hard to make their platform as resilient as
possible to the routine problems caused by the infrastructure. One of
their outgoing devs gave a pretty interesting read on the problems
they'd experienced with Amazon:
http://www.reddit.com/r/blog/comments/g66f0/why_reddit_was_down_for_6_of_the_last_24_hours/c1l6ykx

I absolutely do think cloud hosting / virtual servers have value and use
and shouldn't be underestimated or written off as a fad, but I'm also
not entirely convinced at the moment that Amazon is a vendor to
particularly trust with such services. I'd probably also argue that
anyone keeping their eggs in one basket and relying on a single vendor
for such services is taking a significant risk. There are plenty of
tools and libraries out there to help provide a standard API for rolling
out servers on different platforms. It seems crazy not to take
advantage of the flexibility the cloud offers to remove as many SPOFs as
possible.
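
To make the "standard API across vendors" idea concrete, here is a
minimal sketch in Python. Everything in it is an assumption made for
illustration: the Provider interface, FakeProvider, spread() and
surviving() are hypothetical names and not the API of any real library,
and the "vendors" are in-memory stand-ins rather than real cloud
drivers. It only shows the shape of the approach: code against one
interface, then alternate vendors so no single one holds every server.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from itertools import cycle
from typing import List


@dataclass(frozen=True)
class Server:
    provider: str
    name: str


class Provider(ABC):
    """Hypothetical vendor-neutral interface; not any real library's API."""

    name: str

    @abstractmethod
    def create_server(self, server_name: str) -> Server:
        ...


class FakeProvider(Provider):
    """In-memory stand-in for a real vendor driver (illustration only)."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.servers: List[Server] = []

    def create_server(self, server_name: str) -> Server:
        server = Server(provider=self.name, name=server_name)
        self.servers.append(server)
        return server


def spread(providers: List[Provider], count: int, prefix: str = "web") -> List[Server]:
    """Create `count` servers, alternating vendors in round-robin order."""
    if not providers:
        raise ValueError("need at least one provider")
    rotation = cycle(providers)
    return [next(rotation).create_server(f"{prefix}-{i}") for i in range(count)]


def surviving(servers: List[Server], failed_provider: str) -> List[Server]:
    """Return the servers that remain if one vendor disappears entirely."""
    return [s for s in servers if s.provider != failed_provider]


if __name__ == "__main__":
    fleet = spread([FakeProvider("vendor-a"), FakeProvider("vendor-b")], 4)
    for server in surviving(fleet, "vendor-a"):
        print(server.provider, server.name)
```

With two vendors, losing either one still leaves half the fleet running;
a real setup would also need data replication and traffic failover
between vendors, which this sketch does not attempt.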


Paul
