Re: FYI Netflix is down

2012-07-11 Thread steve pirk [egrep]
On Mon, Jul 9, 2012 at 10:20 AM, Dave Hart wrote: > "We continue to investigate why these connections were timing out > during connect, rather than quickly determining that there was no > route to the unavailable hosts and failing quickly." > > potential translation: > > "We continue to shoot our

Re: FYI Netflix is down

2012-07-09 Thread Dave Hart
On Mon, Jul 9, 2012 at 15:50 UTC, Rayson Ho wrote: > There are also bugs from the Netflix side uncovered by the AWS outage: > > "Lessons Netflix Learned from the AWS Storm" > > http://techblog.netflix.com/2012/07/lessons-netflix-learned-from-aws-storm.html "We continue to investigate why these con
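The Netflix post-mortem quoted above describes connections that hung until they timed out during connect, instead of failing quickly when hosts in the lost zone were unreachable. Below is a minimal sketch of the fail-fast idea in Python. It assumes a plain TCP client with a list of candidate endpoints. The function name and the 0.5 s default are illustrative and are not taken from Netflix's code.

```python
import socket


def connect_first_available(addresses, timeout_s=0.5):
    """Try each (host, port) in order, giving each one a short connect timeout.

    Returns the first connected socket. If every candidate fails, raises
    OSError. The point is the bounded wait: an unreachable host costs at
    most timeout_s here instead of the OS default connect timeout.
    """
    last_error = None
    for host, port in addresses:
        try:
            # The timeout applies to the connect() attempt itself (and stays
            # set on the returned socket for later reads and writes).
            return socket.create_connection((host, port), timeout=timeout_s)
        except OSError as exc:  # refused, unreachable, or timed out
            last_error = exc
    raise OSError(f"no candidate reachable (last error: {last_error})")
```

A short timeout trades occasional spurious failures on slow but healthy hosts for a bounded wait on dead ones. The right value depends on normal connect latency between zones.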

Re: FYI Netflix is down

2012-07-09 Thread Rayson Ho
On Sun, Jul 8, 2012 at 8:27 PM, steve pirk [egrep] wrote: > I am pretty sure Netflix and others were "trying to do it right", as they > all had graceful fail-over to a secondary AWS zone defined. > It looks to me like Amazon uses DNS round-robin to load balance the zones, > because they mention re
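If the DNS round-robin theory above is right, clients keep receiving addresses in an impaired zone unless they track failures themselves. Below is a minimal sketch of client-side rotation that skips addresses marked down. It assumes the client already holds the resolved address list. The class name and the 192.0.2.x documentation addresses are hypothetical.

```python
from itertools import count


class RoundRobinPicker:
    """Rotate through resolved addresses, skipping ones marked unhealthy.

    Plain DNS round-robin hands out every address in turn regardless of
    which zone it lives in; the health tracking here is the client's own.
    """

    def __init__(self, addresses):
        self.addresses = list(addresses)
        self.unhealthy = set()
        self._counter = count()

    def mark_down(self, addr):
        self.unhealthy.add(addr)

    def pick(self):
        # Look at each address at most once per call before giving up.
        for _ in range(len(self.addresses)):
            addr = self.addresses[next(self._counter) % len(self.addresses)]
            if addr not in self.unhealthy:
                return addr
        raise LookupError("all addresses marked unhealthy")
```

In this sketch, DNS itself stays oblivious to health. Only clients that call `mark_down` after failed connections stop landing in the dead zone.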

Re: FYI Netflix is down

2012-07-09 Thread valdis . kletnieks
On Mon, 09 Jul 2012 08:07:14 -0400, Alain Hebert said: > Their wide use of ASIC eliminate a lot of the headache of pure > software implementation. And gets you, in return, the headaches of buggy hardware, where bug-fixing is just a bit harder than "load the new release". ;)

Re: FYI Netflix is down

2012-07-09 Thread Alain Hebert
Hi, Well depending on your "black box", your mileage will vary. Their wide use of ASIC eliminate a lot of the headache of pure software implementation. Buffer, timing, expected results, etc. Their "real" software only represent a small part of the device and is mostly

Re: FYI Netflix is down

2012-07-09 Thread gb10hkzo-na...@yahoo.co.uk
Steve at pirk, I fail to grasp the concept in your argument. You do realise, do you not, that your $ black boxes from your favourite brand name vendor have software running inside of them do you not ? Case in point for example, the recent LINX issues it wasn't the hardware that gave th

Re: FYI Netflix is down

2012-07-08 Thread Ryan Malayter
On Jul 8, 2012, at 7:27 PM, "steve pirk [egrep]" wrote: > > I am pretty sure Netflix and others were "trying to do it right", as they all > had graceful fail-over to a secondary AWS zone defined. Having a single company as an infrastructure supplier is not "trying to do it right" from an eng

Re: FYI Netflix is down

2012-07-08 Thread steve pirk [egrep]
On Tue, Jul 3, 2012 at 1:00 PM, Ryan Malayter wrote: > Doing it the right way makes the cloud far less cost-effective and far > less "agile". Once you get it all set up just so, change becomes very > difficult. All the monitoring and fail-over/fail-back operations are > generally application-spec

Re: FYI Netflix is down

2012-07-06 Thread James Downs
On Jul 6, 2012, at 1:50 PM, Dan Golding wrote: > This happens all the time. Not saying Netflix is doing this, but lots of > other folks are. It’s a trap that’s easy to fall into. Especially with Netflix did the reverse. The moved *to* Amazon, so they could do "noops".

RE: FYI Netflix is down

2012-07-06 Thread Dan Golding
> -Original Message- > > I imagine Netflix is mature enough to track this data as you suggest, > and that's why they use AWS - downtime isn't a big deal for their > business unless it gets really, really bad. There is another possibility that is probably much more widespread amongst AWS

Re: FYI Netflix is down

2012-07-04 Thread Randy Bush
> Tell that to people in the third world without utilities. >>> Also, I don't think there is an acceptable level of downtime for >>> water. >> coming soon to a planet near you i work there regularly. the typical nanog kiddie does not. randy

Re: FYI Netflix is down

2012-07-04 Thread Kyle Creyts
Tell that to people in the third world without utilities. On Jul 3, 2012 8:32 PM, "Randy Bush" wrote: > > Also, I don't think there is an acceptable level of downtime for > > water. > > coming soon to a planet near you > > randy > >

Re: FYI Netflix is down

2012-07-03 Thread Randy Bush
> Also, I don't think there is an acceptable level of downtime for > water. coming soon to a planet near you randy

Re: FYI Netflix is down

2012-07-03 Thread Ryan Malayter
Jon Lewis wrote: > It seems like if you're going to outsource your mission critical > infrastructure to "cloud" you should probably pick at least 2 > unrelated cloud providers and if at all possible, not outsource the > systems that balance/direct traffic...and if you're really serious > about it,

Re: FYI Netflix is down

2012-07-03 Thread George Herbert
On Jul 3, 2012, at 10:38 AM, Jay Ashworth wrote: > - Original Message - >> From: "Steven Bellovin" > >> Subject: Re: FYI Netflix is down >> On Jul 2, 2012, at 3:43 PM, Greg D. Moore wrote: >> >>> At 03:08 PM 7/2/2012, George Herbert w

Re: FYI Netflix is down

2012-07-03 Thread Jay Ashworth
- Original Message - > From: "Steven Bellovin" > Subject: Re: FYI Netflix is down > On Jul 2, 2012, at 3:43 PM, Greg D. Moore wrote: > > > At 03:08 PM 7/2/2012, George Herbert wrote: > > > > If folks have not read it, I would suggest readin

Re: FYI Netflix is down

2012-07-03 Thread Seth Mattinen
On 6/29/12 8:22 PM, Joe Blanchard wrote: > Seems that they are unreachable at the moment. Called and theres a recorded > message stating they are aware of an issue, no details. > I didn't see anyone post this yet, so here's Amazon's summary of events: http://aws.amazon.com/message/67457/

Re: FYI Netflix is down

2012-07-03 Thread Jon Lewis
On Mon, 2 Jul 2012, david raistrick wrote: On Mon, 2 Jul 2012, James Downs wrote: back-plane / control-plane was unable to cope with the requests. Netflix uses Amazon's ELB to balance the traffic and no back-plane meant they were unable to reconfigure it to route around the problem. Someon

Re: FYI Netflix is down

2012-07-03 Thread Jon Lewis
On Mon, 2 Jul 2012, Greg D. Moore wrote: As for pulling the plug to test stuff. I recall a demo at Netapps in the early 00's. They were talking about their fault tolerance and how great it was. So I walked up to their demo array and said, "So, it shouldn't be a problem if I pulled this drive

Re: FYI Netflix is down

2012-07-03 Thread david raistrick
On Tue, 3 Jul 2012, Rodrick Brown wrote: face when implementing BCP today. I doubt Amazon gave much thought to multiple site outages and clients not being able to dynamically redeploy their engines because of inaccessibility from ELB. Considering there's a grand total of -one- tool in the ent

Re: FYI Netflix is down

2012-07-03 Thread Rodrick Brown
On Jul 3, 2012, at 10:58 AM, Ryan Malayter wrote: > James Downs wrote: >> For Netflix (and all other similar >> services) downtime is money and money is downtime. There is a >> quantifiable cost for customer acquisition and a quantifiable churn >> during each minute of downtime. Mature organizati

Re: FYI Netflix is down

2012-07-03 Thread Rodrick Brown
On Jul 3, 2012, at 9:11 AM, "Dan Golding" wrote: >> -Original Message- >> From: James Downs [mailto:e...@egon.cc] >> >> >> On Jul 2, 2012, at 7:19 PM, Rodrick Brown wrote: >> >>> People are acting as if Netflix is part of some critical service > they >> stream movies for Christ sake.

Re: FYI Netflix is down

2012-07-03 Thread James Downs
On Jul 3, 2012, at 6:11 AM, Dan Golding wrote: > Also, I don't think there is an acceptable level of downtime for water. > Neither do water utilities. I remember a certain conversation I had with a web-developer. We were talking about "zero downtime releases". He thought it was acceptable if t

RE: FYI Netflix is down

2012-07-03 Thread Ryan Malayter
James Downs wrote: > For Netflix (and all other similar > services) downtime is money and money is downtime. There is a > quantifiable cost for customer acquisition and a quantifiable churn > during each minute of downtime. Mature organizations actually calculate > and track this. The trick is to e
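The argument above can be made concrete with a back-of-envelope model. It holds that downtime has a measurable churn and customer-acquisition cost, and that mature organizations track it. This sketch assumes the operator supplies every input as an estimate. It also assumes that redundancy would eliminate the outage minutes entirely. Both function names are hypothetical.

```python
def downtime_cost(minutes_down, churned_per_minute, acquisition_cost,
                  revenue_per_minute=0.0):
    """Customers lost times the cost to win them back, plus revenue not
    earned while down. A rough estimate, not an accounting model."""
    churn_cost = minutes_down * churned_per_minute * acquisition_cost
    lost_revenue = minutes_down * revenue_per_minute
    return churn_cost + lost_revenue


def redundancy_pays_off(expected_outage_minutes_per_year, churned_per_minute,
                        acquisition_cost, annual_redundancy_cost,
                        revenue_per_minute=0.0):
    """True if the expected yearly outage loss exceeds what redundancy costs.

    Assumes (optimistically) that the redundancy removes all of those
    outage minutes."""
    expected_loss = downtime_cost(expected_outage_minutes_per_year,
                                  churned_per_minute, acquisition_cost,
                                  revenue_per_minute)
    return expected_loss > annual_redundancy_cost
```

Take 120 expected outage minutes a year, 2 lost customers per minute, and a $50 acquisition cost. That is a $12,000 expected loss, which would justify a $10,000/year redundancy spend. At 60 minutes, it would not. All figures are made up for illustration.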

RE: FYI Netflix is down

2012-07-03 Thread Dan Golding
> -Original Message- > From: James Downs [mailto:e...@egon.cc] > > > On Jul 2, 2012, at 7:19 PM, Rodrick Brown wrote: > > > People are acting as if Netflix is part of some critical service they > stream movies for Christ sake. Some acceptable level of loss is fine > for 99.99% of Netfli

Re: FYI Netflix is down

2012-07-02 Thread George Herbert
On Jul 2, 2012, at 7:19 PM, Rodrick Brown wrote: > People are acting as if Netflix is part of some critical service they stream > movies for Christ sake. Some acceptable level of loss is fine for 99.99% of > Netflix's user base just like cable, electricity and running water I suffer a > f

Re: FYI Netflix is down

2012-07-02 Thread Hal Murray
George Herbert said: > I worked for a Sun clone vendor (Axil) for a while and took some of our > systems and storage to Comdex one year in the 90s. We had a RAID unit > (Mylex controller) we had just introduced. Beforehand, I made REALLY REALLY > SURE that the pull-the-disk and pull-the-redund

Re: FYI Netflix is down

2012-07-02 Thread James Downs
On Jul 2, 2012, at 7:19 PM, Rodrick Brown wrote: > People are acting as if Netflix is part of some critical service they stream > movies for Christ sake. Some acceptable level of loss is fine for 99.99% of > Netflix's user base just like cable, electricity and running water I suffer a > few h

Re: FYI Netflix is down

2012-07-02 Thread Rodrick Brown
On Jul 2, 2012, at 7:03 PM, James Downs wrote: > > On Jul 2, 2012, at 1:20 PM, david raistrick wrote: > >> Amazon resources are controlled (from a consumer viewpoint) by API - that >> API is also used by amazon's internal toolkits that support ELB (and RDS..). >> Those (http accessed) API

Re: FYI Netflix is down

2012-07-02 Thread James Downs
On Jul 2, 2012, at 1:20 PM, david raistrick wrote: > Amazon resources are controlled (from a consumer viewpoint) by API - that API > is also used by amazon's internal toolkits that support ELB (and RDS..). > Those (http accessed) API interfaces were unavailable for a good portion of > the ou
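The control-plane APIs were unavailable for much of the outage, as noted above. One standard client-side habit in that situation is retrying with capped exponential backoff and random jitter. That keeps many clients from hitting a struggling API in lockstep. Below is a minimal Python sketch. It assumes the API call is wrapped in a zero-argument callable. The defaults are illustrative and are not AWS guidance.

```python
import random
import time


def call_with_backoff(operation, retryable=(ConnectionError, TimeoutError),
                      max_attempts=5, base_delay_s=0.5, max_delay_s=30.0,
                      sleep=time.sleep, rng=random.random):
    """Call operation(), retrying retryable errors with capped exponential
    backoff and "full jitter" (sleep a random time in [0, cap))."""
    for attempt in range(max_attempts):
        try:
            return operation()
        except retryable:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error
            cap = min(max_delay_s, base_delay_s * 2 ** attempt)
            sleep(rng() * cap)
```

Backoff only avoids adding load. It does not help if the API stays down for hours, which is the harder problem raised in this thread.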

Re: FYI Netflix is down

2012-07-02 Thread Steven Bellovin
On Jul 2, 2012, at 3:43 PM, Greg D. Moore wrote: > At 03:08 PM 7/2/2012, George Herbert wrote: > > If folks have not read it, I would suggest reading Normal Accidents by > Charles Perrow. Strong second to that suggestion. --Steve Bellovin, https://www.cs.columbia.edu/~smb

Re: FYI Netflix is down

2012-07-02 Thread Greg D. Moore
At 05:04 PM 7/2/2012, George Herbert wrote: On Mon, Jul 2, 2012 at 12:43 PM, Greg D. Moore wrote: > At 03:08 PM 7/2/2012, George Herbert wrote: > > If folks have not read it, I would suggest reading Normal Accidents by > Charles Perrow. > > The "it can't happen" is almost guaranteed to happen. ;

Re: FYI Netflix is down

2012-07-02 Thread George Herbert
On Mon, Jul 2, 2012 at 12:43 PM, Greg D. Moore wrote: > At 03:08 PM 7/2/2012, George Herbert wrote: > > If folks have not read it, I would suggest reading Normal Accidents by > Charles Perrow. > > The "it can't happen" is almost guaranteed to happen. ;-)  And when it does, > it'll often interact i

RE: FYI Netflix is down

2012-07-02 Thread Dan Golding
> -Original Message- > From: Greg D. Moore [mailto:moor...@greenms.com] > > > If folks have not read it, I would suggest reading Normal Accidents by > Charles Perrow. > Also, Human Error by James Reason.

Re: FYI Netflix is down

2012-07-02 Thread Brett Frankenberger
On Mon, Jul 02, 2012 at 09:09:09AM -0700, Leo Bicknell wrote: > In a message written on Mon, Jul 02, 2012 at 11:30:06AM -0400, Todd Underwood > wrote: > > from the perspective of people watching B-rate movies: this was a > > failure to implement and test a reliable system for streaming those > >

Re: FYI Netflix is down

2012-07-02 Thread david raistrick
On Mon, 2 Jul 2012, James Downs wrote: back-plane / control-plane was unable to cope with the requests. Netflix uses Amazon's ELB to balance the traffic and no back-plane meant they were unable to reconfigure it to route around the problem. Someone needs to define back-plane/control-plane i

Re: FYI Netflix is down

2012-07-02 Thread Greg D. Moore
At 03:08 PM 7/2/2012, George Herbert wrote: If folks have not read it, I would suggest reading Normal Accidents by Charles Perrow. The "it can't happen" is almost guaranteed to happen. ;-) And when it does, it'll often interact in ways we can't predict or sometimes even understand. As for

Re: FYI Netflix is down

2012-07-02 Thread Joly MacFie
Good band name. > Chaos Gorilla -- --- Joly MacFie 218 565 9365 Skype:punkcast WWWhatsup NYC - http://wwwhatsup.com http://pinstand.com - http://punkcast.com VP (Admin) - ISOC-NY - http://isoc-ny.org

Re: FYI Netflix is down

2012-07-02 Thread AP NANOG
I believe in my dictionary Chaos Gorilla translates into "Time To Go Home", with a rough definition of "Everything just crapped out - The world is ending"; but then again I may have hat incorrect :-) -- Thank you, Robert Miller http://www.armoredpackets.com Twitter: @arch3angel On 7/2/12 2:

RE: FYI Netflix is down

2012-07-02 Thread Dan Golding
> -Original Message- > From: Leo Bicknell [mailto:bickn...@ufp.org] > > > I want to emphasize _and test_. [snip] > > I used to work with a guy who had a simple test for these things, and > if I was a VP at Amazon, Netflix, or any other large company I would do > the same. About once

Re: FYI Netflix is down

2012-07-02 Thread George Herbert
Late reply, but: On Sat, Jun 30, 2012 at 12:30 AM, Lynda wrote: >... > Second, and more important. I *was* a "computer science guy" in a past life, > and this is nonsense. You can have astonishingly large software projects > that just continue to run smoothly, day in, day out, and they don't hit

Re: FYI Netflix is down

2012-07-02 Thread James Downs
On Jul 2, 2012, at 11:59 AM, Paul Graydon wrote: > back-plane / control-plane was unable to cope with the requests. Netflix > uses Amazon's ELB to balance the traffic and no back-plane meant they were > unable to reconfigure it to route around the problem. Someone needs to define back-plane/c

Re: FYI Netflix is down

2012-07-02 Thread Paul Graydon
On 07/02/2012 08:53 AM, Tony McCrory wrote: On 2 July 2012 19:20, Cameron Byrne wrote: Make your chaos animal go after sites and regions instead of individual VMs. CB From a previous post mortem http://techblog.netflix.com/2011_04_01_archive.html " Create More Failures Currently, Netflix

Re: FYI Netflix is down

2012-07-02 Thread Tony McCrory
On 2 July 2012 19:20, Cameron Byrne wrote: > > Make your chaos animal go after sites and regions instead of individual > VMs. > > CB > >From a previous post mortem http://techblog.netflix.com/2011_04_01_archive.html " Create More Failures Currently, Netflix uses a service called "Chaos Monkey
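Cameron Byrne suggests scaling failure injection from single VMs up to whole sites and regions. Netflix's Simian Army post describes Chaos Gorilla as doing this for an entire availability zone. The sketch below only chooses which instances a test would take down at a given level. It is not Netflix's Chaos Monkey code. The inventory format (dicts with `id`, `zone` and `region` keys) is a hypothetical assumption.

```python
import random

LEVELS = ("instance", "zone", "region")


def chaos_terminate(fleet, level, rng):
    """Pick one failure domain at `level` and return (victims, survivors).

    fleet: list of dicts with "id", "zone" and "region" keys (hypothetical
    inventory format). Nothing is actually terminated; this only decides
    which instance ids a failure test would take down.
    """
    if level not in LEVELS:
        raise ValueError(f"level must be one of {LEVELS}, got {level!r}")
    key = "id" if level == "instance" else level
    target = rng.choice(sorted({inst[key] for inst in fleet}))
    victims = [inst["id"] for inst in fleet if inst[key] == target]
    survivors = [inst["id"] for inst in fleet if inst[key] != target]
    return victims, survivors
```

Passing a seeded `random.Random` makes a test run reproducible. Raising the level from `"instance"` to `"region"` is the whole difference between a monkey and a gorilla in this model.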

Re: FYI Netflix is down

2012-07-02 Thread James Downs
On Jul 2, 2012, at 9:23 AM, david raistrick wrote: > When the hardware is outsourced how would you propose testing the > non-software components? They do simulate availability zone issues (and AZ > is as close as you get to controlling which internal power/network/etc grid > you're attached t

Re: FYI Netflix is down

2012-07-02 Thread Cameron Byrne
On Jul 2, 2012 10:53 AM, "Leo Bicknell" wrote: > > In a message written on Mon, Jul 02, 2012 at 12:23:57PM -0400, david raistrick wrote: > > When the hardware is outsourced how would you propose testing the > > non-software components? They do simulate availability zone issues (and > > AZ is as c

Re: FYI Netflix is down

2012-07-02 Thread Leo Bicknell
In a message written on Mon, Jul 02, 2012 at 12:23:57PM -0400, david raistrick wrote: > When the hardware is outsourced how would you propose testing the > non-software components? They do simulate availability zone issues (and > AZ is as close as you get to controlling which internal power/net

Re: FYI Netflix is down

2012-07-02 Thread Grant Ridder
The problem is large scale tests take a lot of time and planning. For it to be done right, you really need a dedicated DR team. -Grant On Mon, Jul 2, 2012 at 11:31 AM, AP NANOG wrote: > This is an excellent example of how tests "should" be ran, unfortunately > far too many places don't do this

Re: FYI Netflix is down

2012-07-02 Thread AP NANOG
This is an excellent example of how tests "should" be ran, unfortunately far too many places don't do this... -- Thank you, Robert Miller http://www.armoredpackets.com Twitter: @arch3angel On 7/2/12 12:09 PM, Leo Bicknell wrote: In a message written on Mon, Jul 02, 2012 at 11:30:06AM -0400,

Re: FYI Netflix is down

2012-07-02 Thread david raistrick
On Mon, 2 Jul 2012, Leo Bicknell wrote: http://techblog.netflix.com/2011/07/netflix-simian-army.html Yes, Netflix seems to get it, and I think their Simian Army is a great Q&A tool. However, it is not a complete testing system, I have never seen them talk about testing non-software components

Re: FYI Netflix is down

2012-07-02 Thread Leo Bicknell
In a message written on Mon, Jul 02, 2012 at 12:13:22PM -0400, david raistrick wrote: > you mean like this? > > http://techblog.netflix.com/2011/07/netflix-simian-army.html Yes, Netflix seems to get it, and I think their Simian Army is a great Q&A tool. However, it is not a complete testing sys

Re: FYI Netflix is down

2012-07-02 Thread david raistrick
On Mon, 2 Jul 2012, Leo Bicknell wrote: I used to work with a guy who had a simple test for these things, and if I was a VP at Amazon, Netflix, or any other large company I would do the same. About once a month he would walk out on the you mean like this? http://techblog.netflix.com/2011/07/

Re: FYI Netflix is down

2012-07-02 Thread Leo Bicknell
In a message written on Mon, Jul 02, 2012 at 11:30:06AM -0400, Todd Underwood wrote: > from the perspective of people watching B-rate movies: this was a > failure to implement and test a reliable system for streaming those > movies in the face of a power outage at one facility. I want to emphasi

Re: FYI Netflix is down

2012-07-02 Thread AP NANOG
While I was working for a wireless telecom company our primary datacenter was knocked off the power grid due to weather, the generators kicked on and everything was fine, till one generator was struck by lightning and that same strike fried the control panel on the second one. Considering the s
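The story above is a textbook common-cause failure. One lightning strike disabled both generators, so the pair was never as independent as simple redundancy math assumes. Reliability engineering commonly models this with the beta-factor approximation. Below is a minimal sketch with illustrative numbers.

```python
def p_pair_fails_independent(p):
    """Both units fail, assuming their failures are fully independent."""
    return p * p


def p_pair_fails_beta(p, beta):
    """Beta-factor approximation for a redundant pair.

    A fraction `beta` of each unit's failure probability p comes from a
    shared cause (one storm, one lightning strike) that disables both units
    together; the rest is independent. For small p:
        P(both fail) ~= beta * p + ((1 - beta) * p) ** 2
    """
    return beta * p + ((1 - beta) * p) ** 2
```

With p = 0.01 and beta = 0.1, the pair fails with probability of about 0.0011. That is roughly 10.8 times the 0.0001 the independent model predicts, so the shared-cause term dominates.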

Re: FYI Netflix is down

2012-07-02 Thread Todd Underwood
> Actually, it was a very complex power outage. I'm going to assume that what > happened this weekend was similar to the event that happened at the same > facility approximately two weeks ago (its immaterial - the details are > probably different, but it illustrates the complexity of a data cent

RE: FYI Netflix is down

2012-07-02 Thread Dan Golding
> -Original Message- > From: Todd Underwood [mailto:toddun...@gmail.com] > > scott, > > >> > >> This was not a cascading failure.  It was a simple power outage Actually, it was a very complex power outage. I'm going to assume that what happened this weekend was similar to the event that

Re: FYI Netflix is down

2012-07-01 Thread steve pirk [egrep]
On Sun, Jul 1, 2012 at 11:38 AM, Jay Ashworth wrote: > Not entirely. Datacenters do go down, our best efforts to the contrary > notwithstanding. Amazon doesn't guarantee you redundancy on EC2, only > the tools to provide it yourself. 25% Amazon; 75% service provider > clients; > that's my appr

Re: FYI Netflix is down

2012-07-01 Thread Jay Ashworth
- Original Message - > From: "Tyler Haske" > How to run a datacenter 101. Have more then one location, preferably > far apart. It being Amazon I would expect more. :/ Not entirely. Datacenters do go down, our best efforts to the contrary notwithstanding. Amazon doesn't guarantee you r

It's the end of the world, as we know it (Was: FYI Netflix is down)

2012-07-01 Thread Jay Ashworth
- Original Message - > From: "jamie rishaw" > you know what's happening even more? > > ..Amazon not learning their lesson. > Please stop these crappy practices, people. Do real world DR testing. > Play "What If This City Dropped Off The Map" games, because tonight, > parts of VA infact

Re: [FoRK] FYI Netflix is down

2012-07-01 Thread Aaron Burt
On Sat, Jun 30, 2012 at 03:15:07AM -0400, Andrew D Kirch wrote: > On 6/30/2012 3:11 AM, Tyler Haske wrote: > >How to run a datacenter 101. Have more then one location, preferably > >far apart. It being Amazon I would expect more. :/ Amazon has many datacenters and tries to make it easy to diversif

Re: FYI Netflix is down

2012-06-30 Thread Brett Frankenberger
On Sat, Jun 30, 2012 at 01:19:54PM -0700, Scott Howard wrote: > On Sat, Jun 30, 2012 at 12:04 PM, Todd Underwood wrote: > > > This was not a cascading failure. It was a simple power outage > > > > Cascading failures involve interdependencies among components. > > > > Not always. Cascading failu

Re: FYI Netflix is down

2012-06-30 Thread Mike Devlin
On Sat, Jun 30, 2012 at 5:04 PM, Bryan Horstmann-Allen < b...@mirrorshades.net> wrote: > > Have a look at Asgard, the AWS management tool they just open sourced. It > implies they rely very heavily on many AWS features, some of which are very > much region specific. > > As to their multi-region ca

Re: FYI Netflix is down

2012-06-30 Thread Bryan Horstmann-Allen
+-- | On 2012-06-30 16:55:53, Mike Devlin wrote: | | But in netflix case, if they architected their environment the way they | said they did, why wouldnt they just fail over to us-west? especially at | their scale, I would
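Mike Devlin's question above (why not simply fail over to us-west?) hides a capacity problem. If the standby region has to launch instances during the event, failover depends on the same control-plane APIs that were impaired. Below is a minimal sketch of a failover choice that only accepts regions with enough pre-provisioned headroom. The `Region` type, the capacity units and the numbers are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Region:
    name: str
    healthy: bool
    spare_capacity: int  # requests/s absorbable now, without launching anything


def choose_failover(regions, required_capacity):
    """Return the first healthy region, in preference order, that can take
    required_capacity using capacity that is already running; else None."""
    for region in regions:
        if region.healthy and region.spare_capacity >= required_capacity:
            return region.name
    return None
```

Returning `None` rather than "the least bad region" is a deliberate choice in this sketch. Shifting more load than a region can absorb just moves the outage.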

Re: FYI Netflix is down

2012-06-30 Thread Mike Devlin
On Sat, Jun 30, 2012 at 4:45 PM, Bryan Horstmann-Allen < b...@mirrorshades.net> wrote: > Explain Netflix and Heroku last night. Both of whom architect across > multiple > AZs and have for many years. > > The API and EBS across the region were also affected. ELB was _also_ > affected > across the r

Re: FYI Netflix is down

2012-06-30 Thread Bryan Horstmann-Allen
+-- | On 2012-06-30 16:08:40, Rayson Ho wrote: | | If I recall correctly, availability zone (AZ) mappings are specific to | an AWS account, and in fact there is no way to know if you are running | in the same AZ as another

Re: FYI Netflix is down

2012-06-30 Thread Todd Underwood
scott, >> >> This was not a cascading failure.  It was a simple power outage >> >> Cascading failures involve interdependencies among components. > > > Not always.  Cascading failures can also occur when there is zero dependency > between components.  The simplest form of this is where one environ

Re: FYI Netflix is down

2012-06-30 Thread Scott Howard
On Sat, Jun 30, 2012 at 12:04 PM, Todd Underwood wrote: > This was not a cascading failure. It was a simple power outage > > Cascading failures involve interdependencies among components. > Not always. Cascading failures can also occur when there is zero dependency between components. The simp
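Scott Howard points out above that a cascade can happen between components that share nothing but traffic. A small load-redistribution model shows this. The sketch assumes a failed zone's load spreads evenly over the survivors and that a zone fails instantly once overloaded. Both are simplifications.

```python
def simulate_load_shift(capacities, loads, first_failure):
    """Return the set of zones that end up down after `first_failure` fails.

    capacities, loads: dict zone -> requests/s. A failed zone's load is
    spread evenly over the zones still up; any zone pushed past capacity
    fails in turn. Zones share no components here, only traffic.
    """
    loads = dict(loads)
    down = set()
    pending = [first_failure]
    while pending:
        zone = pending.pop()
        if zone in down:
            continue
        down.add(zone)
        survivors = [z for z in loads if z not in down]
        if not survivors:
            break
        share = loads[zone] / len(survivors)
        loads[zone] = 0.0
        for z in survivors:
            loads[z] += share
            if loads[z] > capacities[z]:
                pending.append(z)
    return down
```

Take three zones of capacity 100. Losing one is survivable at 60 requests/s per zone but takes everything down at 70, because each survivor must absorb half of the lost load. In general, N equal zones at utilization u survive one loss only if u * N / (N - 1) <= 1. For N = 3, that means u <= 2/3.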

Re: FYI Netflix is down

2012-06-30 Thread Jared Mauch
The interesting thing to me is the us population by time zone. If amazon has 70% of servers in the eastern time zone it makes some sense. Mountain + pacific is smaller than central, which is a bit more than half eastern. These stats are older but a good rough gauge: http://answers.google.com/a

Re: FYI Netflix is down

2012-06-30 Thread Randy Bush
> Sorry to be the monday morning quarterback, but the sites that went > down learned a valuable lesson in single point of failure analysis. as this has happened more than once before, i am less optimistic. or maybe they decided the spof risk was not worth the avoidance costs. randy

Re: FYI Netflix is down

2012-06-30 Thread Rayson Ho
m] >> Sent: Friday, June 29, 2012 8:42 PM >> To: Jason Baugher >> Cc: nanog@nanog.org >> Subject: Re: FYI Netflix is down >> >> From Amazon >> >> Amazon Elastic Compute Cloud (N. Virginia)  (http://status.aws.amazon.com/ >> ) >> 8:21 PM PDT We

Re: FYI Netflix is down

2012-06-30 Thread Seth Mattinen
On 6/30/12 12:04 PM, Todd Underwood wrote: > This was not a cascading failure. It was a simple power outage > > Cascading failures involve interdependencies among components. > I guess I'm assuming there were UPS and generator systems involved (and failing) with powering the critical load, but

Re: FYI Netflix is down

2012-06-30 Thread Mike Devlin
The last 2 Amazon outages were power issues isolated to just their us-east Virginia data center. I read somewhere that Amazon has something like 70% of their ec2 resources in Virginia and it's also their oldest ec2 datacenter..so I am guessing they learned a lot of lessons and are stuck with an aged

Re: FYI Netflix is down

2012-06-30 Thread Jimmy Hess
On 6/30/12, Todd Underwood wrote: > This was not a cascading failure. It was a simple power outage > Cascading failures involve interdependencies among components. Actually, you can't really say that. It's true that it was a simple power outage for Amazon. Power failed, causing the AWS service

Re: FYI Netflix is down

2012-06-30 Thread Todd Underwood
This was not a cascading failure. It was a simple power outage Cascading failures involve interdependencies among components. T On Jun 30, 2012 2:21 PM, "Seth Mattinen" wrote: > On 6/30/12 9:25 AM, Todd Underwood wrote: > > > > On Jun 30, 2012 11:23 AM, "Seth Mattinen" >

Re: FYI Netflix is down

2012-06-30 Thread Seth Mattinen
On 6/30/12 9:25 AM, Todd Underwood wrote: > > On Jun 30, 2012 11:23 AM, "Seth Mattinen" > wrote: >> >> >> But haven't they all been cascading failures? > > No. They have not. That's not what that term means. > > 'Cascading failure' has a fairly specific meaning that

Re: FYI Netflix is down

2012-06-30 Thread Jimmy Hess
On 6/30/12, Todd Underwood wrote: > On Jun 30, 2012 11:23 AM, "Seth Mattinen" wrote: >> But haven't they all been cascading failures? > No. They have not. That's not what that term means. > > 'Cascading failure' has a fairly specific meaning that doesn't imply > resilience in the face of decomp

Re: FYI Netflix is down

2012-06-30 Thread Todd Underwood
On Jun 30, 2012 11:23 AM, "Seth Mattinen" wrote: > > > But haven't they all been cascading failures? No. They have not. That's not what that term means. 'Cascading failure' has a fairly specific meaning that doesn't imply resilience in the face of decomposition into smaller parts. Cascading f

Re: FYI Netflix is down

2012-06-30 Thread Roy
On 6/30/2012 12:11 AM, Tyler Haske wrote: I am not a computer science guy but been around a long time. Data centers and clouds are like software. Once they reach a certain size, its impossible to keep the bugs out. You can test and test your heart out and something will slip by. You can say t

Re: FYI Netflix is down

2012-06-30 Thread Seth Mattinen
On 6/30/12 4:50 AM, Justin M. Streiner wrote: > On Sat, 30 Jun 2012, jamie rishaw wrote: > >> you know what's happening even more? >> >> ..Amazon not learning their lesson. > > I was not giving anyone a free pass or attempting to shrug off the > outage. I was just stating that there are many reas

Re: FYI Netflix is down

2012-06-30 Thread Jimmy Hess
On 6/30/12, Cameron Byrne wrote: > On Jun 30, 2012 12:25 AM, "joel jaeggli" wrote: >> On 6/30/12 12:11 AM, Tyler Haske wrote: > Geo-redundancy is key. In fact, i would take distributed data centers over > RAID, UPS, or any other "fancy pants" © mechanisms any day. Geo-redundancy is more expensiv

Re: FYI Netflix is down

2012-06-30 Thread Cameron Byrne
On Jun 30, 2012 12:25 AM, "joel jaeggli" wrote: > > On 6/30/12 12:11 AM, Tyler Haske wrote: >>> >>> I am not a computer science guy but been around a long time. Data centers >>> and clouds are like software. Once they reach a certain size, its >>> impossible to keep the bugs out. You can test a

Re: FYI Netflix is down

2012-06-30 Thread Jimmy Hess
On 6/30/12, Grant Ridder wrote: > well one would think that they could at least get power redundancy right... It is very similar to suggesting redundancy within a site against building collapse. Reliable power redundancy is very hard and very expensive.Much harder and much more expensive th

Re: FYI Netflix is down

2012-06-30 Thread Justin M. Streiner
On Sat, 30 Jun 2012, jamie rishaw wrote: you know what's happening even more? ..Amazon not learning their lesson. I was not giving anyone a free pass or attempting to shrug off the outage. I was just stating that there are many reasons why things break. I haven't seen anything official on

Re: FYI Netflix is down

2012-06-30 Thread Lynda
On 6/30/2012 12:11 AM, Tyler Haske wrote: > On 6/29/2012 11:07 PM, Roy wrote: I am not a computer science guy but been around a long time. Data centers and clouds are like software. Once they reach a certain size, its impossible to keep the bugs out. You can test and test your heart out and so

Re: FYI Netflix is down

2012-06-30 Thread joel jaeggli
On 6/30/12 12:11 AM, Tyler Haske wrote: I am not a computer science guy but been around a long time. Data centers and clouds are like software. Once they reach a certain size, its impossible to keep the bugs out. You can test and test your heart out and something will slip by. You can say the

Re: FYI Netflix is down

2012-06-30 Thread Andrew D Kirch
On 6/30/2012 3:11 AM, Tyler Haske wrote: How to run a datacenter 101. Have more then one location, preferably far apart. It being Amazon I would expect more. :/ Based on? Clouds are nothing more than outsourced responsibility. My business has stopped while my IT department explains to me tha

Re: FYI Netflix is down

2012-06-30 Thread Tyler Haske
> I am not a computer science guy but been around a long time.  Data centers > and clouds are like software.  Once they reach a certain size, its > impossible to keep the bugs out.  You can test and test your heart out and > something will slip by.  You can say the same thing about nuclear reactors

Re: FYI Netflix is down

2012-06-29 Thread Grant Ridder
well one would think that they could at least get power redundancy right... On Sat, Jun 30, 2012 at 1:07 AM, Roy wrote: > On 6/29/2012 10:38 PM, jamie rishaw wrote: > >> you know what's happening even more? >> >> ..Amazon not learning their lesson. >> >> they just had an outage quite similar.. t

Re: FYI Netflix is down

2012-06-29 Thread Roy
On 6/29/2012 10:38 PM, jamie rishaw wrote: you know what's happening even more? ..Amazon not learning their lesson. they just had an outage quite similar.. they "performed a full audit" on electrical systems worldwide, according to the rfo/post mortem. looks like they need to perform a "full a

Re: FYI Netflix is down

2012-06-29 Thread Bjorn Leffler
On Sat, Jun 30, 2012 at 3:38 PM, jamie rishaw wrote: > ... > Down: Instagram, Pinterest, Netflix, Heroku, Woot. Pocket(Read It Later), > and on and on.  A bunch of openID sites.  A bunch of DNS sites (think > zoneedit et al).  Infact, probably nearly a /12 if not more of space.. > ... Zoneedit do

Re: FYI Netflix is down

2012-06-29 Thread jamie rishaw
lability Zone have lost power due to electrical storms in the area. >>>> We >>>> are actively working to restore power. >>>> >>>> -Original Message- >>>> From: Grant Ridder [mailto:shortdudey123@gmail.com< >>>> shortd

Re: FYI Netflix is down

2012-06-29 Thread Justin M. Streiner
Cc: nanog@nanog.org Subject: Re: FYI Netflix is down From Amazon Amazon Elastic Compute Cloud (N. Virginia) (http://status.aws.amazon.com/) 8:21 PM PDT We are investigating connectivity issues for a number of instances in the US-EAST-1 Region. 8:31 PM PDT We are investigating elevated errors rate

Re: FYI Netflix is down

2012-06-29 Thread Seth Mattinen
On 6/29/12 8:22 PM, Joe Blanchard wrote: > Seems that they are unreachable at the moment. Called and theres a recorded > message stating they are aware of an issue, no details. > Streaming services and web; just tried my Roku and it failed to connect. ~Seth

Re: FYI Netflix is down

2012-06-29 Thread William Herrin
On Fri, Jun 29, 2012 at 11:42 PM, Grant Ridder wrote: > From Amazon > > Amazon Elastic Compute Cloud (N. Virginia)  (http://status.aws.amazon.com/) > 8:21 PM PDT We are investigating connectivity issues for a number of > instances in the US-EAST-1 Region. > 8:31 PM PDT We are investigating elevate

Re: FYI Netflix is down

2012-06-29 Thread Grant Ridder
8:49 PM PDT Power has been restored to the impacted Availability Zone and we are working to bring impacted instances and volumes back online On Fri, Jun 29, 2012 at 10:52 PM, Grant Ridder wrote: > They may use it for content, but reddit.com resolves to IPs own by quest > > > On Fri, Jun 29, 2012

Re: FYI Netflix is down

2012-06-29 Thread Grant Ridder
They may use it for content, but reddit.com resolves to IPs own by quest On Fri, Jun 29, 2012 at 10:51 PM, Seth Mattinen wrote: > On 6/29/12 8:47 PM, Mike Lyon wrote: > > Whatever happened to UPSs and generators? > > > > You don't need them with The Cloud! > > But seriously, this is something li

Re: FYI Netflix is down

2012-06-29 Thread Seth Mattinen
On 6/29/12 8:47 PM, Mike Lyon wrote: > Whatever happened to UPSs and generators? > You don't need them with The Cloud! But seriously, this is something like the third or fourth time AWS fell over flat in recent memory. ~Seth

Re: FYI Netflix is down

2012-06-29 Thread Derek Ivey
lost power due to electrical storms in the area. We are actively working to restore power. -Original Message- From: Grant Ridder [mailto:shortdudey123@gmail.com] Sent: Friday, June 29, 2012 8:42 PM To: Jason Baugher Cc: nanog@nanog.org Subject: Re: FYI Netflix is down From Amaz

Re: FYI Netflix is down

2012-06-29 Thread Grant Ridder
Yes, although, when you launch an instance, you do have the option of selecting a zone if you want. However, once the instance is started it stays in that zone and does not switch. On Fri, Jun 29, 2012 at 10:47 PM, Ian Wilson wrote: > On Fri, Jun 29, 2012 at 11:44 PM, Grant Ridder > wrote: > >
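An instance stays in the zone it was launched in, so spreading has to be decided at launch time. Below is a minimal sketch of round-robin placement. It also counts how many instances survive the loss of the worst-hit zone. The function names and zone labels are hypothetical.

```python
from collections import Counter


def assign_zones(n_instances, zones):
    """Place instances round-robin across zones at launch time."""
    return [zones[i % len(zones)] for i in range(n_instances)]


def worst_case_survivors(assignment):
    """Instances left running if the most heavily used zone is lost."""
    counts = Counter(assignment)
    return len(assignment) - max(counts.values())
```

For example, `assign_zones(5, ["us-east-1a", "us-east-1b", "us-east-1c"])` puts two instances in each of the first two zones and one in the third. Losing any single zone therefore leaves at least 3 of 5 running.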

Re: FYI Netflix is down

2012-06-29 Thread Ian Wilson
On Fri, Jun 29, 2012 at 11:44 PM, Grant Ridder wrote: > I have an instance in zone C and it is up and fine, so it must be A, B, or > D that is down. It is my understanding that instance zones are randomized between customers -- so your zone C may be my zone A. Ian -- Ian Wilson ian.m.wil...@gma
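Ian Wilson's point above means zone labels only have meaning within one account. "My zone C is up" says nothing about anyone else's zone C. The sketch below illustrates that consequence with a stable per-account shuffle. It is a hypothetical model, not how AWS actually assigns labels, and the physical zone names are invented.

```python
import hashlib
import random

PHYSICAL_ZONES = ["pz-1", "pz-2", "pz-3", "pz-4"]  # hypothetical
LABELS = ["us-east-1a", "us-east-1b", "us-east-1c", "us-east-1d"]


def zone_map_for_account(account_id):
    """Hypothetical label -> physical zone mapping for one account.

    Seeded from a hash of the account id, so it is stable for a given
    account but (usually) differs between accounts.
    """
    digest = hashlib.sha256(account_id.encode()).digest()
    seed = int.from_bytes(digest[:8], "big")
    shuffled = PHYSICAL_ZONES[:]
    random.Random(seed).shuffle(shuffled)
    return dict(zip(LABELS, shuffled))
```

Under a model like this, a healthy "us-east-1c" in one account rules out only one physical zone for that account. It cannot tell another customer which of their labels is the one that lost power.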
