Re: FYI Netflix is down

2012-07-02 Thread George Herbert
On Jul 2, 2012, at 7:19 PM, Rodrick Brown wrote: > People are acting as if Netflix is part of some critical service they stream > movies for Christ sake. Some acceptable level of loss is fine for 99.99% of > Netflix's user base just like cable, electricity and running water I suffer a > f

Re: FYI Netflix is down

2012-07-02 Thread Hal Murray
George Herbert said: > I worked for a Sun clone vendor (Axil) for a while and took some of our > systems and storage to Comdex one year in the 90s. We had a RAID unit > (Mylex controller) we had just introduced. Beforehand, I made REALLY REALLY > SURE that the pull-the-disk and pull-the-redund

Contributing to the community

2012-07-02 Thread Matt Chung
I've been so fortunate and appreciative over the years to have colleagues (many of whom I consider my close friends) cultivate my career by providing sound advice that I will continue to pass on. In addition to those I've known personally, I have gleaned a substantial amount of information through ma

Re: FYI Netflix is down

2012-07-02 Thread James Downs
On Jul 2, 2012, at 7:19 PM, Rodrick Brown wrote: > People are acting as if Netflix is part of some critical service they stream > movies for Christ sake. Some acceptable level of loss is fine for 99.99% of > Netflix's user base just like cable, electricity and running water I suffer a > few h

Northern Virginia 9-1-1 service after storm

2012-07-02 Thread Sean Donelan
Probably not as interesting as talking about Amazon/Netflix. http://www.washingtonpost.com/local/after-storm-911-phone-service-remains-spotty/2012/07/02/gJQA33dHJW_story.html Fairfax County's 911 emergency center operated at just half capacity Monday as Verizon struggled to figure out why bot

Re: FYI Netflix is down

2012-07-02 Thread Rodrick Brown
On Jul 2, 2012, at 7:03 PM, James Downs wrote: > > On Jul 2, 2012, at 1:20 PM, david raistrick wrote: > >> Amazon resources are controlled (from a consumer viewpoint) by API - that >> API is also used by amazon's internal toolkits that support ELB (and RDS..). >> Those (http accessed) API

Re: F-ckin Leap Seconds, how do they work?

2012-07-02 Thread Joly MacFie
On Mon, Jul 2, 2012 at 8:46 PM, Jimmy Hess wrote: > > Someone should write a dastardly system clock daemon to cause the > insertion of frequent spurious positive leap seconds, followed by the > spurious insertion of negative leap seconds. > > Chaos time bandit? -- -

Re: F-ckin Leap Seconds, how do they work?

2012-07-02 Thread Jimmy Hess
On 7/2/12, Steven Bellovin wrote: > On Jul 2, 2012, at 11:47 AM, AP NANOG wrote: >> Do you happen to know all the kernels and versions affected by this? > See > http://landslidecoding.blogspot.com/2012/07/linuxs-leap-second-deadlocks.html > --Steve Bellovin, https://www.cs.columbia.e

Re: FYI Netflix is down

2012-07-02 Thread James Downs
On Jul 2, 2012, at 1:20 PM, david raistrick wrote: > Amazon resources are controlled (from a consumer viewpoint) by API - that API > is also used by amazon's internal toolkits that support ELB (and RDS..). > Those (http accessed) API interfaces were unavailable for a good portion of > the ou

Re: FYI Netflix is down

2012-07-02 Thread Steven Bellovin
On Jul 2, 2012, at 3:43 PM, Greg D. Moore wrote: > At 03:08 PM 7/2/2012, George Herbert wrote: > > If folks have not read it, I would suggest reading Normal Accidents by > Charles Perrow. Strong second to that suggestion. --Steve Bellovin, https://www.cs.columbia.edu/~smb

Re: F-ckin Leap Seconds, how do they work?

2012-07-02 Thread Steven Bellovin
On Jul 2, 2012, at 11:47 AM, AP NANOG wrote: > Do you happen to know all the kernels and versions affected by this? > > See http://landslidecoding.blogspot.com/2012/07/linuxs-leap-second-deadlocks.html --Steve Bellovin, https://www.cs.columbia.edu/~smb

Re: FYI Netflix is down

2012-07-02 Thread Greg D. Moore
At 05:04 PM 7/2/2012, George Herbert wrote: On Mon, Jul 2, 2012 at 12:43 PM, Greg D. Moore wrote: > At 03:08 PM 7/2/2012, George Herbert wrote: > > If folks have not read it, I would suggest reading Normal Accidents by > Charles Perrow. > > The "it can't happen" is almost guaranteed to happen. ;

Re: FYI Netflix is down

2012-07-02 Thread George Herbert
On Mon, Jul 2, 2012 at 12:43 PM, Greg D. Moore wrote: > At 03:08 PM 7/2/2012, George Herbert wrote: > > If folks have not read it, I would suggest reading Normal Accidents by > Charles Perrow. > > The "it can't happen" is almost guaranteed to happen. ;-)  And when it does, > it'll often interact i

RE: FYI Netflix is down

2012-07-02 Thread Dan Golding
> -Original Message- > From: Greg D. Moore [mailto:moor...@greenms.com] > > > If folks have not read it, I would suggest reading Normal Accidents by > Charles Perrow. > Also, Human Error by James Reason.

Re: FYI Netflix is down

2012-07-02 Thread Brett Frankenberger
On Mon, Jul 02, 2012 at 09:09:09AM -0700, Leo Bicknell wrote: > In a message written on Mon, Jul 02, 2012 at 11:30:06AM -0400, Todd Underwood > wrote: > > from the perspective of people watching B-rate movies: this was a > > failure to implement and test a reliable system for streaming those > >

Re: FYI Netflix is down

2012-07-02 Thread david raistrick
On Mon, 2 Jul 2012, James Downs wrote: back-plane / control-plane was unable to cope with the requests. Netflix uses Amazon's ELB to balance the traffic and no back-plane meant they were unable to reconfigure it to route around the problem. Someone needs to define back-plane/control-plane i

Re: FYI Netflix is down

2012-07-02 Thread Greg D. Moore
At 03:08 PM 7/2/2012, George Herbert wrote: If folks have not read it, I would suggest reading Normal Accidents by Charles Perrow. The "it can't happen" is almost guaranteed to happen. ;-) And when it does, it'll often interact in ways we can't predict or sometimes even understand. As for

Re: FYI Netflix is down

2012-07-02 Thread Joly MacFie
Good band name. > Chaos Gorilla -- --- Joly MacFie 218 565 9365 Skype:punkcast WWWhatsup NYC - http://wwwhatsup.com http://pinstand.com - http://punkcast.com VP (Admin) - ISOC-NY - http://isoc-ny.org

Re: FYI Netflix is down

2012-07-02 Thread AP NANOG
I believe in my dictionary Chaos Gorilla translates into "Time To Go Home", with a rough definition of "Everything just crapped out - The world is ending"; but then again I may have that incorrect :-) -- Thank you, Robert Miller http://www.armoredpackets.com Twitter: @arch3angel On 7/2/12 2:

RE: FYI Netflix is down

2012-07-02 Thread Dan Golding
> -Original Message- > From: Leo Bicknell [mailto:bickn...@ufp.org] > > > I want to emphasize _and test_. [snip] > > I used to work with a guy who had a simple test for these things, and > if I was a VP at Amazon, Netflix, or any other large company I would do > the same. About once

Re: FYI Netflix is down

2012-07-02 Thread George Herbert
Late reply, but: On Sat, Jun 30, 2012 at 12:30 AM, Lynda wrote: >... > Second, and more important. I *was* a "computer science guy" in a past life, > and this is nonsense. You can have astonishingly large software projects > that just continue to run smoothly, day in, day out, and they don't hit

Re: FYI Netflix is down

2012-07-02 Thread James Downs
On Jul 2, 2012, at 11:59 AM, Paul Graydon wrote: > back-plane / control-plane was unable to cope with the requests. Netflix > uses Amazon's ELB to balance the traffic and no back-plane meant they were > unable to reconfigure it to route around the problem. Someone needs to define back-plane/c

Re: FYI Netflix is down

2012-07-02 Thread Paul Graydon
On 07/02/2012 08:53 AM, Tony McCrory wrote: On 2 July 2012 19:20, Cameron Byrne wrote: Make your chaos animal go after sites and regions instead of individual VMs. CB From a previous post mortem http://techblog.netflix.com/2011_04_01_archive.html " Create More Failures Currently, Netflix

Re: FYI Netflix is down

2012-07-02 Thread Tony McCrory
On 2 July 2012 19:20, Cameron Byrne wrote: > > Make your chaos animal go after sites and regions instead of individual > VMs. > > CB > From a previous post mortem http://techblog.netflix.com/2011_04_01_archive.html " Create More Failures Currently, Netflix uses a service called "Chaos Monkey

Re: FYI Netflix is down

2012-07-02 Thread James Downs
On Jul 2, 2012, at 9:23 AM, david raistrick wrote: > When the hardware is outsourced how would you propose testing the > non-software components? They do simulate availability zone issues (and AZ > is as close as you get to controlling which internal power/network/etc grid > you're attached t

Re: FYI Netflix is down

2012-07-02 Thread Cameron Byrne
On Jul 2, 2012 10:53 AM, "Leo Bicknell" wrote: > > In a message written on Mon, Jul 02, 2012 at 12:23:57PM -0400, david raistrick wrote: > > When the hardware is outsourced how would you propose testing the > > non-software components? They do simulate availability zone issues (and > > AZ is as c

Re: FYI Netflix is down

2012-07-02 Thread Leo Bicknell
In a message written on Mon, Jul 02, 2012 at 12:23:57PM -0400, david raistrick wrote: > When the hardware is outsourced how would you propose testing the > non-software components? They do simulate availability zone issues (and > AZ is as close as you get to controlling which internal power/net

Re: F-ckin Leap Seconds, how do they work?

2012-07-02 Thread Joly MacFie
Made the press.. http://www.washingtonpost.com/business/technology/leap-second-bug-takes-down-reddit-and-a-bunch-of-other-sites/2012/07/02/gJQAlXg1HW_story.html -- --- Joly MacFie 218 565 9365 Skype:punkcast WWWhatsup NYC - http://

Re: FYI Netflix is down

2012-07-02 Thread Grant Ridder
The problem is large-scale tests take a lot of time and planning. For it to be done right, you really need a dedicated DR team. -Grant On Mon, Jul 2, 2012 at 11:31 AM, AP NANOG wrote: > This is an excellent example of how tests "should" be run, unfortunately > far too many places don't do this

Re: FYI Netflix is down

2012-07-02 Thread AP NANOG
This is an excellent example of how tests "should" be run; unfortunately, far too many places don't do this... -- Thank you, Robert Miller http://www.armoredpackets.com Twitter: @arch3angel On 7/2/12 12:09 PM, Leo Bicknell wrote: In a message written on Mon, Jul 02, 2012 at 11:30:06AM -0400,

Re: FYI Netflix is down

2012-07-02 Thread david raistrick
On Mon, 2 Jul 2012, Leo Bicknell wrote: http://techblog.netflix.com/2011/07/netflix-simian-army.html Yes, Netflix seems to get it, and I think their Simian Army is a great QA tool. However, it is not a complete testing system; I have never seen them talk about testing non-software components

Re: FYI Netflix is down

2012-07-02 Thread Leo Bicknell
In a message written on Mon, Jul 02, 2012 at 12:13:22PM -0400, david raistrick wrote: > you mean like this? > > http://techblog.netflix.com/2011/07/netflix-simian-army.html Yes, Netflix seems to get it, and I think their Simian Army is a great QA tool. However, it is not a complete testing sys

Re: FYI Netflix is down

2012-07-02 Thread david raistrick
On Mon, 2 Jul 2012, Leo Bicknell wrote: I used to work with a guy who had a simple test for these things, and if I was a VP at Amazon, Netflix, or any other large company I would do the same. About once a month he would walk out on the you mean like this? http://techblog.netflix.com/2011/07/

Re: F-ckin Leap Seconds, how do they work?

2012-07-02 Thread Michael Thomas
On 07/02/2012 09:04 AM, Jay Ashworth wrote: - Original Message - From: "Alex Harrowell" On 02/07/12 16:47, AP NANOG wrote: Do you happen to know all the kernels and versions affected by this? 2.6.26 to 3.3 inclusive per news.ycombinator.com/item?id=4183122 Well, my 2.6.32 CentOS6/64

Re: FYI Netflix is down

2012-07-02 Thread Leo Bicknell
In a message written on Mon, Jul 02, 2012 at 11:30:06AM -0400, Todd Underwood wrote: > from the perspective of people watching B-rate movies: this was a > failure to implement and test a reliable system for streaming those > movies in the face of a power outage at one facility. I want to emphasi

Re: F-ckin Leap Seconds, how do they work?

2012-07-02 Thread Jay Ashworth
- Original Message - > From: "Alex Harrowell" > On 02/07/12 16:47, AP NANOG wrote: > > Do you happen to know all the kernels and versions affected by this? > > 2.6.26 to 3.3 inclusive per news.ycombinator.com/item?id=4183122 Well, my 2.6.32 CentOS6/64 machine, which is not running Java,

Re: F-ckin Leap Seconds, how do they work?

2012-07-02 Thread Alex Harrowell
On 02/07/12 16:47, AP NANOG wrote: Do you happen to know all the kernels and versions affected by this? 2.6.26 to 3.3 inclusive per news.ycombinator.com/item?id=4183122

Re: F-ckin Leap Seconds, how do they work?

2012-07-02 Thread AP NANOG
Do you happen to know all the kernels and versions affected by this? -- Thank you, Robert Miller http://www.armoredpackets.com Twitter: @arch3angel On 7/1/12 12:44 PM, George Bonser wrote: -Original Message- From: Roy Sent: Saturday, June 30, 2012 10:03 PM To: nanog@nanog.org Subje

Re: FYI Netflix is down

2012-07-02 Thread AP NANOG
While I was working for a wireless telecom company, our primary datacenter was knocked off the power grid due to weather. The generators kicked on and everything was fine, until one generator was struck by lightning and that same strike fried the control panel on the second one. Considering the s

Re: FYI Netflix is down

2012-07-02 Thread Todd Underwood
> Actually, it was a very complex power outage. I'm going to assume that what > happened this weekend was similar to the event that happened at the same > facility approximately two weeks ago (it's immaterial - the details are > probably different, but it illustrates the complexity of a data cent

RE: FYI Netflix is down

2012-07-02 Thread Dan Golding
> -Original Message- > From: Todd Underwood [mailto:toddun...@gmail.com] > > scott, > > >> > >> This was not a cascading failure.  It was a simple power outage Actually, it was a very complex power outage. I'm going to assume that what happened this weekend was similar to the event that

Re: How do the lowest layers of the DSL stack work?

2012-07-02 Thread Stefan Bethke
On 01.07.2012 at 21:01, James Bensley wrote: > [15.24 Mbit/s raw bit rate compared to 8.128 Mbit/s net] is quite a drop in > speed and I'm trying to understand where this is happening. ... > According to that extract, it all disappeared because of [Reed-Solomon] > encoding, which is hugely vagu

Re: NANOG Digest, Vol 54, Issue 3 (Comcast's IPv6 Information Site Unreachable)

2012-07-02 Thread Brzozowski, John
Folks, We will report back shortly with some updates. Thanks for the mail. John = John Jason Brzozowski Comcast Cable m) +1-609-377-6594 e) mailto:john_brzozow...@cable.comcast.com o) +1-484-962-0060 w) http://www.comcast6.net =