On Jul 2, 2012, at 7:19 PM, Rodrick Brown wrote:
> People are acting as if Netflix is part of some critical service they stream
> movies for Christ sake. Some acceptable level of loss is fine for 99.99% of
> Netflix's user base just like cable, electricity and running water I suffer a
> f
George Herbert said:
> I worked for a Sun clone vendor (Axil) for a while and took some of our
> systems and storage to Comdex one year in the 90s. We had a RAID unit
> (Mylex controller) we had just introduced. Beforehand, I made REALLY REALLY
> SURE that the pull-the-disk and pull-the-redund
I've been so fortunate and appreciative over the years to have colleagues
(many of whom I consider my close friends) cultivate my career by providing
sound advice that I will continue to pass on. In addition to those I've
known personally, I have gleaned a substantial amount of information
through ma
On Jul 2, 2012, at 7:19 PM, Rodrick Brown wrote:
> People are acting as if Netflix is part of some critical service they stream
> movies for Christ sake. Some acceptable level of loss is fine for 99.99% of
> Netflix's user base just like cable, electricity and running water I suffer a
> few h
Probably not as interesting as talking about Amazon/Netflix.
http://www.washingtonpost.com/local/after-storm-911-phone-service-remains-spotty/2012/07/02/gJQA33dHJW_story.html
Fairfax County's 911 emergency center operated at just half capacity
Monday as Verizon struggled to figure out why bot
On Jul 2, 2012, at 7:03 PM, James Downs wrote:
>
> On Jul 2, 2012, at 1:20 PM, david raistrick wrote:
>
>> Amazon resources are controlled (from a consumer viewpoint) by API - that
>> API is also used by amazon's internal toolkits that support ELB (and RDS..).
>> Those (http accessed) API
On Mon, Jul 2, 2012 at 8:46 PM, Jimmy Hess wrote:
>
> Someone should write a dastardly system clock daemon to cause the
> insertion of frequent spurious positive leap seconds, followed by the
> spurious insertion of negative leap seconds.
>
>
Chaos time bandit?
--
-
On 7/2/12, Steven Bellovin wrote:
> On Jul 2, 2012, at 11:47 AM, AP NANOG wrote:
>> Do you happen to know all the kernels and versions affected by this?
> See
> http://landslidecoding.blogspot.com/2012/07/linuxs-leap-second-deadlocks.html
> --Steve Bellovin, https://www.cs.columbia.e
On Jul 2, 2012, at 1:20 PM, david raistrick wrote:
> Amazon resources are controlled (from a consumer viewpoint) by API - that API
> is also used by amazon's internal toolkits that support ELB (and RDS..).
> Those (http accessed) API interfaces were unavailable for a good portion of
> the ou
On Jul 2, 2012, at 3:43 PM, Greg D. Moore wrote:
> At 03:08 PM 7/2/2012, George Herbert wrote:
>
> If folks have not read it, I would suggest reading Normal Accidents by
> Charles Perrow.
Strong second to that suggestion.
--Steve Bellovin, https://www.cs.columbia.edu/~smb
On Jul 2, 2012, at 11:47 AM, AP NANOG wrote:
> Do you happen to know all the kernels and versions affected by this?
>
>
See
http://landslidecoding.blogspot.com/2012/07/linuxs-leap-second-deadlocks.html
--Steve Bellovin, https://www.cs.columbia.edu/~smb
At 05:04 PM 7/2/2012, George Herbert wrote:
On Mon, Jul 2, 2012 at 12:43 PM, Greg D. Moore wrote:
> At 03:08 PM 7/2/2012, George Herbert wrote:
>
> If folks have not read it, I would suggest reading Normal Accidents by
> Charles Perrow.
>
> The "it can't happen" is almost guaranteed to happen. ;
On Mon, Jul 2, 2012 at 12:43 PM, Greg D. Moore wrote:
> At 03:08 PM 7/2/2012, George Herbert wrote:
>
> If folks have not read it, I would suggest reading Normal Accidents by
> Charles Perrow.
>
> The "it can't happen" is almost guaranteed to happen. ;-) And when it does,
> it'll often interact i
> -Original Message-
> From: Greg D. Moore [mailto:moor...@greenms.com]
>
>
> If folks have not read it, I would suggest reading Normal Accidents by
> Charles Perrow.
>
Also, Human Error by James Reason.
On Mon, Jul 02, 2012 at 09:09:09AM -0700, Leo Bicknell wrote:
> In a message written on Mon, Jul 02, 2012 at 11:30:06AM -0400, Todd Underwood
> wrote:
> > from the perspective of people watching B-rate movies: this was a
> > failure to implement and test a reliable system for streaming those
> >
On Mon, 2 Jul 2012, James Downs wrote:
back-plane / control-plane was unable to cope with the requests. Netflix uses
Amazon's ELB to balance the traffic and no back-plane meant they were unable to
reconfigure it to route around the problem.
Someone needs to define back-plane/control-plane i
At 03:08 PM 7/2/2012, George Herbert wrote:
If folks have not read it, I would suggest reading Normal Accidents
by Charles Perrow.
The "it can't happen" is almost guaranteed to happen. ;-) And when
it does, it'll often interact in ways we can't predict or sometimes
even understand.
As for
Good band name.
> Chaos Gorilla
--
---
Joly MacFie 218 565 9365 Skype:punkcast
WWWhatsup NYC - http://wwwhatsup.com
http://pinstand.com - http://punkcast.com
VP (Admin) - ISOC-NY - http://isoc-ny.org
I believe in my dictionary Chaos Gorilla translates into "Time To Go
Home", with a rough definition of "Everything just crapped out - The
world is ending"; but then again I may have that incorrect :-)
--
Thank you,
Robert Miller
http://www.armoredpackets.com
Twitter: @arch3angel
On 7/2/12 2:
> -Original Message-
> From: Leo Bicknell [mailto:bickn...@ufp.org]
>
>
> I want to emphasize _and test_.
[snip]
>
> I used to work with a guy who had a simple test for these things, and
> if I was a VP at Amazon, Netflix, or any other large company I would do
> the same. About once
Late reply, but:
On Sat, Jun 30, 2012 at 12:30 AM, Lynda wrote:
>...
> Second, and more important. I *was* a "computer science guy" in a past life,
> and this is nonsense. You can have astonishingly large software projects
> that just continue to run smoothly, day in, day out, and they don't hit
On Jul 2, 2012, at 11:59 AM, Paul Graydon wrote:
> back-plane / control-plane was unable to cope with the requests. Netflix
> uses Amazon's ELB to balance the traffic and no back-plane meant they were
> unable to reconfigure it to route around the problem.
Someone needs to define back-plane/c
On 07/02/2012 08:53 AM, Tony McCrory wrote:
On 2 July 2012 19:20, Cameron Byrne wrote:
Make your chaos animal go after sites and regions instead of individual
VMs.
CB
From a previous post mortem
http://techblog.netflix.com/2011_04_01_archive.html
"
Create More Failures
Currently, Netflix
On 2 July 2012 19:20, Cameron Byrne wrote:
>
> Make your chaos animal go after sites and regions instead of individual
> VMs.
>
> CB
>
From a previous post mortem
http://techblog.netflix.com/2011_04_01_archive.html
"
Create More Failures
Currently, Netflix uses a service called "Chaos
Monkey
On Jul 2, 2012, at 9:23 AM, david raistrick wrote:
> When the hardware is outsourced how would you propose testing the
> non-software components? They do simulate availability zone issues (and AZ
> is as close as you get to controlling which internal power/network/etc grid
> you're attached t
On Jul 2, 2012 10:53 AM, "Leo Bicknell" wrote:
>
> In a message written on Mon, Jul 02, 2012 at 12:23:57PM -0400, david
> raistrick wrote:
> > When the hardware is outsourced how would you propose testing the
> > non-software components? They do simulate availability zone issues (and
> > AZ is as c
In a message written on Mon, Jul 02, 2012 at 12:23:57PM -0400, david raistrick
wrote:
> When the hardware is outsourced how would you propose testing the
> non-software components? They do simulate availability zone issues (and
> AZ is as close as you get to controlling which internal power/net
Made the press..
http://www.washingtonpost.com/business/technology/leap-second-bug-takes-down-reddit-and-a-bunch-of-other-sites/2012/07/02/gJQAlXg1HW_story.html
--
---
Joly MacFie 218 565 9365 Skype:punkcast
WWWhatsup NYC - http://
The problem is large scale tests take a lot of time and planning. For it
to be done right, you really need a dedicated DR team.
-Grant
On Mon, Jul 2, 2012 at 11:31 AM, AP NANOG wrote:
> This is an excellent example of how tests "should" be run, unfortunately
> far too many places don't do this
This is an excellent example of how tests "should" be run, unfortunately
far too many places don't do this...
--
Thank you,
Robert Miller
http://www.armoredpackets.com
Twitter: @arch3angel
On 7/2/12 12:09 PM, Leo Bicknell wrote:
In a message written on Mon, Jul 02, 2012 at 11:30:06AM -0400,
On Mon, 2 Jul 2012, Leo Bicknell wrote:
http://techblog.netflix.com/2011/07/netflix-simian-army.html
Yes, Netflix seems to get it, and I think their Simian Army is a
great Q&A tool. However, it is not a complete testing system, I
have never seen them talk about testing non-software components
In a message written on Mon, Jul 02, 2012 at 12:13:22PM -0400, david raistrick
wrote:
> you mean like this?
>
> http://techblog.netflix.com/2011/07/netflix-simian-army.html
Yes, Netflix seems to get it, and I think their Simian Army is a
great Q&A tool. However, it is not a complete testing sys
On Mon, 2 Jul 2012, Leo Bicknell wrote:
I used to work with a guy who had a simple test for these things,
and if I was a VP at Amazon, Netflix, or any other large company I
would do the same. About once a month he would walk out on the
you mean like this?
http://techblog.netflix.com/2011/07/
On 07/02/2012 09:04 AM, Jay Ashworth wrote:
- Original Message -
From: "Alex Harrowell"
On 02/07/12 16:47, AP NANOG wrote:
Do you happen to know all the kernels and versions affected by this?
2.6.26 to 3.3 inclusive per news.ycombinator.com/item?id=4183122
Well, my 2.6.32 CentOS6/64
In a message written on Mon, Jul 02, 2012 at 11:30:06AM -0400, Todd Underwood
wrote:
> from the perspective of people watching B-rate movies: this was a
> failure to implement and test a reliable system for streaming those
> movies in the face of a power outage at one facility.
I want to emphasi
- Original Message -
> From: "Alex Harrowell"
> On 02/07/12 16:47, AP NANOG wrote:
> > Do you happen to know all the kernels and versions affected by this?
>
> 2.6.26 to 3.3 inclusive per news.ycombinator.com/item?id=4183122
Well, my 2.6.32 CentOS6/64 machine, which is not running Java,
On 02/07/12 16:47, AP NANOG wrote:
Do you happen to know all the kernels and versions affected by this?
2.6.26 to 3.3 inclusive per news.ycombinator.com/item?id=4183122
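A rough sketch of how one might check a host against the range quoted above. The bounds (2.6.26 through 3.3 inclusive) are taken from the thread as reported and are not verified here; the helper names `parse_kernel_version` and `in_reported_range` are made up for illustration, and a distribution kernel with backported patches may behave differently from what its version number suggests.

```python
import re

# Bounds as quoted in the thread (2.6.26 to 3.3 inclusive); treated as an
# unverified assumption, not an authoritative list of affected kernels.
REPORTED_MIN = (2, 6, 26)
REPORTED_MAX = (3, 3)


def parse_kernel_version(release):
    """Return the leading numeric parts of a release string,
    e.g. '2.6.32-220.el6.x86_64' -> (2, 6, 32)."""
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", release)
    if match is None:
        raise ValueError("unrecognized kernel release: %r" % release)
    return tuple(int(part) for part in match.groups() if part is not None)


def in_reported_range(release):
    """True if the release falls inside the range quoted in the thread."""
    version = parse_kernel_version(release)
    # Only major.minor is compared against the upper bound, so that every
    # 3.3.x release counts as "3.3 inclusive".
    return version >= REPORTED_MIN and version[:2] <= REPORTED_MAX
```

On a live Linux host the release string could be taken from `platform.release()` in the standard library. A match only says the version number sits inside the quoted range, not that the machine will actually hit the bug.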
Do you happen to know all the kernels and versions affected by this?
--
Thank you,
Robert Miller
http://www.armoredpackets.com
Twitter: @arch3angel
On 7/1/12 12:44 PM, George Bonser wrote:
-Original Message-
From: Roy
Sent: Saturday, June 30, 2012 10:03 PM
To: nanog@nanog.org
Subje
While I was working for a wireless telecom company our primary
datacenter was knocked off the power grid due to weather, the generators
kicked on and everything was fine, till one generator was struck by
lightning and that same strike fried the control panel on the second
one. Considering the s
> Actually, it was a very complex power outage. I'm going to assume that what
> happened this weekend was similar to the event that happened at the same
> facility approximately two weeks ago (it's immaterial - the details are
> probably different, but it illustrates the complexity of a data cent
> -Original Message-
> From: Todd Underwood [mailto:toddun...@gmail.com]
>
> scott,
>
> >>
> >> This was not a cascading failure. It was a simple power outage
Actually, it was a very complex power outage. I'm going to assume that what
happened this weekend was similar to the event that
Am 01.07.2012 um 21:01 schrieb James Bensley:
> [15.24 Mbit/s raw bit rate compared to 8.128 Mbit/s net] is quite a drop in
> speed and I'm trying to understand where this is happening.
...
> According to that extract, it all disappeared because of [Reed-Solomon]
> encoding, which is hugely vagu
Folks,
We will report back shortly with some updates.
Thanks for the mail.
John
=
John Jason Brzozowski
Comcast Cable
m) +1-609-377-6594
e) mailto:john_brzozow...@cable.comcast.com
o) +1-484-962-0060
w) http://www.comcast6.net
=