Re: facebook outage

2021-10-04 Thread John Lee
I was seeing NXDOMAIN errors, so I wonder if they had a DNS outage of some sort?? On Mon, Oct 4, 2021 at 5:14 PM Bill Woodcock wrote: > They’re starting to pick themselves back up off the floor in the last two > or three minutes. A few answers getting out. I imagine it’ll take a while > before

Re: massive facebook outage presently

2021-10-04 Thread Doug McIntyre
On Mon, Oct 04, 2021 at 05:50:07PM -0400, b...@theworld.com wrote: > One might think in over six hours they could point facebook.com's DNS > somewhere else and put up a page with some info about the outage > there, that this would be a practiced fire drill. Perhaps, if they didn't decide to be thei

Re: facebook outage

2021-10-04 Thread Bill Woodcock
Ok, I lied, I’m still awake. I got my first successful Facebook main page load at 23:13 UTC, for an overall duration of 8:33, or 513 minutes. Multiplied by three billion users, that’s 1.54 trillion person-minutes. That’s a tera-lapse! Have we had one of those before?
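The arithmetic in that message can be checked with a short sketch. The duration (8:33) and the three-billion user figure are taken from the post as stated; the user count is the poster's round number, not a measured value, and the variable names are only illustrative.

```python
# Figures as quoted in the message; the user count is a round number.
hours, minutes = 8, 33
users = 3_000_000_000

# Convert the outage duration to minutes, then scale by the user count.
duration_minutes = hours * 60 + minutes
person_minutes = duration_minutes * users

print(duration_minutes)
print(f"{person_minutes / 1e12:.2f} trillion person-minutes")
```

This gives 513 minutes and roughly 1.54 trillion person-minutes, matching the figures in the post.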

Facebook post-mortems...

2021-10-04 Thread jcurran
Fairly abstract - Facebook Engineering - https://m.facebook.com/nt/screen/?params=%7B%22note_id%22%3A10158791436142200%7D&path=%2Fnotes%2Fnote%2F&_rdr Also, Cloudflare’s take on the outage - htt

Re: Facebook post-mortems...

2021-10-04 Thread Rubens Kuhl
The FB one seems to be from a previous event. Downtime doesn't match, visible flaw effects don't either. Rubens On Mon, Oct 4, 2021 at 9:59 PM wrote: > > Fairly abstract - Facebook Engineering - > https://m.facebook.com/nt/screen/?params=%7B%22note_id%22%3A10158791436142200%7D&path=%2Fnotes%2

update - Re: Facebook post-mortems...

2021-10-04 Thread jcurran
On 4 Oct 2021, at 8:58 PM, jcur...@istaff.org wrote: > > Fairly abstract - Facebook Engineering - > https://m.facebook.com/nt/screen/?params=%7B%22note_id%22%3A10158791436142200%7D&path=%2Fnotes%2Fnote%2F&_rdr > >

Re: Facebook post-mortems...

2021-10-04 Thread Michael Thomas
On 10/4/21 5:58 PM, jcur...@istaff.org wrote: Fairly abstract - Facebook Engineering - https://m.facebook.com/nt/screen/?params=%7B%22note_id%22%3A10158791436142200%7D&path=%2Fnotes%2Fnote%2F&_rdr

Re: Facebook post-mortems...

2021-10-04 Thread Jay Hennigan
On 10/4/21 17:58, jcur...@istaff.org wrote: Fairly abstract - Facebook Engineering - https://m.facebook.com/nt/screen/?params=%7B%22note_id%22%3A10158791436142200%7D&path=%2Fnotes%2Fnote%2F&_rdr I b

Re: update - Re: Facebook post-mortems...

2021-10-04 Thread Michael Thomas
On 10/4/21 6:07 PM, jcur...@istaff.org wrote: On 4 Oct 2021, at 8:58 PM, jcur...@istaff.org wrote: Fairly abstract - Facebook Engineering - https://m.facebook.com/nt/screen/?params=%7B%22note_id%22%3A10158791436142200%7D&path=%2Fnotes%2Fnote%2F&_rdr

Re: update - Re: Facebook post-mortems...

2021-10-04 Thread Rabbi Rob Thomas
>> Fairly abstract - Facebook Engineering - >> https://m.facebook.com/nt/screen/?params=%7B%22note_id%22%3A10158791436142200%7D&path=%2Fnotes%2Fnote%2F&_rdr >> > > My bad - might be best to ignore

Re: Facebook post-mortems...

2021-10-04 Thread Mel Beckman
The CF post mortem looks sensible, and a good summary of what we all saw from the outside with BGP routes being withdrawn. Given the fragility of BGP, this could still end up being a malicious attack. -mel via cell > On Oct 4, 2021, at 6:19 PM, Jay Hennigan wrote: > > On 10/4/21 17:58, jcu

Re: Facebook post-mortems...

2021-10-04 Thread Patrick W. Gilmore
Update about the October 4th outage https://engineering.fb.com/2021/10/04/networking-traffic/outage/ -- TTFN, patrick > On Oct 4, 2021, at 9:25 PM, Mel Beckman wrote: > > The CF post mortem looks sensible, and a good summary of what we all saw from > the outside with BGP routes being withdra

Re: IRR for IX peers

2021-10-04 Thread Mark Tinka
On 10/4/21 21:55, Nick Hilliard wrote: Nearly 30 years on, this is still the state of the art. Not unlike an NMS... still can't walk into a shop and just buy one that works out of the box :-). Mark.

Re: massive facebook outage presently

2021-10-04 Thread Mark Tinka
On 10/4/21 22:23, Baldur Norddahl wrote: Not in such a primitive fashion no. But they could definitely have a secondary network that will continue to work even if something goes wrong with the primary. On IPv6, no less :-). On a serious note, I can't even imagine what it takes to run a ne

Re: massive facebook outage presently

2021-10-04 Thread Mark Tinka
On 10/4/21 22:27, Łukasz Bromirski wrote: I bet FB tested the change on smaller scale and everything was fine, and only then started to roll this over wider network and at that point „something” broke. Or some bug needed a moment to start cascading issues around the infra. This is the a

Re: massive facebook outage presently

2021-10-04 Thread Mark Tinka
On 10/4/21 22:33, Eric Kuhnke wrote: I am starting to see reports that in ISPs with very large numbers of residential users, customers are starting to press the factory-reset buttons on their home routers/modems/whatever, in an attempt to make Facebook work. This is resulting in much heavie

Re: massive facebook outage presently

2021-10-04 Thread Mark Tinka
On 10/4/21 20:48, Luke Guillory wrote: I believe the original change was 'automatic' (as in configuration done via a web interface). However, now that connection to the outside world is down, remote access to those tools doesn't exist anymore, so the emergency procedure is to gain physical acc

Re: massive facebook outage presently

2021-10-04 Thread Karl Auer
On Tue, 2021-10-05 at 06:31 +0200, Mark Tinka wrote: > Q: What is automation? > A: Breaking the network at scale. P J Plauger (I think) once defined a computer as a mechanism allowing the deletion of vast quantities of irreplaceable data using simple mnemonic commands. He defined a network as a me

Re: Facebook post-mortems...

2021-10-04 Thread Hank Nussbacher
On 05/10/2021 05:53, Patrick W. Gilmore wrote: Update about the October 4th outage https://engineering.fb.com/2021/10/04/networking-traffic/outage/ Thanks for the posting. How come they couldn't access their routers via their OOB access? -Hank

Re: Facebook post-mortems...

2021-10-04 Thread William Herrin
On Mon, Oct 4, 2021 at 6:15 PM Michael Thomas wrote: > They have a monkey patch subsystem. Lol. Yes, actually, they do. They use Chef extensively to configure operating systems. Chef is written in Ruby. Ruby has something called Monkey Patches. This is where at an arbitrary location in the code y

Re: Facebook post-mortems...

2021-10-04 Thread Jeff Tantsura
129.134.30.0/23, 129.134.30.0/24, 129.134.31.0/24. The specific routes covering all 4 nameservers (a-d) were withdrawn from all FB peering at approximately 15:40 UTC. Cheers, Jeff > On Oct 4, 2021, at 22:45, William Herrin wrote: > > On Mon, Oct 4, 2021 at 6:15 PM Michael Thomas wrote: >> T
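How the three prefixes listed in that message relate to each other can be sketched with Python's standard ipaddress module. This only checks that the two /24s are the more-specific halves of the /23. It does not query any routing data, and the variable names are illustrative.

```python
import ipaddress

# Prefixes exactly as listed in the message.
aggregate = ipaddress.ip_network("129.134.30.0/23")
specifics = [
    ipaddress.ip_network(p)
    for p in ("129.134.30.0/24", "129.134.31.0/24")
]

# Each /24 should fall inside the /23 aggregate.
for net in specifics:
    print(net, "inside", aggregate, "->", net.subnet_of(aggregate))

# Splitting the /23 into /24s should yield exactly the two specifics.
halves = list(aggregate.subnets(new_prefix=24))
covers_exactly = halves == specifics
print("the two /24s make up the whole /23:", covers_exactly)
```

Withdrawing all three therefore leaves no route, aggregate or more-specific, for any address in that /23.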
