Data Center testing

2009-08-24 Thread Dan Snyder
Does anyone know of any data centers that do failure testing of their
networking equipment regularly? I mean to verify that everything fails
over properly after changes have been made over time. Are there any
best-practice guides for doing this?

Thanks,
Dan


RE: Data Center testing

2009-08-24 Thread Rod Beck
-----Original Message-----
From: Dan Snyder [mailto:sliple...@gmail.com]
Sent: Mon 8/24/2009 2:00 PM
To: NANOG list
Subject: Data Center testing

Does anyone know of any data centers that do failure testing of their
networking equipment regularly? I mean to verify that everything fails
over properly after changes have been made over time. Are there any
best-practice guides for doing this?

Thanks,
Dan

It is quite surprising how often data centres lose both primary and backup 
power. 



Re: Data Center testing

2009-08-24 Thread Ken Gilmour
I know Peer1 in Vancouver regularly sends out notifications of
"non-impacting" generator load testing, roughly monthly. Also, InterXion
in Dublin, Ireland, has occasionally notified me of a power outage
lasting less than a minute in which their backup successfully took the
load.

I only remember one complete outage in Peer1 a few years ago... Never
seen any outage in InterXion Dublin.

Also, I don't ever remember any power failure at AiNET (Deepak will
probably elaborate).




Re: Data Center testing

2009-08-24 Thread Dan Snyder
We have done power tests before and had no problems. I guess I am looking
for someone who tests the network equipment beyond just power tests. We
had an outage due to a configuration mistake that only became apparent
when a switch failed. It didn't cause a problem, however, when we did a
power test for the whole data center.

-Dan




Best Effort QoS Drop Profile Input

2009-08-24 Thread Brad Fleming

Hello all,

We are working on fine-tuning drop profiles for customer edge routers
(Juniper J2320 in almost all cases) and I was hoping for some input
from those who are smarter and have done this before.


Basics:
- Sites each have a single T1 into a service provider L3 VPN
- Queue depth is 500ms
- Sites typically will use 90%+ of their line rate while school is in  
session (education network)
- Services offered include: real-time video (in a different queue) and  
best effort (serviced by the queue in question)

- Nearly all BE traffic will be TCP (typical web, email, etc traffic)

We were never able to find a good best practice for configuring drop  
profiles on edge devices. We've been using the following with OK  
results but I was hoping to have some external, more experienced eyes  
take a look...


drop-profiles {
    be_any {
        fill-level 50 drop-probability 5;
        fill-level 70 drop-probability 20;
        fill-level 90 drop-probability 50;
    }
}

Has anyone ever run across a publicly documented best practice for  
drop profile configuration?

Does anyone have suggestions on ways to improve the configuration above?

Any input is much appreciated, thanks in advance!

-brad fleming


FCC's RFC for the Definition of Broadband

2009-08-24 Thread Luke Marrott
I read an article on DSL Reports the other day
(http://www.dslreports.com/shownews/FCC-Please-Define-Broadband-104056)
about an FCC document requesting feedback on the definition of
broadband.

What are your thoughts on what the definition of Broadband should be going
forward? I would assume this will be the standard definition for a number of
years to come.

Thanks.

-- 
:Luke Marrott


Re: Data Center testing

2009-08-24 Thread Jack Bates

Dan Snyder wrote:

> We have done power tests before and had no problem.  I guess I am looking
> for someone who does testing of the network equipment outside of just power
> tests.  We had an outage due to a configuration mistake that became apparent
> when a switch failed.  It didn't cause a problem however when we did a power
> test for the whole data center.



The plus side of failure testing is that it can be controlled. The
downside of failure testing is that you can induce a failure.
Maintenance windows are cool, but some people really dislike failures of
any type, which limits how often you can test. I personally try for once
a year. However, a lot can go wrong in a year.


Jack



RE: Alternatives to storm-control on Cat 6509.

2009-08-24 Thread Holmes,David A
In my opinion the Sup32 platform has some limitations when the
technology is considered for high data rate, deterministic carrier
customer-facing scenarios. Cisco sells the Sup32 as a wiring closet
aggregation switch, the main purpose of which is to connect desktop users
to central core switches. In addition to the lack of
storm-control/broadcast suppression mentioned below, the 61XX line cards
also have a limit of, I believe, 2 ports in an EtherChannel.
Additionally, and perhaps most significantly for deterministic network
design, the copper cards share input hardware buffers for every 8 ports.
Running one port of the 8 at wire speed will cause input drops on the
other 7 ports. Also, the cards connect to the older 32 Gbps shared bus.

In my view, with high data rates, it is difficult, if not impossible, to
build a deterministic network with Sup32s. Cisco's solution for
designing a deterministic network is the Sup720, which has a 720 Gbps
crossbar bus. The 67XX 48-port copper line cards have two 20 Gbps
connectors to the 720 Gbps bus, the 24-port fiber SFP line cards have a
single 20 Gbps connector to the crossbar bus, and the 10 GigE 67XX line
cards have two 20 Gbps bus connectors. The crossbar bus connects line card
connectors to each other in a point-to-point fashion. 67XX line cards
are adequately provisioned with input and output buffers. There is still
40/48 (1 GigE copper), 20/24 (1 GigE SFP), and 40/160 (10 GigE X2)
oversubscription of ports to switch fabric connectors, however. Sup720
routing table lookups via Ternary Content Addressable Memory (TCAM)
still use the 32 Gbps bus to access the TCAM to search for the next hop
for the destination IP network.
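
The oversubscription figures above can be turned into ratios with a few
lines of Python. This is only a sketch: the (fabric Gbps, port Gbps) pairs
are taken straight from the paragraph above, while the card labels and the
function name are made up for illustration.

```python
from fractions import Fraction

# (fabric Gbps, total front-panel Gbps) as quoted in the message above.
# The dictionary keys are illustrative labels, not Cisco part numbers.
CARDS = {
    "67XX 48-port 1 GigE copper": (40, 48),
    "67XX 24-port 1 GigE SFP": (20, 24),
    "67XX 10 GigE X2": (40, 160),
}


def oversubscription(fabric_gbps, port_gbps):
    """Return front-panel capacity divided by fabric capacity, exactly."""
    return Fraction(port_gbps, fabric_gbps)


for name, (fabric, ports) in CARDS.items():
    ratio = oversubscription(fabric, ports)
    print(f"{name}: {float(ratio):.1f}:1 oversubscribed")
```

On these numbers the 1 GigE cards are only mildly oversubscribed, while the
10 GigE card is the one where port capacity far exceeds the fabric
connectors.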

-----Original Message-----
From: Peter George [mailto:peter.geo...@lumison.net] 
Sent: Friday, August 21, 2009 3:40 AM
To: nanog@nanog.org
Subject: Alternatives to storm-control on Cat 6509.

Hello,

I have several Catalyst 6500 (Supervisor 32) aggregation switches with
WS-X6148A-GE-TX and WS-X6148-GE-TX line cards.

These line cards do not support storm-control/broadcast suppression.
This impacted us badly during a recent spanning tree event.

As it stands, we are at risk of overwhelming control planes with excess
broadcast or multicast traffic, and I need to find alternative ways to
protect these switches.

I have been researching STP enhancements, and control-plane policing in
the following documents, and would appreciate advice from engineers who
may have had to implement similar workarounds for storm-control in a
service provider setting.

* Configuring Denial of Service Protection
http://www.cisco.com/en/US/docs/switches/lan/catalyst6500/ios/12.2SX/configuration/guide/dos.pdf

* Configuring Control Plane Policing
http://www.cisco.com/en/US/docs/switches/lan/catalyst4500/12.2/31sga/configuration/guide/cntl_pln.pdf

* Configuring Optional STP Features
http://www.cisco.com/en/US/docs/switches/lan/catalyst4500/12.2/31sga/configuration/guide/stp_enha.pdf

So, if we can't mitigate against STP events using storm-control or
broadcast suppression, what might be the best combination of STP
enhancements and control-plane policing?

For example, is it possible to rate-limit broadcast/multicast, STP and
ARP on a per VLAN basis? If so, how?

Many thanks,

P


--
Peter George
Lumison
t: 0845 1199 900
d: 0131 514 4022

P.S. Lumison have changed the way businesses communicate forever
http://www.unified-communications.net/







Re: Alternatives to storm-control on Cat 6509.

2009-08-24 Thread Roland Dobbins


On Aug 25, 2009, at 1:03 AM, Holmes,David A wrote:

> In my opinion the Sup32 platform has some limitations when the
> technology is considered for high data rate, deterministic carrier
> customer-facing scenarios



The hardware-based FPM is interesting.

---
Roland Dobbins  // 

Sorry, sometimes I mistake your existential crises for technical
insights.

-- xkcd #625




Re: Best Effort QoS Drop Profile Input

2009-08-24 Thread Mikael Abrahamsson

On Mon, 24 Aug 2009, Brad Fleming wrote:

> We were never able to find a good best practice for configuring drop
> profiles on edge devices. We've been using the following with OK results
> but I was hoping to have some external, more experienced eyes take a
> look...
>
> drop-profiles {
>     be_any {
>         fill-level 50 drop-probability 5;
>         fill-level 70 drop-probability 20;
>         fill-level 90 drop-probability 50;
>     }
> }
>
> Has anyone ever run across a publicly documented best practice for drop
> profile configuration? Does anyone have suggestions on ways to improve
> the configuration above?
>
> Any input is much appreciated, thanks in advance!


I think it's fine. Personally, I've been known to configure core network
WRED with 0% drop at 40ms and 100% drop at 60ms (the reasoning being that
basically no application is well served by having its packets delayed
much more than 40ms), though at the lower speeds you have, your values are
probably better. Hm, just realised your "fill level" is in percent, right?
Then I'd start dropping packets earlier if I were you. Having 0% drop up
until 250ms is not really helping interactive applications; you probably
want to induce drop earlier (at 40ms or so), so file transfers don't fill
up the buffer but the fewer packets/s of an ssh session hopefully still
have a low probability of being dropped (and fast-retransmit can do its job
fairly unnoticeably to the user).


So in your case, I'd have the first drop-probability much sooner, at 10%
if you have 500ms buffering. Perhaps starting with 1% at 10% and 3% at
20% buffer fill level.
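
To make the percent-to-delay reasoning above concrete, here is a small
Python sketch that maps fill levels onto queueing delay for the 500ms queue
depth from the original post. The "suggested" profile is an assumption: it
simply prepends the two points proposed above to the original three, which
nobody in the thread actually specified as a complete profile. The function
names are invented for illustration.

```python
QUEUE_DEPTH_MS = 500  # queue depth stated in the original post


def fill_to_delay_ms(fill_percent, depth_ms=QUEUE_DEPTH_MS):
    """Queueing delay (ms) seen by a packet arriving at this fill level."""
    return depth_ms * fill_percent / 100


def first_drop_delay_ms(profile, depth_ms=QUEUE_DEPTH_MS):
    """Delay at the lowest fill level that has a non-zero drop probability."""
    fills = [fill for fill, prob in profile if prob > 0]
    return fill_to_delay_ms(min(fills), depth_ms)


# (fill-level %, drop-probability %) pairs from the posted configuration.
original = [(50, 5), (70, 20), (90, 50)]

# Assumption: the two earlier points from the reply, followed by the
# original points unchanged.
suggested = [(10, 1), (20, 3)] + original

for name, profile in (("original", original), ("suggested", suggested)):
    print(name)
    for fill, prob in profile:
        delay = fill_to_delay_ms(fill)
        print(f"  fill {fill:>2}% = {delay:5.1f} ms queued, drop {prob}%")
```

With the original profile nothing is dropped until 250 ms of data is queued;
with the earlier points the first drops begin at 50 ms, which is close to
the 40 ms figure mentioned above.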



--
Mikael Abrahamsson    email: swm...@swm.pp.se



new collaborative network forensics tool for massive pcap libraries

2009-08-24 Thread Thomas Maufer
I wanted to share with the NANOG community this likely interesting bit of
pcap wrangling technology that Mu announced yesterday. Here is the
announcement on the new network forensics application within pcapr:

Collaborative Network Forensics

Mu Dynamics ( http://www.mudynamics.com/ ) took the dataset recently
published by the *U.S. Army Information Technology & Operations Center*
(ITOC) from the “2009 Inter-Service Academy Cyber Defense Competition”
as well as the *Shmoo Group’s* “Capture the Capture the Flag” (CCTF)
dataset (for a grand total of *15.0 GBytes…26.3 million packets*),
and indexed them all to enable contextual search and instant access to
packets, not to mention Hacker-News/Twitter-style one-liners attached to
packets and searches for a community-oriented collaborative forensics
application.

Check it out (read the blog, linked below, first):

- http://bit.ly/12I62D for the blog and
- http://www.pcapr.net/forensics for the online app

Enjoy!


A brief background on pcapr:

It’s a web-based pcap repository (hence, pcapr) that has some powerful pcap
manipulation capabilities. The pcaps on pcapr are fully decoded and editable
and you can manipulate them in novel ways: You can identify and isolate or
decode streams, remove garbage from the pcap (i.e., extraneous packets from
protocols that you aren’t interested in), reorder packets, save subset or
modified pcaps without destroying the original, etc. All this happens at
http://www.pcapr.net/, which is open to the public.



If you can access the web, you can access the pcapr database and upload your
own local pcaps for analysis. All registered users can upload up to 5 pcaps
into a scratch space that is private to them. There are currently *250*
protocols represented on pcapr across over 1500 pcaps, in addition to the
forensics application with its 26.3 million packets. Finally, a free
denial-of-service traffic generator is available on pcapr; you can turn any
packet you find on pcapr into a DoS template.


All the best,
~tom

-- 
Thomas Maufer
Mu Dynamics, Inc.
  * Dir., Tech. Mktg.
  * Solutions Architect
Mu Line Blog: http://bit.ly/mu-line-blog
Mu Labs Blog: http://bit.ly/mu-labs-blog
Mu on twitter: http://bit.ly/mu-twitter
Mu on YouTube: http://bit.ly/mu-youtube
Mu on Facebook: http://bit.ly/mu-on-facebook
Mu Community sign-up: http://bit.ly/mu-community-signup
Got packets? Use pcapr: http://bit.ly/pcapr
Email to Thomas Maufer: mailto:tmau...@mudynamics.com


ISP Security BOF -- NANOG 47

2009-08-24 Thread Warren Kumari

"The time has come," the Walrus said,
"To talk of many things", like how NANOG 47 is fast approaching and  
how I am *sure* that you would like to participate...


This is *your* chance to talk about interesting security related  
topics and provide some feedback on what you would (and would not)  
like to hear about...


Some security thing been buggin' you all year? Some topic that you
feel strongly about and would like a chance to inform others about?
Step right up and give a talk -- this BOF is traditionally fairly laid
back and easy going, so it's a low stress introduction to presenting...


Slides are welcome, but not required...

W






RE: Data Center testing

2009-08-24 Thread Deepak Jain

Thanks for the kind words Ken.

Power failure testing and network testing are very different disciplines. 

We operate from the point of view that if a failure occurs because we have 
scheduled testing, it is far better since we have the resources on-site to 
address it (as opposed to an unplanned event during a hurricane). Not everyone 
has this philosophy. 

This is one of the reasons we do monthly or bimonthly, full live load transfer 
tests on power at every facility we own and control during the morning hours 
(~10:00am local time on a weekday, run on gensets for up to two hours). Of 
course there is sufficient staff and contingency planning on-site to handle 
almost anything that comes up. The goal is to have a measurable "good" outcome 
at our highest reasonable load levels [temperature, data load, etc].

We don't hesitate to show our customers and auditors our testing and 
maintenance logs, go over our procedures, etc. They can even watch events if 
they want (we provide the ear protection). I don't think any facility of any 
significant size can operate differently and do it well.

This is NOT advisable for folks who do not do proper preventative maintenance on
their transfer bus ways, PDUs, switches, batteries, transformers and of course
generators. The goal is to identify questionable relays, switches, breakers and
other items that may fail in an actual emergency.

On the network side, during scheduled maintenance we do live failovers,
sometimes as dramatic as pulling the cable without preemptively removing
traffic. Part of *our* procedures is to make sure the network reroutes and heals
the way it is supposed to before the work actually starts. Network and topology
changes often accumulate over time, and no one has had a chance to test that all
the "glue" actually works right. Regular planned maintenance (if you have a fast
reroute capability in your network) is a very good way to handle it.
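
One way to put a number on "reroutes and heals the way it is supposed to" is
to run a steady probe across the link during the failover and measure the
longest gap afterwards. The Python sketch below is an illustration only, not
a description of any particular operator's procedure: the sample format, the
function names, and the 5-second budget are all hypothetical.

```python
def longest_outage(samples):
    """Longest unreachable stretch, in seconds.

    samples: list of (timestamp_seconds, reachable) tuples sorted by time.
    An outage runs from the first failed probe to the next successful one
    (or to the last sample if reachability never returns).
    """
    longest = 0.0
    start = None
    for ts, ok in samples:
        if not ok and start is None:
            start = ts
        elif ok and start is not None:
            longest = max(longest, ts - start)
            start = None
    if start is not None:
        longest = max(longest, samples[-1][0] - start)
    return longest


def healed_within(samples, budget_s):
    """True if no outage in the probe log exceeded the allowed budget."""
    return longest_outage(samples) <= budget_s


# Hypothetical one-per-second probe log around a cable pull at t=2.
probe_log = [(0.0, True), (1.0, True), (2.0, False),
             (3.0, False), (4.0, True), (5.0, True)]

print("longest outage:", longest_outage(probe_log), "s")
print("within 5 s budget:", healed_within(probe_log, 5.0))
```

Keeping the probe log alongside the maintenance record also makes it easier
to compare failover times from one window to the next.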

For sensitive trunk links and non-invasive maintenance, it is nice to softly 
remove traffic via local pref or whatever in advance of the maintenance to 
minimize jitter during a major event. 

As part of your plan, be prepared for things like connectors (or cables)
breaking, and have a plan for what you do if that occurs. Have a plan or a
rain date if a connector takes a long time to get out or the blade it sits in
gets damaged. This stuff looks pretty while it's running, and you don't want
something that has been friction-frozen to ruin your window.

All of this works swimmingly until you find a vendor (X) bug. :) Not for the 
faint-of-heart. 

Anyone who has more specific questions, I'll be glad to answer off-line. 

Deepak Jain
AiNET





Re: Alternatives to storm-control on Cat 6509.

2009-08-24 Thread Nick Hilliard

On 24/08/2009 19:03, Holmes,David A wrote:

> Additionally, and perhaps most significantly for deterministic network
> design, the copper cards share input hardware buffers for every 8 ports.
> Running one port of the 8 at wire speed will cause input drops on the
> other 7 ports. Also, the cards connect to the older 32 Gbps shared bus.


IMO, a more serious problem with the 6148tx and 6548tx cards is the 
internal architecture, which is effectively six internal managed gigabit 
ethernet hubs (i.e. shared bus) with a 1M buffer per hub, and each hub 
connected with a single 1G uplink to a 32 gig backplane.  Ref:



http://www.cisco.com/en/US/products/hw/switches/ps700/products_tech_note09186a00801751d7.shtml#ASIC


In Cisco's own words: "These line cards are oversubscription cards that are 
designed to extend gigabit to the desktop and might not be ideal for server 
farm connectivity".   In other words, these cards are fine in their place, 
but they are not designed or suitable for data centre usage.


I don't want to sound like I'm damning this card beyond redemption - it has
a useful place in this world - but at the expense of reliability,
manageability and configuration control, you will get useful features
(including broadcast/unicast flood control) and in many situations very
significantly better performance from a recent SRW 48-port Linksys gig
switch than from one of these cards.


Nick



[NANOG-announce] Reminder - NANOG 47 is coming quickly!

2009-08-24 Thread Tom Daly
Hello Everyone,
NANOG 47 is only 7 weeks away! The PC has been busily reviewing talks in 
preparation for the meeting. If you're interested in submitting a talk, please 
see:   

- http://nanog.org/meetings/nanog47/callforpresent.php

Also, registration is open and the hotel is ready for booking, please see:

- https://nanog.merit.edu/registration
- http://nanog.org/meetings/nanog47/hotel.php

Finally, NANOG is always looking for great sponsors, and we always have room 
for more:

- http://nanog.org/sponsors

Looking forward to seeing everyone in Dearborn.

Thanks,
Tom Daly (for the PC)




email address and SMTP server

2009-08-24 Thread Juraj Benak
Good morning,

I'm having a problem with an SMTP server I installed on a W2k3 server.

Previously we used our ISP's SMTP server, but because messages were
regularly delayed by a few hours I decided to install our own SMTP server
(the Exchange server installation is busted and I will have to reinstall
the server to get it working). For now I have a problem with one address,
which works from my ISP's SMTP server but not from mine. I have configured
my ISP's DNS servers, and there is no problem with other recipients, only
this one. I checked the domain record for that address and it was OK. I
also checked, via an MX record database, which servers they are using, and
those servers were fine as well.

I got this response:

Final-Recipient: rfc822;
Action: failed
Status: 4.4.7

 

Is there any way to check this, or can you tell me what the problem could
be?

The domain is boomerangaustralia.com.
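
For anyone decoding the bounce above: the Status field is an enhanced mail
system status code of the form class.subject.detail, as defined in RFC 3463.
The Python sketch below breaks such a code into its three parts. The lookup
tables cover only the classes, subjects, and the network-and-routing details
from that RFC, and the function name is made up for illustration.

```python
# Meanings follow RFC 3463; only the subject-4 detail codes are listed.
CLASSES = {
    "2": "success",
    "4": "persistent transient failure",
    "5": "permanent failure",
}
SUBJECTS = {
    "0": "other or undefined",
    "1": "addressing",
    "2": "mailbox",
    "3": "mail system",
    "4": "network and routing",
    "5": "mail delivery protocol",
    "6": "message content or media",
    "7": "security or policy",
}
NETWORK_DETAILS = {
    "1": "no answer from host",
    "2": "bad connection",
    "3": "directory server failure",
    "4": "unable to route",
    "5": "mail system congestion",
    "6": "routing loop detected",
    "7": "delivery time expired",
}


def describe_status(code):
    """Split 'class.subject.detail' and return the three meanings."""
    cls, subject, detail = code.split(".")
    detail_text = "unlisted detail"
    if subject == "4":
        detail_text = NETWORK_DETAILS.get(detail, "unlisted detail")
    return (
        CLASSES.get(cls, "unknown class"),
        SUBJECTS.get(subject, "unknown subject"),
        detail_text,
    )


print(describe_status("4.4.7"))
```

A 4.4.7 therefore means the sending server kept retrying until its own
queue lifetime ran out, which points at connectivity from that server to
the recipient's mail exchangers rather than at the address itself.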

 

 

Thanks

 

Juraj



Re: Data Center testing

2009-08-24 Thread Seth Mattinen
Deepak Jain wrote:
> Thanks for the kind words Ken.
> 
> Power failure testing and network testing are very different disciplines. 
> 
> We operate from the point of view that if a failure occurs because we have 
> scheduled testing, it is far better since we have the resources on-site to 
> address it (as opposed to an unplanned event during a hurricane). Not 
> everyone has this philosophy. 
> 
> This is one of the reasons we do monthly or bimonthly, full live load 
> transfer tests on power at every facility we own and control during the 
> morning hours (~10:00am local time on a weekday, run on gensets for up to two 
> hours). Of course there is sufficient staff and contingency planning on-site 
> to handle almost anything that comes up. The goal is to have a measurable 
> "good" outcome at our highest reasonable load levels [temperature, data load, 
> etc].
> 

At least once a year I like to go out and kick the service entrance
breaker to give the whole enchilada an honest to $deity plugs-out test.
As you said, not recommended if you don't maintain stuff, but that's how
confident I feel that my system works.

~Seth



RE: Data Center testing

2009-08-24 Thread Deepak Jain
> At least once a year I like to go out and kick the service entrance
> breaker to give the whole enchilada an honest to $deity plugs-out test.
> As you said, not recommended if you don't maintain stuff, but that's how
> confident I feel that my system works.

Nature has a way of testing it, even if you don't. :)

For those who haven't seen this occur, make sure you have a plan in case your 
breaker doesn't flip back to the normal position, or your transfer switch stops 
switching (in either direction -- for example, it fuses itself into the 
"generator/emergency" position).

For small supplies (say <1MW) it's not as big a deal, but when the breakers in 
a bigger facility can weigh hundreds of pounds each and can take months to 
replace, these are real issues and will test your sparing, consistency and 
other disciplines.

Deepak Jain
AiNET