Hi Ross

We've recently switched our 5100s to 18.1R3-S5. 18.1 is stable with 
BGP/OSPF/LDP/RSVP/MPLS and LACP LAG in general. We don't use STP of any kind 
with the QFXs so I can't really help there.

I was hesitant to upgrade to 18.X since the 5100 was still the only QFX not to 
have an 18.X version recommended in KB21476, but they recently updated the KB 
to include that model, so I'd say it's pretty safe now. They pushed out S6 in 
July; if I had to redo it now, I'd use that one instead of S5.

The kind of problem you're describing sounds like what we lived through with 
14.X and VCF when we first started using these. We'd commit a change and some 
random ports would stop passing traffic; we'd then have to delete the port 
config and re-provision it for traffic to resume. Lots of weird stuff like that 
kept happening until we got fed up with the architecture and moved to routed 
MPLS with almost no layer 2 switching.

Good luck.

-phil




-----Original Message-----
From: juniper-nsp <[email protected]> On Behalf Of Ross 
Halliday
Sent: August 12, 2019 9:20 AM
To: [email protected]
Subject: [j-nsp] Rock-solid JUNOS for QFX5100

Dear List,

I'm curious if anybody can recommend a JUNOS release for QFX5100 that is 
seriously stable. Right now we're on the previously-recommended version 
17.3R3-S1.5. Everything's been fine in testing, and suddenly out of the blue 
there will be weird issues when I make a change. I suspect maybe they are 
related to VSTP or LAG, or both.

1. Added a VLAN to a trunk port, and all the access ports on that VLAN 
completely stopped moving packets. Running disable/delete disable on all of the 
broken interfaces restored function. This happened during the day. I opened a 
JTAC ticket and they'd never heard of an issue like this; of course, we 
couldn't reproduce it. I no longer recall with confidence, but I think the 
trunk port may have been a one-member LAG (replacement of a downstream switch).

2. New trunk port (a two-port LACP LAG) not sending VSTP BPDUs for some VLANs. 
I'm not sure if it was coincidence or always broken as I had recently begun 
feeding new VSTP BPDUs (thus the root bridge changed) before I even looked at 
this. Other trunk ports did not exhibit the same issue. Completely deleted the 
LAG and rolled back to fix. This was on a fresh turnup and luckily wasn't in a 
topology that could form a loop.
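
For reference, the disable/delete disable workaround from item 1 could be 
scripted roughly as below. This is only a sketch in Python: the helper name 
bounce_commands and the interface names in the usage note are made up for 
illustration, and it only builds configuration-mode command strings rather 
than talking to a switch.

```python
def bounce_commands(interfaces):
    """Build Junos configuration-mode commands for a two-commit interface bounce.

    Hypothetical helper: returns (disable_batch, enable_batch). Each batch is a
    list of command strings that ends with its own "commit".
    """
    # First commit: administratively disable every affected interface.
    disable_batch = [f"set interfaces {name} disable" for name in interfaces]
    disable_batch.append("commit")

    # Second commit: remove the disable statement so the ports come back up.
    enable_batch = [f"delete interfaces {name} disable" for name in interfaces]
    enable_batch.append("commit")

    return disable_batch, enable_batch
```

Calling bounce_commands(["xe-0/0/10", "xe-0/0/11"]) gives two batches to paste 
in order. The two separate commits are deliberate, since setting and deleting 
disable within a single commit would leave the configuration unchanged.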

Features I'm using include:

- BGP
- OSPF
- PIM
- VSTP
- LACP
- VRRP
- IGMPv2 and v3
- Routing-instance
- CoS for multicast
- CoS for unicast
- CoS classification by ingress filter
- IPv4-only
- ~7k routes in FIB (total of all tables)
- ~1k multicast groups


There are no automation features, no MPLS, no MC-LAG, no EVPN, VXLAN, etc. 
These switches are L3 boxes that hand off IP to an MX core. Management is in 
the default instance/table, everything else is in a routing instance.

These boxes have us scared to touch them outside of a window as seemingly basic 
changes risk blowing the whole thing up. Is this a case where an ancient 
version might be a better choice or is this release a lemon? I recall that JTAC 
used to recommend two releases, one being for if you didn't require "new 
features". I find myself stuck between the adages of "If it ain't broke, don't 
fix it" and "Software doesn't age like wine". Given how poorly multicast seems 
to be understood by JTAC I'm very hesitant to upgrade to significantly newer 
releases.

If anybody can give advice or suggestions I would appreciate it immensely!

Thanks
Ross

_______________________________________________
juniper-nsp mailing list [email protected] 
https://puck.nether.net/mailman/listinfo/juniper-nsp