Re: [OSL | CCIE_SP] Carrier Supporting Carrier (CSC) - LDP peering between SPs drops every 3 minutes

Bryan Bartik Fri, 04 Sep 2009 09:48:31 -0700

Well, I think Antonie was on to something... and I found another workaround
:-)


I did a packet capture in Dynamips and just before the session drops, the
TCP/LDP keepalives between R1 and R9 loop back and forth between the two
routers. I get about 254 packets in under a second and I can see the TTL
decrement on each one! This happens every minute, and eventually the session
keepalive timer expires because these keepalives are never received and
processed locally by R9 (thus the session drops every 3 minutes). R1 is
sending the keepalive as a labeled packet even though R9 is directly
connected; I think it should be using an imp-null label. According to the
capture, R1 is putting label 20 on the keepalive. Here I verified it:

R1#sho mpls ldp bindings
  lib entry: 123.1.9.0/24, rev 2
        local binding:  label: imp-null
        remote binding: lsr: 123.1.9.9:0, label: 20

R9#sho mpls forwarding-table labels 20
Local  Outgoing      Prefix            Bytes Label   Outgoing   Next Hop
Label  Label or VC   or Tunnel Id      Switched      interface
20     Pop Label     123.1.9.0/24[V]   3358266       Se1/2      point2point
R9#

R1 is learning label 20 and using it to send packets onto the directly
connected network. R9 receives each labeled packet, pops the label and sends
the packet back out of the interface instead of receiving it locally. (I am
using the interface as the LDP transport address for this example.) It's hard
to tell who is at fault here!
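
To make the loop concrete, here is a rough Python sketch of what the capture
suggests is happening. It is only a model, not IOS behaviour: the function name
simulate_loop, the starting TTL of 255, and the assumption that each router
decrements the TTL by one before sending the packet back are all mine.

```python
def simulate_loop(initial_ttl=255):
    """Model one keepalive bouncing between R1 and R9 until its TTL runs out.

    Returns a list of (sender, ttl) tuples, one per transmission on the link.
    Assumption: the packet leaves R1 with TTL 255 and each router decrements
    the TTL by one before forwarding it back out of the same interface.
    """
    transmissions = []
    ttl = initial_ttl
    sender = "R1"  # R1 originates the keepalive with label 20
    while ttl > 0:
        transmissions.append((sender, ttl))
        ttl -= 1  # the receiving router decrements before forwarding
        sender = "R9" if sender == "R1" else "R1"
    return transmissions


hops = simulate_loop()
print(len(hops), "copies on the wire; first:", hops[0], "last:", hops[-1])
```

With a starting TTL of 255 the model puts 255 copies on the link, which is in
the same ballpark as the roughly 254 packets I saw in the capture.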


In order for the session to actually come up again, all the learned labels
have to get flushed so the TCP exchange can happen. Once they are, the TCP
handshake and label exchange occur without any label encapsulation. R9 then
advertises the vrf subnet (connected to R1) in a label mapping message with
a label of 20 (verified with capture). This label is now used by R1, any
further TCP exchange uses that label, and the packets loop. It's funny to see
it happen, because as soon as R9 sends that label mapping, R1 uses the label
in the ACK! However, this does not cause any problems at this point, because
R1 also sends an unlabeled ACK, completing the exchange (weird).
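
The R8 log from the original post fits this picture. Below is a quick Python
sketch that measures how long each session lasted from UP to the next DOWN.
I am assuming the default 180-second LDP session hold time here, and the helper
name up_to_down_seconds is made up for illustration.

```python
from datetime import datetime

# Log lines copied from the R8 output in the original post.
R8_LOG = """\
*Sep  3 21:50:49.575: %LDP-5-NBRCHG: LDP Neighbor 123.123.123.3:0 is DOWN
*Sep  3 21:50:59.475: %LDP-5-NBRCHG: LDP Neighbor 123.123.123.3:0 is UP
*Sep  3 21:53:59.491: %LDP-5-NBRCHG: LDP Neighbor 123.123.123.3:0 is DOWN
*Sep  3 21:54:09.291: %LDP-5-NBRCHG: LDP Neighbor 123.123.123.3:0 is UP
*Sep  3 21:57:09.311: %LDP-5-NBRCHG: LDP Neighbor 123.123.123.3:0 is DOWN
*Sep  3 21:57:19.331: %LDP-5-NBRCHG: LDP Neighbor 123.123.123.3:0 is UP
*Sep  3 22:00:19.351: %LDP-5-NBRCHG: LDP Neighbor 123.123.123.3:0 is DOWN
*Sep  3 22:00:29.239: %LDP-5-NBRCHG: LDP Neighbor 123.123.123.3:0 is UP
"""


def up_to_down_seconds(log_text):
    """Return the lifetime in seconds of each session (UP to next DOWN)."""
    lifetimes = []
    last_up = None
    for line in log_text.splitlines():
        fields = line.split()
        if not fields:
            continue
        # fields[2] is the time of day, e.g. "21:50:59.475:"
        stamp = datetime.strptime(fields[2].rstrip(":"), "%H:%M:%S.%f")
        state = fields[-1]
        if state == "UP":
            last_up = stamp
        elif state == "DOWN" and last_up is not None:
            lifetimes.append(round((stamp - last_up).total_seconds(), 3))
            last_up = None
    return lifetimes


print(up_to_down_seconds(R8_LOG))
```

Each lifetime comes out within a few hundredths of a second of 180, which is
what you would expect if no keepalive is ever accepted after the session comes
up and the hold timer simply runs out.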

To fix this, I have configured the following on R1:

access-list 1 deny   123.1.9.0 0.0.0.255
access-list 1 permit any
mpls ldp neighbor 123.1.9.9 labels accept 1

On R3 I did similar:

access-list 1 deny   123.3.8.0 0.0.0.255
access-list 1 permit any
mpls ldp neighbor 123.3.8.8 labels accept 1
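
For anyone unsure what the ACL is matching, here is a small Python sketch of
standard ACL wildcard matching applied to the R1 filter. The helper names are
illustrative only, and this models just the ACL logic (first match wins,
implicit deny at the end), not how LDP applies it internally.

```python
import ipaddress

# access-list 1 from R1, as (action, base address, wildcard mask)
ACL_1 = [
    ("deny", "123.1.9.0", "0.0.0.255"),
    ("permit", "0.0.0.0", "255.255.255.255"),  # "permit any"
]


def to_int(addr):
    """Convert a dotted-quad string to a 32-bit integer."""
    return int(ipaddress.IPv4Address(addr))


def acl_action(acl, prefix):
    """Return 'permit' or 'deny' for a prefix using first-match semantics."""
    target = to_int(prefix)
    for action, base, wildcard in acl:
        # Wildcard bits set to 1 are "don't care"; compare only the rest.
        care_bits = ~to_int(wildcard) & 0xFFFFFFFF
        if (target ^ to_int(base)) & care_bits == 0:
            return action
    return "deny"  # implicit deny when nothing matches


for prefix in ("123.1.9.0", "123.3.8.0"):
    print(prefix, "->", acl_action(ACL_1, prefix))
```

The binding for the directly connected 123.1.9.0/24 is rejected, so R1 never
installs label 20 for it, while every other prefix is still accepted.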

It's been up for more than 3 minutes:

R1#sho mpls ldp neighbor
    Peer LDP Ident: 123.1.9.9:0; Local LDP Ident 123.123.123.1:0
        TCP connection: 123.1.9.9.11032 - 123.1.9.1.646
        State: Oper; Msgs sent/rcvd: 13/13; Downstream
        Up time: 00:04:55
        LDP discovery sources:
          Serial1/0, Src IP addr: 123.1.9.9
        Addresses bound to peer LDP Ident:
          123.1.9.9

R3#sho mpls ldp neighbor
    Peer LDP Ident: 123.3.8.8:0; Local LDP Ident 123.123.123.3:0
        TCP connection: 123.3.8.8.11036 - 123.3.8.3.646
        State: Oper; Msgs sent/rcvd: 15/15; Downstream
        Up time: 00:06:13
        LDP discovery sources:
          Serial1/0, Src IP addr: 123.3.8.8
        Addresses bound to peer LDP Ident:
          123.3.8.8

I can't believe that is the way it is meant to be designed (I still think it
is a bug), but for now it is stable :-)

thanks all,

On Fri, Sep 4, 2009 at 8:47 AM, Rick Mur <[email protected]> wrote:

> True, the CSC code can be buggy sometimes, and this 3 minute marker is
> definitely a bug. I recall I also had some issues where it worked only for
> some period of time, so maybe I ran into that same issue. Still, it's only
> sometimes buggy and I don't think that will happen on your lab.
>
> --
> Regards,
>
> Rick Mur
> CCIE2 #21946 (R&S / Service Provider)
> Sr. Support Engineer – IPexpert, Inc.
> URL: http://www.IPexpert.com
>
> On 4 sep 2009, at 14:11, Bryan Bartik wrote:
>
> Thanks Guys, I am using 7200s in dynamips with 12.2S code. In fact, this is
> a mock set up of vol 2 lab 4, I just took everything else out of the
> equation. I started from scratch just doing the MPLS VPN scenarios. Even R2
> is not doing anything (ospf or ldp) right now. I am also just using physical
> interfaces on my frame relay cloud :) I don't see the client tagging the
> packets but I could look deeper into this. It's just amazing that it is every
> 3 minutes pretty much on the dot! I wonder if it's some type of CSC bug having
> to do with MPLS on the VRF interface...I will do some more testing and let
> you know.
>
> On Fri, Sep 4, 2009 at 3:22 AM, Antonie Henning - MWEB <[email protected]> wrote:
>
>> Had a similar issue. I narrowed it down to point to point frame-relay
>> subinterface with csc igp (ospf) and ldp enabled.
>>
>> What I saw was a loop. The client would tag packets to the carrier for the
>> directly connected subnet. I was expecting to see it pop the label:
>>
>> R2(config-subif)#do sh ip cef 123.2.6.6
>> 123.2.6.0/24
>>  attached to Serial4/0.206 label 614
>>
>> The 614 label then sends the packet back to the client and a loop forms:
>>
>> R2(config)#do trace 123.2.6.6
>>
>> Type escape sequence to abort.
>> Tracing the route to 123.2.6.6
>>
>>  1 123.2.6.6 [MPLS: Label 614 Exp 0] 8 msec 32 msec 12 msec
>>  2 123.2.6.2 20 msec 40 msec 40 msec
>>  3 123.2.6.6 [MPLS: Label 614 Exp 0] 44 msec 24 msec 44 msec
>>  4 123.2.6.2 20 msec 48 msec 64 msec
>>  5 123.2.6.6 [MPLS: Label 614 Exp 0] 40 msec 24 msec 40 msec
>>  6 123.2.6.2 48 msec 72 msec 60 msec
>>  7 123.2.6.6 [MPLS: Label 614 Exp 0] 60 msec 36 msec 64 msec
>>  8 123.2.6.2 56 msec 60 msec 88 msec
>>  9 123.2.6.6 [MPLS: Label 614 Exp 0] 60 msec 108 msec 60 msec
>>
>> Changing the frame-relay config to use the main interface solved the
>> problem.
>>
>> Hth
>> 21500.net
>>
>> -----Original Message-----
>> From: [email protected] [mailto:[email protected]] On Behalf Of
>> Francisco Baena
>> Sent: 04 September 2009 08:11 AM
>> To: 'Bryan Bartik'; [email protected]; [email protected]
>> Subject: RE: Carrier Supporting Carrier (CSC) - LDP peering between SPs
>> drops every 3 minutes
>>
>> Make it two. I had the same problem with vol II - Lab 4 (I think), but from
>> R2 to R6.
>>
>> At that point I blamed it on dynamips being buggy, as all the routing/MPLS
>> tables seemed fine.
>>
>> I look forward to a resolution on this. The interesting thing is that when I
>> shut down the connection from R1 to R2, the problem went away, so it sounds
>> like an IGP issue. However, even making a sham link between R6 and R9 and
>> increasing the OSPF cost from R1 to R2 (to ensure AS200 was the exit point)
>> made no difference.
>>
>> I would say that a possible workaround could be to make the R1-R9 LDP
>> sessions targeted, but obviously what we all want to know is what the heck
>> happened there in the first place.
>>
>> During my testing I disabled MPLS TE too in AS200, just in case, but no
>> cigar....
>>
>> Cheers,
>> Francisco
>> http://www.linkedin.com/in/fbaena
>>
>>
>> -----Original Message-----
>> From: [email protected] [mailto:[email protected]] On Behalf Of
>> Bryan Bartik
>> Sent: 04 September 2009 05:14
>> To: [email protected]; [email protected]
>> Subject: Carrier Supporting Carrier (CSC) - LDP peering between SPs drops
>> every 3 minutes
>>
>> Hello,
>>
>> I am running an inter-as + CSC scenario with OSPF and MPLS between the SPs.
>> The LDP peering is dropping exactly every 3 minutes and coming back up.
>> Connectivity is fine throughout the VPN when the session is up. I don't see
>> any route flapping in the debugs.
>>
>> AS65123[R3]---------[R8]AS100---------AS200[R9]---------[R1]AS65123
>>
>> AS100 and AS200 have an inter-as VPN supporting the 2nd level carrier AS65123.
>> R9 and R8 have vrf interfaces connected to R1 and R3 respectively.
>> R1 has an LDP/OSPF peering with R9 (in the vrf) in AS100
>> R3 has an LDP/OSPF peering with R8 (in the vrf) in AS200
>>
>> Both LDP sessions are bouncing on cue every 3 minutes! Here is R8 for
>> example:
>>
>> R8#
>> *Sep  3 21:50:49.575: %LDP-5-NBRCHG: LDP Neighbor 123.123.123.3:0 is DOWN
>> *Sep  3 21:50:59.475: %LDP-5-NBRCHG: LDP Neighbor 123.123.123.3:0 is UP
>> *Sep  3 21:53:59.491: %LDP-5-NBRCHG: LDP Neighbor 123.123.123.3:0 is DOWN
>> *Sep  3 21:54:09.291: %LDP-5-NBRCHG: LDP Neighbor 123.123.123.3:0 is UP
>> *Sep  3 21:57:09.311: %LDP-5-NBRCHG: LDP Neighbor 123.123.123.3:0 is DOWN
>> *Sep  3 21:57:19.331: %LDP-5-NBRCHG: LDP Neighbor 123.123.123.3:0 is UP
>> *Sep  3 22:00:19.351: %LDP-5-NBRCHG: LDP Neighbor 123.123.123.3:0 is DOWN
>> *Sep  3 22:00:29.239: %LDP-5-NBRCHG: LDP Neighbor 123.123.123.3:0 is UP
>>
>> Now just before the session dies I get the following from "debug mpls ldp
>> transport events interface x/x". This is from R9:
>>
>> R9#
>> *Sep  3 22:09:27.295: ldp: Send ldp hello; Serial1/2, src/dst 123.1.9.9/224.0.0.2, inst_id 0
>> *Sep  3 22:09:28.787: ldp: Rcvd ldp hello; Serial1/2, from 123.1.9.1 (123.123.123.1:0), intf_id 0, opt 0xC
>> *Sep  3 22:09:32.127: ldp: Send ldp hello; Serial1/2, src/dst 123.1.9.9/224.0.0.2, inst_id 0
>> *Sep  3 22:09:33.239: ldp: Rcvd ldp hello; Serial1/2, from 123.1.9.1 (123.123.123.1:0), intf_id 0, opt 0xC
>> *Sep  3 22:09:36.111: ldp: Send ldp hello; Serial1/2, src/dst 123.1.9.9/224.0.0.2, inst_id 0
>> *Sep  3 22:09:36.907: tagcon: Session KeepAlive timer expired, peer 123.123.123.1:0 (pp 0x64027BD8)
>> *Sep  3 22:09:36.911: ldp: Close LDP transport conn for adj 0x63486648
>> *Sep  3 22:09:36.915: ldp: Closing ldp conn 123.1.9.9:646 <-> 123.123.123.1:11015, adj 0x63486648
>> *Sep  3 22:09:36.919: ldp: Adj 0x63486648; state set to closed
>> *Sep  3 22:09:36.923: %LDP-5-NBRCHG: LDP Neighbor 123.123.123.1:0 is DOWN
>>
>> Even though I am sending and receiving hellos, I am getting the session
>> keepalive timer expired. Any ideas?
>>
>> Thanks!
>> --
>> Bryan Bartik
>> CCIE #23707 (R&S), CCNP
>> Sr. Support Engineer - IPexpert, Inc.
>> URL: http://www.IPexpert.com
>>
>> _____________________________________________________________________
>> Subscription information: http://www.groupstudy.com/list/comserv.html
>>
>
>
>
> --
> Bryan Bartik
> CCIE #23707 (R&S), CCNP
> Sr. Support Engineer - IPexpert, Inc.
> URL: http://www.IPexpert.com
> _______________________________________________
> For more information regarding industry leading CCIE Lab training, please
> visit www.ipexpert.com
>
>
>


-- 
Bryan Bartik
CCIE #23707 (R&S), CCNP
Sr. Support Engineer - IPexpert, Inc.
URL: http://www.IPexpert.com
