Re: Patch: RFC2863 #1 (incomplete)

Stefan Rompf Mon, 07 Nov 2005 11:27:30 -0800

On Monday 07 November 2005 18:45, Thomas Graf wrote:

> I'm sorry, I don't get the point. I assume the above is not a carrier
> failure but something else to be detected by a keepalive protocol.


No, it's a carrier failure. Try this, eth1 is an ethernet interface without 
the cable connected:

dose:~ # ip addr add 192.168.200.1/24 dev eth1
dose:~ # ip link set eth1 up
dose:~ # ifconfig eth1 | fgrep UP
          UP BROADCAST MULTICAST  MTU:1500  Metric:1
dose:~ # ip route show 192.168.200.0/24
192.168.200.0/24 dev eth1  proto kernel  scope link  src 192.168.200.1
dose:~ # ping 192.168.200.10
PING 192.168.200.10 (192.168.200.10) 56(84) bytes of data.
From 192.168.200.1: icmp_seq=2 Destination Host Unreachable
dose:~ # arp
192.168.200.10                   (incomplete)                     eth1

Even though the interface is not IFF_RUNNING and queueing is therefore 
disabled, the kernel created a route pointing to it and uses it.
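
To make the flag state in the transcript explicit, here is a minimal sketch that decodes an interface flags word. The bit values are the ones from <linux/if.h>; the helper name describe() is made up for illustration and is not part of any tool.

```python
# Interface flag bits as defined in the Linux header <linux/if.h>.
IFF_UP = 0x1
IFF_BROADCAST = 0x2
IFF_RUNNING = 0x40
IFF_MULTICAST = 0x1000


def describe(flags: int) -> str:
    """Summarise administrative and operational state from a flags word.

    Hypothetical helper, for illustration only.
    """
    admin = "admin up" if flags & IFF_UP else "admin down"
    oper = "running" if flags & IFF_RUNNING else "not running"
    return f"{admin}, {oper}"


# Flags shown by ifconfig for eth1 above: UP BROADCAST MULTICAST, no RUNNING.
eth1_flags = IFF_UP | IFF_BROADCAST | IFF_MULTICAST
print(describe(eth1_flags))
```

The interface is administratively up but not running, and the connected route exists anyway.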

> Isn't it the routing daemon's fault when preferring a route 
> which has the IFF_RUNNING flag cleared? I'm sorry for not getting it. ;-)

As the kernel maintains this connected route, userspace is IMHO not 
responsible. Quagga could add its routes with a higher metric (administrative 
distance in Cisco terms), and the kernel should use them as soon as the 
connected route becomes unavailable due to carrier failure (or a dormant 
state for wireless interfaces).

And now if we support this, we'd need a dormant state with the route disabled 
and one with it enabled.
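
A rough model of the behaviour I am asking for, not of what the kernel does today: the lowest-metric route wins, but only among routes whose device is usable for L3. All names here (LinkState, Route, select_route, the metric values, eth0) are assumptions made up for this sketch.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, List, Optional


class LinkState(Enum):
    """Hypothetical link states, including the two dormant variants."""
    UP = "up"
    NO_CARRIER = "no-carrier"
    DORMANT_L3_ENABLED = "dormant, route enabled"
    DORMANT_L3_DISABLED = "dormant, route disabled"


@dataclass(frozen=True)
class Route:
    prefix: str
    dev: str
    metric: int
    proto: str


def l3_usable(state: LinkState) -> bool:
    """A route is only a candidate if its device may carry L3 traffic."""
    return state in (LinkState.UP, LinkState.DORMANT_L3_ENABLED)


def select_route(routes: List[Route],
                 links: Dict[str, LinkState]) -> Optional[Route]:
    """Pick the lowest-metric route whose device is usable, if any."""
    candidates = [r for r in routes if l3_usable(links[r.dev])]
    return min(candidates, key=lambda r: r.metric, default=None)


routes = [
    Route("192.168.200.0/24", "eth1", 0, "kernel"),   # connected route
    Route("192.168.200.0/24", "eth0", 20, "zebra"),   # backup from the daemon
]

# Cable unplugged on eth1: the backup route should take over.
chosen = select_route(routes, {"eth1": LinkState.NO_CARRIER,
                               "eth0": LinkState.UP})
print(chosen.dev if chosen else "unreachable")
```

The connected route is never deleted here, only skipped while its link is unusable, so it comes back by itself when carrier returns.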

Btw, adding a userspace workaround would be dangerous. If the routing daemon 
crashes while a link is down, the connected route that userspace removed would 
not come back, leaving the router unreachable on this interface. *Very* bad.

> > OTOH, we don't need to be completely atomic as the
> > netif_carrier_*-functions already require driver controlled
> > synchronisation. We just need to make sure that the caches are coherent
> > before linkwatch kernel thread runs.
>
> The reason we now have netif_carrier_* instead of |= IFF_RUNNING as it
> used to be is exactly atomicity. Since you write to oper_state from
> various locations the operation must be atomic to avoid corruption.

No, it's because the IFF_* flags are a bit field that can only be changed from 
process context protected by rtnl/dev_base_lock, unlike 
__LINK_STATE_NO_CARRIER (or an operational state field), which must be 
accessible from interrupts.

> > No, !netif_running() is ADMIN down, mostly representing the IFF_UP flag.
>
> Right but the RFC specifies that admin. down implies operational down:

Indeed, but admin down should not be the only condition under which we set 
OPER_DOWN ;-)
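
To spell out the one-way implication, a small sketch of a possible mapping onto RFC 2863 ifOperStatus. The numeric values are the ones from the RFC; the function oper_status() and its three boolean inputs are a simplification I made up here, not the patch.

```python
from enum import IntEnum


class IfOperStatus(IntEnum):
    """ifOperStatus values as enumerated in RFC 2863."""
    UP = 1
    DOWN = 2
    TESTING = 3
    UNKNOWN = 4
    DORMANT = 5
    NOT_PRESENT = 6
    LOWER_LAYER_DOWN = 7


def oper_status(admin_up: bool, carrier: bool, dormant: bool) -> IfOperStatus:
    """Hypothetical mapping from driver-visible state to ifOperStatus."""
    if not admin_up:
        # Admin down implies operational down.
        return IfOperStatus.DOWN
    if not carrier:
        # Operational down can also occur while admin is up.
        return IfOperStatus.DOWN
    if dormant:
        return IfOperStatus.DORMANT
    return IfOperStatus.UP


# eth1 from the example above: admin up, cable unplugged.
print(oper_status(admin_up=True, carrier=False, dormant=False).name)
```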

> This also describes why your scheme cannot work, we have to memorize
> the status of for example carrier state.

I can't see that requirement in the RFC, and devices are normally initialized 
in their open() "methods".

> We're not far apart, the difference is for the additional L3 disabled
> dormant state which I don't understand yet and secondly that I continue
> to keep the current link states with the addition of a new dormant state
> and then translate it into the RFC2863 operational status.

Ok, so I hope I finally managed to make my point clear about the different 
dormant states ;-)

Stefan