On 07.01.2016 19:40, Thomas Graf wrote:
On 01/07/16 at 06:50pm, Hannes Frederic Sowa wrote:
On 07.01.2016 18:21, Thomas Graf wrote:
On 01/07/16 at 08:35am, Jesse Gross wrote:
On Thu, Jan 7, 2016 at 3:49 AM, Thomas Graf <tg...@suug.ch> wrote:
A simple start could be to add a new return code for > MTU drops in
the dev_queue_xmit() path and check for NET_XMIT_DROP_MTU in
ovs_vport_send() and emit proper ICMPs.

That could be interesting. The problem in the past was making sure
that ICMPs that are generated fit in the virtual network appropriately
- right addresses, etc. This requires either spoofing addresses or
some additional knowledge about the topology that we don't currently
have in the kernel.

Are you worried about emitting an ICMP with a source which is not
a local host address?

We have uRPF enabled for IPv4 by default on all kernels. Thus if we generate
an IPv4 ICMP packet back with an error message it must have a source address
which the receiving kernel considers valid. Valid means that sending to the
source address would have used the same outgoing interface the ICMP error
came in from.
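
The strict reverse-path rule described above can be sketched in a few lines of C. This is only an illustration under stated assumptions: the `toy_route` table, `toy_route_lookup()` and `toy_rpf_strict_ok()` are hypothetical names, not kernel structures or APIs, and a real FIB lookup involves much more state.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Toy route entry; hypothetical, not a kernel structure. */
struct toy_route {
    uint32_t prefix;   /* network address, host byte order */
    uint32_t mask;     /* contiguous netmask, host byte order */
    int      ifindex;  /* interface used to reach the prefix */
};

/* Longest-prefix match. Returns the outgoing ifindex, or -1 if no route. */
static int toy_route_lookup(const struct toy_route *tbl, size_t n,
                            uint32_t addr)
{
    int best = -1;
    uint32_t best_mask = 0;
    bool found = false;

    assert(tbl != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if ((addr & tbl[i].mask) != tbl[i].prefix)
            continue;
        /* For contiguous masks, a larger value means a longer prefix. */
        if (!found || tbl[i].mask > best_mask) {
            found = true;
            best_mask = tbl[i].mask;
            best = tbl[i].ifindex;
        }
    }
    return best;
}

/* Strict check: accept the packet only if the route back to its source
 * address leaves through the interface the packet arrived on. */
static bool toy_rpf_strict_ok(const struct toy_route *tbl, size_t n,
                              uint32_t saddr, int in_ifindex)
{
    return toy_route_lookup(tbl, n, saddr) == in_ifindex;
}
```

With such a check on the receiving host, an ICMP error whose source address routes back through a different interface would be rejected, which is the concern raised here.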

Agreed. I think this is given though as we would reverse the addresses
as icmp_send() already does:

         saddr = iph->daddr;
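
As a rough illustration of that reversal, here is a standalone sketch of filling in a "fragmentation needed" error from the inner header. The `toy_*` types and function are hypothetical and are not the kernel's icmp_send(); only the ICMPv4 type/code values (3 and 4) and the next-hop MTU field are taken from the ICMP and PMTU discovery specifications. Whether the reversed address is acceptable to the receiver is exactly what the rest of this thread debates.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_ICMP_DEST_UNREACH 3  /* ICMPv4 type: destination unreachable */
#define TOY_ICMP_FRAG_NEEDED  4  /* ICMPv4 code: fragmentation needed, DF set */

/* Only the fields this sketch needs; hypothetical, not struct iphdr. */
struct toy_ipv4_hdr {
    uint32_t saddr;
    uint32_t daddr;
};

/* Abstract description of the error to emit, not a wire format. */
struct toy_icmp_err {
    uint32_t saddr;  /* source of the ICMP error */
    uint32_t daddr;  /* destination of the ICMP error */
    uint8_t  type;
    uint8_t  code;
    uint16_t mtu;    /* next-hop MTU reported to the sender */
};

/* Build the error by swapping the addresses of the offending packet. */
static struct toy_icmp_err toy_build_frag_needed(const struct toy_ipv4_hdr *iph,
                                                 uint16_t mtu)
{
    struct toy_icmp_err err;

    assert(iph != NULL);
    err.saddr = iph->daddr;  /* reply appears to come from the original target */
    err.daddr = iph->saddr;  /* and goes back to the original sender */
    err.type = TOY_ICMP_DEST_UNREACH;
    err.code = TOY_ICMP_FRAG_NEEDED;
    err.mtu = mtu;
    return err;
}
```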

Can't we just use icmp_send() in the context of the inner header and
feed it to the flow table to send it back? It should be the same as
for ip_forward().

The bridge's ip address often has no valid path as seen from the end host
system receiving the icmp error, because the openvswitch is not really part
of the L3 forwarding chain.

I don't think the IP of the bridge ever comes into play. It shouldn't.
I'm not even sure what could be considered the address of the bridge
;-)

Yes, exactly. :)


Faking the address from the packet (e.g. using the destination address of
the original packet) will make traceroute go nuts.

I think you are worried about an ICMP error from a hop which does not
decrement TTL. I think that's a good point and I think we should only
send an ICMP error if the TTL is decremented in the action list of
the flow for which we have seen a MTU based drop (or TTL=0).
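
A minimal sketch of that gating rule, assuming a simplified action list: the `toy_action` and `toy_drop` enums and both functions are hypothetical and do not correspond to the OVS datapath's actual action encoding.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified flow actions. */
enum toy_action {
    TOY_ACT_OUTPUT,
    TOY_ACT_SET_ETH,
    TOY_ACT_DEC_TTL
};

/* Hypothetical drop reasons for a packet handled by a flow. */
enum toy_drop {
    TOY_DROP_NONE,
    TOY_DROP_MTU,
    TOY_DROP_TTL_EXPIRED
};

/* True if any action in the flow's list decrements the TTL. */
static bool toy_flow_decrements_ttl(const enum toy_action *acts, size_t n)
{
    assert(acts != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (acts[i] == TOY_ACT_DEC_TTL)
            return true;
    }
    return false;
}

/* Send an ICMP error only when the flow behaves like an L3 hop and the
 * packet was dropped for an MTU or TTL reason. */
static bool toy_should_send_icmp(const enum toy_action *acts, size_t n,
                                 enum toy_drop reason)
{
    if (reason == TOY_DROP_NONE)
        return false;
    return toy_flow_decrements_ttl(acts, n);
}
```

A purely bridged flow (no TTL decrement) never produces an error under this rule, so traceroute would not see a hop that does not exist at L3.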

Also agreed. ovs must act in routing mode and, at the same time, must have an IP address on the path. I think this is the actual problem.

Currently we have no way to feed back an error in configurations where ovs sits in another namespace, e.g. for docker containers:

We traverse a net namespace, so we drop skb->sk; we don't hold any socket reference with which to enqueue a PtB error to the original socket.

We mostly use netif_rx_internal(), which queues the packet on the backlog, so we can't signal an error back up the call stack either.

And ovs does not necessarily have an ip address as the first hop of the namespace or the virtual machine, so it cannot know a valid ip address with which to reply, no?

I don't really see a difference between ip_forward(), some
sophisticated tc action or OVS. As soon as they decremented TTL and
perform L3 forwarding, then they should send out ICMP errors to allow
for proper PMTU.

Yes, but depending on the ip configuration, those icmps will then be dropped by the reverse path filter.

Normally ethernet devices don't return icmp error messages. E.g. broken
jumbo frame configuration just leads to silent packet loss because the
packet is discarded before a router can handle it. Thus it would be best in
case of local ovs installation if the error is already transported back to
the client application via the network call stack. This might be very
difficult in case we enqueue the packet to a backlog queue and reschedule
softirqs. Probably we need some way of faking source addresses from bridges
now.... :/

I think the major complication comes from the assumption that OVS is
a bridge. This is not necessarily the case as stated above. If a flow
is doing L3 forwarding, we should send ICMPs as expected from a
router.

If we are doing L3 forwarding into a tunnel, this is absolutely correct and can be easily done.

Bye,
Hannes


_______________________________________________
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev
