I posted a fix. Please review it: http://openvswitch.org/pipermail/dev/2013-April/026864.html
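The leak reported at the bottom of the quoted thread below comes down to who owns the packet buffer once execute_controller_action returns. With clone=true the callee sends a private copy, so the caller still owns the original and must free it. With clone=false the callee takes the buffer over, and the caller must not touch it again. The C sketch below illustrates only that convention. struct pkt, pkt_new, pkt_delete, send_to_controller, handle_miss and the live_pkts counter are hypothetical stand-ins, not the Open vSwitch code or its ofpbuf API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical packet buffer (a stand-in, not OVS's struct ofpbuf). */
struct pkt {
    size_t size;
    unsigned char *data;
};

/* Number of live buffers, so that a leak is observable in this sketch. */
static int live_pkts;

static struct pkt *pkt_new(const void *data, size_t size)
{
    struct pkt *p = malloc(sizeof *p);
    if (!p) {
        return NULL;
    }
    p->data = malloc(size ? size : 1);
    if (!p->data) {
        free(p);
        return NULL;
    }
    memcpy(p->data, data, size);
    p->size = size;
    live_pkts++;
    return p;
}

static void pkt_delete(struct pkt *p)
{
    if (p) {
        free(p->data);
        free(p);
        live_pkts--;
    }
}

/* Hypothetical stand-in for execute_controller_action().  With clone=true
 * it sends a private copy and the caller keeps ownership of 'p'.  With
 * clone=false it takes ownership of 'p' and frees it once it is sent. */
static void send_to_controller(struct pkt *p, bool clone)
{
    assert(p != NULL);
    struct pkt *msg = clone ? pkt_new(p->data, p->size) : p;
    /* ...encode 'msg' and queue it for the controller here... */
    pkt_delete(msg);            /* The message has been consumed. */
}

/* Hypothetical stand-in for handle_flow_miss() when the only action is
 * "send to controller".  Returns true when the packet is fully handled. */
static bool handle_miss(struct pkt *p, bool clone)
{
    send_to_controller(p, clone);
    if (clone) {
        /* The caller still owns 'p'.  Omitting this call is the leak. */
        pkt_delete(p);
    }
    /* With clone=false, 'p' now belongs to the callee: do not use it. */
    return true;
}
```

Removing the pkt_delete(p) call on the clone=true path leaves live_pkts at 1 after handling one packet, which is the leak. Passing clone=false instead avoids the extra copy, as Ben suggested. In the real code that change exposed a separate problem, which this sketch does not model.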
On Mon, Apr 22, 2013 at 05:13:28PM -0700, Ben Pfaff wrote:
> I think I see the problem. It is subtle. I'll write up a fix
> tomorrow morning.
>
> On Mon, Apr 22, 2013 at 08:45:52PM +0100, Zoltan Kiss wrote:
> > I found one thing which might be related: these flow_dels are
> > called by facet_remove, and before them there is usually another
> > flow_del, called by facet_unexpected, which looks almost the same,
> > except that the hardware addresses look like this:
> > sha=01:0a:00:4e:00:00,tha=00:00:ff:ff:ff:ff
> > But the captures never show such ARP packets. Is it possible that
> > these fields get corrupted when they are installed into the datapath?
> > This source/target hardware address combination is often found by
> > facet_unexpected, and it never appears in the captures.
> >
> > Regards,
> >
> > Zoli
> >
> > On 22/04/13 19:13, Zoltan Kiss wrote:
> > >Hi,
> > >
> > >On 16/04/13 23:53, Zoltan Kiss wrote:
> > >>On 15/04/13 18:15, Ben Pfaff wrote:
> > >>>On Mon, Apr 15, 2013 at 03:59:52PM +0100, Zoltan Kiss wrote:
> > >>>>When the packet is sent to the controller due to a userspace rule
> > >>>>(and not a kernel-space flow), execute_controller_action is invoked
> > >>>>with clone=true, so handle_flow_miss retains ownership of the packet
> > >>>>buffer. But if it returns true (which means the packet had only a
> > >>>>PACKET_IN action), nothing frees the buffer.
> > >>>
> > >>>I think you're right. But in that case, wouldn't it solve the problem
> > >>>in a better way (doing less memory allocation and copying) by passing
> > >>>clone=false, instead of passing clone=true and then freeing the packet
> > >>>in the caller?
> > >>
> > >>It sounds reasonable, and I was thinking about that, but I was worried
> > >>about the side effects. Now I've tried it, and it does indeed cause
> > >>problems.
> > >>Broadcast ARP packets are causing problems here:
> > >>
> > >>dpif|WARN|Dropped 26 log messages in last 1 seconds (most recently, 1
> > >>seconds ago) due to excessive rate
> > >>dpif|WARN|system@xenbr0: failed to flow_del (No such file or directory)
> > >>in_port(1),eth(src=ab:cd:ef:12:34:56,dst=ff:ff:ff:ff:ff:ff),eth_type(0x0806),arp(sip=10.0.0.1,tip=10.0.0.2,op=1,sha=ab:cd:ef:12:34:56,tha=00:00:00:00:00:00)
> > >>
> > >>These messages keep coming continuously if I install that joker rule in
> > >>userspace, and it doesn't happen with the original clone=true version. I
> > >>haven't found out yet why this happens; it shouldn't really change
> > >>anything but the time when the packet is freed.
> > >
> > >I've tried to find out why these warnings appear with clone=false, but I
> > >haven't succeeded yet. I've checked this code path:
> > >
> > >handle_flow_miss
> > >  execute_controller_action
> > >  send_packet_in_action
> > >  connmgr_send_packet_in (this is where clone makes a difference, as we
> > >    pass rw_packet)
> > >  schedule_packet_in
> > >  ofputil_encode_packet_in (this is where we either dupe the buffer or
> > >    use the original)
> > >  pinsched_send (in my tests there was no pinscheduler involved)
> > >  do_send_packet_in (after this the code no longer knows about the
> > >    content of the packet)
> > >
> > >But I couldn't find any place where it mattered whether we clone and
> > >pass a copy (and free the original immediately after
> > >execute_controller_action) or just give away the original buffer. And
> > >frankly, I'm running out of ideas about where else to check. Does anything
> > >else touch that buffer in parallel? Any ideas?
> > >
> > >Regards,
> > >
> > >Zoli
> > >
> > >_______________________________________________
> > >dev mailing list
> > >dev@openvswitch.org
> > >http://openvswitch.org/mailman/listinfo/dev