On 14/05/2024 11:38, Dumitru Ceara wrote:
> On 5/14/24 12:06, Brendan Doyle via discuss wrote:
>> On 14/05/2024 09:50, Dumitru Ceara wrote:
>>> On 5/7/24 12:38, Brendan Doyle via discuss wrote:
>>>> Hi,
>>>>
>>>> Seems there is a regression with the latest LTS release in terms of
>>>> Port Group ACLs when ports are in multiple Port Groups. As an example
>>>> I have 3 ports in a Port Group, and two of them in another Port Group
>>>> that has an ACL to allow IP protocol 112, and this used to work, but
>>>> now I am seeing lots of:
>>>>
>>>> 2024-05-07T05:24:06.692Z|01357|acl_log(ovn_pinctrl0)|INFO|name="def-10", verdict=drop, severity=info, direction=to-lport: ip,vlan_tci=0x0000,dl_src=00:13:97:ec:6a:3d,dl_dst=01:00:5e:00:00:12,nw_src=10.1.2.21,nw_dst=224.0.0.18,nw_proto=112,nw_tos=192,nw_ecn=0,nw_ttl=255,nw_frag=no
>>>>
>>>> If I place the ACL in the PG that has all ports then the 112 pkts are
>>>> allowed.
>>>
>>> Hi Brendan,
>>>
>>> Thanks for reporting this! I tried in a simplified setup and I can't
>>> really reproduce the problem. We probably need some more info to debug
>>> this, please see below.
>>
>> Hi Dumitru,
>>
>> I'll need to reproduce it myself, I've noticed that it seems to be
>> intermittent, and sometimes works and other times does not. But yes,
>> once I reproduce
>
> Hmm, that's interesting information though. Can it be an incremental
> processing bug in ovn-northd or ovn-controller? If you reproduce the
> issue the following additional info might help us debug:
>
> # Enable northd jsonrpc and i-p logs.
> ovn-appctl -t ovn-northd vlog/disable-rate-limit
> ovn-appctl -t ovn-northd vlog/set jsonrpc:DBG
> ovn-appctl -t ovn-northd vlog/set inc_proc_eng:DBG
> ovn-appctl -t ovn-northd vlog/set inc_proc_northd:DBG
>
> # Enable ovn-controller jsonrpc and i-p logs.
> ovn-appctl vlog/disable-rate-limit
> ovn-appctl vlog/set jsonrpc:DBG
> ovn-appctl vlog/set inc_proc_eng:DBG
>
> # Trigger an ovn-controller recompute.
> ovn-appctl inc-engine/recompute
>
> # Check if traffic is allowed properly.
> # If not, collect northd and ovn-controller logs and share them.
>
> # Trigger an ovn-northd recompute.
> ovn-appctl -t ovn-northd inc-engine/recompute
>
> # Check if traffic is allowed properly.
> # If not, collect northd and ovn-controller logs and share them.
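
For anyone repeating the collection steps quoted above, here is a rough Python sketch that replays the same ovn-appctl commands phase by phase. The commands themselves are copied from the quoted steps; the phase names, the function names and the dry_run flag are made up for illustration and are not part of OVN. It assumes ovn-appctl is on PATH and that the default control sockets are in use. It defaults to a dry run that only prints the commands.

```python
import shlex
import subprocess
from typing import Dict, List

# Commands copied verbatim from the quoted debugging steps.
# The phase names (dictionary keys) are hypothetical labels.
PHASES: Dict[str, List[str]] = {
    "northd-logging": [
        "ovn-appctl -t ovn-northd vlog/disable-rate-limit",
        "ovn-appctl -t ovn-northd vlog/set jsonrpc:DBG",
        "ovn-appctl -t ovn-northd vlog/set inc_proc_eng:DBG",
        "ovn-appctl -t ovn-northd vlog/set inc_proc_northd:DBG",
    ],
    "controller-logging": [
        "ovn-appctl vlog/disable-rate-limit",
        "ovn-appctl vlog/set jsonrpc:DBG",
        "ovn-appctl vlog/set inc_proc_eng:DBG",
    ],
    "controller-recompute": ["ovn-appctl inc-engine/recompute"],
    "northd-recompute": ["ovn-appctl -t ovn-northd inc-engine/recompute"],
}


def commands_for(phase: str) -> List[List[str]]:
    """Return the argv lists for one phase, in the quoted order."""
    return [shlex.split(cmd) for cmd in PHASES[phase]]


def run_phase(phase: str, dry_run: bool = True) -> None:
    """Print each command of a phase and, unless dry_run, execute it."""
    for argv in commands_for(phase):
        print("+", " ".join(argv))
        if not dry_run:
            subprocess.run(argv, check=True)


if __name__ == "__main__":
    # Dry run only: show what would be executed, phase by phase.
    for name in PHASES:
        print("#", name)
        run_phase(name, dry_run=True)
```

The phases are kept separate on purpose: the quoted steps ask for a traffic check after each recompute, so the two recompute phases should not be run back to back.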
OK, so I have reproduced it (eventually), and now I don't think it is
intermittent; it depends more on the members of the Port Groups (PG).

We have a per-subnet PG (call it pg_sn) that contains an entry for all
NICs in the subnet, and we have a Load Balancer PG (not an OVN LB; this
is two haproxy services running in containers and connected to br-int
via a veth pair), call that pg_lb (this is the PG that allows proto 112).

So if I just create an LB, then its ports (one in each container) are
added to both PGs, and all is OK: proto 112 is allowed through. But then
I deploy a VM in the subnet, so that pg_sn now has three ports (the VM's
and the two LB ones) while pg_lb just has the LB ones, and suddenly
proto 112 starts to get dropped.

I'll get the ovn-trace and NB DB dump in a bit, but just wanted to update
you on the further characterization of the problem.

Brendan
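
To make the expectation in the description above concrete, here is a small Python model of ACL selection: for a given port and direction, the matching ACL with the numerically highest priority wins, whichever Port Group it is attached to. This is only a sketch of the documented priority rule, not OVN's implementation, and it does not reproduce the problem. The port names (lb_a, lb_b, vm1) are hypothetical, pg_sn and pg_lb are the shorthand names used above, and the match is reduced to the IP protocol only (the ip4.dst == 224.0.0.18 condition and conntrack state are left out).

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Set


@dataclass(frozen=True)
class Acl:
    direction: str           # "from-lport" or "to-lport"
    priority: int            # higher number takes precedence
    port_group: str          # group referenced as @name in the match
    ip_proto: Optional[int]  # None stands for "any protocol"
    verdict: str             # e.g. "allow-related" or "drop"


def expected_verdict(
    port: str,
    direction: str,
    ip_proto: int,
    port_groups: Dict[str, Set[str]],
    acls: List[Acl],
) -> Optional[str]:
    """Verdict of the highest-priority matching ACL, or None if none match."""
    matching = [
        acl
        for acl in acls
        if acl.direction == direction
        and port in port_groups.get(acl.port_group, set())
        and acl.ip_proto in (None, ip_proto)
    ]
    if not matching:
        return None
    return max(matching, key=lambda acl: acl.priority).verdict


# Simplified version of the setup described above (to-lport side only).
PORT_GROUPS = {
    "pg_sn": {"lb_a", "lb_b", "vm1"},  # every NIC in the subnet
    "pg_lb": {"lb_a", "lb_b"},         # only the two haproxy ports
}
ACLS = [
    Acl("to-lport", 32000, "pg_lb", 112, "allow-related"),
    Acl("to-lport", 0, "pg_sn", None, "drop"),
]

if __name__ == "__main__":
    for port in ("lb_a", "lb_b", "vm1"):
        print(port, expected_verdict(port, "to-lport", 112, PORT_GROUPS, ACLS))
```

In this model the two LB ports get allow-related for protocol 112, which is the behaviour expected from the configuration. The same model gives drop for vm1, because vm1 is only in pg_sn; a flooded multicast copy delivered to the VM port would therefore hit the priority 0 default. The model cannot tell which output port the def-10 log entries refer to, so an ovn-trace is still needed to settle that.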
>> I'll get an ovn-trace and full NB DB dump. Thanks
>
> Thanks!
>
>>>> Here are the details (an example of just one).
>>>>
>>>> The PG with all ports:
>>>>
>>>> ovn-nbctl list Port_Group pg_vcn4958117_net72295_sl42074
>>>> _uuid        : a928b8f5-3fce-4d32-afaa-8a9cc46ac902
>>>> acls         : [29cd825f-73c5-461d-8603-c78a1e89e799, 30971674-3471-49aa-ab55-8df17528b250, 4f311ad6-0eae-4e23-aea8-a48ac8d503af, 6d142bf8-30fc-46df-adde-2318c55a62f2, 6d448692-ac65-4d40-aa30-c87fa478ed96, a647d19a-eaf6-4240-a323-0065d3fc8b4e, c323482c-b748-41f3-85a9-de67aa541ff5, d3d923d1-0e89-4165-a747-05a5908e5f08]
>>>> external_ids : {}
>>>> name         : pg_vcn4958117_net72295_sl42074
>>>> ports        : [192c7921-8091-4008-878a-5b2f6ae7aadf,   lbv4958117L650B
>>>>                 a3e67f89-a989-469d-b7f2-a01631ad4f46,   766e9eda-0943-11ef-9ba6-0010e0daa67a
>>>>                 ee13c9d9-576b-4fa7-9c5b-c2accbc0a811]   lbv4958117L650A
>>>>
>>>> The Port Group that has just two ports (which are also in the other PG):
>>>>
>>>> ovn-nbctl list Port_Group lb_pg_vcn4958117_L650
>>>> _uuid        : 4b04b824-7d27-4a9d-addc-818919544b0f
>>>> acls         : [444c209a-8ef6-41c7-97d7-aa66a9c38d66, 6b76680a-395c-4be1-80df-dcde36426acd]
>>>> external_ids : {}
>>>> name         : lb_pg_vcn4958117_L650
>>>> ports        : [192c7921-8091-4008-878a-5b2f6ae7aadf,   lbv4958117L650B
>>>>                 ee13c9d9-576b-4fa7-9c5b-c2accbc0a811]   lbv4958117L650A
>>>>
>>>> The ACL associated with the PG:
>>>>
>>>> ovn-nbctl acl-list lb_pg_vcn4958117_L650
>>>>   from-lport 32000 (inport == @lb_pg_vcn4958117_L650 && (ip4.dst == 224.0.0.18 && ip.proto == 112)) allow-related
>>>>   to-lport 32000 (outport == @lb_pg_vcn4958117_L650 && (ip4.dst == 224.0.0.18 && ip.proto == 112)) allow-related
>>>>
>>>> That should allow IP 112, but it does not (anymore, like I said it
>>>> used to work)!
>>>>
>>>> The ACL associated with pg_vcn4958117_net72295_sl42074:
>>>>
>>>>   from-lport 32767 (inport == @pg_vcn4958117_net72295_sl42074 && (arp || udp.dst == 67 || udp.dst == 68)) allow-related
>>>>   from-lport 32767 (inport == @pg_vcn4958117_net72295_sl42074 && (ip4.dst == 169.254.0.2 && tcp.dst == 3260)) reject log(name=pg_vcn4958117_net72295_sl42074_reject,severity=info)
>>>>   from-lport 32766 (inport == @pg_vcn4958117_net72295_sl42074 && (ip4.src == 169.254.0.0/16 ||ip4.dst == 169.254.0.0/16)) allow-related
>>>>   from-lport 16000 (inport == @pg_vcn4958117_net72295_sl42074) allow-related
>>>>   from-lport 0 (inport == @pg_vcn4958117_net72295_sl42074) drop log(name=def-4,severity=info)
>>>>   to-lport 32767 (outport == @pg_vcn4958117_net72295_sl42074 && (arp || udp.dst == 67 || udp.dst == 68)) allow-related
>>>>   to-lport 32767 (outport == @pg_vcn4958117_net72295_sl42074 && (ip4.src == 169.254.0.0/16 ||ip4.dst == 169.254.0.0/16)) allow-related
>>>>   to-lport 0 (outport == @pg_vcn4958117_net72295_sl42074) drop log(name=def-10,severity=info)
>>>>
>>>> So even though lb_pg_vcn4958117_L650 allows 112, we are hitting the
>>>> drop in the ACL above!
>>>>
>>>> I looked at the sbflows and it seems to have flows for the 112 ACL
>>>> entry:
>>>>
>>>> ovn-sbctl lflow-list
>>>> --------------------
>>>> Datapath: "ls_vcn4958117_net72295" (7bba2e8d-6612-487b-8f1e-f2e98d281a85) Pipeline: ingress
>>>>   table=9 (ls_in_acl ), priority=33000, match=(reg0[7] == 1 && (inport == @lb_pg_vcn4958117_L650 && (ip4.dst == 224.0.0.18 && ip.proto == 112))), action=(reg0[1] = 1; next;)
>>>>   table=9 (ls_in_acl ), priority=33000, match=(reg0[8] == 1 && (inport == @lb_pg_vcn4958117_L650 && (ip4.dst == 224.0.0.18 && ip.proto == 112))), action=(next;)
>>>> Datapath: "ls_vcn4958117_net72295" (7bba2e8d-6612-487b-8f1e-f2e98d281a85) Pipeline: egress
>>>>   table=4 (ls_out_acl ), priority=33000, match=(reg0[7] == 1 && (outport == @lb_pg_vcn4958117_L650 && (ip4.dst == 224.0.0.18 && ip.proto == 112))), action=(reg0[1] = 1; next;)
>>>>   table=4 (ls_out_acl ), priority=33000, match=(reg0[8] == 1 && (outport == @lb_pg_vcn4958117_L650 && (ip4.dst == 224.0.0.18 && ip.proto == 112))), action=(next;)
>>>
>>> Would it be possible to try an ovn-trace for the packets? The output
>>> should be quite accurate, VRRP (IP multicast packets will always be
>>> considered as having ct_state=+trk+new if I'm not wrong).
>>>
>>> An example of ovn-trace invocation (you need to replace macs, IPs and
>>> ports to match your setup):
>>>
>>> ovn-trace 'inport=="vm1" && eth.src == 00:00:00:00:00:01 && eth.dst == 01:00:5e:00:00:12 && ip4.src == 42.42.42.2 && ip4.dst == 224.0.0.18 && ip.proto == 112'
>>>
>>>> And OVS flows also seem to have entries:
>>>>
>>>> ovs-ofctl dump-flows br-int
>>>>  cookie=0x48b6ec2c, table=17, priority=33000,ip,reg0=0x100/0x100,reg14=0x5,metadata=0x7,nw_dst=224.0.0.18,nw_proto=112 actions=resubmit(,18)
>>>>  cookie=0xbb05b3e, table=17, priority=33000,ip,reg0=0x100/0x100,reg14=0x26,metadata=0x7,nw_dst=224.0.0.18,nw_proto=112 actions=resubmit(,18)
>>>>  cookie=0x45ade6ea, table=17, priority=33000,ip,reg0=0x100/0x100,reg14=0x28,metadata=0x7,nw_dst=224.0.0.18,nw_proto=112 actions=resubmit(,18)
>>>>  cookie=0x9ea8bc7e, table=17, priority=33000,ip,reg0=0x100/0x100,reg14=0x7,metadata=0x7,nw_dst=224.0.0.18,nw_proto=112 actions=resubmit(,18)
>>>>  cookie=0xc3b6660c, table=17, priority=33000,ip,reg0=0x100/0x100,reg14=0x31,metadata=0x7,nw_dst=224.0.0.18,nw_proto=112 actions=resubmit(,18)
>>>>  cookie=0x1574390e, table=17, priority=33000,ip,reg0=0x100/0x100,reg14=0x32,metadata=0x7,nw_dst=224.0.0.18,nw_proto=112 actions=resubmit(,18)
>>>>  cookie=0xba0820b4, table=17, priority=33000,ip,reg0=0x100/0x100,reg14=0x4,metadata=0x61,nw_dst=224.0.0.18,nw_proto=112 actions=resubmit(,18)
>>>>  cookie=0x3ab5481d, table=17, priority=33000,ip,reg0=0x80/0x80,reg14=0x5,metadata=0x7,nw_dst=224.0.0.18,nw_proto=112 actions=load:0x1->NXM_NX_XXREG0[97],resubmit(,18)
>>>>  cookie=0x246464a0, table=17, priority=33000,ip,reg0=0x80/0x80,reg14=0x26,metadata=0x7,nw_dst=224.0.0.18,nw_proto=112 actions=load:0x1->NXM_NX_XXREG0[97],resubmit(,18)
>>>>  cookie=0xb7597534, table=17, priority=33000,ip,reg0=0x80/0x80,reg14=0x28,metadata=0x7,nw_dst=224.0.0.18,nw_proto=112 actions=load:0x1->NXM_NX_XXREG0[97],resubmit(,18)
>>>>  cookie=0xa39db4b2, table=17, priority=33000,ip,reg0=0x80/0x80,reg14=0x7,metadata=0x7,nw_dst=224.0.0.18,nw_proto=112 actions=load:0x1->NXM_NX_XXREG0[97],resubmit(,18)
>>>>  cookie=0x1ba59849, table=17, priority=33000,ip,reg0=0x80/0x80,reg14=0x31,metadata=0x7,nw_dst=224.0.0.18,nw_proto=112 actions=load:0x1->NXM_NX_XXREG0[97],resubmit(,18)
>>>>  cookie=0x845680ed, table=17, priority=33000,ip,reg0=0x80/0x80,reg14=0x32,metadata=0x7,nw_dst=224.0.0.18,nw_proto=112 actions=load:0x1->NXM_NX_XXREG0[97],resubmit(,18)
>>>>  cookie=0x5b88a1c8, table=17, priority=33000,ip,reg0=0x80/0x80,reg14=0x4,metadata=0x61,nw_dst=224.0.0.18,nw_proto=112 actions=load:0x1->NXM_NX_XXREG0[97],resubmit(,18)
>>>>  cookie=0x43eec225, table=44, priority=33000,ip,reg0=0x80/0x80,reg15=0x5,metadata=0x7,nw_dst=224.0.0.18,nw_proto=112 actions=load:0x1->NXM_NX_XXREG0[97],resubmit(,45)
>>>>  cookie=0xb2e98e98, table=44, priority=33000,ip,reg0=0x80/0x80,reg15=0x26,metadata=0x7,nw_dst=224.0.0.18,nw_proto=112 actions=load:0x1->NXM_NX_XXREG0[97],resubmit(,45)
>>>>  cookie=0x22d2bdce, table=44, priority=33000,ip,reg0=0x80/0x80,reg15=0x28,metadata=0x7,nw_dst=224.0.0.18,nw_proto=112 actions=load:0x1->NXM_NX_XXREG0[97],resubmit(,45)
>>>>  cookie=0xcbc86a80, table=44, priority=33000,ip,reg0=0x80/0x80,reg15=0x7,metadata=0x7,nw_dst=224.0.0.18,nw_proto=112 actions=load:0x1->NXM_NX_XXREG0[97],resubmit(,45)
>>>>  cookie=0x788cf92d, table=44, priority=33000,ip,reg0=0x80/0x80,reg15=0x31,metadata=0x7,nw_dst=224.0.0.18,nw_proto=112 actions=load:0x1->NXM_NX_XXREG0[97],resubmit(,45)
>>>>  cookie=0x93fc9ed, table=44, priority=33000,ip,reg0=0x80/0x80,reg15=0x32,metadata=0x7,nw_dst=224.0.0.18,nw_proto=112 actions=load:0x1->NXM_NX_XXREG0[97],resubmit(,45)
>>>>  cookie=0xa41bd3e7, table=44, priority=33000,ip,reg0=0x80/0x80,reg15=0x4,metadata=0x61,nw_dst=224.0.0.18,nw_proto=112 actions=load:0x1->NXM_NX_XXREG0[97],resubmit(,45)
>>>>  cookie=0xb0c3031a, table=44, priority=33000,ip,reg0=0x100/0x100,reg15=0x5,metadata=0x7,nw_dst=224.0.0.18,nw_proto=112 actions=resubmit(,45)
>>>>  cookie=0x13552aa4, table=44, priority=33000,ip,reg0=0x100/0x100,reg15=0x26,metadata=0x7,nw_dst=224.0.0.18,nw_proto=112 actions=resubmit(,45)
>>>>  cookie=0xda88db5f, table=44, priority=33000,ip,reg0=0x100/0x100,reg15=0x28,metadata=0x7,nw_dst=224.0.0.18,nw_proto=112 actions=resubmit(,45)
>>>>  cookie=0xf29c5f9c, table=44, priority=33000,ip,reg0=0x100/0x100,reg15=0x7,metadata=0x7,nw_dst=224.0.0.18,nw_proto=112 actions=resubmit(,45)
>>>>  cookie=0x73e20386, table=44, priority=33000,ip,reg0=0x100/0x100,reg15=0x31,metadata=0x7,nw_dst=224.0.0.18,nw_proto=112 actions=resubmit(,45)
>>>>  cookie=0xe72b74cd, table=44, priority=33000,ip,reg0=0x100/0x100,reg15=0x32,metadata=0x7,nw_dst=224.0.0.18,nw_proto=112 actions=resubmit(,45)
>>>>  cookie=0x302c1113, table=44, priority=33000,ip,reg0=0x100/0x100,reg15=0x4,metadata=0x61,nw_dst=224.0.0.18,nw_proto=112 actions=resubmit(,45)
>>>>
>>>> And as I said, if I add the rule to the PG that has all the ports in
>>>> it, then things work:
>>>>
>>>> ovn-nbctl acl-list pg_vcn4958117_net72295_sl42074
>>>>   from-lport 32767 (inport == @pg_vcn4958117_net72295_sl42074 && (arp || udp.dst == 67 || udp.dst == 68)) allow-related
>>>>   from-lport 32767 (inport == @pg_vcn4958117_net72295_sl42074 && (ip4.dst == 169.254.0.2 && tcp.dst == 3260)) reject log(name=pg_vcn4958117_net72295_sl42074_reject,severity=info)
>>>>   from-lport 32766 (inport == @pg_vcn4958117_net72295_sl42074 && (ip4.src == 169.254.0.0/16 ||ip4.dst == 169.254.0.0/16)) allow-related
>>>>   from-lport 32000 (inport == @lb_pg_vcn4958117_L650 && (ip.proto == 112)) allow-related log(name=BJD)
>>>>   from-lport 32000 (inport == @pg_vcn4958117_net72295_sl42074 && (ip4.dst == 224.0.0.18 && ip.proto == 112)) allow-related log(name=BJD)
>>>>   from-lport 16000 (inport == @pg_vcn4958117_net72295_sl42074) allow-related
>>>>   from-lport 0 (inport == @pg_vcn4958117_net72295_sl42074) drop log(name=def-4,severity=info)
>>>>   to-lport 32767 (outport == @pg_vcn4958117_net72295_sl42074 && (arp || udp.dst == 67 || udp.dst == 68)) allow-related
>>>>   to-lport 32767 (outport == @pg_vcn4958117_net72295_sl42074 && (ip4.src == 169.254.0.0/16 ||ip4.dst == 169.254.0.0/16)) allow-related
>>>>   to-lport 32000 (outport == @lb_pg_vcn4958117_L650 && (ip.proto == 112)) allow-related log(name=BJD)
>>>>   to-lport 32000 (outport == @pg_vcn4958117_net72295_sl42074 && (ip4.dst == 224.0.0.18 && ip.proto == 112)) allow-related log(name=BJD)
>>>>   to-lport 0 (outport == @pg_vcn4958117_net72295_sl42074) drop log(name=def-10,severity=info)
>>>
>>> An ovn-trace in the working scenario might help us spot the
>>> difference. Ideally, if you could share the whole northbound database
>>> content that would make it easier to debug.
>>>
>>> Best regards,
>>> Dumitru
>>>
>>> _______________________________________________
>>> discuss mailing list
>>> disc...@openvswitch.org
>>> https://urldefense.com/v3/__https://mail.openvswitch.org/mailman/listinfo/ovs-discuss__;!!ACWV5N9M2RV99hQ!MNu3KG-dhcR3OvG9l4OHEvHUau7EGmXYnHC-AhsZNWmd7Ce2cHOrXo-JocosHHo2hsxOKxE75xGxw03mCg$

_______________________________________________
discuss mailing list
disc...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-discuss