On 5/26/25 12:31 PM, Q Kay wrote:
> Hi Dumitru,
> 

Hi Ice Bear,

> I think this is the file you want.


Yes, that's it, thanks!

> Thanks for guiding me.

No problem.

So, after looking at the DB contents, I see that logical switch 1
(70974da0-2e9d-469a-9782-455a0380ab95) has no ACLs applied, either
directly or indirectly through port groups.
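
(FWIW, this is easy to verify with something like:

> ovn-nbctl acl-list 70974da0-2e9d-469a-9782-455a0380ab95

which prints nothing here, plus checking that none of the switch's port
UUIDs show up in the 'ports' column of a port group that has ACLs, e.g.
with "ovn-nbctl --columns=name,acls,ports list port_group".)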

On the other hand, for logical switch 2:

> ovn-nbctl show neutron-6aba7876-b3bc-4d71-99bc-7b2644f326e9
switch ec22da44-9964-49ff-9c29-770a26794ba4 
(neutron-6aba7876-b3bc-4d71-99bc-7b2644f326e9) (aka Logical_switch_2)
    port b8f1e947-7d06-4899-8c1c-206e81e70e74
        type: localport
        addresses: ["fa:16:3e:55:88:90 10.10.20.2"]
    port a2b9537d-d8a1-4cb9-9582-f41e49ed22a3
        addresses: ["fa:16:3e:9e:4d:93 10.10.20.137"]
    port 97f2c854-44e9-4558-a0ef-81e42a08f414
        addresses: ["fa:16:3e:81:ed:92 10.10.20.102", "unknown"]
    port 4b7aa4f3-d126-41b6-9f0e-591c6921698b
        addresses: ["fa:16:3e:72:fd:e5 10.10.20.41", "unknown"]
    port 43888846-637f-46e6-ad5d-0acd5e6d6064
        addresses: ["unknown"]

The a2b9537d-d8a1-4cb9-9582-f41e49ed22a3 logical switch port is a member
of a port group; here is its NB record (note the UUID), followed by the
port group:

> ovn-nbctl list logical_switch_port 12869fa4-2f1f-4c2f-bf65-60ce796a1d51
_uuid               : 12869fa4-2f1f-4c2f-bf65-60ce796a1d51    <<<<<< UUID
addresses           : ["fa:16:3e:9e:4d:93 10.10.20.137"]
dhcpv4_options      : 159d49d0-964f-4ba6-aa58-dfbb8bfeb463
dhcpv6_options      : []
dynamic_addresses   : []
enabled             : true
external_ids        : {"neutron:cidrs"="10.10.20.137/24", 
"neutron:device_id"="1cda8c1a-b594-4942-8273-557c1e88c666", 
"neutron:device_owner"="compute:nova", 
"neutron:host_id"=khangtt-osp-compute-01-84, "neutron:mtu"="", 
"neutron:network_name"=neutron-6aba7876-b3bc-4d71-99bc-7b2644f326e9, 
"neutron:port_capabilities"="", "neutron:port_name"="", 
"neutron:project_id"="7f19299bb3bd43d4978fff45783e4346", 
"neutron:revision_number"="4", 
"neutron:security_group_ids"="940e2484-bb38-463b-a15f-d05b9dc9f5f0", 
"neutron:subnet_pool_addr_scope4"="", "neutron:subnet_pool_addr_scope6"="", 
"neutron:vnic_type"=normal}
ha_chassis_group    : []
mirror_rules        : []
name                : "a2b9537d-d8a1-4cb9-9582-f41e49ed22a3"
options             : {requested-chassis=khangtt-osp-compute-01-84}
parent_name         : []
peer                : []
port_security       : ["fa:16:3e:9e:4d:93 10.10.20.137"]
tag                 : []
tag_request         : []
type                : ""
up                  : false

> ovn-nbctl list port_group pg_940e2484_bb38_463b_a15f_d05b9dc9f5f0
_uuid               : 6d232961-a51c-48cb-aa4f-84eb3108c71f
acls                : [d7e20fdb-f613-4147-b605-64b8ffbe9742, 
dcae0790-6c86-4e4d-8f01-d9be12d26c48]
external_ids        : 
{"neutron:security_group_id"="940e2484-bb38-463b-a15f-d05b9dc9f5f0"}
name                : pg_940e2484_bb38_463b_a15f_d05b9dc9f5f0
ports               : [12869fa4-2f1f-4c2f-bf65-60ce796a1d51, 
1972206b-327a-496b-88fc-d17625d013e1, 2fb22d1a-bbfc-4173-b6fc-1ae3adc5ddcd, 
3947661b-4deb-4aed-bd15-65839933fea3, caf0fe63-61be-4b1a-b306-ff00fa578982, 
fbfaeb2b-6e42-458a-a65f-8d2ef29b8b69, fd662347-4013-4306-b222-e29545f866ec]

And this port group does have allow-related (stateful) ACLs that require
conntrack:

> ovn-nbctl acl-list pg_940e2484_bb38_463b_a15f_d05b9dc9f5f0
from-lport  1002 (inport == @pg_940e2484_bb38_463b_a15f_d05b9dc9f5f0 && ip4) 
allow-related
  to-lport  1002 (outport == @pg_940e2484_bb38_463b_a15f_d05b9dc9f5f0 && ip4 && 
ip4.src == 0.0.0.0/0) allow-related
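
If needed, the stateful logical flows these ACLs generate can be
inspected with something like (grep pattern just as an example):

> ovn-sbctl lflow-list neutron-6aba7876-b3bc-4d71-99bc-7b2644f326e9 | grep -E 'ct_commit|ct\.est'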

So, as suspected earlier, this explains why traffic works in one
direction but not in the other: only one of the two logical switches has
stateful ACLs and therefore needs conntrack.

This is an unsupported configuration (so not a bug).  The only way to make
it work is to set the use_ct_inv_match=false option in the NB.
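
For example, something like this, run wherever ovn-nbctl can reach the
NB database (for OpenStack that's usually a controller node):

> ovn-nbctl set NB_Global . options:use_ct_inv_match=false

With that set, ovn-northd stops matching on ct.inv in the logical
switch pipelines, so +inv packets are no longer dropped there.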

Just mentioning it again here so it doesn't get lost in the thread:
"asymmetric conntrack" with use_ct_inv_match=false means the datapath
might forward traffic with ct_state=+trk+inv, and it might also cause HW
offload to not work.

If that's OK for the use case, then it's fine to set the option in the
NB database.
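
One way to double check the effect is to re-run the datapath flow dump
from earlier in the thread on the compute node, e.g.:

> ovs-appctl dpctl/dump-flows | grep ct_state

and confirm that the ICMP replies no longer hit a
ct_state(...+inv+trk) flow with actions:drop.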

Best regards,
Dumitru

> 
> Best regards,
> Ice Bear
> 
> On Mon, May 26, 2025 at 17:05 Dumitru Ceara <dce...@redhat.com>
> wrote:
> 
>> On 5/26/25 11:38 AM, Q Kay wrote:
>>> Hi Dumitru,
>>>
>>
>> Hi Ice Bear,
>>
>>> Here is the NB DB in JSON format (attachment).
>>>
>>
>> Sorry, I think my request might have been confusing.
>>
>> I didn't mean running something like:
>> ovsdb-client -f json dump <path-to-database-socket>
>>
>> Instead I meant just attaching the actual database file.  That's a file
>> (in json format) usually stored in /etc/ovn/ovnnb_db.db.  For OpenStack
>> that might be /var/lib/openvswitch/ovn/ovnnb_db.db on controller nodes.
>>
>> Hope that helps.
>>
>> Regards,
>> Dumitru
>>
>>> Best regards,
>>> Ice Bear
>>>
>>> On Mon, May 26, 2025 at 16:10 Dumitru Ceara <dce...@redhat.com>
>>> wrote:
>>>
>>>> On 5/22/25 9:05 AM, Q Kay wrote:
>>>>> Hi Dumitru,
>>>>>
>>>>
>>>> Hi Ice Bear,
>>>>
>>>> Please keep the ovs-discuss mailing list in CC.
>>>>
>>>>> I am very willing to provide the NB DB file for you (attached).
>>>>> I will provide more information about the ports for you to check.
>>>>>
>>>>> Logical switch 1 id: 70974da0-2e9d-469a-9782-455a0380ab95
>>>>> Logical switch 2 id: ec22da44-9964-49ff-9c29-770a26794ba4
>>>>>
>>>>> Instance A:
>>>>> port 1 (connect to ls1): 61a871bc-7709-4072-9991-8e3a1096b02a
>>>>> port 2 (connect to ls2): 63d76c2b-2960-4a89-97ac-9f7a7d4bb718
>>>>>
>>>>>
>>>>> Instance B:
>>>>> port 1: 46848e3c-7a73-46ce-8b3a-b6331e14fc74
>>>>> port 2: 7d39750a-29d6-40df-b42b-54a17efcc423
>>>>>
>>>>
>>>> Thanks for the info.  However, it's easier to investigate if you just
>>>> share the actual NB DB (json) file instead of the ovsdb-client dump.
>>>> It's probably located in a path similar to /etc/ovn/ovnnb_db.db.
>>>>
>>>> That way I could just load it in a sandbox and run ovn-nbctl commands
>>>> against it directly.
>>>>
>>>> Regards,
>>>> Dumitru
>>>>
>>>>>
>>>>> Best regards,
>>>>> Ice Bear
>>>>> On Wed, May 21, 2025 at 16:19 Dumitru Ceara <dce...@redhat.com>
>>>>> wrote:
>>>>>
>>>>>> On 5/21/25 5:16 AM, Q Kay wrote:
>>>>>>> Hi Dumitru,
>>>>>>
>>>>>> Hi Ice Bear,
>>>>>>
>>>>>> CC: ovs-discuss@openvswitch.org
>>>>>>
>>>>>>> Thanks for your answer. First, I will address some of your questions.
>>>>>>>
>>>>>>>>> The critical evidence is in the failed flow, where we see:
>>>>>>>>>
>>>>>>>>> 'recirc_id(0x3d77),in_port(28),ct_state(-new-est-rel-rpl+inv+trk),ct_mark(0/0x1),eth(),eth_type(0x0800),ipv4(frag=no),
>>>>>>>>> packets:48, bytes:4704, used:0.940s, actions:drop'
>>>>>>>>> The packet is being marked as invalid (+inv) and subsequently dropped.
>>>>>>>>> It's a bit weird though that this isn't +rpl traffic.  Is this hit by
>>>>>>>>> the ICMP echo or by the ICMP echo-reply packet?
>>>>>>>
>>>>>>> This recirc is hit by the ICMP echo-reply packet.
>>>>>>>
>>>>>>
>>>>>> OK, that's good.
>>>>>>
>>>>>>> I understand what you mean. The outgoing and return traffic from
>>>>>>> different logical switches will be flagged as inv. If that's the case,
>>>>>>> it will work correctly with TCP (both are dropped). But for ICMP, I
>>>>>>> notice something a bit strange.
>>>>>>>
>>>>>>>>> My hypothesis is that the handling of ct_state flags is causing the
>>>>>>>>> return traffic to be dropped. This may be because the outgoing and
>>>>>>>>> return connections do not share the same logical_switch datapath.
>>>>>>>
>>>>>>> According to your reasoning, ICMP reply packets from a different
>>>>>>> logical switch than the request packets will be dropped. However, in
>>>>>>> practice, when I initiate an ICMP request from 6.6.6.6 to 5.5.5.5, the
>>>>>>> result I get is success (note that echo request and reply come from
>>>>>>> different logical switches regardless of whether they are initiated by
>>>>>>> 5.5.5.5 or 6.6.6.6). You can compare the two recirculation flows to see
>>>>>>> this oddity. You can take a look at the attached image for better
>>>>>>> visualization.
>>>>>>>
>>>>>>
>>>>>> OK.  From the ovn-trace command you shared
>>>>>>
>>>>>>> 2. Using OVN trace:
>>>>>>> ovn-trace --no-leader-only 70974da0-2e9d-469a-9782-455a0380ab95
>>>>>>> 'inport == "319cd637-10fb-4b45-9708-d02beefd698a" &&
>>>>>>> eth.src==fa:16:3e:ea:67:18 && eth.dst==fa:16:3e:04:28:c7 &&
>>>>>>> ip4.src==6.6.6.6 && ip4.dst==5.5.5.5 && ip.proto==1 && ip.ttl==64'
>>>>>>
>>>>>> I'm guessing the fa:16:3e:ea:67:18 MAC is the one owned by 6.6.6.6.
>>>>>>
>>>>>> Now, after filtering only the ICMP ECHO reply flows in your initial
>>>>>> datapath
>>>>>> flow dump:
>>>>>>
>>>>>>> *For successful ping flow: 5.5.5.5 -> 6.6.6.6*
>>>>>>
>>>>>> Note: ICMP reply comes from 6.6.6.6 to 5.5.5.5 (B -> A).
>>>>>>
>>>>>>> *- On Compute 1 (containing source instance): *
>>>>>>>
>>>>>>> 'recirc_id(0),tunnel(tun_id=0x2,src=10.10.10.85,dst=10.10.10.84,geneve({class=0x102,type=0x80,len=4,0xb000a/0x7fffffff}),flags(-df+csum+key)),in_port(9),eth(src=fa:16:3e:ea:67:18,dst=00:00:00:00:00:00/01:00:00:00:00:00),eth_type(0x0800),ipv4(proto=1,frag=no),icmp(type=0/0xfe),
>>>>>>> packets:55, bytes:5390, used:0.204s, actions:29'
>>>>>>
>>>>>> We see no conntrack fields in the match.  So, based on the diagram
>>>>>> you shared, I'm guessing there's no allow-related ACL or load
>>>>>> balancer on logical switch 2.
>>>>>>
>>>>>> But then for the failed ping flow:
>>>>>>
>>>>>>> *For failed ping flow: 6.6.6.6 -> 5.5.5.5*
>>>>>>
>>>>>> Note: ICMP reply comes from 5.5.5.5 to 6.6.6.6 (A -> B).
>>>>>>
>>>>>>> *- On Compute 1: *
>>>>>>
>>>>>> [...]
>>>>>>
>>>>>>>
>>>>>>> 'recirc_id(0),in_port(28),eth(src=fa:16:3e:81:ed:92,dst=fa:16:3e:72:fd:e5),eth_type(0x0800),ipv4(proto=1,frag=no),
>>>>>>> packets:48, bytes:4704, used:0.940s, actions:ct(zone=87),recirc(0x3d77)'
>>>>>>>
>>>>>>> 'recirc_id(0x3d77),in_port(28),ct_state(-new-est-rel-rpl+inv+trk),ct_mark(0/0x1),eth(),eth_type(0x0800),ipv4(frag=no),
>>>>>>> packets:48, bytes:4704, used:0.940s, actions:drop'
>>>>>>
>>>>>> In this case we _do_ have conntrack fields in the match/actions.
>>>>>> Is it possible that logical switch 1 has allow-related ACLs or LBs?
>>>>>>
>>>>>> On the TCP side of things: it's kind of hard to tell what's going on
>>>>>> without having the complete configuration of your OVN deployment.
>>>>>>
>>>>>> NOTE: if an ACL is applied to a port group, that is equivalent to
>>>>>> applying the ACL to all logical switches that have ports in that
>>>>>> port group.
>>>>>>
>>>>>>>>> I'd say it's not a bug.  However, if you want to change the default
>>>>>>>>> behavior you can use the NB_Global.options:use_ct_inv_match=false
>>>>>>>>> knob to allow +inv packets in the logical switch pipeline.
>>>>>>>
>>>>>>> I tried setting the option use_ct_inv_match=false. The result is just
>>>>>>> as you said: everything works with both ICMP and TCP.
>>>>>>> Based on this experiment, I suspect there might be a small bug in how
>>>>>>> OVN handles ICMP packets. Could you please let me know if my
>>>>>>> experiment and reasoning are correct?
>>>>>>>
>>>>>>
>>>>>> As said above, it really depends on the full configuration.  Maybe we
>>>>>> can tell more if you can share the NB database?  Or at least if you
>>>>>> share the ACLs applied on the two logical switches (or port groups).
>>>>>>
>>>>>>>
>>>>>>> Thanks for your support.
>>>>>>>
>>>>>>
>>>>>> No problem.
>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> Best regards,
>>>>>>> Ice Bear
>>>>>>
>>>>>> Regards,
>>>>>> Dumitru
>>>>>>
>>>>>>
>>>>>
>>>>
>>>>
>>>
>>
>>
> 
