On 4/23/24 17:39, Gavin McKee wrote:
If you look at both traces (non working and working) the thing that
stands out to me is this

At line 10 in the working file the following entry exists
     ct_state NEW tcp (SYN_SENT) orig [172.27.16.11.38793 >
172.27.31.189.9100] reply [172.27.31.189.9100 > 172.27.16.11.38793]
zone 195


That's quite interesting, indeed. It means that in the working scenario, there is already a conntrack entry for the connection even though it's the TCP SYN and there is no flow in the kernel cache (the packet get's upcalled).

Is the client reusing source ports by any chance?

The fact that the conntrack info is already available makes the working case skip an upcall.

Also, I'm curious about the drop in the TC_INGRESS. Are you using tc-offload? Can you dump the tc filter rules for your port?


his doesn't happen in the non working file - I just see the following

3904992932904126 [swapper/125] 0 [kr] queue_userspace_packet
#ddf92049d47aeff1d0b6625620000 (skb 18382861792850905088) n 4
     if 38 (enp148s0f0_1) rxif 38 172.27.16.11.42303 >
172.27.31.189.9100 ttl 64 tos 0x0 id 36932 off 0 [DF] len 52 proto TCP
(6) flags [S] seq 2266263186 win 42340
     upcall_enqueue (miss) (125/3904992932890052) q 3682874017 ret 0
   + 3904992932907247 [swapper/125] 0 [kr] ovs_dp_upcall
#ddf92049d47aeff1d0b6625620000 (skb 18382861792850905088) n 5
     if 38 (enp148s0f0_1) rxif 38 172.27.16.11.42303 >
172.27.31.189.9100 ttl 64 tos 0x0 id 36932 off 0 [DF] len 52 proto TCP
(6) flags [S] seq 2266263186 win 42340
     upcall_ret (125/3904992932890052) ret 0

I am wondering if a failure to track the ct_state SYN is causing the
returning ACK to drop ?

   + 3904992936344421 [swapper/125] 0 [tp] skb:kfree_skb
#ddf9204d21c48ff1d0b676c330c00 (skb 18382861792850913792) n 3 drop
(TC_INGRESS)
     if 33 (genev_sys_6081) rxif 33 172.27.31.189.9100 >
172.27.16.11.42303 ttl 64 tos 0x0 id 0 off 0 [DF] len 52 proto TCP (6)
flags [S.] seq 605271182 ack 2266263187 win 42340

On Mon, 22 Apr 2024 at 18:54, Gavin McKee <gavmcke...@googlemail.com> wrote:

Ok @Adrian Moreno @Flavio Leitner

Two more detailed Retis traces attached.  One is not working - the
same session that I can't establish a TCP session to on port 9010
172.27.16.11.42303 > 172.27.31.189.9100

Then I restart Open vSwtich and try again
172.27.16.11.38793 > 172.27.31.189.9100 (this works post restart)

It looks to me in the non working example that we -
SEND SYN -> exits the tunnel interface genev_sys_6081 via
enp148s0f0np0 - exactly as expected
RECV ACK -> tcp_gro_receive then -> net:netif_receive_skb where we hit
drop (TC_INGRESS)

After a restart things seem to be very different

Any ideas where to look next ? >>
On Mon, 22 Apr 2024 at 12:49, Gavin McKee <gavmcke...@googlemail.com> wrote:

Hi Guys,


We have had another occurrence of the issue today.

In short - we try to open a connection from 172.27.22.90 ->
172.27.31.189 on port 9100.  We see the SYN being received and the ACK
being sent from the server back to the client 172.27.22.90.  The
server retransmits the ACK as it's never received by the server.

When we run retis on the server side the ACK is being dropped - see
(drop (TC_INGRESS) below)

retis -p generic collect -c ovs,ct --ovs-track -f 'port 9100'

(filtered on the ephemeral TCP port 40735)

3879878550398423 [swapper/125] 0 [tp] net:napi_gro_receive_entry
#dc8ba9ec4a5d7ff4167e4dd466000 (skb 18393096588613658112) n 0
   if 37 (enp148s0f0_1)
   + 3879878550401407 [swapper/125] 0 [k] tcp_gro_receive
#dc8ba9ec4a5d7ff4167e4dd466000 (skb 18393096588613658112) n 1
     if 37 (enp148s0f0_1) 172.27.22.90.40735 > 172.27.31.189.9100 ttl
64 tos 0x0 id 31996 off 0 [DF] len 52 proto TCP (6) flags [S] seq
1864480616 win 42340
   + 3879878550402477 [swapper/125] 0 [tp] net:netif_receive_skb
#dc8ba9ec4a5d7ff4167e4dd466000 (skb 18393096588613658112) n 2
     if 37 (enp148s0f0_1) 172.27.22.90.40735 > 172.27.31.189.9100 ttl
64 tos 0x0 id 31996 off 0 [DF] len 52 proto TCP (6) flags [S] seq
1864480616 win 42340
   + 3879878550409874 [swapper/125] 0 [tp] net:net_dev_queue
#dc8ba9ec4a5d7ff4167e4dd466000 (skb 18393096588613658112) n 3
     if 43 (genev_sys_6081) rxif 37 172.27.22.90.40735 >
172.27.31.189.9100 ttl 64 tos 0x0 id 31996 off 0 [DF] len 52 proto TCP
(6) flags [S] seq 1864480616 win 42340
   + 3879878550411535 [swapper/125] 0 [tp] net:net_dev_start_xmit
#dc8ba9ec4a5d7ff4167e4dd466000 (skb 18393096588613658112) n 4
     if 43 (genev_sys_6081) rxif 37 172.27.22.90.40735 >
172.27.31.189.9100 ttl 64 tos 0x0 id 31996 off 0 [DF] len 52 proto TCP
(6) flags [S] seq 1864480616 win 42340
   + 3879878550419012 [swapper/125] 0 [k] skb_scrub_packet
#dc8ba9ec4a5d7ff4167e4dd466000 (skb 18393096588613658112) n 5
     if 43 (genev_sys_6081) rxif 37 172.27.22.90.40735 >
172.27.31.189.9100 ttl 64 tos 0x0 id 31996 off 0 [DF] len 52 proto TCP
(6) flags [S] seq 1864480616 win 42340
   + 3879878550421222 [swapper/125] 0 [k] skb_scrub_packet
#dc8ba9ec4a5d7ff4167e4dd466000 (skb 18393096588613658112) n 6
     if 43 (genev_sys_6081) 172.27.22.90.39026 > 172.27.31.189.6081 ttl
64 tos 0x0 id 31996 off 0 [DF] len 52 proto TCP (6) flags [] seq
5916145:5916177 win 62976
   + 3879878550424101 [swapper/125] 0 [k] ip_output
#dc8ba9ec4a5d7ff4167e4dd466000 (skb 18393096588613658112) n 7
     if 43 (genev_sys_6081) 10.25.24.43.39026 > 10.25.25.41.6081 ttl 64
tos 0x0 id 53033 off 0 len 110 proto UDP (17) len 82
     ct_state NEW udp orig [10.25.24.43.39026 > 10.25.25.41.6081] reply
[10.25.25.41.6081 > 10.25.24.43.39026] zone 0



3879878550702925 [swapper/125] 0 [tp] net:napi_gro_receive_entry
#dc8ba9ec94b4dff4167e881720000 (skb 18393096588613649408) n 0
   if 43 (genev_sys_6081)
   + 3879878550705930 [swapper/125] 0 [k] tcp_gro_receive
#dc8ba9ec94b4dff4167e881720000 (skb 18393096588613649408) n 1
     if 43 (genev_sys_6081) 172.27.31.189.9100 > 172.27.22.90.40735 ttl
64 tos 0x0 id 0 off 0 [DF] len 52 proto TCP (6) flags [S.] seq
3930851750 ack 1864480617 win 42340
   + 3879878550706899 [swapper/125] 0 [tp] net:netif_receive_skb
#dc8ba9ec94b4dff4167e881720000 (skb 18393096588613649408) n 2
     if 43 (genev_sys_6081) 172.27.31.189.9100 > 172.27.22.90.40735 ttl
64 tos 0x0 id 0 off 0 [DF] len 52 proto TCP (6) flags [S.] seq
3930851750 ack 1864480617 win 42340
   + 3879878550708324 [swapper/125] 0 [tp] skb:kfree_skb
#dc8ba9ec94b4dff4167e881720000 (skb 18393096588613649408) n 3 drop
(TC_INGRESS)
     if 43 (genev_sys_6081) rxif 43 172.27.31.189.9100 >
172.27.22.90.40735 ttl 64 tos 0x0 id 0 off 0 [DF] len 52 proto TCP (6)
flags [S.] seq 3930851750 ack 1864480617 win 42340

3879879581610887 [swapper/125] 0 [tp] net:napi_gro_receive_entry
#dc8badc3bb387ff4167e881720400 (skb 18393097140186661632) n 0
   if 43 (genev_sys_6081)
   + 3879879581614196 [swapper/125] 0 [k] tcp_gro_receive
#dc8badc3bb387ff4167e881720400 (skb 18393097140186661632) n 1
     if 43 (genev_sys_6081) 172.27.31.189.9100 > 172.27.22.90.40735 ttl
64 tos 0x0 id 0 off 0 [DF] len 52 proto TCP (6) flags [S.] seq
3930851750 ack 1864480617 win 42340
   + 3879879581615551 [swapper/125] 0 [tp] net:netif_receive_skb
#dc8badc3bb387ff4167e881720400 (skb 18393097140186661632) n 2
     if 43 (genev_sys_6081) 172.27.31.189.9100 > 172.27.22.90.40735 ttl
64 tos 0x0 id 0 off 0 [DF] len 52 proto TCP (6) flags [S.] seq
3930851750 ack 1864480617 win 42340
   + 3879879581617380 [swapper/125] 0 [tp] skb:kfree_skb
#dc8badc3bb387ff4167e881720400 (skb 18393097140186661632) n 3 drop
(TC_INGRESS)
     if 43 (genev_sys_6081) rxif 43 172.27.31.189.9100 >
172.27.22.90.40735 ttl 64 tos 0x0 id 0 off 0 [DF] len 52 proto TCP (6)
flags [S.] seq 3930851750 ack 1864480617 win 42340


When doing a tc monitor I am not even seeing the tc monitor entry for
this session (tcp src port 40735)

Again a restart of Open vSwitch fixes the issue - but this is a heavy hammer!

Gav

On Thu, 18 Apr 2024 at 17:55, Gavin McKee <gavmcke...@googlemail.com> wrote:

I thought ct was in that retis trace . I’ll need to capture these events when 
they occur again .

I’m also having some issues with huge amounts of loss in the kernal pipeline 
that I can’t explain .

When sending traffic VM to VM within a logical switch but going across GENEVE 
tunnel I get 30 - 40 Gbps on iperf . When I use the DAT (by sending traffic to 
the VM DNAT ) I get huge amounts of loss in the sender kernal , retis is saying 
sk_bff drops .

3.2.2 is absolutely killing me right now. Should I open another thread on this ?

On Thu, Apr 18, 2024 at 00:15 Adrian Moreno <amore...@redhat.com> wrote:

Hi Gavin

On 4/18/24 02:38, Gavin McKee via discuss wrote:
This is an example.

Again the TCP 3 handshake completes , but the next packet fails to NAT
and goes out onto the physical network using the private address .  An
example of this is in the packet trace I provided.


Given you were using retis in your initial troubleshooting, you can use it with
the additional "ct" collector to see if the kernel datapath is retrieving the
right conntrack entry and what's its state. If that shows some unexpected
conntrack entry change, it can be confirmed by monitoring "conntrack -E".

Additionally, a dp flow dump (ovs-appctl dpctl/dump-flows -m) when the problem
is happening might be also useful.

--
Adrián


ovs-appctl ofproto/trace br-int
in_port=7753,dl_src=4e:42:14:a1:2a:fb,dl_dst=1a:16:b1:58:e1:cd,tcp,nw_src=172.27.18.244,nw_dst=104.18.3.35,nw_ttl=32,tcp_src=52776,tcp_dst=443,tcp_flags=2
Flow: 
tcp,in_port=7753,vlan_tci=0x0000,dl_src=4e:42:14:a1:2a:fb,dl_dst=1a:16:b1:58:e1:cd,nw_src=172.27.18.244,nw_dst=104.18.3.35,nw_tos=0,nw_ecn=0,nw_ttl=32,nw_frag=no,tp_src=52776,tp_dst=443,tcp_flags=syn

bridge("br-int")
----------------
   0. in_port=7753, priority 100, cookie 0xc33f39c4
      set_field:0x16d->reg13
      set_field:0x155->reg11
      set_field:0x1cf->reg12
      set_field:0x12->metadata
      set_field:0xca->reg14
      resubmit(,8)
   8. metadata=0x12, priority 50, cookie 0x1645d3f2
      set_field:0/0x1000->reg10
      resubmit(,73)
      73. 
ip,reg14=0xca,metadata=0x12,dl_src=4e:42:14:a1:2a:fb,nw_src=172.27.18.244,
priority 90, cookie 0xc33f39c4
              set_field:0/0x1000->reg10
      move:NXM_NX_REG10[12]->NXM_NX_XXREG0[111]
       -> NXM_NX_XXREG0[111] is now 0
      resubmit(,9)
   9. metadata=0x12, priority 0, cookie 0xcc4fd106
      resubmit(,10)
10. metadata=0x12, priority 0, cookie 0x9e10ad0e
      resubmit(,11)
11. metadata=0x12, priority 0, cookie 0x557f3249
      resubmit(,12)
12. ip,metadata=0x12, priority 100, cookie 0x14131a67
      set_field:0x1000000000000000000000000/0x1000000000000000000000000->xxreg0
      resubmit(,13)
13. metadata=0x12, priority 0, cookie 0x85f9ed4f
      resubmit(,14)
14. ip,reg0=0x1/0x1,metadata=0x12, priority 100, cookie 0x279651c
      ct(table=15,zone=NXM_NX_REG13[0..15])
      drop
       -> A clone of the packet is forked to recirculate. The forked
pipeline will be resumed at table 15.
       -> Sets the packet to an untracked state, and clears all the
conntrack fields.

Final flow: 
tcp,reg0=0x1,reg11=0x155,reg12=0x1cf,reg13=0x16d,reg14=0xca,metadata=0x12,in_port=7753,vlan_tci=0x0000,dl_src=4e:42:14:a1:2a:fb,dl_dst=1a:16:b1:58:e1:cd,nw_src=172.27.18.244,nw_dst=104.18.3.35,nw_tos=0,nw_ecn=0,nw_ttl=32,nw_frag=no,tp_src=52776,tp_dst=443,tcp_flags=syn
Megaflow: 
recirc_id=0,eth,tcp,in_port=7753,dl_src=4e:42:14:a1:2a:fb,dl_dst=1a:16:b1:58:e1:cd,nw_src=172.27.18.244,nw_frag=no
Datapath actions: ct(zone=365),recirc(0x13b5a)

===============================================================================
recirc(0x13b5a) - resume conntrack with default ct_state=trk|new (use
--ct-next to customize)
===============================================================================

Flow: 
recirc_id=0x13b5a,ct_state=new|trk,ct_zone=365,eth,tcp,reg0=0x1,reg11=0x155,reg12=0x1cf,reg13=0x16d,reg14=0xca,metadata=0x12,in_port=7753,vlan_tci=0x0000,dl_src=4e:42:14:a1:2a:fb,dl_dst=1a:16:b1:58:e1:cd,nw_src=172.27.18.244,nw_dst=104.18.3.35,nw_tos=0,nw_ecn=0,nw_ttl=32,nw_frag=no,tp_src=52776,tp_dst=443,tcp_flags=syn

bridge("br-int")
----------------
      thaw
          Resuming from table 15
15. ct_state=+new-est+trk,metadata=0x12, priority 7, cookie 0xa9e0ee6f
      
set_field:0x80000000000000000000000000/0x80000000000000000000000000->xxreg0
      
set_field:0x200000000000000000000000000/0x200000000000000000000000000->xxreg0
      resubmit(,16)
16. conj_id=2865573479,tcp,reg0=0x80/0x80,reg14=0xca,metadata=0x12,
priority 3000, cookie 0xabdec111
      set_field:0x1000000000000/0x1000000000000->xreg4
      set_field:0x2000000000000000000000000/0x2000000000000000000000000->xxreg0
      resubmit(,17)
17. reg8=0x10000/0x10000,metadata=0x12, priority 1000, cookie 0x8171c04a
      set_field:0/0x1000000000000->xreg4
      set_field:0/0x2000000000000->xreg4
      set_field:0/0x4000000000000->xreg4
      resubmit(,18)
18. metadata=0x12, priority 0, cookie 0x62454929
      resubmit(,19)
19. metadata=0x12, priority 0, cookie 0x3bb47080
      resubmit(,20)
20. metadata=0x12, priority 0, cookie 0x73face9d
      resubmit(,21)
21. metadata=0x12, priority 0, cookie 0x8e46634d
      resubmit(,22)
22. metadata=0x12, priority 0, cookie 0xbddac461
      resubmit(,23)
23. metadata=0x12, priority 0, cookie 0x9a32c0de
      resubmit(,24)
24. metadata=0x12, priority 0, cookie 0x79b1c074
      resubmit(,25)
25. metadata=0x12, priority 0, cookie 0xc02374d3
      resubmit(,26)
26. metadata=0x12, priority 0, cookie 0x218dc750
      resubmit(,27)
27. metadata=0x12, priority 0, cookie 0xf6943631
      set_field:0/0x1000000000000->xreg4
      set_field:0/0x2000000000000->xreg4
      set_field:0/0x4000000000000->xreg4
      resubmit(,28)
28. ip,reg0=0x2/0x2002,metadata=0x12, priority 100, cookie 0x7fdca4fb
      
ct(commit,zone=NXM_NX_REG13[0..15],nat(src),exec(set_field:0/0x1->ct_mark))
      nat(src)
      set_field:0/0x1->ct_mark
       -> Sets the packet to an untracked state, and clears all the
conntrack fields.
      resubmit(,29)
29. metadata=0x12, priority 0, cookie 0x34e5fac7
      resubmit(,30)
30. metadata=0x12, priority 0, cookie 0xac3f53bd
      resubmit(,31)
31. metadata=0x12, priority 0, cookie 0x54000bd5
      resubmit(,32)
32. metadata=0x12, priority 0, cookie 0x29ab49f3
      resubmit(,33)
33. metadata=0x12, priority 0, cookie 0xae6aefcb
      resubmit(,34)
34. metadata=0x12, priority 0, cookie 0x632dc93b
      resubmit(,35)
35. metadata=0x12,dl_dst=1a:16:b1:58:e1:cd, priority 50, cookie 0x3fa33935
      set_field:0x1->reg15
      resubmit(,37)
37. priority 0
      resubmit(,39)
39. priority 0
      resubmit(,40)
40. reg15=0x1,metadata=0x12, priority 100, cookie 0x95c785db
      set_field:0x155->reg11
      set_field:0x1cf->reg12
      resubmit(,41)
41. priority 0
      set_field:0->reg0
      set_field:0->reg1
      set_field:0->reg2
      set_field:0->reg3
      set_field:0->reg4
      set_field:0->reg5
      set_field:0->reg6
      set_field:0->reg7
      set_field:0->reg8
      set_field:0->reg9
      resubmit(,42)
42. ip,reg15=0x1,metadata=0x12, priority 110, cookie 0xc9a79824
      resubmit(,43)
43. ip,reg15=0x1,metadata=0x12, priority 110, cookie 0xac7d5d78
      resubmit(,44)
44. metadata=0x12, priority 0, cookie 0xa1ddb4f6
      resubmit(,45)
45. ct_state=-trk,metadata=0x12, priority 5, cookie 0xb2622a65
      
set_field:0x100000000000000000000000000/0x100000000000000000000000000->xxreg0
      
set_field:0x200000000000000000000000000/0x200000000000000000000000000->xxreg0
      resubmit(,46)
46. metadata=0x12, priority 0, cookie 0x267ae0e3
      resubmit(,47)
47. metadata=0x12, priority 0, cookie 0x914392c3
      set_field:0/0x1000000000000->xreg4
      set_field:0/0x2000000000000->xreg4
      set_field:0/0x4000000000000->xreg4
      resubmit(,48)
48. metadata=0x12, priority 0, cookie 0xc4f6a97f
      resubmit(,49)
49. metadata=0x12, priority 0, cookie 0x8ebb0c71
      resubmit(,50)
50. metadata=0x12, priority 0, cookie 0x61e385f4
      resubmit(,51)
51. metadata=0x12, priority 0, cookie 0x4c722ce1
      set_field:0/0x1000->reg10
      resubmit(,75)
      75. No match.
              drop
      move:NXM_NX_REG10[12]->NXM_NX_XXREG0[111]
       -> NXM_NX_XXREG0[111] is now 0
      resubmit(,52)
52. metadata=0x12, priority 0, cookie 0x3c01b89d
      resubmit(,64)
64. priority 0
      resubmit(,65)
65. reg15=0x1,metadata=0x12, priority 100, cookie 0x95c785db
      
clone(ct_clear,set_field:0->reg11,set_field:0->reg12,set_field:0->reg13,set_field:0x4cc->reg11,set_field:0x383->reg12,set_field:0x13->metadata,set_field:0x1->reg14,set_field:0->reg10,set_field:0->reg15,set_field:0->reg0,set_field:0->reg1,set_field:0->reg2,set_field:0->reg3,set_field:0->reg4,set_field:0->reg5,set_field:0->reg6,set_field:0->reg7,set_field:0->reg8,set_field:0->reg9,resubmit(,8))
      ct_clear
      set_field:0->reg11
      set_field:0->reg12
      set_field:0->reg13
      set_field:0x4cc->reg11
      set_field:0x383->reg12
      set_field:0x13->metadata
      set_field:0x1->reg14
      set_field:0->reg10
      set_field:0->reg15
      set_field:0->reg0
      set_field:0->reg1
      set_field:0->reg2
      set_field:0->reg3
      set_field:0->reg4
      set_field:0->reg5
      set_field:0->reg6
      set_field:0->reg7
      set_field:0->reg8
      set_field:0->reg9
      resubmit(,8)
   8. reg14=0x1,metadata=0x13,dl_dst=1a:16:b1:58:e1:cd, priority 50,
cookie 0x8522832c
      
set_field:0x1a16b158e1cd0000000000000000/0xffffffffffff0000000000000000->xxreg0
      resubmit(,9)
   9. metadata=0x13, priority 0, cookie 0xef4dd8a9
      set_field:0x4/0x4->xreg4
      resubmit(,10)
10. reg9=0x4/0x4,metadata=0x13, priority 100, cookie 0x37718dd7
      resubmit(,79)
      79. 
ip,reg14=0x1,metadata=0x13,dl_src=4e:42:14:a1:2a:fb,nw_src=172.27.18.244,
priority 100, cookie 0x2e544767
              drop
      resubmit(,11)
11. metadata=0x13, priority 0, cookie 0x3c2eb1b
      resubmit(,12)
12. metadata=0x13, priority 0, cookie 0x44f62446
      resubmit(,13)
13. metadata=0x13, priority 0, cookie 0x3842bec6
      resubmit(,14)
14. metadata=0x13, priority 0, cookie 0xfd7b2ab9
      resubmit(,15)
15. metadata=0x13, priority 0, cookie 0xefbd7e27
      resubmit(,16)
16. metadata=0x13, priority 0, cookie 0xf439d853
      resubmit(,17)
17. metadata=0x13, priority 0, cookie 0x123f01f0
      resubmit(,18)
18. metadata=0x13, priority 0, cookie 0x142cd59b
      resubmit(,19)
19. metadata=0x13, priority 0, cookie 0x297e0190
      resubmit(,20)
20. metadata=0x13, priority 0, cookie 0x61e9e3c7
      set_field:0/0xffffffff->xxreg1
      resubmit(,21)
21. ip,reg7=0,metadata=0x13, priority 1, cookie 0x9f114f3d
      dec_ttl()
      set_field:0/0xffff00000000->xreg4
      
set_field:0x64646401000000000000000000000000/0xffffffff000000000000000000000000->xxreg0
      set_field:0x646464030000000000000000/0xffffffff0000000000000000->xxreg0
      set_field:2a:d5:d1:e8:89:cc->eth_src
      set_field:0x2->reg15
      set_field:0x1/0x1->reg10
      resubmit(,22)
22. reg8=0/0xffff,metadata=0x13, priority 150, cookie 0x6ece899a
      resubmit(,23)
23. metadata=0x13, priority 0, cookie 0x72437536
      set_field:0/0xffff00000000->xreg4
      resubmit(,24)
24. reg8=0/0xffff,metadata=0x13, priority 150, cookie 0xce06f1
      resubmit(,25)
25. ip,metadata=0x13, priority 1, cookie 0xf690342c
      push:NXM_NX_REG0[]
      push:NXM_NX_XXREG0[96..127]
      pop:NXM_NX_REG0[]
       -> NXM_NX_REG0[] is now 0x64646401
      set_field:00:00:00:00:00:00->eth_dst
      resubmit(,66)
      66. reg0=0x64646401,reg15=0x2,metadata=0x13, priority 100, cookie 
0x97b4353d
              set_field:00:00:5e:00:01:ff->eth_dst
              set_field:0x40/0x40->reg10
      pop:NXM_NX_REG0[]
       -> NXM_NX_REG0[] is now 0x64646401
      resubmit(,26)
26. metadata=0x13, priority 0, cookie 0x6c2ad4cc
      resubmit(,27)
27. metadata=0x13, priority 0, cookie 0xadc9fb4f
      resubmit(,28)
28. ip,reg15=0x2,metadata=0x13,nw_src=172.27.18.244, priority 100,
cookie 0xfa923492
      set_field:4e:42:14:a1:2a:fb->eth_src
      set_field:0xcc3418740000000000000000/0xffffffff0000000000000000->xxreg0
      resubmit(,29)
29. metadata=0x13, priority 0, cookie 0x571bbf77
      resubmit(,37)
37. priority 0
      resubmit(,39)
39. priority 0
      resubmit(,40)
40. reg15=0x2,metadata=0x13, priority 100, cookie 0x407086d0
      set_field:0x4cc->reg11
      set_field:0x383->reg12
      resubmit(,41)
41. priority 0
      set_field:0->reg0
      set_field:0->reg1
      set_field:0->reg2
      set_field:0->reg3
      set_field:0->reg4
      set_field:0->reg5
      set_field:0->reg6
      set_field:0->reg7
      set_field:0->reg8
      set_field:0->reg9
      resubmit(,42)
42. metadata=0x13, priority 0, cookie 0x4c3b7fcc
      set_field:0/0x10->xreg4
      resubmit(,43)
43. ip,reg15=0x2,metadata=0x13,nw_src=172.27.18.244, priority 100,
cookie 0x540f90b2
      set_field:4e:42:14:a1:2a:fb->eth_src
      ct(table=44,zone=NXM_NX_REG11[0..15],nat)
      nat
       -> A clone of the packet is forked to recirculate. The forked
pipeline will be resumed at table 44.
       -> Sets the packet to an untracked state, and clears all the
conntrack fields.

Final flow: 
recirc_id=0x13b5a,eth,tcp,reg0=0x300,reg11=0x155,reg12=0x1cf,reg13=0x16d,reg14=0xca,reg15=0x1,metadata=0x12,in_port=7753,vlan_tci=0x0000,dl_src=4e:42:14:a1:2a:fb,dl_dst=1a:16:b1:58:e1:cd,nw_src=172.27.18.244,nw_dst=104.18.3.35,nw_tos=0,nw_ecn=0,nw_ttl=32,nw_frag=no,tp_src=52776,tp_dst=443,tcp_flags=syn
Megaflow: 
recirc_id=0x13b5a,ct_state=+new-est-rel-rpl-inv+trk,ct_mark=0/0x1,eth,tcp,in_port=7753,dl_src=4e:42:14:a1:2a:fb,dl_dst=1a:16:b1:58:e1:cd,nw_src=172.27.18.244,nw_dst=104.0.0.0/5,nw_ttl=32,nw_frag=no,tp_src=0x8000/0x8000,tp_dst=0x180/0xffc0
Datapath actions:
ct(commit,zone=365,mark=0/0x1,nat(src)),set(eth(dst=00:00:5e:00:01:ff)),set(ipv4(ttl=31)),ct(zone=1228,nat),recirc(0x13b5c)

===============================================================================
recirc(0x13b5c) - resume conntrack with default ct_state=trk|new (use
--ct-next to customize)
Replacing src/dst IP/ports to simulate NAT:
   Initial flow:
   Modified flow:
===============================================================================

Flow: 
recirc_id=0x13b5c,ct_state=new|trk,ct_zone=1228,eth,tcp,reg10=0x41,reg11=0x4cc,reg12=0x383,reg14=0x1,reg15=0x2,metadata=0x13,in_port=7753,vlan_tci=0x0000,dl_src=4e:42:14:a1:2a:fb,dl_dst=00:00:5e:00:01:ff,nw_src=172.27.18.244,nw_dst=104.18.3.35,nw_tos=0,nw_ecn=0,nw_ttl=31,nw_frag=no,tp_src=52776,tp_dst=443,tcp_flags=syn

bridge("br-int")
----------------
      thaw
          Resuming from table 44
44. metadata=0x13, priority 0, cookie 0x48ef0c42
      resubmit(,45)
45. ct_state=-rpl+trk,ip,reg15=0x2,metadata=0x13,nw_src=172.27.18.244,
priority 161, cookie 0x5414262a
      set_field:4e:42:14:a1:2a:fb->eth_src
      ct(commit,table=46,zone=NXM_NX_REG12[0..15],nat(src=204.52.24.116))
      nat(src=204.52.24.116)
       -> A clone of the packet is forked to recirculate. The forked
pipeline will be resumed at table 46.
       -> Sets the packet to an untracked state, and clears all the
conntrack fields.

Final flow: 
recirc_id=0x13b5c,eth,tcp,reg10=0x41,reg11=0x4cc,reg12=0x383,reg14=0x1,reg15=0x2,metadata=0x13,in_port=7753,vlan_tci=0x0000,dl_src=4e:42:14:a1:2a:fb,dl_dst=00:00:5e:00:01:ff,nw_src=172.27.18.244,nw_dst=104.18.3.35,nw_tos=0,nw_ecn=0,nw_ttl=31,nw_frag=no,tp_src=52776,tp_dst=443,tcp_flags=syn
Megaflow: 
recirc_id=0x13b5c,ct_state=-rpl+trk,eth,ip,in_port=7753,dl_src=4e:42:14:a1:2a:fb,nw_src=172.27.18.244,nw_frag=no
Datapath actions: ct(commit,zone=899,nat(src=204.52.24.116)),recirc(0x25f9b)

===============================================================================
recirc(0x25f9b) - resume conntrack with default ct_state=trk|new (use
--ct-next to customize)
Replacing src/dst IP/ports to simulate NAT:
   Initial flow: nw_src=172.27.18.244,tp_src=52776,nw_dst=104.18.3.35,tp_dst=443
   Modified flow: 
nw_src=204.52.24.116,tp_src=52776,nw_dst=104.18.3.35,tp_dst=443
===============================================================================

Flow: 
recirc_id=0x25f9b,ct_state=new|trk,ct_zone=899,eth,tcp,reg10=0x41,reg11=0x4cc,reg12=0x383,reg14=0x1,reg15=0x2,metadata=0x13,in_port=7753,vlan_tci=0x0000,dl_src=4e:42:14:a1:2a:fb,dl_dst=00:00:5e:00:01:ff,nw_src=204.52.24.116,nw_dst=104.18.3.35,nw_tos=0,nw_ecn=0,nw_ttl=31,nw_frag=no,tp_src=52776,tp_dst=443,tcp_flags=syn

bridge("br-int")
----------------
      thaw
          Resuming from table 46
46. metadata=0x13, priority 0, cookie 0x3800cb88
      resubmit(,47)
47. metadata=0x13, priority 0, cookie 0xc2575be2
      resubmit(,48)
48. reg15=0x2,metadata=0x13, priority 100, cookie 0xaf19a22a
      resubmit(,64)
64. reg10=0x1/0x1,reg15=0x2,metadata=0x13, priority 100, cookie 0x407086d0
      push:NXM_OF_IN_PORT[]
      set_field:ANY->in_port
      resubmit(,65)
      65. reg15=0x2,metadata=0x13, priority 100, cookie 0x407086d0
              
clone(ct_clear,set_field:0->reg11,set_field:0->reg12,set_field:0->reg13,set_field:0x2c2->reg11,set_field:0x327->reg12,set_field:0x1->metadata,set_field:0xa->reg14,set_field:0->reg10,set_field:0->reg15,set_field:0->reg0,set_field:0->reg1,set_field:0->reg2,set_field:0->reg3,set_field:0->reg4,set_field:0->reg5,set_field:0->reg6,set_field:0->reg7,set_field:0->reg8,set_field:0->reg9,resubmit(,8))
              ct_clear
              set_field:0->reg11
              set_field:0->reg12
              set_field:0->reg13
              set_field:0x2c2->reg11
              set_field:0x327->reg12
              set_field:0x1->metadata
              set_field:0xa->reg14
              set_field:0->reg10
              set_field:0->reg15
              set_field:0->reg0
              set_field:0->reg1
              set_field:0->reg2
              set_field:0->reg3
              set_field:0->reg4
              set_field:0->reg5
              set_field:0->reg6
              set_field:0->reg7
              set_field:0->reg8
              set_field:0->reg9
              resubmit(,8)
           8. metadata=0x1, priority 50, cookie 0x1645d3f2
              set_field:0/0x1000->reg10
              resubmit(,73)
              73. No match.
                      drop
              move:NXM_NX_REG10[12]->NXM_NX_XXREG0[111]
               -> NXM_NX_XXREG0[111] is now 0
              resubmit(,9)
           9. metadata=0x1, priority 0, cookie 0xcc4fd106
              resubmit(,10)
          10. metadata=0x1, priority 0, cookie 0x9e10ad0e
              resubmit(,11)
          11. metadata=0x1, priority 0, cookie 0x557f3249
              resubmit(,12)
          12. metadata=0x1, priority 0, cookie 0x915c56b1
              resubmit(,13)
          13. ip,reg14=0xa,metadata=0x1, priority 110, cookie 0xccdcd3e1
              resubmit(,14)
          14. metadata=0x1, priority 0, cookie 0x134ce32f
              resubmit(,15)
          15. metadata=0x1, priority 65535, cookie 0x49627e5f
              resubmit(,16)
          16. metadata=0x1, priority 65535, cookie 0xc947843d
              resubmit(,17)
          17. metadata=0x1, priority 0, cookie 0xd2fb4d2
              resubmit(,18)
          18. metadata=0x1, priority 0, cookie 0x62454929
              resubmit(,19)
          19. metadata=0x1, priority 0, cookie 0x3bb47080
              resubmit(,20)
          20. metadata=0x1, priority 0, cookie 0x73face9d
              resubmit(,21)
          21. metadata=0x1, priority 0, cookie 0x8e46634d
              resubmit(,22)
          22. metadata=0x1, priority 0, cookie 0xbddac461
              resubmit(,23)
          23. metadata=0x1, priority 0, cookie 0x9a32c0de
              resubmit(,24)
          24. metadata=0x1, priority 0, cookie 0x79b1c074
              resubmit(,25)
          25. metadata=0x1, priority 0, cookie 0xc02374d3
              resubmit(,26)
          26. metadata=0x1, priority 0, cookie 0x218dc750
              resubmit(,27)
          27. metadata=0x1, priority 0, cookie 0x8db4eebc
              resubmit(,28)
          28. metadata=0x1, priority 0, cookie 0x6deecbbe
              resubmit(,29)
          29. metadata=0x1, priority 0, cookie 0x34e5fac7
              resubmit(,30)
          30. metadata=0x1, priority 0, cookie 0xac3f53bd
              resubmit(,31)
          31. metadata=0x1, priority 0, cookie 0x54000bd5
              resubmit(,32)
          32. metadata=0x1, priority 0, cookie 0x29ab49f3
              resubmit(,33)
          33. metadata=0x1, priority 0, cookie 0xae6aefcb
              resubmit(,34)
          34. metadata=0x1, priority 0, cookie 0x632dc93b
              resubmit(,35)
          35. metadata=0x1, priority 0, cookie 0x9961115c
              set_field:0->reg15
              resubmit(,71)
              71. No match.
                      drop
              resubmit(,36)
          36. reg15=0,metadata=0x1, priority 50, cookie 0x62255993
              set_field:0x8001->reg15
              resubmit(,37)
          37. priority 0
              resubmit(,39)
          39. priority 0
              resubmit(,40)
          40. reg15=0x8001,metadata=0x1, priority 100, cookie 0x11699429
              set_field:0x14->reg13
              set_field:0x1->reg15
              resubmit(,41)
              41. priority 0
                      set_field:0->reg0
                      set_field:0->reg1
                      set_field:0->reg2
                      set_field:0->reg3
                      set_field:0->reg4
                      set_field:0->reg5
                      set_field:0->reg6
                      set_field:0->reg7
                      set_field:0->reg8
                      set_field:0->reg9
                      resubmit(,42)
                  42. metadata=0x1, priority 0, cookie 0x9ad1480d
                      resubmit(,43)
                  43. ip,reg15=0x1,metadata=0x1, priority 110, cookie 0xca4977d3
                      ct_clear
                      resubmit(,44)
                  44. metadata=0x1, priority 0, cookie 0xa1ddb4f6
                      resubmit(,45)
                  45. metadata=0x1, priority 65535, cookie 0x91b4350e
                      resubmit(,46)
                  46. metadata=0x1, priority 65535, cookie 0xfc9c351d
                      resubmit(,47)
                  47. metadata=0x1, priority 0, cookie 0xb984696e
                      resubmit(,48)
                  48. metadata=0x1, priority 0, cookie 0xc4f6a97f
                      resubmit(,49)
                  49. metadata=0x1, priority 0, cookie 0x8ebb0c71
                      resubmit(,50)
                  50. metadata=0x1, priority 0, cookie 0x61e385f4
                      resubmit(,51)
                  51. metadata=0x1, priority 0, cookie 0x4c722ce1
                      set_field:0/0x1000->reg10
                      resubmit(,75)
                      75. No match.
                              drop
                      move:NXM_NX_REG10[12]->NXM_NX_XXREG0[111]
                       -> NXM_NX_XXREG0[111] is now 0
                      resubmit(,52)
                  52. metadata=0x1, priority 0, cookie 0x3c01b89d
                      resubmit(,64)
                  64. priority 0
                      resubmit(,65)
                  65. reg15=0x1,metadata=0x1, priority 100, cookie 0x4581da82
                      push_vlan:0x8100
                      set_field:4216->vlan_vid
                      output:7754

                      bridge("br-provider")
                      ---------------------
                           0. priority 0
                              NORMAL
                               -> forwarding to learned port
                      pop_vlan
              set_field:0x8001->reg15
      pop:NXM_OF_IN_PORT[]
       -> NXM_OF_IN_PORT[] is now 7753

Final flow: unchanged
Megaflow: 
recirc_id=0x25f9b,ct_state=+new-est-rel-rpl-inv+trk,ct_mark=0/0x1,eth,ip,in_port=7753,dl_src=4e:42:14:a1:2a:fb,dl_dst=00:00:5e:00:01:ff,nw_dst=0.0.0.0/1,nw_frag=no
Datapath actions: ct_clear,push_vlan(vid=120,pcp=0),5

On Wed, 17 Apr 2024 at 17:32, Gavin McKee <gavmcke...@googlemail.com> wrote:

That information is all in the email

The openflow trace is showing that the pipeline is fine . This is why I’m 
worried about a deeper issue with the kernal / openvswitch kernal module / 
connection tracking



On Wed, Apr 17, 2024 at 16:33 Flavio Leitner <f...@sysclose.org> wrote:

On Wed, 17 Apr 2024 12:26:27 -0700
Gavin McKee <gavmcke...@googlemail.com> wrote:

Hi Flavio,

I had to restart the Open vSwitch across 16 machines to resolve the
issue for a customer .  I think it will occur again and when it does
I'll use that command to gather the tc information.

Until then I think I have found why the issue is occurring .

Take a look at the output below (this is a packet capture from the
physical interface on the compute node , so traffic that has gone
through the OVS Openflow pipeline) - we make a 3 way handshake with R2
, and establish the connection.  A packet goes missing - TLS
handshake, it then appears that it hasn't gone through NAT as it's
using the Private IP of the VM  .


If that is the case, you might be able to see if the data path
is matching correctly and the actions are using NAT with
ovs-appctl ofproto/trace.

fbl



Take a look at frame 14

No.     Time               Source                Destination
Protocol Length Info
           Delta
       14 09:24:08.064432    172.27.18.244         104.18.2.35
TLSv1    502    Client Hello, Alert (Level: Fatal, Description: Decode
Error)   2.362983

Frame 14: 502 bytes on wire (4016 bits), 502 bytes captured (4016
bits) Ethernet II, Src: 4e:42:14:a1:2a:fb (4e:42:14:a1:2a:fb), Dst:
IETF-VRRP-VRID_ff (00:00:5e:00:01:ff)
802.1Q Virtual LAN, PRI: 0, DEI: 0, ID: 120
Internet Protocol Version 4, Src: 172.27.18.244, Dst: 104.18.2.35
Transmission Control Protocol, Src Port: 57394, Dst Port: 443, Seq: 1,
Ack: 1, Len: 444
Transport Layer Security
      TLSv1 Record Layer: Handshake Protocol: Client Hello
          Content Type: Handshake (22)
          Version: TLS 1.0 (0x0301)
          Length: 432
          Handshake Protocol: Client Hello
      TLSv1 Record Layer: Alert (Level: Fatal, Description: Decode
Error) Content Type: Alert (21)
          Version: TLS 1.0 (0x0301)
          Length: 2
          Alert Message
              Level: Fatal (2)
              Description: Decode Error (50)



------------------------------------------------------------------------------------------------------------

On Wed, 17 Apr 2024 at 11:28, Flavio Leitner <f...@sysclose.org> wrote:


Hi Gavin,

It would be helpful if you can provide some TC dumps from the
"good" state to the "bad" state to see how it was and what changes.
Something like:

# tc -s filter show dev enp148s0f0_1 ingress

I haven't checked the attached files, but one suggestion is to
check if this is not a csum issue.

Thanks,
fbl


On Tue, 16 Apr 2024 13:17:10 -0700
Gavin McKee via discuss <ovs-discuss@openvswitch.org> wrote:

Adding information relating to the Open VSwitch kernal module
@Ilya Maximets @Numan Siddique  Can either of you help out here?


modinfo openvswitch
filename:
/lib/modules/5.14.0-362.8.1.el9_3.x86_64/kernel/net/openvswitch/openvswitch.ko.xz
alias:          net-pf-16-proto-16-family-ovs_ct_limit
alias:          net-pf-16-proto-16-family-ovs_meter
alias:          net-pf-16-proto-16-family-ovs_packet
alias:          net-pf-16-proto-16-family-ovs_flow
alias:          net-pf-16-proto-16-family-ovs_vport
alias:          net-pf-16-proto-16-family-ovs_datapath
license:        GPL
description:    Open vSwitch switching datapath
rhelversion:    9.3
srcversion:     8A2159D727C8BADC82261B8
depends:        nf_conntrack,nf_conncount,libcrc32c,nf_nat
retpoline:      Y
intree:         Y
name:           openvswitch
vermagic:       5.14.0-362.8.1.el9_3.x86_64 SMP preempt mod_unload
modversions sig_id:         PKCS#7
signer:         Rocky kernel signing key
sig_key:
17:CA:DE:1F:EC:D1:59:2D:9F:52:34:C6:7C:09:06:81:3D:74:7C:F7
sig_hashalgo:   sha256 signature:
67:31:56:70:86:DB:57:69:8D:4A:9B:A7:ED:17:F3:67:65:98:97:08:
1F:FB:4D:F8:A8:2D:7C:A7:7D:3A:57:85:CA:67:9D:82:72:EB:54:14:
F2:BB:40:78:AD:85:56:2D:EF:D5:00:95:38:A4:86:9F:5F:29:1A:81:
32:94:B4:87:41:94:A0:3E:71:A5:97:44:2E:42:DD:F7:42:6B:69:94:
E3:AB:6E:E5:4F:C9:60:57:70:07:5F:CA:C7:83:7A:2F:C7:81:62:FF:
53:AF:AC:2B:06:D8:08:D3:1D:A7:F0:43:10:98:DE:B1:62:AE:89:A5:
FE:EF:74:09:0F:2D:0F:D9:73:A5:59:75:D0:87:1E:EA:3A:40:86:1E:
76:E5:E7:3B:59:2E:3A:7E:65:F3:92:A1:B4:84:48:3F:43:A0:D7:1C:
21:29:E0:B6:D1:10:36:15:88:43:6A:11:8F:55:EE:1B:F9:53:3B:86:
EF:81:71:17:81:08:EC:53:30:D6:69:8E:13:11:D5:DF:15:75:88:50:
69:19:51:3B:41:6B:6F:E0:7A:30:33:32:E6:60:18:02:A6:0C:63:9B:
C5:D7:2F:6A:D0:BA:45:03:19:0E:21:E8:18:FB:E8:D1:C1:33:05:36:
1F:9B:0F:29:3F:05:51:7A:30:86:88:B7:C7:44:2E:2B:50:F9:EF:4F:
D4:70:EA:1B:33:E2:F0:E3:E2:88:00:E5:BF:06:E2:D4:B7:81:EE:6E:
89:02:18:65:8B:1C:84:42:2F:89:14:63:1D:51:70:37:42:C5:68:DD:
4D:12:7B:07:33:2B:C6:BC:8F:7F:23:D7:58:DF:47:AC:DE:08:67:FE:
CB:E8:E6:4D:95:2F:6B:F5:07:4D:32:92:80:0A:7C:D1:B6:81:EE:AB:
26:C3:C6:22:77:00:5E:64:DE:96:0E:9F:A4:A0:F0:45:9F:19:73:EB:
CC:60:AE:E9:63:E2:6D:2E:BA:65:9B:BD:04:CC:13:C2:55:88:05:03:
1B:30:18:8B

On Tue, 16 Apr 2024 at 11:12, Gavin McKee
<gavmcke...@googlemail.com> wrote:

Hi,

I need some help with strange OVS behaviours.

ovs-vsctl (Open vSwitch) 3.2.2
ovn-controller 23.09.1
Open vSwitch Library 3.2.2

TLDR: We need to restart Open VSwitch in order for TLS traffic
to work between a VM and Cloudflare R2.  After restarting Open
VSwitch the TLS connection works fine.
(see attached pcap tls-error.txt)

See the attached openflow traces - they show a flow trace from
Open Vswitch.

Also there is a retis trace (retis tool discussed at Open
VSwitch conference 2023).

Note the drop (TC_INGRESS) in this file
    + 1702601116185568 [swapper/140] 0 [tp] skb:kfree_skb
#60c81b6b91e2cff284fb3a3d65800 (skb 18386033671255367680) n 3
drop (TC_INGRESS)
      if 21 (enp148s0f0_1) rxif 21 172.27.18.244.57394 >
104.18.2.35.443 ttl 63 tos 0x0 id 26162 off 0 [DF] len 477 proto
TCP (6) flags [P.] seq 792060930:792061367 ack 951229219 win 11

Again , once I restart Open vSwitch the problem goes away for a
time and comes back sometime later (not sure what that time
frame is but its a recurring issue.)
_______________________________________________
discuss mailing list
disc...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-discuss



--
fbl



--
fbl
_______________________________________________
discuss mailing list
disc...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-discuss



_______________________________________________
discuss mailing list
disc...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-discuss

Reply via email to