[Bug 243126] Assertion fl->ifl_cidx == cidx failed at /usr/src/sys/net/iflib.c:2531

2020-01-07 Thread bugzilla-noreply
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=243126

Andriy Gapon  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |pkel...@freebsd.org

--- Comment #1 from Andriy Gapon  ---
Correction: the panic happened with the vmxnet3 network driver.
The VM was later switched to em as a workaround, and that confused me.

Some data from the crash:
(kgdb) fr 14
#14 0x808b721e in rxd_frag_to_sd (rxq=0xfe3fe000,
irf=, unload=, sd=0xfe0011a54900, pf_rv=0xfe0011a549b0,
ri=0xfe0011a54960)
at /usr/src/sys/net/iflib.c:2531
2531    /usr/src/sys/net/iflib.c: No such file or directory.
(kgdb) p cidx
$1 = 142

(kgdb) p *rxq
$3 = {ifr_ctx = 0xf80002d22400, ifr_fl = 0xf80002d1a000, ifr_rx_irq =
0, pfil = 0xf8000436fb80, ifr_cq_cidx = 477, ifr_id = 0, ifr_nfl = 2
'\002', ifr_ntxqirq = 1 '\001', ifr_txqid = "\000\000\000",
  ifr_fl_offset = 1 '\001', ifr_lc = {ifp = 0xf80002d1a800, lro_mbuf_data =
0xfe00da9c4000, lro_queued = 1753531, lro_flushed = 339087, lro_bad_csum =
0, lro_cnt = 8, lro_mbuf_count = 0, lro_mbuf_max = 256,
lro_ackcnt_lim = 65535, lro_length_lim = 65535, lro_hashsz = 251, lro_hash
= 0xf80002fa5000, lro_active = {lh_first = 0x0}, lro_free = {lh_first =
0xfe00da9c5360}}, ifr_task = {gt_task = {ta_link = {stqe_next = 0x0},
  ta_flags = 2, ta_priority = 0, ta_func = 0x808b0bd0
<_task_fn_rx>, ta_context = 0xfe3fe000}, gt_taskqueue =
0xf80002922600, gt_list = {le_next = 0x0, le_prev = 0xfe00117e98a8},
gt_uniq = 0xfe3fe000, gt_name = "rxq0", '\000' ,
gt_dev = 0xf80002d3b000, gt_irq = 0xf80002d17900, gt_cpu = 0},
ifr_filter_info = {ifi_filter = 0x80af8510 ,
ifi_filter_arg = 0xf80002fab800, ifi_task = 0xfe3fe090, ifi_ctx
= 0xfe3fe000}, ifr_ifdi = 0xf80002d17d80, ifr_frags = {{irf_flid =
0 '\000', irf_idx = 142, irf_len = 1514}, {irf_flid = 1 '\001',
  irf_idx = 76, irf_len = 2048}, {irf_flid = 1 '\001', irf_idx = 77,
irf_len = 1762}, {irf_flid = 1 '\001', irf_idx = 53, irf_len = 1038}, {irf_flid
= 1 '\001', irf_idx = 114, irf_len = 2048}, {irf_flid = 1 '\001',
  irf_idx = 115, irf_len = 2048}, {irf_flid = 1 '\001', irf_idx = 116,
irf_len = 2048}, {irf_flid = 1 '\001', irf_idx = 117, irf_len = 2048},
{irf_flid = 1 '\001', irf_idx = 118, irf_len = 2048}, {irf_flid = 1 '\001',
  irf_idx = 119, irf_len = 1906}, {irf_flid = 1 '\001', irf_idx = 184,
irf_len = 1306}, {irf_flid = 1 '\001', irf_idx = 17, irf_len = 706}, {irf_flid
= 1 '\001', irf_idx = 3, irf_len = 2048}, {irf_flid = 1 '\001',
  irf_idx = 4, irf_len = 2048}, {irf_flid = 1 '\001', irf_idx = 5, irf_len
= 2048}, {irf_flid = 1 '\001', irf_idx = 6, irf_len = 1202}, {irf_flid = 0
'\000', irf_idx = 0, irf_len = 0} }}

(kgdb) p *ri
$4 = {iri_qsidx = 0, iri_vtag = 0, iri_len = 1514, iri_cidx = 477, iri_ifp =
0xf80002d1a800, iri_frags = 0xfe3fe140, iri_flowid = 600473664,
iri_csum_flags = 251658240, iri_csum_data = 65535, iri_flags = 0 '\000',
  iri_nfrags = 1 '\001', iri_rsstype = 130 '\202', iri_pad = 0 '\000'}


(kgdb) fr 17
#17 iflib_rxeof (rxq=, budget=16) at
/usr/src/sys/net/iflib.c:2803
2803    in /usr/src/sys/net/iflib.c
(kgdb) i loc
ctx = 
scctx = 
lro_possible = 
v4_forwarding = 
v6_forwarding = 
sctx = 0x810e7780 
rx_pkts = 1
rx_bytes = 1514
mh = 0x0
mt = 0x0
ifp = 0xf80002d1a800
cidxp = 0xfe3fe020
avail = 17
i = 
fl = 
m = 0x0
budget_left = 16
ri = 
err = 
mf = 
lro_enabled = 

(kgdb) p *cidxp
$1 = 477

(kgdb) p *$5.ifc_sctx
$7 = {isc_magic = 3405705229, isc_driver = 0x810e7900
, isc_q_align = 512, isc_tx_maxsize = 65536,
isc_tx_maxsegsize = 16383, isc_tso_maxsize = 65550, isc_tso_maxsegsize = 16383,
  isc_rx_maxsize = 16383, isc_rx_maxsegsize = 16383, isc_rx_nsegments = 1,
isc_admin_intrcnt = 1, isc_vendor_info = 0x810e7930
, isc_driver_version = 0x80ba63e4 "2",
  isc_parse_devinfo = 0x0, isc_nrxd_min = {32, 32, 32, 0, 0, 0, 0, 0},
isc_nrxd_default = {256, 256, 256, 0, 0, 0, 0, 0}, isc_nrxd_max = {2048, 2048,
2048, 0, 0, 0, 0, 0}, isc_ntxd_min = {32, 32, 0, 0, 0, 0, 0, 0},
  isc_ntxd_default = {512, 512, 0, 0, 0, 0, 0, 0}, isc_ntxd_max = {4096, 4096,
0, 0, 0, 0, 0, 0}, isc_nfl = 2, isc_ntxqs = 2, isc_nrxqs = 3, __spare0__ = 0,
isc_tx_reclaim_thresh = 0, isc_flags = 9, isc_name = 0x0}

(kgdb) p *rxq->ifr_ctx
$5 = {ops = 0xf80002d18000, ifc_softc = 0xf80002d22000, ifc_dev =
0xf80002d3b000, ifc_ifp = 0xf80002d1a800, ifc_cpus = {__bits = {255, 0,
0, 0}}, ifc_sctx = 0x810e7780 ,
  ifc_softc_ctx = {isc_vectors = 9, isc_nrxqsets = 8, isc_ntxqsets = 8,
__spare0__ = 0, __spare1__ = 0, isc_msix_bar = 24, isc_tx_nsegments = 32,
isc_ntxd = {512, 512, 0, 0, 0, 0, 0, 0}, isc_nrxd = {512, 256, 256, 0, 0, 0, 0,
  0}, isc_txqsizes = {8192, 8192, 0, 0, 0, 0, 0, 0}, isc_rxqsizes = {8192,
4096, 409

[Bug 243126] Assertion fl->ifl_cidx == cidx failed at /usr/src/sys/net/iflib.c:2531

2020-01-07 Thread bugzilla-noreply
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=243126

--- Comment #2 from Andriy Gapon  ---
Looking at vmxnet3_isc_rxd_pkt_get, I see that both iri_cidx and ifr_cq_cidx
will be set to cqidx after the last fragment.  irf_idx of each fragment is set
to rxd_idx of the respective descriptor.

Getting back to the assertion in rxd_frag_to_sd():

(kgdb) p *ri
$4 = {iri_qsidx = 0, iri_vtag = 0, iri_len = 1514, iri_cidx = 477, iri_ifp =
0xf80002d1a800, iri_frags = 0xfe3fe140, iri_flowid = 600473664,
iri_csum_flags = 251658240, iri_csum_data = 65535, iri_flags = 0 '\000',
  iri_nfrags = 1 '\001', iri_rsstype = 130 '\202', iri_pad = 0 '\000'}

(kgdb) p ri->iri_frags[0]
$13 = {irf_flid = 0 '\000', irf_idx = 142, irf_len = 1514}

So, irf_idx = 142 in the first and only fragment.
irf_flid is zero, so the relevant free list is the first one:

$15 = {ifl_cidx = 141, ifl_pidx = 140, ifl_credits = 254, ifl_gen = 1 '\001',
ifl_rxd_size = 0 '\000', ifl_rx_bitmap = 0xf80002daa8c0, ifl_fragidx = 139,
ifl_size = 256, ifl_buf_size = 2048, ifl_cltype = 1, 
  ifl_zone = 0xf80002957000, ifl_sds = {ifsd_map = 0xf80002d2d800,
ifsd_m = 0xf80002d1b800, ifsd_cl = 0xf80002d1b000, ifsd_ba =
0xf80002d1e000}, ifl_rxq = 0xfe3fe000, ifl_id = 0 '\000', 
  ifl_buf_tag = 0xf80002cf8100, ifl_ifdi = 0xf80002d17da8,
ifl_bus_addrs = {32051296256, 32050450432, 31627747328, 31625582592,
31637278720, 6269554688, 6269536256, 6272448512, 6512416768, 6269433856,
5922217984, 
6273015808, 31638720512, 6261741568, 6268790784, 29797609472, 6271569920,
6271576064, 6271574016, 6271580160, 6271578112, 6271584256, 6271582208,
6271588352, 6271739904, 6271733760, 6271735808, 6271729664, 6271731712, 
6271725568, 6271727616, 6271721472}, ifl_vm_addrs = {0xf8077667f800 ...

So, ifl_cidx is 141 and that's the actual problem.
The 477 that I mentioned in the previous comment is irrelevant.
The real problem is ifl_cidx != irf_idx, 141 != 142.

Still no clue how that could happen.
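
To make the mismatch concrete, here is a minimal C model of the check that
fires. The struct and function names (fl_model, frag_model, frag_matches_fl)
are hypothetical simplifications, not the iflib definitions; only the field
names ifl_cidx, irf_flid and irf_idx and the values 141 and 142 come from the
dump above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, stripped-down stand-ins for the iflib structures. */
struct fl_model {
	uint16_t ifl_cidx;	/* next free-list entry iflib expects to consume */
};

struct frag_model {
	uint8_t  irf_flid;	/* which free list the fragment came from */
	uint16_t irf_idx;	/* free-list index reported by the driver */
};

/*
 * Mirrors the condition of the failed assertion (fl->ifl_cidx == cidx),
 * with cidx taken from the fragment's irf_idx.
 */
static bool
frag_matches_fl(const struct fl_model *fls, const struct frag_model *frag)
{
	assert(fls != NULL && frag != NULL);
	return (fls[frag->irf_flid].ifl_cidx == frag->irf_idx);
}
```

With ifl_cidx = 141 in free list 0 and a fragment carrying irf_flid = 0 and
irf_idx = 142, frag_matches_fl() returns false, which is the state seen in the
crash dump.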

-- 
You are receiving this mail because:
You are the assignee for the bug.
___
freebsd-net@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-net
To unsubscribe, send any mail to "freebsd-net-unsubscr...@freebsd.org"


[Bug 243126] Assertion fl->ifl_cidx == cidx failed at /usr/src/sys/net/iflib.c:2531

2020-01-07 Thread bugzilla-noreply
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=243126

Andriy Gapon  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|New                         |Open

--- Comment #3 from Andriy Gapon  ---
I wonder if irf_idx can "jump ahead" of ifl_cidx because of a skipped
zero-length packet?

Hmm, it seems like it could be the case.
Given the current value of iri_cidx/ifr_cq_cidx, I examined three descriptors
before that index:

(kgdb) p $19.vxcr_u.rxcd[476]
$21 = {rxd_idx = 142, pad1 = 0, eop = 1, sop = 1, qid = 0, rss_type = 2,
no_csum = 0, pad2 = 0, rss_hash = 600473664, len = 1514, error = 0, vlan = 0,
vtag = 0, csum = 0, csum_ok = 1, udp = 0, tcp = 1, ipcsum_ok = 1, ipv6 = 0, 
  ipv4 = 1, fragment = 0, fcs = 0, type = 3, gen = 1}

(kgdb) p $19.vxcr_u.rxcd[475]
$22 = {rxd_idx = 141, pad1 = 0, eop = 1, sop = 1, qid = 0, rss_type = 0,
no_csum = 0, pad2 = 0, rss_hash = 0, len = 0, error = 0, vlan = 0, vtag = 0,
csum = 0, csum_ok = 0, udp = 0, tcp = 0, ipcsum_ok = 0, ipv6 = 0, ipv4 = 0, 
  fragment = 0, fcs = 0, type = 3, gen = 1}

(kgdb) p $19.vxcr_u.rxcd[474]
$23 = {rxd_idx = 140, pad1 = 0, eop = 1, sop = 1, qid = 0, rss_type = 2,
no_csum = 0, pad2 = 0, rss_hash = 600473664, len = 66, error = 0, vlan = 0,
vtag = 0, csum = 0, csum_ok = 1, udp = 0, tcp = 1, ipcsum_ok = 1, ipv6 = 0, 
  ipv4 = 1, fragment = 0, fcs = 0, type = 3, gen = 1}

The descriptor at 476 with rxd_idx = 142 seems like the current packet.
And the previous descriptor at 475 with rxd_idx = 141 is a zero-length packet:
eop = 1, sop = 1, len = 0.
The packet before it is a normal packet again: eop = 1, sop = 1,  len = 66.
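
The pattern in these three descriptors can be expressed as a small C sketch.
The type rxcd_model and both helper functions are hypothetical and only model
the fields shown in the dump (rxd_idx, sop, eop, len); they are not the
vmxnet3 driver code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical subset of a completion descriptor; field names follow the dump. */
struct rxcd_model {
	uint16_t rxd_idx;
	uint8_t  sop;
	uint8_t  eop;
	uint16_t len;
};

/* Returns 1 for a complete packet (sop and eop set) that carries no data. */
static int
rxcd_is_zero_length(const struct rxcd_model *d)
{
	assert(d != NULL);
	return (d->sop && d->eop && d->len == 0);
}

/* Counts the zero-length completions in descs[0..n-1]. */
static size_t
count_zero_length(const struct rxcd_model *descs, size_t n)
{
	size_t i, count = 0;

	for (i = 0; i < n; i++)
		count += (size_t)rxcd_is_zero_length(&descs[i]);
	return (count);
}
```

Fed the three descriptors above (rxd_idx 140, 141, 142 with len 66, 0, 1514),
count_zero_length() reports one zero-length completion, which matches the
one-entry gap between ifl_cidx and irf_idx.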

Now, how to fix the problem?
I see two ways:
- a driver can notify iflib of a zero length packet (via a new callback), so
that iflib can skip the corresponding entry in the appropriate free list
- rxd_frag_to_sd() can skip through fl entries until ifl_cidx becomes equal to
irf_idx
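
A rough C sketch of the second option follows. The type fl_model and the
function fl_skip_to are hypothetical; the real rxd_frag_to_sd() works on the
iflib free-list structure and would also have to deal with the buffers of the
skipped entries, which this model leaves out.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, simplified free list; field names follow the dump. */
struct fl_model {
	uint16_t ifl_cidx;	/* consumer index */
	uint16_t ifl_size;	/* number of entries in the ring */
};

/*
 * Advances ifl_cidx, wrapping at ifl_size, until it equals irf_idx.
 * Returns the number of entries that were skipped.
 */
static uint16_t
fl_skip_to(struct fl_model *fl, uint16_t irf_idx)
{
	uint16_t skipped = 0;

	assert(fl->ifl_size > 0 && irf_idx < fl->ifl_size);
	while (fl->ifl_cidx != irf_idx) {
		fl->ifl_cidx = (uint16_t)((fl->ifl_cidx + 1) % fl->ifl_size);
		skipped++;
	}
	return (skipped);
}
```

With ifl_cidx = 141, ifl_size = 256 and irf_idx = 142, one entry is skipped.
One possible downside of this approach is that it would no longer catch a
driver that reports a genuinely wrong index.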



[Bug 243126] Assertion fl->ifl_cidx == cidx failed at /usr/src/sys/net/iflib.c:2531

2020-01-07 Thread bugzilla-noreply
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=243126

--- Comment #4 from Andriy Gapon  ---
I wonder if vmxnet3_isc_rxd_pkt_get() should not hide those zero-length
packets.
It could return packets with irf_len = 0.
I see that there is some provision for that case in assemble_segments(), but
I am not sure if it is really expected, especially in rxd_frag_to_sd().
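
As an illustration of that idea, the sketch below models a consumer that is
told about every descriptor, including empty ones. The types fl_model and
frag_model and the function fl_consume are hypothetical; this is not how
iflib or assemble_segments() is written, only a model of the index
bookkeeping.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified structures; field names follow the dump. */
struct fl_model {
	uint16_t ifl_cidx;	/* consumer index */
	uint16_t ifl_size;	/* number of entries in the ring */
};

struct frag_model {
	uint16_t irf_idx;	/* free-list index reported by the driver */
	uint16_t irf_len;	/* payload length, may be 0 */
};

/*
 * Consumes one fragment. The index check holds because no descriptor is
 * hidden from the consumer. Returns true if the fragment carries data
 * that should be passed up the stack.
 */
static bool
fl_consume(struct fl_model *fl, const struct frag_model *frag)
{
	assert(fl->ifl_size > 0);
	assert(fl->ifl_cidx == frag->irf_idx);
	fl->ifl_cidx = (uint16_t)((fl->ifl_cidx + 1) % fl->ifl_size);
	return (frag->irf_len != 0);
}
```

Replaying indexes 140, 141 and 142 with lengths 66, 0 and 1514 keeps the
consumer index in step: the middle fragment is consumed but not delivered.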

Could any of the iflib developers please help?
Thank you!



ssh command hang

2020-01-07 Thread Bejiita78 .
Has anyone ever noticed that a system may respond just fine locally, but
running a command like port make install or top causes the ssh session
to hang indefinitely?


[Bug 236724] igb(4): Interfaces fail to switch active to inactive state

2020-01-07 Thread bugzilla-noreply
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=236724

Marius Strobl  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
         Resolution|---                         |FIXED
             Status|Open                        |Closed

--- Comment #34 from Marius Strobl  ---
(In reply to Vinícius Zavam from comment #32)

The fix for this PR, i.e. link state change detection for interfaces in the up
state, didn't make it into 12.1 as RC3 was cancelled, unfortunately. Disabling
the use of MSI-X as described in comment 32 is a viable workaround, though.

Comment 20 describes an orthogonal bug in which link status is reported
for interfaces in the down state, while the expected behavior for an interface
in this state is that no link status is reported and that - unless WOL is
enabled - its PHY(s) is/are shut down.

I'm closing this PR again as the regression it's about has been fixed and I
won't file an EN request for the fix.

-- 
You are receiving this mail because:
You are on the CC list for the bug.


[Bug 236724] igb(4): Interfaces fail to switch active to inactive state

2020-01-07 Thread bugzilla-noreply
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=236724

--- Comment #35 from Marius Strobl  ---
Sorry, typo; the workaround actually has been described in comment 31.



[Bug 243096] netgraph ng_nat example causes panic on CURRENT

2020-01-07 Thread bugzilla-noreply
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=243096

Mark Johnston  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |ma...@freebsd.org

--- Comment #2 from Mark Johnston  ---
I haven't been able to reproduce this on head in a VM with a vtnet interface or
a machine using igb, both acting as NFS clients.  Do you know what kind of
network traffic your system is seeing?



[Bug 243096] netgraph ng_nat example causes panic on CURRENT

2020-01-07 Thread bugzilla-noreply
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=243096

--- Comment #3 from Mark Johnston  ---
I haven't been able to reproduce this on head in a VM with a vtnet interface or
a machine using igb, both acting as NFS clients.  Do you know what kind of
network traffic your system is seeing?



[Bug 243096] netgraph ng_nat example causes panic on CURRENT

2020-01-07 Thread bugzilla-noreply
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=243096

--- Comment #4 from Mark Johnston  ---
(In reply to Mark Johnston from comment #3)
Hmm never mind, it just took a little while. :)



[Bug 243096] netgraph ng_nat example causes panic on CURRENT

2020-01-07 Thread bugzilla-noreply
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=243096

Mark Johnston  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|New                         |In Progress
           Assignee|n...@freebsd.org            |ma...@freebsd.org



Re: ssh command hang

2020-01-07 Thread Ryan Rawdon

> On Jan 7, 2020, at 3:30 PM, Bejiita78 .  wrote:
> 
> has anyone ever noticed that locally a system may respond just fine, but
> running a command like port make install or top would cause the ssh session
> to hang indefinitely?

This is a common sign of an MTU mismatch on a network segment somewhere between 
your client and the server (large segments/packets/frames go into a black hole 
and nobody knows). Alternatively, the path has a properly configured reduced 
MTU and the server sends traffic with the Don’t Fragment bit set (IP header), 
but the Packet Too Big ICMP errors from the device that drops the oversized 
packets do not make it back to the server.

If you perform a packet capture on the server, you will likely see it 
retransmitting one or more segments over and over, while those segments never 
arrive at the client.
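
The arithmetic behind this can be sketched in C. The sketch assumes IPv4 and 
TCP headers without options (20 bytes each); the function names are made up 
for illustration, and a path MTU of 1400 is only an example value.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define IPV4_HDR_LEN	20u	/* assumption: IPv4 header without options */
#define TCP_HDR_LEN	20u	/* assumption: TCP header without options */

/* Size of the IP packet that carries `payload` bytes of TCP data. */
static uint32_t
ip_packet_len(uint32_t payload)
{
	return (payload + IPV4_HDR_LEN + TCP_HDR_LEN);
}

/*
 * A packet sent with Don't Fragment set gets through only if it fits
 * the smallest MTU on the path.
 */
static bool
fits_path(uint32_t payload, uint32_t path_mtu)
{
	assert(path_mtu >= IPV4_HDR_LEN + TCP_HDR_LEN);
	return (ip_packet_len(payload) <= path_mtu);
}
```

With an example path MTU of 1400, a 100-byte payload (a keystroke or a short 
command result) fits, while a full 1460-byte segment negotiated for a 
1500-byte MTU does not. That would explain why a session works until a 
command such as top produces a large burst of output.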

How to find the point where the issue is introduced (MTU mismatch, ICMP 
filtering, or the server not handling ICMP PTB responses properly) depends 
largely on the network topology between your client and server, and on your 
ability to investigate or reproduce the symptoms on systems along that path.

There are plenty of other potential causes for this behavior, but this is the 
first one I would investigate if experiencing this issue.  Have there been any 
network changes near your client or server that might have meddled with MTU 
sizes or ICMP blocking?



[Bug 230807] if_alc.ko driver not working for Killer Networking E2200

2020-01-07 Thread bugzilla-noreply
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=230807

--- Comment #6 from Mark Millard  ---
(In reply to Mark Millard from comment #5)

While I was not looking for such at the time,
I noticed somewhat after switching to non-NUMA
on the ThreadRipper that the E2500 had started
working.

I'm not aware of any other configuration change
that would be likely to have contributed.

It has been working ever since I switched to
non-NUMA. I waited to see if it would stay
operational for a time (weeks) before making
this comment.
