On 10/5/2021 4:11 PM, Tudor Cornea wrote:
Hi Ferruh,
I have attempted to narrow down the issue.
I have the following bash script, which computes packet rates on an
interface.
[root@localhost ~]# cat compute-rates.sh
#!/usr/bin/env bash
if [[ ${#} -ne 2 ]]; then
echo "Usage: ${0} <iface-name> <sleep-interval-seconds>"
exit 1
fi
IFACE_NAME="${1}"
SLEEP_INTERVAL_SECONDS="${2}"
TMP_STATS_FILE="/tmp/netstat"
# Clear Previous stats file
echo "0 0 0 0" > "${TMP_STATS_FILE}"
echo "Press CTRL+C to exit..."
while true; do
export "RxB=0" "RxP=0" "TxB=0" "TxP=0"
# Extract Rx{Bytes,Packets} and Tx{Bytes,Packets} and
# format the output. Individual fields will be exported
export $(\
ifconfig "${IFACE_NAME}" \
| grep 'packets' \
| awk '{print $5, $3}' \
| xargs echo \
| sed -E -e \
"s/([0-9]+) ([0-9]+) ([0-9]+) ([0-9]+)/RxB=\1 RxP=\2 TxB=\3 TxP=\4/")
# Print Packet and Byte Rates
# Format: | Rx Bytes | Rx Packets | Tx Bytes | Tx Packets |
echo "${RxB}" "${RxP}" "${TxB}" "${TxP}" $(cat "${TMP_STATS_FILE}") \
| awk '{print "RxB="$1-$5, "RxP="$2-$6, "TxB="$3-$7, "TxP="$4-$8}'
# Save the new values
echo "${RxB}" "${RxP}" "${TxB}" "${TxP}" > "${TMP_STATS_FILE}"
sleep "${SLEEP_INTERVAL_SECONDS}"
done
On the transmit side, I'm using the engine behind [1] with the af_packet
PMD.
The configuration for the af_packet PMD is the following:
--vdev=net_af_packet0,iface=eth1,blocksz=16384,framesz=8192,framecnt=2048,qpairs=1,qdisc_bypass=0
I'm configuring a Tx rate of 335 packets / second and a packet size of 300
Bytes.
These are the values with which we seem to have the best chance of
reproducing the problem. I suspect it might also be linked to the af_packet
configuration.
I'm starting traffic using the specified configuration, and in parallel,
running the script that computes the rates as follows:
./compute-rates.sh eth1 0.1
Initially, the packet rates seem steady:
RxB=0 RxP=0 TxB=10952 TxP=37
RxB=0 RxP=0 TxB=10656 TxP=36
RxB=0 RxP=0 TxB=10656 TxP=36
RxB=0 RxP=0 TxB=10656 TxP=36
RxB=0 RxP=0 TxB=10952 TxP=37
RxB=0 RxP=0 TxB=10952 TxP=37
RxB=0 RxP=0 TxB=10360 TxP=35
RxB=0 RxP=0 TxB=10952 TxP=37
[...]
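As a quick sanity check on the samples above (a sketch, not part of the original script): 335 packets/second sampled every 0.1 s should give roughly 33.5 packets per sample. The observed 35 to 37 packets are plausibly explained by each loop iteration taking a little longer than 0.1 s, but that is an assumption. Every sample also works out to 296 bytes per packet. The helper names below are hypothetical.

```c
#include <stdint.h>

/* Expected number of packets in one sampling window, given the
 * configured rate (packets per second) and the window length (seconds). */
static double expected_packets_per_sample(double rate_pps, double interval_s)
{
    return rate_pps * interval_s;
}

/* Average bytes per packet for one sample; returns 0 if no packets were seen. */
static uint64_t bytes_per_packet(uint64_t tx_bytes, uint64_t tx_packets)
{
    return tx_packets ? tx_bytes / tx_packets : 0;
}
```

For example, `bytes_per_packet(10952, 37)` and `bytes_per_packet(10656, 36)` both evaluate to 296.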
After a while, we toggle the interface up / down with a sleep between the
steps. I suspect the length of the sleep might be a variable in the
equation.
ifconfig eth1 down; sleep 7; ifconfig eth1 up
What we see is that even after the interface is toggled back up, the rates
never seem to recover.
RxB=0 RxP=0 TxB=0 TxP=0
RxB=0 RxP=0 TxB=0 TxP=0
RxB=0 RxP=0 TxB=0 TxP=0
RxB=0 RxP=0 TxB=0 TxP=0
RxB=0 RxP=0 TxB=2072 TxP=7
RxB=0 RxP=0 TxB=10360 TxP=35
RxB=0 RxP=0 TxB=10360 TxP=35
RxB=0 RxP=0 TxB=10360 TxP=35
RxB=0 RxP=0 TxB=10360 TxP=35
RxB=0 RxP=0 TxB=10360 TxP=35
RxB=0 RxP=0 TxB=10360 TxP=35
RxB=0 RxP=0 TxB=10360 TxP=35
RxB=0 RxP=0 TxB=10360 TxP=35
RxB=0 RxP=0 TxB=521256 TxP=1761
RxB=0 RxP=0 TxB=0 TxP=0
RxB=0 RxP=0 TxB=0 TxP=0
RxB=0 RxP=0 TxB=0 TxP=0
[...]
I've attempted to mirror the same behavior using dpdk-pktgen [2] on a
different machine (Ubuntu 20.04). This time, af_packet runs on top of
a Linux virtio_net interface.
I seem to be getting similar behavior. I have used the following
dpdk-pktgen configuration and run-time settings:
pktgen \
-l 1-4 \
-n 4 \
--proc-type=primary \
--no-pci \
--no-telemetry \
--no-huge \
-m 512 \
--vdev=net_af_packet0,iface=eth1,blocksz=16384,framesz=8192,framecnt=2048,qpairs=1,qdisc_bypass=0 \
-- \
-P \
-T \
-m "3.0" \
-f themes/black-yellow.theme
set 0 size 300
set 0 rate 0.008
set 0 burst 1
start 0
[1] https://github.com/open-traffic-generator/ixia-c
[2] http://code.dpdk.org/pktgen-dpdk/pktgen-20.11.2/source/INSTALL.md
On Wed, 29 Sept 2021 at 13:03, Tudor Cornea <tudor.cor...@gmail.com> wrote:
Hi Tudor,
I have used testpmd with 'txonly' forwarding. Tx recovers after interface up,
but by adding some debug logs I can see 'poll()' returns with POLLOUT even
when there is no space in the buffer.
According to the logic in the PMD, when 'poll()' returns success, it expects
to have some space in the Tx buffer.
So I agree to add the check.
I only have a question on the POLLERR: should we separate the POLLERR check
to cover the ifdown case? What do you think about the following logic:
if (!TP_STATUS_AVAILABLE) {
if (poll() < 0)
break;
if (pfd.revents & POLLERR)
break;
}
if (!TP_STATUS_AVAILABLE)
break;
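The logic proposed above can be modelled as a pure decision function, which makes the three exit conditions easy to check in isolation. This is only a sketch: `tx_should_stop` and `FRAME_AVAILABLE` are hypothetical names, and `FRAME_AVAILABLE` stands in for the kernel's `TP_STATUS_AVAILABLE` (value 0 in `<linux/if_packet.h>`). It assumes `!TP_STATUS_AVAILABLE` in the pseudocode means "the frame status is not TP_STATUS_AVAILABLE".

```c
#include <poll.h>
#include <stdbool.h>

/* Stand-in for TP_STATUS_AVAILABLE from <linux/if_packet.h>. */
#define FRAME_AVAILABLE 0UL

/*
 * Decide whether the Tx loop should stop before writing the next frame.
 *   status_before: frame status read before calling poll()
 *   poll_ret:      return value of poll(), only used if the frame was busy
 *   revents:       pfd.revents after poll(), only used if the frame was busy
 *   status_after:  frame status re-read after poll()
 */
static bool tx_should_stop(unsigned long status_before, int poll_ret,
                           short revents, unsigned long status_after)
{
    if (status_before != FRAME_AVAILABLE) {
        if (poll_ret < 0)
            return true;            /* poll() itself failed */
        if (revents & POLLERR)
            return true;            /* e.g. the interface went down */
    }
    /* POLLOUT alone is not trusted: re-check the frame before writing. */
    if (status_after != FRAME_AVAILABLE)
        return true;
    return false;
}
```

With this model, a busy frame followed by `POLLOUT` and a still-busy frame stops the loop, which is the case the extra check is meant to catch.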
Hi Ferruh,
What you described above looks like a ring buffer with a single producer and
a single consumer, where the producer overwrites items that have not yet been
consumed.
Indeed. This is also my understanding of the bug.
I am going to try to isolate the issue, and should probably be able to
come up with a script in a few days.
Out of curiosity, are you using a modified af_packet implementation in the
kernel for the usage described above?
We are currently using an Ubuntu-based distro with a 4.15 Linux kernel.
To my knowledge, we don't have any kernel patches for the af_packet
implementation (except perhaps for patches that are back-ported by Ubuntu
maintainers from newer releases).
On Mon, 20 Sept 2021 at 20:44, Ferruh Yigit <ferruh.yi...@intel.com>
wrote:
On 9/13/2021 2:45 PM, Tudor Cornea wrote:
The poll call can return POLLERR which is ignored, or it can return
POLLOUT, even if there are no free frames in the mmap-ed area.
We can account for both of these cases by re-checking if the next
frame is empty before writing into it.
Signed-off-by: Mihai Pogonaru <pogonarumi...@gmail.com>
Signed-off-by: Tudor Cornea <tudor.cor...@gmail.com>
---
drivers/net/af_packet/rte_eth_af_packet.c | 19 +++++++++++++++++++
1 file changed, 19 insertions(+)
diff --git a/drivers/net/af_packet/rte_eth_af_packet.c b/drivers/net/af_packet/rte_eth_af_packet.c
index b73b211..087c196 100644
--- a/drivers/net/af_packet/rte_eth_af_packet.c
+++ b/drivers/net/af_packet/rte_eth_af_packet.c
@@ -216,6 +216,25 @@ eth_af_packet_tx(void *queue, struct rte_mbuf **bufs, uint16_t nb_pkts)
(poll(&pfd, 1, -1) < 0))
break;
+ /*
+ * Poll can return POLLERR if the interface is down
+ *
+ * It will almost always return POLLOUT, even if there
+ * are no extra buffers available
+ *
+ * This happens, because packet_poll() calls datagram_poll()
+ * which checks the space left in the socket buffer and,
+ * in the case of packet_mmap, the default socket buffer length
+ * doesn't match the requested size for the tx_ring.
+ * As such, there is almost always space left in socket buffer,
+ * which doesn't seem to be correlated to the requested size
+ * for the tx_ring in packet_mmap.
+ *
+ * This results in poll() returning POLLOUT.
+ */
+ if (ppd->tp_status != TP_STATUS_AVAILABLE)
+ break;
+
If 'POLLOUT' doesn't indicate that there is space in the buffer, what is the
point of the 'poll()' at all?
How can we test/reproduce the mentioned behavior? Or is there a way to fix the
behavior of poll() or use an alternative to it?
OK to break on the 'POLLERR'; I guess it can be detected in 'pfd.revents'.
/* copy the tx frame data */
pbuf = (uint8_t *) ppd + TPACKET2_HDRLEN -
sizeof(struct sockaddr_ll);
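On the question above of whether POLLERR can be detected in 'pfd.revents': the following is a minimal sketch of inspecting `revents` after `poll()` on a generic file descriptor. `wait_for_tx` and `enum tx_wait` are hypothetical names, not part of the PMD. For an AF_PACKET socket, a `TX_WAIT_READY` result would still need the `tp_status` re-check added by the patch, since POLLOUT reportedly does not reflect free frames in the ring.

```c
#include <poll.h>

enum tx_wait { TX_WAIT_READY, TX_WAIT_ERROR, TX_WAIT_NOT_READY };

/* Poll fd for writability and classify the outcome from pfd.revents. */
static enum tx_wait wait_for_tx(int fd, int timeout_ms)
{
    struct pollfd pfd = { .fd = fd, .events = POLLOUT, .revents = 0 };

    if (poll(&pfd, 1, timeout_ms) < 0)
        return TX_WAIT_ERROR;       /* poll() itself failed */

    /* Error conditions are reported in revents even though they were
     * not requested in events; check them before POLLOUT. */
    if (pfd.revents & (POLLERR | POLLHUP | POLLNVAL))
        return TX_WAIT_ERROR;

    if (pfd.revents & POLLOUT)
        return TX_WAIT_READY;

    return TX_WAIT_NOT_READY;       /* timed out without any event */
}
```

The error bits are checked first because `poll()` can report POLLOUT and POLLERR together, and the error should take precedence.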