Hi Ferruh, I have attempted to narrow down the issue. I have the following bash script, which computes packet rates on an interface.

[root@localhost ~]# cat compute-rates.sh
#!/usr/bin/env bash

if [[ ${#} -ne 2 ]]; then
    echo "Usage: ${0} <iface-name> <sleep-interval-seconds>"
    exit 1
fi

IFACE_NAME="${1}"
SLEEP_INTERVAL_SECONDS="${2}"
TMP_STATS_FILE="/tmp/netstat"

# Clear Previous stats file
echo "0 0 0 0" > "${TMP_STATS_FILE}"

echo "Press CTRL+C to exit..."
while true; do
    export "RxB=0" "RxP=0" "TxB=0" "TxP=0"

    # Extract Rx{Bytes,Packets} and Tx{Bytes,Packets} and
    # format the output. Individual fields will be exported
    export $(\
        ifconfig "${IFACE_NAME}" \
        | grep 'packets' \
        | awk '{print $5, $3}' \
        | xargs echo \
        | sed -E -e \
            "s/([0-9]+) ([0-9]+) ([0-9]+) ([0-9]+)/RxB=\1 RxP=\2 TxB=\3 TxP=\4/")

    # Print Packet and Byte Rates
    # Format: | Rx Bytes | Rx Packets | Tx Bytes | Tx Packets |
    echo "${RxB}" "${RxP}" "${TxB}" "${TxP}" $(cat "${TMP_STATS_FILE}") \
        | awk '{print "RxB="$1-$5, "RxP="$2-$6, "TxB="$3-$7, "TxP="$4-$8}'

    # Save the new values
    echo "${RxB}" "${RxP}" "${TxB}" "${TxP}" > "${TMP_STATS_FILE}"

    sleep "${SLEEP_INTERVAL_SECONDS}"
done

On the transmit side, I'm using the engine behind [1] with the af_packet PMD.
The configuration for the af_packet PMD is the following:

--vdev=net_af_packet0,iface=eth1,blocksz=16384,framesz=8192,framecnt=2048,qpairs=1,qdisc_bypass=0

I'm configuring a Tx rate of 335 packets / second and a packet size of
300 Bytes. These are the values with which we seem to have the best chance
of reproducing the problem. I suspect it might also be linked to the
af_packet configuration.

I'm starting traffic using the specified configuration and, in parallel,
running the script that computes the rates as follows:

./compute-rates.sh eth1 0.1

Initially, the packet rates seem steady:

RxB=0 RxP=0 TxB=10952 TxP=37
RxB=0 RxP=0 TxB=10656 TxP=36
RxB=0 RxP=0 TxB=10656 TxP=36
RxB=0 RxP=0 TxB=10656 TxP=36
RxB=0 RxP=0 TxB=10952 TxP=37
RxB=0 RxP=0 TxB=10952 TxP=37
RxB=0 RxP=0 TxB=10360 TxP=35
RxB=0 RxP=0 TxB=10952 TxP=37
[...]
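
As a quick cross-check of the steady-state numbers above, here is a minimal C sketch. The helper names are hypothetical; they are not part of the script or of DPDK. At 335 packets / second and a 0.1 s sleep, about 33.5 packets per sample would be expected. The observed 35 to 37 suggests that each loop iteration takes slightly longer than the sleep itself, presumably because of the ifconfig/awk/sed pipeline. The byte counters work out to 296 bytes per packet, 4 bytes fewer than the configured 300; that the difference is the Ethernet FCS is an assumption, not something verified here.

```c
/* Hypothetical helpers for sanity-checking the counter deltas printed
 * by compute-rates.sh. None of these names exist in the script or in DPDK. */

/* Packets expected in one sample for a given Tx rate and sample interval. */
double expected_packets(double packets_per_second, double interval_s)
{
    return packets_per_second * interval_s;
}

/* Sample interval implied by an observed per-sample packet count. */
double implied_interval_s(unsigned long long observed_packets,
                          double packets_per_second)
{
    return (double)observed_packets / packets_per_second;
}

/* Average bytes per packet in one sample; returns 0 for an empty sample. */
unsigned long long bytes_per_packet(unsigned long long bytes,
                                    unsigned long long packets)
{
    if (packets == 0)
        return 0;
    return bytes / packets;
}
```

For example, bytes_per_packet(10952, 37) gives 296, and implied_interval_s(37, 335.0) is roughly 0.11 s.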

After a while, we toggle the interface down and back up, with a sleep
between the two steps. I suspect the length of the sleep might be a variable
in the equation.

ifconfig eth1 down; sleep 7; ifconfig eth1 up

What we see is that, even after the interface is toggled back up, the rates
never seem to recover.

RxB=0 RxP=0 TxB=0 TxP=0
RxB=0 RxP=0 TxB=0 TxP=0
RxB=0 RxP=0 TxB=0 TxP=0
RxB=0 RxP=0 TxB=0 TxP=0
RxB=0 RxP=0 TxB=2072 TxP=7
RxB=0 RxP=0 TxB=10360 TxP=35
RxB=0 RxP=0 TxB=10360 TxP=35
RxB=0 RxP=0 TxB=10360 TxP=35
RxB=0 RxP=0 TxB=10360 TxP=35
RxB=0 RxP=0 TxB=10360 TxP=35
RxB=0 RxP=0 TxB=10360 TxP=35
RxB=0 RxP=0 TxB=10360 TxP=35
RxB=0 RxP=0 TxB=10360 TxP=35
RxB=0 RxP=0 TxB=521256 TxP=1761
RxB=0 RxP=0 TxB=0 TxP=0
RxB=0 RxP=0 TxB=0 TxP=0
RxB=0 RxP=0 TxB=0 TxP=0
[...]

I've attempted to mirror the same behavior using dpdk-pktgen [2] on a
different machine (Ubuntu 20.04). This time, af_packet runs on top of a
Linux virtio_net interface. I seem to be getting similar behavior.

I have used the following dpdk-pktgen configuration and run-time settings:

pktgen \
    -l 1-4 \
    -n 4 \
    --proc-type=primary \
    --no-pci \
    --no-telemetry \
    --no-huge \
    -m 512 \
    --vdev=net_af_packet0,iface=eth1,blocksz=16384,framesz=8192,framecnt=2048,qpairs=1,qdisc_bypass=0 \
    -- \
    -P \
    -T \
    -m "3.0" \
    -f themes/black-yellow.theme

set 0 size 300
set 0 rate 0.008
set 0 burst 1
start 0

[1] https://github.com/open-traffic-generator/ixia-c
[2] http://code.dpdk.org/pktgen-dpdk/pktgen-20.11.2/source/INSTALL.md

On Wed, 29 Sept 2021 at 13:03, Tudor Cornea <tudor.cor...@gmail.com> wrote:

> Hi Ferruh,
>
>> What you described above looks like a ring buffer with single producer and
>> single consumer, and producer overwrites the not consumed items.
>
>
> Indeed. This is also my understanding of the bug.
> I am going to try to isolate the issue, and should probably be able to
> come up with a script in a few days.
>
>> Our of curiosity, are you using an modified af_packet implementation in kernel
>> for above described usage?
>
>
> We are currently using an Ubuntu-based distro with a 4.15 Linux kernel.
> We don't have any kernel patches for the af_packet implementation to my
> knowledge (probably excepting patches that are back-ported by Ubuntu
> maintainers from newer releases).
>
>
> On Mon, 20 Sept 2021 at 20:44, Ferruh Yigit <ferruh.yi...@intel.com>
> wrote:
>
>> On 9/13/2021 2:45 PM, Tudor Cornea wrote:
>> > The poll call can return POLLERR which is ignored, or it can return
>> > POLLOUT, even if there are no free frames in the mmap-ed area.
>> >
>> > We can account for both of these cases by re-checking if the next
>> > frame is empty before writing into it.
>> >
>> > Signed-off-by: Mihai Pogonaru <pogonarumi...@gmail.com>
>> > Signed-off-by: Tudor Cornea <tudor.cor...@gmail.com>
>> > ---
>> >  drivers/net/af_packet/rte_eth_af_packet.c | 19 +++++++++++++++++++
>> >  1 file changed, 19 insertions(+)
>> >
>> > diff --git a/drivers/net/af_packet/rte_eth_af_packet.c b/drivers/net/af_packet/rte_eth_af_packet.c
>> > index b73b211..087c196 100644
>> > --- a/drivers/net/af_packet/rte_eth_af_packet.c
>> > +++ b/drivers/net/af_packet/rte_eth_af_packet.c
>> > @@ -216,6 +216,25 @@ eth_af_packet_tx(void *queue, struct rte_mbuf **bufs, uint16_t nb_pkts)
>> >                      (poll(&pfd, 1, -1) < 0))
>> >                              break;
>> >
>> > +            /*
>> > +             * Poll can return POLLERR if the interface is down
>> > +             *
>> > +             * It will almost always return POLLOUT, even if there
>> > +             * are no extra buffers available
>> > +             *
>> > +             * This happens, because packet_poll() calls datagram_poll()
>> > +             * which checks the space left in the socket buffer and,
>> > +             * in the case of packet_mmap, the default socket buffer length
>> > +             * doesn't match the requested size for the tx_ring.
>> > +             * As such, there is almost always space left in socket buffer,
>> > +             * which doesn't seem to be correlated to the requested size
>> > +             * for the tx_ring in packet_mmap.
>> > +             *
>> > +             * This results in poll() returning POLLOUT.
>> > +             */
>> > +            if (ppd->tp_status != TP_STATUS_AVAILABLE)
>> > +                    break;
>> > +
>>
>> If 'POLLOUT' doesn't indicate that there is space in the buffer, what is the
>> point of the 'poll()' at all?
>>
>> What can we test/reproduce the mentioned behavior? Or is there a way to fix the
>> behavior of poll() or use an alternative of it?
>>
>>
>> OK to break on the 'POLLERR', I guess it can be detected in the
>> 'pfd.revent'.
>>
>>
>> >              /* copy the tx frame data */
>> >              pbuf = (uint8_t *) ppd + TPACKET2_HDRLEN -
>> >                      sizeof(struct sockaddr_ll);
>> >
>>
>>
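
The failure mode discussed in the quoted thread (a single-producer, single-consumer ring in which the producer overwrites frames that have not been consumed yet) can be illustrated with a small stand-alone simulation. This is only a sketch under stated assumptions: sim_ring, sim_tx, sim_drain and the FRAME_* values are hypothetical stand-ins, not the PMD's code and not the kernel's TP_STATUS_* definitions. The recheck flag models the extra status check that the patch adds after poll().

```c
#include <stdint.h>

/* Hypothetical stand-ins for the per-frame status word; the real driver
 * reads tp_status and compares it with TP_STATUS_AVAILABLE. */
enum frame_status { FRAME_AVAILABLE = 0, FRAME_SEND_REQUEST = 1 };

#define RING_FRAMES 4

struct sim_ring {
    enum frame_status status[RING_FRAMES];
    uint32_t payload[RING_FRAMES];
    unsigned int next;        /* producer index */
    unsigned int overwritten; /* frames clobbered before being consumed */
};

/* Producer: try to queue n packets and return how many were queued.
 * With recheck != 0, stop at the first frame that is not available.
 * With recheck == 0, write anyway, modelling a poll() that reported
 * POLLOUT although no frame was free. */
unsigned int sim_tx(struct sim_ring *r, const uint32_t *pkts,
                    unsigned int n, int recheck)
{
    unsigned int i;

    for (i = 0; i < n; i++) {
        if (r->status[r->next] != FRAME_AVAILABLE) {
            if (recheck)
                break;
            r->overwritten++;
        }
        r->payload[r->next] = pkts[i];
        r->status[r->next] = FRAME_SEND_REQUEST;
        r->next = (r->next + 1) % RING_FRAMES;
    }
    return i;
}

/* Consumer: mark every pending frame as available again and return
 * how many frames were pending. */
unsigned int sim_drain(struct sim_ring *r)
{
    unsigned int i, sent = 0;

    for (i = 0; i < RING_FRAMES; i++) {
        if (r->status[i] == FRAME_SEND_REQUEST) {
            r->status[i] = FRAME_AVAILABLE;
            sent++;
        }
    }
    return sent;
}
```

With a 4-frame ring and 6 packets, sim_tx without the re-check queues all 6 and overwrites 2 unsent frames; with the re-check it stops after 4 and leaves the remaining packets for the caller to retry once the consumer has drained the ring.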