On Tue, 13 Oct 2015 02:57:46 +0000 "Sanford, Robert" <rsanford at akamai.com> wrote:
> I'm hoping that someone (perhaps at Intel) can help us understand
> an IXGBE RX packet loss issue we're able to reproduce with testpmd.
>
> We run testpmd with various numbers of cores. We offer line-rate
> traffic (~14.88 Mpps) to one ethernet port, and forward all received
> packets via the second port.
>
> When we configure 1, 2, 3, or 4 cores (per port, with the same number
> of RX queues per port), there is no RX packet loss. When we configure
> 5 or more cores, we observe the following packet loss (approximate):
>
> 5 cores - 3% loss
> 6 cores - 7% loss
> 7 cores - 11% loss
> 8 cores - 15% loss
> 9 cores - 18% loss
>
> All of the "lost" packets are accounted for in the device's Rx Missed
> Packets Count register (RXMPC[0]). Quoting the datasheet:
> "Packets are missed when the receive FIFO has insufficient space to
> store the incoming packet. This might be caused due to insufficient
> buffers allocated, or because there is insufficient bandwidth on the
> IO bus."
>
> RXMPC, and our use of the API rx_descriptor_done to verify that we
> don't run out of mbufs (discussed below), lead us to theorize that
> packet loss occurs because the device is unable to DMA all packets
> out of its internal packet buffer (512 KB, reported by register
> RXPBSIZE[0]) before that buffer overruns.
>
> Questions
> =========
> 1. The 82599 device supports up to 128 queues. Why do we see trouble
> with as few as 5 queues? What could limit the system (and one port
> controlled by 5+ cores) from receiving at line rate without loss?
>
> 2. As far as we can tell, the RX path only touches the device
> registers when it updates a Receive Descriptor Tail register (RDT[n]),
> roughly once every rx_free_thresh packets. Is there a big difference
> between one core doing this and N cores doing it 1/N as often?
>
> 3. Do CPU reads/writes from/to device registers have a higher priority
> than device reads/writes from/to memory? Could the former transactions
> (CPU <-> device) significantly impede the latter (device <-> RAM)?
>
> Thanks in advance for any help you can provide.

As you add cores, there is more traffic on the PCI bus from each core
polling. There is a fixed number of PCI bus transactions per second
possible, and each additional core adds more useless (empty)
transactions; a rough way to measure that per core is sketched below.
Why do you think adding more cores will help?
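
Here is a minimal sketch of what I mean, assuming a testpmd-like setup:
ports paired as 0<->1 and a 1:1 lcore-to-queue mapping (both are
assumptions of this sketch, not something testpmd guarantees in every
configuration). It tallies how many rte_eth_rx_burst() polls come back
empty; as you add cores, the aggregate poll rate grows while the useful
packets per poll shrink.

    #include <rte_ethdev.h>
    #include <rte_lcore.h>
    #include <rte_mbuf.h>

    #define BURST_SIZE 32

    /* Per-lcore forwarding loop that counts empty polls.  Assumes the
     * usual l2fwd-style pairing of ports (0<->1) and that lcore ids
     * map 1:1 onto queue ids -- both are assumptions of this sketch. */
    static int rx_loop(void *arg)
    {
        uint16_t port_id = *(uint16_t *)arg;
        uint16_t queue_id = (uint16_t)rte_lcore_id();
        struct rte_mbuf *burst[BURST_SIZE];
        uint64_t polls = 0, empty = 0;

        for (;;) {
            uint16_t n = rte_eth_rx_burst(port_id, queue_id,
                                          burst, BURST_SIZE);
            polls++;
            if (n == 0) {
                empty++;        /* a poll that did no useful work */
                continue;
            }
            /* forward out the paired port; free what TX won't take */
            uint16_t sent = rte_eth_tx_burst(port_id ^ 1, queue_id,
                                             burst, n);
            while (sent < n)
                rte_pktmbuf_free(burst[sent++]);
        }
        return 0;
    }

Comparing the empty/polls ratio between your 4-core and 5-core runs
would show directly how much of the added capacity goes to polls that
return nothing.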
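
Incidentally, you don't have to read RXMPC[0] directly to watch the
miss counter: the ixgbe driver folds it into the generic port stats as
imissed, so something like the following (port already initialized,
error handling elided) can run alongside the forwarding cores:

    #include <inttypes.h>
    #include <stdio.h>
    #include <rte_ethdev.h>

    /* Print packets received vs. missed (RXMPC feeds imissed) for one
     * port.  Assumes the EAL and the port are already set up. */
    static void report_missed(uint16_t port_id)
    {
        struct rte_eth_stats stats;

        rte_eth_stats_get(port_id, &stats);
        printf("port %u: ipackets=%" PRIu64 " imissed=%" PRIu64 "\n",
               port_id, stats.ipackets, stats.imissed);
    }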