Regarding the problem discussed in this thread: http://mails.dpdk.org/archives/dev/2015-October/024966.html
Venkatesan, Venky gave the following explanation:

"To add a little more detail - this ends up being both a bandwidth and a transaction bottleneck. Not only do you add an increased transaction count, you also add a huge amount of bandwidth overhead (each 16 byte descriptor is preceded by a PCI-E TLP which is about the same size). So what ends up happening in the case where the incoming packets are bifurcated to different queues (1 per queue) is that you have 2x the number of transactions (1 for the packet and one for the descriptor) and then we essentially double the bandwidth used because you now have the TLP overhead per descriptor write."

However, I can't figure out why this creates both a bandwidth bottleneck and a transaction bottleneck. Can anyone help me?

Best regards,
Saber
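The arithmetic in the quoted explanation can be made concrete with a rough back-of-the-envelope model of what the NIC writes upstream over PCIe on receive. This is only a sketch under stated assumptions, not how any particular NIC or DPDK driver behaves. The per-TLP overhead of 24 bytes (4-DW memory-write header, sequence number, LCRC and framing), the 256-byte Max_Payload_Size, and the idea that descriptors from the same queue get coalesced four at a time (one 64-byte write) are all assumptions; only the 16-byte descriptor size comes from the thread. The function name `rx_link_cost` is hypothetical.

```python
import math

# Assumed figures (NOT from the thread): per-TLP overhead of a 4-DW
# memory-write header (16 B) + sequence number (2 B) + LCRC (4 B)
# + framing (2 B). Real values vary with addressing mode and PCIe generation.
TLP_OVERHEAD = 24
DESC_SIZE = 16       # descriptor size quoted in the thread
MAX_PAYLOAD = 256    # assumed Max_Payload_Size in bytes


def rx_link_cost(pkt_size, n_pkts, descs_per_write):
    """Hypothetical model: return (TLP count, total bytes, descriptor bytes)
    written upstream for n_pkts packets when descriptor write-backs are
    coalesced descs_per_write at a time."""
    # Packet data: each packet is split into TLPs of at most MAX_PAYLOAD bytes.
    pkt_tlps = n_pkts * math.ceil(pkt_size / MAX_PAYLOAD)
    pkt_bytes = n_pkts * pkt_size + pkt_tlps * TLP_OVERHEAD
    # Descriptor write-backs: one TLP carries up to descs_per_write descriptors.
    desc_tlps = math.ceil(n_pkts / descs_per_write)
    desc_bytes = n_pkts * DESC_SIZE + desc_tlps * TLP_OVERHEAD
    return pkt_tlps + desc_tlps, pkt_bytes + desc_bytes, desc_bytes


# 1000 small (64-byte) packets: all to one queue (descriptors coalesced 4 per
# write) versus spread one per queue (every descriptor needs its own TLP).
for k in (4, 1):
    tlps, total, desc = rx_link_cost(64, 1000, k)
    print(f"{k} desc/write: {tlps} TLPs, {total} bytes, {desc} descriptor bytes")
# 4 desc/write: 1250 TLPs, 110000 bytes, 22000 descriptor bytes
# 1 desc/write: 2000 TLPs, 128000 bytes, 40000 descriptor bytes
```

Under these assumptions, spreading packets one per queue raises the transaction count from 1.25 to 2 TLPs per packet (the "2x" in the quote: one for the packet, one for the descriptor), and the bytes spent on descriptor write-backs rise from 22 to 40 per packet, because each 16-byte descriptor now carries a full TLP worth of overhead of its own.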