[dpdk-dev] queue to VF assigment in SR-IOV
On Mon, Jun 13, 2016 at 4:56 AM, Mauricio Vásquez wrote:
> Hello Alexander,
>
> On Tue, Jun 7, 2016 at 11:31 PM, Alexander Duyck wrote:
>>
>> On Tue, Jun 7, 2016 at 1:49 PM, Mauricio Vásquez wrote:
>> > Dear All,
>> >
>> > I implemented a program that uses flow director to forward packets to a
>> > specific virtual function; however, I faced the problem that I did not
>> > know which queue belongs to a VF. I found in [1] that in the case of the
>> > Intel 82599, queues 0-7 belong to VF0, 8-15 to VF1, and so on. I tested
>> > it but it did not work; by trial and error I found that queue 0 is in
>> > VF0, queue 4 in VF1, and so on.
>> >
>> > My question is: is there a standard way to know which queues belong to a
>> > specific VF?
>> >
>> > Thanks in advance
>> >
>> > Mauricio V,
>> >
>> > [1]
>> > http://www.intel.it/content/dam/www/public/us/en/documents/datasheets/82599-10-gbe-controller-datasheet.pdf,
>> > Table 7-72
>>
>> If you are using the kernel driver, the way the queues are laid out
>> depends on the number of VFs allocated and what features are enabled
>> in the kernel.
>
> I forgot to mention that I am using the DPDK ixgbe PMD.
>
>> Assuming you are not using DCB, you should be able to figure out how
>> many queues are allocated per VF by looking at the output of
>> "ethtool -l <iface>". The upper limit on RSS is the number of queues
>> each pool is allocated.
>>
>> So for example, if you only enable up to 31 VFs then the PF driver
>> allocates 4 queues per VF, so you would have queues 0-3 allocated to
>> VF0, queues 4-7 allocated to VF1, etc., all the way through to the PF
>> occupying (num_vfs * 4) to 127. If you enable 32 or more VFs then the
>> number of queues drops to 2 per VF, and RSS on the PF will be limited
>> to the 2 queues following the block reserved for the VFs.
>
> I found that the behavior of the DPDK PMD is almost the same as you described:
>   1 - 15 VFs  -> 8 queues per VF
>   16 - 31 VFs -> 4 queues per VF
>   >= 32 VFs   -> 2 queues per VF
>
> But, according to the datasheet it should be:
>   16 VFs -> 8 queues per VF
>   32 VFs -> 4 queues per VF
>   64 VFs -> 2 queues per VF
>
> Am I missing something?

The datasheet should be referring to "VM pools". The PF consumes one
pool for any queues it is using. As such, VFs + 1 is the total number of
pools in use if the PF is active.

> One extra thing that I am not understanding: in the case I assign the
> maximum number of possible VFs, the PF remains without queues?

The device can support at most 64 pools, so if you are allocating 64 VFs
then there are no resources left for the PF to allocate queues from.

I hope this helps to make it a bit clearer.

- Alex
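For readers trying to compute the mapping, a minimal sketch of the layout
described above; the helper and its thresholds are inferred from the pool
sizes quoted in this thread, not taken from any DPDK API:

    #include <stdint.h>

    /* First queue index owned by a given VF on the 82599, following the
     * pool sizes reported above for the DPDK ixgbe PMD: 8 queues per
     * pool for 1-15 VFs, 4 for 16-31 VFs, and 2 for 32 or more VFs.
     */
    static uint16_t vf_first_queue(uint16_t num_vfs, uint16_t vf_idx)
    {
        uint16_t queues_per_pool;

        if (num_vfs <= 15)
            queues_per_pool = 8;
        else if (num_vfs <= 31)
            queues_per_pool = 4;
        else
            queues_per_pool = 2;

        return (uint16_t)(vf_idx * queues_per_pool);
    }

For example, with 16 VFs enabled, vf_first_queue(16, 1) returns 4, which
would match the "queue 4 in VF1" behavior Mauricio found by trial and error.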
[dpdk-dev] Having troubles binding an SR-IOV VF to uio_pci_generic on Amazon instance
On 10/02/2015 07:00 AM, Michael S. Tsirkin wrote:
> On Thu, Oct 01, 2015 at 02:02:24PM -0700, Alexander Duyck wrote:
>> validation and translation would add 10s if not 100s of nanoseconds to the
>> time needed to process each packet. In addition we are talking about doing
>> this in kernel space which means we wouldn't really be able to take
>> advantage of things like SSE or AVX instructions.
>
> Yes. But the nice thing is that it's rearming so it can happen on
> a separate core, in parallel with packet processing.
> It does not need to add to latency.

Moving it to another core is automatically going to add extra latency.
You will have to evict the data out of the L1 cache of one core and into
the L1 cache of another when you update it, and then reading it will
force it to have to transition back out. If you are lucky it is only
evicted to L2, if not then to L3, or possibly even back to memory. Odds
are that alone will add tens of nanoseconds to the process, and you would
need three or more cores to do the same workload, as running the process
over multiple threads means having to add synchronization primitives to
the whole mess. Then there is the NUMA factor on top of that.

> You will burn up more CPU, but again, all this for boxes/hypervisors
> without an IOMMU.

There are use cases this will render completely useless. If for example
you are running a workload that needs three cores with DPDK, bumping it
to nine or more will likely push you out of being able to do the workload
on some systems.

> I'm sure people can come up with even better approaches, once enough
> people get it that kernel absolutely needs to be protected from
> userspace.

I don't see that happening. Many people don't care about kernel security
that much. If they did, something like DPDK wouldn't have gotten off the
ground. Once someone has the ability to load kernel modules, any
protection of the kernel from userspace pretty much goes right out the
window. You are just as much at risk from a buggy driver in userspace as
you are from one that can be added to the kernel.

> Long term, the right thing to do is to focus on IOMMU support. This
> gives you hardware-based memory protection without need to burn up CPU
> cycles.

We have a solution that makes use of IOMMU support with vfio. The problem
is there are multiple cases where that support is either not available,
or using the IOMMU introduces excess overhead.

- Alex
[dpdk-dev] [PATCH 1/2] ixgbe: fix VF statistic wraparound handling macro
On 10/12/2015 06:33 AM, Harry van Haaren wrote:
> Fix a misinterpretation of VF stats in ixgbe
>
> Signed-off-by: Harry van Haaren
> ---
>  drivers/net/ixgbe/ixgbe_ethdev.c | 8 ++--
>  1 file changed, 6 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/net/ixgbe/ixgbe_ethdev.c b/drivers/net/ixgbe/ixgbe_ethdev.c
> index ec2918c..d226e8d 100644
> --- a/drivers/net/ixgbe/ixgbe_ethdev.c
> +++ b/drivers/net/ixgbe/ixgbe_ethdev.c
> @@ -329,10 +329,14 @@ static int ixgbe_timesync_read_tx_timestamp(struct rte_eth_dev *dev,
>  /*
>   * Define VF Stats MACRO for Non "cleared on read" register
>   */
> -#define UPDATE_VF_STAT(reg, last, cur) \
> +#define UPDATE_VF_STAT(reg, last, cur)  \
>  { \
>      uint32_t latest = IXGBE_READ_REG(hw, reg); \
> -    cur += latest - last; \
> +    if (likely(latest > last)) { \
> +        cur += latest - last; \
> +    } else { \
> +        cur += (UINT_MAX - last) + latest; \
> +    } \
>      last = latest; \
>  }

From what I can tell your math is adding an off-by-one error. You should
probably be using UINT_MAX as a mask for the result, not as a part of the
calculation itself. So the correct way to compute this would be
"cur += (latest - last) & UINT_MAX".

Also the mask approach should be faster as it avoids any conditional jumps.

- Alex
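For reference, a sketch of how the macro would read with the mask fix Alex
suggests; the surrounding context and register-read helper are taken from
the quoted patch, and the wraparound is handled by unsigned modular
subtraction:

    #include <limits.h>    /* UINT_MAX */

    /* Sketch only: uint32_t subtraction already wraps modulo 2^32, so
     * (latest - last) yields the correct delta even across a register
     * wraparound. Masking with UINT_MAX is a no-op for 32-bit values
     * but keeps the intent explicit, avoids the off-by-one in the
     * conditional version, and needs no branch.
     */
    #define UPDATE_VF_STAT(reg, last, cur) \
    { \
        uint32_t latest = IXGBE_READ_REG(hw, reg); \
        cur += (latest - last) & UINT_MAX; \
        last = latest; \
    }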
[dpdk-dev] IXGBE RX packet loss with 5+ cores
On 10/13/2015 07:47 AM, Sanford, Robert wrote:
> [Robert:]
> 1. The 82599 device supports up to 128 queues. Why do we see trouble
> with as few as 5 queues? What could limit the system (and one port
> controlled by 5+ cores) from receiving at line-rate without loss?
>
> 2. As far as we can tell, the RX path only touches the device registers
> when it updates a Receive Descriptor Tail register (RDT[n]), roughly
> every rx_free_thresh packets. Is there a big difference between one
> core doing this and N cores doing it 1/N as often?
>
>>> [Stephen:]
>>> As you add cores, there is more traffic on the PCI bus from each core
>>> polling. There is a fixed number of PCI bus transactions per second
>>> possible. Each core is increasing the number of useless (empty)
>>> transactions.
>
>> [Bruce:]
>> The polling for packets by the core should not be using PCI bandwidth
>> directly, as the ixgbe driver (and other drivers) check for the DD bit
>> being set on the descriptor in memory/cache.
>
> I was preparing to reply with the same point.
>
>>> [Stephen:] Why do you think adding more cores will help?
>
> We're using run-to-completion and sometimes spend too many cycles per
> pkt. We realize that we need to move to an io+workers model, but wanted
> a better understanding of the dynamics involved here.
>
>> [Bruce:] However, using an increased number of queues can use PCI
>> bandwidth in other ways; for instance, with more queues you reduce the
>> amount of descriptor coalescing that can be done by the NICs, so that
>> instead of having a single transaction of 4 descriptors to one queue,
>> the NIC may instead have to do 4 transactions each writing 1 descriptor
>> to 4 different queues. This is possibly why sending all traffic to a
>> single queue works ok - the polling on the other queues is still being
>> done, but has little effect.
>
> Brilliant! This idea did not occur to me.

You can actually make the throughput regression disappear by altering the
traffic pattern you are testing with. In the past I have found that
sending traffic in bursts, where 4 frames belong to the same queue before
moving to the next one, essentially eliminated the dropped packets due to
PCIe bandwidth limitations. The trick is you need to have the Rx
descriptor processing work in batches so that you can get multiple
descriptors processed for each PCIe read/write.

- Alex
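To illustrate the batching Alex describes, a simplified sketch of
deferring the RDT update until a threshold of descriptors has been
refilled; the names are illustrative and this is not the actual ixgbe
PMD code:

    #include <stdint.h>

    #define RX_FREE_THRESH 32           /* assumed batch size */

    struct rx_queue {
        volatile uint32_t *rdt_reg;     /* mapped RDT[n] register */
        uint16_t nb_hold;               /* refills since last RDT write */
        uint16_t tail;                  /* next free descriptor slot */
        uint16_t nb_desc;               /* ring size */
    };

    /* Called once per refilled descriptor. The tail register update is
     * a PCIe transaction, so writing it once per batch amortizes the
     * bus cost over RX_FREE_THRESH packets instead of paying it per
     * packet.
     */
    static void rx_refill_notify(struct rx_queue *q)
    {
        q->tail = (uint16_t)((q->tail + 1) % q->nb_desc);
        if (++q->nb_hold >= RX_FREE_THRESH) {
            *q->rdt_reg = q->tail;
            q->nb_hold = 0;
        }
    }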
[dpdk-dev] Question about unsupported transceivers
On 10/13/2015 11:57 AM, Alex Forster wrote:
> I believe I've discovered my problem:
> https://gist.github.com/AlexForster/0fb4699bcdf196cf5462
>
> As mentioned previously, I have two X520-Q1 cards installed. It appears
> that initialization of the first card obeys allow_unsupported_sfp=1, but
> initialization of the second card does not.
>
> Is this a bug, or is there a way to work around this that I'm not aware
> of?
>
> Alex Forster

If you are using Intel's out-of-tree ixgbe driver, I believe the module
parameters are comma-separated with one index per port. So if you have
two ports you should be passing "allow_unsupported_sfp=1,1", and for 4
you would need four '1's.

- Alex
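For a four-port system, the corresponding modprobe configuration would
look something like the following; a hypothetical example, with one value
per port as described above:

    # /etc/modprobe.d/ixgbe.conf -- one value per port, in probe order
    options ixgbe allow_unsupported_sfp=1,1,1,1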
[dpdk-dev] Question about unsupported transceivers
On 10/15/2015 07:46 AM, Alex Forster wrote:
> On 10/13/15, 4:34 PM, "Alexander Duyck" wrote:
>
>> If you are using Intel's out-of-tree ixgbe driver, I believe the module
>> parameters are comma-separated with one index per port. So if you have
>> two ports you should be passing "allow_unsupported_sfp=1,1", and for 4
>> you would need four '1's.
>
> This seemed very promising. I compiled and installed the out-of-tree
> ixgbe driver and set the option in /etc/modprobe.d/ixgbe.conf. dmesg
> shows all eight "allow_unsupported_sfp enabled" messages but the last
> four ports still error out with the unsupported SFP message when running
> the tests.
>
> Before I start arbitrarily trying to patch out parts of the SFP
> verification code in ixgbe, are there any other tips I should know?

Can you send me the command you used to load the module, and the exact
number of ixgbe ports you have in the system? With that I could then
verify that the command was entered correctly, as it is possible there
could still be an issue in the way the command was entered.

One other possibility is that when the driver loads, each load counts as
an instance in the module parameter array. So if for example you unbind
the driver on one port and then later rebind it, you will have consumed
one of the values in the array. Do it enough times and you exceed the
bounds of the array as you entered it, and it will simply use the default
value of 0.

Also the output of "ethtool -i <iface>" would be useful to verify that
you have the out-of-tree driver loaded and not the in-kernel one.

- Alex
[dpdk-dev] Question about unsupported transceivers
On 10/15/2015 08:43 AM, Alex Forster wrote:
> On 10/15/15, 11:30 AM, "Alexander Duyck" wrote:
>
>> On 10/15/2015 07:46 AM, Alex Forster wrote:
>>> On 10/13/15, 4:34 PM, "Alexander Duyck" wrote:
>>>
>>>> If you are using Intel's out-of-tree ixgbe driver, I believe the
>>>> module parameters are comma-separated with one index per port. So if
>>>> you have two ports you should be passing "allow_unsupported_sfp=1,1",
>>>> and for 4 you would need four '1's.
>>>
>>> This seemed very promising. I compiled and installed the out-of-tree
>>> ixgbe driver and set the option in /etc/modprobe.d/ixgbe.conf. dmesg
>>> shows all eight "allow_unsupported_sfp enabled" messages but the last
>>> four ports still error out with the unsupported SFP message when
>>> running the tests.
>>>
>>> Before I start arbitrarily trying to patch out parts of the SFP
>>> verification code in ixgbe, are there any other tips I should know?
>>
>> Can you send me the command you used to load the module, and the exact
>> number of ixgbe ports you have in the system? With that I could then
>> verify that the command was entered correctly, as it is possible there
>> could still be an issue in the way the command was entered.
>>
>> One other possibility is that when the driver loads, each load counts
>> as an instance in the module parameter array. So if for example you
>> unbind the driver on one port and then later rebind it, you will have
>> consumed one of the values in the array. Do it enough times and you
>> exceed the bounds of the array as you entered it, and it will simply
>> use the default value of 0.
>>
>> Also the output of "ethtool -i <iface>" would be useful to verify that
>> you have the out-of-tree driver loaded and not the in-kernel one.
>>
>> - Alex
>
> Er, let me try that again.
>
> https://gist.github.com/AlexForster/f5372c5b60153d278089
>
> Alex Forster

It looks like you are probably seeing interfaces be unbound and then
rebound. As such you are likely pushing things outside of the array
boundary. One solution might just be to add more ",1"s if you are only
going to be doing this kind of thing at boot up. The upper limit for the
array is 32 entries, so as long as you are only setting this up once you
could probably get away with that.

An alternative would be to modify the definition of the parameter in
ixgbe_param.c. If you look through the file you should find several
blocks like the one below:

    struct ixgbe_option opt = {
        .type = enable_option,
        .name = "allow_unsupported_sfp",
        .err = "defaulting to Disabled",
        .def = OPTION_DISABLED
    };

If you modify the .def value to "OPTION_ENABLED", and then rebuild and
reinstall your driver, you should be able to have it install without any
issues.

- Alex
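Concretely, the edited block would read as follows; a sketch against the
quoted out-of-tree source, with only the .def initializer changed:

    struct ixgbe_option opt = {
        .type = enable_option,
        .name = "allow_unsupported_sfp",
        .err = "defaulting to Disabled",
        .def = OPTION_ENABLED   /* was OPTION_DISABLED */
    };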
[dpdk-dev] Question about unsupported transceivers
On 10/15/2015 10:13 AM, Alex Forster wrote:
> On 10/15/15, 12:17 PM, "Alexander Duyck" wrote:
>
>> On 10/15/2015 08:43 AM, Alex Forster wrote:
>>> On 10/15/15, 11:30 AM, "Alexander Duyck" wrote:
>>>
>>>> On 10/15/2015 07:46 AM, Alex Forster wrote:
>>>>> On 10/13/15, 4:34 PM, "Alexander Duyck" wrote:
>>>>>
>>>>>> If you are using Intel's out-of-tree ixgbe driver, I believe the
>>>>>> module parameters are comma-separated with one index per port. So
>>>>>> if you have two ports you should be passing
>>>>>> "allow_unsupported_sfp=1,1", and for 4 you would need four '1's.
>>>>>
>>>>> This seemed very promising. I compiled and installed the out-of-tree
>>>>> ixgbe driver and set the option in /etc/modprobe.d/ixgbe.conf. dmesg
>>>>> shows all eight "allow_unsupported_sfp enabled" messages but the
>>>>> last four ports still error out with the unsupported SFP message
>>>>> when running the tests.
>>>>>
>>>>> Before I start arbitrarily trying to patch out parts of the SFP
>>>>> verification code in ixgbe, are there any other tips I should know?
>>>>
>>>> Can you send me the command you used to load the module, and the
>>>> exact number of ixgbe ports you have in the system? With that I could
>>>> then verify that the command was entered correctly, as it is possible
>>>> there could still be an issue in the way the command was entered.
>>>>
>>>> One other possibility is that when the driver loads, each load counts
>>>> as an instance in the module parameter array. So if for example you
>>>> unbind the driver on one port and then later rebind it, you will have
>>>> consumed one of the values in the array. Do it enough times and you
>>>> exceed the bounds of the array as you entered it, and it will simply
>>>> use the default value of 0.
>>>>
>>>> Also the output of "ethtool -i <iface>" would be useful to verify
>>>> that you have the out-of-tree driver loaded and not the in-kernel
>>>> one.
>>>>
>>>> - Alex
>>>
>>> Er, let me try that again.
>>>
>>> https://gist.github.com/AlexForster/f5372c5b60153d278089
>>>
>>> Alex Forster
>>
>> It looks like you are probably seeing interfaces be unbound and then
>> rebound. As such you are likely pushing things outside of the array
>> boundary. One solution might just be to add more ",1"s if you are only
>> going to be doing this kind of thing at boot up. The upper limit for
>> the array is 32 entries, so as long as you are only setting this up
>> once you could probably get away with that.
>>
>> An alternative would be to modify the definition of the parameter in
>> ixgbe_param.c. If you look through the file you should find several
>> blocks like the one below:
>>
>>     struct ixgbe_option opt = {
>>         .type = enable_option,
>>         .name = "allow_unsupported_sfp",
>>         .err = "defaulting to Disabled",
>>         .def = OPTION_DISABLED
>>     };
>>
>> If you modify the .def value to "OPTION_ENABLED", and then rebuild and
>> reinstall your driver, you should be able to have it install without
>> any issues.
>>
>> - Alex
>
> Yeah, I've had roughly the same thought process since you mentioned the
> args array. My first idea was "maybe the driver can't fit all of my 1's"
> but I saw it was defined at 32. Then I decided to just patch the whole
> enable_unsupported_sfp option out
> https://gist.github.com/AlexForster/112fd822704caf804849 but I'm still
> failing.

Your changes are a bit overkill and actually take things in the wrong
direction. By commenting out the whole allow_unsupported_sfp block you
are disabling it by default. Remember, the module parameter allows it;
by removing it there is no way to enable the feature.

Like I mentioned in my previous email, just take a look at replacing the
"OPTION_DISABLED" value with "OPTION_ENABLED" in the .def part of the
structure. After that you won't need to pass the module parameter as it
will always be enabled by default.

- Alex
[dpdk-dev] Question about unsupported transceivers
On 10/15/2015 10:13 AM, Alex Forster wrote:
> On 10/15/15, 12:17 PM, "Alexander Duyck" wrote:
>
>> On 10/15/2015 08:43 AM, Alex Forster wrote:
>>> On 10/15/15, 11:30 AM, "Alexander Duyck" wrote:
>>>
>>>> On 10/15/2015 07:46 AM, Alex Forster wrote:
>>>>> On 10/13/15, 4:34 PM, "Alexander Duyck" wrote:
>>>>>
>>>>>> If you are using Intel's out-of-tree ixgbe driver, I believe the
>>>>>> module parameters are comma-separated with one index per port. So
>>>>>> if you have two ports you should be passing
>>>>>> "allow_unsupported_sfp=1,1", and for 4 you would need four '1's.
>>>>>
>>>>> This seemed very promising. I compiled and installed the out-of-tree
>>>>> ixgbe driver and set the option in /etc/modprobe.d/ixgbe.conf. dmesg
>>>>> shows all eight "allow_unsupported_sfp enabled" messages but the
>>>>> last four ports still error out with the unsupported SFP message
>>>>> when running the tests.
>>>>>
>>>>> Before I start arbitrarily trying to patch out parts of the SFP
>>>>> verification code in ixgbe, are there any other tips I should know?
>>>>
>>>> Can you send me the command you used to load the module, and the
>>>> exact number of ixgbe ports you have in the system? With that I could
>>>> then verify that the command was entered correctly, as it is possible
>>>> there could still be an issue in the way the command was entered.
>>>>
>>>> One other possibility is that when the driver loads, each load counts
>>>> as an instance in the module parameter array. So if for example you
>>>> unbind the driver on one port and then later rebind it, you will have
>>>> consumed one of the values in the array. Do it enough times and you
>>>> exceed the bounds of the array as you entered it, and it will simply
>>>> use the default value of 0.
>>>>
>>>> Also the output of "ethtool -i <iface>" would be useful to verify
>>>> that you have the out-of-tree driver loaded and not the in-kernel
>>>> one.
>>>>
>>>> - Alex
>>>
>>> Er, let me try that again.
>>>
>>> https://gist.github.com/AlexForster/f5372c5b60153d278089
>>>
>>> Alex Forster
>>
>> It looks like you are probably seeing interfaces be unbound and then
>> rebound. As such you are likely pushing things outside of the array
>> boundary. One solution might just be to add more ",1"s if you are only
>> going to be doing this kind of thing at boot up. The upper limit for
>> the array is 32 entries, so as long as you are only setting this up
>> once you could probably get away with that.
>>
>> An alternative would be to modify the definition of the parameter in
>> ixgbe_param.c. If you look through the file you should find several
>> blocks like the one below:
>>
>>     struct ixgbe_option opt = {
>>         .type = enable_option,
>>         .name = "allow_unsupported_sfp",
>>         .err = "defaulting to Disabled",
>>         .def = OPTION_DISABLED
>>     };
>>
>> If you modify the .def value to "OPTION_ENABLED", and then rebuild and
>> reinstall your driver, you should be able to have it install without
>> any issues.
>>
>> - Alex
>
> Yeah, I've had roughly the same thought process since you mentioned the
> args array. My first idea was "maybe the driver can't fit all of my 1's"
> but I saw it was defined at 32. Then I decided to just patch the whole
> enable_unsupported_sfp option out
> https://gist.github.com/AlexForster/112fd822704caf804849 but I'm still
> failing.
>
> I've been digging a bit, and I'm failing here in ixgbe_main.c...
>
>     /* reset_hw fills in the perm_addr as well */
>     hw->phy.reset_if_overtemp = true;
>     err = hw->mac.ops.reset_hw(hw);
>     hw->phy.reset_if_overtemp = false;
>     if (err == IXGBE_ERR_SFP_NOT_PRESENT) {
>         err = IXGBE_SUCCESS;
>     } else if (err == IXGBE_ERR_SFP_NOT_SUPPORTED) {
>         e_dev_err("failed to load because an unsupported SFP+ or QSFP "
>                   "module type was detected.\n");
[dpdk-dev] Question about unsupported transceivers
On 10/18/2015 06:06 PM, Alex Forster wrote:
> On 10/15/15, 3:53 PM, "Alexander Duyck" wrote:
>
>>>> It looks like you are probably seeing interfaces be unbound and then
>>>> rebound. As such you are likely pushing things outside of the array
>>>> boundary. One solution might just be to add more ",1"s if you are
>>>> only going to be doing this kind of thing at boot up. The upper limit
>>>> for the array is 32 entries, so as long as you are only setting this
>>>> up once you could probably get away with that.
>>>>
>>>> An alternative would be to modify the definition of the parameter in
>>>> ixgbe_param.c. If you look through the file you should find several
>>>> blocks like the one below:
>>>>
>>>>     struct ixgbe_option opt = {
>>>>         .type = enable_option,
>>>>         .name = "allow_unsupported_sfp",
>>>>         .err = "defaulting to Disabled",
>>>>         .def = OPTION_DISABLED
>>>>     };
>>>>
>>>> If you modify the .def value to "OPTION_ENABLED", and then rebuild
>>>> and reinstall your driver, you should be able to have it install
>>>> without any issues.
>>>>
>>>> - Alex
>>>
>>> Yeah, I've had roughly the same thought process since you mentioned
>>> the args array. My first idea was "maybe the driver can't fit all of
>>> my 1's" but I saw it was defined at 32. Then I decided to just patch
>>> the whole enable_unsupported_sfp option out
>>> https://gist.github.com/AlexForster/112fd822704caf804849 but I'm
>>> still failing.
>>>
>>> I've been digging a bit, and I'm failing here in ixgbe_main.c...
>>>
>>>     /* reset_hw fills in the perm_addr as well */
>>>     hw->phy.reset_if_overtemp = true;
>>>     err = hw->mac.ops.reset_hw(hw);
>>>     hw->phy.reset_if_overtemp = false;
>>>     if (err == IXGBE_ERR_SFP_NOT_PRESENT) {
>>>         err = IXGBE_SUCCESS;
>>>     } else if (err == IXGBE_ERR_SFP_NOT_SUPPORTED) {
>>>         e_dev_err("failed to load because an unsupported SFP+ or QSFP "
>>>                   "module type was detected.\n");
>>>         e_dev_err("Reload the driver after installing a supported "
>>>                   "module.\n");
>>>         goto err_sw_init;
>>>     } else if (err) {
>>>         e_dev_err("HW Init failed: %d\n", err);
>>>         goto err_sw_init;
>>>     }
>>>
>>> I've attempted a hand-stacktrace and came up with the following...
>>>
>>> ixgbe_82599.c at 1016
>>>   * ixgbe_reset_hw_82599() is defined
>>>   * calls phy->ops.init() which potentially returns
>>>     IXGBE_ERR_SFP_NOT_SUPPORTED
>>>
>>> ixgbe_82599.c at 102
>>>   * ixgbe_init_phy_ops_82599() is defined
>>>   * IXGBE_ERR_SFP_NOT_SUPPORTED is returned after calling
>>>     phy->ops.identify()
>>>
>>> ixgbe_82599.c at 2085
>>>   * ixgbe_identify_phy_82599() is defined
>>>   * calls ixgbe_identify_module_generic()
>>>
>>> ixgbe_phy.c at 1281
>>>   * ixgbe_identify_module_generic() is defined
>>>   * calls ixgbe_identify_qsfp_module_generic()
>>>
>>> ixgbe_phy.c at 1663
>>>   * ixgbe_identify_qsfp_module_generic() is defined
>>>   * We fail somewhere before the ending call to ixgbe_get_device_caps()
>>>     which does take allow_unsupported_sfp into account
>>>   * Possibility: hw->phy.ops.read_i2c_eeprom(hw, IXGBE_SFF_IDENTIFIER,
>>>     &identifier) != IXGBE_SFF_IDENTIFIER_QSFP_PLUS
>>>   * Possibility: active_cable != true
>>>
>>> And then I'm over my head. Should I assume from here that the most
>>> likely explanation is a bad transceiver or bad fiber?
>>>
>>> Alex Forster
>>
>> Are you able to swap transceiver or fiber between the 4 ports that work
>> and the 4 that don't? If you could do that then you should be able to
>> tell if the issue is following the NIC ports, or if it is an issue with
>> the external connections. If the issue is following the transceiver or
>> fiber then that is probably what is causing the problem.
>>
>> The other thing you could try doing is adding a printk to the spots
>> where the status is set to SFP_NOT_SUPPORTED so that you could
[dpdk-dev] queue to VF assigment in SR-IOV
On Tue, Jun 7, 2016 at 1:49 PM, Mauricio Vásquez wrote:
> Dear All,
>
> I implemented a program that uses flow director to forward packets to a
> specific virtual function; however, I faced the problem that I did not
> know which queue belongs to a VF. I found in [1] that in the case of the
> Intel 82599, queues 0-7 belong to VF0, 8-15 to VF1, and so on. I tested
> it but it did not work; by trial and error I found that queue 0 is in
> VF0, queue 4 in VF1, and so on.
>
> My question is: is there a standard way to know which queues belong to a
> specific VF?
>
> Thanks in advance
>
> Mauricio V,
>
> [1]
> http://www.intel.it/content/dam/www/public/us/en/documents/datasheets/82599-10-gbe-controller-datasheet.pdf,
> Table 7-72

If you are using the kernel driver, the way the queues are laid out
depends on the number of VFs allocated and what features are enabled in
the kernel. Assuming you are not using DCB, you should be able to figure
out how many queues are allocated per VF by looking at the output of
"ethtool -l <iface>". The upper limit on RSS is the number of queues each
pool is allocated.

So for example, if you only enable up to 31 VFs then the PF driver
allocates 4 queues per VF, so you would have queues 0-3 allocated to VF0,
queues 4-7 allocated to VF1, etc., all the way through to the PF
occupying (num_vfs * 4) to 127. If you enable 32 or more VFs then the
number of queues drops to 2 per VF, and RSS on the PF will be limited to
the 2 queues following the block reserved for the VFs.

There are a few other configurations: if DCB is enabled, I believe it is
possible to get 8 queues per VF if fewer than 16 VFs are allocated, but
in such a case you would not have access to RSS. In this case, if the
maximum combined queue count reported is 1, you would need to check how
many TCs are being supported by the PF in order to determine if the queue
count is 4 or 8 per VF.

- Alex
[dpdk-dev] [PATCH 0/2] uio_msi: device driver
On 10/01/2015 07:57 AM, Stephen Hemminger wrote:
> On Thu, 1 Oct 2015 13:59:02 +0300
> Avi Kivity wrote:
>
>> On 10/01/2015 01:28 AM, Stephen Hemminger wrote:
>>> This is a new UIO device driver to allow supporting MSI-X and MSI
>>> devices in userspace. It has been used in environments like VMware and
>>> older versions of QEMU/KVM where no IOMMU support is available.
>>
>> Why not add msi/msix support to uio_pci_generic?
>
> That is possible but that would meet ABI and other resistance from the
> author. Also, uio_pci_generic makes it harder to find resources since it
> doesn't fully utilize UIO infrastructure.

I'd say you are better off actually taking this in the other direction.
From what I have seen, it seems like this driver is meant to deal with
mapping VFs contained inside of guests. If you are going to fork off and
create a UIO driver for mapping VFs, why not just make it specialize in
that? You could probably simplify the code by dropping support for legacy
interrupts and IO regions, since all that is already covered by
uio_pci_generic anyway if I am not mistaken.

You could then look at naming it something like uio_vf, since the name
uio_msi is a bit of a misnomer given that it is MSI-X it supports, not
MSI interrupts.

- Alex
[dpdk-dev] Having troubles binding an SR-IOV VF to uio_pci_generic on Amazon instance
On 10/01/2015 06:14 AM, Michael S. Tsirkin wrote:
> On Thu, Oct 01, 2015 at 01:07:13PM +0100, Bruce Richardson wrote:
>>>> This in itself is going to use up a good proportion of the processing
>>>> time, as well as that we have to spend cycles copying the descriptors
>>>> from one ring in memory to another. Given that right now with the
>>>> vector ixgbe driver, the cycle cost per packet of RX is just a few
>>>> dozen cycles on modern cores, every additional cycle (fraction of a
>>>> nanosecond) has an impact.
>>>>
>>>> Regards,
>>>> /Bruce
>>>
>>> See above. There is no need for that on the data path. Only re-adding
>>> buffers requires a system call.
>>
>> Re-adding buffers is a key part of the data path! Ok, the fact that
>> it's only on descriptor rearm does allow somewhat bigger batches,
>
> That was the point, yes.
>
>> but the whole point of having the kernel do this extra work you propose
>> is to allow the kernel to scan and sanitize the physical addresses -
>> and that will take a lot of cycles, especially if it has to handle all
>> the different descriptor formats of all the different NICs, as has
>> already been pointed out.
>>
>> /Bruce
>
> Well the driver would be per NIC, so there's only need to support
> specific formats supported by a given NIC.

One thing that seems to be overlooked in your discussion is the cost to
translate these descriptors. It isn't as if most systems running DPDK
have the cycles to spare. As I believe was brought up in another thread,
we are looking at a budget of something like 68 ns per packet at 10 Gbps
line rate. The overhead of having to go through and
translate/parse/validate the descriptors would end up being pretty
significant. If you need proof of that, just try running the ixgbe driver
and routing small packets. We end up spending something like 40 ns in
ixgbe_clean_rx_irq, and that is mostly just translating the descriptor
bits into the correct sk_buff bits.

Also, trying to maintain a user-space ring in addition to the
kernel-space ring means that much more memory overhead and an increased
likelihood of things getting pushed out of the L1 cache.

As far as the descriptor validation itself, the overhead for that would
guarantee that you cannot get any performance out of the device. There
are too many corner cases that would have to be addressed in validating
user-space input to allow us to process packets in any sort of timely
fashion. For starters, we would have to validate the size, alignment, and
ownership of a given buffer. If it is a transmit buffer, we would have to
go through and validate any offloads being requested. Likely just the
validation and translation would add 10s if not 100s of nanoseconds to
the time needed to process each packet. In addition, we are talking about
doing this in kernel space, which means we wouldn't really be able to
take advantage of things like SSE or AVX instructions.

> An alternative is to format the descriptors in kernel, based
> on just the list of addresses. This seems cleaner, but I don't
> know how efficient it would be.
>
> Device vendors and dpdk developers are probably the best people to
> figure out what's the best thing to do here.

As far as the bifurcated driver approach goes, the only way something
like that would ever work is if you could limit the access via an IOMMU.
At least, everything I have seen proposed for a bifurcated driver still
involved one if they expected to get any performance.

> But it looks like it's not going to happen unless security is made
> a requirement for upstreaming code.

The fact is we already ship uio_pci_generic. User space drivers are here
to stay. What is being asked for is an extension to the existing
infrastructure to allow MSI-X interrupts to trigger an event on a file
descriptor. As far as I know that doesn't add any additional security
risk, since it is the kernel PCIe subsystem itself that would be
programming the address and data for said device; it wouldn't actually
grant any more access other than the additional file descriptors to
support MSI-X vectors.

Anyway, that is just my $.02 on this.

- Alex
[dpdk-dev] Having troubles binding an SR-IOV VF to uio_pci_generic on Amazon instance
On 10/01/2015 02:42 AM, Michael S. Tsirkin wrote:
> On Thu, Oct 01, 2015 at 12:22:46PM +0300, Avi Kivity wrote:
>> even when they are, some users
>> prefer to avoid the performance penalty.
>
> I don't think there's a measurable penalty from passing through the
> IOMMU, as long as mappings are mostly static (i.e. iommu=pt). I sure
> never saw any numbers that show such.

It depends on the IOMMU. I believe Intel had a performance penalty on all
CPUs prior to Ivy Bridge. Since then things have improved to where they
are comparable to bare metal.

The graph on page 5 of
https://networkbuilders.intel.com/docs/Network_Builders_RA_vBRAS_Final.pdf
shows the penalty clear as day. Pretty much anything before Ivy Bridge
with small packets is slowed to a crawl with an IOMMU enabled.

- Alex
[dpdk-dev] [PATCH 0/2] uio_msi: device driver
On 10/01/2015 03:00 PM, Stephen Hemminger wrote:
> On Thu, 1 Oct 2015 12:48:36 -0700
> Alexander Duyck wrote:
>
>> On 10/01/2015 07:57 AM, Stephen Hemminger wrote:
>>> On Thu, 1 Oct 2015 13:59:02 +0300
>>> Avi Kivity wrote:
>>>
>>>> On 10/01/2015 01:28 AM, Stephen Hemminger wrote:
>>>>> This is a new UIO device driver to allow supporting MSI-X and MSI
>>>>> devices in userspace. It has been used in environments like VMware
>>>>> and older versions of QEMU/KVM where no IOMMU support is available.
>>>>
>>>> Why not add msi/msix support to uio_pci_generic?
>>>
>>> That is possible but that would meet ABI and other resistance from the
>>> author. Also, uio_pci_generic makes it harder to find resources since
>>> it doesn't fully utilize UIO infrastructure.
>>
>> I'd say you are better off actually taking this in the other direction.
>> From what I have seen, it seems like this driver is meant to deal with
>> mapping VFs contained inside of guests. If you are going to fork off
>> and create a UIO driver for mapping VFs, why not just make it
>> specialize in that? You could probably simplify the code by dropping
>> support for legacy interrupts and IO regions, since all that is already
>> covered by uio_pci_generic anyway if I am not mistaken.
>>
>> You could then look at naming it something like uio_vf, since the name
>> uio_msi is a bit of a misnomer given that it is MSI-X it supports, not
>> MSI interrupts.
>
> The support needs to cover:
>    - VF in guest
>    - VNIC in guest (vmxnet3)
> it isn't just about VF's

I get that, but the driver you are talking about adding duplicates much
of what is already there in uio_pci_generic. If nothing else it might be
worthwhile to look at replacing the legacy interrupt with MSI. Maybe look
at naming it something like uio_pcie to indicate that we are focusing on
assigning PCIe and virtual devices that support MSI and MSI-X and use
memory BARs, rather than legacy PCI devices that are doing things like
mapping I/O BARs and using INTx signaling.

My main argument is that we should probably look at dropping support for
anything that isn't going to be needed. If it is really important we can
always add it later. I just don't see the value in having code around for
things we aren't likely to ever use with real devices, as we are stuck
supporting it for the life of the driver. I'll go ahead and provide an
inline review of your patch 2/2, as I think my feedback might make a bit
more sense that way.

- Alex
[dpdk-dev] [PATCH 2/2] uio: new driver to support PCI MSI-X
On 09/30/2015 03:28 PM, Stephen Hemminger wrote:
> This driver allows using a PCI device with Message Signalled Interrupts
> from userspace. The API is similar to the igb_uio driver used by the
> DPDK. Via ioctl it provides a mechanism to map MSI-X interrupts into
> event file descriptors similar to VFIO.
>
> VFIO is a better choice if an IOMMU is available, but often userspace
> drivers have to work in environments where IOMMU support (real or
> emulated) is not available. All UIO drivers that support DMA are not
> secure against rogue userspace applications programming DMA hardware to
> access private memory; this driver is no less secure than existing code.
>
> Signed-off-by: Stephen Hemminger
> ---
>  drivers/uio/Kconfig          |   9 ++
>  drivers/uio/Makefile         |   1 +
>  drivers/uio/uio_msi.c        | 378 +++
>  include/uapi/linux/Kbuild    |   1 +
>  include/uapi/linux/uio_msi.h |  22 +++
>  5 files changed, 411 insertions(+)
>  create mode 100644 drivers/uio/uio_msi.c
>  create mode 100644 include/uapi/linux/uio_msi.h
>
> diff --git a/drivers/uio/Kconfig b/drivers/uio/Kconfig
> index 52c98ce..04adfa0 100644
> --- a/drivers/uio/Kconfig
> +++ b/drivers/uio/Kconfig
> @@ -93,6 +93,15 @@ config UIO_PCI_GENERIC
>        primarily, for virtualization scenarios.
>        If you compile this as a module, it will be called uio_pci_generic.
>
> +config UIO_PCI_MSI
> +    tristate "Generic driver supporting MSI-x on PCI Express cards"
> +    depends on PCI
> +    help
> +      Generic driver that provides Message Signalled IRQ events
> +      similar to VFIO. If an IOMMU is available please use VFIO
> +      instead since it provides more security.
> +      If you compile this as a module, it will be called uio_msi.
> +
>  config UIO_NETX
>      tristate "Hilscher NetX Card driver"
>      depends on PCI

Should you maybe instead depend on CONFIG_PCI_MSI? Without MSI this is
essentially just uio_pci_generic with a bit more greedy mapping setup.

> diff --git a/drivers/uio/Makefile b/drivers/uio/Makefile
> index 8560dad..62fc44b 100644
> --- a/drivers/uio/Makefile
> +++ b/drivers/uio/Makefile
> @@ -9,3 +9,4 @@ obj-$(CONFIG_UIO_NETX)    += uio_netx.o
>  obj-$(CONFIG_UIO_PRUSS)    += uio_pruss.o
>  obj-$(CONFIG_UIO_MF624)    += uio_mf624.o
>  obj-$(CONFIG_UIO_FSL_ELBC_GPCM)    += uio_fsl_elbc_gpcm.o
> +obj-$(CONFIG_UIO_PCI_MSI)    += uio_msi.o
> diff --git a/drivers/uio/uio_msi.c b/drivers/uio/uio_msi.c
> new file mode 100644
> index 000..802b5c4
> --- /dev/null
> +++ b/drivers/uio/uio_msi.c
> @@ -0,0 +1,378 @@
> +/*-
> + *
> + * Copyright (c) 2015 by Brocade Communications Systems, Inc.
> + * Author: Stephen Hemminger
> + *
> + * This work is licensed under the terms of the GNU GPL, version 2 only.
> + */
> +
> +#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
> +
> +#include <...>
> +#include <...>
> +#include <...>
> +#include <...>
> +#include <...>
> +#include <...>
> +#include <...>
> +#include <...>
> +
> +#define DRIVER_VERSION  "0.1.1"
> +#define MAX_MSIX_VECTORS 64
> +
> +/* MSI-X vector information */
> +struct uio_msi_pci_dev {
> +    struct uio_info info;       /* UIO driver info */
> +    struct pci_dev *pdev;       /* PCI device */
> +    struct mutex mutex;         /* open/release/ioctl mutex */
> +    int ref_cnt;                /* references to device */
> +    unsigned int max_vectors;   /* MSI-X slots available */
> +    struct msix_entry *msix;    /* MSI-X vector table */
> +    struct uio_msi_irq_ctx {
> +        struct eventfd_ctx *trigger; /* vector to eventfd */
> +        char *name;             /* name in /proc/interrupts */
> +    } *ctx;
> +};

I would move the definition of uio_msi_irq_ctx out of uio_msi_pci_dev.
It would help to make this a bit more readable.
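A sketch of that restructuring (same fields as the quoted code, with the
nested struct hoisted out to its own top-level definition):

    struct uio_msi_irq_ctx {
        struct eventfd_ctx *trigger;    /* vector to eventfd */
        char *name;                     /* name in /proc/interrupts */
    };

    struct uio_msi_pci_dev {
        struct uio_info info;           /* UIO driver info */
        struct pci_dev *pdev;           /* PCI device */
        struct mutex mutex;             /* open/release/ioctl mutex */
        int ref_cnt;                    /* references to device */
        unsigned int max_vectors;       /* MSI-X slots available */
        struct msix_entry *msix;        /* MSI-X vector table */
        struct uio_msi_irq_ctx *ctx;    /* per-vector contexts */
    };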
> +static irqreturn_t uio_intx_irqhandler(int irq, void *arg)
> +{
> +    struct uio_msi_pci_dev *udev = arg;
> +
> +    if (pci_check_and_mask_intx(udev->pdev)) {
> +        eventfd_signal(udev->ctx->trigger, 1);
> +        return IRQ_HANDLED;
> +    }
> +
> +    return IRQ_NONE;
> +}

I would really prefer to see the INTx handling dropped, since there are
already 2 different UIO drivers set up for handling INTx style
interrupts. Let's focus on the parts from the last decade and drop
support for INTx now, in favor of MSI-X and maybe MSI. If we _REALLY_
need it we can always come back later and add it.

> +static irqreturn_t uio_msi_irqhandler(int irq, void *arg)
> +{
> +    struct eventfd_ctx *trigger = arg;
> +
> +    eventfd_signal(trigger, 1);
> +    return IRQ_HANDLED;
> +}
> +
> +/* set the mapping between vector # and existing eventfd. */
> +static int set_irq_eventfd(struct uio_msi_pci_dev *udev, u32 vec, int fd)
> +{
> +    struct eventfd_ctx *trigger;
> +    int irq, err;
> +
> +    if (vec >= udev->max_vectors) {
> +        dev_notice(&udev->pdev->dev, "vec %u >= num_vec %u\n",
[dpdk-dev] [PATCH 0/2] uio_msi: device driver
On 10/01/2015 04:39 PM, Stephen Hemminger wrote:
> On Thu, 1 Oct 2015 16:03:06 -0700
> Alexander Duyck wrote:
>
>> On 10/01/2015 03:00 PM, Stephen Hemminger wrote:
>>> On Thu, 1 Oct 2015 12:48:36 -0700
>>> Alexander Duyck wrote:
>>>
>>>> On 10/01/2015 07:57 AM, Stephen Hemminger wrote:
>>>>> On Thu, 1 Oct 2015 13:59:02 +0300
>>>>> Avi Kivity wrote:
>>>>>
>>>>>> On 10/01/2015 01:28 AM, Stephen Hemminger wrote:
>>>>>>> This is a new UIO device driver to allow supporting MSI-X and MSI
>>>>>>> devices in userspace. It has been used in environments like VMware
>>>>>>> and older versions of QEMU/KVM where no IOMMU support is
>>>>>>> available.
>>>>>>
>>>>>> Why not add msi/msix support to uio_pci_generic?
>>>>>
>>>>> That is possible but that would meet ABI and other resistance from
>>>>> the author. Also, uio_pci_generic makes it harder to find resources
>>>>> since it doesn't fully utilize UIO infrastructure.
>>>>
>>>> I'd say you are better off actually taking this in the other
>>>> direction. From what I have seen, it seems like this driver is meant
>>>> to deal with mapping VFs contained inside of guests. If you are going
>>>> to fork off and create a UIO driver for mapping VFs, why not just
>>>> make it specialize in that? You could probably simplify the code by
>>>> dropping support for legacy interrupts and IO regions, since all that
>>>> is already covered by uio_pci_generic anyway if I am not mistaken.
>>>>
>>>> You could then look at naming it something like uio_vf, since the
>>>> name uio_msi is a bit of a misnomer given that it is MSI-X it
>>>> supports, not MSI interrupts.
>>>
>>> The support needs to cover:
>>>    - VF in guest
>>>    - VNIC in guest (vmxnet3)
>>> it isn't just about VF's
>>
>> I get that, but the driver you are talking about adding duplicates much
>> of what is already there in uio_pci_generic. If nothing else it might
>> be worthwhile to look at replacing the legacy interrupt with MSI. Maybe
>> look at naming it something like uio_pcie to indicate that we are
>> focusing on assigning PCIe and virtual devices that support MSI and
>> MSI-X and use memory BARs, rather than legacy PCI devices that are
>> doing things like mapping I/O BARs and using INTx signaling.
>>
>> My main argument is that we should probably look at dropping support
>> for anything that isn't going to be needed. If it is really important
>> we can always add it later. I just don't see the value in having code
>> around for things we aren't likely to ever use with real devices, as we
>> are stuck supporting it for the life of the driver. I'll go ahead and
>> provide an inline review of your patch 2/2, as I think my feedback
>> might make a bit more sense that way.
>
> Ok, but having one driver that can deal with failures with msi-x vector
> setup and fallback seemed like a better strategy.

Yes, but in the case of something like a VF it is going to just make a
bigger mess of things, since INTx doesn't work. So what would you expect
your driver to do in that case? Also we have to keep in mind that the
MSI-X failure case is very unlikely.

One other thing that just occurred to me is that you may want to try
using the range allocation call instead of a hard-set number of
interrupts. Then if you start running short on vectors you don't hard
fail and instead just allocate what you can.

- Alex
[dpdk-dev] [PATCH 2/2] uio: new driver to support PCI MSI-X
On 10/01/2015 05:01 PM, Stephen Hemminger wrote:
> On Thu, 1 Oct 2015 16:40:10 -0700
> Alexander Duyck wrote:
>
>> I agree with some other reviewers. Why call pci_enable_msix in open?
>> It seems like it would make much more sense to do this on probe, and
>> then disable MSI-X on free. I can only assume you are trying to do it
>> to save on resources, but the fact is this is a driver you have to
>> explicitly force onto a device, so you would probably be safe to assume
>> that they plan to use it in the near future.
>
> Because if the interface is not up, the MSI handle doesn't have to be
> open. This saves resources and avoids some races.

Yes, but it makes things a bit messier for the interrupts. Most drivers
take care of interrupts during probe so that if there are any allocation
problems they can take care of them then, instead of leaving an interface
out there that will later fail when it is brought up. It also ends up
being a way to deal with the whole MSI-X fall-back issue.

- Alex
[dpdk-dev] [PATCH 0/2] uio_msi: device driver
On 10/01/2015 05:04 PM, Stephen Hemminger wrote:
> On Thu, 1 Oct 2015 16:43:23 -0700
> Alexander Duyck wrote:
>
>> Yes, but in the case of something like a VF it is going to just make a
>> bigger mess of things, since INTx doesn't work. So what would you
>> expect your driver to do in that case? Also we have to keep in mind
>> that the MSI-X failure case is very unlikely.
>>
>> One other thing that just occurred to me is that you may want to try
>> using the range allocation call instead of a hard-set number of
>> interrupts. Then if you start running short on vectors you don't hard
>> fail and instead just allocate what you can.
>
> I tried that but the bookkeeping gets messy since there is no good
> way to communicate that back to userspace and have it adapt.

Actually, I just realized that uio_msi_open is kind of messed up. If the
MSI-X allocation fails due to no resources, it will return a positive
value indicating the number of vectors that could be allocated, a
negative value if one of the input values is invalid, or 0. I'm not sure
if returning a positive value on failure is an issue or not. I know the
open call is supposed to return a negative value or the file descriptor
if not negative; I don't know whether the return value might be
interpreted as a file descriptor or not.

Also, if MSI-X is supported by the hardware but disabled for some reason
by the kernel ("pci=nomsi"), then this driver is rendered inoperable,
since it will never give you anything but -EINVAL from the open call.

I really think you should look at taking care of enabling MSI-X, and
maybe MSI as a fall-back, in probe. At least then you can post a message
about how many vectors are enabled and of what type. Then if you cannot
enable any interrupts due to MSI being disabled, you can simply fail at
probe time and let them load a different driver.

- Alex
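To sketch what that probe-time setup could look like, using the names
from the quoted patch, pci_enable_msix_range() for the range allocation
mentioned earlier, and pci_enable_msi() as the fall-back; this is an
illustration of the suggestion, not the actual driver code:

    static int uio_msi_init_interrupts(struct uio_msi_pci_dev *udev)
    {
        int nvec;

        /* Accept anywhere from 1 to MAX_MSIX_VECTORS instead of
         * hard-failing when the full count is unavailable.
         */
        nvec = pci_enable_msix_range(udev->pdev, udev->msix,
                                     1, MAX_MSIX_VECTORS);
        if (nvec > 0) {
            udev->max_vectors = nvec;
            dev_info(&udev->pdev->dev,
                     "using %d MSI-X vectors\n", nvec);
            return 0;
        }

        /* MSI-X unavailable (e.g. booted with "pci=nomsi");
         * try plain MSI before giving up.
         */
        if (!pci_enable_msi(udev->pdev)) {
            udev->max_vectors = 1;
            dev_info(&udev->pdev->dev, "falling back to MSI\n");
            return 0;
        }

        /* Fail probe so a different driver can be bound instead. */
        return -ENODEV;
    }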
[dpdk-dev] [PATCH 2/2] uio: new driver to support PCI MSI-X
On 10/01/2015 05:04 PM, Stephen Hemminger wrote:
> On Thu, 1 Oct 2015 16:40:10 -0700
> Alexander Duyck wrote:
>
>> Do you really need to map IORESOURCE bars? Most drivers I can think of
>> don't use IO BARs anymore. Maybe we could look at just dropping the
>> code and adding it back later if we have a use case that absolutely
>> needs it.
>
> Mapping is not strictly necessary, but for virtio it acts as a way to
> communicate the regions.

I think I see what you are saying. I was hoping we could get away from
having to map any I/O ports, but it looks like virtio is still using them
for BAR 0, or at least that is what I am seeing on my VM with virtio_net.
I was really hoping we could get away from that, since a 16b address
space is far too restrictive anyway.

>> Also how many devices actually need resources beyond BAR 0? I'm just
>> curious, as I know BAR 2 on many of the Intel devices is the register
>> space related to MSI-X, so now we have both the PCIe subsystem and user
>> space with access to this region.
>
> VMXNet3 needs 2 bars. Most use only one.

So essentially we need to make exceptions for the virtual interfaces. I
guess there isn't much we can do then, and we probably need to map any
and all base address registers we can find for the given device. I was
hoping for something a bit more surgical, since we are opening a security
hole of sorts, but I guess it can't be helped if we want to support
multiple devices that all have such radically different configurations.

- Alex
[dpdk-dev] [PATCH] fm10k: add missing newline to debug log
On 07/16/2015 10:26 AM, Stephen Hemminger wrote:
> If FM10K_DEBUG_DRIVER is enabled, then the log messages about
> function entry are missing a newline, causing extremely long lines.
>
> Signed-off-by: Stephen Hemminger
> ---
>  drivers/net/fm10k/base/fm10k_osdep.h | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/net/fm10k/base/fm10k_osdep.h b/drivers/net/fm10k/base/fm10k_osdep.h
> index 04f8fe9..33d9120 100644
> --- a/drivers/net/fm10k/base/fm10k_osdep.h
> +++ b/drivers/net/fm10k/base/fm10k_osdep.h
> @@ -46,7 +46,7 @@ POSSIBILITY OF SUCH DAMAGE.
>
>  #define STATIC static
>  #define DEBUGFUNC(F)            DEBUGOUT(F);
> -#define DEBUGOUT(S, args...)    PMD_DRV_LOG_RAW(DEBUG, S, ##args)
> +#define DEBUGOUT(S, args...)    PMD_DRV_LOG_RAW(DEBUG, S "\n", ##args)
>  #define DEBUGOUT1(S, args...)   DEBUGOUT(S, ##args)
>  #define DEBUGOUT2(S, args...)   DEBUGOUT(S, ##args)
>  #define DEBUGOUT3(S, args...)   DEBUGOUT(S, ##args)

I think this ends up adding a redundant "\n" to several other DEBUGOUT
statements then. Maybe you should update it so that DEBUGFUNC adds the
"\n" instead of DEBUGOUT.

- Alex
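One way to implement that suggestion, as a sketch against the quoted
header; only DEBUGFUNC changes, so DEBUGOUT and the DEBUGOUTn wrappers
keep their existing behavior:

    #define DEBUGFUNC(F)            DEBUGOUT(F "\n")
    #define DEBUGOUT(S, args...)    PMD_DRV_LOG_RAW(DEBUG, S, ##args)

This relies on DEBUGFUNC always being passed a string literal, so the
"\n" is concatenated at compile time.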
[dpdk-dev] Any chance someone could fix the SPF records for this mailing list?
I have noticed a number of emails from this list are going to spam. It
looks like it might be gmail filtering, based on the fact that most of
the list traffic has a valid SPF result based on an IPv4 address and is
reported like below:

    Received: from dpdk.org (dpdk.org. [92.243.14.124])
        by mx.google.com with ESMTP id eq6si3307415wib.54.2015.06.03.11.21.20;
        Wed, 03 Jun 2015 11:21:23 -0700 (PDT)
    Received-SPF: pass (google.com: best guess record for domain of
        dev-bounces at dpdk.org designates 92.243.14.124 as permitted sender)
        client-ip=92.243.14.124;
    Authentication-Results: mx.google.com;
        spf=pass (google.com: best guess record for domain of dev-bounces at
        dpdk.org designates 92.243.14.124 as permitted sender)
        smtp.mail=dev-bounces at dpdk.org

However, the ones that are going straight into my spam folder list an
IPv6 address that is rated neutral by the SPF check:

    Received: from dpdk.org ([2001:4b98:dc0:41:216:3eff:fe72:dd13])
        by mx.google.com with ESMTP id cy1si1185671wib.89.2015.06.03.04.00.32;
        Wed, 03 Jun 2015 04:00:37 -0700 (PDT)
    Received-SPF: neutral (google.com: 2001:4b98:dc0:41:216:3eff:fe72:dd13
        is neither permitted nor denied by best guess record for domain of
        dev-bounces at dpdk.org) client-ip=2001:4b98:dc0:41:216:3eff:fe72:dd13;
    Authentication-Results: mx.google.com;
        spf=neutral (google.com: 2001:4b98:dc0:41:216:3eff:fe72:dd13 is
        neither permitted nor denied by best guess record for domain of
        dev-bounces at dpdk.org) smtp.mail=dev-bounces at dpdk.org

I was just wondering if it would be possible to get the IPv6 address
added as a permitted sender for the domain, to help reduce the amount of
messages that are likely being flagged as spam for myself and likely
others.

Thanks.

- Alex
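For illustration, an SPF TXT record authorizing both addresses might look
like the following; this is a hypothetical example, since per the headers
above Google is currently applying only a "best guess" record, and the
actual policy is up to the dpdk.org admins:

    dpdk.org.  IN  TXT  "v=spf1 ip4:92.243.14.124 ip6:2001:4b98:dc0:41:216:3eff:fe72:dd13 ~all"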
[dpdk-dev] Rx_missed_errors drops with larger packet size
On Fri, May 20, 2016 at 2:09 AM, SwamZ wrote:
> Hi,
>
> While doing performance testing with larger packet sizes (like 4000
> bytes), we are seeing rx_missed_errors on the interface. This issue is
> not seen with packet sizes less than 2000. There were questions asked in
> this forum on rx_missed_errors, but there was not any conclusion. Please
> let me know what could be the reason for these drops.
>
> I tried the following without any luck:
>
> 1) Tried different burst sizes like 16, 32 and 64
>
> 2) Tried different numbers of rx descriptors like 512, 1024, 2048 and
> 4096.

I've seen issues like this occur on the 82599 using the Linux kernel
driver in the past, but it was usually related to CPU C-states. I'm
assuming that shouldn't apply here, since you are using the poll mode
driver, so there are no interrupts and the CPUs are busy polling.

Another thing to check is that you have sufficient PCIe bandwidth. Take a
look at "lspci -vvv" for the device and make certain it is connected at
x8 Gen2.

> Setup and testing details:
>
> CPU Speed: 2.6 GHz
> NIC: 82599ES 10-Gigabit
> IO Virtualization: SR-IOV
> IO and packet processing cores are in the same NUMA node.

If you are running SR-IOV then the SRRCTL_DROP_EN bit should be set in
all the SRRCTL registers, if I am not mistaken. As such, if you are
triggering Rx missed errors it may not be related to the driver directly
but could point to some sort of PCIe bus utilization issue.

How many VFs are you using, and is any of this traffic multicast or
broadcast? Multicast or broadcast traffic can potentially saturate the
PCIe bus if there is enough of it on the wire, as each packet is normally
replicated for each active VF. So 1 packet on the wire (4K) can become 8
on the PCIe bus (32K) if you have 7 active VFs.

> Packet size: 4000 bytes
>
> Traffic rate: 80% of line rate. If the traffic rate is < ~80% then drops
> are not seen.
>
> Application: This is a sample application developed using the L3fwd
> example app.
>
> DPDK version: 1.7
>
> Thanks,
> Swamy
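To put rough numbers on the replication concern, illustrative arithmetic
only, assuming every broadcast/multicast packet is copied to the PF plus
7 VFs and ideal PCIe efficiency:

    #include <stdio.h>

    int main(void)
    {
        double wire_gbps = 10.0 * 0.80;     /* 80% of 10G line rate */
        int copies = 1 + 7;                 /* PF + 7 active VFs */
        double demand = wire_gbps * copies; /* replicated traffic */

        /* x8 Gen2: 5 GT/s * 8 lanes, 8b/10b encoding -> ~32 Gb/s */
        double avail = 5.0 * 8 * 8.0 / 10.0;

        printf("replicated demand %.0f Gb/s vs ~%.0f Gb/s available\n",
               demand, avail);
        return 0;
    }

Under those assumptions the replicated stream would need roughly twice
what an x8 Gen2 link provides, so checking the VF count and traffic type
is a sensible first step.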