On Mon, Jun 16, 2014 at 08:59:25AM +0000, Doherty, Declan wrote: > > -----Original Message----- > > From: Neil Horman [mailto:nhorman at tuxdriver.com] > > Sent: Friday, June 13, 2014 8:38 PM > > To: Doherty, Declan > > Cc: dev at dpdk.org > > Subject: Re: [dpdk-dev] [PATCH v3 2/5] Link Bonding PMD Library > > (librte_eal/librte_ether link bonding support changes) > > > > On Fri, Jun 13, 2014 at 06:34:04PM +0000, Doherty, Declan wrote: > > > > -----Original Message----- > > > > From: Neil Horman [mailto:nhorman at tuxdriver.com] > > > > Sent: Friday, June 13, 2014 5:08 PM > > > > To: Doherty, Declan > > > > Cc: dev at dpdk.org > > > > Subject: Re: [dpdk-dev] [PATCH v3 2/5] Link Bonding PMD Library > > > > (librte_eal/librte_ether link bonding support changes) > > > > > > > > On Fri, Jun 13, 2014 at 03:41:59PM +0100, Declan Doherty wrote: > > > > > Updating functionality in EAL to support adding link bonding > > > > > devices via ?vdev option. Link bonding devices will be > > > > > initialized after all physical devices have been probed and > > > > > initialized. > > > > > > > > > > Signed-off-by: Declan Doherty <declan.doherty at intel.com> > > > > > --- > > > > > lib/librte_eal/common/eal_common_dev.c | 66 > > > > +++++++++++++++++++++++++++-- > > > > > lib/librte_eal/common/eal_common_pci.c | 6 +++ > > > > > lib/librte_eal/common/include/eal_private.h | 7 +++ > > > > > lib/librte_eal/common/include/rte_dev.h | 1 + > > > > > lib/librte_ether/rte_ethdev.c | 34 +++++++++++++-- > > > > > lib/librte_ether/rte_ethdev.h | 7 ++- > > > > > lib/librte_pmd_pcap/rte_eth_pcap.c | 22 +++++----- > > > > > lib/librte_pmd_ring/rte_eth_ring.c | 32 +++++++------- > > > > > lib/librte_pmd_ring/rte_eth_ring.h | 3 +- > > > > > lib/librte_pmd_xenvirt/rte_eth_xenvirt.c | 2 +- > > > > > 10 files changed, 144 insertions(+), 36 deletions(-) > > > > > > > > > > diff --git a/lib/librte_eal/common/eal_common_dev.c > > > > b/lib/librte_eal/common/eal_common_dev.c > > > > > index eae5656..b50c908 100644 > > > > > --- a/lib/librte_eal/common/eal_common_dev.c > > > > > +++ b/lib/librte_eal/common/eal_common_dev.c > > > > > @@ -75,14 +75,28 @@ rte_eal_dev_init(void) > > > > > > > > > > /* call the init function for each virtual device */ > > > > > TAILQ_FOREACH(devargs, &devargs_list, next) { > > > > > + uint8_t bdev = 0; > > > > > > > > > > if (devargs->type != RTE_DEVTYPE_VIRTUAL) > > > > > continue; > > > > > > > > > > TAILQ_FOREACH(driver, &dev_driver_list, next) { > > > > > - if (driver->type != PMD_VDEV) > > > > > + /* RTE_DEVTYPE_VIRTUAL can only be a virtual or > > > > > bonded > > > > device*/ > > > > > + if (driver->type != PMD_VDEV && driver->type != > > > > PMD_BDEV) > > > > > continue; > > > > > > > > > > + /* > > > > > + * Bonded devices are not initialize here, we > > > > > do it later in > > > > > + * rte_eal_bonded_dev_init() after all physical > > > > > devices > > > > have been > > > > > + * probed and initialized > > > > > + */ > > > > > + if (driver->type == PMD_BDEV && > > > > > + !strncmp(driver->name, devargs- > > > > >virtual.drv_name, > > > > > + > > > > > strlen(driver->name))) { > > > > > + bdev = 1; > > > > > + break; > > > > > + } > > > > > + > > > > I really don't think you need to add a new device type for bonded devs. > > > > Its got > > > > no specific hardware that it drives, and you configure it with a --vdev > > command, > > > > so treat it as one here. I understand that you need to pass additional > > > > information about slaves to a bonded device, which is fine, but you can > > > > do that > > > > with kvargs pretty easily, at which point its just another vdev. The > > > > only other > > > > requirement is that you initilize the bonded vdev after the slave vdevs > > > > have > > > > been created, which you can do by any of several methods (a priority > > > > field to > > > > indicate that bonded drivers should be initilized last/later, a > > > > deferral return > > > > code from the init routine, or by dead reckoning via the careful > > > > construction > > of > > > > the application command line (placed the bonded --vdev option last on > > > > the > > > > command line argument list at run time). > > > > > > > It was my initial intent to do as you have describe above, but the > > > physical devices > > > cause a real issue here, physical devices don't call through to > > rte_eth_dev_allocate until > > > during rte_eal_pci_probe call, so it's not possible to initialize the > > > bonded device > > from > > > within rte_eal_dev_init as the physical devices have not been fully > > > initialized at > > this > > > point, as a port_id has not been allocated and can't be added as bonding > > > slaves. I > > don't > > > see away around this without changing the EAL API, which I've tried to > > > avoid with > > this solution. > > > > > > Ordering isn't an issue, and can easily be solved if the above problem > > > didn't > > exist, and although > > > a new device type isn't technically required, I think it's a cleaner > > > solution than > > doing > > > string comparisons. > > > > > > > > > Ugh, you're right. That stinks. We still shouldn't just drop the init code > > in > > with the pci probe code though, especially if dpdk starts supporting more > > stacked network devices. A notification system for interface creation that > > you > > could monitor for pci_device addresses (B:D:F tupples) would be nice, but > > might > > be overkill since dpdk doesn't really support device freeing yet. what > > might be > > useful is a generic ordering mechanism for rte_eal_dev_init. If you added a > > priority field to the rte_driver structure, you could create an enumeration > > for > > the field to define when the init routine for that driver was called > > something > > like: > > typdef enum { > > PMD_INIT_FIRST = 0, > > PMD_INIT_POST_PROBE = 1, > > } > > > > Then you could make multiple calls to rte_eal_dev_init from rte_eal_init at > > the > > appropriate places in the initalization process, each time passing one of > > the > > above values. rte_eal_dev_init would only initalize those drivers whos > > priority > > matched the passed in priority. That would be helpful for interfaces like > > vlans, vxlans, tunnels, etc. > > > I'm not sure I understand how you intend this to work, are you proposing that > rte_eal_dev_init is call multiple times by the user pre and post > rte_eal_pci_probe Mostly, I'm proposing that rte_eal_pci_probe be called by rte_eal_init, and that we then call rte_eal_dev_init multiple times from within rte_eal_dev_init, passing a parameter to indicate the priority level of device that should be initalized within that call.
> this doesn't seem like an idea solution either. I'm not 100% clear why > rte_eal_pci_probe is currently called by the application code and not > initiated > from within rte_eal_dev_init, if this was the case we would be able to > figure out > a dependency tree for initialization of devices, based on the parsed input > arguments without having extra user input to or multiple calls of > rte_eal_dev_init. > I agree, I think you should move it into its proper place within rte_eal_init, though I think multiple calls to rte_eal_dev_init is perfectly acceptible in this scenario. You certainly could parse the command line to build a dependency tree if you wish, though I don't think thats crruently overly complex. The current dependency tree is static at (physical devices, virtual devices, probe, stacked devices). If you need something more complex later, you can re-write it freely as it will be an internal library mechanism with no external visibility. > Either way I don't think that any of these solution are trivial and they > would > need to be thoroughly tested, which I don't think will be possible in the time > frame for 1.7. If my current implementation calling rte_eal_bonded_dev_init > from within rte_eal_pci_probe is unacceptable, then I propose that I remove > the EAL changes for --vdev initialization of bonding devices from this > release and > only support the C API, and that an overall strategy is discussed for static > device > initialization that is developed for 1.7.1 or 1.8. > I would assert that if you can't write the pmd to support the existing pmd interfaces, then this probably needs to wait until 1.8. I'll bow to consensus on that, but I would think that, if it gets in now, the additional work won't happen. Neil > > > > > > > > > > +static int > > > > > +rte_eth_dev_name_unique(const char* name) > > > > > +{ > > > > > + unsigned i; > > > > > + > > > > > + for (i = 0; i < nb_ports; i++) { > > > > > + if (strcmp(rte_eth_devices[i].data->name, name) == 0) > > > > > + return -1; > > > > > + } > > > > > + > > > > > + return 0; > > > > > +} > > > > > + > > > > > struct rte_eth_dev * > > > > > -rte_eth_dev_allocate(void) > > > > > +rte_eth_dev_allocate(const char* name) > > > > > { > > > > > struct rte_eth_dev *eth_dev; > > > > > > > > > > @@ -165,23 +179,37 @@ rte_eth_dev_allocate(void) > > > > > if (rte_eth_dev_data == NULL) > > > > > rte_eth_dev_data_alloc(); > > > > > > > > > > + if (rte_eth_dev_name_unique(name)) { > > > > > + PMD_DEBUG_TRACE("Ethernet Device with name %s already > > > > allocated!\n"); > > > > > + return NULL; > > > > > + } > > > > > + > > > > This seems fairly racy if you allow dynamic device creation at run time > > > > from > > the > > > > application, if multiple threads attempt to create bonds in parallel. > > > > > > True but if this is an issue then there probably should be some locking > > > around > > this allocation anyway. > > Agreed the allocation of the port id is also racy. > > > > > > > > > > > /** > > > > > diff --git a/lib/librte_pmd_pcap/rte_eth_pcap.c > > > > b/lib/librte_pmd_pcap/rte_eth_pcap.c > > > > Hmm, we're modifying other pmds for the naming feature, I think it > > > > would be > > best > > > > split out into a separate patch. Something entitled "support unique > > > > interface > > > > naming for virtual pmds" or something. > > > > > > True, but if I do that, I will need to break support for virtual devices > > > in this patch > > set, and > > > I would prefer to keep this functionality working initial release of link > > > bonding. > > Also another > > > patchset will be required at some point in the near future to tidy > > > up/rationalize > > the current > > > handling of virtual devices identification and these changes do not change > > functionally how > > > these devices work. > > > > I'm not sure I'm following what you're saying here. If you split this into > > another patch set, and order earlier in this series, then you won't break > > anything. I'm not suggesting a functional change here, just a > > organizational > > one, so that this change doesn't do more than the changelog indicates. > > > > Sorry I misunderstood, I thought that you where proposing that I create a > completely new > patchset separate to the link bonding release set for these changes. > > > Neil > > Declan