> -----Original Message----- > From: Neil Horman [mailto:nhorman at tuxdriver.com] > Sent: Friday, June 13, 2014 8:38 PM > To: Doherty, Declan > Cc: dev at dpdk.org > Subject: Re: [dpdk-dev] [PATCH v3 2/5] Link Bonding PMD Library > (librte_eal/librte_ether link bonding support changes) > > On Fri, Jun 13, 2014 at 06:34:04PM +0000, Doherty, Declan wrote: > > > -----Original Message----- > > > From: Neil Horman [mailto:nhorman at tuxdriver.com] > > > Sent: Friday, June 13, 2014 5:08 PM > > > To: Doherty, Declan > > > Cc: dev at dpdk.org > > > Subject: Re: [dpdk-dev] [PATCH v3 2/5] Link Bonding PMD Library > > > (librte_eal/librte_ether link bonding support changes) > > > > > > On Fri, Jun 13, 2014 at 03:41:59PM +0100, Declan Doherty wrote: > > > > Updating functionality in EAL to support adding link bonding > > > > devices via ?vdev option. Link bonding devices will be > > > > initialized after all physical devices have been probed and > > > > initialized. > > > > > > > > Signed-off-by: Declan Doherty <declan.doherty at intel.com> > > > > --- > > > > lib/librte_eal/common/eal_common_dev.c | 66 > > > +++++++++++++++++++++++++++-- > > > > lib/librte_eal/common/eal_common_pci.c | 6 +++ > > > > lib/librte_eal/common/include/eal_private.h | 7 +++ > > > > lib/librte_eal/common/include/rte_dev.h | 1 + > > > > lib/librte_ether/rte_ethdev.c | 34 +++++++++++++-- > > > > lib/librte_ether/rte_ethdev.h | 7 ++- > > > > lib/librte_pmd_pcap/rte_eth_pcap.c | 22 +++++----- > > > > lib/librte_pmd_ring/rte_eth_ring.c | 32 +++++++------- > > > > lib/librte_pmd_ring/rte_eth_ring.h | 3 +- > > > > lib/librte_pmd_xenvirt/rte_eth_xenvirt.c | 2 +- > > > > 10 files changed, 144 insertions(+), 36 deletions(-) > > > > > > > > diff --git a/lib/librte_eal/common/eal_common_dev.c > > > b/lib/librte_eal/common/eal_common_dev.c > > > > index eae5656..b50c908 100644 > > > > --- a/lib/librte_eal/common/eal_common_dev.c > > > > +++ b/lib/librte_eal/common/eal_common_dev.c > > > > @@ -75,14 +75,28 @@ rte_eal_dev_init(void) > > > > > > > > /* call the init function for each virtual device */ > > > > TAILQ_FOREACH(devargs, &devargs_list, next) { > > > > + uint8_t bdev = 0; > > > > > > > > if (devargs->type != RTE_DEVTYPE_VIRTUAL) > > > > continue; > > > > > > > > TAILQ_FOREACH(driver, &dev_driver_list, next) { > > > > - if (driver->type != PMD_VDEV) > > > > + /* RTE_DEVTYPE_VIRTUAL can only be a virtual or > > > > bonded > > > device*/ > > > > + if (driver->type != PMD_VDEV && driver->type != > > > PMD_BDEV) > > > > continue; > > > > > > > > + /* > > > > + * Bonded devices are not initialize here, we > > > > do it later in > > > > + * rte_eal_bonded_dev_init() after all physical > > > > devices > > > have been > > > > + * probed and initialized > > > > + */ > > > > + if (driver->type == PMD_BDEV && > > > > + !strncmp(driver->name, devargs- > > > >virtual.drv_name, > > > > + > > > > strlen(driver->name))) { > > > > + bdev = 1; > > > > + break; > > > > + } > > > > + > > > I really don't think you need to add a new device type for bonded devs. > > > Its got > > > no specific hardware that it drives, and you configure it with a --vdev > command, > > > so treat it as one here. I understand that you need to pass additional > > > information about slaves to a bonded device, which is fine, but you can > > > do that > > > with kvargs pretty easily, at which point its just another vdev. The only > > > other > > > requirement is that you initilize the bonded vdev after the slave vdevs > > > have > > > been created, which you can do by any of several methods (a priority > > > field to > > > indicate that bonded drivers should be initilized last/later, a deferral > > > return > > > code from the init routine, or by dead reckoning via the careful > > > construction > of > > > the application command line (placed the bonded --vdev option last on the > > > command line argument list at run time). > > > > > It was my initial intent to do as you have describe above, but the physical > > devices > > cause a real issue here, physical devices don't call through to > rte_eth_dev_allocate until > > during rte_eal_pci_probe call, so it's not possible to initialize the > > bonded device > from > > within rte_eal_dev_init as the physical devices have not been fully > > initialized at > this > > point, as a port_id has not been allocated and can't be added as bonding > > slaves. I > don't > > see away around this without changing the EAL API, which I've tried to > > avoid with > this solution. > > > > Ordering isn't an issue, and can easily be solved if the above problem > > didn't > exist, and although > > a new device type isn't technically required, I think it's a cleaner > > solution than > doing > > string comparisons. > > > > > Ugh, you're right. That stinks. We still shouldn't just drop the init code in > with the pci probe code though, especially if dpdk starts supporting more > stacked network devices. A notification system for interface creation that you > could monitor for pci_device addresses (B:D:F tupples) would be nice, but > might > be overkill since dpdk doesn't really support device freeing yet. what might > be > useful is a generic ordering mechanism for rte_eal_dev_init. If you added a > priority field to the rte_driver structure, you could create an enumeration > for > the field to define when the init routine for that driver was called something > like: > typdef enum { > PMD_INIT_FIRST = 0, > PMD_INIT_POST_PROBE = 1, > } > > Then you could make multiple calls to rte_eal_dev_init from rte_eal_init at > the > appropriate places in the initalization process, each time passing one of the > above values. rte_eal_dev_init would only initalize those drivers whos > priority > matched the passed in priority. That would be helpful for interfaces like > vlans, vxlans, tunnels, etc. > I'm not sure I understand how you intend this to work, are you proposing that rte_eal_dev_init is call multiple times by the user pre and post rte_eal_pci_probe this doesn't seem like an idea solution either. I'm not 100% clear why rte_eal_pci_probe is currently called by the application code and not initiated from within rte_eal_dev_init, if this was the case we would be able to figure out a dependency tree for initialization of devices, based on the parsed input arguments without having extra user input to or multiple calls of rte_eal_dev_init.
Either way I don't think that any of these solution are trivial and they would need to be thoroughly tested, which I don't think will be possible in the time frame for 1.7. If my current implementation calling rte_eal_bonded_dev_init from within rte_eal_pci_probe is unacceptable, then I propose that I remove the EAL changes for --vdev initialization of bonding devices from this release and only support the C API, and that an overall strategy is discussed for static device initialization that is developed for 1.7.1 or 1.8. > > > > > > > > +static int > > > > +rte_eth_dev_name_unique(const char* name) > > > > +{ > > > > + unsigned i; > > > > + > > > > + for (i = 0; i < nb_ports; i++) { > > > > + if (strcmp(rte_eth_devices[i].data->name, name) == 0) > > > > + return -1; > > > > + } > > > > + > > > > + return 0; > > > > +} > > > > + > > > > struct rte_eth_dev * > > > > -rte_eth_dev_allocate(void) > > > > +rte_eth_dev_allocate(const char* name) > > > > { > > > > struct rte_eth_dev *eth_dev; > > > > > > > > @@ -165,23 +179,37 @@ rte_eth_dev_allocate(void) > > > > if (rte_eth_dev_data == NULL) > > > > rte_eth_dev_data_alloc(); > > > > > > > > + if (rte_eth_dev_name_unique(name)) { > > > > + PMD_DEBUG_TRACE("Ethernet Device with name %s already > > > allocated!\n"); > > > > + return NULL; > > > > + } > > > > + > > > This seems fairly racy if you allow dynamic device creation at run time > > > from > the > > > application, if multiple threads attempt to create bonds in parallel. > > > > True but if this is an issue then there probably should be some locking > > around > this allocation anyway. > Agreed the allocation of the port id is also racy. > > > > > > > > /** > > > > diff --git a/lib/librte_pmd_pcap/rte_eth_pcap.c > > > b/lib/librte_pmd_pcap/rte_eth_pcap.c > > > Hmm, we're modifying other pmds for the naming feature, I think it would > > > be > best > > > split out into a separate patch. Something entitled "support unique > > > interface > > > naming for virtual pmds" or something. > > > > True, but if I do that, I will need to break support for virtual devices in > > this patch > set, and > > I would prefer to keep this functionality working initial release of link > > bonding. > Also another > > patchset will be required at some point in the near future to tidy > > up/rationalize > the current > > handling of virtual devices identification and these changes do not change > functionally how > > these devices work. > > I'm not sure I'm following what you're saying here. If you split this into > another patch set, and order earlier in this series, then you won't break > anything. I'm not suggesting a functional change here, just a organizational > one, so that this change doesn't do more than the changelog indicates. > Sorry I misunderstood, I thought that you where proposing that I create a completely new patchset separate to the link bonding release set for these changes. > Neil Declan