Hi Anatoly: Thanks for the review, see my reply in inline.
> -----Original Message----- > From: Burakov, Anatoly > Sent: Friday, June 15, 2018 11:16 PM > To: Zhang, Qi Z <qi.z.zh...@intel.com>; tho...@monjalon.net > Cc: Ananyev, Konstantin <konstantin.anan...@intel.com>; dev@dpdk.org; > Richardson, Bruce <bruce.richard...@intel.com>; Yigit, Ferruh > <ferruh.yi...@intel.com>; Shelton, Benjamin H > <benjamin.h.shel...@intel.com>; Vangati, Narender > <narender.vang...@intel.com> > Subject: Re: [PATCH 00/22] enable hotplug on multi-process > > Hi Qi, > > I haven't read the code yet, and i'll be the first to admit that i'm not too > well > versed on how shared/private device data works, so my apologies in advance > if all of below comments are addressed by implementation details or are way > off base! > > > > On 07-Jun-18 1:38 PM, Qi Zhang wrote: > > > > 1. Attach a share device from primary > > 2. Detach a share device from primary > > 3. Attach a share device from secondary 4. Detach a share device from > > secondary 5. Attach a private device from secondary 6. Detach a > > private device from secondary 7. Detach a share device from secondary > > privately 8. Attach a share device from secondary privately > > > > <...> > > > Case 7, 8: > > Secondary process can also temporally to detach a share device > > "privately" then attach it back later, this action also not impact other > > processes. > > > > Do we really need to implement these cases? It seems to me that this > "reattach it later" introduces unnecessary complexity. I agree it's not necessary, but this looks like a free feature based on current implementation :) >If secondary has > detached the device, I think it is safer if we cannot reattach it, > period, because it was a shared device. What if we try to attach it when > a handshake has already completed and all other processes expect to > detach it? in the case: attach back a shared device already be detached will fail as expected. For PCI devices, it will fail at driver probe. For vdev, it will failed at rte_eth_dev_attach_secondary. > > (in fact, do we differentiate between non-existent device and shared > device that has been "privately detached"? I would expect that we keep > the device as detached as opposed to forgetting about it, so that, come > handshake, we can safely reply "yeah, we can detach the device", but > maybe it's OK to not treat request to detach a non-existent device as an > error... thoughts? am i getting something wrong?) > > > APIs chenages: > > ============== > > > > rte_eth_dev_attach and rte_eth_dev_attach are extended to support > > share device attach/detach in primary-secondary process model, it will > > be called in case 1,2,3,4. > > > > New API rte_eth_dev_attach_private and rte_eth_dev_detach_private are > > introduced to cover case 5,6,7,8, this API can only be invoked in > > secondary process. > > > > New API rte_eth_dev_lock and rte_eth_dev_unlock are introduced to let > > application lock or unlock on specific ethdev, a locked device > > can't be detached. This help applicaiton to prevent unexpected > > device detaching, especially in multi-process envrionment. > > Aslo the new API let application to register a callback function > > which will be invoked before a device is going to be detached, > > the return value of the function will decide if device will continue > > be detached or not, this support application to do condition check > > at runtime. > > I assume that you've added device locking to avoid having to do > handshake before detach, correct? Yes. >Is this a shared lock of some kind, or > is it a private lock? If it's shared lock, what happens if the process > holding that lock crashes? It’s a kind of process's private lock, but a shared device be locked any process will prevent it be detached. > > > > > PMD Impact: > > =========== > > > > Currently device removing is not handled well in secondary process on > most > > pmd drivers, rte_eth_dev_relase_port will be invoked and will mess up > > primary process since it reset all shared data. So we introduced new API > > rte_eth_dev_release_port_local which only reset ethdev's state to unsued > but > > not touch shared data so other process will not be impacted. > > Since not all device driver is target to support primary-secondary > > process model, so the patch set only fix this on all Intel devices and > > vdev, it can be refereneced by other driver when equevalent fix is required > > Nitpick - why the naming mismatch between *_private() and *_local()? Agree. "private" make the API more identical. > > > > > Limitation: > > =========== > > > > The solution does not cover the case that primary process exit while > > secondary processes still be active. Though this is not a typial use > > case, but if this happens: > > 1. secondary process can't attach / detach any shared device since no > > primary exist. > > 2. secondary process still can attach / detach private device. > > 3. secondary process still can detach a share device privately but may > > not attach it back, that ethdev slot will become zombie slot. > > I think this should be explicit and by design. Shared devices can only > be communicated to all secondaries through a primary process. No primary > - no shared devices. I don't think we can do anything about it unless we > implement some kind of peer-to-peer IPC (which isn't happening as far as > i'm aware). Agree. > > > > Thanks for your work on this patchset! Thanks for the design review and all the helpful inputs. Regards Qi > > -- > Thanks, > Anatoly