Hi all From: Thomas Monjalon, Friday, January 19, 2018 8:17 PM > 19/01/2018 19:13, Ferruh Yigit: > > On 1/19/2018 5:54 PM, Thomas Monjalon wrote: > > > 19/01/2018 17:19, Ferruh Yigit: > > >> On 1/18/2018 6:10 PM, Matan Azrad wrote: > > >>> From: Ferruh Yigit, Thursday, January 18, 2018 7:31 PM > > >>>> This patch updates *all* ethdev public APIs to add if device is > > >>>> removed check? > > >>> > > >>> Yes. > > >>> > > >>>> And each check goes to ethdev is_removed() dev_ops to ask if dev > > >>>> is removed. > > >>> Probably, if the REMOVED state setted in will not call device > is_remove. > > >>> > > >>>> These must be better way of doing this, am I missing something. > > >>> > > >>> Suggest. > > >> > > >> With a silly analogy, this is like a blind person asking each time > > >> if he is dead before talking to other person.
Just to accurate the analogy:) This is like a blind person(application,ethdev) using its guide dog(ethdev device), every time the dog refuses to take action (error occurred), the blind person asks if the dog can be a guide dog anymore(removal error). > > >> At first glance I can think of a kind of watchdog timer can be > > >> implemented in ethdev layer. It provides periodic checks and if > > >> device is dead it calls the registered user callback function. > > >> > > >> This method presented as synchronous method but not triggered from > > >> side where event happens, I mean not triggered from PMD but from > application. > > >> So does application doing polling continuously if device is dead? > > >> Or if application is relying this patch to add a check in each API, > > >> what happens if device removed during data processing, will app rely on > asynchronous method? > > > > > > We cannot put a mutex on hardware removal :) So we have to live with > > > errors du to removal. > > > If we are trying to configure a removed device, the error will be > > > not related to the root cause. > > > This patch is just trying to improve the situation by returning an > > > appropriate error code if removal can be detected. > > > Note: the check is run only if there is an error and if the removal > > > is not already detected. > > > > > > If I understand well, you prefer relying only on asynchronous > > > hotplug events? Even if they come really late? > > > > I think asynchronous hotplug events are better approach, but if they > > are late I understand you need a solution. > > > > I assume issue is when device is removed, application can make API > > calls until it detects the removal. > > > > For that case what do you think instead of ethdev abstraction layer > > doing a translation, define a specific error type and return it from > > PMDs to indicate that error is related to the missing device and > > application will know device is no more there? And consistently use that > error type in all PMDs. > > Yes it is also a good solution. > I think Matan proposed to integrate it in ethdev to avoid duplicating the > same mechanism in every PMDs. > Yes, as a lot of ethdev API code pieces do. > > Or what do you think above suggested flexible watchdog timer > > implementation in ethdev, so applications may be informed about missing > device faster. > > I think the watchdog would compete with hotplug events. > So I prefer not integrating one more asynchronous mechanism for the same > purpose. > If we want polling, it can be an option to enable in EAL hotplug. Konstantin wrote in another thread: >+ RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, 0); >+ >+ dev = &rte_eth_devices[port_id]; >+ >+ RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->is_removed, 0); > I'd says these 2 checks have to be swapped. Konstantin, Please explain why.