Hi Adrien From: Adrien Mazarguil , Sent: Wednesday, January 31, 2018 4:32 PM > Hi Matan, > > On Wed, Jan 31, 2018 at 01:44:41PM +0000, Matan Azrad wrote: > > Hi Adrien > > > > From: Adrien Mazarguil, Sent: Wednesday, January 31, 2018 12:44 PM > > > On Wed, Jan 31, 2018 at 10:08:06AM +0000, Matan Azrad wrote: > > > > Hi all > > > > > > > > From: Adrien Mazarguil > > > > > On Tue, Jan 30, 2018 at 08:37:06PM +0000, Shahaf Shuler wrote: > > > > > > Tuesday, January 30, 2018 11:40 AM, Adrien Mazarguil: > > > > > > > Unfortunately I didn't get a chance to review this patch > > > > > > > before it was > > > > > applied. > > > > > > > I'm not sure a stopped port is supposed to report events > > > > > > > (interrupts). Will applications expect them to occur at this > > > > > > > point? > > > > > > > > > > > > Why not? > > > > > > > > > > > > Stopped port is still counted as attached. The fact the > > > > > > application stopped > > > > > the packet receive on it doesn't mean it should not receive a > > > > > sync events (such as the remove event). > > > > > > async events, by definition, are not related to traffic being > > > > > > flows through > > > > > the port. > > > > > > > > > > My comment is based on my understanding of rte_eth_dev_stop(), > > > which > > > > > is a device (or port) is completely stopped, in a suspended > > > > > state and no interrupts shall occur, as a means for applications > > > > > to temporarily not be bothered by them until restarted. > > > > > > > > Stopping traffic is not saying that the application is not > > > > interesting in the device, I think that you mean to dev_close(). > > > > > > No, dev_close() releases resources and destroys configuration. Good > > > luck using dev_start() or any other devop after dev_close(). > > > > I'm just saying here that when the user call dev_close() it means he > > is not interesting in the device any more, This is not the case in > > dev_stop(). > > > > > > Any event may still be usable for application between dev_stop() > > > > to dev_start(), especially RMV or LCS can still be interested. > > > > > > Possibly, but then, how come no PMD implements it that way? > > > > Again, maybe others PMDs make mistakes. > > It's basically the same as stating that for all these years, neither PMD nor > application maintainers understood the true meaning of this API. > > Maybe you're right, maybe not. What's for sure is this patch unilaterally > decided to modify an accepted behavior. Perhaps it's not a big deal but we > must be cautious.
Agree. I'm just saying my opinion here. > > > Neither did mlx4 before this patch got applied. At the very least it > > > cannot be considered a "fix". > > > > It fixes something. > > Technically every time a feature is added, its absence gets "fixed" :) :) > > > > > Think about it that way: applications do not want to get > > > > > interrupts immediately after the device is initialized, because > > > > > they might not be ready to process them at this point. An > > > > > explicit call to > > > > > rte_eth_dev_start() tells the PMD when it's OK to do so. The > > > > > converse is > > > rte_eth_dev_stop(). > > > > > > > > So, they can delay the event registration to the time they > > > > interesting in the > > > events. > > > > And use event unregister when they are not interesting in it anymore. > > > > > > Of course you can ask application maintainers to adapt to the new > > > behavior, or you know, leave things as they used to be. > > > > > > > I don't know what any application does but for me it is a mistake to > > stop all event processes in dev_stop(), Maybe for other application > maintainers too. > > Just like you, I don't know either what all the applications ever written for > DPDK expect out of dev_stop(). What's for sure is that currently, LSC/RMV > don't occcur afterward, the same way these events do not occur before > dev_start(). Why not? RMV event can occur any time after probe. > Any application possibly relying on this fact will break. In such a > situation, a conservative approach is better. If an application should fail to get event in stopped state it may fail in the previous code too: The interrupt run from host thread so the next race may occur: dev_start() : master thread. Context switch. RMV interrupt started to run callbacks: host thread. Context switch. dev_stop(): master thread. Start reconfiguration: master thread. Context switch. Callback running. So, the only thing which can disable callback running after dev_stop() is callback unregistration before it. > > > Setting up RMV/LSC callbacks is not the only configuration an > > > application usually performs before calling dev_start(). Think about > > > setting up flow rules, MAC addresses, VLANs, and so on, this on > > > multiple ports before starting them up all at once. Previously it > > > could be done in an unspecified order, now they have to take special care > for RMV/LSC. > > > > Or maybe there callbacks code are already safe for it. > > Or they manages the unregister\register calls in the right places. > > That's my point, these "maybes" don't argue in favor of changing things. What I'm saying is that callbacks should be safe or not registered in the right time. > > > Many devops are only safe when called while a device is stopped. > > > It's even documented in rte_ethdev.h. > > > > > > > And? > > ...And applications therefore often do all their configuration in an > unspecified > order while a port is stopped as a measure of safety. No extra care is taken > for RMV/LSC. This uncertainty can be addressed by not modifying the current > behavior. Or they expect to get interrupt and the corner case will come later if we will not change it. > > > > > Stopping traffic can already be achieved by not polling from the > > > > > application side, calling rte_eth_dev_[rt]x_queue_stop() and/or > > > > > toggling RX/TX interrupts through rte_eth_dev_[rt]x_intr_enable(). > > > > > rte_eth_dev_stop() provides lower-level device control. > > > > > > > > I think it makes sense only for Rx interrupt which is traffic > > > > oriented(like stop > > > and start). > > > > > > OK, looks like we disagree :) > > > > > > > > Perhaps documentation is not clear, however that's how LSC seems > > > > > implemented in all PMDs; it gets disabled after > > > > > rte_eth_dev_stop() and one should explicitly use > > > > > rte_eth_link_get() to retrieve link status afterward. I think > > > > > RMV should behave similarly with > > > rte_eth_dev_is_removed(). > > > > > Adapting fail-safe should be easier than modifying all the > > > > > remaining > > > PMDs. > > > > > > > > Or maybe PMDs which do it make mistakes. > > > > > > I'm not convinced mlx4 is the only PMD doing the right thing, we > > > should ask other maintainers as well as application writers' opinion > > > on the matter. If it's not a problem for RMV/LSC to occur while a > > > device is stopped, then I'll agree with the approach. It still won't make > > > it a > fix regardless. > > > > Let's think about RMV event, How can the application\other dpdk entities > to know if the device was removed when it was in stopped state? > > Checking it synchronically (by rte_eth_dev_is_removed()) can miss the > removal just a moment after the device came back again, errors will occur > and no one will know why. > > > > So, at least for RMV event, we need the notification also in stopped state. > > You sent the rte_eth_dev_is_removed() series. You're aware that PMDs > implementing this call benefit from an automatic is_removed() check on all > remaining devops whenever some error occur. > In short, an application will get notified simply by getting dev_start() (or > any > other callback) return -EIO and not being able to use the device. Yes, but between dev_stop to dev_start may not be any ethdev API calling. > PMDs that do not implement is_removed() (or in addition to it) could also > artificially trigger a RMV event after dev_start() is called. As long as the > PMD > remains quiet while the device is stopped, it's fine. How can the PMD do it after dev_start()? Initiate alarm in dev start function to do it later And entering into race again? I think it is not worth the effort and this patch is doing the right thing(also some other PMDs) . Thanks. > > > > > > > In my opinion it's not a fix, as in, it doesn't address an > > > > > > > issue introduced by the mentioned patch whose behavior was > correct. > > > > > > > > > > > > > > It's probably too late to change it now and it does address > > > > > > > an issue seen with a use case involving this PMD, however I > > > > > > > think the fail-safe PMD could as well poll using the > > > > > > > recently-added > > > > > > > rte_eth_dev_is_removed() when it's aware the underlying port > > > > > > > is > > > > > stopped instead of expecting interrupts. > > -- > Adrien Mazarguil > 6WIND