Gaetan, Konstantin, Thomas,

Any response to my suggestion below?

From: Matan Azrad
> Hi
>
> From: Gaëtan Rivet [mailto:gaetan.ri...@6wind.com] <snip>
> > > > > > > > Look,
> > > > > > > > Testpmd initiates some of its internal databases depends
> > > > > > > > on specific port iteration, In some time someone may take
> > > > > > > > ownership of Testpmd ports and testpmd will continue to
> > > > > > > > touch them.
> > > > > >
> > > > > > But if someone will take the ownership (assign new owner_id)
> > > > > > that port will not appear in RTE_ETH_FOREACH_DEV() any more.
> > > > > >
> > > > >
> > > > > Yes, but testpmd sometimes depends on previous iteration using
> > > > > internal database.
> > > > > So it uses internal database that was updated by old iteration.
> > > >
> > > > That sounds like just a bug in testpmd that need to be fixed, no?
> > >
> > > If Testpmd already took ownership for these ports(like I did), it is ok.
> > >
> >
> > Have you tested using the default iterator (NO_OWNER)?
> > It worked until now with the bare minimal device tagging using
> > DEV_DEFERRED. Testpmd did not seem to mind having to skip this port.
> >
> > I'm sure there were places where this was overlooked, but overall, I'd
> > think everything should be fixable using only the NO_OWNER iteration.
>
> I don't think so.
>
> > Can you point to a specific scenario (command line, chain of event)
> > that would lead to a problem?
> >
>
> I didn't construct a race test to catch testpmd issue, but I think without
> this patch, there is a lot of issues.
> Go to the testpmd code (before ownership) and find usage of the old
> iterator(after the first iteration in main), Ask yourself what should happen
> if exactly in this time, a new port is created by fail-safe(plug in event).
>
> > > > Any particular places where outdated device info is used?
> > >
> > > For example, look for the stream management in testpmd(I think I saw
> > > it there).
> > >
> >
> > The stream management is certainly shaky, but it happens after the EAL
> > initial port creation, and is not able to update itself for new
> > hotplugged ports (unless something changed).
> >
>
> Yes, but conceptually someone in the future may take the port(because it
> ownerless).
>
> > > > > > > If I look back on the fail-safe, its sole purpose is to have
> > > > > > > seamless hotplug with existing applications.
> > > > > > >
> > > > > > > Port ownership is a genericization of some functions
> > > > > > > introduced by the fail-safe, that could structure DPDK
> > > > > > > further. It should allow applications to have a seamless
> > > > > > > integration with subsystems using port ownership. Without
> > > > > > > this, port ownership cannot be used.
> > > > > > >
> > > > > > > Testpmd should be fixed, but follow the most common design
> > > > > > > patterns of DPDK applications. Going with port ownership
> > > > > > > seems like a paradigm shift.
> > > > > > >
> > > > > > > > In addition
> > > > > > > > Using the old iterator in some places in testpmd will
> > > > > > > > cause a race for run-time new ports(can be created by
> > > > > > > > failsafe or any hotplug code):
> > > > > > > > - testpmd finds an ownerless port(just now created) by the
> > > > > > > > old iterator and start traffic there,
> >
> > How does testpmd start traffic there? Testpmd has only a callback for
> > displaying that it received an event for a new port. It has no concept
> > of hotplugging beyond that.
> >
>
> Yes, so no traffic just some control command.
>
> > Testpmd will not start using any new port probed using the hotplug API
> > on its own, again, unless something has drastically changed.
> >
>
> Every iterator using in testpmd is exposed to race.
>
> > > > > > > > - failsafe takes ownership of this new port and start traffic
> > > > > > > > there.
> > > > > > > > Problem!
> > > > > >
> > > > > > Could you shed a bit more light here - it would be race
> > > > > > condition between whom and whom?
> > > > >
> > > > > Sure.
> > > > >
> > > > > > As I remember in testpmd all control ops are done within one
> > > > > > thread (main lcore).
> > > > >
> > > > > But other dpdk entity can use another thread, for example:
> > > > > Failsafe uses the host thread(using alarm callback) to create a
> > > > > new port and to take ownership of a port.
> > > >
> > > > Hm, and you create new ports inside failsafe PMD, right and then
> > > > set new owner_id for it?
> > >
> > > Yes.
> > >
> > > > And all this in alarm in interrupt thread?
> > >
> > > Yes.
> > >
> > > > If so I wonder how you can guarantee that no-one else will set
> > > > different owner_id between
> > > > rte_eth_dev_allocate() and rte_eth_dev_owner_set()?
> > >
> > > I check it (see failsafe patch to this series - V5).
> > > Function: fs_bus_init.
> > >
> > > > Could you point me to that place (I am not really familiar with
> > > > familiar with failsafe code)?
> > > >
> > > > >
> > > > > The race:
> > > > > Testpmd iterates over all ports by the master thread.
> > > > > Failsafe takes ownership of a port by the host thread and start
> > > > > using it.
> > > > > => The two dpdk entities may use the device at same time!
> > > >
> >
> > When can this happen? Fail-safe creates its initial pool of ports
> > during EAL init, before testpmd scans eth_dev ports and configure its
> > streams.
> > At that point, it has taken ownership, from the master lcore context.
> >
> > After this point, new ports could be detected and hotplugged by fail-safe.
> > However, even if testpmd had a callback to capture those new ports and
> > reconfigure its streams, it would be executed from within the
> > intr-thread, same as failsafe. If the thread was interrupted, by a
> > dataplane-lcore for example, streams would not have been reconfigured.
> > The fail-safe would execute its callback and set the owner-id before
> > the callback chains goes to the application.
> >
>
> Some iterator may be invoked in plug out process by other thread in testpmd
> and causes to control command
>
> > And that would only be if testpmd had any callback for hotplugging
> > ports and reconfiguring its streams, which it hasn't, as far as I know.
> >
>
> We don't need to implement it in testpmd.
>
> > > > Ok, if failsafe really assigns its owner_id(s) to ports that are
> > > > already in use by the app, then how such scheme supposed to work
> > > > at all?
> > >
> > > If the app works well (with the new rules) it already took ownership
> > > and failsafe will see it and will wait until the application release it.
> > > Every dpdk entity should know which port it wants to manage, If 2
> > > entities want to manage the same device - it can be ok and port
> > > ownership can synchronize the usage.
> > >
> > > Probably, application which will run fail-safe wants to manage only
> > > the fail-safe port and therefor to take ownership only for it.
> > >
> > > > I.E. application has a port - it assigns some owner_id != 0 to it,
> > > > then PMD tries to set its owner_id tot the same port.
> > > > Obviously failsafe's set_owner() will always fail in such case.
> > > >
> > > Yes, and will try again after some time.
> > >
> > > > From what I hear we need to introduce a concept of 'default owner id'.
> > > > I.E. when failsafe PMD is created - user assigns some owner_id to
> > > > it (default).
> > > > Then failsafe PMD generates it's own owner_id and assigns it only
> > > > to the ports whose current owner_id is equal either 0 or 'default'
> > > > owner_id.
> > > >
> > >
> > > It is a suggestion and we need to think about it more (I'm talking
> > > about it with Gaetan in another thread).
> > > Actually I think, if we want a generic solution to the generic
> > > problem the current solution is ok.
> > >
> >
> > We could as well conclude this other thread there.
> >
> > The only solution would be to have a default relationship between
> > owners, something that goes beyond the scope assigned by Thomas to
> > your evolution, but would be necessary for this API to be properly
> > used by existing applications.
> >
> > I think it's the only way to have a sane default behavior with your
> > API, but I also think this goes beyong the scope of the DPDK altogether.
> >
> > But even with those considerations that could be ironed out later (API
> > is still experimental anyway), in the meantime, I think we should
> > strive not to break "userland" as much as possible. Meaning that
> > unless you have a specific situation creating a bug, you shouldn't
> > have to modify testpmd, and if an issues arises, you need to try to
> > improve your API before resorting to changing the resource management
> > model of all existing applications.
> >
>
> I understand it.
> Suggestion:
>
> 2 system owners.
> APP_OWNER - 1.
> NO_OWNER - 0.
>
> And allowing for more owners as now.
>
> 1. Every port creation will set the owner for NO_OWNER (as now).
> 2. There is option for all dpdk entities to take owner of NO_OWNER ports
>    all the time(as now).
> 3. In some point in the end of EAL init: set all the NO_OWNER to
>    APP_OWNER(for V6).
> 4. Change the old iterator to iterate over APP_OWNER ports(for V6).
>
> What do you think?
>
> <snip>
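
For reference, the four steps of the suggestion quoted above can be sketched
as a small standalone C model. This is only an illustration of the proposed
rules, not ethdev code: the port table and the names port_create(),
port_claim(), eal_init_done(), next_app_port(), FOREACH_APP_PORT and
FIRST_DYNAMIC are all hypothetical, and locking is left out on purpose.

```c
#include <stdint.h>

#define MAX_PORTS     8   /* hypothetical table size */
#define NO_OWNER      0u  /* system owner: nobody has claimed the port */
#define APP_OWNER     1u  /* system owner: the port belongs to the application */
#define FIRST_DYNAMIC 2u  /* hypothetical: first id handed out by owner_new() */

struct port {
    int      in_use;
    uint16_t owner;
};

static struct port ports[MAX_PORTS];
static uint16_t next_owner = FIRST_DYNAMIC;

/* "Allowing for more owners as now": hand out a fresh owner id. */
uint16_t owner_new(void)
{
    return next_owner++;
}

/* Step 1: every port creation sets the owner to NO_OWNER. */
int port_create(void)
{
    for (int i = 0; i < MAX_PORTS; i++) {
        if (!ports[i].in_use) {
            ports[i].in_use = 1;
            ports[i].owner = NO_OWNER;
            return i;
        }
    }
    return -1; /* table full */
}

/* Step 2: any entity may take a port, but only while it is NO_OWNER. */
int port_claim(int id, uint16_t owner)
{
    if (id < 0 || id >= MAX_PORTS || !ports[id].in_use)
        return -1;
    if (ports[id].owner != NO_OWNER)
        return -1; /* already owned: the caller may retry later */
    ports[id].owner = owner;
    return 0;
}

/* Step 3: at the end of init, unclaimed ports become APP_OWNER. */
void eal_init_done(void)
{
    for (int i = 0; i < MAX_PORTS; i++) {
        if (ports[i].in_use && ports[i].owner == NO_OWNER)
            ports[i].owner = APP_OWNER;
    }
}

/* Step 4: the "old" iterator only visits APP_OWNER ports. */
int next_app_port(int prev)
{
    for (int i = prev + 1; i < MAX_PORTS; i++) {
        if (ports[i].in_use && ports[i].owner == APP_OWNER)
            return i;
    }
    return MAX_PORTS;
}

#define FOREACH_APP_PORT(p) \
    for ((p) = next_app_port(-1); (p) < MAX_PORTS; (p) = next_app_port(p))
```

In this model a port created after eal_init_done() (a hotplug) stays NO_OWNER,
so FOREACH_APP_PORT does not visit it and another entity such as fail-safe can
still claim it, while ports left unclaimed during init remain visible to an
unmodified application.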