> > 26/06/2020 16:43, David Marchand: > > On Wed, Jun 24, 2020 at 1:59 PM Ananyev, Konstantin > > <konstantin.anan...@intel.com> wrote: > > > > > Do you mean - make this new dynamic-lcore API return an error if > > > > > callied > > > > > from secondary process? > > > > > > > > Yes, and prohibiting from attaching a secondary process if dynamic > > > > lcore API has been used in primary. > > > > I intend to squash in patch 6: > > > > https://github.com/david-marchand/dpdk/commit/e5861ee734bfe2e4dc23d9b919b0db2a32a58aee > > > > > > But secondary process can attach before lcore_register, so we'll have > > > some sort of inconsistency in behaviour. > > > > If the developer tries to use both features, he gets an ERROR log in > > the two init path. > > So whatever the order at runtime, we inform the developer (who did not > > read/understand the rte_thread_register() documentation) that what he > > is doing is unsupported. > > I agree. > Before this patch, pinning a thread on a random core can > trigger some issues. > After this patch, register an external thread will > take care of logging errors in case of inconsistencies. > So the user will know he is doing something not supported > by the app.
I understand that, and return a meaningful error is definitely better the silent crash or memory corruption. The problem with that approach, as I said before, MP group behaviour becomes non-deterministic. > > It is an nice improvement. > > > > If we really want to go ahead with such workaround - > > It is not a workaround. > It is fixing some old issues and making clear what is really impossible. The root cause of the problem is in our MP model design decisions: from one side we treat lcore_id as process local data, from other side in some shared data-structures we use lcore_id as an index. I think to fix it properly we need either: make lcore_id data shared or stop using lcore_id as an index for shared data. So from my perspective this approach is just one of possible workarounds. BTW, there is nothing wrong to have a workaround for the problem we are not ready to fix right now. > > > probably better to introduce explicit EAL flag ( --single-process or so). > > > As Thomas and Bruce suggested, if I understood them properly. > > No I was thinking to maintain the tri-state information: > - secondary is possible > - secondary is attached > - secondary is forbidden Ok, then I misunderstood you. > Asking the user to use an option to forbid attaching a secondary process > is the same as telling him it is forbidden. I don't think it is the same. On a live and complex system user can't always predict will the primary proc use dynamic lcore and if it will at what particular moment. Same for secondary process launching - user might never start it, might start it straight after the primary one, or might be after several hours. > The error log is enough in my opinion. I think it is better than nothing, but probably not the best one. Apart from possible non-consistent behaviour, it is quite restrictive: dynamic lcore_id wouldn't be available on any DPDK MP deployment. Which is a pity - I think it is a cool and useful feature. What do you guys think about different approach: introduce new optional EAL parameter to restrict lcore_id values available for the process. #let say to start primary proc that can use lcore_id=[0-99] only: dpdk_primary --lcore-allow=0-99 ... --file-prefix=xz1 #to start secondary one for it with allowed lcore_id=[100-109]: dpdk_secondary --lcore-allow=100-109 ... --file-prefix=xz1 --proc-type=secondary It is still a workaround, but that way we don't need to add any new limitations for dynamic lcores and secondary process usage. Now it is up to user to decide would multiple-process use the same shared data and if so - split lcore_id space properly among them (same as he has to do now with static lcores). > > A EAL flag is a stable API from the start, as there is nothing > > describing how we can remove one. > > So a new EAL flag for an experimental API/feature seems contradictory. > > > > Going with a new features status API... I think it is beyond this series. > > > > Thomas seems to suggest an automatic resolution when features conflict > > happens.. ? > > I suggest allowing the maximum and raise an error when usage conflicts. > It seems this is what you did in v4. > > > I'll send the v4, let's discuss it there if you want. >