> 30/06/2020 14:07, Ananyev, Konstantin:
> > > 26/06/2020 16:43, David Marchand:
> > > > On Wed, Jun 24, 2020 at 1:59 PM Ananyev, Konstantin
> > > > <konstantin.anan...@intel.com> wrote:
> > > > > > > Do you mean - make this new dynamic-lcore API return an error if
> > > > > > > called from secondary process?
> > > > > >
> > > > > > Yes, and prohibiting from attaching a secondary process if dynamic
> > > > > > lcore API has been used in primary.
> > > > > > I intend to squash in patch 6:
> > > > > > https://github.com/david-marchand/dpdk/commit/e5861ee734bfe2e4dc23d9b919b0db2a32a58aee
> > > > >
> > > > > But secondary process can attach before lcore_register, so we'll have
> > > > > some sort of inconsistency in behaviour.
> > > >
> > > > If the developer tries to use both features, he gets an ERROR log in
> > > > the two init path.
> > > > So whatever the order at runtime, we inform the developer (who did not
> > > > read/understand the rte_thread_register() documentation) that what he
> > > > is doing is unsupported.
> > >
> > > I agree.
> > > Before this patch, pinning a thread on a random core can
> > > trigger some issues.
> > > After this patch, register an external thread will
> > > take care of logging errors in case of inconsistencies.
> > > So the user will know he is doing something not supported
> > > by the app.
> >
> > I understand that, and return a meaningful error is definitely
> > better than a silent crash or memory corruption.
> > The problem with that approach, as I said before, MP group
> > behaviour becomes non-deterministic.
>
> It was already non-deterministic before these patches.
> > > It is a nice improvement.
> > >
> > > > > If we really want to go ahead with such workaround -
> > >
> > > It is not a workaround.
> > > It is fixing some old issues and making clear what is really impossible.
> >
> > The root cause of the problem is in our MP model design decisions:
> > from one side we treat lcore_id as process local data, from other side
> > in some shared data-structures we use lcore_id as an index.
> > I think to fix it properly we need either:
> > make lcore_id data shared or stop using lcore_id as an index for shared
> > data.
> > So from my perspective this approach is just one of possible workarounds.
> > BTW, there is nothing wrong to have a workaround for the problem
> > we are not ready to fix right now.
>
> I think you are trying to fix multi-process handling.
> This patch is not about multi-process, it only highlights incompatibilities.

Yes, the problem has been there for a while.
David's patch just made it more visible.
We are discussing different workarounds for the problem.

> > > > > probably better to introduce explicit EAL flag ( --single-process or
> > > > > so).
> > > > > As Thomas and Bruce suggested, if I understood them properly.
> > >
> > > No I was thinking to maintain the tri-state information:
> > > - secondary is possible
> > > - secondary is attached
> > > - secondary is forbidden
> >
> > Ok, then I misunderstood you.
> >
> > > Asking the user to use an option to forbid attaching a secondary process
> > > is the same as telling him it is forbidden.
> >
> > I don't think it is the same.
> > On a live and complex system user can't always predict will the primary proc
> > use dynamic lcore and if it will at what particular moment.
> > Same for secondary process launching - user might never start it,
> > might start it straight after the primary one,
> > or might be after several hours.
>
> I don't see the difference.
> An app which registers external threads is not compatible
> with multi-process. It needs to be clear.
> If the user tries to do it anyway, there can be some error, OK.

Copying from the other mail thread:
Imagine the situation - there is a primary proc (supposed to run forever)
that does rte_thread_register/rte_thread_unregister during its lifetime.
Plus, from time to time the user runs some secondary process to collect
stats/debug the primary one (proc-info or so).
Now the behaviour of such a system will be non-deterministic:
in some runs the primary proc will do rte_thread_register() first,
and then the secondary proc will never be able to attach.
In other cases the secondary will win the race, and then for the primary
eal_lcore_non_eal_allocate() will always fail.
Which means different behaviour between runs, varying performance, etc.

> > > The error log is enough in my opinion.
> >
> > I think it is better than nothing, but probably not the best one.
> > Apart from possible non-consistent behaviour, it is quite restrictive:
> > dynamic lcore_id wouldn't be available on any DPDK MP deployment.
> > Which is a pity - I think it is a cool and useful feature.
>
> So you are asking to extend the feature.

I am asking for a solution that would guarantee deterministic behaviour
to the user.
If dynamic lcores and MP support need to be mutually exclusive,
then there should be a clean way for the user to *always* enable one
and disable the other.
"--proc-type=standalone" will at least guarantee such consistent behaviour
between runs: the secondary proc will always fail to attach and
eal_lcore_non_eal_allocate() will always succeed
(as long as there are free lcore_ids, of course).
Though I think even better would be not to make them mutually exclusive,
but instead let the user split the lcore_id space accordingly.

Let me list the options currently under discussion:

a) New EAL parameter '--lcore-allow=...'
   Explicit EAL parameter to enable dyn-lcore=Y
   Consistent behaviour between runs=Y
   Dyn lcores/MP-support are mutually exclusive=N

b) Extend '--proc-type' EAL parameter with new 'standalone' type
   Explicit EAL parameter to enable dyn-lcore=Y
   Consistent behaviour between runs=Y
   Dyn lcores/MP-support are mutually exclusive=Y

c) Dynamically allow/forbid dynamic-lcore/MP support
   Explicit EAL parameter=N
   Consistent behaviour between runs=N
   Dyn lcores/MP-support are mutually exclusive=Y

My preference list (from top to bottom): a, b, c.

> Honestly, I'm not a fan of multi-process,
> so I would not push any feature for it.

Me too, but as we can't drop it, we probably have no choice
but to live with it.

>
> If we don't add any new option now, and restrict MP handling
> to error messages, it would not prevent from extending
> in future, right?

It shouldn't, I think.
Though what is the urgency to push this feature
without having an agreement first?

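To make the difference between (b) and (c) concrete, here is a toy model
in plain C. It is not DPDK code: the enum and the model_*() helpers are
hypothetical names made up for illustration, and they only mimic the
behaviour described above for rte_thread_register() and for a secondary
process attaching.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of the tri-state discussed in this thread. */
enum mp_state {
	MP_SECONDARY_POSSIBLE,  /* nothing decided yet */
	MP_SECONDARY_ATTACHED,  /* a secondary process attached first */
	MP_SECONDARY_FORBIDDEN, /* a dynamic lcore was registered first */
};

/*
 * Option (b): an explicit "standalone" choice fixes the state at init.
 * Option (c): no flag, the state is decided by whoever comes first.
 */
static enum mp_state model_init(bool standalone)
{
	return standalone ? MP_SECONDARY_FORBIDDEN : MP_SECONDARY_POSSIBLE;
}

/* Stands in for rte_thread_register(); returns true on success. */
static bool model_thread_register(enum mp_state *st)
{
	assert(st != NULL);
	if (*st == MP_SECONDARY_ATTACHED)
		return false; /* secondary won the race */
	*st = MP_SECONDARY_FORBIDDEN;
	return true;
}

/* Stands in for a secondary process attaching; true on success. */
static bool model_secondary_attach(enum mp_state *st)
{
	assert(st != NULL);
	if (*st == MP_SECONDARY_FORBIDDEN)
		return false; /* dynamic lcores already in use */
	*st = MP_SECONDARY_ATTACHED;
	return true;
}
```

With model_init(false) the result of each call depends on which of the
two happens first; with model_init(true) the outcome is the same on
every run, which is the property I am asking for.
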
>
> > What do you guys think about different approach:
> > introduce new optional EAL parameter to restrict lcore_id
> > values available for the process.
> >
> > #let say to start primary proc that can use lcore_id=[0-99] only:
> > dpdk_primary --lcore-allow=0-99 ... --file-prefix=xz1
> >
> > #to start secondary one for it with allowed lcore_id=[100-109]:
> > dpdk_secondary --lcore-allow=100-109 ... --file-prefix=xz1
> > --proc-type=secondary
> >
> > It is still a workaround, but that way we don't need to
> > add any new limitations for dynamic lcores and secondary process usage.
> > Now it is up to user to decide would multiple-process use the same shared
> > data and if so - split lcore_id space properly among them
> > (same as he has to do now with static lcores).
>
> Isn't it pushing too much to the user?

The user has to do a similar thing with static lcores right now.

> > > > A EAL flag is a stable API from the start, as there is nothing
> > > > describing how we can remove one.
> > > > So a new EAL flag for an experimental API/feature seems contradictory.
> > > >
> > > > Going with a new features status API... I think it is beyond this
> > > > series.
> > > >
> > > > Thomas seems to suggest an automatic resolution when features conflict
> > > > happens.. ?
> > >
> > > I suggest allowing the maximum and raise an error when usage conflicts.
> > > It seems this is what you did in v4.
> > >
> > > > I'll send the v4, let's discuss it there if you want.
> >
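
For the '--lcore-allow' idea quoted above, a rough sketch of the check
that the user (or EAL) would have to do: parse each range and verify
that the primary and secondary ranges do not overlap. '--lcore-allow'
is only a proposal in this thread; parse_lcore_range(),
lcore_ranges_overlap() and MODEL_MAX_LCORE are hypothetical names, and
128 is just an assumed lcore_id limit for the example.

```c
#include <assert.h>
#include <ctype.h>
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

#define MODEL_MAX_LCORE 128 /* assumed limit, for illustration only */

struct lcore_range {
	unsigned int first;
	unsigned int last; /* inclusive */
};

/* Parse "A" or "A-B"; returns 0 on success, -1 on malformed input. */
static int parse_lcore_range(const char *arg, struct lcore_range *out)
{
	char *end;
	unsigned long first, last;

	assert(arg != NULL && out != NULL);
	if (!isdigit((unsigned char)arg[0]))
		return -1;
	errno = 0;
	first = strtoul(arg, &end, 10);
	last = first;
	if (*end == '-') {
		if (!isdigit((unsigned char)end[1]))
			return -1;
		last = strtoul(end + 1, &end, 10);
	}
	if (errno != 0 || *end != '\0' || first > last ||
	    last >= MODEL_MAX_LCORE)
		return -1;
	out->first = (unsigned int)first;
	out->last = (unsigned int)last;
	return 0;
}

/* True if the two inclusive ranges share at least one lcore_id. */
static bool lcore_ranges_overlap(const struct lcore_range *a,
				 const struct lcore_range *b)
{
	return a->first <= b->last && b->first <= a->last;
}
```

For the two command lines above, "0-99" and "100-109" parse fine and do
not overlap, so both processes could index shared per-lcore data safely.
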