On Tue, Jun 30, 2020 at 12:07:32PM +0000, Ananyev, Konstantin wrote:
> > 
> > 26/06/2020 16:43, David Marchand:
> > > On Wed, Jun 24, 2020 at 1:59 PM Ananyev, Konstantin
> > > <konstantin.anan...@intel.com> wrote:
> > > > > > Do you mean - make this new dynamic-lcore API return an error if
> > > > > > called from a secondary process?
> > > > >
> > > > > Yes, and prohibiting from attaching a secondary process if dynamic
> > > > > lcore API has been used in primary.
> > > > > I intend to squash in patch 6:
> > > > > https://github.com/david-marchand/dpdk/commit/e5861ee734bfe2e4dc23d9b919b0db2a32a58aee
> > > >
> > > > But secondary process can attach before lcore_register, so we'll have 
> > > > some sort of inconsistency in behaviour.
> > >
> > > If the developer tries to use both features, he gets an ERROR log in
> > > both init paths.
> > > So whatever the order at runtime, we inform the developer (who did not
> > > read/understand the rte_thread_register() documentation) that what he
> > > is doing is unsupported.
> > 
> > I agree.
> > Before this patch, pinning a thread on a random core can
> > trigger some issues.
> > After this patch, registering an external thread will
> > take care of logging errors in case of inconsistencies.
> > So the user will know he is doing something not supported
> > by the app.
> 
> I understand that, and returning a meaningful error is definitely
> better than a silent crash or memory corruption.
> The problem with that approach, as I said before, is that MP group
> behaviour becomes non-deterministic.
> 
> > 
> > It is a nice improvement.
> > 
> > > > If we really want to go ahead with such a workaround -
> > 
> > It is not a workaround.
> > It is fixing some old issues and making clear what is really impossible.
> 
> The root cause of the problem is in our MP model design decisions:
> on one side we treat lcore_id as process-local data, on the other side
> we use lcore_id as an index into some shared data structures.
> I think to fix it properly we need to either make lcore_id data shared
> or stop using lcore_id as an index for shared data.
> So from my perspective this approach is just one of the possible workarounds.
> BTW, there is nothing wrong with having a workaround for a problem
> we are not ready to fix right now.
>  
> > > > probably better to introduce an explicit EAL flag (--single-process or so).
> > > > As Thomas and Bruce suggested, if I understood them properly.
> > 
> > No, I was thinking of maintaining the tri-state information:
> >     - secondary is possible
> >     - secondary is attached
> >     - secondary is forbidden
> 
> Ok, then I misunderstood you.
>  
> > Asking the user to use an option to forbid attaching a secondary process
> > is the same as telling him it is forbidden.
> 
> I don't think it is the same.
> On a live and complex system the user can't always predict whether the
> primary proc will use dynamic lcores and, if so, at what particular moment.
> Same for launching a secondary process - the user might never start it,
> might start it straight after the primary one,
> or might start it several hours later.
> 
> > The error log is enough in my opinion.
> 
> I think it is better than nothing, but probably not the best option.
> Apart from possible inconsistent behaviour, it is quite restrictive:
> dynamic lcore_id wouldn't be available on any DPDK MP deployment.
> Which is a pity - I think it is a cool and useful feature.
>  
> What do you guys think about a different approach:
> introduce a new optional EAL parameter to restrict the lcore_id
> values available to the process.
> 
> # let's say, to start a primary proc that can use lcore_id=[0-99] only:
> dpdk_primary --lcore-allow=0-99 ... --file-prefix=xz1
> 
> # to start a secondary one for it with allowed lcore_id=[100-109]:
> dpdk_secondary --lcore-allow=100-109 ... --file-prefix=xz1 --proc-type=secondary
>  
> It is still a workaround, but that way we don't need to
> add any new limitations for dynamic lcores and secondary process usage.
> Now it is up to the user to decide whether multiple processes will use the
> same shared data and, if so, to split the lcore_id space properly among them
> (same as he has to do now with static lcores).
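
To make the lcore_id split above concrete, here is a minimal C sketch of the
check it implies, for example in a launcher that starts both processes.
It is only an illustration under assumptions: "--lcore-allow" is the option
proposed above and not an existing EAL flag, and struct lcore_range,
lcore_range_parse() and lcore_range_overlap() are hypothetical names, not
DPDK API.

```c
#include <assert.h>
#include <ctype.h>
#include <errno.h>
#include <limits.h>
#include <stdbool.h>
#include <stdlib.h>

/* Inclusive range of lcore_id values one process may use.
 * Hypothetical type: --lcore-allow is only a proposal in this thread. */
struct lcore_range {
	unsigned int first;
	unsigned int last;
};

/* Parse one unsigned decimal number at *p and advance *p past it. */
static int
parse_uint(const char **p, unsigned int *out)
{
	char *end;
	unsigned long v;

	/* strtoul() accepts signs and leading blanks; reject them here. */
	if (!isdigit((unsigned char)**p))
		return -1;
	errno = 0;
	v = strtoul(*p, &end, 10);
	if (errno != 0 || v > UINT_MAX)
		return -1;
	*out = (unsigned int)v;
	*p = end;
	return 0;
}

/* Parse "N" or "N-M" (e.g. "0-99"). Returns 0 on success, -1 on bad input. */
static int
lcore_range_parse(const char *arg, struct lcore_range *out)
{
	const char *p = arg;
	unsigned int first, last;

	assert(arg != NULL && out != NULL);

	if (parse_uint(&p, &first) != 0)
		return -1;
	last = first;
	if (*p == '-') {
		p++;
		if (parse_uint(&p, &last) != 0)
			return -1;
	}
	/* Reject trailing characters and reversed ranges such as "9-3". */
	if (*p != '\0' || last < first)
		return -1;
	out->first = first;
	out->last = last;
	return 0;
}

/* True if the two inclusive ranges share at least one lcore_id. */
static bool
lcore_range_overlap(const struct lcore_range *a, const struct lcore_range *b)
{
	return a->first <= b->last && b->first <= a->last;
}
```

With the values from the example above, "0-99" and "100-109" parse fine and
do not overlap, so both processes could index the same shared data safely.
A launcher could refuse to start the secondary when the check reports an
overlap.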

A simpler variant of your approach could be to add
"--proc-type=standalone" to explicitly disable MP and enable dynamic thread
registration.
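
For illustration, the tri-state discussed above and the standalone variant
could be modelled roughly as follows. This is a sketch under assumptions, not
the code from the series: enum mp_state and the mp_*() helpers are
hypothetical names, and a real implementation would keep the state in shared
memory and update it atomically.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical tri-state, following the list quoted above. */
enum mp_state {
	MP_SECONDARY_POSSIBLE,  /* nothing decided yet */
	MP_SECONDARY_ATTACHED,  /* a secondary process has attached */
	MP_SECONDARY_FORBIDDEN, /* dynamic thread registration was used */
};

/* A standalone process would start with secondaries already forbidden. */
static enum mp_state
mp_initial_state(int standalone)
{
	return standalone ? MP_SECONDARY_FORBIDDEN : MP_SECONDARY_POSSIBLE;
}

/* Called when a secondary process tries to attach. */
static int
mp_on_secondary_attach(enum mp_state *state)
{
	assert(state != NULL);

	if (*state == MP_SECONDARY_FORBIDDEN)
		return -ENOTSUP; /* the caller would log an error here */
	*state = MP_SECONDARY_ATTACHED;
	return 0;
}

/* Called when an external thread is registered dynamically. */
static int
mp_on_thread_register(enum mp_state *state)
{
	assert(state != NULL);

	if (*state == MP_SECONDARY_ATTACHED)
		return -ENOTSUP; /* the caller would log an error here */
	*state = MP_SECONDARY_FORBIDDEN;
	return 0;
}
```

Whichever event comes first wins and the other one fails with an error, which
is the order-dependent behaviour discussed above. Starting from
mp_initial_state(1) removes that dependency: attaching always fails and
registration always succeeds.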



> > > An EAL flag is a stable API from the start, as there is nothing
> > > describing how we can remove one.
> > > So a new EAL flag for an experimental API/feature seems contradictory.
> > >
> > > Going with a new features status API... I think it is beyond this series.
> > >
> > > Thomas seems to suggest an automatic resolution when a feature
> > > conflict happens?
> > 
> > I suggest allowing the maximum and raising an error when usages conflict.
> > It seems this is what you did in v4.
> > 
> > > I'll send the v4, let's discuss it there if you want.
> > 
> 
