> > > > +
> > > > +/* Add a reader thread, running on an lcore, to the list of
> > > > + * threads reporting their quiescent state on a TQS variable.
> > > > + */
> > > > +int __rte_experimental
> > > > +rte_tqs_register_lcore(struct rte_tqs *v, unsigned int lcore_id) {
> > > > +	TQS_RETURN_IF_TRUE((v == NULL || lcore_id >= RTE_TQS_MAX_LCORE),
> > > > +			-EINVAL);
> > >
> > > It is not very good practice to make a function return different
> > > values and behave in a different way in debug/non-debug mode.
> > > I'd say that for slow-path (functions in .c) it is always good to
> > > check input parameters.
> > > For fast-path (functions in .h) we sometimes skip such checking, but
> > > debug mode can probably use RTE_ASSERT() or so.
> > Makes sense, I will change this in the next version.
> >
> > > lcore_id >= RTE_TQS_MAX_LCORE
> > >
> > > Is this limitation really necessary?
> > I added this limitation because currently a DPDK application cannot
> > take a mask more than 64 bits wide. Otherwise, this should be as big
> > as RTE_MAX_LCORE.
> > I see that in the case of the '-lcores' option, the number of lcores
> > can be more than the number of PEs. In this case, we still need a MAX
> > limit (but it can be bigger than 64).
> >
> > > First, it means that only lcores can use that API (at least the
> > > data-path part); second, even today many machines have more than 64
> > > cores.
> > > I think you can easily avoid such a limitation if, instead of
> > > requiring lcore_id as an input parameter, you just make it return
> > > the index of the next available entry in w[]. Then tqs_update() can
> > > take that index as an input parameter.
> > I had thought about a similar approach based on IDs. I was concerned
> > that the ID would be one more thing for the application to manage.
> > But I see the limitations of the current approach now. I will change
> > it to be allocation based. This will support non-EAL pthreads as well.
>
> Yes, with such an approach non-lcore threads will be able to use it also.
>
I realized that rte_tqs_register_lcore/rte_tqs_unregister_lcore need to be
efficient, as they can be called from the worker's packet processing loop
(rte_event_dequeue_burst allows blocking, so the worker thread needs to
call rte_tqs_unregister_lcore before calling rte_event_dequeue_burst and
rte_tqs_register_lcore before starting packet processing). Allocating the
thread ID in these functions would make them more complex.
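To make the intended usage concrete, something like the following worker
loop is what I have in mind. This is only a sketch: process_events() is an
application placeholder, rte_tqs.h is an assumed header name, and the
rte_tqs_* calls use the current RFC names, whose signatures may still
change:

#include <stdbool.h>
#include <rte_common.h>
#include <rte_eventdev.h>
#include <rte_tqs.h>	/* assumed RFC header name */

static void process_events(struct rte_event *ev, uint16_t nb); /* app specific */

static void
worker_loop(struct rte_tqs *v, unsigned int lcore_id, uint8_t dev_id,
		uint8_t port_id, uint64_t timeout_ticks, volatile bool *quit)
{
	struct rte_event ev[32];
	uint16_t nb_rx;

	while (!*quit) {
		/* Leave the reader set before the potentially blocking
		 * dequeue, so a concurrent rte_tqs_check() does not wait
		 * on this thread.
		 */
		rte_tqs_unregister_lcore(v, lcore_id);

		nb_rx = rte_event_dequeue_burst(dev_id, port_id, ev,
						RTE_DIM(ev), timeout_ticks);

		/* Re-join the reader set before touching shared data. */
		rte_tqs_register_lcore(v, lcore_id);

		process_events(ev, nb_rx);

		/* Report quiescent state once per iteration. */
		rte_tqs_update(v, lcore_id);
	}
}

Since the register/unregister pair runs once per dequeue loop, both calls
need to stay lightweight; this is why I do not want to allocate the thread
ID inside them.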
I suggest that we change the argument 'lcore_id' to 'thread_id'. The
application could use 'lcore_id' as 'thread_id' if threads are mapped to
physical cores 1:1.

If the threads are not mapped 1:1 to physical cores, the threads need to
use a thread_id in the range of 0 - RTE_TQS_MAX_THREADS. I do not see that
DPDK has a thread_id concept. For TQS, the thread IDs are global (i.e. not
per TQS variable). I could provide APIs to do the thread ID allocation, but
I think the thread ID allocation should not be part of this library. Such a
thread ID might be useful for other libraries.

<snip>

> > > > +
> > > > +	while (lcore_mask) {
> > > > +		l = __builtin_ctz(lcore_mask);
> > > > +		if (v->w[l].cnt != t)
> > > > +			break;
> > >
> > > As I understand, that makes control-path function progress dependent
> > > on simultaneous invocation of data-path functions.
> > I agree that the control-path function progress (for example, how long
> > to wait before freeing the memory) depends on invocation of the
> > data-path functions. The separation of 'start' and 'check', and the
> > option not to block in 'check', provide the flexibility for the
> > control path to do some other work if it chooses to.
> >
> > > In some cases that might cause the control path to hang.
> > > Let's say the data-path function wouldn't be called, or the user
> > > invokes control-path and data-path functions from the same thread.
> > I agree with the case of the data-path function not getting called. I
> > would consider that a programming error. I can document that warning
> > in the rte_tqs_check API.
>
> Sure, it can be documented.
> Though that means that each data-path thread would have to do an
> explicit update() call for every tqs it might use.
> I just think that it would complicate things and might limit usage of
> the library quite significantly.
Each data-path thread has to report its quiescent state. Hence, each
data-path thread has to call update() (similar to how rte_timer_manage()
has to be called periodically on the worker thread). Do you have any
particular use case in mind where this fails?

> > In the case of the same thread calling both control-path and data-path
> > functions, it would depend on the sequence of the calls. The following
> > sequence should not cause any hangs:
> > Worker thread
> > 1) 'deletes' an entry from a lock-free data structure
> > 2) rte_tqs_start
> > 3) rte_tqs_update
> > 4) rte_tqs_check (wait == 1 or wait == 0)
> > 5) 'free' the entry deleted in 1)
>
> That's an interesting idea, and that should help, I think.
> Probably worth having the {2,3,4} sequence as a new high-level function.
>
Yes, this is a good idea. Such a function would be applicable only in the
worker thread. I would prefer to leave it to the application to take care
of it.
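For illustration, such an application-level helper could look roughly like
this. Again a sketch only: lf_delete()/lf_free() are placeholders for the
application's lock-free data structure, rte_tqs.h is an assumed header
name, and I am assuming a simplified form of the API where rte_tqs_start()
returns a token that rte_tqs_check() takes along with the wait flag; the
exact RFC signatures may differ:

#include <stdint.h>
#include <rte_tqs.h>	/* assumed RFC header name */

/* Placeholders for the application's lock-free data structure. */
struct lf_entry;
struct lf_entry *lf_delete(uint32_t key);
void lf_free(struct lf_entry *e);

/* Runs entirely on the worker thread: delete, wait for all other readers
 * to pass through a quiescent state, then free.
 */
static void
worker_delete_and_free(struct rte_tqs *v, unsigned int lcore_id,
		uint32_t key)
{
	struct lf_entry *e;
	uint64_t t;

	e = lf_delete(key);		/* 1) 'delete' the entry */

	t = rte_tqs_start(v);		/* 2) start a new interval */
	rte_tqs_update(v, lcore_id);	/* 3) report own quiescent state,
					 *    so check() does not wait on
					 *    this thread */
	rte_tqs_check(v, t, 1);		/* 4) wait == 1: block until all
					 *    registered readers have
					 *    reported */

	lf_free(e);			/* 5) now safe to free */
}

Keeping this in the application also lets it pass wait == 0 and overlap
other work between start() and check() if it chooses to.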