> > > > > > > > > > > > > > > > > > > > > > After evaluating long term API/ABI issues, I think > > > > > > > > > > > you need to get rid of almost all use of inline and > > > > > > > > > > > visible structures. Yes it might be marginally > > > > > > > > > > > slower, but you thank me > > > > > the first time you have to fix something. > > > > > > > > > > > > > > > > > > > > > Agree, I was planning on another version to address > > > > > > > > > > this (I am yet > > > > > to take a look at your patch addressing the ABI). > > > > > > > > > > The structure visibility definitely needs to be addressed. > > > > > > > > > > For the inline functions, is the plan to convert all > > > > > > > > > > the inline functions in DPDK? If yes, I think we need > > > > > > > > > > to consider the performance > > > > > > > > > difference. May be consider L3-fwd application, change > > > > > > > > > all the > > > > > inline functions in its path and run a test? > > > > > > > > > > > > > > > > > > Every function that is not in the direct datapath should > > > > > > > > > not be > > > > > inline. > > > > > > > > > Exceptions or things like rx/tx burst, ring > > > > > > > > > enqueue/dequeue, and packet alloc/free > > > > I do not understand how DPDK can claim ABI compatibility if we > > > > have > > > inline functions (unless we freeze any development in these inline > > > functions forever). > > > > > > > > > > > > > > > > > > > > Plus synchronization routines: spin/rwlock/barrier, etc. > > > > > > > > I think rcu should be one of such exceptions - it is just > > > > > > > > another synchronization mechanism after all (just a bit > > > > > > > > more > > > sophisticated). > > > > > > > > Konstantin > > > > > > > > > > > > > > If you look at the other userspace RCU, you wil see that the > > > > > > > only inlines are the rcu_read_lock,rcu_read_unlock and > > > > > rcu_reference/rcu_assign_pointer. > > > > > > > > > > > > > > The synchronization logic is all real functions. > > > > > > > > > > > > In fact, I think urcu provides both flavors: > > > > > > https://github.com/urcu/userspace- > > > > > rcu/blob/master/include/urcu/static/ > > > > > > urcu-qsbr.h I still don't understand why we have to treat it > > > > > > differently then let say spin-lock/ticket-lock or rwlock. > > > > > > If we gone all the way to create our own version of rcu, we > > > > > > probably want it to be as fast as possible (I know that main > > > > > > speedup should come from the fact that readers don't have to > > > > > > wait for writer to finish, but still...) > > > > > > > > > > > > Konstantin > > > > > > > > > > > > > > > > Having locking functions inline is already a problem in current > releases. > > > > > The implementation can not be improved without breaking ABI (or > > > > > doing special workarounds like lock v2) > > > > I think ABI and inline function discussion needs to be taken up in > > > > a > > > different thread. > > > > > > > > Currently, I am looking to hide the structure visibility. I looked > > > > at your > > > patch [1], it is a different case than what I have in this patch. It > > > is a pretty generic use case as well (similar situation exists in > > > other libraries). I think a generic solution should be agreed upon. > > > > > > > > If we have to hide the structure content, the handle to QS > > > > variable > > > returned to the application needs to be opaque. I suggest using 'void *' > > > behind which any structure can be used. > > > > > > > > typedef void * rte_rcu_qsbr_t; > > > > typedef void * rte_hash_t; > > > > > > > > But it requires typecasting. > > > > > > > > [1] http://patchwork.dpdk.org/cover/52609/ > > > > > > C allows structure to be defined without knowing what is in it > therefore. > > > > > > typedef struct rte_rcu_qsbr rte_rcu_qsbr_t; > > > > > > is preferred (or do it without typedef) > > > > > > struct rte_rcu_qsbr; > > > > I see that rte_hash library uses the same approach (struct rte_hash in > rte_hash.h, though it is marking as internal). But the ABI Laboratory tool > [1] seems to be reporting incorrect numbers for this library even though > the internal structure is changed. > > > > [1] > > https://abi- > laboratory.pro/index.php?view=compat_report&l=dpdk&v1=19.0 > > 2&v2=current&obj=66794&kind=abi > > The problem is rte_hash structure is exposed as part of ABI in > rte_cuckoo_hash.h This was a mistake. Do you mean, due to the use of structure with the same name? I am wondering if it is just a tools issue. The application is not supposed to include rte_cuckoo_hash.h.
For the RCU library, we either need to go all functions or leave it the way it is. I do not see a point in trying to hide the internal structure while having inline functions. I converted the inline functions to function calls. Testing on Arm platform (results *are* repeatable) shows very minimal drop (0.1% to 0.2%) in performance while using lock-free rte_hash data structure. But one of the test cases which is just spinning shows good amount of drop (41%). Testing on x86 (Xeon Gold 6132 CPU @ 2.60GHz, results *are* pretty repeatable) shows performance improvements (7% to 8%) while using lock-free rte_hash data structure. The test cases which is just spinning show significant drop (14%, 155%, 231%). Konstantin, any thoughts on the results? I will send out V6 which will fix issues reported so far. The function vs inline part is still open, need to close it soon.