> > > > Agree. There are multiple micro-architectures in Arm eco-system. > > > > We > > > should establish few simple rules to make sure algorithms perform > > > well on all the available platforms. I established few rules in VPP > > > and they are working fine so far. > > > > > > Can you share that rules for everyone's benefit? > > > > > These are just few simple rules anyone can think of, but avoid the > > surprises. > > We identified a owner for each platform (we have this already in DPDK, > > even across platforms) Each patch submitted for Arm platforms is > > marked with -2 (VPP uses Gerrit) Every platform owner tests on her/his > platform. -2 will be removed only if it does not cause regression on any > platforms. Platform owner helps out with optimization where required as they > understand their micro-architecture best. I guess this is what is supposed to > happen through the review process in DPDK. But making sure everyone tests it > before it gets merged avoids the surprises. > > I think, The very same philosophy can be implemented with exiting mailing list > method, if > > 1) Author Cc all the architecture maintainers and platform owners for any fast > path change where it can introduce performance regression. > 2) Author can CC the same list to request for performance check along with > test command if area of performance regression known before. > > I agree with last minute surprises are bad for both Author and platform owner. > I think, it can fixed by above scheme. > Makes sense to me.
> > > > > > > > > > > > IMO, This scheme won't work. I think, we are introducing such > > > > > performance critical feature, we need to put under function > > > > > pointer scheme so that if an application does not need such > > > > > feature it can use > > > plain loads. > > > > > > > > > IMO, we should do some more debugging before going into exploring > > > > other > > > options. > > > > > > Yes. But, how do we align with v18.11 release? > > > > > I think I have spent enough time optimizing the code. Please provide the > feedback and I will work on completing the fix. > > > > However, if the new patch is not satisfactory enough, we need another plan. > > Based on the public release meeting held yesterday, RC3 date is on next > Monday. > > I would suggest: > - Send your exiting tested patch in mailing list for review. In my > setup, The regression reduced to 5.7% from 24% > - Extend the patch for hash bulk case as well and check > NO_HASH_MULTI_LOOKUP > as zero with EM_HASH_LOOKUP_COUNT value as 16 or 8 for arm64. > Thank you Jerin for testing. Unfortunately, when I looked at making the corresponding changes for the hash_add API, they look more involved. They may not be acceptable for RC3. I also need more time to spend on bulk case. For RC3, we will split the lookup path for RW-lock and Lock-free. I will add the rest in a follow up patch. > > > > You had mentioned about using function pointers. I suggest, we use the > function pointer only for lookup function. Otherwise, it will be too much of > code duplication. > > When lock-free is not used, the function with no memory-orderings will be > called. However, I am not sure about the function pointer overhead. But this > will be a simple change. > > It may not be very simple change as we need to take care secondary process > case as well, see struct rte_mempool::ops_index scheme. > Yes, I realized it while making changes. I have used a if statement with an existing lock-free flag. Will send out the patch soon. > Since rte_hash_lookup() already NOT a inline function, so making it as inline > and calling a function pointer inside may not attract much overhead. But we > can tell only after testing(Which may be not possible for > RC3) > > I think, in future it make sense to have function pointer scheme to avoid new > APIs for different hash library and we can plugin other proven hash library > like > urcu based one etc to DPDK. > > https://lwn.net/Articles/573431/