> > Agree. There are multiple micro-architectures in Arm eco-system. We
> > should establish few simple rules to make sure algorithms perform well on
> > all the available platforms. I established few rules in VPP and they are
> > working fine so far.
>
> Can you share that rules for everyone's benefit?

These are just a few simple rules anyone could think of, but they avoid surprises:

- We identified an owner for each platform (we already have this in DPDK, even across platforms).
- Each patch submitted for Arm platforms is marked with -2 (VPP uses Gerrit).
- Every platform owner tests on her/his platform.
- The -2 is removed only if the patch does not cause a regression on any platform.
- The platform owner helps out with optimization where required, as they understand their micro-architecture best.

I guess this is what is supposed to happen through the review process in DPDK. But making sure everyone tests a patch before it gets merged avoids surprises.
> > > IMO, This scheme won't work. I think, we are introducing such
> > > performance critical feature, we need to put under function pointer
> > > scheme so that if an application does not need such feature it can use
> > > plain loads.
> >
> > IMO, we should do some more debugging before going into exploring other
> > options.
>
> Yes. But, how do we align with v18.11 release?

I think I have spent enough time optimizing the code. Please provide feedback and I will work on completing the fix. However, if the new patch is not satisfactory, we need another plan.

You had mentioned using function pointers. I suggest we use a function pointer only for the lookup function; otherwise there will be too much code duplication. When lock-free is not used, a lookup function with no memory orderings will be called. I am not sure about the function pointer overhead, but this will be a simple change.
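
To make the proposal concrete, here is a rough sketch of what I mean by a function pointer for lookup only. This is not the actual hash library code: the table layout and every name in it (struct table, lookup_plain, lookup_lf, table_init, table_add, table_lookup) are made up for illustration, it assumes a single writer, and key 0 is reserved to mean an empty slot. The add path is shared; only the lookup is selected at creation time.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TBL_SLOTS 8u /* hypothetical size, must be a power of two */

struct slot {
    _Atomic uint32_t key; /* 0 means "empty" in this sketch */
    _Atomic uint32_t val;
};

struct table;
typedef int (*lookup_fn)(struct table *t, uint32_t key, uint32_t *val);

struct table {
    struct slot slots[TBL_SLOTS];
    lookup_fn lookup; /* chosen once, when the table is created */
};

/* Lookup without memory orderings: relaxed loads compile to plain loads. */
static int lookup_plain(struct table *t, uint32_t key, uint32_t *val)
{
    for (size_t i = 0; i < TBL_SLOTS; i++) {
        struct slot *s = &t->slots[(key + i) & (TBL_SLOTS - 1)];
        if (atomic_load_explicit(&s->key, memory_order_relaxed) == key) {
            *val = atomic_load_explicit(&s->val, memory_order_relaxed);
            return 0;
        }
    }
    return -1;
}

/* Lock-free lookup: the acquire load of the key pairs with the writer's
 * release store, so the value written before the key is visible. */
static int lookup_lf(struct table *t, uint32_t key, uint32_t *val)
{
    for (size_t i = 0; i < TBL_SLOTS; i++) {
        struct slot *s = &t->slots[(key + i) & (TBL_SLOTS - 1)];
        if (atomic_load_explicit(&s->key, memory_order_acquire) == key) {
            *val = atomic_load_explicit(&s->val, memory_order_relaxed);
            return 0;
        }
    }
    return -1;
}

static void table_init(struct table *t, bool lock_free)
{
    for (size_t i = 0; i < TBL_SLOTS; i++) {
        atomic_init(&t->slots[i].key, 0);
        atomic_init(&t->slots[i].val, 0);
    }
    t->lookup = lock_free ? lookup_lf : lookup_plain;
}

/* Shared add path (single writer assumed): publish the value first,
 * then the key with release ordering. */
static int table_add(struct table *t, uint32_t key, uint32_t val)
{
    for (size_t i = 0; i < TBL_SLOTS; i++) {
        struct slot *s = &t->slots[(key + i) & (TBL_SLOTS - 1)];
        uint32_t k = atomic_load_explicit(&s->key, memory_order_relaxed);
        if (k == 0 || k == key) {
            atomic_store_explicit(&s->val, val, memory_order_relaxed);
            atomic_store_explicit(&s->key, key, memory_order_release);
            return 0;
        }
    }
    return -1; /* table full */
}

/* The only indirect call sits on the lookup path. */
static inline int table_lookup(struct table *t, uint32_t key, uint32_t *val)
{
    return t->lookup(t, key, val);
}
```

An application that does not need lock-free would call table_init(&t, false) and get the plain-load lookup; the cost in both modes is one indirect call per lookup, which is the overhead I would still need to measure.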