2014-09-11 07:48, Hiroshi Shimamoto:
> x86 can keep store ordering with standard operations.

Are we sure it's always the case (including old 32-bit CPUs)? I would prefer
to have a reference here. I know we already discussed this kind of thing, but
having a reference in the commit log could help future discussions.

> Using memory barrier is much expensive in main packet processing loop.
> Removing this improves xmit/recv packet performance.
>
> We can see performance improvements with memnic-tester.
> Using Xeon E5-2697 v2 @ 2.70GHz, 4 vCPU.
> size | before   | after
> 64   | 4.18Mpps | 4.59Mpps
> 128  | 3.85Mpps | 4.87Mpps
> 256  | 4.01Mpps | 4.72Mpps
> 512  | 3.52Mpps | 4.41Mpps
> 1024 | 3.18Mpps | 3.64Mpps
> 1280 | 2.86Mpps | 3.15Mpps
> 1518 | 2.59Mpps | 2.87Mpps
>
> Note: we have to take care if we use temporal cache.

Please, could you explain this last sentence?

Thanks
--
Thomas