> From what I understand from the Stack Overflow post, you're right that
> cmpxchg16b will not give a consistent view of the 16 bytes of memory
> across multiple NUMA nodes. However, maybe two 4-byte values right next
> to each other would be sufficient for your use case and could then be
> cast to an 8-byte value for CAS?

That's a great idea. The offset should never exceed the range of 4 bytes, but I will still add a check to make sure. I will update you once I have finished testing on the main code base.

Regards,
Mihir
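
P.S. A minimal sketch of the packing idea, assuming C11 atomics are available. All names here (pack_pair, pair_hi, pair_lo, offset_fits, pair_cas) are hypothetical and not taken from the code base under discussion, and "id" is only a placeholder for whatever the second 4-byte value turns out to be. It builds the 8-byte word with explicit shifts rather than casting a pointer to two adjacent 4-byte fields, which sidesteps aliasing and byte-order questions.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical names throughout; this is a sketch, not the real code. */

static_assert(sizeof(uint64_t) == 2 * sizeof(uint32_t),
              "two 4-byte halves must fill the 8-byte word");

/* Pack two 4-byte values into one 8-byte word: id in the upper half,
 * offset in the lower half. */
static inline uint64_t pack_pair(uint32_t id, uint32_t off)
{
    return ((uint64_t) id << 32) | (uint64_t) off;
}

/* Extract the upper 4 bytes (the id). */
static inline uint32_t pair_hi(uint64_t packed)
{
    return (uint32_t) (packed >> 32);
}

/* Extract the lower 4 bytes (the offset). */
static inline uint32_t pair_lo(uint64_t packed)
{
    return (uint32_t) (packed & 0xFFFFFFFFu);
}

/* Range check: the offset must fit in 4 bytes. */
static inline bool offset_fits(uint64_t off)
{
    return off <= UINT32_MAX;
}

/*
 * Atomically replace (expected_id, expected_off) with (new_id, new_off)
 * using a single 8-byte compare-and-swap.
 * Returns true if the swap happened; false if the slot held a different
 * pair or if new_off does not fit in 4 bytes (the slot is then untouched).
 */
static bool pair_cas(_Atomic uint64_t *slot,
                     uint32_t expected_id, uint32_t expected_off,
                     uint32_t new_id, uint64_t new_off)
{
    if (!offset_fits(new_off))
        return false;

    uint64_t expected = pack_pair(expected_id, expected_off);
    uint64_t desired = pack_pair(new_id, (uint32_t) new_off);

    return atomic_compare_exchange_strong(slot, &expected, desired);
}
```

A caller would initialise the slot with atomic_init(&slot, pack_pair(id, off)) and then retry pair_cas in a loop, re-reading the slot with atomic_load whenever it returns false because another thread won the race.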