From: Peter Zijlstra
> Sent: 26 September 2018 12:01
>
> On x86 we cannot do fetch_or with a single instruction and end up
> using a cmpxchg loop, this reduces determinism. Replace the fetch_or
> with a very tricky composite xchg8 + load.
>
> The basic idea is that we use xchg8 to test-and-set the pending bit
> (when it is a byte) and then a load to fetch the whole word. Using
> two instructions of course opens a window we previously did not have.
...
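For reference, a rough user-space sketch of the two shapes being compared. This is not the code from the patch: the union layout, the names (lockword, set_pending_fetch_or, set_pending_xchg8_load) and the constants are made up for illustration, it assumes little-endian byte order and the GCC/Clang __atomic builtins, and mixed-size atomic access through a union is outside what the C standard guarantees.

```c
#include <stdint.h>

/* Hypothetical lock word with a byte-wide pending field (little-endian). */
#define LOCKED_VAL    (1U << 0)
#define PENDING_VAL   (1U << 8)
#define PENDING_MASK  (0xffU << 8)

union lockword {
	uint32_t val;			/* the whole word */
	struct {
		uint8_t  locked;	/* bits 0-7 */
		uint8_t  pending;	/* bits 8-15 */
		uint16_t tail;		/* bits 16-31 */
	};
};

/* One read-modify-write on the whole word. Per the quoted text, x86 ends
 * up with a cmpxchg loop here because the old value is needed. */
static inline uint32_t set_pending_fetch_or(union lockword *l)
{
	return __atomic_fetch_or(&l->val, PENDING_VAL, __ATOMIC_ACQUIRE);
}

/* Composite: byte-wide xchg to test-and-set pending, then a 32-bit load.
 * The load is wider than the preceding byte store, and other CPUs can
 * change the word in the window between the two instructions. */
static inline uint32_t set_pending_xchg8_load(union lockword *l)
{
	/* Old pending byte, moved to its position in the word. */
	uint32_t old = (uint32_t)__atomic_exchange_n(&l->pending, 1,
						     __ATOMIC_SEQ_CST) << 8;

	/* Take the rest of the word from the load and splice in the old
	 * pending value, so the result looks like fetch_or's return. */
	old |= __atomic_load_n(&l->val, __ATOMIC_ACQUIRE) & ~PENDING_MASK;
	return old;
}
```

Single-threaded, both helpers return the same old value and leave the pending bit set; they differ only in what a concurrent CPU can observe or modify between the xchg and the load.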
IIRC the load will be 'slow' because it will have to wait for the earlier
store to actually complete, rather than being satisfied by data from the
store buffer (because the widths are different).
This may not matter for xchg?

	David