Il 24/04/2012 16:38, Michael S. Tsirkin ha scritto: >>> I think it is prudent to address this theoretical race condition. >> >> I think your fix is right, but it is not needed on x86. lfence is only >> required for weakly-ordered memory types, so says Intel, > > I see this in spec: > > The LFENCE instruction establishes a memory fence for loads. It > guarantees ordering between two loads and prevents speculative loads > from passing the load fence (that is, no speculative loads are allowed > until all loads specified before the load fence have been carried out). > > Speculative load is exactly what we are handling here.
In my copy of the Intel manual "6.2.2 Memory Ordering in P6 and More Recent Processor Families" says very explicitly "Reads are not reordered with other reads". At least this is the illusion that the processor gives; that is, speculative loads are discarded if another processor writes to the same location. But it's not just speculative loads, it can also be out-of-order execution as AMD says in "7.1.1 Read Ordering". For cacheable memory types "Out-of-order reads are allowed to the extent that they can be performed transparently to software, such that the appearance of in-order execution is maintained. Out-of-order reads can occur as a result of out-of-order instruction execution or speculative execution. The processor can read memory and perform cache refills out-of-order to allow out-of-order execution to proceed". >> and AMD is even clearer: "Loads from differing memory types may be >> performed out of order, in particular between WC/WC+ and other >> memory types". > > This does not say loads from same memory type are ordered. That would really be a bad trick for AMD to play. Luckily they say that in 7.1.1, as cited above. >> I would be grateful if, instead of fixing the qemu-barrier.h parts of >> the patches, you picked up the (sole) patch in the atomics branch of >> git://github.com/bonzini/qemu.git. The constructs there are more >> complete than what we have in qemu-barrier.h, > > Sorry this is just a bugfix in virtio, don't see a reason to make > it depend on a wholesale rework of atomics. The reason is that your fixes didn't work on PPC, and were suboptimal on x86. Paolo