On Mon, Jan 14, 2013 at 07:40:56PM +0100, Uros Bizjak wrote:
> On Mon, Jan 14, 2013 at 7:06 PM, Andi Kleen <a...@firstfloor.org> wrote:
> >> This cannot happen, we reject code that sets both __HLE* flags.
> >
> > BTW I found more HLE bugs, it looks like some of the fetch_op_*
> > patterns do not match always and fall back to cmpxchg, which
> > does not generate HLE code correctly. Not fully sure what's
> > wrong, can you spot any obvious problems? You changed the
> >
> > (define_insn "atomic_<logic><mode>"
> >
> > pattern last.
>
> I don't think this is a target problem, these insns work as expected
> and are covered by an extensive testsuite in gcc.target/i386/hle-*.c.
Well, the C++ test cases I wrote didn't work. It may be related to how
complex the program is; simple calls as in the original test suite seem
to work. E.g. instead of

	xacquire lock and ...

it ended up with a cmpxchg loop (which I think is a fallback path).
The cmpxchg loop didn't include a HLE prefix, and simply adding one is
not enough; successful elision would need more changes.

Before HLE the cmpxchg code was correct, just somewhat inefficient.
Even with HLE it is technically correct, it just will never elide.

I would like to fix and, or, xor and disallow HLE for nand.

Here's a test case. It needs the libstdc++ HLE patch posted earlier.

#include <atomic>

#define ACQ memory_order_acquire | __memory_order_hle_acquire
#define REL memory_order_release | __memory_order_hle_release

int main()
{
	using namespace std;
	atomic_ulong au = ATOMIC_VAR_INIT(0);

	if (!au.fetch_and(1, ACQ))
		au.fetch_and(-1, REL);

	unsigned lock = 0;
	__atomic_fetch_and(&lock, 1, __ATOMIC_HLE_ACQUIRE | __ATOMIC_ACQUIRE);
	return 0;
}

The first fetch_and generates (wrong):

.L2:
	movq	%rax, %rcx
	movq	%rax, %rdx
	andl	$1, %ecx
	lock; cmpxchgq	%rcx, -24(%rsp)
	jne	.L2

The second __atomic_fetch_and generates (correct):

	lock; .byte 0xf2
	andl	$1, -28(%rsp)
.LBE14:

-Andi

-- 
a...@linux.intel.com -- Speaking for myself only.