On 06/02/2016 01:38 PM, Sergey Fedorov wrote:
> On 02/06/16 23:36, Richard Henderson wrote:
>> On 06/02/2016 09:30 AM, Sergey Fedorov wrote:
>>> I think we need to extend TCG load/store instruction attributes to
>>> provide information about guest ordering requirements and leave this TCG
>>> operation only for explicit barrier instruction translation.
>> I do not agree. I think separate barriers are much cleaner and easier
>> to manage and reason with.
> How are we going to emulate strongly-ordered guests on weakly-ordered
> hosts then? I think if every load/store operation must specify which
> ordering it implies, then this task would be quite simple.
Hum. That does seem helpful-ish. But I'm not certain how helpful it is to
complicate the helper functions even further.
What if we have tcg_canonicalize_memop (or some such) split off the barriers
into separate opcodes. E.g.
MO_BAR_LD_B = 32 // prevent earlier loads from crossing current op
MO_BAR_ST_B = 64 // prevent earlier stores from crossing current op
MO_BAR_LD_A = 128 // prevent later loads from crossing current op
MO_BAR_ST_A = 256 // prevent later stores from crossing current op
MO_BAR_LDST_B = MO_BAR_LD_B | MO_BAR_ST_B
MO_BAR_LDST_A = MO_BAR_LD_A | MO_BAR_ST_A
MO_BAR_MASK = MO_BAR_LDST_B | MO_BAR_LDST_A
// Match Sparc MEMBAR as the most flexible host.
TCG_BAR_LD_LD = 1 // #LoadLoad barrier
TCG_BAR_ST_LD = 2 // #StoreLoad barrier
TCG_BAR_LD_ST = 4 // #LoadStore barrier
TCG_BAR_ST_ST = 8 // #StoreStore barrier
TCG_BAR_SYNC = 64 // SEQ_CST barrier
where
tcg_gen_qemu_ld_i32(x, y, i, m | MO_BAR_LD_B | MO_BAR_ST_A)
emits
mb TCG_BAR_LD_LD
qemu_ld_i32 x, y, i, m
mb TCG_BAR_LD_ST
We can then add an optimization pass which folds barriers with no memory
operations in between, so that duplicates are eliminated.
r~