On 10/19/23 12:46, Paolo Bonzini wrote:
This includes:

- implementing SHA and CMPccXADD instruction extensions

- introducing a new mechanism for flags writeback that avoids a
   tricky failure

- converting the more orthogonal parts of the one-byte opcode
   map, as well as the CMOVcc and SETcc instructions.

Tested by booting several 32-bit and 64-bit guests.

The new decoder produces roughly 2% more ops, but after optimization there
are just 0.5% more and almost all of them come from cmp instructions.
For some reason that I have not investigated, these end up with an extra
mov even after optimization:

  old decoder                    new decoder
                                 sub_i64 tmp0,rax,$0x33
  mov_i64 cc_src,$0x33           mov_i64 cc_dst,tmp0
  sub_i64 cc_dst,rax,$0x33       mov_i64 cc_src,$0x33
  discard cc_src2                discard cc_src2
  discard cc_op                  discard cc_op

This could easily be fixed, either by not reusing gen_SUB for cmp
instructions or by debugging what goes on in the optimizer.  In any
case, the extra mov does not result in larger host assembly.

Oops, I missed Richard's newer reviews.  Will send v3 sometime next week.

Paolo

