On 17/03/2025 1:52 pm, Jan Beulich wrote: > On 17.03.2025 14:34, Andrew Cooper wrote: >> On 17/03/2025 9:09 am, Jan Beulich wrote: >>> On 14.03.2025 21:49, Andrew Cooper wrote: >>>> ... which is more consise than the opencoded form, and more efficient when >>>> compiled. >>>> >>>> For production VMs, ~100% of emulations are simple MOVs, so it is likely >>>> that >>>> there are no segments to write back. >>>> >>>> Furthermore, now that find_{first,next}_bit() are no longer in use, the >>>> seg_reg_{accessed,dirty} fields aren't forced to be unsigned long, although >>>> they do need to remain unsigned int because of __set_bit() elsewhere. >>>> >>>> No functional change. >>>> >>>> Signed-off-by: Andrew Cooper <andrew.coop...@citrix.com> >>> Reviewed-by: Jan Beulich <jbeul...@suse.com> >>> >>>> I still can't persuade GCC to do the early exit prior to establishing the >>>> stack frame, and unlike do_livepatch_work(), it's not critical enough to >>>> require noinline games. >>> Then is ... >>> >>>> --- a/xen/arch/x86/hvm/emulate.c >>>> +++ b/xen/arch/x86/hvm/emulate.c >>>> @@ -3022,18 +3022,16 @@ void hvm_emulate_init_per_insn( >>>> void hvm_emulate_writeback( >>>> struct hvm_emulate_ctxt *hvmemul_ctxt) >>>> { >>>> - enum x86_segment seg; >>>> + struct vcpu *curr; >>>> + unsigned int dirty = hvmemul_ctxt->seg_reg_dirty; >>>> >>>> - seg = find_first_bit(&hvmemul_ctxt->seg_reg_dirty, >>>> - ARRAY_SIZE(hvmemul_ctxt->seg_reg)); >>>> + if ( likely(!dirty) ) >>>> + return; >>> ... this worthwhile at all? I'm surprised anyway that I see you use likely() >>> here, when generally you argue against its use. >> No, it's not worth it. In fact, simplifying makes the function smaller. >> >> void hvm_emulate_writeback( >> struct hvm_emulate_ctxt *hvmemul_ctxt) >> { >> struct vcpu *curr = current; >> unsigned int dirty = hvmemul_ctxt->seg_reg_dirty; >> >> for_each_set_bit ( seg, dirty ) >> hvm_set_segment_register(curr, seg, &hvmemul_ctxt->seg_reg[seg]); >> } >> >> gets a bloat-o-meter score of 131 down to 72 (-59). > That's surprisingly much.
Indeed. https://termbin.com/gzd7 is the full side-by-side diff. I think the key is the removal of ARRAY_SIZE(). With it, find_first_bit() is emitting code to deal with seg_reg_dirty having higher bits set. Also, both BSF expressions have unknown zero-ness, so are emitting the cmov form (__scanbit() is still on my list of bitops to work on). ~Andrew