On 17/03/2025 1:52 pm, Jan Beulich wrote:
> On 17.03.2025 14:34, Andrew Cooper wrote:
>> On 17/03/2025 9:09 am, Jan Beulich wrote:
>>> On 14.03.2025 21:49, Andrew Cooper wrote:
>>>> ... which is more consise than the opencoded form, and more efficient when
>>>> compiled.
>>>>
>>>> For production VMs, ~100% of emulations are simple MOVs, so it is likely 
>>>> that
>>>> there are no segments to write back.
>>>>
>>>> Furthermore, now that find_{first,next}_bit() are no longer in use, the
>>>> seg_reg_{accessed,dirty} fields aren't forced to be unsigned long, although
>>>> they do need to remain unsigned int because of __set_bit() elsewhere.
>>>>
>>>> No functional change.
>>>>
>>>> Signed-off-by: Andrew Cooper <andrew.coop...@citrix.com>
>>> Reviewed-by: Jan Beulich <jbeul...@suse.com>
>>>
>>>> I still can't persuade GCC to do the early exit prior to establishing the
>>>> stack frame, and unlike do_livepatch_work(), it's not critical enough to
>>>> require noinline games.
>>> Then is ...
>>>
>>>> --- a/xen/arch/x86/hvm/emulate.c
>>>> +++ b/xen/arch/x86/hvm/emulate.c
>>>> @@ -3022,18 +3022,16 @@ void hvm_emulate_init_per_insn(
>>>>  void hvm_emulate_writeback(
>>>>      struct hvm_emulate_ctxt *hvmemul_ctxt)
>>>>  {
>>>> -    enum x86_segment seg;
>>>> +    struct vcpu *curr;
>>>> +    unsigned int dirty = hvmemul_ctxt->seg_reg_dirty;
>>>>  
>>>> -    seg = find_first_bit(&hvmemul_ctxt->seg_reg_dirty,
>>>> -                         ARRAY_SIZE(hvmemul_ctxt->seg_reg));
>>>> +    if ( likely(!dirty) )
>>>> +        return;
>>> ... this worthwhile at all? I'm surprised anyway that I see you use likely()
>>> here, when generally you argue against its use.
>> No, it's not worth it.  In fact, simplifying makes the function smaller.
>>
>> void hvm_emulate_writeback(
>>     struct hvm_emulate_ctxt *hvmemul_ctxt)
>> {
>>     struct vcpu *curr = current;
>>     unsigned int dirty = hvmemul_ctxt->seg_reg_dirty;
>>
>>     for_each_set_bit ( seg, dirty )
>>         hvm_set_segment_register(curr, seg, &hvmemul_ctxt->seg_reg[seg]);
>> }
>>
>> gets a bloat-o-meter score of 131 down to 72 (-59).
> That's surprisingly much.

Indeed.  https://termbin.com/gzd7 is the full side-by-side diff.

I think the key is the removal of ARRAY_SIZE().  With it,
find_first_bit() is emitting code to deal with seg_reg_dirty having
higher bits set.  Also, both BSF expressions have unknown zero-ness, so
are emitting the cmov form (__scanbit() is still on my list of bitops to
work on).

~Andrew

Reply via email to