On 07.10.2022 21:36, Andrew Cooper wrote:
> On 26/08/2021 11:14, Jan Beulich wrote:
>> --- a/xen/common/grant_table.c
>> +++ b/xen/common/grant_table.c
>> @@ -3289,17 +3292,15 @@ gnttab_get_status_frames(XEN_GUEST_HANDL
>>                   "status frames, but has only %u\n",
>>                   d->domain_id, op.nr_frames, nr_status_frames(gt));
>>          op.status = GNTST_general_error;
>> -        goto unlock;
>>      }
>>  
>> -    for ( i = 0; i < op.nr_frames; i++ )
>> +    for ( i = 0; op.status == GNTST_okay && i < op.nr_frames; i++ )
>>      {
>>          gmfn = gfn_x(gnttab_status_gfn(d, gt, i));
>>          if ( __copy_to_guest_offset(op.frame_list, i, &gmfn, 1) )
>>              op.status = GNTST_bad_virt_addr;
>>      }
>>  
>> - unlock:
>>      grant_read_unlock(gt);
>>   out2:
>>      rcu_unlock_domain(d);
>>
> 
> 
> If instead, this were written
> 
>     for ( i = 0; i < op.nr_frames; i++ )
>     {
>         gmfn = gfn_x(gnttab_status_gfn(d, gt, i));
>         if ( __copy_to_guest_offset(op.frame_list, i, &gmfn, 1) )
>         {
>             op.status = GNTST_bad_virt_addr;
>             goto unlock;
>         }
>     }
> 
> then the delta vs your version is -36 bytes, and faster to run because
> the loop doesn't need a memory read and compare on every iteration when
> you can exit based purely on existing control flow.
> 
> Furthermore, the version with a goto is clearer to follow, because the
> exit condition is much more obvious.

I know you and others deem "goto" okay to use; where possible (and where
the resulting code remains readable/maintainable) I'm aiming at avoiding
them. Nevertheless I'll follow the request here.

>  The compat change can do the same
> with breaks rather than gotos, for a slightly more modest -11 saving.

There of course the original if() around the the loop then also needs
retaining; on the positive side this means a smaller diff.

> A form with the op.status changes adjustments *not* added to the loop
> condition, Reviewed-by: Andrew Cooper <andrew.coop...@citrix.com>

Thanks.

> In reference to the hypercall ABI adjustments, it occurs to me that
> loops like this (which we have loads of, even in hypercall hotpaths) are
> horrifying for performance.  For HVM, we're redoing the nested pagewalk
> for every uint64_t element of an array. 
> 
> A "copy array to guest" primitive would more efficient still than a
> plain virt->phys translation, because we'd be able to drop the p2m walks
> too.

Generally we can copy arrays (the last parameter of the copying primitives
is a count, after all) but ...

> Obviously, we don't want every instance like this to be doing its own
> manual bounce buffering, so perhaps we should invest in some buffered
> copy helpers as part of trying to improve hypercall performance.

... avoiding bounce buffering is possible only where the data to copy out
is available in ready-to-copy form. Here in the native cases we need to
retrieve the GFN (from e.g. gnttab_status_gfn()), and in the compat case
we need to translate from 64 to 32 bits. Neither really lends itself to
the use of a generic helper, I think.

Jan

Reply via email to