On 26.11.2024 20:16, Andrew Cooper wrote:
> On 25/11/2024 2:28 pm, Jan Beulich wrote:
>> Move the function to its own assembly file. Having it in C just for the
>> entire body to be an asm() isn't really helpful. Then have two flavors:
>> A "basic" version using qword steps for the bulk of the operation, and an
>> ERMS version for modern hardware, to be substituted in via alternatives
>> patching.
>>
>> Alternatives patching, however, requires an extra precaution: It uses
>> memcpy() itself, and hence the function may patch itself. Luckily the
>> patched-in code only replaces the prolog of the original function. Make
>> sure this remains this way.
>>
>> Additionally alternatives patching, while supposedly safe via enforcing
>> a control flow change when modifying already prefetched code, may not
>> really be. Afaict a request is pending to drop the first of the two
>> options in the SDM's "Handling Self- and Cross-Modifying Code" section.
>> Insert a serializing instruction there.
>>
>> Signed-off-by: Jan Beulich <jbeul...@suse.com>
>> ---
>> We may want to consider branching over the REP MOVSQ as well, if the
>> number of qwords turns out to be zero.
>> We may also want to consider using non-REP MOVS{L,W,B} for the tail.
> 
> My feedback for patch 2 is largely applicable here too.

Sure, and I'll apply here whatever we decide to do there.

>> --- a/xen/arch/x86/alternative.c
>> +++ b/xen/arch/x86/alternative.c
>> @@ -153,12 +153,14 @@ void init_or_livepatch add_nops(void *in
>>   * executing.
>>   *
>>   * "noinline" to cause control flow change and thus invalidate I$ and
>> - * cause refetch after modification.
>> + * cause refetch after modification.  While the SDM continues to suggest 
>> this
>> + * is sufficient, it may not be - issue a serializing insn afterwards as 
>> well.
> 
> Did you find a problem in practice, or is this just in case?

It's been too long, so I can now only guess that it's just in case. The
comment change, otoh, suggests otherwise.

> I suspect if you are seeing problems, then it's non-atomicity of the
> stores into memcpy() rather than serialisation.

How would atomicity (or not) matter here? There shouldn't be any difference
between a single and any number of stores into the (previously executed)
insn stream.

>>   */
>>  static void init_or_livepatch noinline
>>  text_poke(void *addr, const void *opcode, size_t len)
>>  {
>>      memcpy(addr, opcode, len);
>> +    cpuid_eax(0);
> 
> This whole function is buggy in a couple of ways, starting with the
> comments.
> 
> The comment about noinline and control flow changes is only really
> relevant to 32bit processors; we inherited that comment from Linux, and
> they're not applicable to Xen.
> 
> AMD64 (both the APM, and SDM) guarantee that Self Modifying Code will be
> dealt with on your behalf, with no serialisation needed.
> 
> Cross-modifying code needs far more severe serialisation than given
> here.  We get away with it because alternative_{instructions,branches}()
> are pre-SMP, and apply_alternatives() is on livepatches prior to them
> becoming live.
> 
> 
> I happen to know there's an AMD CPU which has an erratum regarding Self
> Modifying Code and genuinely does need a serialising instruction, but I
> don't know which exact CPU it is.

Maybe I ran into that on one of the two older AMD systems I routinely
test on every once in a while?

> If we're going to put a serialising instruction, it should be a write to
> CR2.  We don't care about 486 compatibility, and it's faster than CPUID
> and much much faster if virtualised because it's unlikely to be
> intercepted even under shadow paging.
> 
> But, it would be nice not to put serialisation in the general case to
> begin with, especially not into the livepatching case.

If you're aware of an erratum there, how can we get away without any
serialization? I can surely switch to a CR2 write, and I can also make
this dependent upon system_state (thus excluding the LP case).

I notice that arch_livepatch_{apply,revert}() indeed use plain memcpy()
with just the noinline "protection". I wonder how well that works if a
livepatch actually touched the tail of either of these functions
(however unlikely that may be).

Jan

Reply via email to