On 22.01.2025 18:45, Roger Pau Monné wrote:
> On Tue, Oct 01, 2024 at 10:49:40AM +0200, Jan Beulich wrote:
>> The MMIO cache is intended to have one entry used per independent memory
>> access that an insn does. This, in particular, is supposed to be
>> ignoring any page boundary crossing. Therefore when looking up a cache
>> entry, the access's starting (linear) address is relevant, not the one
>> possibly advanced past a page boundary.
>>
>> In order for the same offset-into-buffer variable to be usable in
>> hvmemul_phys_mmio_access() for both the caller's buffer and the cache
>> entry's it is further necessary to have the un-adjusted caller buffer
>> passed into there.
>>
>> Fixes: 2d527ba310dc ("x86/hvm: split all linear reads and writes at page boundary")
>> Reported-by: Manuel Andreas <manuel.andr...@tum.de>
>> Signed-off-by: Jan Beulich <jbeul...@suse.com>
>> ---
>> This way problematic overlaps are only reduced (to ones starting at the
>> same address), not eliminated: Assumptions in hvmemul_phys_mmio_access()
>> go further - if a subsequent access is larger than an earlier one, but
>> the splitting results in a chunk to cross the end "boundary" of the
>> earlier access, an assertion will still trigger. Explicit memory
>> accesses (ones encoded in an insn by explicit or implicit memory
>> operands) match the assumption afaict (i.e. all those accesses are of
>> uniform size, and hence they either fully overlap or are mapped to
>> distinct cache entries).
>>
>> Insns accessing descriptor tables, otoh, don't fulfill these
>> expectations: The selector read (if coming from memory) will always be
>> smaller than the descriptor being read, and if both (insanely) start at
>> the same linear address (in turn mapping MMIO), said assertion will kick
>> in. (The same would be true for an insn trying to access itself as data,
>> as long as certain size "restrictions" between insn and memory operand
>> are met. Except that linear_read() disallows insn fetches from MMIO.) To
>> deal with such, I expect we will need to further qualify (tag) cache
>> entries, such that reads/writes won't use insn fetch entries, and
>> implicit-supervisor-mode accesses won't use entries of ordinary
>> accesses. (Page table accesses don't need considering here for now, as
>> our page walking code demands page tables to be mappable, implying
>> they're in guest RAM; such accesses also don't take the path here.)
>> Thoughts anyone, before I get to making another patch?
>>
>> Considering the insn fetch aspect mentioned above I'm having trouble
>> following why the cache has 3 entries. With insn fetches permitted,
>> descriptor table accesses where the accessed bit needs setting may also
>> fail because of that limited capacity of the cache, due to the way the
>> accesses are done. The read and write (cmpxchg) are independent accesses
>> from the cache's perspective, and hence we'd need another entry there.
>> If, otoh, the 3 entries are there to account for precisely this (which
>> seems unlikely with commit e101123463d2 ["x86/hvm: track large memory
>> mapped accesses by buffer offset"] not saying anything at all), then we
>> should be fine in this regard. If we were to permit insn fetches, which
>> way to overcome this (possibly by allowing the write to re-use the
>> earlier read's entry in this special situation) would remain to be
>> determined.
> 
> I'm not that familiar with the emulator logic for memory accesses, but
> it seems like we are adding more and more complexity and special
> casing.  Maybe it's the only way to go forward, but I wonder if there
> could be some other way to solve this.  However, I don't think I
> will have time to look into it, and hence I'm not going to oppose to
> your proposal.

I'll see what I can do; it's been quite a while, so I'll first need to
swap context back in.
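
Just to give the tagging idea from the post-commit-message remarks a
slightly more concrete shape (everything below is invented purely for
illustration, not code I actually have):

    enum mmio_cache_kind {
        mmio_cache_data,   /* explicit or implicit memory operands */
        mmio_cache_insn,   /* insn fetches */
        mmio_cache_sys,    /* implicit supervisor-mode accesses */
    };

struct hvm_mmio_cache would gain such a tag, and hvmemul_find_mmio_cache()
would then match on it in addition to the starting linear address and
direction, so e.g. a selector read could no longer be folded into a
descriptor read starting at the same linear address.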

> Are there however some tests, possibly XTF, that we could use to
> ensure the behavior of accesses is as we expect?

Manuel's report included an XTF test, which I expect will become part of
XTF once this fix has gone in. I fear, though, that there is an issue
Andrew has been pointing out, which may prevent this from happening
right away (even if, with osstest having disappeared, that's now only a
latent issue until gitlab CI starts exercising XTF): with the issue
unfixed on older trees (i.e. those remaining after this series was
backported as appropriate), the new test would fail there.

>> @@ -1030,7 +1040,11 @@ static struct hvm_mmio_cache *hvmemul_fi
>>              return cache;
>>      }
>>  
>> -    if ( !create )
>> +    /*
>> +     * Bail if a new entry shouldn't be allocated, utilizing that ->space has
>                                                       ^rely on ->space having 
> ...
> Would be easier to read IMO.

Changed; I'm not overly fussed, yet at the same time I also don't really
agree with your comment.

>> @@ -1064,12 +1079,14 @@ static void latch_linear_to_phys(struct
>>  
>>  static int hvmemul_linear_mmio_access(
>>      unsigned long gla, unsigned int size, uint8_t dir, void *buffer,
>> -    uint32_t pfec, struct hvm_emulate_ctxt *hvmemul_ctxt, bool known_gpfn)
>> +    uint32_t pfec, struct hvm_emulate_ctxt *hvmemul_ctxt,
>> +    unsigned long start, bool known_gpfn)
> 
> I think start is a bit ambiguous, start_gla might be clearer (same
> below for the start parameter).

Fine with me - changed for all three hvmemul_linear_mmio_*(). It wasn't
clear to me whether you also meant the local variables in
linear_{read,write}(); since you said "parameter" I assumed you didn't.
If you did, I fear I'd be less happy to make the change there too, as
"addr" would then preferably also want changing to "gla", which would
cause undue extra churn.
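
To be concrete, the prototype quoted above would then read (with the
other two functions adjusted alike):

    static int hvmemul_linear_mmio_access(
        unsigned long gla, unsigned int size, uint8_t dir, void *buffer,
        uint32_t pfec, struct hvm_emulate_ctxt *hvmemul_ctxt,
        unsigned long start_gla, bool known_gpfn);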

>> @@ -1182,8 +1202,17 @@ static int linear_read(unsigned long add
>>       * an access that was previously handled as MMIO. Thus it is imperative that
>>       * we handle this access in the same way to guarantee completion and hence
>>       * clean up any interim state.
>> +     *
>> +     * Care must be taken, however, to correctly deal with crossing RAM/MMIO or
>> +     * MMIO/RAM boundaries. While we want to use a single cache entry (tagged
>> +     * by the starting linear address), we need to continue issuing (i.e. also
>> +     * upon replay) the RAM access for anything that's ahead of or past MMIO,
>> +     * i.e. in RAM.
>>       */
>> -    if ( !hvmemul_find_mmio_cache(hvio, addr, IOREQ_READ, false) )
>> +    cache = hvmemul_find_mmio_cache(hvio, start, IOREQ_READ, ~0);
>> +    if ( !cache ||
>> +         addr + bytes <= start + cache->skip ||
>> +         addr >= start + cache->size )
> 
> Seeing as this bound checks is also used below, could it be a macro or
> inline function?
> 
> is_cached() or similar?

Hmm. Yes, it's twice the same expression, yet that helper would require
four parameters. That's a little too much for my taste; I'd prefer to
keep things as they are. After all, there is far more redundancy between
the two functions anyway.
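
FWIW, what I'd expect such a helper to look like (name and exact shape
picked just for illustration):

    static bool is_cached(const struct hvm_mmio_cache *cache,
                          unsigned long start_gla, unsigned long addr,
                          unsigned int bytes)
    {
        return cache &&
               addr + bytes > start_gla + cache->skip &&
               addr < start_gla + cache->size;
    }

i.e. four parameters merely to wrap a two-line condition, hence my
preference for leaving the open-coded checks in place.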

Jan
