On 04/09/2024 2:29 pm, Jan Beulich wrote:
> Both caches may need higher capacity, and the upper bound will need to
> be determined dynamically based on CPUID policy (for AMX at least).

Is this to cope with TILE{LOAD,STORE}, or something else?

It's not exactly clear, even when looking at the prior AMX series.

> While touching the check in hvmemul_phys_mmio_access() anyway, also
> tighten it: To avoid overrunning the internal buffer we need to take the
> offset into the buffer into account.

Does this really want to be mixed with a prep patch?

>
> Signed-off-by: Jan Beulich <jbeul...@suse.com>
> ---
> This is a patch taken from the AMX series, which was part of the v3
> submission. All I did is strip out the actual AMX bits (from
> hvmemul_cache_init()), plus of course change the description. As a
> result some local variables there may look unnecessary, but this way
> it's going to be less churn when the AMX bits are added. The next patch
> pretty strongly depends on the changed approach (contextually, not so
> much functionally), and I'd really like to avoid rebasing that one ahead
> of this one, and then this one on top of that.

Fine by me.

> --- a/xen/arch/x86/hvm/emulate.c
> +++ b/xen/arch/x86/hvm/emulate.c
> @@ -26,6 +26,18 @@
>  #include <asm/iocap.h>
>  #include <asm/vm_event.h>
>  
> +/*
> + * We may read or write up to m512 or up to a tile row as a number of
> + * device-model transactions.
> + */
> +struct hvm_mmio_cache {
> +    unsigned long gla;
> +    unsigned int size;
> +    unsigned int space:31;
> +    unsigned int dir:1;
> +    uint8_t buffer[] __aligned(sizeof(long));

I know this is a minor tangent, but you are turning a regular struct
into a flexible one.
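
As an aside, for anyone following along: with a flexible member, the
object has to be sized at allocation time.  A minimal sketch of what
that looks like (not the patch's actual allocation code; xmalloc_bytes()
is Xen's untyped allocator):

    /* Room for the fixed header plus "space" bytes of buffer. */
    struct hvm_mmio_cache *cache =
        xmalloc_bytes(offsetof(struct hvm_mmio_cache, buffer) + space);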

Could we introduce __counted_by() and start using it here?

At the toolchain level, it lets the compiler understand the real size of
the object, so e.g. the sanitisers can spot out-of-bounds accesses
through the flexible member.

But, even in the short term, having

    /* TODO */
    # define __counted_by(member)

in compiler.h still leaves us with better code, because

    struct hvm_mmio_cache {
        unsigned long gla;
        unsigned int size;
        unsigned int space:31;
        unsigned int dir:1;
        uint8_t buffer[] __aligned(sizeof(long)) __counted_by(size);
    };

is explicitly clear in a case where the "space" field creates some
ambiguity.
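
Something like the following in compiler.h would cover both cases - a
sketch, assuming recent GCC/Clang spell the attribute counted_by, and
falling back to the empty TODO definition above otherwise:

    #ifdef __has_attribute
    # if __has_attribute(__counted_by__)
    #  define __counted_by(member) __attribute__((__counted_by__(member)))
    # endif
    #endif
    #ifndef __counted_by
    # define __counted_by(member) /* TODO: no toolchain support yet */
    #endif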

> @@ -2978,16 +2991,21 @@ void hvm_dump_emulation_state(const char
>  int hvmemul_cache_init(struct vcpu *v)
>  {
>      /*
> -     * No insn can access more than 16 independent linear addresses (AVX512F
> -     * scatters/gathers being the worst). Each such linear range can span a
> -     * page boundary, i.e. may require two page walks. Account for each insn
> -     * byte individually, for simplicity.
> +     * AVX512F scatter/gather insns can access up to 16 independent linear
> +     * addresses, up to 8 bytes size. Each such linear range can span a page
> +     * boundary, i.e. may require two page walks.
> +     */
> +    unsigned int nents = 16 * 2 * (CONFIG_PAGING_LEVELS + 1);
> +    unsigned int i, max_bytes = 64;
> +    struct hvmemul_cache *cache;
> +
> +    /*
> +     * Account for each insn byte individually, both for simplicity and to
> +     * leave some slack space.
>       */

Hang on.  Do we seriously use a separate cache entry for each
instruction byte?
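
(For scale: with CONFIG_PAGING_LEVELS = 4, the quoted expression alone
already comes to 16 * 2 * 5 = 160 entries, and whatever the
per-insn-byte accounting adds would be on top of that.)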

~Andrew
