On Mar 20, 2021, at 14:31, Thomas Gleixner <t...@linutronix.de> wrote:
> On Sun, Feb 21 2021 at 10:56, Chang S. Bae wrote:
>> 
>> +static void check_xtile_data_against_struct(int size)
>> +{
>> +    u32 max_palid, palid, state_size;
>> +    u32 eax, ebx, ecx, edx;
>> +    u16 max_tile;
>> +
>> +    /*
>> +     * Check the maximum palette id:
>> +     *   eax: the highest numbered palette subleaf.
>> +     */
>> +    cpuid_count(TILE_CPUID, 0, &max_palid, &ebx, &ecx, &edx);
>> +
>> +    /*
>> +     * Cross-check each tile size and find the maximum
>> +     * number of supported tiles.
>> +     */
>> +    for (palid = 1, max_tile = 0; palid <= max_palid; palid++) {
>> +            u16 tile_size, max;
>> +
>> +            /*
>> +             * Check the tile size info:
>> +             *   eax[31:16]:  bytes per title
>> +             *   ebx[31:16]:  the max names (or max number of tiles)
>> +             */
>> +            cpuid_count(TILE_CPUID, palid, &eax, &ebx, &edx, &edx);
>> +            tile_size = eax >> 16;
>> +            max = ebx >> 16;
>> +
>> +            if (WARN_ONCE(tile_size != sizeof(struct xtile_data),
>> +                          "%s: struct is %zu bytes, cpu xtile %d bytes\n",
>> +                          __stringify(XFEATURE_XTILE_DATA),
>> +                          sizeof(struct xtile_data), tile_size))
>> +                    __xstate_dump_leaves();
>> +
>> +            if (max > max_tile)
>> +                    max_tile = max;
>> +    }
>> +
>> +    state_size = sizeof(struct xtile_data) * max_tile;
>> +    if (WARN_ONCE(size != state_size,
>> +                  "%s: calculated size is %u bytes, cpu state %d bytes\n",
>> +                  __stringify(XFEATURE_XTILE_DATA), state_size, size))
>> +            __xstate_dump_leaves();
> 
> So we have 2 warnings which complain about inconsistent state and that's
> it? Why has this absolutely no consequences? We just keep stuff enabled
> and jug along, right?
> 
> Which one of the two states is correct? Why don't we just disable that
> muck and be done with it to play it safe?
> 
> Failing to execute some workload by saying NO due to inconsistency is
> far more useful than taking the chance of potential silent data
> corruption.

This change in fact follows the mainline code [1], where this type of warning
is emitted with such mismatch.

Yes, disabling the feature looks to be the right way. Or, perhaps, taking a
large one is an option when mismatched ?

At least, given the feedback, the mainline needs to be revised before applying
this. Correct me if you don’t think so.

[1] 
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/arch/x86/kernel/fpu/xstate.c#n567

Thanks,
Chang

Reply via email to