On 5/20/2024 1:33 AM, Przemek Kitszel wrote:
> On 5/18/24 01:22, Jacob Keller wrote:
>> The ice_get_pfa_module_tlv() function iterates over the TLVs in the
>> Preserved Fields Area (PFA) of the NVM. This is used to access data such as
>> the Part Board Assembly identifier.
>>
>> Some NVMs in the wild have been found with incorrect TLV lengths including
>> at least one which reports a TLV length of 0xFFFF. When trying to read the
>> PBA from such an NVM, the driver will compute a new offset for the next_tlv
>> which is lower, due to overflowing the 16-bit next_tlv variable.
>>
>> In the best case, the driver will incorrectly interpret values until it
>> finds one which has an offset greater than the PFA area without
>> overflowing. In the worst case, the values in the NVM result in an infinite
>> loop as the misinterpreted lengths result in checking offsets which are
>> valid within the PFA, and which ultimately point in an infinite loop.
>>
>> Fix this by using check_add_overflow when calculating the NVM offsets, and
>> bailing if we ever overflow. Additionally, use check_add_overflow when
>> calculating the initial maximum PFA size.
>>
>> This ensures that we bail immediately on encountering any TLV who's length
>> would have caused the naive addition to overflow, rather than entering an
>> infinite loop or otherwise misinterpreting NVM values.
>>
>> Fixes: e961b679fb0b ("ice: add board identifier info to devlink .info_get")
>> Co-developed-by: Paul Greenwalt <paul.greenw...@intel.com>
>> Signed-off-by: Paul Greenwalt <paul.greenw...@intel.com>
>> Signed-off-by: Jacob Keller <jacob.e.kel...@intel.com>
>> ---
>> Changes in v3:
>> - Fix missing {
>> - Fix missing pfa_ptr variable to dev_warn()
>> - Add Fixes tag
>> - Link to v2: 
>> https://lore.kernel.org/r/20240517-iwl-net-2024-05-16-fix-nvm-tlv-issue-v2-1-fdee184ec...@intel.com
>>
>> Changes in v2:
>> - Use check_add_overflow instead of increasing the variables to u32
>> - Upgrade log messages to dev_warn()
>> - Link to v1: 
>> https://lore.kernel.org/r/20240516-iwl-net-2024-05-16-fix-nvm-tlv-issue-v1-1-ecbb6a759...@intel.com
>> ---
>>   drivers/net/ethernet/intel/ice/ice_nvm.c | 19 +++++++++++++++----
>>   1 file changed, 15 insertions(+), 4 deletions(-)
>>
>> diff --git a/drivers/net/ethernet/intel/ice/ice_nvm.c 
>> b/drivers/net/ethernet/intel/ice/ice_nvm.c
>> index 84eab92dc03c..be731b83d667 100644
>> --- a/drivers/net/ethernet/intel/ice/ice_nvm.c
>> +++ b/drivers/net/ethernet/intel/ice/ice_nvm.c
>> @@ -440,8 +440,7 @@ int
>>   ice_get_pfa_module_tlv(struct ice_hw *hw, u16 *module_tlv, u16 
>> *module_tlv_len,
>>                     u16 module_type)
>>   {
>> -    u16 pfa_len, pfa_ptr;
>> -    u16 next_tlv;
>> +    u16 pfa_len, pfa_ptr, next_tlv, max_pfa;
>>      int status;
>>   
>>      status = ice_read_sr_word(hw, ICE_SR_PFA_PTR, &pfa_ptr);
>> @@ -454,11 +453,18 @@ ice_get_pfa_module_tlv(struct ice_hw *hw, u16 
>> *module_tlv, u16 *module_tlv_len,
>>              ice_debug(hw, ICE_DBG_INIT, "Failed to read PFA length.\n");
>>              return status;
>>      }
>> +
>> +    if (check_add_overflow(pfa_ptr, pfa_len, &max_pfa)) {
>> +            dev_warn(ice_hw_to_dev(hw), "PFA starts at offset %u. PFA 
>> length of %u causes 16-bit arithmetic overflow.\n",
>> +                     pfa_ptr, pfa_len);
>> +            return -EINVAL;
>> +    }
>> +
>>      /* Starting with first TLV after PFA length, iterate through the list
>>       * of TLVs to find the requested one.
>>       */
>>      next_tlv = pfa_ptr + 1;
>> -    while (next_tlv < pfa_ptr + pfa_len) {
>> +    while (next_tlv < max_pfa) {
>>              u16 tlv_sub_module_type;
>>              u16 tlv_len;
>>   
>> @@ -485,7 +491,12 @@ ice_get_pfa_module_tlv(struct ice_hw *hw, u16 
>> *module_tlv, u16 *module_tlv_len,
>>              /* Check next TLV, i.e. current TLV pointer + length + 2 words
>>               * (for current TLV's type and length)
>>               */
>> -            next_tlv = next_tlv + tlv_len + 2;
>> +            if (check_add_overflow(next_tlv, 2, &next_tlv) ||
>> +                check_add_overflow(next_tlv, tlv_len, &next_tlv)) {
>> +                    dev_warn(ice_hw_to_dev(hw), "Failed to locate PFA TLV 
>> module of type %u due to arithmetic overflow. The PFA length is %u. The last 
>> TLV has length of %u\n",
>> +                             module_type, pfa_len, tlv_len);
>> +                    return -EINVAL;
>> +            }
>>      }
>>      /* Module does not exist */
>>      return -ENOENT;
>>
>> ---
>> base-commit: 83e93942796db58652288f0391ac00072401816f
>> change-id: 20240516-iwl-net-2024-05-16-fix-nvm-tlv-issue-99ebb2c55c52
>>
>> Best regards,
> 
> Reviewed-by: Przemek Kitszel <przemyslaw.kits...@intel.com>

It turns out after digging, that this appears to be caused by
erroneously reading from the NVM past the PFA TLVs:

I added additional logging and I see:

> Failed to locate PFA TLV module of type 22 due to arithmetic overflow. The 
> PFA length is 0x00002f00. The last TLV was 0x00002fff. The PFA area ends at 
> 0x00003000. The last TLV has length of 0x0000e7c1

We're trying to read a TLV starting at offset 0x2FFF, but the PFA is
supposed to end at 0x3000. I think this is actually just a case if an
off-by-one error when calculating the maximum offset of the PFA, that
the next_tlv < max_pfa value check doesn't trigger.

This results in the final loop trying to interpret whatever comes after
the PFA in the NVM as a TLV, which gets the incorrect length value and
causes the overflow.

I need to do a bit more investigation, but I believe the error comes
from the fact that the PFA length value includes the initial length word
as well.

I'm unsure if we would still need the check_add_overflow in that case.
It might be safe in the case of an actual bad NVM.

Reply via email to