On 5/17/19 9:36 AM, Daniel Walker wrote: > On Fri, May 17, 2019 at 08:16:34AM -0700, Alexander Duyck wrote: >> On Thu, May 16, 2019 at 6:48 PM Florian Fainelli <f.faine...@gmail.com> >> wrote: >>> >>> >>> >>> On 5/16/2019 6:03 PM, Daniel Walker wrote: >>>> On Thu, May 16, 2019 at 03:02:18PM -0700, Florian Fainelli wrote: >>>>> On 5/16/19 12:55 PM, Nikunj Kela (nkela) wrote: >>>>>> >>>>>> >>>>>> On 5/16/19, 12:35 PM, "Jeff Kirsher" <jeffrey.t.kirs...@intel.com> wrote: >>>>>> >>>>>> On Wed, 2019-05-08 at 23:14 +0000, Nikunj Kela wrote: >>>>>> >> Some of the broken NICs don't have EEPROM programmed correctly. It >>>>>> >> results >>>>>> >> in probe to fail. This change adds a module parameter that can be >>>>>> >> used to >>>>>> >> ignore nvm checksum validation. >>>>>> >> >>>>>> >> Cc: xe-linux-exter...@cisco.com >>>>>> >> Signed-off-by: Nikunj Kela <nk...@cisco.com> >>>>>> >> --- >>>>>> >> drivers/net/ethernet/intel/igb/igb_main.c | 28 >>>>>> >> ++++++++++++++++++++++------ >>>>>> >> 1 file changed, 22 insertions(+), 6 deletions(-) >>>>>> >>>>>> >NAK for two reasons. First, module parameters are not desirable >>>>>> >because their individual to one driver and a global solution should >>>>>> be >>>>>> >found so that all networking device drivers can use the solution. >>>>>> This >>>>>> >will keep the interface to change/setup/modify networking drivers >>>>>> >consistent for all drivers. >>>>>> >>>>>> >>>>>> >Second and more importantly, if your NIC is broken, fix it. Do not >>>>>> try >>>>>> >and create a software workaround so that you can continue to use a >>>>>> >broken NIC. There are methods/tools available to properly reprogram >>>>>> >the EEPROM on a NIC, which is the right solution for your issue. >>>>>> >>>>>> I am proposing this as a debug parameter. Obviously, we need to fix >>>>>> EEPROM but this helps us continuing the development while manufacturing >>>>>> fixes NIC. >>>>> >>>>> Then why even bother with sending this upstream? >>>> >>>> It seems rather drastic to disable the entire driver because the checksum >>>> doesn't match. It really should be a warning, even a big warning, to let >>>> people >>>> know something is wrong, but disabling the whole driver doesn't make sense. >>> >>> You could generate a random Ethernet MAC address if you don't have a >>> valid one, a lot of drivers do that, and that's a fairly reasonable >>> behavior. At some point in your product development someone will >>> certainly verify that the provisioned MAC address matches the network >>> interface's MAC address. >>> -- >>> Florian >> >> The thing is the EEPROM contains much more than just the MAC address. >> There ends up being configuration for some of the PCIe interface in >> the hardware as well as PHY configuration. If that is somehow mangled >> we shouldn't be bringing up the part because there are one or more >> pieces of the device configuration that are likely wrong. >> >> The checksum is being used to make sure the EEPROM is valid, without >> that we would need to go through and validate each individual section >> of the EEPROM before enabling the the portions of the device related >> to it. The concern is that this will become a slippery slope where we >> eventually have to code all the configuration of the EEPROM into the >> driver itself. > > > I don't think you can say because the checksum is valid that all data > contained > inside is also valid. You can have a valid checksum , and someone screwed up > the > data prior to the checksum getting computed. > > >> We need to make the checksum a hard stop. If the part is broken then >> it needs to be addressed. Workarounds just end up being used and >> forgotten, which makes it that much harder to support the product. >> Better to mark the part as being broken, and get it fixed now, than to >> have parts start shipping that require workarounds in order to >> function.o > > I don't think it's realistic to define the development process for large > corporations like Cisco, or like what your doing , to define the development > process for all corporations and products which may use intel parts. It's > better > to be flexible.
Nikunj indicated that "manufacturing fixes NIC" so that sounds like a workaround for an issue that would not affect a final product, in which case, keeping downstream changes for development boards/intermediate revisions of a product and focusing on relevant upstreaming changes for the actual product would make a lot more sense, no? -- Florian