On 10/18/2015 06:06 PM, Alex Forster wrote: > On 10/15/15, 3:53 PM, "Alexander Duyck" <alexander.duyck at gmail.com> wrote: > > >>>> It looks like you are probably seeing interfaces be unbound and then >>>> rebound. As such you are likely pushing things outside of the array >>>> boundary. One solution might just be to at more ",1"s if you are only >>>> going to be doing this kind of thing at boot up. The upper limit for >>>> the array is 32 entries so as long as you only are setting this up once >>>> you could probably get away with that. >>>> >>>> An alternative would be to modify the definition of the parameter in >>>> ixgbe_param.c. If you look through the file you should fine several >>>> likes like below: >>>> struct ixgbe_option opt = { >>>> .type = enable_option, >>>> .name = "allow_unsupported_sfp", >>>> .err = "defaulting to Disabled", >>>> .def = OPTION_DISABLED >>>> }; >>>> >>>> If you modify the .def value to "OPTION_ENABLED", and then rebuild and >>>> reinstall your driver you should be able have it install without any >>>> issues. >>>> >>>> - Alex >>>> >>> Yeah, I've had roughly the same thought process since you mentioned the >>> args array. My first idea was "maybe the driver can't fit all of my 1's" >>> but I saw it was defined at 32. Then I decided to just patch the whole >>> enable_unsupported_sfp option out >>> https://gist.github.com/AlexForster/112fd822704caf804849 but I'm still >>> failing. >>> >>> I've been digging a bit, and I'm failing here in ixgbe_main.c... >>> >>> /* reset_hw fills in the perm_addr as well */ >>> hw->phy.reset_if_overtemp = true; >>> err = hw->mac.ops.reset_hw(hw); >>> hw->phy.reset_if_overtemp = false; >>> if (err == IXGBE_ERR_SFP_NOT_PRESENT) { >>> err = IXGBE_SUCCESS; >>> } else if (err == IXGBE_ERR_SFP_NOT_SUPPORTED) { >>> e_dev_err("failed to load because an unsupported SFP+ or QSFP " >>> "module type was detected.\n"); >>> e_dev_err("Reload the driver after installing a supported " >>> "module.\n"); >>> goto err_sw_init; >>> } else if (err) { >>> e_dev_err("HW Init failed: %d\n", err); >>> goto err_sw_init; >>> } >>> >>> >>> I've attempted a hand-stacktrace and came up with the following... >>> >>> ixgbe_82599.c at 1016 >>> * ixgbe_reset_hw_82599() is defined >>> * calls phy->ops.init() which potentially returns >>> IXGBE_ERR_SFP_NOT_SUPPORTED >>> >>> ixgbe_82599.c at 102 >>> * ixgbe_init_phy_ops_82599() is defined >>> * IXGBE_ERR_SFP_NOT_SUPPORTED is returned after calling >>> phy->ops.identify() >>> >>> ixgbe_82599.c at 2085 >>> * ixgbe_identify_phy_82599() is defined >>> * calls ixgbe_identify_module_generic() >>> >>> ixgbe_phy.c at 1281 >>> * ixgbe_identify_module_generic() is defined >>> * calls ixgbe_identify_qsfp_module_generic() >>> >>> ixgbe_phy.c at 1663 >>> * ixgbe_identify_qsfp_module_generic() is defined >>> * We fail somewhere before the ending call to ixgbe_get_device_caps() >>> which does take allow_unsupported_sfp into account >>> >>> * Possibility: hw->phy.ops.read_i2c_eeprom(hw, IXGBE_SFF_IDENTIFIER, >>> &identifier) != IXGBE_SFF_IDENTIFIER_QSFP_PLUS >>> * Possibility: active_cable != true >>> >>> And then I'm over my head. Should I assume from here that the most >>> likely >>> explanation is a bad transceiver or bad fiber? >>> >>> Alex Forster >> >> Are you able to swap transceiver or fiber between the 4 ports that work >> and the 4 that don't? If you could do that then you should be able to >> tell if the issue is following the NIC ports, or if it is an issue with >> the external connections. If it is issue is following the transceiver >> or fiber then it is probably what is causing the issue. >> >> The other thing you could try doing is adding a printk to the spots >> where the status is set to SFP_NOT_SUPPORTED so that you could figure >> out exactly which spot is triggering the rejection of the module. >> >> - Alex > > I had remote hands swap fibers on the QSFP side and the issue moved to the > first card, so I'm going to have the fibers cleaned and tested. This > appears to be my issue. > > I'd like to submit a patch for ixgbe_identify_qsfp_module_generic() to > print more helpful errors in the two cases mentioned above, so that > hopefully nobody ever has to deal with this again. Would I be wasting my > time, or does something like this have a likelihood of being accepted? > > Thank you for all of your help! I wouldn't have figured this out nearly as > quickly without it. > > Alex Forster
I suspect there would be some value to such a patch, just make sure to explain the reason for needing it in the patch description. My advice would be to put such a patch together against what is in Jeff Kirsher's next queue (https://git.kernel.org/cgit/linux/kernel/git/jkirsher/next-queue.git/) and then base your patches off of that. The email lists for submitting patches to is intel-wired-lan at lists.osuosl.org and netdev at vger.kernel.org. - Alex - Alex