Re: [Intel-wired-lan] [iwl-net v2 2/2] igb: Fix missing time sync events
Hi, It appears this change breaks PTP on the 82580 controller, as ptp4l reports: > timed out while polling for tx timestamp increasing tx_timestamp_timeout or > increasing kworker priority may correct this issue, but a driver bug likely > causes it The 82580 controller has a hardware bug in which reading TSICR doesn't clear it. See this thread https://lore.kernel.org/all/cdcb8be0.1ec2c%25matthew.v...@intel.com/ where the bug was confirmed by an Intel employee. Any chance we could add back the ack for 82580 specifically? Thanks!
Re: [Intel-wired-lan] [iwl-net v2 2/2] igb: Fix missing time sync events
> Of course, I'll prepare a patch for that. Excellent, thank you! On Fri, Aug 9, 2024 at 9:39 AM Vinicius Costa Gomes wrote: > > Daiwei Li writes: > > > Hi, > > > > It appears this change breaks PTP on the 82580 controller, as ptp4l reports: > > > >> timed out while polling for tx timestamp increasing tx_timestamp_timeout or > >> increasing kworker priority may correct this issue, but a driver bug likely > >> causes it > > > > The 82580 controller has a hardware bug in which reading TSICR doesn't clear > > it. See this thread > > https://lore.kernel.org/all/cdcb8be0.1ec2c%25matthew.v...@intel.com/ where > > the > > bug was confirmed by an Intel employee. Any chance we could add back the ack > > for 82580 specifically? Thanks! > > Of course, I'll prepare a patch for that. > > Thanks for digging that one up. > > > Cheers, > -- > Vinicius
Re: [Intel-wired-lan] [PATCH iwl-net v1] igb: Fix not clearing TimeSync interrupts for 82580
> @Daiwei Li, I don't have a 82580 handy, please confirm that the patch fixes the issue you are having. Thank you for the patch! I can confirm it fixes my issue. Below I offer a patch that also works in response to Paul's feedback. > Please also add a description of the test case I am running ptp4l to serve PTP to a client device attached to the NIC. To test, I am rebuilding igb.ko and reloading it. Without this patch, I see repeatedly in the output of ptp4l: > timed out while polling for tx timestamp increasing tx_timestamp_timeout or > increasing kworker priority may correct this issue, but a driver bug likely > causes it as well as my client device failing to sync time. > and maybe the PCI vendor and device code of your network device. % lspci -nn | grep Network 17:00.0 Ethernet controller [0200]: Intel Corporation 82580 Gigabit Network Connection [8086:150e] (rev 01) 17:00.1 Ethernet controller [0200]: Intel Corporation 82580 Gigabit Network Connection [8086:150e] (rev 01) 17:00.2 Ethernet controller [0200]: Intel Corporation 82580 Gigabit Network Connection [8086:150e] (rev 01) 17:00.3 Ethernet controller [0200]: Intel Corporation 82580 Gigabit Network Connection [8086:150e] (rev 01) > Bug, or was it a feature? According to https://lore.kernel.org/all/cdcb8be0.1ec2c%25matthew.v...@intel.com/ it was a bug. It looks like the datasheet was not updated to acknowledge this bug: https://www.intel.com/content/www/us/en/content-details/333167/intel-82580-eb-82580-db-gbe-controller-datasheet.html (section 8.17.28.1). > Is there a nicer way to write this, so `ack` is only assigned in case > for the 82580? diff --git a/drivers/net/ethernet/intel/igb/igb_main.c b/drivers/net/ethernet/intel/igb/igb_main.c index ada42ba63549..87ec1258e22a 100644 --- a/drivers/net/ethernet/intel/igb/igb_main.c +++ b/drivers/net/ethernet/intel/igb/igb_main.c @@ -6986,6 +6986,10 @@ static void igb_tsync_interrupt(struct igb_adapter *adapter) struct e1000_hw *hw = &adapter->hw; u32 tsicr = rd32(E1000_TSICR); struct ptp_clock_event event; + const u32 mask = (TSINTR_SYS_WRAP | E1000_TSICR_TXTS | + TSINTR_TT0 | TSINTR_TT1 | + TSINTR_AUTT0 | TSINTR_AUTT1); + if (tsicr & TSINTR_SYS_WRAP) { event.type = PTP_CLOCK_PPS; @@ -7009,6 +7013,13 @@ static void igb_tsync_interrupt(struct igb_adapter *adapter) if (tsicr & TSINTR_AUTT1) igb_extts(adapter, 1); + + if (hw->mac.type == e1000_82580) { + /* 82580 has a hardware bug that requires a explicit +* write to clear the TimeSync interrupt cause. +*/ + wr32(E1000_TSICR, tsicr & mask); + } } On Fri, Aug 9, 2024 at 10:04 PM Richard Cochran wrote: > > On Fri, Aug 09, 2024 at 05:23:02PM -0700, Vinicius Costa Gomes wrote: > > It was reported that 82580 NICs have a hardware bug that makes it > > necessary to write into the TSICR (TimeSync Interrupt Cause) register > > to clear it. > > Bug, or was it a feature? > > Or IOW, maybe i210 changed the semantics of the TSICR? > > And what about the 82576? > > Thanks, > Richard
[Intel-wired-lan] [PATCH iwl-net v2] igb: Fix not clearing TimeSync interrupts for 82580
82580 NICs have a hardware bug that makes it necessary to write into the TSICR (TimeSync Interrupt Cause) register to clear it: https://lore.kernel.org/all/cdcb8be0.1ec2c%25matthew.v...@intel.com/ Add a conditional so only for 82580 we write into the TSICR register, so we don't risk losing events for other models. This (partially) reverts commit ee14cc9ea19b ("igb: Fix missing time sync events"). Fixes: ee14cc9ea19b ("igb: Fix missing time sync events") Closes: https://lore.kernel.org/intel-wired-lan/CAN0jFd1kO0MMtOh8N2Ztxn6f7vvDKp2h507sMryobkBKe=xk=w...@mail.gmail.com/ Tested-by: Daiwei Li Signed-off-by: Daiwei Li --- @Vinicius Gomes, this is my first time submitting a Linux kernel patch, so apologies if I missed any part of the procedure (e.g. this is currently on top of 6.7.12, the kernel I am running; should I be rebasing on inline?). Also, is there any way to annotate the patch to give you credit for the original change? drivers/net/ethernet/intel/igb/igb_main.c | 10 ++ 1 file changed, 10 insertions(+) diff --git a/drivers/net/ethernet/intel/igb/igb_main.c b/drivers/net/ethernet/intel/igb/igb_main.c index ada42ba63549..1210ddc5d81e 100644 --- a/drivers/net/ethernet/intel/igb/igb_main.c +++ b/drivers/net/ethernet/intel/igb/igb_main.c @@ -6986,6 +6986,16 @@ static void igb_tsync_interrupt(struct igb_adapter *adapter) struct e1000_hw *hw = &adapter->hw; u32 tsicr = rd32(E1000_TSICR); struct ptp_clock_event event; + const u32 mask = (TSINTR_SYS_WRAP | E1000_TSICR_TXTS | + TSINTR_TT0 | TSINTR_TT1 | + TSINTR_AUTT0 | TSINTR_AUTT1); + + if (hw->mac.type == e1000_82580) { + /* 82580 has a hardware bug that requires a explicit +* write to clear the TimeSync interrupt cause. +*/ + wr32(E1000_TSICR, tsicr & mask); + } if (tsicr & TSINTR_SYS_WRAP) { event.type = PTP_CLOCK_PPS; -- 2.46.0.76.ge559c4bf1a-goog
[Intel-wired-lan] [PATCH iwl-net v3] igb: Fix not clearing TimeSync interrupts for 82580
82580 NICs have a hardware bug that makes it necessary to write into the TSICR (TimeSync Interrupt Cause) register to clear it: https://lore.kernel.org/all/cdcb8be0.1ec2c%25matthew.v...@intel.com/ Add a conditional so only for 82580 we write into the TSICR register, so we don't risk losing events for other models. Without this change, when running ptp4l with an Intel 82580 card, I get the following output: > timed out while polling for tx timestamp increasing tx_timestamp_timeout or > increasing kworker priority may correct this issue, but a driver bug likely > causes it This goes away with this change. This (partially) reverts commit ee14cc9ea19b ("igb: Fix missing time sync events"). Fixes: ee14cc9ea19b ("igb: Fix missing time sync events") Closes: https://lore.kernel.org/intel-wired-lan/CAN0jFd1kO0MMtOh8N2Ztxn6f7vvDKp2h507sMryobkBKe=xk=w...@mail.gmail.com/ Tested-by: Daiwei Li Suggested-by: Vinicius Costa Gomes Signed-off-by: Daiwei Li --- drivers/net/ethernet/intel/igb/igb_main.c | 10 ++ 1 file changed, 10 insertions(+) diff --git a/drivers/net/ethernet/intel/igb/igb_main.c b/drivers/net/ethernet/intel/igb/igb_main.c index ada42ba63549..69d7e8b16437 100644 --- a/drivers/net/ethernet/intel/igb/igb_main.c +++ b/drivers/net/ethernet/intel/igb/igb_main.c @@ -6984,9 +6984,19 @@ static void igb_extts(struct igb_adapter *adapter, int tsintr_tt) static void igb_tsync_interrupt(struct igb_adapter *adapter) { struct e1000_hw *hw = &adapter->hw; + const u32 mask = (TSINTR_SYS_WRAP | E1000_TSICR_TXTS | + TSINTR_TT0 | TSINTR_TT1 | + TSINTR_AUTT0 | TSINTR_AUTT1); u32 tsicr = rd32(E1000_TSICR); struct ptp_clock_event event; + if (hw->mac.type == e1000_82580) { + /* 82580 has a hardware bug that requires an explicit +* write to clear the TimeSync interrupt cause. +*/ + wr32(E1000_TSICR, tsicr & mask); + } + if (tsicr & TSINTR_SYS_WRAP) { event.type = PTP_CLOCK_PPS; if (adapter->ptp_caps.pps) -- 2.46.0.76.ge559c4bf1a-goog
Re: [Intel-wired-lan] [PATCH iwl-net v2] igb: Fix not clearing TimeSync interrupts for 82580
Thank you for the review! I've sent out another patch that hopefully addresses the comments. On Tue, Aug 13, 2024 at 3:26 PM Vinicius Costa Gomes wrote: > > Daiwei Li writes: > > > 82580 NICs have a hardware bug that makes it > > necessary to write into the TSICR (TimeSync Interrupt Cause) register > > to clear it: > > https://lore.kernel.org/all/cdcb8be0.1ec2c%25matthew.v...@intel.com/ > > > > Add a conditional so only for 82580 we write into the TSICR register, > > so we don't risk losing events for other models. > > Please add some information in the commit message about how to reproduce > the issue, as Paul suggested. > > > > > This (partially) reverts commit ee14cc9ea19b ("igb: Fix missing time sync > > events"). > > > > Fixes: ee14cc9ea19b ("igb: Fix missing time sync events") > > Closes: > > https://lore.kernel.org/intel-wired-lan/CAN0jFd1kO0MMtOh8N2Ztxn6f7vvDKp2h507sMryobkBKe=xk=w...@mail.gmail.com/ > > Tested-by: Daiwei Li > > Signed-off-by: Daiwei Li > > --- > > > > @Vinicius Gomes, this is my first time submitting a Linux kernel patch, > > so apologies if I missed any part of the procedure (e.g. this is > > currently on top of 6.7.12, the kernel I am running; should I be > > rebasing on inline?). Also, is there any way to annotate the patch > > to give you credit for the original change? > > Your submission format looks fine. Just a couple details: > - No need for setting in-reply-to (or something like it); > > - For this particular patch, you got lucky and it applies cleanly > against current tip, but for future submissions, for intel-wired-lan > and patches intended for the stable tree, please rebase against: > > https://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue.git/ > > For credits, you can add something like: > > Suggested-by: Vinicius Costa Gomes > > > > > drivers/net/ethernet/intel/igb/igb_main.c | 10 ++ > > 1 file changed, 10 insertions(+) > > > > diff --git a/drivers/net/ethernet/intel/igb/igb_main.c > > b/drivers/net/ethernet/intel/igb/igb_main.c > > index ada42ba63549..1210ddc5d81e 100644 > > --- a/drivers/net/ethernet/intel/igb/igb_main.c > > +++ b/drivers/net/ethernet/intel/igb/igb_main.c > > @@ -6986,6 +6986,16 @@ static void igb_tsync_interrupt(struct igb_adapter > > *adapter) > > struct e1000_hw *hw = &adapter->hw; > > u32 tsicr = rd32(E1000_TSICR); > > struct ptp_clock_event event; > > + const u32 mask = (TSINTR_SYS_WRAP | E1000_TSICR_TXTS | > > + TSINTR_TT0 | TSINTR_TT1 | > > + TSINTR_AUTT0 | TSINTR_AUTT1); > > + > > Please move the declaration of 'mask' up, to follow the convention, the > "reverse christmas tree" rule. Or separate the attribution from the > declaration. > > > + if (hw->mac.type == e1000_82580) { > > + /* 82580 has a hardware bug that requires a explicit > > And as pointed by Paul, "*an* explicit". > > > + * write to clear the TimeSync interrupt cause. > > + */ > > + wr32(E1000_TSICR, tsicr & mask); > > + } > > > > if (tsicr & TSINTR_SYS_WRAP) { > > event.type = PTP_CLOCK_PPS; > > -- > > 2.46.0.76.ge559c4bf1a-goog > > > > -- > Vinicius -- Daiwei Li Software Engineer Mobile: 415-736-8670 waymo.com