Quoting Jakub Kicinski (2020-06-29 15:11:17)
> On Mon, 29 Jun 2020 13:51:32 -0700 Andre Guedes wrote:
> > > > @@ -435,6 +432,9 @@ static void igc_ptp_tx_hwtstamp(struct igc_adapter *adapter)
> > > >       struct igc_hw *hw = &adapter->hw;
> > > >       u64 regval;
> > > >
> > > > +     if (WARN_ON_ONCE(!skb))
> > > > +             return;
> > > > +
> > > >       regval = rd32(IGC_TXSTMPL);
> > > >       regval |= (u64)rd32(IGC_TXSTMPH) << 32;
> > > >       igc_ptp_systim_to_hwtstamp(adapter, &shhwtstamps, regval);
> > > > @@ -466,7 +466,7 @@ static void igc_ptp_tx_work(struct work_struct *work)
> > > >       struct igc_hw *hw = &adapter->hw;
> > > >       u32 tsynctxctl;
> > > >
> > > > -     if (!adapter->ptp_tx_skb)
> > > > +     if (!test_bit(__IGC_PTP_TX_IN_PROGRESS, &adapter->state))
> > > >               return;
> > >
> > > Not that reading ptp_tx_skb is particularly correct here, but I think
> > > it's better. See how they get set:
> > >
> > >         if (adapter->tstamp_config.tx_type == HWTSTAMP_TX_ON &&
> > >             !test_and_set_bit_lock(__IGC_PTP_TX_IN_PROGRESS,
> > >                                    &adapter->state)) {
> > >                 skb_shinfo(skb)->tx_flags |= SKBTX_IN_PROGRESS;
> > >                 tx_flags |= IGC_TX_FLAGS_TSTAMP;
> > >
> > >                 adapter->ptp_tx_skb = skb_get(skb);
> > >                 adapter->ptp_tx_start = jiffies;
> > >
> > > bit is set first and other fields after. Since there is no locking here
> > > we may just see the bit but none of the fields set.
> >
> > I see your point, but note that the code within the if-block and the code in
> > igc_ptp_tx_work() don't execute concurrently. adapter->ptp_tx_work is scheduled
> > only on a time-sync interrupt, which is triggered if IGC_TX_FLAGS_TSTAMP is
> > set (so adapter->ptp_tx_skb is valid).
>
> What if timeout happens, igc_ptp_tx_hang() starts cleaning up and then
> irq gets delivered half way through? Perhaps we should just add a spin
> lock around the ptp_tx_s* fields?

Yep, I think this other scenario is indeed possible, and we should probably
protect ptp_tx_s* with a lock. Thanks for pointing that out. In fact, it seems
this issue can happen even with the current net-next code.

Since the issue is not introduced by this patch, would it be OK if we move
forward with it and fix the issue in a separate patch?

- Andre
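
For reference, here is a rough userspace sketch of the locking idea discussed
above. It is not driver code and has not been tested against igc: the struct
and the helper names (ptp_tx_state, ptp_tx_init, ptp_tx_claim, ptp_tx_take)
are hypothetical, and a pthread mutex stands in for the kernel spin lock. The
point is only that the in-progress flag and the ptp_tx_s* fields change
together under one lock, so a completion path and a timeout path cannot both
end up owning the same skb.

```c
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace stand-in for the adapter's PTP Tx state. */
struct ptp_tx_state {
	pthread_mutex_t lock;	/* stands in for a kernel spin lock */
	void *skb;		/* stands in for adapter->ptp_tx_skb */
	unsigned long start;	/* stands in for adapter->ptp_tx_start */
	bool in_progress;	/* stands in for __IGC_PTP_TX_IN_PROGRESS */
};

void ptp_tx_init(struct ptp_tx_state *s)
{
	pthread_mutex_init(&s->lock, NULL);
	s->skb = NULL;
	s->start = 0;
	s->in_progress = false;
}

/*
 * Transmit path: claim the single timestamp slot. The flag and the
 * fields are written under the same lock, so no reader can observe
 * the flag set while the fields are still unset.
 */
bool ptp_tx_claim(struct ptp_tx_state *s, void *skb, unsigned long now)
{
	bool claimed = false;

	pthread_mutex_lock(&s->lock);
	if (!s->in_progress) {
		s->in_progress = true;
		s->skb = skb;
		s->start = now;
		claimed = true;
	}
	pthread_mutex_unlock(&s->lock);

	return claimed;
}

/*
 * Completion or timeout path: take ownership of the skb and release
 * the slot. Whichever path runs second gets NULL and must do nothing.
 */
void *ptp_tx_take(struct ptp_tx_state *s)
{
	void *skb;

	pthread_mutex_lock(&s->lock);
	skb = s->skb;
	s->skb = NULL;
	s->in_progress = false;
	pthread_mutex_unlock(&s->lock);

	return skb;
}
```

In the driver itself the lock would be a kernel spin lock, with the variant
chosen to match the contexts the Tx path, the work item and the timeout check
run in.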