From: Mateusz Polchlopek <mateusz.polchlo...@intel.com> Date: Tue, 30 Jul 2024 05:15:09 -0400
> From: Jacob Keller <jacob.e.kel...@intel.com> > > Add support for receive timestamps to the Rx hotpath. This support only > works when using the flexible descriptor format, so make sure that we > request this format by default if we have receive timestamp support > available in the PTP capabilities. > > In order to report the timestamps to userspace, we need to perform > timestamp extension. The Rx descriptor does actually contain the "40 > bit" timestamp. However, upper 32 bits which contain nanoseconds are > conveniently stored separately in the descriptor. We could extract the > 32bits and lower 8 bits, then perform a bitwise OR to calculate the > 40bit value. This makes no sense, because the timestamp extension > algorithm would simply discard the lower 8 bits anyways. > > Thus, implement timestamp extension as iavf_ptp_extend_32b_timestamp(), > and extract and forward only the 32bits of nominal nanoseconds. > > Signed-off-by: Jacob Keller <jacob.e.kel...@intel.com> > Reviewed-by: Wojciech Drewek <wojciech.dre...@intel.com> > Reviewed-by: Rahul Rameshbabu <rrameshb...@nvidia.com> > Reviewed-by: Sunil Goutham <sgout...@marvell.com> > Reviewed-by: Simon Horman <ho...@kernel.org> > Signed-off-by: Mateusz Polchlopek <mateusz.polchlo...@intel.com> > --- > drivers/net/ethernet/intel/iavf/iavf_main.c | 9 +++ > drivers/net/ethernet/intel/iavf/iavf_ptp.c | 69 +++++++++++++++++++++ > drivers/net/ethernet/intel/iavf/iavf_ptp.h | 4 ++ > drivers/net/ethernet/intel/iavf/iavf_txrx.c | 47 ++++++++++++++ > drivers/net/ethernet/intel/iavf/iavf_type.h | 3 + > 5 files changed, 132 insertions(+) > > diff --git a/drivers/net/ethernet/intel/iavf/iavf_main.c > b/drivers/net/ethernet/intel/iavf/iavf_main.c > index 61720b27c8f1..03deb3e02279 100644 > --- a/drivers/net/ethernet/intel/iavf/iavf_main.c > +++ b/drivers/net/ethernet/intel/iavf/iavf_main.c > @@ -729,6 +729,15 @@ static u8 iavf_select_rx_desc_format(struct iavf_adapter > *adapter) > if (!RXDID_ALLOWED(adapter)) > return VIRTCHNL_RXDID_1_32B_BASE; > > + /* Rx timestamping requires the use of flexible NIC descriptors */ > + if (iavf_ptp_cap_supported(adapter, VIRTCHNL_1588_PTP_CAP_RX_TSTAMP)) { > + if (supported_rxdids & BIT(VIRTCHNL_RXDID_2_FLEX_SQ_NIC)) > + return VIRTCHNL_RXDID_2_FLEX_SQ_NIC; > + > + dev_dbg(&adapter->pdev->dev, > + "Unable to negotiate flexible descriptor format.\n"); 1. Remember about pci_*(). 2. We usually don't put '.' to the end of kernel messages. > + } > + > /* Warn if the PF does not list support for the default legacy > * descriptor format. This shouldn't happen, as this is the format > * used if VIRTCHNL_VF_OFFLOAD_RX_FLEX_DESC is not supported. It is > diff --git a/drivers/net/ethernet/intel/iavf/iavf_ptp.c > b/drivers/net/ethernet/intel/iavf/iavf_ptp.c > index 7754f4f24052..5fd17f8d1f36 100644 > --- a/drivers/net/ethernet/intel/iavf/iavf_ptp.c > +++ b/drivers/net/ethernet/intel/iavf/iavf_ptp.c > @@ -440,6 +440,9 @@ void iavf_ptp_release(struct iavf_adapter *adapter) > } > adapter->aq_required &= ~IAVF_FLAG_AQ_SEND_PTP_CMD; > mutex_unlock(&adapter->ptp.aq_cmd_lock); > + > + adapter->ptp.hwtstamp_config.rx_filter = HWTSTAMP_FILTER_NONE; > + iavf_ptp_disable_rx_tstamp(adapter); > } > > /** > @@ -473,3 +476,69 @@ void iavf_ptp_process_caps(struct iavf_adapter *adapter) > iavf_ptp_disable_rx_tstamp(adapter); > } > } > + > +/** > + * iavf_ptp_extend_32b_timestamp - Convert a 32b nanoseconds timestamp to 64b > + * nanoseconds > + * @cached_phc_time: recently cached copy of PHC time > + * @in_tstamp: Ingress/egress 32b nanoseconds timestamp value > + * > + * Hardware captures timestamps which contain only 32 bits of nominal > + * nanoseconds, as opposed to the 64bit timestamps that the stack expects. > + * > + * Extend the 32bit nanosecond timestamp using the following algorithm and > + * assumptions: > + * > + * 1) have a recently cached copy of the PHC time > + * 2) assume that the in_tstamp was captured 2^31 nanoseconds (~2.1 > + * seconds) before or after the PHC time was captured. > + * 3) calculate the delta between the cached time and the timestamp > + * 4) if the delta is smaller than 2^31 nanoseconds, then the timestamp was > + * captured after the PHC time. In this case, the full timestamp is just > + * the cached PHC time plus the delta. > + * 5) otherwise, if the delta is larger than 2^31 nanoseconds, then the > + * timestamp was captured *before* the PHC time, i.e. because the PHC > + * cache was updated after the timestamp was captured by hardware. In this > + * case, the full timestamp is the cached time minus the inverse delta. > + * > + * This algorithm works even if the PHC time was updated after a Tx timestamp > + * was requested, but before the Tx timestamp event was reported from > + * hardware. > + * > + * This calculation primarily relies on keeping the cached PHC time up to > + * date. If the timestamp was captured more than 2^31 nanoseconds after the > + * PHC time, it is possible that the lower 32bits of PHC time have > + * overflowed more than once, and we might generate an incorrect timestamp. > + * > + * This is prevented by (a) periodically updating the cached PHC time once > + * a second, and (b) discarding any Tx timestamp packet if it has waited for > + * a timestamp for more than one second. > + * > + * Return: extended timestamp (to 64b) ...but here you actually need a period :D > + */ > +u64 iavf_ptp_extend_32b_timestamp(u64 cached_phc_time, u32 in_tstamp) > +{ > + const u64 mask = GENMASK_ULL(31, 0); > + u32 delta; > + u64 ns; > + > + /* Calculate the delta between the lower 32bits of the cached PHC > + * time and the in_tstamp value > + */ > + delta = (in_tstamp - (u32)(cached_phc_time & mask)); `cached_phc_time & GENMASK_ULL(31, 0)` == lower_32_bits(cached_phc_time) > + > + /* Do not assume that the in_tstamp is always more recent than the > + * cached PHC time. If the delta is large, it indicates that the > + * in_tstamp was taken in the past, and should be converted > + * forward. > + */ > + if (delta > (mask / 2)) { 1. `mask / 2` == S32_MAX. 2. Parenthesis are redundant here. > + /* reverse the delta calculation here */ > + delta = ((u32)(cached_phc_time & mask) - in_tstamp); Here as well. > + ns = cached_phc_time - delta; > + } else { > + ns = cached_phc_time + delta; > + } 1. (u32) casts are not needed since your mask ends with bit 31. 2. (cached_phc_time & mask) is used two times. Perhaps you needed u32 low = lower_32_bits(cached_phc_time); delta = in_tstamp - low; if (delta > S32_MAX) ns = cached_phc_time - (low - in_tstamp); else ns = cached_phc_time + delta; > + > + return ns; > +} [...] > diff --git a/drivers/net/ethernet/intel/iavf/iavf_txrx.c > b/drivers/net/ethernet/intel/iavf/iavf_txrx.c > index 997fd0d520a9..8d74549c3535 100644 > --- a/drivers/net/ethernet/intel/iavf/iavf_txrx.c > +++ b/drivers/net/ethernet/intel/iavf/iavf_txrx.c > @@ -1085,6 +1085,52 @@ static void iavf_flex_rx_hash(const struct iavf_ring > *ring, > } > } > > +/** > + * iavf_flex_rx_tstamp - Capture Rx timestamp from the descriptor > + * @rx_ring: descriptor ring > + * @rx_desc: specific descriptor > + * @skb: skb currently being received > + * > + * Read the Rx timestamp value from the descriptor and pass it to the stack. > + * > + * This function only operates on the VIRTCHNL_RXDID_2_FLEX_SQ_NIC flexible > + * descriptor writeback format. > + */ > +static void iavf_flex_rx_tstamp(const struct iavf_ring *rx_ring, > + const struct iavf_rx_desc *rx_desc, > + struct sk_buff *skb) > +{ > + struct skb_shared_hwtstamps *skb_tstamps; > + struct iavf_adapter *adapter; > + __le64 qw2 = rx_desc->qw2; > + __le64 qw3 = rx_desc->qw3; > + bool tstamp_valid; > + u32 tstamp; > + u64 ns; > + > + /* Skip processing if timestamps aren't enabled */ > + if (!(rx_ring->flags & IAVF_TXRX_FLAGS_HW_TSTAMP)) > + return; > + > + /* Check if this Rx descriptor has a valid timestamp */ > + tstamp_valid = le64_get_bits(qw2, IAVF_PTP_40B_TSTAMP_VALID); > + if (!tstamp_valid) > + return; Read/convert descriptor qwords when you actually need them. if (!(rx_ring->flags & HW_TSTAMP)) return; if (!le64_get_bits(rx_desc->qw2, TSTAMP_VALID)) return; tstamp = le64_get_bits(rx_desc->qw3, ... > + > + adapter = netdev_priv(rx_ring->netdev); > + > + /* the ts_low field only contains the valid bit and sub-nanosecond > + * precision, so we don't need to extract it. > + */ > + tstamp = le64_get_bits(qw3, IAVF_RXD_FLEX_QW3_TSTAMP_HIGH_M); > + ns = iavf_ptp_extend_32b_timestamp(adapter->ptp.cached_phc_time, > + tstamp); > + > + skb_tstamps = skb_hwtstamps(skb); > + memset(skb_tstamps, 0, sizeof(*skb_tstamps)); > + skb_tstamps->hwtstamp = ns_to_ktime(ns); Ah, so many times I've seen this pattern :D &skb_shared_hwtstamps is 2 fields. One field you always initialize above, the other one needs to be cleared. IOW, just do *skb_hwtstamps(skb) = (struct skb_shared_hwtstamps){ .hwtstamp = ns_to_ktime(ns), }; That's it! Thanks, Olek