Hi Stephen,

On 07/06/2018 02:38 PM, Stephen Hemminger wrote:
> On Tue, 3 Jul 2018 15:42:46 -0700
> Jesus Sanchez-Palencia <jesus.sanchez-palen...@intel.com> wrote:
>
>> Changes since v1:
>>  - moved struct sock_txtime from socket.h to uapi net_tstamp.h;
>>  - sk_clockid was changed from u16 to u8;
>>  - sk_txtime_flags was changed from u16 to a u8 bit field in struct sock;
>>  - the socket option flags are now validated in sock_setsockopt();
>>  - added SO_EE_ORIGIN_TXTIME;
>>  - sockc.transmit_time is now initialized from all IPv4 Tx paths;
>>  - added support for the IPv6 Tx path;
>>
>>
>> Overview
>> ========
>>
>> This work consists of a set of kernel interfaces that can be used by
>> applications that require (time-based) Scheduled Tx of packets.
>> It is comprised by 3 new components to the kernel:
>>
>>  - SO_TXTIME: socket option + cmsg programming interfaces.
>>
>>  - etf: the "earliest txtime first" qdisc, that provides per-queue
>>    TxTime-based scheduling. This has been renamed from 'tbs' to
>>    'etf' to better describe its functionality.
>>
>>  - taprio: the "time-aware priority scheduler" qdisc, that provides
>>    per-port Time-Aware scheduling;
>>
>> This patchset is providing the first 2 components, which have been
>> developed for longer. The taprio qdisc will be shared as an RFC separately
>> (shortly).
>>
>> Note that this series is a follow up of the "Time based packet
>> transmission" RFCv3 [1].
>>
>>
>>
>> etf (formerly known as 'tbs')
>> =============================
>>
>> For applications/systems that the concept of time slices isn't precise
>> enough, the etf qdisc allows applications to control the instant when
>> a packet should leave the network controller. When used in conjunction
>> with taprio, it can also be used in case the application needs to
>> control with greater guarantee the offset into each time slice a packet
>> will be sent. Another use case of etf, is when only a small number of
>> applications on a system are time sensitive, so it can then be used
>> with a more traditional root qdisc (like mqprio).
>>
>> The etf qdisc is designed so it buffers packets until a configurable
>> time before their deadline (Tx time). The qdisc uses a rbtree internally
>> so the buffered packets are always 'ordered' by their txtime (deadline)
>> and will be dequeued following the earliest txtime first.
>>
>> It relies on the SO_TXTIME API set for receiving the per-packet timestamp
>> (txtime) as well as the config flags for each socket: the clockid to be
>> used as a reference, if the expected mode of txtime for that socket is
>> deadline or strict mode, and if packet drops should be reported on the
>> socket's error queue or not.
>>
>> The qdisc will drop any packets with a Tx time in the past, or if a
>> packet expires while waiting for being dequeued. Drops can be reported
>> as errors back to userspace through the socket's error queue.
>>
>> Example configuration:
>>
>> $ tc qdisc add dev enp2s0 parent 100:1 etf offload delta 200000 \
>>            clockid CLOCK_TAI
>>
>> Here, the Qdisc will use HW offload for the txtime control.
>> Packets will be dequeued by the qdisc "delta" (200000) nanoseconds before
>> their transmission time. Because this will be using HW offload and
>> since dynamic clocks are not supported by hrtimers, the system clock
>> and the PHC clock must be synchronized for this mode to behave as expected.
>>
>> A more complete example can be found here, with instructions of how to
>> test it:
>>
>> https://gist.github.com/jeez/bd3afeff081ba64a695008dd8215866f [2]
>>
>>
>> Note that we haven't modified the qdisc so it uses a timerqueue because
>> the modification needed was increasing the number of cachelines of a sk_buff.
>>
>>
>>
>> This series is also hosted on github and can be found at [3].
>> The companion iproute2 patches can be found at [4].
>>
>>
>> [1] https://patchwork.ozlabs.org/cover/882342/
>>
>> [2] github doesn't make it clear, but the gist can be cloned like this:
>> $ git clone https://gist.github.com/jeez/bd3afeff081ba64a695008dd8215866f
>> scheduled-tx-tests
>>
>> [3] https://github.com/jeez/linux/tree/etf-v2
>>
>> [4] https://github.com/jeez/iproute2/tree/etf-v2
>>
>>
>>
>> Jesus Sanchez-Palencia (10):
>>   net: Clear skb->tstamp only on the forwarding path
>>   net: ipv4: Hook into time based transmission
>>   net: ipv6: Hook into time based transmission
>>   net/sched: Add HW offloading capability to ETF
>>   igb: Refactor igb_configure_cbs()
>>   igb: Only change Tx arbitration when CBS is on
>>   igb: Refactor igb_offload_cbs()
>>   igb: Only call skb_tx_timestamp after descriptors are ready
>>   igb: Add support for ETF offload
>>   net/sched: Make etf report drops on error_queue
>>
>> Richard Cochran (2):
>>   net: Add a new socket option for a future transmit time.
>>   net: packet: Hook into time based transmission.
>>
>> Vinicius Costa Gomes (2):
>>   net/sched: Allow creating a Qdisc watchdog with other clocks
>>   net/sched: Introduce the ETF Qdisc
>>
>>  arch/alpha/include/uapi/asm/socket.h       |   3 +
>>  arch/ia64/include/uapi/asm/socket.h        |   3 +
>>  arch/mips/include/uapi/asm/socket.h        |   3 +
>>  arch/parisc/include/uapi/asm/socket.h      |   3 +
>>  arch/s390/include/uapi/asm/socket.h        |   3 +
>>  arch/sparc/include/uapi/asm/socket.h       |   3 +
>>  arch/xtensa/include/uapi/asm/socket.h      |   3 +
>>  .../net/ethernet/intel/igb/e1000_defines.h |  16 +
>>  drivers/net/ethernet/intel/igb/igb.h       |   1 +
>>  drivers/net/ethernet/intel/igb/igb_main.c  | 256 ++++++---
>>  include/linux/netdevice.h                  |   1 +
>>  include/net/inet_sock.h                    |   1 +
>>  include/net/pkt_sched.h                    |   7 +
>>  include/net/sock.h                         |  11 +
>>  include/uapi/asm-generic/socket.h          |   3 +
>>  include/uapi/linux/errqueue.h              |   4 +
>>  include/uapi/linux/net_tstamp.h            |  18 +
>>  include/uapi/linux/pkt_sched.h             |  18 +
>>  net/core/skbuff.c                          |   2 +-
>>  net/core/sock.c                            |  39 ++
>>  net/ipv4/icmp.c                            |   2 +
>>  net/ipv4/ip_output.c                       |   3 +
>>  net/ipv4/ping.c                            |   1 +
>>  net/ipv4/raw.c                             |   2 +
>>  net/ipv4/udp.c                             |   1 +
>>  net/ipv6/ip6_output.c                      |  11 +-
>>  net/ipv6/raw.c                             |   7 +-
>>  net/ipv6/udp.c                             |   1 +
>>  net/packet/af_packet.c                     |   6 +
>>  net/sched/Kconfig                          |  11 +
>>  net/sched/Makefile                         |   1 +
>>  net/sched/sch_api.c                        |  11 +-
>>  net/sched/sch_etf.c                        | 484 ++++++++++++++++++
>>  33 files changed, 864 insertions(+), 75 deletions(-)
>>  create mode 100644 net/sched/sch_etf.c
>>
>
> Why support different clockid's in the API?
> I think the clock used in API should be either nanoseconds or USER_HZ (ie 100)
> and the kernel components should use ktime. If you need to translate that to
> some other value in the hardware driver, then let the device driver do it.
>
> Exposing multiple choices in userspace API, leads to more error paths and does
> not provide direct benefits.

The kernel components already use ktime_t. The clockid_t here is to define the
time source (i.e. which clock must be used to read the ktime from) and not the
unit of time.

I hope that clarifies.

Regards,
Jesus
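
For illustration only, here is a minimal userspace sketch of the source-versus-unit distinction described above. It is not part of this series. It assumes a Linux host and uses Python's time module; read_ns() and txtime_ns() are hypothetical helper names, and the numeric fallback 11 for CLOCK_TAI is the Linux clock ID, used only when the Python constant is not available. Every clock is read in nanoseconds; what changes with the clockid is which clock the value comes from.

```python
import time

# CLOCK_TAI is exposed by Python 3.9+ on Linux; fall back to the Linux
# numeric clock ID (11) on older interpreters. This fallback is an assumption
# that only holds on Linux.
CLOCK_TAI = getattr(time, "CLOCK_TAI", 11)


def read_ns(clockid: int) -> int:
    """Read the given clock. The unit is nanoseconds for every clockid."""
    return time.clock_gettime_ns(clockid)


def txtime_ns(clockid: int, offset_ns: int) -> int:
    """Hypothetical helper: a transmit time offset_ns from now on clockid."""
    return read_ns(clockid) + offset_ns


if __name__ == "__main__":
    clocks = (
        ("CLOCK_REALTIME", time.CLOCK_REALTIME),
        ("CLOCK_MONOTONIC", time.CLOCK_MONOTONIC),
        ("CLOCK_TAI", CLOCK_TAI),
    )
    for name, clockid in clocks:
        # Same unit (ns) in every row; the values differ because the
        # time sources differ.
        print(f"{name:16s} now={read_ns(clockid)} ns "
              f"txtime={txtime_ns(clockid, 1_000_000)} ns")
```

A txtime computed this way is only meaningful if it is read from the same clock that the socket and the qdisc were configured with (CLOCK_TAI in the tc example quoted above).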