Hi Stephen, Sorry, NACK.
1. Overflow issue As you declare cycles_per_byte as uint32_t, for a CPU frequency of 2-3 GHz, the line of code below results in overflow: port->cycles_per_byte = (rte_get_tsc_hz() << RTE_SCHED_TIME_SHIFT) / params->rate; Therefore, there is most likely a significant accuracy loss, which might result in more packets allowed to go out than it should. 2. Integer division has a higher cost than floating point division My understanding is we are considering a performance improvement by replacing the double precision floating point division in: double bytes_diff = ((double) cycles_diff) / port->cycles_per_byte; with an integer division: uint64_t bytes_diff = (cycles_diff << RTE_SCHED_TIME_SHIFT) / port->cycles_per_byte; I don't think this is going to have the claimed benefit, as acording to "Intel 64 and IA-32 Architectures Optimization Reference Manual" (Appendix C), the latency of the integer division instruction is significantly bigger than the latency of integer division: Instruction FDIV double precision: latency = 38-40 cycles Instruction IDIV: latency = 56 - 80 cycles 3. Alternative I hear though your suggestion about replacing the floating point division with a more performant construction. One suggestion would be to replace it with an integer multiplication followed by a shift right, probably by using a uint64_t bytes_per_cycle_scaled_up (the inverse of cycles_per_bytes). I need to prototype this code myself. Would you be OK to look into providing an alternative implementation? Thanks, Cristian -----Original Message----- From: dev [mailto:dev-boun...@dpdk.org] On Behalf Of Stephen Hemminger Sent: Thursday, February 5, 2015 6:14 AM To: dev at dpdk.org Cc: Stephen Hemminger Subject: [dpdk-dev] [PATCH v2 6/7] rte_sched: eliminate floating point in calculating byte clock From: Stephen Hemminger <shemm...@brocade.com> The old code was doing a floating point divide for each rte_dequeue() which is very expensive. Change to using fixed point scaled math instead. This improved performance from 5Gbit/sec to 10 Gbit/sec Signed-off-by: Stephen Hemminger <stephen at networkplumber.org> --- lib/librte_sched/rte_sched.c | 14 ++++++++++---- 1 file changed, 10 insertions(+), 4 deletions(-) diff --git a/lib/librte_sched/rte_sched.c b/lib/librte_sched/rte_sched.c index 55fbc14..3023457 100644 --- a/lib/librte_sched/rte_sched.c +++ b/lib/librte_sched/rte_sched.c @@ -102,6 +102,9 @@ #define RTE_SCHED_BMP_POS_INVALID UINT32_MAX +/* For cycles_per_byte calculation */ +#define RTE_SCHED_TIME_SHIFT 20 + struct rte_sched_subport { /* Token bucket (TB) */ uint64_t tb_time; /* time of last update */ @@ -239,7 +242,7 @@ struct rte_sched_port { uint64_t time_cpu_cycles; /* Current CPU time measured in CPU cyles */ uint64_t time_cpu_bytes; /* Current CPU time measured in bytes */ uint64_t time; /* Current NIC TX time measured in bytes */ - double cycles_per_byte; /* CPU cycles per byte */ + uint32_t cycles_per_byte; /* CPU cycles per byte (scaled) */ /* Scheduling loop detection */ uint32_t pipe_loop; @@ -657,7 +660,9 @@ rte_sched_port_config(struct rte_sched_port_params *params) port->time_cpu_cycles = rte_get_tsc_cycles(); port->time_cpu_bytes = 0; port->time = 0; - port->cycles_per_byte = ((double) rte_get_tsc_hz()) / ((double) params->rate); + + port->cycles_per_byte = (rte_get_tsc_hz() << RTE_SCHED_TIME_SHIFT) + / params->rate; /* Scheduling loop detection */ port->pipe_loop = RTE_SCHED_PIPE_INVALID; @@ -2156,11 +2161,12 @@ rte_sched_port_time_resync(struct rte_sched_port *port) { uint64_t cycles = rte_get_tsc_cycles(); uint64_t cycles_diff = cycles - port->time_cpu_cycles; - double bytes_diff = ((double) cycles_diff) / port->cycles_per_byte; + uint64_t bytes_diff = (cycles_diff << RTE_SCHED_TIME_SHIFT) + / port->cycles_per_byte; /* Advance port time */ port->time_cpu_cycles = cycles; - port->time_cpu_bytes += (uint64_t) bytes_diff; + port->time_cpu_bytes += bytes_diff; if (port->time < port->time_cpu_bytes) { port->time = port->time_cpu_bytes; } -- 2.1.4 -------------------------------------------------------------- Intel Shannon Limited Registered in Ireland Registered Office: Collinstown Industrial Park, Leixlip, County Kildare Registered Number: 308263 Business address: Dromore House, East Park, Shannon, Co. Clare This e-mail and any attachments may contain confidential material for the sole use of the intended recipient(s). Any review or distribution by others is strictly prohibited. If you are not the intended recipient, please contact the sender and delete all copies.