[dpdk-dev] [PATCH v6 0/3] app/test: unit test to measure cycles per packet
Hi Thomas,

Gentle reminder, in case you have too many mails to process.

-Liang Cunming

> -----Original Message-----
> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Liang, Cunming
> Sent: Wednesday, October 29, 2014 1:06 PM
> To: Thomas Monjalon
> Cc: dev at dpdk.org
> Subject: Re: [dpdk-dev] [PATCH v6 0/3] app/test: unit test to measure cycles per packet
>
> Hi Thomas,
>
> All the open issues from the former patches are closed.
> Could you please have a look and get it applied?
>
> -Liang Cunming
>
> > -----Original Message-----
> > From: Liang, Cunming
> > Sent: Monday, October 27, 2014 9:20 AM
> > To: dev at dpdk.org
> > Cc: nhorman at tuxdriver.com; Ananyev, Konstantin; Richardson, Bruce;
> > De Lara Guarch, Pablo; Liang, Cunming
> > Subject: [PATCH v6 0/3] app/test: unit test to measure cycles per packet
> >
> > v6 update:
> > # leave FUNC_PTR_OR_*_RET unmodified
> >
> > v5 update:
> > # fix the confusing retval in some APIs of rte_ethdev
> >
> > v4 ignore
> >
> > v3 update:
> > # Code refined according to the feedback.
> >   1. add ether_format_addr to rte_ether.h
> >   2. fix typo in code comments.
> >   3. %lu to %PRIu64, fixing 32-bit targets compilation err
> > # merge 2 small incremental patches to the first one.
> >   The whole unit test as a single patch in [PATCH v3 2/2]
> > # rebase code to the latest master
> >
> > v2 update:
> > Rebase code to the latest master branch.
> >
> > It provides a unit test to measure cycles/packet in NIC loopback mode.
> > It simply gives the average cycles of IO used per packet without test
> > equipment.
> > When doing the test, make sure the link is UP.
> >
> > Two stream control modes are supported, one is continuous, the other is burst.
> > The former continues to forward the injected packets until reaching a
> > certain number.
> > The latter stops when all the injected packets are received.
> > In burst stream, two situations are measured now, with or without desc. cache
> > conflict.
> > By default, it runs in continuous stream mode to measure the whole rxtx.
> >
> > Usage Example:
> > 1. Run unit test app in interactive mode
> >     app/test -c f -n 4 -- -i
> > 2. Set stream control mode, by default is continuous
> >     set_rxtx_sc [continuous|poll_before_xmit|poll_after_xmit]
> > 3. If continuous stream is chosen, there are two more options to configure
> >   3.1 choose rx/tx pair, default is vector
> >     set_rxtx_mode [vector|scalar|full|hybrid]
> >     Note: To get an accurate 'scalar fast' result, please choose 'vector' or
> >     'hybrid' without INC_VEC=y in config
> >   3.2 choose the area of measurement, default is rxtx
> >     set_rxtx_anchor [rxtx|rxonly|txonly]
> > 4. Run and wait for the result
> >     pmd_perf_autotest
> >
> > For those who simply want to see how many cycles each packet costs:
> > compile DPDK, run 'app/test', and type 'pmd_perf_autotest', that's it.
> > Nothing else needs to be configured.
> > Use the other options when you understand them and want to measure more.
> >
> >
> > BTW, [1/3] is the same patch as the one below.
> > http://dpdk.org/dev/patchwork/patch/817
> >
> > *** BLURB HERE ***
> >
> > Cunming Liang (3):
> >   app/test: allow to create packets in different sizes
> >   app/test: measure the cost of rx/tx routines by cycle number
> >   ethdev: fix wrong error return refer to API definition
> >
> >  app/test/Makefile                   |   1 +
> >  app/test/commands.c                 | 111 +
> >  app/test/packet_burst_generator.c   |  26 +-
> >  app/test/packet_burst_generator.h   |  11 +-
> >  app/test/test.h                     |   6 +
> >  app/test/test_link_bonding.c        |  39 +-
> >  app/test/test_pmd_perf.c            | 922 +++
> >  lib/librte_ether/rte_ethdev.c       |   6 +-
> >  lib/librte_ether/rte_ether.h        |  25 +
> >  lib/librte_pmd_ixgbe/ixgbe_ethdev.c |   6 +
> >  10 files changed, 1117 insertions(+), 36 deletions(-)
> >  create mode 100644 app/test/test_pmd_perf.c
> >
> > --
> > 1.7.4.1
[dpdk-dev] [PATCH v6 2/3] app/test: measure the cost of rx/tx routines by cycle number
Hi Thomas,

I've split the patch in v7 and also did the cleanup using the new API.
Thanks.

-Liang Cunming

> -----Original Message-----
> From: Thomas Monjalon [mailto:thomas.monjalon at 6wind.com]
> Sent: Wednesday, November 12, 2014 7:29 AM
> To: Liang, Cunming
> Cc: dev at dpdk.org
> Subject: Re: [dpdk-dev] [PATCH v6 2/3] app/test: measure the cost of rx/tx routines by cycle number
>
> Hi Cunming,
>
> 2014-10-27 09:20, Cunming Liang:
> > --- a/lib/librte_ether/rte_ether.h
> > +++ b/lib/librte_ether/rte_ether.h
> > @@ -45,6 +45,7 @@ extern "C" {
> >  #endif
> >
> >  #include
> > +#include
> >
> >  #include
> >  #include
> > @@ -266,6 +267,30 @@ static inline void ether_addr_copy(const struct ether_addr *ea_from,
> >  #endif
> >  }
> >
> > +#define ETHER_ADDR_FMT_SIZE 18
> > +/**
> > + * Format 48bits Ethernet address in pattern xx:xx:xx:xx:xx:xx.
> > + *
> > + * @param buf
> > + *   A pointer to buffer contains the formatted MAC address.
> > + * @param size
> > + *   The format buffer size.
> > + * @param ea_to
> > + *   A pointer to a ether_addr structure.
> > + */
> > +static inline void
> > +ether_format_addr(char *buf, uint16_t size,
> > +		  const struct ether_addr *eth_addr)
> > +{
> > +	snprintf(buf, size, "%02X:%02X:%02X:%02X:%02X:%02X",
> > +		 eth_addr->addr_bytes[0],
> > +		 eth_addr->addr_bytes[1],
> > +		 eth_addr->addr_bytes[2],
> > +		 eth_addr->addr_bytes[3],
> > +		 eth_addr->addr_bytes[4],
> > +		 eth_addr->addr_bytes[5]);
> > +}
>
> Please, could you do a separate patch for this new API?
> Could it be used in some apps or PMDs? It would be a nice cleanup.
>
> > --- a/lib/librte_pmd_ixgbe/ixgbe_ethdev.c
> > +++ b/lib/librte_pmd_ixgbe/ixgbe_ethdev.c
> > @@ -1600,6 +1600,9 @@ ixgbe_dev_stop(struct rte_eth_dev *dev)
> >
> >  	ixgbe_dev_clear_queues(dev);
> >
> > +	/* Clear stored conf */
> > +	dev->data->scattered_rx = 0;
> > +
> >  	/* Clear recorded link status */
> >  	memset(&link, 0, sizeof(link));
> >  	rte_ixgbe_dev_atomic_write_link_status(dev, &link);
> > @@ -2888,6 +2891,9 @@ ixgbevf_dev_stop(struct rte_eth_dev *dev)
> >  	 */
> >  	ixgbevf_set_vfta_all(dev,0);
> >
> > +	/* Clear stored conf */
> > +	dev->data->scattered_rx = 0;
> > +
> >  	ixgbe_dev_clear_queues(dev);
> >  }
>
> Please, this patch needs a separate patch with a clear explanation in the log.
>
> Thanks
> --
> Thomas
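
As an illustration of how the proposed ether_format_addr() helper would be called, here is a standalone sketch. The function body follows the quoted patch; the struct ether_addr and ETHER_ADDR_LEN definitions are local stand-ins (an assumption, so the snippet builds without the DPDK headers), and the header list is my own guess since the include names are missing from the quoted diff.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h> /* strcmp, handy when checking the formatted result */

#define ETHER_ADDR_LEN      6
#define ETHER_ADDR_FMT_SIZE 18 /* 17 characters plus the terminating NUL */

/* Local stand-in for the struct in rte_ether.h (assumption for this sketch). */
struct ether_addr {
	uint8_t addr_bytes[ETHER_ADDR_LEN];
};

/* Same formatting as in the quoted patch: upper-case hex, colon separated. */
static inline void
ether_format_addr(char *buf, uint16_t size, const struct ether_addr *eth_addr)
{
	snprintf(buf, size, "%02X:%02X:%02X:%02X:%02X:%02X",
		 eth_addr->addr_bytes[0],
		 eth_addr->addr_bytes[1],
		 eth_addr->addr_bytes[2],
		 eth_addr->addr_bytes[3],
		 eth_addr->addr_bytes[4],
		 eth_addr->addr_bytes[5]);
}
```

A caller would declare `char buf[ETHER_ADDR_FMT_SIZE];` and pass `sizeof(buf)` as the size. Since snprintf() truncates, a smaller buffer yields a shortened but still NUL-terminated string.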
[dpdk-dev] [PATCH v7 2/7] ixgbe:clean scattered_rx configure in dev_stop
The scattered_rx flag is updated in dev_start.
In this unit test, we re-configure and change the scatter mode.
When we stop, re-configure and then re-start, the new configuration is expected to be used.
But during re-configure, the stored data may still be old.
The patch clears the stored configuration in dev_stop anyway.

For em, igb and i40e, we haven't provided so many rx/tx pairs for switching.

> -----Original Message-----
> From: Thomas Monjalon [mailto:thomas.monjalon at 6wind.com]
> Sent: Wednesday, November 12, 2014 3:53 PM
> To: Liang, Cunming
> Cc: dev at dpdk.org; nhorman at tuxdriver.com; Ananyev, Konstantin;
> Richardson, Bruce; De Lara Guarch, Pablo
> Subject: Re: [PATCH v7 2/7] ixgbe:clean scattered_rx configure in dev_stop
>
> Hi Cunming,
>
> Please, could you provide an explanation for the commit log?
> It should answer to the question "what was the issue?"
> If it's a fix, the title should start with "fix".
>
> Maybe that the same kind of fix is needed for em, igb and i40e?
>
> Thanks
> --
> Thomas
[dpdk-dev] [PATCH v7 2/7] ixgbe:clean scattered_rx configure in dev_stop
Maybe 'pair' is not accurate. I mean the different rx/tx functions that can be registered, like:
ixgbe_recv_bulk_alloc/ixgbe_recv_(scattered_)pkts_vec/ixgbe_recv_scattered_pkts
ixgbe_xmit_pkts_simple/ixgbe_xmit_pkts_vec/ixgbe_xmit_pkts

-Liang Cunming

> -----Original Message-----
> From: Thomas Monjalon [mailto:thomas.monjalon at 6wind.com]
> Sent: Wednesday, November 12, 2014 5:25 PM
> To: Liang, Cunming
> Cc: dev at dpdk.org; nhorman at tuxdriver.com; Ananyev, Konstantin;
> Richardson, Bruce; De Lara Guarch, Pablo
> Subject: Re: [PATCH v7 2/7] ixgbe:clean scattered_rx configure in dev_stop
>
> 2014-11-12 08:21, Liang, Cunming:
> > For em, igb and i40e, we haven't provide so much rx/tx pair for switching.
>
> Sorry, I don't understand. Which pair are you telling about?
> em, igb and i40e have scattered Rx functions.
>
> --
> Thomas
[dpdk-dev] [PATCH v7 2/7] ixgbe:clean scattered_rx configure in dev_stop
Scatter/non-scatter is always checked during dev_start.
The other drivers only have those two functions, and they don't do an additional check during rx/tx_queue_setup (before dev_start).
So they won't have the problem, I think.

But ixgbe checks whether the vector condition is met and then chooses the best-performing function.
As this happens before dev_start, the old value will impact the condition check.

-Liang Cunming

> -----Original Message-----
> From: Thomas Monjalon [mailto:thomas.monjalon at 6wind.com]
> Sent: Wednesday, November 12, 2014 6:33 PM
> To: Liang, Cunming
> Cc: dev at dpdk.org; nhorman at tuxdriver.com; Ananyev, Konstantin;
> Richardson, Bruce; De Lara Guarch, Pablo
> Subject: Re: [PATCH v7 2/7] ixgbe:clean scattered_rx configure in dev_stop
>
> 2014-11-12 10:29, Liang, Cunming:
> > From: Thomas Monjalon [mailto:thomas.monjalon at 6wind.com]
> > > 2014-11-12 08:21, Liang, Cunming:
> > > > For em, igb and i40e, we haven't provide so much rx/tx pair for
> > > > switching.
> > >
> > > Sorry, I don't understand. Which pair are you telling about?
> > > em, igb and i40e have scattered Rx functions.
> >
> > Maybe pair is not accurate, I means the different rx/tx register function,
> > like:
> > Ixgbe_recv_bulk_alloc/ixgbe_recv_(scattered_)pkts_vec/ixgbe_recv_scattered_pkts
> > ixgbe_xmit_pkts_simple/Ixgbe_xmit_pkts_vec/ixgbe_xmit_pkts
>
> OK that's what I understood.
> However, you should check the scatter/non-scatter Rx functions of other
> drivers.
> I think they need the same fix.
>
> Thanks
> --
> Thomas
[dpdk-dev] [PATCH] i40e: support autoneg or force link speed
Hi,

Any plan to merge this patch?

BRs,
Steve

> -----Original Message-----
> From: Liang, Cunming
> Sent: Friday, August 01, 2014 4:44 AM
> To: dev at dpdk.org
> Cc: Liang, Cunming
> Subject: [PATCH] i40e: support autoneg or force link speed
>
> - i40e force link up/down
> - i40e autoneg/force speed
>
> Signed-off-by: Cunming Liang
> Acked-by: Helin Zhang
> Acked-by: Chen Jing D(Mark)
> Tested-by: Xu HuilongX
> ---
>  app/test-pmd/cmdline.c            |  17 +++--
>  lib/librte_pmd_i40e/i40e_ethdev.c | 139 ++
>  2 files changed, 150 insertions(+), 6 deletions(-)
>
> diff --git a/app/test-pmd/cmdline.c b/app/test-pmd/cmdline.c
> index 345be11..0abc233 100644
> --- a/app/test-pmd/cmdline.c
> +++ b/app/test-pmd/cmdline.c
> @@ -527,7 +527,8 @@ static void cmd_help_long_parsed(void *parsed_result,
>  			"port close (port_id|all)\n"
>  			"Close all ports or port_id.\n\n"
>
> -			"port config (port_id|all) speed (10|100|1000|10000|auto)"
> +			"port config (port_id|all)"
> +			" speed (10|100|1000|10000|40000|auto)"
>  			" duplex (half|full|auto)\n"
>  			"Set speed and duplex for all ports or port_id\n\n"
>
> @@ -801,7 +802,9 @@ cmd_config_speed_all_parsed(void *parsed_result,
>  	else if (!strcmp(res->value1, "1000"))
>  		link_speed = ETH_LINK_SPEED_1000;
>  	else if (!strcmp(res->value1, "10000"))
> -		link_speed = ETH_LINK_SPEED_10000;
> +		link_speed = ETH_LINK_SPEED_10G;
> +	else if (!strcmp(res->value1, "40000"))
> +		link_speed = ETH_LINK_SPEED_40G;
>  	else if (!strcmp(res->value1, "auto"))
>  		link_speed = ETH_LINK_SPEED_AUTONEG;
>  	else {
> @@ -839,7 +842,7 @@ cmdline_parse_token_string_t cmd_config_speed_all_item1 =
>  	TOKEN_STRING_INITIALIZER(struct cmd_config_speed_all, item1, "speed");
>  cmdline_parse_token_string_t cmd_config_speed_all_value1 =
>  	TOKEN_STRING_INITIALIZER(struct cmd_config_speed_all, value1,
> -				"10#100#1000#10000#auto");
> +				"10#100#1000#10000#40000#auto");
>  cmdline_parse_token_string_t cmd_config_speed_all_item2 =
>  	TOKEN_STRING_INITIALIZER(struct cmd_config_speed_all, item2, "duplex");
>  cmdline_parse_token_string_t cmd_config_speed_all_value2 =
> @@ -849,7 +852,7 @@ cmdline_parse_token_string_t cmd_config_speed_all_value2 =
>  cmdline_parse_inst_t cmd_config_speed_all = {
>  	.f = cmd_config_speed_all_parsed,
>  	.data = NULL,
> -	.help_str = "port config all speed 10|100|1000|10000|auto duplex "
> +	.help_str = "port config all speed 10|100|1000|10000|40000|auto duplex "
>  			"half|full|auto",
>  	.tokens = {
>  		(void *)&cmd_config_speed_all_port,
> @@ -901,6 +904,8 @@ cmd_config_speed_specific_parsed(void *parsed_result,
>  		link_speed = ETH_LINK_SPEED_1000;
>  	else if (!strcmp(res->value1, "10000"))
>  		link_speed = ETH_LINK_SPEED_10000;
> +	else if (!strcmp(res->value1, "40000"))
> +		link_speed = ETH_LINK_SPEED_40G;
>  	else if (!strcmp(res->value1, "auto"))
>  		link_speed = ETH_LINK_SPEED_AUTONEG;
>  	else {
> @@ -939,7 +944,7 @@ cmdline_parse_token_string_t cmd_config_speed_specific_item1 =
>  								"speed");
>  cmdline_parse_token_string_t cmd_config_speed_specific_value1 =
>  	TOKEN_STRING_INITIALIZER(struct cmd_config_speed_specific, value1,
> -				"10#100#1000#10000#auto");
> +				"10#100#1000#10000#40000#auto");
>  cmdline_parse_token_string_t cmd_config_speed_specific_item2 =
>  	TOKEN_STRING_INITIALIZER(struct cmd_config_speed_specific, item2,
>  								"duplex");
> @@ -950,7 +955,7 @@ cmdline_parse_token_string_t cmd_config_speed_specific_value2 =
>  cmdline_parse_inst_t cmd_config_speed_specific = {
>  	.f = cmd_config_speed_specific_parsed,
>  	.data = NULL,
> -	.help_str = "port config X speed 10|100|1000|10000|auto duplex "
> +	.help_str = "port config X speed 10|100|1000|10000|40000|auto duplex "
>
[dpdk-dev] overcommitting CPUs
PMD combines 'PM', a thread model, and 'D', a user space driver.
DPDK provides optimized RX and TX in the driver on the fast path.
DPDK also provides a single-thread core affinity model to demonstrate the best IO with a minimum noise penalty.
The two are not tightly coupled, as Venky said.
In some cases, you may pick up only the RX/TX and give up the thread model DPDK provides.
Just take care to handle the penalty that may exist in the specific thread model.

For DPDK, we are thinking about it, and have started to deal with the negative factors.
From another perspective, the more cycles we gain on the 'D' side, the more we can spend on the 'PM' side to cancel the penalty out.

Maybe a sample using RX/TX without busy polling is a good start.
But we cannot expect much from user space wake-up latency so far.

Regards,
Liang Cunming

> -----Original Message-----
> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Venkatesan, Venky
> Sent: Wednesday, August 27, 2014 10:54 PM
> To: dev at dpdk.org
> Subject: Re: [dpdk-dev] overcommitting CPUs
>
> DPDK currently isn't exactly poll mode - it has an API that receives and
> transmits packets. How you enter that API could be interrupt or polled
> - we've left that up to the application to decide, rather than force a
> interrupt/NAPI type architecture. I do agree with Alex in that
> implementing a interrupt/load driven entry point as an option will make
> it usable more widely. There are multiple challenges here - managing the
> latency of an interrupt driven scheme in a user-space context, not to
> mention very high jitter rates to mention a few.
>
> That said, overcommitment of CPUs can be achieved in other ways as well.
> You could allocate and enforce CPU sharing via cgroups, and allocate x%
> of a core to the DPDK pthread. It does introduce a degree of
> indeterminism to when the DPDK pthread gets scheduled back in (depending
> on how many other threads are running on that core). But it is another
> option ...
>
> Regards,
> -Venky
>
> On 8/27/2014 1:40 AM, Alex Markuze wrote:
> > IMHO adding "Interrupt Mode" to dpdk is important as this can open
> > DPDK to a larger public of consumers, I can easily imagine someone
> > trying to find user space networking solution (And deciding against
> > verbs - RDMA) for the obvious reasons and not needing deterministic
> > latency.
> >
> > A few thoughts:
> >
> > Deterministic Latency: It's a fiction in a sense that this is something
> > you will be able to see only in a small controlled environment. As
> > network latencies in Data Centres(DC) are dominated by switch queuing
> > (One good reference is http://fastpass.mit.edu that Vincent shared a
> > few days back).
> >
> > Virtual environments: In virtual environments this is especially
> > interesting as the NIC driver(Hypervisor) is working in IRQ mode which
> > unless the Interrupts are pinned to different cpus then the VM will
> > have a disruptive effect on the VM's performance. Moving to interrupt
> > mode in paravirtualised environments makes sense as in any
> > environment that is not carefully crafted you should not expect any
> > deterministic guarantees and would opt for a simpler programming model
> > - like interrupt mode.
> >
> > NAPI: With 10G NICs most CPUs' poll rate is faster than the NIC message
> > rate resulting in 1:1 napi_poll callback to IRQ ratio this is true
> > even with small packets. In some cases where the CPU is working slower
> > - for example when intel_iommu=on,strict is set, you can actually see
> > a performance inversion where the "slower" CPU can reach higher B/W
> > because the slowdown makes NAPI work with the kernel effectively
> > moving to polling mode.
> >
> > I think that a smarter DPDK-NAPI is important, but it is a next step
> > IFF the interrupt mode is adopted.
> >
> > On Wed, Aug 27, 2014 at 8:48 AM, Patel, Rashmin N
> > wrote:
> >> You're right and I've felt the same harder part of determinism with other
> >> hypervisors' soft switch solutions as well. I think it's worth thinking about.
> >>
> >> Thanks,
> >> Rashmin
> >>
> >> On Aug 26, 2014 9:15 PM, Stephen Hemminger wrote:
> >> The way to handle switch between out of poll mode is to use IRQ coalescing
> >> parameters.
> >> You want to hold off IRQ until there are a couple packets or a short delay.
> >> Going out of poll mode
> >> is harder to determine.
> >>
> >>
> >> On Tue, Aug 26, 2014 at 9:59 AM, Zhou, Danny wrote:
> >>
> >>>> -Original Messag
[dpdk-dev] [RFC PATCH 0/7] support multi-phtread per lcore
Thanks Mirek. That's a good point which wasn't mentioned in the cover letter.
For 'rte_timer', I only expect it to be used within the 'legacy per-lcore' pthread.
I'd appreciate it if you could give me some cases where that doesn't fit.
In case 'rte_timer' has to be used in multi-pthread, there are some prerequisites and limitations.
1. Make sure the thread local variable 'lcore_id' is set correctly (e.g. do pthread init by rte_pthread_prepare)
2. As 'rte_timer' is not preemptable, when using rte_timer_manage/reset in multi-pthread, make sure they're not on the same core.

-Cunming

> -----Original Message-----
> From: Walukiewicz, Miroslaw
> Sent: Thursday, December 11, 2014 5:57 PM
> To: Liang, Cunming; dev at dpdk.org
> Subject: RE: [dpdk-dev] [RFC PATCH 0/7] support multi-phtread per lcore
>
> Thank you Cunming for explanation.
>
> What about DPDK timers? They also depend on rte_lcore_id() to avoid spinlocks.
>
> Mirek
>
> > -----Original Message-----
> > From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Cunming Liang
> > Sent: Thursday, December 11, 2014 3:05 AM
> > To: dev at dpdk.org
> > Subject: [dpdk-dev] [RFC PATCH 0/7] support multi-phtread per lcore
> >
> >
> > Scope & Usage Scenario
> > ========================
> >
> > DPDK usually pins a pthread per core to avoid task switch overhead. It gains
> > a lot of performance, but it's not efficient in all cases. In some cases, it
> > may be too expensive to use the whole core for a lightweight workload. It's a
> > reasonable demand to have multiple threads per core, with each thread
> > sharing the CPU in an assigned weight.
> >
> > In fact, nothing prevents the user from creating normal pthreads and using
> > cgroup to control the CPU share. One purpose of the patchset is to close the
> > gaps in using more DPDK libraries in a normal pthread. In addition, it
> > demonstrates the performance gain from a proactive 'yield' when doing the
> > idle loop in packet IO. It also provides several 'rte_pthread_*' APIs to
> > make life easier.
> >
> >
> > Changes to DPDK libraries
> > ==========================
> >
> > Some of DPDK libraries must run in DPDK environment.
> >
> > # rte_mempool
> >
> > In rte_mempool doc, it mentions a thread not created by EAL must not use
> > mempools. The root cause is it uses a per-lcore cache inside mempool.
> > And 'rte_lcore_id()' will not return a correct value.
> >
> > The patchset changes this a little. The index of the mempool cache won't be
> > a lcore_id. Instead, it uses a linear number generated by the allocator.
> > Each legacy EAL per-lcore thread applies for a unique linear id during
> > creation. A normal pthread expecting to use rte_mempool is required to
> > apply for a linear id explicitly. Now the mempool cache looks like a
> > per-thread base. The linear ID actually identifies the linear thread id.
> >
> > However, there's another problem. The rte_mempool is not preemptable. The
> > problem comes from rte_ring, so they are discussed together in the next
> > section.
> >
> > # rte_ring
> >
> > rte_ring supports multi-producer enqueue and multi-consumer dequeue. But
> > it's not preemptable. There was a conversation about this before.
> > http://dpdk.org/ml/archives/dev/2013-November/000714.html
> >
> > Let's say there are two pthreads running on the same core doing enqueue on
> > the same rte_ring. If the 1st pthread is preempted by the 2nd pthread while
> > it has already modified the prod.head, the 2nd pthread will spin until the
> > 1st one is scheduled again. It wastes time. In addition, if the 2nd pthread
> > has absolutely higher priority, it's even worse.
> >
> > But it doesn't mean we can't use it. We just need to narrow down the
> > situations in which it's used by multi-pthread on the same core.
> > - It CAN be used for any single-producer or single-consumer situation.
> > - It MAY be used by multi-producer/consumer pthreads whose scheduling
> >   policies are all SCHED_OTHER(cfs). User SHOULD be aware of the performance
> >   penalty before using it.
> > - It MUST not be used by multi-producer/consumer pthreads while some of
> >   their scheduling policies are SCHED_FIFO or SCHED_RR.
> >
> >
> > Performance
> > ==============
> >
> > It loses performance by introducing task switching. On packet IO
> > perspective,
[dpdk-dev] [RFC PATCH 0/7] support multi-phtread per lcore
Hi Mirek,

That sounds great.
Looking forward to it.

-Cunming

> -----Original Message-----
> From: Walukiewicz, Miroslaw
> Sent: Monday, December 15, 2014 7:11 PM
> To: Liang, Cunming; dev at dpdk.org
> Subject: RE: [dpdk-dev] [RFC PATCH 0/7] support multi-phtread per lcore
>
> Hi Cunming,
>
> The timers could be used by any application/library started as a standard
> pthread.
> Each pthread needs to have assigned some identifier same way as you are doing
> it for mempools (the rte_linear_thread_id and rte_lcore_id are good examples)
>
> I made series of patches extending the rte timers API to use with such kind of
> identifier keeping existing API working also.
>
> I will send it soon.
>
> Mirek
>
> > -----Original Message-----
> > From: Liang, Cunming
> > Sent: Friday, December 12, 2014 6:45 AM
> > To: Walukiewicz, Miroslaw; dev at dpdk.org
> > Subject: RE: [dpdk-dev] [RFC PATCH 0/7] support multi-phtread per lcore
> >
> > Thanks Mirek. That's a good point which wasn't mentioned in cover letter.
> > For 'rte_timer', I only expect it be used within the 'legacy per-lcore'
> > pthread.
> > I'm appreciate if you can give me some cases which can't use it to fit.
> > In case have to use 'rte_timer' in multi-pthread, there are some
> > prerequisites and limitations.
> > 1. Make sure thread local variable 'lcore_id' is set correctly (e.g. do
> > pthread init by rte_pthread_prepare)
> > 2. As 'rte_timer' is not preemptable, when using rte_timer_manager/reset in
> > multi-pthread, make sure they're not on the same core.
> >
> > -Cunming
> >
> > > -----Original Message-----
> > > From: Walukiewicz, Miroslaw
> > > Sent: Thursday, December 11, 2014 5:57 PM
> > > To: Liang, Cunming; dev at dpdk.org
> > > Subject: RE: [dpdk-dev] [RFC PATCH 0/7] support multi-phtread per lcore
> > >
> > > Thank you Cunming for explanation.
> > >
> > > What about DPDK timers? They also depend on rte_lcore_id() to avoid
> > > spinlocks.
[dpdk-dev] [RFC PATCH 0/7] support multi-phtread per lcore
> -----Original Message-----
> From: Walukiewicz, Miroslaw
> Sent: Thursday, December 18, 2014 8:20 PM
> To: Liang, Cunming; dev at dpdk.org
> Subject: RE: [dpdk-dev] [RFC PATCH 0/7] support multi-phtread per lcore
>
> I have another question regarding your patch.
>
> Could we extend values returned by rte_lcore_id() to set them per thread
> (really the DPDK lcore is a pthread but started on specific core) instead of
> creating linear thread id.
[Liang, Cunming] As you said, __lcore_id is already per thread.
By its semantic meaning, it stands for the logical cpu id.
When multiple threads run on the same lcore, they should get the same value from rte_lcore_id().
That has the same effect as 'sched_getcpu()', but at a lower cost.
>
> The patch would be much simpler and will work same way. The only change
> would be extending rte_lcore_id when rte_pthread_create() is called.
[Liang, Cunming] I did think about using rte_lcore_id() to get a unique id per pthread rather than having a new API.
But then the name lcore would no longer identify the cpu id.
It may impact all existing user applications that rely on its exact meaning.
What do you think?
>
> The value __lcore_id has really an attribute __thread that means it is valid
> not only per CPU core but also per thread.
>
> The mempools, timers, statistics would work without any modifications in that
> environment.
>
> I do not see any reason why old legacy DPDK applications would not work in
> that model.
>
> Mirek
>
> > -----Original Message-----
> > From: Liang, Cunming
> > Sent: Monday, December 15, 2014 12:53 PM
> > To: Walukiewicz, Miroslaw; dev at dpdk.org
> > Subject: RE: [dpdk-dev] [RFC PATCH 0/7] support multi-phtread per lcore
> >
> > Hi Mirek,
> >
> > That sounds great.
> > Looking forward to it.
[dpdk-dev] [RFC PATCH 0/7] support multi-phtread per lcore
...
> I'm conflicted on this one. However, I think far more applications would be
> broken to start having to use thread_id in place of an lcore_id than would be
> broken by having the lcore_id no longer actually correspond to a core.
> I'm actually struggling to come up with a large number of scenarios where it's
> important to an app to determine the cpu it's running on, compared to the large
> number of cases where you need to have a data-structure per thread. In DPDK
> libs alone, you see this assumption that lcore_id == thread_id a large number
> of times.
>
> Despite the slight logical inconsistency, I think it's better to avoid
> introducing a thread-id and continue having lcore_id representing a unique
> thread.
>
> /Bruce

Ok, I understand.
Here is the implicit meaning if lcore_id is used to represent the unique thread.
1). When lcore_id is less than RTE_MAX_LCORE, it still represents the logical core id.
2). When lcore_id is greater than or equal to RTE_MAX_LCORE, it represents a unique id for the thread.
3). Most of the APIs in rte_lcore.h (except rte_lcore_id()) are suggested to be used only in CASE 1).
4). rte_lcore_id() can be used in CASE 2), but the return value no longer represents a logical core id.

If most of us feel it's acceptable, I'll prepare the RFC v2 based on this conclusion.

/Cunming
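
The four cases listed above can be sketched as a pair of small helpers. This is only an illustration of the proposed id split, not code from the RFC: SKETCH_MAX_LCORE stands in for RTE_MAX_LCORE with an assumed value of 64, and both function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Stand-in for RTE_MAX_LCORE; the value 64 is an assumption for the sketch. */
#define SKETCH_MAX_LCORE 64u

/* Cases 1) and 2): ids below the limit are logical cores, the rest are
 * plain thread ids. */
static inline bool
id_is_logical_core(unsigned id)
{
	return id < SKETCH_MAX_LCORE;
}

/* Case 3): an lcore-only API would reject thread ids.
 * Returns 0 when the id may be used with such an API, -1 otherwise. */
static inline int
lcore_only_api_check(unsigned id)
{
	return id_is_logical_core(id) ? 0 : -1;
}
```

Under this split, arrays indexed by logical core keep their current size, while anything indexed per thread would need a separate, larger bound.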
[dpdk-dev] [RFC PATCH 0/7] support multi-phtread per lcore
> -Original Message- > From: Walukiewicz, Miroslaw > Sent: Monday, December 22, 2014 6:02 PM > To: Richardson, Bruce; Liang, Cunming > Cc: dev at dpdk.org > Subject: RE: [dpdk-dev] [RFC PATCH 0/7] support multi-phtread per lcore > > > -Original Message- > > From: Richardson, Bruce > > Sent: Monday, December 22, 2014 10:46 AM > > To: Liang, Cunming > > Cc: Walukiewicz, Miroslaw; dev at dpdk.org > > Subject: Re: [dpdk-dev] [RFC PATCH 0/7] support multi-phtread per lcore > > > > On Mon, Dec 22, 2014 at 01:51:27AM +, Liang, Cunming wrote: > > > ... > > > > I'm conflicted on this one. However, I think far more applications would > > be > > > > broken > > > > to start having to use thread_id in place of an lcore_id than would be > > broken > > > > by having the lcore_id no longer actually correspond to a core. > > > > I'm actually struggling to come up with a large number of scenarios > > > > where > > it's > > > > important to an app to determine the cpu it's running on, compared to > > the large > > > > number of cases where you need to have a data-structure per thread. In > > DPDK > > > > libs > > > > alone, you see this assumption that lcore_id == thread_id a large number > > of > > > > times. > > > > > > > > Despite the slight logical inconsistency, I think it's better to avoid > > introducing > > > > a thread-id and continue having lcore_id representing a unique thread. > > > > > > > > /Bruce > > > > > > Ok, I understand it. > > > I list the implicit meaning if using lcore_id representing the unique > > > thread. > > > 1). When lcore_id less than RTE_MAX_LCORE, it still represents the logical > > core id. > > > 2). When lcore_id large equal than RTE_MAX_LCORE, it represents an > > unique id for thread. > > > 3). Most of APIs(except rte_lcore_id()) in rte_lcore.h suggest to be used > > only in CASE 1) > > > 4). rte_lcore_id() can be used in CASE 2), but the return value no matter > > represent a logical core id. 
> > > > > > If most of us feel it's acceptable, I'll prepare for the RFC v2 base on > > > this > > conclusion. > > > > > > /Cunming > > > > Sorry, I don't like that suggestion either, as having lcore_id values > > greater > > than RTE_MAX_LCORE is terrible, as how will people know how to dimension > > arrays > > to be indexes by lcore id? [Liang, Cunming] For dimension array, we shall have RTE_MAX_THREAD_ID. Lcore id no longer means logical core, so why still use RTE_MAX_LCORE as the dimension ? In my previous mind, I don't expect to change lcore_config. RTE_MAX_LCORE is only used to identify the legal id for logical core. So there's no any change when id < RTE_MAX_LCORE, while id > RTE_MAX_LCORE cause fail in lcore API. >> Given the choice, if we are not going to just use > > lcore_id as a generic thread id, which is always between 0 and > > RTE_MAX_LCORE > > we can look to define a new thread_id variable to hold that. However, it > > should > > have a bounded range. [Liang, Cunming] Agree, if we merge lcore id with linear thread id, anyway we require RTE_MAX_THREAD_ID. > > From an ease-of-porting perspective, I still think that the simplest option > > is to > > use the existing lcore_id and accept the fact that it's now a thread id > > rather > > than an actual physical lcore. [Liang, Cunming] Not sure do you means propose to extend lcore_config as a per thread context instead of per lcore ? If accepts the fact lcore_id is now a thread id, how to make decision the physical lcore is in core mask or not ? Question is, is would that cause us lots of issues > > in the future? [Liang, Cunming] Personally I don't like this way that lcore id sometimes stand for logical core id, sometimes stand for thread id. The benefit of it looks like avoid trivial change. Actually will change the meaning of API and implement. What I propose linear thread id is new, but we can control and estimate such limited change where it happens. 
> > > I would prefer keeping the RTE_MAX_LCORES as Bruce suggests and > determine the HW core on base of following condition if we really have to know > this. > > int num_cores_online = count of cores encountered in the core mask provided by > cmdline parameter [Liang, Cunming] In this way, if we have core mask 0xf0. num_cores_online will be 4. rte_lcore_id() value for logical core will be 0, 1, 2, 3, which is no longer 4,5,6,7. That's probably all right if trying to give up the origin meaning of lcore_id, and change to identify a unique thread id. But I don't think having a dynamic num_cores_online is a good idea. If in one day, we plan to support lcore hot plug, the num_cores_online will change in the fly. It's bad to get the id which already occupied by some thread. > > Rte_lcore_id() < num_cores_online -> physical core (pthread first started on > the > core) > > Rte_lcore_id() >= num_cores_online -> pthread created by rte_pthread_create > > Mirek > > > /Bruce
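The objection to the `num_cores_online` rule can be checked with a few lines of C. This is only a sketch: `count_cores_online()` and `compact_id()` are hypothetical helpers, not EAL functions, and the compact numbering is one reading of the proposal quoted above. With core mask 0xf0, four cores are online and physical cores 4..7 would be renumbered 0..3.

```c
#include <stdint.h>

/* Hypothetical helper: number of cores selected by a command-line core mask. */
static unsigned count_cores_online(uint64_t coremask)
{
	unsigned n = 0;

	while (coremask != 0) {
		coremask &= coremask - 1; /* clear the lowest set bit */
		n++;
	}
	return n;
}

/*
 * Hypothetical helper: id a physical core would get if ids below
 * num_cores_online were handed out in core-mask order.
 * Returns -1 if the core is not part of the mask.
 */
static int compact_id(uint64_t coremask, unsigned core)
{
	uint64_t bit;

	if (core >= 64)
		return -1;
	bit = UINT64_C(1) << core;
	if ((coremask & bit) == 0)
		return -1;
	/* count the selected cores that sit below this one */
	return (int)count_cores_online(coremask & (bit - 1));
}
```

Under this assumed numbering the id no longer equals the physical core number, which is the point raised in the reply; if the mask changed at runtime (lcore hot plug), previously assigned ids would shift as well.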
[dpdk-dev] [RFC PATCH 0/7] support multi-phtread per lcore
> -Original Message- > From: Stephen Hemminger [mailto:stephen at networkplumber.org] > Sent: Tuesday, December 23, 2014 2:29 AM > To: Richardson, Bruce > Cc: Liang, Cunming; dev at dpdk.org > Subject: Re: [dpdk-dev] [RFC PATCH 0/7] support multi-phtread per lcore > > On Mon, 22 Dec 2014 09:46:03 + > Bruce Richardson wrote: > > > On Mon, Dec 22, 2014 at 01:51:27AM +, Liang, Cunming wrote: > > > ... > > > > I'm conflicted on this one. However, I think far more applications > > > > would be > > > > broken > > > > to start having to use thread_id in place of an lcore_id than would be > broken > > > > by having the lcore_id no longer actually correspond to a core. > > > > I'm actually struggling to come up with a large number of scenarios > > > > where > it's > > > > important to an app to determine the cpu it's running on, compared to > > > > the > large > > > > number of cases where you need to have a data-structure per thread. In > DPDK > > > > libs > > > > alone, you see this assumption that lcore_id == thread_id a large number > of > > > > times. > > > > > > > > Despite the slight logical inconsistency, I think it's better to avoid > introducing > > > > a thread-id and continue having lcore_id representing a unique thread. > > > > > > > > /Bruce > > > > > > Ok, I understand it. > > > I list the implicit meaning if using lcore_id representing the unique > > > thread. > > > 1). When lcore_id less than RTE_MAX_LCORE, it still represents the logical > core id. > > > 2). When lcore_id large equal than RTE_MAX_LCORE, it represents an unique > id for thread. > > > 3). Most of APIs(except rte_lcore_id()) in rte_lcore.h suggest to be used > > > only > in CASE 1) > > > 4). rte_lcore_id() can be used in CASE 2), but the return value no matter > represent a logical core id. > > > > > > If most of us feel it's acceptable, I'll prepare for the RFC v2 base on > > > this > conclusion. 
> > > > > > /Cunming > > > > Sorry, I don't like that suggestion either, as having lcore_id values > > greater > > than RTE_MAX_LCORE is terrible, as how will people know how to dimension > arrays > > to be indexes by lcore id? Given the choice, if we are not going to just use > > lcore_id as a generic thread id, which is always between 0 and > RTE_MAX_LCORE > > we can look to define a new thread_id variable to hold that. However, it > > should > > have a bounded range. > > From an ease-of-porting perspective, I still think that the simplest option > > is to > > use the existing lcore_id and accept the fact that it's now a thread id > > rather > > than an actual physical lcore. Question is, is would that cause us lots of > > issues > > in the future? > > > > /Bruce > > The current rte_lcore_id() has different meaning the thread. Your proposal > will > break code that uses lcore_id to do per-cpu statistics and the lcore_config > code in the samples. > q [Liang, Cunming] +1.
[dpdk-dev] [RFC PATCH 1/7] eal: add linear thread id as pthread-local variable
Thanks Konstantin, it makes sense. > -Original Message- > From: Ananyev, Konstantin > Sent: Tuesday, December 23, 2014 3:02 AM > To: Liang, Cunming; dev at dpdk.org > Subject: RE: [dpdk-dev] [RFC PATCH 1/7] eal: add linear thread id as > pthread-local > variable > > Hi Steve, > > > -Original Message- > > From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Cunming Liang > > Sent: Thursday, December 11, 2014 2:05 AM > > To: dev at dpdk.org > > Subject: [dpdk-dev] [RFC PATCH 1/7] eal: add linear thread id as > > pthread-local > variable > > > > > > Signed-off-by: Cunming Liang > > --- > > lib/librte_eal/common/include/rte_eal.h | 5 ++ > > lib/librte_eal/common/include/rte_lcore.h | 12 > > lib/librte_eal/linuxapp/eal/eal_thread.c | 115 > -- > > 3 files changed, 126 insertions(+), 6 deletions(-) > > > > diff --git a/lib/librte_eal/common/include/rte_eal.h > b/lib/librte_eal/common/include/rte_eal.h > > index f4ecd2e..2640167 100644 > > --- a/lib/librte_eal/common/include/rte_eal.h > > +++ b/lib/librte_eal/common/include/rte_eal.h > > @@ -262,6 +262,11 @@ rte_set_application_usage_hook( rte_usage_hook_t > usage_func ); > > */ > > int rte_eal_has_hugepages(void); > > > > +#ifndef RTE_MAX_THREAD > > +#define RTE_MAX_THREADRTE_MAX_LCORE > > +#endif > > + > > + > > #ifdef __cplusplus > > } > > #endif > > diff --git a/lib/librte_eal/common/include/rte_lcore.h > b/lib/librte_eal/common/include/rte_lcore.h > > index 49b2c03..cd83d47 100644 > > --- a/lib/librte_eal/common/include/rte_lcore.h > > +++ b/lib/librte_eal/common/include/rte_lcore.h > > @@ -73,6 +73,7 @@ struct lcore_config { > > extern struct lcore_config lcore_config[RTE_MAX_LCORE]; > > > > RTE_DECLARE_PER_LCORE(unsigned, _lcore_id); /**< Per core "core id". */ > > +RTE_DECLARE_PER_LCORE(unsigned, _thread_id); /**< Per thread "linear tid". > */ > > > > /** > > * Return the ID of the execution unit we are running on. 
> > @@ -86,6 +87,17 @@ rte_lcore_id(void) > > } > > > > /** > > + * Return the linear thread ID of the cache unit we are running on. > > + * @return > > + * core ID > > + */ > > +static inline unsigned long > > +rte_linear_thread_id(void) > > +{ > > + return RTE_PER_LCORE(_thread_id); > > +} > > + > > +/** > > * Get the id of the master lcore > > * > > * @return > > diff --git a/lib/librte_eal/linuxapp/eal/eal_thread.c > b/lib/librte_eal/linuxapp/eal/eal_thread.c > > index 80a985f..52478d6 100644 > > --- a/lib/librte_eal/linuxapp/eal/eal_thread.c > > +++ b/lib/librte_eal/linuxapp/eal/eal_thread.c > > @@ -39,6 +39,7 @@ > > #include > > #include > > #include > > +#include > > > > #include > > #include > > @@ -51,12 +52,19 @@ > > #include > > #include > > #include > > +#include > > +#include > > > > #include "eal_private.h" > > #include "eal_thread.h" > > > > +#define LINEAR_THREAD_ID_POOL"THREAD_ID_POOL" > > + > > RTE_DEFINE_PER_LCORE(unsigned, _lcore_id); > > > > +/* define linear thread id as thread-local variables */ > > +RTE_DEFINE_PER_LCORE(unsigned, _thread_id); > > + > > /* > > * Send a message to a slave lcore identified by slave_id to call a > > * function f with argument arg. 
Once the execution is done, the > > @@ -94,12 +102,13 @@ rte_eal_remote_launch(int (*f)(void *), void *arg, > unsigned slave_id) > > return 0; > > } > > > > + > > /* set affinity for current thread */ > > static int > > -eal_thread_set_affinity(void) > > +__eal_thread_set_affinity(pthread_t thread, unsigned lcore) > > { > > + > > int s; > > - pthread_t thread; > > > > /* > > * According to the section VERSIONS of the CPU_ALLOC man page: > > @@ -126,9 +135,8 @@ eal_thread_set_affinity(void) > > > > size = CPU_ALLOC_SIZE(RTE_MAX_LCORE); > > CPU_ZERO_S(size, cpusetp); > > - CPU_SET_S(rte_lcore_id(), size, cpusetp); > > + CPU_SET_S(lcore, size, cpusetp); > > > > - thread = pthread_self(); > > s = pthread_setaffinity_np(thread, size, cpusetp); > > if (s != 0) { > > RTE_LOG(ERR, EAL, "pthread_setaffinity_np failed\n"); > > @@ -140,9 +148,8 @@ e
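The quoted diff is cut off before the code that assigns the linear thread id, so the following is not the patch's implementation. It is a minimal sketch of one way a bounded, thread-local id could be handed out, assuming a bitmap pool, GCC/Clang `__atomic` builtins and C11 `_Thread_local`. `MAX_THREAD`, `tid_pool`, `linear_tid_acquire()` and `linear_tid_release()` are hypothetical names.

```c
#include <limits.h>
#include <stdint.h>

#define MAX_THREAD   128       /* assumed upper bound, stands in for RTE_MAX_THREAD */
#define INVALID_TID  UINT_MAX  /* marks "no id assigned to this thread" */

/* One bit per id; a set bit means the id is in use. */
static uint64_t tid_pool[MAX_THREAD / 64];

/* Each thread sees its own copy of this variable. */
static _Thread_local unsigned linear_tid = INVALID_TID;

/* Take the lowest free id for the calling thread; -1 if the pool is full. */
static int linear_tid_acquire(void)
{
	unsigned i;

	for (i = 0; i < MAX_THREAD; i++) {
		uint64_t bit = UINT64_C(1) << (i % 64);
		uint64_t old = __atomic_fetch_or(&tid_pool[i / 64], bit,
						 __ATOMIC_ACQ_REL);

		if ((old & bit) == 0) { /* bit was clear, so the id is now ours */
			linear_tid = i;
			return (int)i;
		}
	}
	return -1;
}

/* Give the calling thread's id back to the pool. */
static void linear_tid_release(void)
{
	if (linear_tid == INVALID_TID)
		return;
	__atomic_fetch_and(&tid_pool[linear_tid / 64],
			   ~(UINT64_C(1) << (linear_tid % 64)),
			   __ATOMIC_RELEASE);
	linear_tid = INVALID_TID;
}
```

Because ids stay below `MAX_THREAD`, arrays indexed by the id can be dimensioned statically, which is the bounded-range property asked for in the 0/7 discussion.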
[dpdk-dev] [PATCH] i40e: fix no effect wait_to_complete on link_get
Hi Thomas, > What is the relation between link status timeout and qos_sched? [LCM] Validation team found qos_sched test failure on i40e. The sample depends on link speed to calc the percentage. The root cause comes from that i40e link_get hasn't support wait_to_complete well. I agree with you it should add more description in test report why 'Used QoS example to verified'. > > +---+--+ > > | Subport output rate | Subport output rate | > > | (% line rate) | (Mpps) | > > +---+---+--+---+ > > | Expected | Actual| Expected | Actual| > > +---+---+--+---+ > > This table is empty. [LCM] It's useless, should be omitted I think. Cunming > -Original Message- > From: Thomas Monjalon [mailto:thomas.monjalon at 6wind.com] > Sent: Thursday, April 02, 2015 3:52 AM > To: Zhang, XiaonanX; Cao, Waterman > Cc: dev at dpdk.org; Zhang, Helin; Liang, Cunming > Subject: Re: [dpdk-dev] [PATCH] i40e: fix no effect wait_to_complete on > link_get > > Hi, > > 2015-04-01 06:10, Zhang, XiaonanX: > > > > Tested-by: Xiaonan zhang > > > > - OS: Fedora21 3.19.1-201.fc21.x86_64 > > - GCC: gcc version 4.9.1 20140930 (Red Hat 4.9.1-11) (GCC) > > - CPU: Intel(R) Xeon(R) CPU E5-2680 v2 @ 2.80GHz > > - NIC: Ethernet controller [0200]: Intel Corporation Ethernet Controller > > X710 for > 10GbE SFP+ [8086:1572] (rev 01) > > - Default x86_64-native-linuxapp-gcc configuration > > - Total 1 cases, 1 passed, 0 failed > > > > - Test case: Used Qos example to verified > > - > > What is the relation between link status timeout and qos_sched? > > > Traffic shaping for subport. Check that the subport rate is enforced. > > > > Set the subport output rate to x% of line rate (x = 10 .. 100). Set the > > subport TC > limits high (100% line rate each), so they do not constitute limitations. > Input traffic > is 100% line rate. > > > > Different tb period and tb credits, therefore different output rate, are > > tried: > 25%, 50%, 75%, 90% and 100% the lineal rate. 
(The output for subport is Tb > credits > per period / Tb period.) > > The traffic is injected change subport value random. > > > > Other parameters are same before tests and they don't change here. > > > > Cmdline: ./examples/qos_sched/build/qos_sched -c 0xe -n 4 -- --pfc > "0,1,2,3,3" --cfg "/root/profile_sched_pipe_1.cfg" > > > > The result is this table: > > > > > > +---+--+ > > | Subport output rate | Subport output rate | > > | (% line rate) | (Mpps) | > > +---+---+--+---+ > > | Expected | Actual| Expected | Actual| > > +---+---+--+---+ > > This table is empty. > > > > > Signed-off-by: Xiaonan Zhang > > It seems that this test report is not relevant. > It will be ignored in the commit message. Sorry
[dpdk-dev] [PATCH] eal/linux: fix negative value for undetermined numa_node
Hi,

On 8/1/2015 11:56 AM, Matthew Hall wrote:
> I asked about this many months ago and was informed that "-1" is a "standard
> error value" that I should expect from these APIs when NUMA is not present.
> Now we're saying I have to change my code again to handle a zero value?
>
> Also not sure how to tell the difference between no NUMA, something running on
> socket zero, and something with multiple sockets. Seems like we need a bit of
> thought about how the NUMA APIs should behave overall.
>
> Matthew.
>
> On Fri, Jul 31, 2015 at 09:36:12AM +0800, Cunming Liang wrote:
>> The patch sets zero as the default value of pci device numa_node
>> if the socket could not be determined.
>> It provides the same default value as FreeBSD which has no NUMA support,
>> and makes the return value of rte_eth_dev_socket_id() be consistent
>> with the API description.
>>
>> Signed-off-by: Cunming Liang
>>

/*
 * Return the NUMA socket to which an Ethernet device is connected
 *
 * @param port_id
 *   The port identifier of the Ethernet device
 * @return
 *   The NUMA socket id to which the Ethernet device is connected or
 *   a default of zero if the socket could not be determined.
 *   -1 is returned is the port_id value is out of range.
 */
extern int rte_eth_dev_socket_id(uint8_t port_id);

According to the API definition, a default of zero is used if the socket could
not be determined. '-1' is returned only when the port_id value is out of range.

As for your concern about the "difference between no NUMA, something running on
socket zero, and something with multiple sockets": the latter two are the same
situation, in which numa_node stores the NUMA id. So the real question is
whether to use '-1' or '0' when no NUMA is detected. Unless we plan to redefine
the API return value, the fix patch is reasonable.

Btw, if it returned '-1' when no NUMA is detected, what would you do? Check for
'-1' and then use node 0 instead? In that case you couldn't tell whether '-1'
means an out-of-range port_id or that NUMA was not detected. If so, why not
follow the API definition?

/Steve
[dpdk-dev] [PATCH v1] ixgbe: remove vector pmd burst size restriction
Hi, [...] > > diff --git a/drivers/net/ixgbe/ixgbe_rxtx.c b/drivers/net/ixgbe/ixgbe_rxtx.c > > index 3f808b3..dbdb761 100644 > > --- a/drivers/net/ixgbe/ixgbe_rxtx.c > > +++ b/drivers/net/ixgbe/ixgbe_rxtx.c > > @@ -4008,7 +4008,8 @@ ixgbe_set_rx_function(struct rte_eth_dev *dev) > > */ > > } else if (adapter->rx_vec_allowed) { > > PMD_INIT_LOG(DEBUG, "Vector rx enabled, please make sure > RX " > > - "burst size no less than 32."); > > + "burst size no less than 4 (port=%d).", > > +dev->data->port_id); > > I think it would be better to use RTE_IXGBE_DESCS_PER_LOOP instead of a > constant 4. > > [...] > > > > /* > > - * vPMD receive routine, now only accept (nb_pkts == > RTE_IXGBE_VPMD_RX_BURST) > > - * in one loop > > + * vPMD raw receive routine > I would keep some warning there, like "(if nb_pkts < > RTE_IXGBE_DESCS_PER_LOOP, won't receive anything)" > > >* > >* Notice: > > - * - nb_pkts < RTE_IXGBE_VPMD_RX_BURST, just return no packet > > - * - nb_pkts > RTE_IXGBE_VPMD_RX_BURST, only scan > RTE_IXGBE_VPMD_RX_BURST > > - * numbers of DD bit > > + * - floor align nb_pkts to a RTE_IXGBE_DESC_PER_LOOP power-of-two > > + * - 'nb_pkts < 4' causes 0 packet receiving > Again, RTE_IXGBE_DESCS_PER_LOOP would be better than 4 > [...] > > uint16_t > > ixgbe_recv_scattered_pkts_vec(void *rx_queue, struct rte_mbuf **rx_pkts, > > uint16_t nb_pkts) > > { > > struct ixgbe_rx_queue *rxq = rx_queue; > > - uint8_t split_flags[RTE_IXGBE_VPMD_RX_BURST] = {0}; > > + uint8_t split_flags[nb_pkts]; > > + > > + memset(split_flags, 0, nb_pkts); > > > > /* get some new buffers */ > > uint16_t nb_bufs = _recv_raw_pkts_vec(rxq, rx_pkts, nb_pkts, > > After this _recv_raw_pkts_vec it checks 32 bytes in split_flags (4x8 > bytes), that can overrun or miss some flags. > Btw. Bruce just fixed that part in "ixgbe: fix check for split packets" > > Thanks for all these valuable comments, will keep the max burst size 32.
Re: [dpdk-dev] [RFC 1/2] doc: introduction to prgdev
Hi,

On 2/1/2017 7:41 PM, Jan Blunck wrote:

On Fri, Jan 20, 2017 at 4:21 AM, Chen Jing D(Mark) wrote:

This is the documentation to describe what prgdev is, how to use prgdev API and accomplish an image download.

Signed-off-by: Chen Jing D(Mark)
---
 doc/guides/prog_guide/prgdev_lib.rst | 457 ++
 1 files changed, 457 insertions(+), 0 deletions(-)
 create mode 100644 doc/guides/prog_guide/prgdev_lib.rst

[...]

From my point of view this doesn't belong into the DPDK. On Linux this is traditionally handled by udev and you already have the freedom to use userspace applications to program a device requiring firmware in that case. I don't believe that modeling this in the DPDK explicitly is the right thing to do.

Good point, but I'm not sure whether udev supports user space device drivers.

Especially if the device supports changing personality it is required to unplug the existing personality before reprogramming. You can do this already today. Also writing OOB firmware data that changes configuration should be possible today by handling interrupts.

The goal is to allow changing the personality at runtime from DPDK user space. If the personality does not belong to a device as a whole but to one of its components, unplugging isn't much help.

Maybe we can come up with an example application that demonstrates how the different infrastructure components could get orchestrated. Do you have a device in mind that supports this?

The upcoming Purley platform has a SKU for Xeon-FPGA. The FPGA connected to the Xeon has a dedicated PCIe device id. The AFU personality for packet I/O depends on the RTL image. Changing the personality at runtime could be one such situation.

Regards,
Cunming

Regards,
Jan

[...]
Re: [dpdk-dev] [RFC 1/2] doc: introduction to prgdev
On 1/20/2017 11:21 AM, Chen Jing D(Mark) wrote:

This is the documentation to describe what prgdev is, how to use prgdev API and accomplish an image download.

Signed-off-by: Chen Jing D(Mark)
---
 doc/guides/prog_guide/prgdev_lib.rst | 457 ++
 1 files changed, 457 insertions(+), 0 deletions(-)
 create mode 100644 doc/guides/prog_guide/prgdev_lib.rst

diff --git a/doc/guides/prog_guide/prgdev_lib.rst b/doc/guides/prog_guide/prgdev_lib.rst
new file mode 100644
index 000..3917c18
--- /dev/null
+++ b/doc/guides/prog_guide/prgdev_lib.rst
@@ -0,0 +1,457 @@
+Overview
+
+
[...]
+When the set of APIs is introduced, a general question is why we need it in
+DPDK community? Why we can't use offline tool to perform same thing? The answer
+is the prgdev provide a generic, online API to applications, in the meanwile,
+offers a capability to switch driver dynamically when downloaded image changed
+personality and a new driver is required. Comparing offline tool, it have online
+programmability (see below examples); Comparing online tool, it provides a
+generic API for many HWs; Comparing generic online tool/API for many products,
+it provides a capability to switch driver dynamically.
+
+There are various means to reach same goal, we'll try to find the best/better
+way to approach. All above advantages will help prgdev to be a 'better choice'.
+

One more note. DPDK takes over the devices in user space. Legacy tools usually download the personality through the kernel driver, and they run outside of the DPDK context. When a DPDK process is running on top of the device, operating on the device with a standalone tool at the same time causes resource conflicts. That is one of the motivations for a native API that allows programming the personality within the DPDK context. Otherwise, the process has to exit whenever a personality change is required.

Manually detaching the device before using the standalone tool may ease the conflict. However, that is still limiting if the device allows multiple programmable instances (e.g. AFUs in an FPGA) that work independently and shouldn't be impacted by each other.

[...]
[dpdk-dev] [PATCH v1] ixgbe: remove vector pmd burst size restriction
Hi, [...] > > > uint16_t > > > ixgbe_recv_scattered_pkts_vec(void *rx_queue, struct rte_mbuf **rx_pkts, > > > uint16_t nb_pkts) > > > { > > > struct ixgbe_rx_queue *rxq = rx_queue; > > > - uint8_t split_flags[RTE_IXGBE_VPMD_RX_BURST] = {0}; > > > + uint8_t split_flags[nb_pkts]; > > > + > > > + memset(split_flags, 0, nb_pkts); > > > > > > /* get some new buffers */ > > > uint16_t nb_bufs = _recv_raw_pkts_vec(rxq, rx_pkts, nb_pkts, > > > > After this _recv_raw_pkts_vec it checks 32 bytes in split_flags (4x8 > > bytes), that can overrun or miss some flags. > > Btw. Bruce just fixed that part in "ixgbe: fix check for split packets" > > Ah yes, missed that when reviewing, that code would be broken if nb_bufs > 32: > > const uint64_t *split_fl64 = (uint64_t *)split_flags; > if (rxq->pkt_first_seg == NULL && > split_fl64[0] == 0 && split_fl64[1] == 0 && > split_fl64[2] == 0 && split_fl64[3] == 0) > return nb_bufs; > > right? We can either rollback and only allow 'nb_pkts<=32', or do some broken fix as below diff. By the result of performance test (4*10GE 64B burst_size(32) iofwd by scattered_pkts_vec), there's no drop. But I'm not sure it is important or not to allow burst size larger than 32. Your comments will be important. 
diff --git a/drivers/net/ixgbe/ixgbe_rxtx_vec.c b/drivers/net/ixgbe/ixgbe_rxtx_vec.c index e94c68b..8f34236 100644 --- a/drivers/net/ixgbe/ixgbe_rxtx_vec.c +++ b/drivers/net/ixgbe/ixgbe_rxtx_vec.c @@ -537,26 +537,35 @@ uint16_t ixgbe_recv_scattered_pkts_vec(void *rx_queue, struct rte_mbuf **rx_pkts, uint16_t nb_pkts) { +#define NB_SPLIT_ELEM (8) struct ixgbe_rx_queue *rxq = rx_queue; uint8_t split_flags[nb_pkts]; + uint32_t i, nb_scan; + uint16_t nb_bufs; + uint64_t *split_fl64 = (uint64_t *)split_flags; memset(split_flags, 0, nb_pkts); /* get some new buffers */ - uint16_t nb_bufs = _recv_raw_pkts_vec(rxq, rx_pkts, nb_pkts, - split_flags); + nb_bufs = _recv_raw_pkts_vec(rxq, rx_pkts, nb_pkts, +split_flags); if (nb_bufs == 0) return 0; /* happy day case, full burst + no packets to be joined */ - const uint64_t *split_fl64 = (uint64_t *)split_flags; - if (rxq->pkt_first_seg == NULL && - split_fl64[0] == 0 && split_fl64[1] == 0 && - split_fl64[2] == 0 && split_fl64[3] == 0) + nb_scan = RTE_ALIGN(nb_bufs, NB_SPLIT_ELEM); + if (rxq->pkt_first_seg == NULL) { + for (i = 0; i < nb_scan; +i += NB_SPLIT_ELEM, split_fl64++) { + if (*split_fl64 != 0) + goto reassemble; + } return nb_bufs; + } +reassemble: /* reassemble any packets that need reassembly*/ - unsigned i = 0; + i = 0; if (rxq->pkt_first_seg == NULL) { /* find the first split flag, and only reassemble then*/ while (i < nb_bufs && !split_flags[i]) /Steve > > Another thing, that I just thought about: > Right now we invoke ixgbe_rxq_rearm() only at the start of > _recv_raw_pkts_vec(). > Before it was ok, as _recv_raw_pkts_vec() would never try to read more then 32 > RXDs. > But what would happen if nb_pkts > rxq->nb_desc and rxq->rxrearm_nb == 0? > I suppose, _recv_raw_pkts_vec() can wrpa around RXD ring and 'receive' same > packet twice? > So we probably better still limit nb_pkts <= 32 at _recv_raw_pkts_vec(). The _recv_raw_pkts_vec() won't wrap around RXD ring. 
When it reaches the last one, the DD bit of padding desc. always 0. So in the case nb_pkts > rxq->nb_desc, the '_recv_raw_pkts_vec()' can only get no more than 'rxq->nb_desc' packets. > > Konstantin > > >
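For the "no packets to be joined" check discussed in this mail, a variable-length scan can be written without reading past the flags that were actually filled in. The sketch below is an alternative to the `RTE_ALIGN`-based loop in the diff, not the code that was merged; `any_split_flag()` and `SPLIT_WORD` are hypothetical names. It compares eight flags at a time and finishes the tail byte by byte.

```c
#include <stdint.h>
#include <string.h>

#define SPLIT_WORD 8 /* flags checked per 64-bit comparison */

/*
 * Return 1 if any of the first nb_bufs split flags is non-zero, else 0.
 * Only bytes in [0, nb_bufs) are read.
 */
static int any_split_flag(const uint8_t *split_flags, uint16_t nb_bufs)
{
	uint16_t i = 0;

	/* full 8-byte words; memcpy avoids alignment and aliasing problems */
	for (; i + SPLIT_WORD <= nb_bufs; i += SPLIT_WORD) {
		uint64_t w;

		memcpy(&w, &split_flags[i], sizeof(w));
		if (w != 0)
			return 1;
	}
	/* remaining 0..7 flags */
	for (; i < nb_bufs; i++) {
		if (split_flags[i] != 0)
			return 1;
	}
	return 0;
}
```

With a fixed array of 32 flags the four-word comparison quoted above does the same job in constant time, so a loop like this only matters if bursts larger than 32 are allowed.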
[dpdk-dev] [PATCH v1] ixgbe: remove vector pmd burst size restriction
Hi,

[...]
> > Another thing, that I just thought about:
> > Right now we invoke ixgbe_rxq_rearm() only at the start of
> > _recv_raw_pkts_vec().
> > Before it was ok, as _recv_raw_pkts_vec() would never try to read more than 32
> > RXDs.
> > But what would happen if nb_pkts > rxq->nb_desc and rxq->rxrearm_nb == 0?
> > I suppose, _recv_raw_pkts_vec() can wrap around RXD ring and 'receive' same
> > packet twice?
> > So we probably better still limit nb_pkts <= 32 at _recv_raw_pkts_vec().
>
> The _recv_raw_pkts_vec() won't wrap around RXD ring. When it reaches the last
> one, the DD bit of padding desc. always 0.
> So in the case nb_pkts > rxq->nb_desc, the '_recv_raw_pkts_vec()' can only get
> no more than 'rxq->nb_desc' packets.
>

I think the violation can happen when rx_id is at some middle position of the
desc_ring and nb_pkts > rxq->nb_desc. The DD check may then run past the
boundary and read entries whose DD bit is still set and which are waiting for
rearm. So I agree to keep the max burst size at 32.

/Steve
[dpdk-dev] [PATCH v2] ixgbe: remove vector pmd burst size restriction
Hi Zoltan,

> >  	} else if (adapter->rx_vec_allowed) {
> >  		PMD_INIT_LOG(DEBUG, "Vector rx enabled, please make sure RX "
> > -				    "burst size no less than 32.");
> > +				    "burst size no less than "
> > +				    "RTE_IXGBE_DESCS_PER_LOOP(=4) (port=%d).",
>
> I think you should write in this line:
>
> "%d (port=%d)", RTE_IXGBE_DESCS_PER_LOOP,
> > +			     dev->data->port_id);
>

Ok, it looks better, will take it.

[...]
> >  uint16_t
> >  ixgbe_recv_scattered_pkts_vec(void *rx_queue, struct rte_mbuf **rx_pkts,
> >  		uint16_t nb_pkts)
> >  {
> >  	struct ixgbe_rx_queue *rxq = rx_queue;
> > -	uint8_t split_flags[RTE_IXGBE_VPMD_RX_BURST] = {0};
> > +	uint8_t split_flags[RTE_IXGBE_MAX_RX_BURST] = {0};
> >
> >  	/* get some new buffers */
> >  	uint16_t nb_bufs = _recv_raw_pkts_vec(rxq, rx_pkts, nb_pkts,
>
> I don't know if it actually matters from performance point of view, but
> you check the whole split_flags array, even if you received only 4
> packets. Though the overhead of the a for loop might be bigger.
>

v2 here just rolls back that change. The size of the array is constant, so it
does not scan much: it always does four 'split_fl64[] == 0' comparisons. As you
said, I previously sent another variant of split_flags with a loop, and the
performance difference was very small. Since this patch is not trying to
improve performance here, I propose to make any such improvement in a separate
patch.
[dpdk-dev] [PATCH v3] ixgbe: remove vector pmd burst size restriction
Hi Zoltan, > -Original Message- > From: Zoltan Kiss [mailto:zoltan.kiss at linaro.org] > Sent: Wednesday, August 05, 2015 12:26 AM > To: Liang, Cunming; dev at dpdk.org > Cc: Ananyev, Konstantin > Subject: Re: [PATCH v3] ixgbe: remove vector pmd burst size restriction > > > > On 04/08/15 12:47, Cunming Liang wrote: > > On receive side, the burst size now floor aligns to RTE_IXGBE_DESCS_PER_LOOP > power of 2. > > According to this rule, the burst size less than 4 still won't receive > > anything. > > (Before this change, the burst size less than 32 can't receive anything.) > > _recv_*_pkts_vec returns no more than 32(RTE_IXGBE_RXQ_REARM_THRESH) > packets. > > > > On transmit side, the max burst size no longer bind with a constant, > > however it > still > > require to check the cross tx_rs_thresh violation. > > > > There's no obvious performance drop found on both recv_pkts_vec > > and recv_scattered_pkts_vec on burst size 32. > > > > Signed-off-by: Cunming Liang > > --- > > v3 change: > >- reword the init print log > > > > v2 change: > >- keep max rx burst size in 32 > >- reword some comments > > > > drivers/net/ixgbe/ixgbe_rxtx.c | 4 +++- > > drivers/net/ixgbe/ixgbe_rxtx.h | 5 ++--- > > drivers/net/ixgbe/ixgbe_rxtx_vec.c | 39 > +- > > 3 files changed, 27 insertions(+), 21 deletions(-) > > > > diff --git a/drivers/net/ixgbe/ixgbe_rxtx.c b/drivers/net/ixgbe/ixgbe_rxtx.c > > index 91023b9..03eb45d 100644 > > --- a/drivers/net/ixgbe/ixgbe_rxtx.c > > +++ b/drivers/net/ixgbe/ixgbe_rxtx.c > > @@ -4008,7 +4008,9 @@ ixgbe_set_rx_function(struct rte_eth_dev *dev) > > */ > > } else if (adapter->rx_vec_allowed) { > > PMD_INIT_LOG(DEBUG, "Vector rx enabled, please make sure RX " > > - "burst size no less than 32."); > > + "burst size no less than %d (port=%d).", > > +RTE_IXGBE_DESCS_PER_LOOP, > > +dev->data->port_id); > > A tab seems to be missing from the indentation, otherwise: > > Reviewed-by: Zoltan Kiss > Thanks for the review. 
I double checked indentation agian, it looks fine. 1st string line 4x/tab intention + space alignment, the other variable lines 3x/tab indentation + space alignment. According to the 'coding_style.rst' Indentation section - 'As with all style guideline, code should match style already in use in an existing file.' The style keeps the same as its following condition check. It passes 'checkpatch.pl' checking as well. Thanks, /Steve > > > > dev->rx_pkt_burst = ixgbe_recv_pkts_vec; > > } else if (adapter->rx_bulk_alloc_allowed) { > > diff --git a/drivers/net/ixgbe/ixgbe_rxtx.h b/drivers/net/ixgbe/ixgbe_rxtx.h > > index 113682a..b9eca67 100644 > > --- a/drivers/net/ixgbe/ixgbe_rxtx.h > > +++ b/drivers/net/ixgbe/ixgbe_rxtx.h > > @@ -47,9 +47,8 @@ > > (uint64_t) ((mb)->buf_physaddr + RTE_PKTMBUF_HEADROOM) > > > > #ifdef RTE_IXGBE_INC_VECTOR > > -#define RTE_IXGBE_VPMD_RX_BURST 32 > > -#define RTE_IXGBE_VPMD_TX_BURST 32 > > -#define RTE_IXGBE_RXQ_REARM_THRESH RTE_IXGBE_VPMD_RX_BURST > > +#define RTE_IXGBE_RXQ_REARM_THRESH 32 > > +#define RTE_IXGBE_MAX_RX_BURST > RTE_IXGBE_RXQ_REARM_THRESH > > #define RTE_IXGBE_TX_MAX_FREE_BUF_SZ64 > > #endif > > > > diff --git a/drivers/net/ixgbe/ixgbe_rxtx_vec.c > b/drivers/net/ixgbe/ixgbe_rxtx_vec.c > > index cf25a53..2ca0e4c 100644 > > --- a/drivers/net/ixgbe/ixgbe_rxtx_vec.c > > +++ b/drivers/net/ixgbe/ixgbe_rxtx_vec.c > > @@ -245,13 +245,13 @@ desc_to_olflags_v(__m128i descs[4], struct > rte_mbuf **rx_pkts) > > #endif > > > > /* > > - * vPMD receive routine, now only accept (nb_pkts == > RTE_IXGBE_VPMD_RX_BURST) > > - * in one loop > > + * vPMD raw receive routine, only accept(nb_pkts >= > RTE_IXGBE_DESCS_PER_LOOP) > >* > >* Notice: > > - * - nb_pkts < RTE_IXGBE_VPMD_RX_BURST, just return no packet > > - * - nb_pkts > RTE_IXGBE_VPMD_RX_BURST, only scan > RTE_IXGBE_VPMD_RX_BURST > > + * - nb_pkts < RTE_IXGBE_DESCS_PER_LOOP, just return no packet > > + * - nb_pkts > RTE_IXGBE_MAX_RX_BURST, only scan > RTE_IXGBE_MAX_RX_BURST > >* numbers 
of DD bit > > + * - floor align nb_pkts to a RTE_IXGBE_DESC_PER_LOOP power-of-two > >
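The receive-side rule in the commit message (cap at 32, then floor-align to a power-of-two multiple of the descriptors handled per loop) comes down to a mask operation. The sketch below uses local stand-in constants rather than the driver's macros, and `effective_rx_burst()` is a hypothetical name.

```c
#include <stdint.h>

#define DESCS_PER_LOOP 4   /* stand-in for RTE_IXGBE_DESCS_PER_LOOP; must be a power of two */
#define MAX_RX_BURST   32  /* stand-in for RTE_IXGBE_MAX_RX_BURST */

/* Number of descriptors the vector receive path would actually scan. */
static uint16_t effective_rx_burst(uint16_t nb_pkts)
{
	if (nb_pkts > MAX_RX_BURST)
		nb_pkts = MAX_RX_BURST;
	/* clearing the low bits rounds down to a multiple of DESCS_PER_LOOP */
	return (uint16_t)(nb_pkts & ~(DESCS_PER_LOOP - 1));
}
```

A request for 3 packets therefore scans nothing, 7 is treated as 4, and anything above 32 is treated as 32, which matches the "Notice" comments in the diff.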
[dpdk-dev] [PATCH] eal/linux: fix rte_epoll_wait
Hi,

> -----Original Message-----
> From: Robert Sanford [mailto:rsanford2 at gmail.com]
> Sent: Tuesday, August 18, 2015 11:54 PM
> To: Liang, Cunming; dev at dpdk.org
> Subject: [PATCH] eal/linux: fix rte_epoll_wait
>
> Function rte_epoll_wait should return when underlying call
> to epoll_wait times out.
>
> Signed-off-by: Robert Sanford
> ---
>  lib/librte_eal/linuxapp/eal/eal_interrupts.c |    3 +++
>  1 files changed, 3 insertions(+), 0 deletions(-)
>
> diff --git a/lib/librte_eal/linuxapp/eal/eal_interrupts.c
> b/lib/librte_eal/linuxapp/eal/eal_interrupts.c
> index 3f87875..25cae6a 100644
> --- a/lib/librte_eal/linuxapp/eal/eal_interrupts.c
> +++ b/lib/librte_eal/linuxapp/eal/eal_interrupts.c
> @@ -1012,6 +1012,9 @@ rte_epoll_wait(int epfd, struct rte_epoll_event *events,
>  				strerror(errno));
>  			rc = -1;
>  			break;
> +		} else {
> +			/* rc == 0, epoll_wait timed out */
> +			break;
>  		}
>  	}
>
> --
> 1.7.1

Acked-by: Cunming Liang
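The three outcomes the fix distinguishes can be seen in a plain `epoll_wait()` loop. This is a Linux-only sketch with a hypothetical wrapper name, `wait_events()`, not the body of `rte_epoll_wait()`: a positive count means events are ready, a negative result is retried only for `EINTR`, and zero means the timeout expired, so the loop has to end.

```c
#include <errno.h>
#include <sys/epoll.h>

/*
 * Wait for events on epfd.
 * Returns the number of ready events, 0 on timeout, -1 on error.
 */
static int wait_events(int epfd, struct epoll_event *evs, int maxevents,
		       int timeout_ms)
{
	for (;;) {
		int rc = epoll_wait(epfd, evs, maxevents, timeout_ms);

		if (rc > 0)
			return rc;      /* events are ready */
		if (rc < 0) {
			if (errno == EINTR)
				continue; /* interrupted by a signal, retry */
			return -1;      /* real error */
		}
		return 0;               /* rc == 0: timed out, do not loop again */
	}
}
```

Without the final branch, a caller passing a finite timeout would never get control back while the descriptor set stays idle.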
[dpdk-dev] [PATCH v6 2/7] hash: add assembly implementation of CRC32 intrinsics
On 1/29/2015 4:48 PM, Yerden Zhumabekov wrote:
> Added:
> - crc32c_sse42_u32() emits 'crc32l' asm instruction;
> - crc32c_sse42_u64() emits 'crc32q' asm instruction;
> - crc32c_sse42_u64_mimic(), wrapper in case of run on 32-bit platform.
>
> Signed-off-by: Yerden Zhumabekov
> ---
>  lib/librte_hash/rte_hash_crc.h | 34 ++++++++++++++++++++++++++++++++++
>  1 file changed, 34 insertions(+)
>
> diff --git a/lib/librte_hash/rte_hash_crc.h b/lib/librte_hash/rte_hash_crc.h
> index 4da7ca4..fe35996 100644
> --- a/lib/librte_hash/rte_hash_crc.h
> +++ b/lib/librte_hash/rte_hash_crc.h
> @@ -363,6 +363,40 @@ crc32c_2words(uint64_t data, uint32_t init_val)
>  	return crc;
>  }
>
> +static inline uint32_t
> +crc32c_sse42_u32(uint32_t data, uint32_t init_val)
> +{
> +	__asm__ volatile(
> +			"crc32l %[data], %[init_val];"
> +			: [init_val] "+r" (init_val)
> +			: [data] "rm" (data));
> +	return init_val;
> +}
> +
> +static inline uint32_t
> +crc32c_sse42_u64(uint64_t data, uint64_t init_val)
> +{
> +	__asm__ volatile(
> +			"crc32q %[data], %[init_val];"
> +			: [init_val] "+r" (init_val)
> +			: [data] "rm" (data));
> +	return init_val;
> +}

[LCM] I'm curious about the benefit of replacing CRC32 intrinsic "_mm_crc32_u32/64".

> +
> +static inline uint32_t
> +crc32c_sse42_u64_mimic(uint64_t data, uint64_t init_val)
> +{
> +	union {
> +		uint32_t u32[2];
> +		uint64_t u64;
> +	} d;
> +
> +	d.u64 = data;
> +	init_val = crc32c_sse42_u32(d.u32[0], init_val);
> +	init_val = crc32c_sse42_u32(d.u32[1], init_val);
> +	return init_val;
> +}
> +
>  /**
>   * Use single crc32 instruction to perform a hash on a 4 byte value.
>   *
[dpdk-dev] [PATCH v6 2/7] hash: add assembly implementation of CRC32 intrinsics
Got it, thanks.

> -----Original Message-----
> From: Yerden Zhumabekov [mailto:e_zhumabekov at sts.kz]
> Sent: Monday, February 02, 2015 1:34 PM
> To: Liang, Cunming; dev at dpdk.org
> Subject: Re: [dpdk-dev] [PATCH v6 2/7] hash: add assembly implementation of
> CRC32 intrinsics
>
>
> 02.02.2015 11:15, Liang, Cunming wrote:
> >
> >> +static inline uint32_t
> >> +crc32c_sse42_u64(uint64_t data, uint64_t init_val)
> >> +{
> >> +	__asm__ volatile(
> >> +			"crc32q %[data], %[init_val];"
> >> +			: [init_val] "+r" (init_val)
> >> +			: [data] "rm" (data));
> >> +	return init_val;
> >> +}
> > [LCM] I'm curious about the benefit of replacing CRC32 intrinsic
> > "_mm_crc32_u32/64".
>
> These intrinsics are not available on a platform which has no SSE4.2
> support so the build would fail.
>
> See previous suggestion from Neil:
> http://dpdk.org/ml/archives/dev/2014-November/008353.html
>
> --
> Sincerely,
>
> Yerden Zhumabekov
> State Technical Service
> Astana, KZ
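
As a side note on the crc32c_sse42_u64_mimic() wrapper in the patch, the sketch below is a portable software model that needs neither SSE4.2 nor inline asm. All crc32c_sw_* names are hypothetical and are not part of librte_hash. It assumes the usual description of the crc32 instruction (reflected CRC-32C, Castagnoli polynomial, no implicit inversion of the accumulator) and shows why feeding the low and then the high 32-bit half gives the same result as one 64-bit step on a little-endian layout.

```c
#include <stdint.h>

/*
 * Software model of one byte-wide crc32 step: reflected CRC-32C with
 * polynomial 0x82F63B78, no pre- or post-inversion of the accumulator.
 * (Assumed to mirror the instruction; slow, for illustration only.)
 */
static uint32_t
crc32c_sw_u8(uint8_t data, uint32_t crc)
{
	int i;

	crc ^= data;
	for (i = 0; i < 8; i++)
		crc = (crc >> 1) ^ (0x82F63B78u & (0u - (crc & 1u)));
	return crc;
}

/* 4-byte step: bytes are consumed least significant first */
static uint32_t
crc32c_sw_u32(uint32_t data, uint32_t crc)
{
	int i;

	for (i = 0; i < 4; i++)
		crc = crc32c_sw_u8((uint8_t)(data >> (8 * i)), crc);
	return crc;
}

/* 8-byte step: same byte order, twice as many bytes */
static uint32_t
crc32c_sw_u64(uint64_t data, uint32_t crc)
{
	int i;

	for (i = 0; i < 8; i++)
		crc = crc32c_sw_u8((uint8_t)(data >> (8 * i)), crc);
	return crc;
}

/* two 32-bit steps, low half first, in the spirit of the mimic wrapper */
static uint32_t
crc32c_sw_u64_mimic(uint64_t data, uint32_t crc)
{
	crc = crc32c_sw_u32((uint32_t)data, crc);
	crc = crc32c_sw_u32((uint32_t)(data >> 32), crc);
	return crc;
}
```

The model uses shifts instead of the union from the patch so that it does not depend on host byte order. A bitwise loop like this is far slower than the instruction and is only meant to make the equivalence easy to check.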
Re: [dpdk-dev] [dpdk-techboard] A new bus for mediated devices
Hi Alejandro,

> From: Alejandro Lucero [mailto:alejandro.luc...@netronome.com]
> Sent: Wednesday, January 16, 2019 1:59 AM
> To: Liang, Cunming; dev
> Cc: Richardson, Bruce; Lu, Xiuchun
> Subject: Re: [dpdk-techboard] A new bus for mediated devices
>
> Hi Steve,
>
>> On Tue, Jan 15, 2019 at 2:19 PM Liang, Cunming wrote:
>> Hi Alejandro,
>>
>> Good to know we have a common interest in DPDK native mdev support.
>>
>> We’re working on something that an mdev-based PMD driver is part of. We
>> intended to collect others’ interest & feedback at the DPDK summit before
>> starting the upstream effort.
>
> Which DPDK summit do you refer to? The last one, in San Jose in December?

[LC] Yes, it is. You can find it at the link
https://schd.ws/hosted_files/dpdksummitnorthamerica2018/7b/DPDK_Summit18_MDEV_Fine-Grained-Slicing_Steve_John.pdf

>> There were a few considerations.
>> - VT-d Spec 3.0 is published, but no platform is available to support it,
>>   even though a PCIe device might have the ability
>> - Except for Intel, it is not clear whether other network IHVs are going to
>>   design their devices to the new spec
>> - w/o an available platform, it only supports a singleton mdev instance per
>>   parent device
>> - even singleton mdev support requires an IOMMU-aware mediated device,
>>   which is WIP in the kernel
>
> Yes, I know this is new stuff and it will not be usable by now, as I have
> previously commented, but I think this is going to be really important in
> the near future. It adds a lot of flexibility for creating ad-hoc net
> devices to be used by VMs.

[LC] Fully agree.

> In our initial case, we just need one mdev per parent device, and the IOMMU
> mapping would be managed by the parent device after the proper ioctl call
> from user space (NFP PMD for mediated device).

[LC] I see, so essentially it’s a singleton mdev instance based on an IOMMU &
SR-IOV platform. It requires the mdev to be capable of using the parent
device’s IOMMU domain, which is WIP. No extra platform is needed for this
usage, which is good.

>> For these reasons, we held off the upstream effort on the DPDK side.
>
> I understand. However, I think this should be discussed asap to figure out
> what is needed. When implementing the mdev bus for DPDK myself, I found the
> mdev interface is so flexible (or maybe undefined) that it is not clear how
> it should be done.
>
>> I’m actually quite interested in your use case: what benefit are you
>> looking for from kernel vfio mdev? If you don’t mind, could you share it
>> with us?
>
> We need to use the PF and VFs in user space, this is DPDK, and the VF
> creation is not possible when the PF is bound to the VFIO driver (vfio-pci).
> My idea is just to create a mediated device for allowing this, with the
> kernel driver helping with mmaping the right BAR areas. After that, the PMD
> will work almost as the current NFP PMD, although certain things like link
> up/down or getting extended stats will be through the kernel netdev.

[LC] Yeah, it separates device control and packet I/O, which is the most
straightforward benefit introduced by mdev. It’s definitely a good usage, as
you mentioned.

>> Our initial minimum goals for DPDK native mdev support:
>> - scan/probe/… kernel mdev bus sysfs
>> - keep a consistent vfio uapi in DPDK
>> - reuse, unmodified, any existing PMD previously built for the pci bus
>
> This last point seems quite complicated if not impossible, at least in our
> case.

[LC] That’s for the case of having exactly the same device function, differing
only in granularity (e.g. number of queue-pairs). It sucks to have a
duplicated PMD just for the mdev bus. That’s not your case, where it is good
to build a lightweight PMD from scratch and keep the device control in the
kernel.

>> We had a patch set based on DPDK 18.05, not yet rebased to mainline, which
>> includes
>> - intro new rte_mdev_bus for kernel mdev bus
>> - intro new rte_mdev_driver for ‘vfio-pci’ mdev type
>>   (allows to register other bus driver according to mdev type -- ‘device_api’)
>> - whitelist & blacklist uuid support
>> - a pci vfio change to map resource according to general sysfs
>
> Good. I have a mdev bus driver almost implemented and a specific NFP PMD for
> an NFP mediated device. But I have been working for the sake of probing this
> as an option for our purposes. Of course, my idea was to work on full mdev
> support for DPDK, so that was the reason for my email to the techboard.

[LC] The goal is fully aligned. Mdev makes it much easier to manage the device
lifecycle, is able to support different bus layouts (e.g. pci, platform, ccw),
etc. We’d like DPDK mdev enabling to preserve most of these benefits.

> Knowing you have been working on this longer than me, and likely having a
> more complete implementation, I will not try to duplicate work here, and I
> hope I can contribute to the final implementation once I see your design.

[LC] That’s great. We’ll initiate an RFC; your input from a different view
would be really helpful, looking forward to the coll
[dpdk-dev] [PATCH v1 13/15] mempool: add support to non-EAL thread
> -Original Message- > From: Walukiewicz, Miroslaw > Sent: Thursday, January 22, 2015 5:53 PM > To: Liang, Cunming; dev at dpdk.org > Subject: RE: [dpdk-dev] [PATCH v1 13/15] mempool: add support to non-EAL > thread > > > > > -Original Message- > > From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Cunming Liang > > Sent: Thursday, January 22, 2015 9:17 AM > > To: dev at dpdk.org > > Subject: [dpdk-dev] [PATCH v1 13/15] mempool: add support to non-EAL > > thread > > > > For non-EAL thread, bypass per lcore cache, directly use ring pool. > > It allows using rte_mempool in either EAL thread or any user pthread. > > As in non-EAL thread, it directly rely on rte_ring and it's none preemptive. > > It doesn't suggest to run multi-pthread/cpu which compete the > > rte_mempool. > > It will get bad performance and has critical risk if scheduling policy is > > RT. > > > > Signed-off-by: Cunming Liang > > --- > > lib/librte_mempool/rte_mempool.h | 18 +++--- > > 1 file changed, 11 insertions(+), 7 deletions(-) > > > > diff --git a/lib/librte_mempool/rte_mempool.h > > b/lib/librte_mempool/rte_mempool.h > > index 3314651..4845f27 100644 > > --- a/lib/librte_mempool/rte_mempool.h > > +++ b/lib/librte_mempool/rte_mempool.h > > @@ -198,10 +198,12 @@ struct rte_mempool { > > * Number to add to the object-oriented statistics. 
> > */ > > #ifdef RTE_LIBRTE_MEMPOOL_DEBUG > > -#define __MEMPOOL_STAT_ADD(mp, name, n) do { \ > > - unsigned __lcore_id = rte_lcore_id(); \ > > - mp->stats[__lcore_id].name##_objs += n; \ > > - mp->stats[__lcore_id].name##_bulk += 1; \ > > +#define __MEMPOOL_STAT_ADD(mp, name, n) do {\ > > + unsigned __lcore_id = rte_lcore_id(); \ > > + if (__lcore_id < RTE_MAX_LCORE) { \ > > + mp->stats[__lcore_id].name##_objs += n; \ > > + mp->stats[__lcore_id].name##_bulk += 1; \ > > + } \ > > } while(0) > > #else > > #define __MEMPOOL_STAT_ADD(mp, name, n) do {} while(0) > > @@ -767,8 +769,9 @@ __mempool_put_bulk(struct rte_mempool *mp, > > void * const *obj_table, > > __MEMPOOL_STAT_ADD(mp, put, n); > > > > #if RTE_MEMPOOL_CACHE_MAX_SIZE > 0 > > - /* cache is not enabled or single producer */ > > - if (unlikely(cache_size == 0 || is_mp == 0)) > > + /* cache is not enabled or single producer or none EAL thread */ > > I don't understand this limitation. > > I see that the rte_membuf.h defines table per RTE_MAX_LCORE like below > #if RTE_MEMPOOL_CACHE_MAX_SIZE > 0 > /** Per-lcore local cache. */ > struct rte_mempool_cache local_cache[RTE_MAX_LCORE]; > #endif > > But why we cannot extent the size of the local cache table to something like > RTE_MAX_THREADS that does not exceed max value of rte_lcore_id() > > Keeping this condition here is a real performance killer!!. > I saw in my test application spending more 95% of CPU time reading the atomic > in M C/MP ring utilizing access to mempool. [Liang, Cunming] This is the first step to make it work. By Konstantin's comments, shall prevent to allocate unique id by ourselves. And the return value from gettid() is too large as an index. For non-EAL thread performance gap, will think about additional fix patch here. If care about performance, still prefer to choose EAL thread now. 
> > Same comment for get operation below > > > + if (unlikely(cache_size == 0 || is_mp == 0 || > > +lcore_id >= RTE_MAX_LCORE)) > > goto ring_enqueue; > > > > /* Go straight to ring if put would overflow mem allocated for cache > > */ > > @@ -952,7 +955,8 @@ __mempool_get_bulk(struct rte_mempool *mp, void > > **obj_table, > > uint32_t cache_size = mp->cache_size; > > > > /* cache is not enabled or single consumer */ > > - if (unlikely(cache_size == 0 || is_mc == 0 || n >= cache_size)) > > + if (unlikely(cache_size == 0 || is_mc == 0 || > > +n >= cache_size || lcore_id >= RTE_MAX_LCORE)) > > goto ring_dequeue; > > > > cache = &mp->local_cache[lcore_id]; > > -- > > 1.8.1.4
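
To make the trade-off under discussion concrete, here is a toy sketch of the put path after the patch. Every toy_* name and constant is hypothetical and none of this is the rte_mempool code. It assumes only what the diff shows: a per-lcore cache array indexed by lcore id, and a direct fall-through to the shared ring when the caller's id is not below the array size.

```c
#define TOY_MAX_LCORE   4               /* stand-in for RTE_MAX_LCORE */
#define TOY_LCORE_ANY   ((unsigned)-1)  /* id seen by a non-EAL thread */
#define TOY_CACHE_SIZE  8
#define TOY_RING_SIZE   64

struct toy_cache {
	unsigned len;
	void *objs[TOY_CACHE_SIZE];
};

struct toy_pool {
	struct toy_cache local_cache[TOY_MAX_LCORE];
	unsigned ring_len;
	unsigned ring_puts;             /* counts enqueues on the shared ring */
	void *ring[TOY_RING_SIZE];
};

/*
 * Shared path. In a real pool this is the contended multi-producer
 * enqueue; here it is a plain array and is NOT thread-safe.
 */
static int
toy_ring_enqueue(struct toy_pool *mp, void *obj)
{
	if (mp->ring_len == TOY_RING_SIZE)
		return -1;
	mp->ring[mp->ring_len++] = obj;
	mp->ring_puts++;
	return 0;
}

static int
toy_put(struct toy_pool *mp, void *obj, unsigned lcore_id)
{
	struct toy_cache *cache;

	/* no valid cache slot for this thread: go straight to the ring */
	if (lcore_id >= TOY_MAX_LCORE)
		return toy_ring_enqueue(mp, obj);

	cache = &mp->local_cache[lcore_id];
	if (cache->len == TOY_CACHE_SIZE)
		return toy_ring_enqueue(mp, obj);

	/* fast path: private cache, no shared state touched */
	cache->objs[cache->len++] = obj;
	return 0;
}
```

In this model every put from a thread with an out-of-range id lands on the shared structure. That is the cost Miroslaw points at, and it is why the reply treats the bypass as a first step rather than the final design.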
[dpdk-dev] [PATCH v1 15/15] timer: add support to non-EAL thread
> -Original Message- > From: Walukiewicz, Miroslaw > Sent: Thursday, January 22, 2015 5:58 PM > To: Liang, Cunming; dev at dpdk.org > Subject: RE: [dpdk-dev] [PATCH v1 15/15] timer: add support to non-EAL thread > > > > > -Original Message- > > From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Cunming Liang > > Sent: Thursday, January 22, 2015 9:17 AM > > To: dev at dpdk.org > > Subject: [dpdk-dev] [PATCH v1 15/15] timer: add support to non-EAL thread > > > > Allow to setup timers only for EAL (lcore) threads (__lcore_id < > > MAX_LCORE_ID). > > E.g. ? dynamically created thread will be able to reset/stop timer for lcore > > thread, > > but it will be not allowed to setup timer for itself or another non-lcore > > thread. > > rte_timer_manage() for non-lcore thread would simply do nothing and > > return straightway. > > > > Signed-off-by: Cunming Liang > > --- > > lib/librte_timer/rte_timer.c | 40 +++ > > - > > lib/librte_timer/rte_timer.h | 2 +- > > 2 files changed, 32 insertions(+), 10 deletions(-) > > > > diff --git a/lib/librte_timer/rte_timer.c b/lib/librte_timer/rte_timer.c > > index 269a992..601c159 100644 > > --- a/lib/librte_timer/rte_timer.c > > +++ b/lib/librte_timer/rte_timer.c > > @@ -79,9 +79,10 @@ static struct priv_timer priv_timer[RTE_MAX_LCORE]; > > > > Why not extend the priv_timer size to value being in range returned by > rte_lcore_id(). > > All timer stuff will work automatically after such change without any change > in > timer logic including stats. [Liang, Cunming] The same reason as mempool does. It won't expect to involve dynamic unique id allocation for user thread on the first step. The failure secondary won't release the reserved id which cause potential unexpected leak. So will look for other approach to improve the libraries in the next step. 
> > > /* when debug is enabled, store some statistics */ > > #ifdef RTE_LIBRTE_TIMER_DEBUG > > -#define __TIMER_STAT_ADD(name, n) do { \ > > - unsigned __lcore_id = rte_lcore_id(); \ > > - priv_timer[__lcore_id].stats.name += (n); \ > > +#define __TIMER_STAT_ADD(name, n) do { > > \ > > + unsigned __lcore_id = rte_lcore_id(); \ > > + if (__lcore_id < RTE_MAX_LCORE) > > \ > > + priv_timer[__lcore_id].stats.name += (n); \ > > } while(0) > > #else > > #define __TIMER_STAT_ADD(name, n) do {} while(0) > > @@ -127,15 +128,26 @@ timer_set_config_state(struct rte_timer *tim, > > unsigned lcore_id; > > > > lcore_id = rte_lcore_id(); > > + if (lcore_id >= RTE_MAX_LCORE) > > + lcore_id = LCORE_ID_ANY; > > > > /* wait that the timer is in correct status before update, > > * and mark it as being configured */ > > while (success == 0) { > > prev_status.u32 = tim->status.u32; > > > > + /* > > +* prevent race condition of non-EAL threads > > +* to update the timer. When 'owner == LCORE_ID_ANY', > > +* it means updated by a non-EAL thread. > > +*/ > > + if (lcore_id == (unsigned)LCORE_ID_ANY && > > + (uint16_t)lcore_id == prev_status.owner) > > + return -1; > > + > > /* timer is running on another core, exit */ > > if (prev_status.state == RTE_TIMER_RUNNING && > > - (unsigned)prev_status.owner != lcore_id) > > + prev_status.owner != (uint16_t)lcore_id) > > return -1; > > > > /* timer is being configured on another core */ > > @@ -366,9 +378,13 @@ __rte_timer_reset(struct rte_timer *tim, uint64_t > > expire, > > > > /* round robin for tim_lcore */ > > if (tim_lcore == (unsigned)LCORE_ID_ANY) { > > - tim_lcore = > > rte_get_next_lcore(priv_timer[lcore_id].prev_lcore, > > - 0, 1); > > - priv_timer[lcore_id].prev_lcore = tim_lcore; > > + if (lcore_id < RTE_MAX_LCORE) { > > + tim_lcore = rte_get_next_lcore( > > + priv_timer[lcore_id].prev_lcore, > > + 0, 1); > > + priv_timer[lcore_id].prev_lcore = tim_lcore; > > + } else >
[dpdk-dev] [PATCH v1 02/15] eal: new eal option '--lcores' for cpu assignment
> -Original Message- > From: Richardson, Bruce > Sent: Thursday, January 22, 2015 8:19 PM > To: Liang, Cunming > Cc: dev at dpdk.org > Subject: Re: [dpdk-dev] [PATCH v1 02/15] eal: new eal option '--lcores' for > cpu > assignment > > On Thu, Jan 22, 2015 at 04:16:25PM +0800, Cunming Liang wrote: > > It supports one new eal long option '--lcores' for EAL thread cpuset > > assignment. > > > > The format pattern: > > --lcores='lcores[@cpus]<,lcores[@cpus]>' > > lcores, cpus could be a single digit or a group. > > '(' and ')' are necessary if it's a group. > > If not supply '@cpus', the value of cpus uses the same as lcores. > > > > e.g. '1,2@(5-7),(3-5)@(0,2),(0,6)' means starting 7 EAL thread as below > > lcore 0 runs on cpuset 0x41 (cpu 0,6) > > lcore 1 runs on cpuset 0x2 (cpu 1) > > lcore 2 runs on cpuset 0xe0 (cpu 5,6,7) > > lcore 3,4,5 runs on cpuset 0x5 (cpu 0,2) > > lcore 6 runs on cpuset 0x41 (cpu 0,6) > > > > This strikes me as very confusing, though a couple of tweaks might help with > readability. The lcore 0 at the end is especially confusing. Perhaps we can > limit the allowed formats here, > * require the lcore_id to be specified - the lack of an lcore id for the last > part > makes having it as lcore 0 surprising. > * only allow one lcore id to be given for each set of cores. [Liang, Cunming] The last one lcore_set (0,6) without cpuset assigned is equal to '(0,6)@(0,6)' or '0@(0,6), 6@(0,6)'. It's not a typical use case but gives an aggressive sample, it shows the simple way to explain the map. > > I think it may still be readable if we allow the core set to be omitted if its > to be the same as the lcore_id. > > It's probably still not going to be very tidy, but I think we can improve > things. 
> > /Bruce > > > Signed-off-by: Cunming Liang > > --- > > lib/librte_eal/common/eal_common_launch.c | 1 - > > lib/librte_eal/common/eal_common_options.c | 262 > - > > lib/librte_eal/common/eal_options.h| 2 + > > lib/librte_eal/linuxapp/eal/Makefile | 1 + > > 4 files changed, 261 insertions(+), 5 deletions(-) > > > > diff --git a/lib/librte_eal/common/eal_common_launch.c > b/lib/librte_eal/common/eal_common_launch.c > > index 599f83b..2d732b1 100644 > > --- a/lib/librte_eal/common/eal_common_launch.c > > +++ b/lib/librte_eal/common/eal_common_launch.c > > @@ -117,4 +117,3 @@ rte_eal_mp_wait_lcore(void) > > rte_eal_wait_lcore(lcore_id); > > } > > } > > - > > diff --git a/lib/librte_eal/common/eal_common_options.c > b/lib/librte_eal/common/eal_common_options.c > > index e2810ab..fc47588 100644 > > --- a/lib/librte_eal/common/eal_common_options.c > > +++ b/lib/librte_eal/common/eal_common_options.c > > @@ -45,6 +45,7 @@ > > #include > > #include > > #include > > +#include > > > > #include "eal_internal_cfg.h" > > #include "eal_options.h" > > @@ -85,6 +86,7 @@ eal_long_options[] = { > > {OPT_XEN_DOM0, 0, 0, OPT_XEN_DOM0_NUM}, > > {OPT_CREATE_UIO_DEV, 1, NULL, OPT_CREATE_UIO_DEV_NUM}, > > {OPT_VFIO_INTR, 1, NULL, OPT_VFIO_INTR_NUM}, > > + {OPT_LCORES, 1, 0, OPT_LCORES_NUM}, > > {0, 0, 0, 0} > > }; > > > > @@ -255,9 +257,11 @@ eal_parse_corelist(const char *corelist) > > if (min == RTE_MAX_LCORE) > > min = idx; > > for (idx = min; idx <= max; idx++) { > > - cfg->lcore_role[idx] = ROLE_RTE; > > - lcore_config[idx].core_index = count; > > - count++; > > + if (cfg->lcore_role[idx] != ROLE_RTE) { > > + cfg->lcore_role[idx] = ROLE_RTE; > > + lcore_config[idx].core_index = count; > > + count++; > > + } > > } > > min = RTE_MAX_LCORE; > > } else > > @@ -289,6 +293,241 @@ eal_parse_master_lcore(const char *arg) > > return 0; > > } > > > > +/* > > + * Parse elem, the elem could be single number or '(' ')' group > > + * Within group elem, '-' used for a range seperator; > > + *',' 
used for a
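
The cpuset values quoted in the patch description can be checked with a small sketch. The helper set_to_mask below is hypothetical and is not the parser from the patch. It converts a single element, either one number or a parenthesised group with ',' and '-', into a bitmask. Following the rule quoted above that '(' and ')' are necessary for a group, it rejects a bare range, and it limits ids to 0..63 for simplicity. Both choices are assumptions of this sketch.

```c
#include <ctype.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * Hypothetical helper: turn "1", "(0,6)" or "(5-7)" into a bitmask.
 * Returns 0 on success, -1 on malformed input or ids above 63.
 */
static int
set_to_mask(const char *s, uint64_t *mask)
{
	int grouped = (*s == '(');
	unsigned long lo, hi;
	char *end;

	*mask = 0;
	if (grouped)
		s++;

	for (;;) {
		if (!isdigit((unsigned char)*s))
			return -1;
		lo = hi = strtoul(s, &end, 10);
		s = end;

		/* ranges are only accepted inside a group */
		if (grouped && *s == '-') {
			s++;
			if (!isdigit((unsigned char)*s))
				return -1;
			hi = strtoul(s, &end, 10);
			s = end;
		}
		if (lo > hi || hi > 63)
			return -1;
		for (; lo <= hi; lo++)
			*mask |= UINT64_C(1) << lo;

		if (grouped && *s == ',') {
			s++;
			continue;
		}
		break;
	}

	if (grouped) {
		if (*s != ')')
			return -1;
		s++;
	}
	return *s == '\0' ? 0 : -1;
}
```

With this helper, "(0,6)" gives 0x41, "1" gives 0x2, "(5-7)" gives 0xe0 and "(0,2)" gives 0x5, matching the masks listed for lcores 0 to 6 in the example. Splitting the full option string on '@' and on top-level commas is left out to keep the sketch short.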
[dpdk-dev] [PATCH v1 02/15] eal: new eal option '--lcores' for cpu assignment
Hi Pawel,

I don't see much difference there.
If you replace '@' with '.', '()' with '[]', and ',' with '/', the two syntaxes are almost the same.
We have no rx/tx case, so ':' is useless for us.
Semantically, '@' (at) is more readable than '.' for core assignment.

-Liang Cunming

> -----Original Message-----
> From: Wodkowski, PawelX
> Sent: Thursday, January 22, 2015 11:17 PM
> To: Ananyev, Konstantin; Richardson, Bruce; Liang, Cunming
> Cc: dev at dpdk.org
> Subject: RE: [dpdk-dev] [PATCH v1 02/15] eal: new eal option '--lcores' for cpu
> assignment
>
> Hi,
> I want to mention that similar but for me much more readable syntax have
> Pktgen-DPDK for defining core - port mapping. Maybe we can adopt this syntax
> for new '--lcores' parameter.
>
> See '-m' parameter syntax on Pktgen readme.
> https://github.com/pktgen/Pktgen-DPDK/blob/master/dpdk/examples/pktgen/README.md
>
> > -----Original Message-----
> > From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Ananyev, Konstantin
> > Sent: Thursday, January 22, 2015 3:34 PM
> > To: Richardson, Bruce; Liang, Cunming
> > Cc: dev at dpdk.org
> > Subject: Re: [dpdk-dev] [PATCH v1 02/15] eal: new eal option '--lcores' for cpu
> > assignment
> >
> > Hi Bruce,
> >
> > > -----Original Message-----
> > > From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Bruce Richardson
> > > Sent: Thursday, January 22, 2015 12:19 PM
> > > To: Liang, Cunming
> > > Cc: dev at dpdk.org
> > > Subject: Re: [dpdk-dev] [PATCH v1 02/15] eal: new eal option '--lcores' for cpu assignment
> > >
> > > On Thu, Jan 22, 2015 at 04:16:25PM +0800, Cunming Liang wrote:
> > > > It supports one new eal long option '--lcores' for EAL thread cpuset assignment.
> > > >
> > > > The format pattern:
> > > > --lcores='lcores[@cpus]<,lcores[@cpus]>'
> > > > lcores, cpus could be a single digit or a group.
> > > > '(' and ')' are necessary if it's a group.
> > > > If not supply '@cpus', the value of cpus uses the same as lcores.
> > > >
> > > > e.g.
'1,2@(5-7),(3-5)@(0,2),(0,6)' means starting 7 EAL thread as below > > > > lcore 0 runs on cpuset 0x41 (cpu 0,6) > > > > lcore 1 runs on cpuset 0x2 (cpu 1) > > > > lcore 2 runs on cpuset 0xe0 (cpu 5,6,7) > > > > lcore 3,4,5 runs on cpuset 0x5 (cpu 0,2) > > > > lcore 6 runs on cpuset 0x41 (cpu 0,6) > > > > > > > > > > This strikes me as very confusing, though a couple of tweaks might help > > > with > > > readability. The lcore 0 at the end is especially confusing. > > > > Didn't get you here: do you find (0,6) confusing, right? > > Because braces implicitly specifies affinity for group of en-braced lcores? > > > > > Perhaps we can > > > limit the allowed formats here, > > > * require the lcore_id to be specified - the lack of an lcore id for the > > > last part > > > makes having it as lcore 0 surprising. > > > > Again, not sure I understand you properly: lcore_id(s) are always specified > > explicitly. > > Physical cpus part might be omitted. > > > > > * only allow one lcore id to be given for each set of cores. > > > > So you mean for '(3-5)@(0,2)' user would have to: '3@(0,2),4@(0,2),5@(0,2)'? > > I don't see big difference here, but imagine you'd like to create a pool of > > 32 EAL- > > threads running on same cpu set. > > With current syntax it is just something like: '(32-63)@(0-7)'. > > With what you proposing it will be a very long list. > > > > > > > > I think it may still be readable if we allow the core set to be omitted > > > if its > > > to be the same as the lcore_id. > > > > I think that is supported. > > See lcore_id=1 in Steve's example above. > > As I understand: --lcores='0,2,3-5' is equal to '-l 0,2,3-5' and to '-c > > 0x3d'. > > > > Konstantin > > > > > > > > It's probably still not going to be very tidy, but I think we can improve > > > things. > > > > > > /Bruce > > > > > > > Signed-off-by: Cunming Liang > > > > --- > > > > lib
[dpdk-dev] [PATCH v1 09/15] malloc: fix the issue of SOCKET_ID_ANY
> -----Original Message-----
> From: Stephen Hemminger [mailto:stephen at networkplumber.org]
> Sent: Sunday, January 25, 2015 4:05 PM
> To: Liang, Cunming
> Cc: dev at dpdk.org
> Subject: Re: [dpdk-dev] [PATCH v1 09/15] malloc: fix the issue of
> SOCKET_ID_ANY
>
> On Thu, 22 Jan 2015 16:16:32 +0800
> Cunming Liang wrote:
>
> > -	return rte_socket_id();
> > +	unsigned socket_id = rte_socket_id();
> > +
> > +	if (socket_id == (unsigned)SOCKET_ID_ANY)
>
> I prefer not casting -1 to unsigned; it will cause warnings.
> It is better to make socket_id an integer and then have
> the implicit cast in the return.

[Liang, Cunming] I didn't get a warning about it. Which compiler version complains about it?
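
The two styles being compared can be put side by side in a small sketch. The toy_* names and the fallback to socket 0 are assumptions made for illustration, since the quoted hunk does not show what the patch returns in the SOCKET_ID_ANY case. The first function keeps the id unsigned and casts the constant, as in the patch. The second keeps the id as a signed int and converts only in the return, as Stephen suggests.

```c
#define TOY_SOCKET_ID_ANY   -1   /* stand-in for SOCKET_ID_ANY */
#define TOY_DEFAULT_SOCKET  0    /* assumed fallback, not from the patch */

/* style used in the patch: unsigned id compared with a cast constant */
static unsigned
toy_socket_cast(unsigned socket_id)
{
	if (socket_id == (unsigned)TOY_SOCKET_ID_ANY)
		return TOY_DEFAULT_SOCKET;
	return socket_id;
}

/* style suggested in the review: signed id, conversion on return */
static unsigned
toy_socket_int(int socket_id)
{
	if (socket_id == TOY_SOCKET_ID_ANY)
		return TOY_DEFAULT_SOCKET;
	return socket_id;
}
```

Both variants behave the same for valid ids and for the "any" value. The difference is only in where the signed to unsigned conversion happens, and whether a given compiler warns about either form depends on the warning flags in use.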
[dpdk-dev] [PATCH v1 4/5] eal: add per rx queue interrupt handling based on VFIO
> -Original Message- > From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Danny Zhou > Sent: Wednesday, January 28, 2015 2:51 AM > To: dev at dpdk.org > Subject: [dpdk-dev] [PATCH v1 4/5] eal: add per rx queue interrupt handling > based on VFIO > > Signed-off-by: Danny Zhou > Signed-off-by: Yong Liu > --- > lib/librte_eal/common/include/rte_eal.h| 9 + > lib/librte_eal/linuxapp/eal/eal_interrupts.c | 186 > - > lib/librte_eal/linuxapp/eal/eal_pci_vfio.c | 11 +- > .../linuxapp/eal/include/exec-env/rte_interrupts.h | 4 + > 4 files changed, 168 insertions(+), 42 deletions(-) > > diff --git a/lib/librte_eal/common/include/rte_eal.h > b/lib/librte_eal/common/include/rte_eal.h > index f4ecd2e..5f31aa5 100644 > --- a/lib/librte_eal/common/include/rte_eal.h > +++ b/lib/librte_eal/common/include/rte_eal.h > @@ -150,6 +150,15 @@ int rte_eal_iopl_init(void); > * - On failure, a negative error value. > */ > int rte_eal_init(int argc, char **argv); > + > +/** > + * @param port_id > + * the port id > + * @return > + * - On success, return 0 [LCM] It has changes to return -1. > + */ > +int rte_eal_wait_rx_intr(uint8_t port_id, uint8_t queue_id); > + > /** > * Usage function typedef used by the application usage function. 
> * > diff --git a/lib/librte_eal/linuxapp/eal/eal_interrupts.c > b/lib/librte_eal/linuxapp/eal/eal_interrupts.c > index dc2668a..b120303 100644 > --- a/lib/librte_eal/linuxapp/eal/eal_interrupts.c > +++ b/lib/librte_eal/linuxapp/eal/eal_interrupts.c > @@ -64,6 +64,7 @@ > #include > #include > #include > +#include > > #include "eal_private.h" > #include "eal_vfio.h" > @@ -127,6 +128,7 @@ static pthread_t intr_thread; > #ifdef VFIO_PRESENT > > #define IRQ_SET_BUF_LEN (sizeof(struct vfio_irq_set) + sizeof(int)) > +#define MSIX_IRQ_SET_BUF_LEN (sizeof(struct vfio_irq_set) + sizeof(int) * > (VFIO_MAX_QUEUE_ID + 1)) > > /* enable legacy (INTx) interrupts */ > static int > @@ -221,7 +223,7 @@ vfio_disable_intx(struct rte_intr_handle *intr_handle) { > /* enable MSI-X interrupts */ > static int > vfio_enable_msi(struct rte_intr_handle *intr_handle) { > - int len, ret; > + int len, ret, max_intr; > char irq_set_buf[IRQ_SET_BUF_LEN]; > struct vfio_irq_set *irq_set; > int *fd_ptr; > @@ -230,12 +232,19 @@ vfio_enable_msi(struct rte_intr_handle *intr_handle) > { > > irq_set = (struct vfio_irq_set *) irq_set_buf; > irq_set->argsz = len; > - irq_set->count = 1; > + if ((!intr_handle->max_intr) || > + (intr_handle->max_intr > VFIO_MAX_QUEUE_ID)) > + max_intr = VFIO_MAX_QUEUE_ID + 1; > + else > + max_intr = intr_handle->max_intr; > + > + irq_set->count = max_intr; > irq_set->flags = VFIO_IRQ_SET_DATA_EVENTFD | > VFIO_IRQ_SET_ACTION_TRIGGER; > irq_set->index = VFIO_PCI_MSI_IRQ_INDEX; > irq_set->start = 0; > fd_ptr = (int *) &irq_set->data; > - *fd_ptr = intr_handle->fd; > + memcpy(fd_ptr, intr_handle->queue_fd, sizeof(intr_handle->queue_fd)); > + fd_ptr[max_intr - 1] = intr_handle->fd; > > ret = ioctl(intr_handle->vfio_dev_fd, VFIO_DEVICE_SET_IRQS, irq_set); > > @@ -244,23 +253,6 @@ vfio_enable_msi(struct rte_intr_handle *intr_handle) { > intr_handle->fd); > return -1; > } > - > - /* manually trigger interrupt to enable it */ > - memset(irq_set, 0, len); > - len = sizeof(struct 
vfio_irq_set); > - irq_set->argsz = len; > - irq_set->count = 1; > - irq_set->flags = VFIO_IRQ_SET_DATA_NONE | > VFIO_IRQ_SET_ACTION_TRIGGER; > - irq_set->index = VFIO_PCI_MSI_IRQ_INDEX; > - irq_set->start = 0; > - > - ret = ioctl(intr_handle->vfio_dev_fd, VFIO_DEVICE_SET_IRQS, irq_set); > - > - if (ret) { > - RTE_LOG(ERR, EAL, "Error triggering MSI interrupts for fd %d\n", > - intr_handle->fd); > - return -1; > - } > return 0; > } > > @@ -292,8 +284,8 @@ vfio_disable_msi(struct rte_intr_handle *intr_handle) { > /* enable MSI-X interrupts */ > static int > vfio_enable_msix(struct rte_intr_handle *intr_handle) { > - int len, ret; > - char irq_set_buf[IRQ_SET_BUF_LEN]; > + int len, ret, max_intr; > + char irq_set_buf[MSIX_IRQ_SET_BUF_LEN]; > struct vfio_irq_set *irq_set; > int *fd_ptr; > > @@ -301,12 +293,19 @@ vfio_enable_msix(struct rte_intr_handle *intr_handle) > { > > irq_set = (struct vfio_irq_set *) irq_set_buf; > irq_set->argsz = len; > - irq_set->count = 1; > + if ((!intr_handle->max_intr) || > + (intr_handle->max_intr > VFIO_MAX_QUEUE_ID)) > + max_intr = VFIO_MAX_QUEUE_ID + 1; > + else > + max_intr = intr_handle->max_intr; > + > + irq_set->count = max_intr; > irq_set->flags = VFIO_IRQ_SET_DATA_EVENTFD | > VFIO_IRQ_SET_
[dpdk-dev] [PATCH v1 5/5] L3fwd-power: enable one-shot rx interrupt and polling/interrupt mode switch
> -Original Message- > From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Danny Zhou > Sent: Wednesday, January 28, 2015 2:51 AM > To: dev at dpdk.org > Subject: [dpdk-dev] [PATCH v1 5/5] L3fwd-power: enable one-shot rx interrupt > and polling/interrupt mode switch > > Signed-off-by: Danny Zhou > --- > examples/l3fwd-power/main.c | 170 > +--- > 1 file changed, 129 insertions(+), 41 deletions(-) > > diff --git a/examples/l3fwd-power/main.c b/examples/l3fwd-power/main.c > index f6b55b9..e6e4f55 100644 > --- a/examples/l3fwd-power/main.c > +++ b/examples/l3fwd-power/main.c > @@ -75,12 +75,13 @@ > #include > #include > #include > +#include > > #define RTE_LOGTYPE_L3FWD_POWER RTE_LOGTYPE_USER1 > > #define MAX_PKT_BURST 32 > > -#define MIN_ZERO_POLL_COUNT 5 > +#define MIN_ZERO_POLL_COUNT 10 > > /* around 100ms at 2 Ghz */ > #define TIMER_RESOLUTION_CYCLES 2ULL > @@ -188,6 +189,9 @@ struct lcore_rx_queue { > #define MAX_TX_QUEUE_PER_PORT RTE_MAX_ETHPORTS > #define MAX_RX_QUEUE_PER_PORT 128 > > +#define MAX_RX_QUEUE_INTERRUPT_PER_PORT 16 > + > + > #define MAX_LCORE_PARAMS 1024 > struct lcore_params { > uint8_t port_id; > @@ -214,7 +218,7 @@ static uint16_t nb_lcore_params = > sizeof(lcore_params_array_default) / > > static struct rte_eth_conf port_conf = { > .rxmode = { > - .mq_mode= ETH_MQ_RX_RSS, > + .mq_mode = ETH_MQ_RX_RSS, > .max_rx_pkt_len = ETHER_MAX_LEN, > .split_hdr_size = 0, > .header_split = 0, /**< Header Split disabled */ > @@ -226,11 +230,14 @@ static struct rte_eth_conf port_conf = { > .rx_adv_conf = { > .rss_conf = { > .rss_key = NULL, > - .rss_hf = ETH_RSS_IP, > + .rss_hf = ETH_RSS_UDP, > }, > }, > .txmode = { > - .mq_mode = ETH_DCB_NONE, > + .mq_mode = ETH_MQ_TX_NONE, > + }, > + .intr_conf = { > + .rxq = 1, /**< rxq interrupt feature enabled */ > }, > }; > > @@ -402,19 +409,22 @@ power_timer_cb(__attribute__((unused)) struct > rte_timer *tim, > /* accumulate total execution time in us when callback is invoked */ > sleep_time_ratio = 
(float)(stats[lcore_id].sleep_time) / > (float)SCALING_PERIOD; > - > /** >* check whether need to scale down frequency a step if it sleep a lot. >*/ > - if (sleep_time_ratio >= SCALING_DOWN_TIME_RATIO_THRESHOLD) > - rte_power_freq_down(lcore_id); > + if (sleep_time_ratio >= SCALING_DOWN_TIME_RATIO_THRESHOLD) { > + if (rte_power_freq_down) > + rte_power_freq_down(lcore_id); > + } > else if ( (unsigned)(stats[lcore_id].nb_rx_processed / > - stats[lcore_id].nb_iteration_looped) < MAX_PKT_BURST) > + stats[lcore_id].nb_iteration_looped) < MAX_PKT_BURST) { > /** >* scale down a step if average packet per iteration less >* than expectation. >*/ > - rte_power_freq_down(lcore_id); > + if (rte_power_freq_down) > + rte_power_freq_down(lcore_id); > + } > > /** >* initialize another timer according to current frequency to ensure > @@ -707,22 +717,20 @@ l3fwd_simple_forward(struct rte_mbuf *m, uint8_t > portid, > > } > > -#define SLEEP_GEAR1_THRESHOLD100 > -#define SLEEP_GEAR2_THRESHOLD1000 > +#define MINIMUM_SLEEP_TIME 1 > +#define SUSPEND_THRESHOLD 300 > > static inline uint32_t > power_idle_heuristic(uint32_t zero_rx_packet_count) > { > - /* If zero count is less than 100, use it as the sleep time in us */ > - if (zero_rx_packet_count < SLEEP_GEAR1_THRESHOLD) > - return zero_rx_packet_count; > - /* If zero count is less than 1000, sleep time should be 100 us */ > - else if ((zero_rx_packet_count >= SLEEP_GEAR1_THRESHOLD) && > - (zero_rx_packet_count < SLEEP_GEAR2_THRESHOLD)) > - return SLEEP_GEAR1_THRESHOLD; > - /* If zero count is greater than 1000, sleep time should be 1000 us */ > - else if (zero_rx_packet_count >= SLEEP_GEAR2_THRESHOLD) > - return SLEEP_GEAR2_THRESHOLD; > + /* If zero count is less than 100, sleep 1us */ > + if (zero_rx_packet_count < SUSPEND_THRESHOLD) > + return MINIMUM_SLEEP_TIME; > + /* If zero count is less than 1000, sleep 100 us which is the minimum > latency > + switching from C3/C6 to C0 > + */ > + else > + return SUSPEND_THRESHOLD; > > return 0; 
> } > @@ -762,6 +770,35 @@ power_freq_scaleup_heuristic(unsigned lcore_id, > return FREQ_CURRENT; > } > > +/** > + * force polling thread sleep until one-shot rx interrupt triggers > + * @param
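For reference, a minimal sketch of the reworked idle heuristic from the hunk above, assuming (as the removed comments say) that the return value is a sleep time in microseconds. The comments kept in the patch still mention 100 and 1000, while the constants are now 1 and 300; the sketch comments follow the constants. How the caller reacts to a return value of SUSPEND_THRESHOLD is truncated in this message, so the remark about waiting for the rx interrupt is an assumption.

```c
#include <stdint.h>

#define MINIMUM_SLEEP_TIME 1   /* us, as in the patch */
#define SUSPEND_THRESHOLD  300 /* zero-poll count and us, as in the patch */

/*
 * Map the number of consecutive empty polls to a sleep hint.
 * Below the threshold: sleep the minimum time and keep polling.
 * At or above it: return SUSPEND_THRESHOLD, which the caller
 * (assumption, not shown in the truncated patch) can treat as the
 * cue to stop polling and wait for the one-shot rx interrupt.
 */
static inline uint32_t
power_idle_heuristic(uint32_t zero_rx_packet_count)
{
	if (zero_rx_packet_count < SUSPEND_THRESHOLD)
		return MINIMUM_SLEEP_TIME;

	return SUSPEND_THRESHOLD;
}
```

The trailing "return 0;" of the patched function is unreachable after the if/else, so it is left out here.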
[dpdk-dev] [BUG] ixgbe vector cannot compile without bulk alloc
> -Original Message- > From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Bruce Richardson > Sent: Thursday, January 29, 2015 4:28 PM > To: Thomas Monjalon > Cc: dev at dpdk.org > Subject: Re: [dpdk-dev] [BUG] ixgbe vector cannot compile without bulk alloc > > On Thu, Jan 29, 2015 at 11:18:01PM +0100, Thomas Monjalon wrote: > > 2014-12-01 18:22, Thomas Monjalon: > > > 2014-12-01 17:18, Bruce Richardson: > > > > On Mon, Dec 01, 2014 at 06:10:18PM +0100, Thomas Monjalon wrote: > > > > > These 2 configuration options are incompatible: > > > > > CONFIG_RTE_LIBRTE_IXGBE_RX_ALLOW_BULK_ALLOC=n > > > > > CONFIG_RTE_IXGBE_INC_VECTOR=y > > > > > Building this config gives this error: > > > > > lib/librte_pmd_ixgbe/ixgbe_rxtx_vec.c:69:24: > > > > > error: ?struct igb_rx_queue? has no member named ?fake_mbuf? > > > > > > > > > > I'd like a confirmation that it will be always incompatible. > > > > > Thanks > > > > > > > > Hi Thomas, > > > > > > > > I don't think these options should always be incompatible, though as you > point > > > > out you do need to turn on bulk alloc support in order to use the vector > PMD. > > > > Why do you ask? There are no immediate plans to remove the dependency > on our end. > > > > So you confirm that the ixgbe vpmd really needs Rx bulk alloc and this kind > > of > > patch cannot work at all (I don't know the design of vpmd): > > > > --- a/lib/librte_pmd_ixgbe/ixgbe_rxtx.c > > +++ b/lib/librte_pmd_ixgbe/ixgbe_rxtx.c > > @@ -2119,12 +2119,12 @@ ixgbe_reset_rx_queue(struct igb_rx_queue *rxq) > > rxq->rx_ring[i] = zeroed_desc; > > } > > > > -#ifdef RTE_LIBRTE_IXGBE_RX_ALLOW_BULK_ALLOC > > /* > > * initialize extra software ring entries. 
Space for these extra > > * entries is always allocated > > */ > > memset(&rxq->fake_mbuf, 0x0, sizeof(rxq->fake_mbuf)); > > +#ifdef RTE_LIBRTE_IXGBE_RX_ALLOW_BULK_ALLOC > > for (i = 0; i < RTE_PMD_IXGBE_RX_MAX_BURST; ++i) { > > rxq->sw_ring[rxq->nb_rx_desc + i].mbuf = > &rxq->fake_mbuf; > > } > > --- a/lib/librte_pmd_ixgbe/ixgbe_rxtx.h > > +++ b/lib/librte_pmd_ixgbe/ixgbe_rxtx.h > > @@ -127,9 +127,9 @@ struct igb_rx_queue { > > uint8_t crc_len; /**< 0 if CRC stripped, 4 otherwise. > */ > > uint8_t drop_en; /**< If not 0, set SRRCTL.Drop_En. > */ > > uint8_t rx_deferred_start; /**< not in global dev start. > */ > > -#ifdef RTE_LIBRTE_IXGBE_RX_ALLOW_BULK_ALLOC > > /** need to alloc dummy mbuf, for wraparound when scanning hw > ring */ > > struct rte_mbuf fake_mbuf; > > +#ifdef RTE_LIBRTE_IXGBE_RX_ALLOW_BULK_ALLOC > > /** hold packets to return to application */ > > struct rte_mbuf *rx_stage[RTE_PMD_IXGBE_RX_MAX_BURST*2]; > > #endif > > > > > I think the compilation shouldn't fail without a proper message. > > > In order to distinguish a real compilation error from an incompatibility, > > > we should add a warning in the makefile. > > > Ideally, the build system should handle dependencies. But waiting this > > > ideal > > > time, a warning would be graceful. > > > > Do you agree that something like this would be OK? > > > > --- a/lib/librte_pmd_ixgbe/Makefile > > +++ b/lib/librte_pmd_ixgbe/Makefile > > @@ -114,4 +114,8 @@ DEPDIRS-$(CONFIG_RTE_LIBRTE_IXGBE_PMD) += > lib/librte_eal lib/librte_ether > > DEPDIRS-$(CONFIG_RTE_LIBRTE_IXGBE_PMD) += lib/librte_mempool > lib/librte_mbuf > > DEPDIRS-$(CONFIG_RTE_LIBRTE_IXGBE_PMD) += lib/librte_net > lib/librte_malloc > > > > +ifeq > ($(CONFIG_RTE_IXGBE_INC_VECTOR)$(CONFIG_RTE_LIBRTE_IXGBE_RX_ALLOW_B > ULK_ALLOC),yn) > > +$(error The ixgbe vpmd depends on Rx bulk alloc) > > +endif > > + > > include $(RTE_SDK)/mk/rte.lib.mk > > > > Something like the above looks like a good solution to me. 
> > /Bruce [Liang, Cunming] To avoid the compile error, this one is OK. It is also doable to remove the dependency between the two options; we can submit that as a separate patch. > > > Thanks > > -- > > Thomas
[dpdk-dev] [PATCH 04/17] ixgbe: support of unified packet type
> -Original Message- > From: Richardson, Bruce > Sent: Thursday, January 29, 2015 4:30 PM > To: Zhang, Helin > Cc: dev at dpdk.org; Cao, Waterman; Liang, Cunming; Liu, Jijiang; Ananyev, > Konstantin > Subject: Re: [PATCH 04/17] ixgbe: support of unified packet type > > On Thu, Jan 29, 2015 at 11:15:52AM +0800, Helin Zhang wrote: > > To unify packet types among all PMDs, bit masks of packet type for > > ol_flags are replaced by unified packet type for Vector PMD. > > > > Two suggestions on the commit log: > 1. Can you add scalar and vector into the titles to make it clear how this > patch and the previous ones differ > 2. Can you add a note calling out performance impacts for this patch. If no > performance impacts, then please note that for reviewers. [Liang, Cunming] Accept, will update it in v2. For performance, lose 1 cycle per packet during 4x10GE io fwd loopback unit test. > > /Bruce > > > Signed-off-by: Cunming Liang > > Signed-off-by: Helin Zhang > > --- > > lib/librte_pmd_ixgbe/ixgbe_rxtx_vec.c | 39 > +++ > > 1 file changed, 21 insertions(+), 18 deletions(-) > > > > diff --git a/lib/librte_pmd_ixgbe/ixgbe_rxtx_vec.c > b/lib/librte_pmd_ixgbe/ixgbe_rxtx_vec.c > > index b54cb19..b3cf7dd 100644 > > --- a/lib/librte_pmd_ixgbe/ixgbe_rxtx_vec.c > > +++ b/lib/librte_pmd_ixgbe/ixgbe_rxtx_vec.c > > @@ -134,44 +134,35 @@ ixgbe_rxq_rearm(struct igb_rx_queue *rxq) > > */ > > #ifdef RTE_IXGBE_RX_OLFLAGS_ENABLE > > > > -#define OLFLAGS_MASK ((uint16_t)(PKT_RX_VLAN_PKT | > PKT_RX_IPV4_HDR |\ > > -PKT_RX_IPV4_HDR_EXT | PKT_RX_IPV6_HDR |\ > > -PKT_RX_IPV6_HDR_EXT)) > > -#define OLFLAGS_MASK_V (((uint64_t)OLFLAGS_MASK << 48) | \ > > - ((uint64_t)OLFLAGS_MASK << 32) | \ > > - ((uint64_t)OLFLAGS_MASK << 16) | \ > > - ((uint64_t)OLFLAGS_MASK)) > > -#define PTYPE_SHIFT(1) > > +#define OLFLAGS_MASK_V (((uint64_t)PKT_RX_VLAN_PKT << 48) | \ > > + ((uint64_t)PKT_RX_VLAN_PKT << 32) | \ > > + ((uint64_t)PKT_RX_VLAN_PKT << 16) | \ > > + ((uint64_t)PKT_RX_VLAN_PKT)) > > #define 
VTAG_SHIFT (3) > > > > static inline void > > desc_to_olflags_v(__m128i descs[4], struct rte_mbuf **rx_pkts) > > { > > - __m128i ptype0, ptype1, vtag0, vtag1; > > + __m128i vtag0, vtag1; > > union { > > uint16_t e[4]; > > uint64_t dword; > > } vol; > > > > - ptype0 = _mm_unpacklo_epi16(descs[0], descs[1]); > > - ptype1 = _mm_unpacklo_epi16(descs[2], descs[3]); > > vtag0 = _mm_unpackhi_epi16(descs[0], descs[1]); > > vtag1 = _mm_unpackhi_epi16(descs[2], descs[3]); > > > > - ptype1 = _mm_unpacklo_epi32(ptype0, ptype1); > > vtag1 = _mm_unpacklo_epi32(vtag0, vtag1); > > - > > - ptype1 = _mm_slli_epi16(ptype1, PTYPE_SHIFT); > > vtag1 = _mm_srli_epi16(vtag1, VTAG_SHIFT); > > > > - ptype1 = _mm_or_si128(ptype1, vtag1); > > - vol.dword = _mm_cvtsi128_si64(ptype1) & OLFLAGS_MASK_V; > > + vol.dword = _mm_cvtsi128_si64(vtag1) & OLFLAGS_MASK_V; > > > > rx_pkts[0]->ol_flags = vol.e[0]; > > rx_pkts[1]->ol_flags = vol.e[1]; > > rx_pkts[2]->ol_flags = vol.e[2]; > > rx_pkts[3]->ol_flags = vol.e[3]; > > } > > + > > #else > > #define desc_to_olflags_v(desc, rx_pkts) do {} while (0) > > #endif > > @@ -204,6 +195,8 @@ _recv_raw_pkts_vec(struct igb_rx_queue *rxq, struct > rte_mbuf **rx_pkts, > > 0/* ignore pkt_type field */ > > ); > > __m128i dd_check, eop_check; > > + __m128i desc_mask = _mm_set_epi32(0x, 0x, > > + 0x, 0x07F0); > > > > if (unlikely(nb_pkts < RTE_IXGBE_VPMD_RX_BURST)) > > return 0; > > @@ -239,7 +232,8 @@ _recv_raw_pkts_vec(struct igb_rx_queue *rxq, struct > rte_mbuf **rx_pkts, > > 0xFF, 0xFF, /* skip high 16 bits pkt_len, zero out */ > > 13, 12, /* octet 12~13, low 16 bits pkt_len */ > > 13, 12, /* octet 12~13, 16 bits data_len */ > > - 0xFF, 0xFF /* skip pkt_type field */ > > + 1, /* octet 1, 8 bits pkt_type field */ > > + 0/* octet 0, 4 bits offset 4 pkt_type field */ &g
[dpdk-dev] [BUG] ixgbe vector cannot compile without bulk alloc
> -Original Message- > From: Richardson, Bruce > Sent: Thursday, January 29, 2015 8:38 PM > To: Liang, Cunming > Cc: Thomas Monjalon; dev at dpdk.org > Subject: Re: [dpdk-dev] [BUG] ixgbe vector cannot compile without bulk alloc > > On Thu, Jan 29, 2015 at 11:39:37PM +, Liang, Cunming wrote: > > > > > > > -Original Message- > > > From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Bruce Richardson > > > Sent: Thursday, January 29, 2015 4:28 PM > > > To: Thomas Monjalon > > > Cc: dev at dpdk.org > > > Subject: Re: [dpdk-dev] [BUG] ixgbe vector cannot compile without bulk > > > alloc > > > > > > On Thu, Jan 29, 2015 at 11:18:01PM +0100, Thomas Monjalon wrote: > > > > 2014-12-01 18:22, Thomas Monjalon: > > > > > 2014-12-01 17:18, Bruce Richardson: > > > > > > On Mon, Dec 01, 2014 at 06:10:18PM +0100, Thomas Monjalon wrote: > > > > > > > These 2 configuration options are incompatible: > > > > > > > CONFIG_RTE_LIBRTE_IXGBE_RX_ALLOW_BULK_ALLOC=n > > > > > > > CONFIG_RTE_IXGBE_INC_VECTOR=y > > > > > > > Building this config gives this error: > > > > > > > lib/librte_pmd_ixgbe/ixgbe_rxtx_vec.c:69:24: > > > > > > > error: ?struct igb_rx_queue? has no member named > ?fake_mbuf? > > > > > > > > > > > > > > I'd like a confirmation that it will be always incompatible. > > > > > > > Thanks > > > > > > > > > > > > Hi Thomas, > > > > > > > > > > > > I don't think these options should always be incompatible, though as > you > > > point > > > > > > out you do need to turn on bulk alloc support in order to use the > > > > > > vector > > > PMD. > > > > > > Why do you ask? There are no immediate plans to remove the > dependency > > > on our end. 
> > > > > > > > So you confirm that the ixgbe vpmd really needs Rx bulk alloc and this > > > > kind > of > > > > patch cannot work at all (I don't know the design of vpmd): > > > > > > > > --- a/lib/librte_pmd_ixgbe/ixgbe_rxtx.c > > > > +++ b/lib/librte_pmd_ixgbe/ixgbe_rxtx.c > > > > @@ -2119,12 +2119,12 @@ ixgbe_reset_rx_queue(struct igb_rx_queue > *rxq) > > > > rxq->rx_ring[i] = zeroed_desc; > > > > } > > > > > > > > -#ifdef RTE_LIBRTE_IXGBE_RX_ALLOW_BULK_ALLOC > > > > /* > > > > * initialize extra software ring entries. Space for these extra > > > > * entries is always allocated > > > > */ > > > > memset(&rxq->fake_mbuf, 0x0, sizeof(rxq->fake_mbuf)); > > > > +#ifdef RTE_LIBRTE_IXGBE_RX_ALLOW_BULK_ALLOC > > > > for (i = 0; i < RTE_PMD_IXGBE_RX_MAX_BURST; ++i) { > > > > rxq->sw_ring[rxq->nb_rx_desc + i].mbuf = > > > &rxq->fake_mbuf; > > > > } > > > > --- a/lib/librte_pmd_ixgbe/ixgbe_rxtx.h > > > > +++ b/lib/librte_pmd_ixgbe/ixgbe_rxtx.h > > > > @@ -127,9 +127,9 @@ struct igb_rx_queue { > > > > uint8_t crc_len; /**< 0 if CRC stripped, 4 > otherwise. > > > */ > > > > uint8_t drop_en; /**< If not 0, set > SRRCTL.Drop_En. > > > */ > > > > uint8_t rx_deferred_start; /**< not in global dev > start. > > > */ > > > > -#ifdef RTE_LIBRTE_IXGBE_RX_ALLOW_BULK_ALLOC > > > > /** need to alloc dummy mbuf, for wraparound when scanning > hw > > > ring */ > > > > struct rte_mbuf fake_mbuf; > > > > +#ifdef RTE_LIBRTE_IXGBE_RX_ALLOW_BULK_ALLOC > > > > /** hold packets to return to application */ > > > > struct rte_mbuf *rx_stage[RTE_PMD_IXGBE_RX_MAX_BURST*2]; > > > > #endif > > > > > > > > > I think the compilation shouldn't fail without a proper message. > > > > > In order to distinguish a real compilation error from an > > > > > incompatibility, > > > > > we should add a warning in the makefile. > > > > > Ideally, the build system should handle dependencies. But waiting this > ideal > > > > > time, a warning would be graceful. > > > >
[dpdk-dev] [PATCH v4 01/17] eal: add cpuset into per EAL thread lcore_config
> -Original Message- > From: Olivier MATZ [mailto:olivier.matz at 6wind.com] > Sent: Monday, February 09, 2015 4:00 AM > To: Liang, Cunming; dev at dpdk.org > Subject: Re: [dpdk-dev] [PATCH v4 01/17] eal: add cpuset into per EAL thread > lcore_config > > Hi, > > On 02/02/2015 03:02 AM, Cunming Liang wrote: > > The patch adds 'cpuset' into per-lcore configure 'lcore_config[]', > > as the lcore no longer always 1:1 pinning with physical cpu. > > The lcore now stands for a EAL thread rather than a logical cpu. > > > > It doesn't change the default behavior of 1:1 mapping, but allows to > > affinity the EAL thread to multiple cpus. > > > > [...] > > diff --git a/lib/librte_eal/bsdapp/eal/eal_memory.c > b/lib/librte_eal/bsdapp/eal/eal_memory.c > > index 65ee87d..a34d500 100644 > > --- a/lib/librte_eal/bsdapp/eal/eal_memory.c > > +++ b/lib/librte_eal/bsdapp/eal/eal_memory.c > > @@ -45,6 +45,8 @@ > > #include "eal_internal_cfg.h" > > #include "eal_filesystem.h" > > > > +/* avoid re-defined against with freebsd header */ > > +#undef PAGE_SIZE > > #define PAGE_SIZE (sysconf(_SC_PAGESIZE)) > > I don't see the link with the patch. Should this go somewhere else? > > > > > > /* > > diff --git a/lib/librte_eal/common/include/rte_lcore.h > b/lib/librte_eal/common/include/rte_lcore.h > > index 49b2c03..4c7d6bb 100644 > > --- a/lib/librte_eal/common/include/rte_lcore.h > > +++ b/lib/librte_eal/common/include/rte_lcore.h > > @@ -50,6 +50,13 @@ extern "C" { > > > > #define LCORE_ID_ANY -1/**< Any lcore. */ > > > > +#if defined(__linux__) > > + typedef cpu_set_t rte_cpuset_t; > > +#elif defined(__FreeBSD__) > > +#include > > + typedef cpuset_t rte_cpuset_t; > > +#endif > > + > > Should we also define RTE_CPU_SETSIZE? > For linux, should be included? [LCM] It uses the fix size cpuset, won't use CPU_ALLOC() to get the pointer of cpuset. The RTE_CPU_SETSIZE always equal to sizeof(rte_cpuset_t). 
> > If I understand well, after the patch series, the users of > rte_thread_set_affinity() and rte_thread_get_affinity() are > supposed to use the macros from sched.h to access this > cpuset parameter. So I'm wondering if it's not better to > use cpu_set_t from libc instead of redefining rte_cpuset_t. > > To reword my question: what is the purpose of redefining > cpu_set_t as rte_cpuset_t if we still need to use all the > libc API to access it? [LCM] On Linux the type is *cpu_set_t*, but on FreeBSD it is *cpuset_t*. The purpose of *rte_cpuset_t* is to give EAL one consistent type name and to avoid scattering #ifdefs for this difference. On either Linux or FreeBSD, the libc macros can still be used to manipulate an rte_cpuset_t. > > > Regards, > Olivier
[dpdk-dev] [PATCH v4 02/17] eal: new eal option '--lcores' for cpu assignment
> -Original Message- > From: Olivier MATZ [mailto:olivier.matz at 6wind.com] > Sent: Monday, February 09, 2015 4:00 AM > To: Liang, Cunming; dev at dpdk.org > Subject: Re: [dpdk-dev] [PATCH v4 02/17] eal: new eal option '--lcores' for > cpu > assignment > > Hi, > > On 02/02/2015 03:02 AM, Cunming Liang wrote: > > It supports one new eal long option '--lcores' for EAL thread cpuset > > assignment. > > > > The format pattern: > > --lcores='lcores[@cpus]<,lcores[@cpus]>' > > lcores, cpus could be a single digit/range or a group. > > '(' and ')' are necessary if it's a group. > > If not supply '@cpus', the value of cpus uses the same as lcores. > > > > e.g. '1,2@(5-7),(3-5)@(0,2),(0,6),7-8' means starting 9 EAL thread as below > > lcore 0 runs on cpuset 0x41 (cpu 0,6) > > lcore 1 runs on cpuset 0x2 (cpu 1) > > lcore 2 runs on cpuset 0xe0 (cpu 5,6,7) > > lcore 3,4,5 runs on cpuset 0x5 (cpu 0,2) > > lcore 6 runs on cpuset 0x41 (cpu 0,6) > > lcore 7 runs on cpuset 0x80 (cpu 7) > > lcore 8 runs on cpuset 0x100 (cpu 8) > > > > Signed-off-by: Cunming Liang > > --- > > lib/librte_eal/common/eal_common_launch.c | 1 - > > lib/librte_eal/common/eal_common_options.c | 300 > - > > lib/librte_eal/common/eal_options.h| 2 + > > lib/librte_eal/linuxapp/eal/Makefile | 1 + > > 4 files changed, 299 insertions(+), 5 deletions(-) > > > > diff --git a/lib/librte_eal/common/eal_common_launch.c > b/lib/librte_eal/common/eal_common_launch.c > > index 599f83b..2d732b1 100644 > > --- a/lib/librte_eal/common/eal_common_launch.c > > +++ b/lib/librte_eal/common/eal_common_launch.c > > @@ -117,4 +117,3 @@ rte_eal_mp_wait_lcore(void) > > rte_eal_wait_lcore(lcore_id); > > } > > } > > - > > > This line should be removed from the patch. [LCM] Accept. 
> > > > diff --git a/lib/librte_eal/common/eal_common_options.c > b/lib/librte_eal/common/eal_common_options.c > > index 67e02dc..29ebb6f 100644 > > --- a/lib/librte_eal/common/eal_common_options.c > > +++ b/lib/librte_eal/common/eal_common_options.c > > @@ -45,6 +45,7 @@ > > #include > > #include > > #include > > +#include > > > > #include "eal_internal_cfg.h" > > #include "eal_options.h" > > @@ -85,6 +86,7 @@ eal_long_options[] = { > > {OPT_XEN_DOM0, 0, 0, OPT_XEN_DOM0_NUM}, > > {OPT_CREATE_UIO_DEV, 1, NULL, OPT_CREATE_UIO_DEV_NUM}, > > {OPT_VFIO_INTR, 1, NULL, OPT_VFIO_INTR_NUM}, > > + {OPT_LCORES, 1, 0, OPT_LCORES_NUM}, > > {0, 0, 0, 0} > > }; > > > > @@ -255,9 +257,11 @@ eal_parse_corelist(const char *corelist) > > if (min == RTE_MAX_LCORE) > > min = idx; > > for (idx = min; idx <= max; idx++) { > > - cfg->lcore_role[idx] = ROLE_RTE; > > - lcore_config[idx].core_index = count; > > - count++; > > + if (cfg->lcore_role[idx] != ROLE_RTE) { > > + cfg->lcore_role[idx] = ROLE_RTE; > > + lcore_config[idx].core_index = count; > > + count++; > > + } > > } > > min = RTE_MAX_LCORE; > > } else > > @@ -292,6 +296,279 @@ eal_parse_master_lcore(const char *arg) > > return 0; > > } > > > > +/* > > + * Parse elem, the elem could be single number/range or '(' ')' group > > + * Within group elem, '-' used for a range seperator; > > + *',' used for a single number. > > + */ > > +static int > > +eal_parse_set(const char *input, uint16_t set[], unsigned num) > > It's not very clear what elem is. Maybe it could be a bit reworded. > What about naming the function "eal_parse_cpuset()" instead? [LCM] As it not only parse cpuset but also used for lcore set, so 'eal_parse_cpuset' is not accurate. The set/elem here identify for a single number (e.g. 1), a number range (e.g. 4-6) or a group (e.g. (3,4-8,9) ). I'll reword the comment for better understand. Thanks. > > > > +{ > > + unsigned idx; > > + const char *str = input; > > + char *end = NULL; >
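The element syntax discussed above (a single number, a range, or a parenthesized group) can be sketched as follows. This is not the eal_parse_set() from the patch: sketch_parse_set() is a hypothetical helper that packs the result into a 64-bit mask instead of a uint16_t set[] array, so it only handles ids 0-63.

```c
#include <ctype.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * Hypothetical helper, for illustration only.
 * Parse one element: "N", "N-M" or "(a,b-c,...)" into a bit mask.
 * Returns a pointer just past the element (e.g. at '@', ',' or the
 * terminating nul), or NULL on a syntax error or an id above 63.
 */
static const char *
sketch_parse_set(const char *s, uint64_t *mask)
{
	int group = 0;

	*mask = 0;
	while (isblank((unsigned char)*s))
		s++;
	if (*s == '(') {
		group = 1;
		s++;
	}

	for (;;) {
		char *end;
		unsigned long lo, hi;

		if (!isdigit((unsigned char)*s))
			return NULL;
		lo = strtoul(s, &end, 10);
		hi = lo;
		s = end;

		/* '-' separates the two ends of a range */
		if (*s == '-') {
			s++;
			if (!isdigit((unsigned char)*s))
				return NULL;
			hi = strtoul(s, &end, 10);
			s = end;
		}
		if (lo > hi || hi >= 64)
			return NULL;
		for (; lo <= hi; lo++)
			*mask |= UINT64_C(1) << lo;

		/* ',' continues the list only inside a group */
		if (group && *s == ',') {
			s++;
			continue;
		}
		break;
	}

	if (group) {
		if (*s != ')')
			return NULL;
		s++;
	}
	return s;
}
```

With the example from the commit log, "(5-7)" yields 0xe0 and "(0,6)" yields 0x41, matching the cpusets listed there. For "2@(5-7)" the helper stops at the '@', so a caller would parse the lcore part and the cpu part with two calls.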
[dpdk-dev] [PATCH v4 03/17] eal: fix wrong strnlen() return value in 32bit icc
> -Original Message- > From: Olivier MATZ [mailto:olivier.matz at 6wind.com] > Sent: Monday, February 09, 2015 4:00 AM > To: Liang, Cunming; dev at dpdk.org > Subject: Re: [dpdk-dev] [PATCH v4 03/17] eal: fix wrong strnlen() return > value in > 32bit icc > > Hi, > > On 02/02/2015 03:02 AM, Cunming Liang wrote: > > The problem is that strnlen() here may return an invalid value with 32bit icc. > > (actually it returns its second parameter, e.g. sysconf(_SC_ARG_MAX)). > > It starts to manifest when the max_len parameter is > 2M and using icc -m32 -O2 > (or above). > > > > Suggested-by: Konstantin Ananyev > > Signed-off-by: Cunming Liang > > --- > > lib/librte_eal/common/eal_common_options.c | 4 ++-- > > 1 file changed, 2 insertions(+), 2 deletions(-) > > > > diff --git a/lib/librte_eal/common/eal_common_options.c > b/lib/librte_eal/common/eal_common_options.c > > index 29ebb6f..22d5d37 100644 > > --- a/lib/librte_eal/common/eal_common_options.c > > +++ b/lib/librte_eal/common/eal_common_options.c > > @@ -227,7 +227,7 @@ eal_parse_corelist(const char *corelist) > > /* Remove all blank characters ahead and after */ > > while (isblank(*corelist)) > > corelist++; > > - i = strnlen(corelist, sysconf(_SC_ARG_MAX)); > > + i = strnlen(corelist, PATH_MAX); > > while ((i > 0) && isblank(corelist[i - 1])) > > i--; > > > > @@ -469,7 +469,7 @@ eal_parse_lcores(const char *lcores) > > /* Remove all blank characters ahead and after */ > > while (isblank(*lcores)) > > lcores++; > > - i = strnlen(lcores, sysconf(_SC_ARG_MAX)); > > + i = strnlen(lcores, PATH_MAX); > > while ((i > 0) && isblank(lcores[i - 1])) > > i--; > > > > > > I think PATH_MAX is not equivalent to _SC_ARG_MAX. > > But the main question is: why do we need to use strnlen() here instead > of strlen? We can expect that argv[] pointers are always nul-terminated. > Replacing them by strlen() would probably also solve the icc issue. [LCM] You're right, strlen() here would also solve the icc issue, and there is no risk for argv[].
But following the usual practice, continuing to use the bounded 'n' variants in DPDK is not bad. There are two additional reasons to keep strnlen and PATH_MAX. 1. PATH_MAX is defined as 4096, which is large enough for our input; it doesn't matter whether it equals _SC_ARG_MAX or not. 2. strnlen and PATH_MAX are already used in eal_parse_coremask, so this keeps the style consistent between '-l' and '--lcores'. > > Regards, > Olivier
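For reference, the blank-trimming pattern that both hunks share can be sketched on its own. sketch_trimmed_len() is a hypothetical name; the bound is PATH_MAX as in the patch, and the sketch assumes a POSIX.1-2008 libc that provides strnlen().

```c
#include <ctype.h>
#include <limits.h>
#include <stddef.h>
#include <string.h>

#ifndef PATH_MAX
#define PATH_MAX 4096 /* fallback, the value quoted in the thread */
#endif

/*
 * Hypothetical helper, for illustration only.
 * Skip leading blanks, then return the length of the argument
 * without its trailing blanks. *start is set to the first
 * non-blank character.
 */
static size_t
sketch_trimmed_len(const char *arg, const char **start)
{
	size_t i;

	while (isblank((unsigned char)*arg))
		arg++;

	/* bounded length, as in the patched eal_parse_corelist() */
	i = strnlen(arg, PATH_MAX);
	while (i > 0 && isblank((unsigned char)arg[i - 1]))
		i--;

	*start = arg;
	return i;
}
```

Since argv[] strings are nul-terminated, strlen() would return the same length for any input shorter than PATH_MAX; the bound only matters for longer strings, which strnlen() caps at PATH_MAX.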
[dpdk-dev] [PATCH v4 04/17] eal: add support parsing socket_id from cpuset
> -Original Message- > From: Olivier MATZ [mailto:olivier.matz at 6wind.com] > Sent: Monday, February 09, 2015 4:00 AM > To: Liang, Cunming; dev at dpdk.org > Subject: Re: [dpdk-dev] [PATCH v4 04/17] eal: add support parsing socket_id > from cpuset > > Hi, > > On 02/02/2015 03:02 AM, Cunming Liang wrote: > > It returns the socket_id if all cpus in the cpuset belongs > > to the same NUMA node, otherwise it will return SOCKET_ID_ANY. > > > > Signed-off-by: Cunming Liang > > --- > > lib/librte_eal/bsdapp/eal/eal_lcore.c | 7 + > > lib/librte_eal/common/eal_thread.h | 52 > + > > lib/librte_eal/linuxapp/eal/eal_lcore.c | 7 + > > 3 files changed, 66 insertions(+) > > > > diff --git a/lib/librte_eal/bsdapp/eal/eal_lcore.c > b/lib/librte_eal/bsdapp/eal/eal_lcore.c > > index 72f8ac2..162fb4f 100644 > > --- a/lib/librte_eal/bsdapp/eal/eal_lcore.c > > +++ b/lib/librte_eal/bsdapp/eal/eal_lcore.c > > @@ -41,6 +41,7 @@ > > #include > > > > #include "eal_private.h" > > +#include "eal_thread.h" > > > > /* No topology information available on FreeBSD including NUMA info */ > > #define cpu_core_id(X) 0 > > @@ -112,3 +113,9 @@ rte_eal_cpu_init(void) > > > > return 0; > > } > > + > > +unsigned > > +eal_cpu_socket_id(__rte_unused unsigned cpu_id) > > +{ > > + return cpu_socket_id(cpu_id); > > +} > > diff --git a/lib/librte_eal/common/eal_thread.h > b/lib/librte_eal/common/eal_thread.h > > index b53b84d..a25ee86 100644 > > --- a/lib/librte_eal/common/eal_thread.h > > +++ b/lib/librte_eal/common/eal_thread.h > > @@ -34,6 +34,10 @@ > > #ifndef EAL_THREAD_H > > #define EAL_THREAD_H > > > > +#include > > + > > +#include > > + > > /** > > * basic loop of thread, called for each thread by eal_init(). > > * > > @@ -50,4 +54,52 @@ __attribute__((noreturn)) void *eal_thread_loop(void > *arg); > > */ > > void eal_thread_init_master(unsigned lcore_id); > > > > +/** > > + * Get the NUMA socket id from cpu id. > > + * This function is private to EAL. 
> > + * > > + * @param cpu_id > > + * The logical process id. > > + * @return > > + * socket_id or SOCKET_ID_ANY > > + */ > > +unsigned eal_cpu_socket_id(unsigned cpu_id); > > Wouldn't it be better to rename the existing function cpu_socket_id() > in eal_cpu_socket_id() and export it in eal_thread.h? > > In case of bsd where cpu_socket_id() is implemented using a #define, > a new function should be created returning 0. [LCM] In eal_lcore.c, the cpu_socket_id()/cpu_core_id() defined as static and only used in rte_eal_cpu_init(). I suppose the purpose of origin design is to make the sysfs parsing only visible in the file. No matter remove the 'static' prefix of cpu_core_id() or add a new wrap eal_cpu_socket_id(), it results in a new extern EAL API. So I prefer not change the visibility of the origin static function but have one as extern interface. > > > > + > > +/** > > + * Get the NUMA socket id from cpuset. > > + * This function is private to EAL. > > + * > > + * @param cpusetp > > + * The point to a valid cpu set. > > + * @return > > + * socket_id or SOCKET_ID_ANY > > + */ > > +static inline int > > +eal_cpuset_socket_id(rte_cpuset_t *cpusetp) > > +{ > > + unsigned cpu = 0; > > + int socket_id = SOCKET_ID_ANY; > > + int sid; > > + > > + if (cpusetp == NULL) > > + return SOCKET_ID_ANY; > > SOCKET_ID_ANY is not defined, maybe should be included > somewhere. [LCM] Agree with you, eal_cpuset_socket_id() can move into eal_common_thread.c. And add rte_memory.h for SOCKET_ID_ANY reference. > > > + > > + do { > > + if (!CPU_ISSET(cpu, cpusetp)) > > + continue; > > + > > + if (socket_id == SOCKET_ID_ANY) > > + socket_id = eal_cpu_socket_id(cpu); > > + > > + sid = eal_cpu_socket_id(cpu); > > + if (socket_id != sid) { > > + socket_id = SOCKET_ID_ANY; > > + break; > > + } > > + > > + } while (++cpu < RTE_MAX_LCORE); > > + > > + return socket_id; > > +} > > > I don't think this function should be inlined. 
> > As this function is not used, it could be interesting for reviewers > to understand when [LCM] It's used in eal_thread_set_affinity() of eal_thread.c. > > > + > > #endif /* EAL_THREAD_H */ > > diff --git a/lib/librte_eal/linuxapp/eal/eal_lcore.c > b/lib/librte_eal/linuxapp/eal/eal_lcore.c > > index 29615f8..922af6d 100644 > > --- a/lib/librte_eal/linuxapp/eal/eal_lcore.c > > +++ b/lib/librte_eal/linuxapp/eal/eal_lcore.c > > @@ -45,6 +45,7 @@ > > > > #include "eal_private.h" > > #include "eal_filesystem.h" > > +#include "eal_thread.h" > > > > #define SYS_CPU_DIR "/sys/devices/system/cpu/cpu%u" > > #define CORE_ID_FILE "topology/core_id" > > @@ -197,3 +198,9 @@ rte_eal_cpu_init(void) > > > > return 0; > > } > > + > > +unsigned > > +eal_cpu_socket_id(unsigned cpu_id) > > +{ > > + return cpu_socket_id(cpu_id); > > +} > >
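The rule in eal_cpuset_socket_id() (return the common socket if every cpu in the set sits on the same NUMA node, otherwise SOCKET_ID_ANY) can be sketched without the EAL headers. Everything prefixed sketch_/SKETCH_ is hypothetical: the cpuset is modelled as a 64-bit mask rather than rte_cpuset_t, and the cpu-to-socket lookup is a made-up topology standing in for eal_cpu_socket_id().

```c
#include <stdint.h>

#define SKETCH_MAX_CPU       64
#define SKETCH_SOCKET_ID_ANY (-1)

/* Made-up topology: cpus 0-7 on socket 0, all others on socket 1. */
static int
sketch_cpu_socket_id(unsigned cpu)
{
	return cpu < 8 ? 0 : 1;
}

/*
 * Return the socket shared by all cpus in the mask, or
 * SKETCH_SOCKET_ID_ANY if the mask is empty or spans sockets.
 */
static int
sketch_cpuset_socket_id(uint64_t cpumask)
{
	int socket_id = SKETCH_SOCKET_ID_ANY;
	unsigned cpu;

	for (cpu = 0; cpu < SKETCH_MAX_CPU; cpu++) {
		int sid;

		if (!(cpumask & (UINT64_C(1) << cpu)))
			continue;

		sid = sketch_cpu_socket_id(cpu);
		if (socket_id == SKETCH_SOCKET_ID_ANY)
			socket_id = sid;        /* first cpu found */
		else if (socket_id != sid)
			return SKETCH_SOCKET_ID_ANY; /* mixed sockets */
	}

	return socket_id;
}
```

Compared with the do/while in the patch, the for loop looks up the socket once per cpu; the result is the same.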
[dpdk-dev] [PATCH v4 05/17] eal: new TLS definition and API declaration
> -Original Message- > From: Olivier MATZ [mailto:olivier.matz at 6wind.com] > Sent: Monday, February 09, 2015 4:00 AM > To: Liang, Cunming; dev at dpdk.org > Subject: Re: [dpdk-dev] [PATCH v4 05/17] eal: new TLS definition and API > declaration > > Hi, > > On 02/02/2015 03:02 AM, Cunming Liang wrote: > > 1. add two TLS *_socket_id* and *_cpuset* > > 2. add two external API rte_thread_set/get_affinity > > 3. add one internal API eal_thread_dump_affinity > > To me, it's a bit strage to add an API withtout the associated code. > Maybe you have a good reason to do that, but I think in this case it > should be explained in the commit log. [LCM] Accept. > > > > > Signed-off-by: Cunming Liang > > --- > > lib/librte_eal/bsdapp/eal/eal_thread.c| 2 ++ > > lib/librte_eal/common/eal_thread.h| 14 ++ > > lib/librte_eal/common/include/rte_lcore.h | 29 > +++-- > > lib/librte_eal/linuxapp/eal/eal_thread.c | 2 ++ > > 4 files changed, 45 insertions(+), 2 deletions(-) > > > > diff --git a/lib/librte_eal/bsdapp/eal/eal_thread.c > b/lib/librte_eal/bsdapp/eal/eal_thread.c > > index ab05368..10220c7 100644 > > --- a/lib/librte_eal/bsdapp/eal/eal_thread.c > > +++ b/lib/librte_eal/bsdapp/eal/eal_thread.c > > @@ -56,6 +56,8 @@ > > #include "eal_thread.h" > > > > RTE_DEFINE_PER_LCORE(unsigned, _lcore_id); > > +RTE_DEFINE_PER_LCORE(unsigned, _socket_id); > > +RTE_DEFINE_PER_LCORE(rte_cpuset_t, _cpuset); > > > > /* > > * Send a message to a slave lcore identified by slave_id to call a > > diff --git a/lib/librte_eal/common/eal_thread.h > b/lib/librte_eal/common/eal_thread.h > > index a25ee86..28edf51 100644 > > --- a/lib/librte_eal/common/eal_thread.h > > +++ b/lib/librte_eal/common/eal_thread.h > > @@ -102,4 +102,18 @@ eal_cpuset_socket_id(rte_cpuset_t *cpusetp) > > return socket_id; > > } > > > > +/** > > + * Dump the current pthread cpuset. > > + * This function is private to EAL. > > + * > > + * @param str > > + * The string buffer the cpuset will dump to. 
> > + * @param size > > + * The string buffer size. > > + */ > > +#define CPU_STR_LEN256 > > +void > > +eal_thread_dump_affinity(char str[], unsigned size); > > Although it's equivalent for function arguments, I think "char *str" is > usually preferred over "char str[]". See for instance in snprintf() or > fgets(). [LCM] Accept. > > What is the purpose of CPU_STR_LEN? [LCM] For default quick reference for str[] definition used in dump_affinity() > > What occurs if the size of the dump is greater than the size of the > given buffer? Is the string truncated? Is there a \0 at the end? [LCM] Yes, always have a '\0' in the end. > This should be described in the API comments. [LCM] Accept. > Maybe adding a return > value could help the user to determine if the string was truncated. [LCM] Good idea, so the user can continue to print '...' for the truncated part. > > > + > > + > > #endif /* EAL_THREAD_H */ > > diff --git a/lib/librte_eal/common/include/rte_lcore.h > b/lib/librte_eal/common/include/rte_lcore.h > > index 4c7d6bb..facdbdc 100644 > > --- a/lib/librte_eal/common/include/rte_lcore.h > > +++ b/lib/librte_eal/common/include/rte_lcore.h > > @@ -43,6 +43,7 @@ > > #include > > #include > > #include > > +#include > > > > #ifdef __cplusplus > > extern "C" { > > @@ -80,7 +81,9 @@ struct lcore_config { > > */ > > extern struct lcore_config lcore_config[RTE_MAX_LCORE]; > > > > -RTE_DECLARE_PER_LCORE(unsigned, _lcore_id); /**< Per core "core id". */ > > +RTE_DECLARE_PER_LCORE(unsigned, _lcore_id); /**< Per thread "lcore id". > */ > > +RTE_DECLARE_PER_LCORE(unsigned, _socket_id); /**< Per thread "socket id". > */ > > +RTE_DECLARE_PER_LCORE(rte_cpuset_t, _cpuset); /**< Per thread "cpuset". > */ > > > > /** > > * Return the ID of the execution unit we are running on. 
> > @@ -146,7 +149,7 @@ rte_lcore_index(int lcore_id) > > static inline unsigned > > rte_socket_id(void) > > { > > - return lcore_config[rte_lcore_id()].socket_id; > > + return RTE_PER_LCORE(_socket_id); > > } > > I don't see where the _socket_id variable is assigned. I think there > is probably an issue with the splitting of the patches. [LCM] The value is initialized to SOCKET_ID_ANY at the RTE_DEFINE_PER_LCORE() definition. It is then updated in eal_thread_set_affinity() for EAL threads and in rte_thread_set_affinity() for non-EAL threads. > > Regards, > Olivier
[dpdk-dev] [PATCH v4 06/17] eal: add eal_common_thread.c for common thread API
> -Original Message- > From: Olivier MATZ [mailto:olivier.matz at 6wind.com] > Sent: Monday, February 09, 2015 4:00 AM > To: Liang, Cunming; dev at dpdk.org > Subject: Re: [dpdk-dev] [PATCH v4 06/17] eal: add eal_common_thread.c for > common thread API > > Hi, > > On 02/02/2015 03:02 AM, Cunming Liang wrote: > > The API works for both EAL thread and none EAL thread. > > When calling rte_thread_set_affinity, the *_socket_id* and > > *_cpuset* of calling thread will be updated if the thread > > successful set the cpu affinity. > > > > [...] > > +int > > +rte_thread_set_affinity(rte_cpuset_t *cpusetp) > > +{ > > + int s; > > + unsigned lcore_id; > > + pthread_t tid; > > + > > + if (!cpusetp) > > + return -1; > > Is it really needed to test that cpusetp is not NULL? [LCM] Accept, we can ignore it and depend on pthread_setaffinity_np() to return failure. > > > + > > + lcore_id = rte_lcore_id(); > > + if (lcore_id != (unsigned)LCORE_ID_ANY) { > > This is strange to see something that cannot happen: > lcore_id == LCORE_ID_ANY is only possible after your patch is 12/17 > is added. Maybe it can be reordered to avoid this inconsistency? [LCM] You're right, here do some re-order. The point is to make everything ready before switching the default value to -1. And we can have the whole function implement in one patch. It just won't take effect, but won't bring additional risk. 
> > > + /* EAL thread */ > > + tid = lcore_config[lcore_id].thread_id; > > + > > + s = pthread_setaffinity_np(tid, sizeof(rte_cpuset_t), cpusetp); > > + if (s != 0) { > > + RTE_LOG(ERR, EAL, "pthread_setaffinity_np failed\n"); > > + return -1; > > + } > > + > > + /* store socket_id in TLS for quick access */ > > + RTE_PER_LCORE(_socket_id) = > > + eal_cpuset_socket_id(cpusetp); > > + > > + /* store cpuset in TLS for quick access */ > > + rte_memcpy(&RTE_PER_LCORE(_cpuset), cpusetp, > > + sizeof(rte_cpuset_t)); > > + > > + /* update lcore_config */ > > + lcore_config[lcore_id].socket_id = RTE_PER_LCORE(_socket_id); > > + rte_memcpy(&lcore_config[lcore_id].cpuset, cpusetp, > > + sizeof(rte_cpuset_t)); > > + } else { > > + /* none EAL thread */ > > + tid = pthread_self(); > > + > > + s = pthread_setaffinity_np(tid, sizeof(rte_cpuset_t), cpusetp); > > + if (s != 0) { > > + RTE_LOG(ERR, EAL, "pthread_setaffinity_np failed\n"); > > + return -1; > > + } > > + > > + /* store cpuset in TLS for quick access */ > > + rte_memcpy(&RTE_PER_LCORE(_cpuset), cpusetp, > > + sizeof(rte_cpuset_t)); > > + > > + /* store socket_id in TLS for quick access */ > > + RTE_PER_LCORE(_socket_id) = > > + eal_cpuset_socket_id(cpusetp); > > + } > > Why not always using pthread_self() to get the tid? [LCM] Good point, I haven't notice it. > > I think most of the code could be factorized here. The only difference > (which is hard to see as is as code is not exactly ordered in the same > manner) is that the config is updated in case it's an EAL thread. [LCM] Accept. > > > > > + > > + return 0; > > +} > > + > > +int > > +rte_thread_get_affinity(rte_cpuset_t *cpusetp) > > +{ > > + if (!cpusetp) > > + return -1; > > Same here. This is the only reason why rte_thread_get_affinity() could > fail. Removing this test would allow to change the API to return void > instead. It will avoid a useless test below in > eal_thread_dump_affinity(). 
[LCM] The cpusetp is used as the destination of memcpy, and the function is supposed to be an EAL API.
I don't think it's a good idea to remove the check, do you?
>
> > +
> > +    rte_memcpy(cpusetp, &RTE_PER_LCORE(_cpuset),
> > +        sizeof(rte_cpuset_t));
> > +
> > +    return 0;
> > +}
> > +
> > +void
> > +eal_thread_dump_affinity(char str[], unsigned size)
> > +{
> > +    rte_cpuset_t cpuset;
> > +    unsigned cpu;
> > +    int ret;
> > +    unsigned int out = 0;
> > +
> > +    if (rte_thread_get_affinity(&cpuset) < 0) {
> > +        str[0] = '\0';
> > +        return;
> > +    }
>
> This one could be removed if the (== NULL) test is removed.
>
> > +
> > +    for (cpu = 0; cpu < RTE_MAX_LCORE; cpu++) {
> > +        if (!CPU_ISSET(cpu, &cpuset))
> > +            continue;
> > +
> > +        ret = snprintf(str + out,
> > +            size - out, "%u,", cpu);
> > +        if (ret < 0 || (unsigned)ret >= size - out)
> > +            break;
>
> On the contrary, I think here returning an error to the user
> would be useful so he knows that the dump is not complete.
[LCM] accept.
>
>
> Regards,
> Olivier
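A minimal sketch of the "return an error when the dump is incomplete" idea accepted above. It is not the code from the patch: `demo_dump_affinity` is a hypothetical name, and a plain array of flags stands in for `rte_cpuset_t` and `CPU_ISSET()` so the example stays portable.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical stand-in for eal_thread_dump_affinity(): is_set[cpu] != 0
 * plays the role of CPU_ISSET(cpu, &cpuset).
 * Returns 0 when every CPU fit into str, -1 when the output was truncated.
 */
static int
demo_dump_affinity(const unsigned char *is_set, unsigned ncpu,
                   char *str, unsigned size)
{
    unsigned cpu, out = 0;
    int ret;

    assert(str != NULL);
    if (size == 0)
        return -1;
    str[0] = '\0';

    for (cpu = 0; cpu < ncpu; cpu++) {
        if (!is_set[cpu])
            continue;

        ret = snprintf(str + out, size - out, "%u,", cpu);
        if (ret < 0 || (unsigned)ret >= size - out) {
            str[out] = '\0';    /* drop the partial entry */
            return -1;          /* tell the caller the dump is incomplete */
        }
        out += (unsigned)ret;
    }

    if (out > 0)
        str[out - 1] = '\0';    /* strip the trailing comma */
    return 0;
}
```

With this shape the caller can still log the partial string, but it can also tell that some CPUs are missing from it.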
[dpdk-dev] [PATCH v4 08/17] eal: apply affinity of EAL thread by assigned cpuset
> -Original Message- > From: Olivier MATZ [mailto:olivier.matz at 6wind.com] > Sent: Monday, February 09, 2015 4:01 AM > To: Liang, Cunming; dev at dpdk.org > Subject: Re: [dpdk-dev] [PATCH v4 08/17] eal: apply affinity of EAL thread by > assigned cpuset > > Hi, > > On 02/02/2015 03:02 AM, Cunming Liang wrote: > > EAL threads use assigned cpuset to set core affinity during startup. > > It keeps 1:1 mapping, if no '--lcores' option is used. > > > > [...] > > > > lib/librte_eal/bsdapp/eal/eal.c | 13 --- > > lib/librte_eal/bsdapp/eal/eal_thread.c | 63 > > +- > > lib/librte_eal/linuxapp/eal/eal.c| 7 +++- > > lib/librte_eal/linuxapp/eal/eal_thread.c | 67 > > +++- > > 4 files changed, 54 insertions(+), 96 deletions(-) > > > > diff --git a/lib/librte_eal/bsdapp/eal/eal.c > > b/lib/librte_eal/bsdapp/eal/eal.c > > index 69f3c03..98c5a83 100644 > > --- a/lib/librte_eal/bsdapp/eal/eal.c > > +++ b/lib/librte_eal/bsdapp/eal/eal.c > > @@ -432,6 +432,7 @@ rte_eal_init(int argc, char **argv) > > int i, fctret, ret; > > pthread_t thread_id; > > static rte_atomic32_t run_once = RTE_ATOMIC32_INIT(0); > > + char cpuset[CPU_STR_LEN]; > > > > if (!rte_atomic32_test_and_set(&run_once)) > > return -1; > > @@ -502,13 +503,17 @@ rte_eal_init(int argc, char **argv) > > if (rte_eal_pci_init() < 0) > > rte_panic("Cannot init PCI\n"); > > > > - RTE_LOG(DEBUG, EAL, "Master core %u is ready (tid=%p)\n", > > - rte_config.master_lcore, thread_id); > > - > > eal_check_mem_on_local_socket(); > > > > rte_eal_mcfg_complete(); > > > > + eal_thread_init_master(rte_config.master_lcore); > > + > > + eal_thread_dump_affinity(cpuset, CPU_STR_LEN); > > + > > + RTE_LOG(DEBUG, EAL, "Master lcore %u is ready (tid=%p;cpuset=[%s])\n", > > + rte_config.master_lcore, thread_id, cpuset); > > + > > if (rte_eal_dev_init() < 0) > > rte_panic("Cannot init pmd devices\n"); > > > > @@ -532,8 +537,6 @@ rte_eal_init(int argc, char **argv) > > rte_panic("Cannot create thread\n"); > > } > > > > - 
eal_thread_init_master(rte_config.master_lcore); > > - > > /* > > * Launch a dummy function on all slave lcores, so that master lcore > > * knows they are all ready when this function returns. > > I wonder if changing this may have an impact on third-party drivers > that already use a management thread. Before the patch, the init() > function of the external library was called with default affinities, > and now it's called with the affinity from master lcore. > > I think it should at least be noticed in the commit log. > > Why are you doing this change? (I don't say it's a bad change, but > I don't understand why you are doing it here) [LCM] To be honest, the main purpose is I don't found any reason to have linuxapp and freebsdapp in different init sequence. I means in linux it init_master before dev_init(), but in freebsd it reverse. And as the default value of TLS already changes, if dev_init() first and using those TLS, the result will be not in an EAL thread. But actually they're in the EAL master thread. So I prefer to do the change follows linuxapp sequence. > > > > diff --git a/lib/librte_eal/bsdapp/eal/eal_thread.c > b/lib/librte_eal/bsdapp/eal/eal_thread.c > > index d0c077b..5b16302 100644 > > --- a/lib/librte_eal/bsdapp/eal/eal_thread.c > > +++ b/lib/librte_eal/bsdapp/eal/eal_thread.c > > @@ -103,55 +103,27 @@ eal_thread_set_affinity(void) > > { > > int s; > > pthread_t thread; > > - > > -/* > > - * According to the section VERSIONS of the CPU_ALLOC man page: > > - * > > - * The CPU_ZERO(), CPU_SET(), CPU_CLR(), and CPU_ISSET() macros were > added > > - * in glibc 2.3.3. > > - * > > - * CPU_COUNT() first appeared in glibc 2.6. > > - * > > - * CPU_AND(), CPU_OR(), CPU_XOR(),CPU_EQUAL(), > CPU_ALLOC(), > > - * CPU_ALLOC_SIZE(), CPU_FREE(), CPU_ZERO_S(), CPU_SET_S(), > CPU_CLR_S(), > > - * CPU_ISSET_S(), CPU_AND_S(), CPU_OR_S(), CPU_XOR_S(), and > CPU_EQUAL_S() > > - * first appeared in glibc 2.7. 
> > - */ > > -#if defined(CPU_ALLOC) > > - size_t size; > > - cpu_set_t *cpusetp; > > - > > - cpusetp = CPU_ALLOC(RTE_MAX_LCORE); >
[dpdk-dev] [PATCH v4 09/17] enic: fix re-define freebsd compile complain
> -----Original Message-----
> From: Olivier MATZ [mailto:olivier.matz at 6wind.com]
> Sent: Monday, February 09, 2015 4:01 AM
> To: Liang, Cunming; dev at dpdk.org
> Subject: Re: [dpdk-dev] [PATCH v4 09/17] enic: fix re-define freebsd compile complain
>
> Hi,
>
> On 02/02/2015 03:02 AM, Cunming Liang wrote:
> > Some macro already been defined by freebsd 'sys/param.h'.
> >
> > Signed-off-by: Cunming Liang
> > ---
> >  lib/librte_pmd_enic/enic.h        | 1 +
> >  lib/librte_pmd_enic/enic_compat.h | 1 +
> >  2 files changed, 2 insertions(+)
> >
> > diff --git a/lib/librte_pmd_enic/enic.h b/lib/librte_pmd_enic/enic.h
> > index c43417c..189c3b9 100644
> > --- a/lib/librte_pmd_enic/enic.h
> > +++ b/lib/librte_pmd_enic/enic.h
> > @@ -66,6 +66,7 @@
> >  #define ENIC_CALC_IP_CKSUM      1
> >  #define ENIC_CALC_TCP_UDP_CKSUM 2
> >  #define ENIC_MAX_MTU            9000
> > +#undef PAGE_SIZE
> >  #define PAGE_SIZE               4096
> >  #define PAGE_ROUND_UP(x) \
> >      ((((unsigned long)(x)) + PAGE_SIZE-1) & (~(PAGE_SIZE-1)))
> > diff --git a/lib/librte_pmd_enic/enic_compat.h b/lib/librte_pmd_enic/enic_compat.h
> > index b1af838..b84c766 100644
> > --- a/lib/librte_pmd_enic/enic_compat.h
> > +++ b/lib/librte_pmd_enic/enic_compat.h
> > @@ -67,6 +67,7 @@
> >  #define pr_warn(y, args...) dev_warning(0, y, ##args)
> >  #define BUG() pr_err("BUG at %s:%d", __func__, __LINE__)
> >
> > +#undef ALIGN
> >  #define ALIGN(x, a)              __ALIGN_MASK(x, (typeof(x))(a)-1)
> >  #define __ALIGN_MASK(x, mask)    (((x)+(mask))&~(mask))
> >  #define udelay usleep
> >
>
> Is the issue caused by a change you've made previously in the patch
> series?
[LCM] Yes, caused by [01/17] which include in freebsdapp.
>
> Wouldn't it be better to rename the macros in enic instead of doing
> #undef?
[LCM] Agree, will do it.
>
> Regards,
> Olivier
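A rough sketch of what the "rename instead of #undef" suggestion could look like. The `ENIC_`-prefixed names are hypothetical; the thread only agrees that the macros should be renamed and does not say to what.

```c
#include <assert.h>

/*
 * Hypothetical driver-private names: with a prefix they cannot collide
 * with PAGE_SIZE or ALIGN coming from a system header, so no #undef
 * is required.
 */
#define ENIC_PAGE_SIZE 4096
#define ENIC_PAGE_ROUND_UP(x) \
    ((((unsigned long)(x)) + ENIC_PAGE_SIZE - 1) & (~(ENIC_PAGE_SIZE - 1)))

/* __typeof__ is the GCC/Clang spelling of the typeof used in the patch. */
#define ENIC_ALIGN_MASK(x, mask) (((x) + (mask)) & ~(mask))
#define ENIC_ALIGN(x, a) ENIC_ALIGN_MASK(x, (__typeof__(x))(a) - 1)
```

Renaming keeps the system definitions intact for any other code in the same translation unit, which a bare `#undef` would silently override.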
[dpdk-dev] [PATCH v4 10/17] malloc: fix the issue of SOCKET_ID_ANY
> -Original Message- > From: Olivier MATZ [mailto:olivier.matz at 6wind.com] > Sent: Monday, February 09, 2015 4:01 AM > To: Liang, Cunming; dev at dpdk.org > Subject: Re: [dpdk-dev] [PATCH v4 10/17] malloc: fix the issue of > SOCKET_ID_ANY > > Hi, > > On 02/02/2015 03:02 AM, Cunming Liang wrote: > > Add check for rte_socket_id(), avoid get unexpected return like (-1). > > > > Signed-off-by: Cunming Liang > > --- > > lib/librte_malloc/malloc_heap.h | 7 ++- > > 1 file changed, 6 insertions(+), 1 deletion(-) > > > > diff --git a/lib/librte_malloc/malloc_heap.h > > b/lib/librte_malloc/malloc_heap.h > > index b4aec45..a47136d 100644 > > --- a/lib/librte_malloc/malloc_heap.h > > +++ b/lib/librte_malloc/malloc_heap.h > > @@ -44,7 +44,12 @@ extern "C" { > > static inline unsigned > > malloc_get_numa_socket(void) > > { > > - return rte_socket_id(); > > + unsigned socket_id = rte_socket_id(); > > + > > + if (socket_id == (unsigned)SOCKET_ID_ANY) > > + return 0; > > + > > + return socket_id; > > } > > > > void * > > > > The documentation off rte_malloc_socket() says: > > @param socket > NUMA socket to allocate memory on. If SOCKET_ID_ANY is used, this > function will behave the same as rte_malloc(). > > void * > rte_malloc_socket(const char *type, size_t size, unsigned align, int > socket); > > > Your patch changes the behavior of rte_malloc() without explaining > why, and the documentation becomes wrong. > > Can you explain why you need this change? [LCM] I don't think I change the declaration of rte_malloc_socket(). If socket_arg=SOCKET_ID_ANY, the socket value expect to the return value of malloc_get_numa_socket(). The malloc_get_numa_socket() supposed to return the correct TLS _socket_id. It works fine for normal cases. But as we change the default value of TLS _socket_id to SOCKET_ID_ANY. And one lcore can run on multiple cpu, if all cpus in the cpuset are not belongs to one NUMA node, the _socket_id would be SOCKET_ID_ANY. 
When the user calls rte_malloc_socket(SOCKET_ID_ANY), it still behaves the same as rte_malloc(): both get the socket_id from malloc_get_numa_socket(). The added part only handles the exception path.
>
> Regards,
> Olivier
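The exception path described in this reply can be sketched as follows. `demo_get_numa_socket` is a hypothetical stand-in for `malloc_get_numa_socket()`, and its argument plays the role of the value returned by `rte_socket_id()`.

```c
#include <assert.h>

#define SOCKET_ID_ANY -1    /* (-1), as discussed in the thread */

/*
 * Hypothetical stand-in for malloc_get_numa_socket().
 * A thread whose cpuset spans several NUMA nodes keeps the default
 * SOCKET_ID_ANY in its TLS; fall back to socket 0 in that case.
 */
static unsigned
demo_get_numa_socket(unsigned tls_socket_id)
{
    if (tls_socket_id == (unsigned)SOCKET_ID_ANY)
        return 0;

    return tls_socket_id;
}
```

Threads pinned to a single NUMA node are unaffected; only the "no single socket" case is redirected to socket 0.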
[dpdk-dev] [PATCH v4 11/17] log: fix the gap to support non-EAL thread
> -Original Message- > From: Olivier MATZ [mailto:olivier.matz at 6wind.com] > Sent: Monday, February 09, 2015 4:01 AM > To: Liang, Cunming; dev at dpdk.org > Subject: Re: [dpdk-dev] [PATCH v4 11/17] log: fix the gap to support non-EAL > thread > > Hi, > > On 02/02/2015 03:02 AM, Cunming Liang wrote: > > For those non-EAL thread, *_lcore_id* is invalid and probably larger than > RTE_MAX_LCORE. > > The patch adds the check and allows only EAL thread using EAL per thread log > level and log type. > > Others shares the global log level. > > > > Signed-off-by: Cunming Liang > > --- > > lib/librte_eal/common/eal_common_log.c | 17 +++-- > > lib/librte_eal/common/include/rte_log.h | 5 + > > 2 files changed, 20 insertions(+), 2 deletions(-) > > > > diff --git a/lib/librte_eal/common/eal_common_log.c > b/lib/librte_eal/common/eal_common_log.c > > index cf57619..e8dc94a 100644 > > --- a/lib/librte_eal/common/eal_common_log.c > > +++ b/lib/librte_eal/common/eal_common_log.c > > @@ -193,11 +193,20 @@ rte_set_log_type(uint32_t type, int enable) > > rte_logs.type &= (~type); > > } > > > > +/* Get global log type */ > > +uint32_t > > +rte_get_log_type(void) > > +{ > > + return rte_logs.type; > > +} > > + > > /* get the current loglevel for the message beeing processed */ > > int rte_log_cur_msg_loglevel(void) > > { > > unsigned lcore_id; > > lcore_id = rte_lcore_id(); > > + if (lcore_id >= RTE_MAX_LCORE) > > + return rte_get_log_level(); > > return log_cur_msg[lcore_id].loglevel; > > } > > > > @@ -206,6 +215,8 @@ int rte_log_cur_msg_logtype(void) > > { > > unsigned lcore_id; > > lcore_id = rte_lcore_id(); > > + if (lcore_id >= RTE_MAX_LCORE) > > + return rte_get_log_type(); > > return log_cur_msg[lcore_id].logtype; > > } > > > > @@ -265,8 +276,10 @@ rte_vlog(__attribute__((unused)) uint32_t level, > > > > /* save loglevel and logtype in a global per-lcore variable */ > > lcore_id = rte_lcore_id(); > > - log_cur_msg[lcore_id].loglevel = level; > > - log_cur_msg[lcore_id].logtype = 
logtype; > > + if (lcore_id < RTE_MAX_LCORE) { > > + log_cur_msg[lcore_id].loglevel = level; > > + log_cur_msg[lcore_id].logtype = logtype; > > + } > > > > ret = vfprintf(f, format, ap); > > fflush(f); > > diff --git a/lib/librte_eal/common/include/rte_log.h > b/lib/librte_eal/common/include/rte_log.h > > index db1ea08..f83a0d9 100644 > > --- a/lib/librte_eal/common/include/rte_log.h > > +++ b/lib/librte_eal/common/include/rte_log.h > > @@ -144,6 +144,11 @@ uint32_t rte_get_log_level(void); > > void rte_set_log_type(uint32_t type, int enable); > > > > /** > > + * Get the global log type. > > + */ > > +uint32_t rte_get_log_type(void); > > + > > +/** > > * Get the current loglevel for the message being processed. > > * > > * Before calling the user-defined stream for logging, the log > > > > Wouldn't it be better to change the variable: > static struct log_cur_msg log_cur_msg[RTE_MAX_LCORE]; > into a pthread (tls) variable? > > With your patch, the log level and log type are not saved for > non-EAL threads. If TLS were used, I think it would work in any case. [LCM] Good point. But for this patch set, still suppose not involve big impact to EAL thread. For improve non-EAL thread, we'll have a separate patch set for it. > > Regards, > Olivier
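A minimal sketch of the TLS alternative Olivier raises, which the reply defers to a later patch set. The `demo_` names are hypothetical, and `__thread` is the GCC/Clang thread-local storage keyword.

```c
#include <assert.h>
#include <stdint.h>

struct demo_log_cur_msg {
    uint32_t loglevel;
    uint32_t logtype;
};

/*
 * One copy per thread instead of one slot per lcore: no lcore_id index
 * and no RTE_MAX_LCORE bounds check, so non-EAL threads work as well.
 */
static __thread struct demo_log_cur_msg demo_cur_msg;

/* Save the level and type of the message being processed. */
static void
demo_log_save(uint32_t level, uint32_t logtype)
{
    demo_cur_msg.loglevel = level;
    demo_cur_msg.logtype = logtype;
}

static int
demo_log_cur_msg_loglevel(void)
{
    return (int)demo_cur_msg.loglevel;
}

static int
demo_log_cur_msg_logtype(void)
{
    return (int)demo_cur_msg.logtype;
}
```

The trade-off is that the state is no longer reachable through an lcore index from another thread, which the array-based version allowed.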
[dpdk-dev] [PATCH v4 12/17] eal: set _lcore_id and _socket_id to (-1) by default
> -Original Message- > From: Olivier MATZ [mailto:olivier.matz at 6wind.com] > Sent: Monday, February 09, 2015 4:01 AM > To: Liang, Cunming; dev at dpdk.org > Subject: Re: [dpdk-dev] [PATCH v4 12/17] eal: set _lcore_id and _socket_id to > (-1) > by default > > Hi, > > On 02/02/2015 03:02 AM, Cunming Liang wrote: > > For those none EAL thread, *_lcore_id* shall always be LCORE_ID_ANY. > > The libraries using *_lcore_id* as index need to take care. > > *_socket_id* always be SOCKET_ID_ANY unitl the thread changes the affinity > > unitl -> until [LCM] accept. > > > by rte_thread_set_affinity() > > > > Signed-off-by: Cunming Liang > > --- > > lib/librte_eal/bsdapp/eal/eal_thread.c | 4 ++-- > > lib/librte_eal/linuxapp/eal/eal_thread.c | 4 ++-- > > 2 files changed, 4 insertions(+), 4 deletions(-) > > > > diff --git a/lib/librte_eal/bsdapp/eal/eal_thread.c > b/lib/librte_eal/bsdapp/eal/eal_thread.c > > index 5b16302..2b3c9a8 100644 > > --- a/lib/librte_eal/bsdapp/eal/eal_thread.c > > +++ b/lib/librte_eal/bsdapp/eal/eal_thread.c > > @@ -56,8 +56,8 @@ > > #include "eal_private.h" > > #include "eal_thread.h" > > > > -RTE_DEFINE_PER_LCORE(unsigned, _lcore_id); > > -RTE_DEFINE_PER_LCORE(unsigned, _socket_id); > > +RTE_DEFINE_PER_LCORE(unsigned, _lcore_id) = (unsigned)LCORE_ID_ANY; > > +RTE_DEFINE_PER_LCORE(unsigned, _socket_id) = (unsigned)SOCKET_ID_ANY; > > RTE_DEFINE_PER_LCORE(rte_cpuset_t, _cpuset); > > > > /* > > diff --git a/lib/librte_eal/linuxapp/eal/eal_thread.c > b/lib/librte_eal/linuxapp/eal/eal_thread.c > > index 6eb1525..ab94e20 100644 > > --- a/lib/librte_eal/linuxapp/eal/eal_thread.c > > +++ b/lib/librte_eal/linuxapp/eal/eal_thread.c > > @@ -57,8 +57,8 @@ > > #include "eal_private.h" > > #include "eal_thread.h" > > > > -RTE_DEFINE_PER_LCORE(unsigned, _lcore_id); > > -RTE_DEFINE_PER_LCORE(unsigned, _socket_id); > > +RTE_DEFINE_PER_LCORE(unsigned, _lcore_id) = (unsigned)LCORE_ID_ANY; > > +RTE_DEFINE_PER_LCORE(unsigned, _socket_id) = (unsigned)SOCKET_ID_ANY; > > 
RTE_DEFINE_PER_LCORE(rte_cpuset_t, _cpuset); > > As far as I understand, now a rte_lcore_id() can return LCORE_ID_ANY. > This should be modified in the rte_lcore_id() API comments. > > Same for rte_socket_id(). [LCM] accept. > > I also wonder if the API of these functions should be modified to > return an int instead of an unsigned as LCORE_ID_ANY is -1. [LCM] I prefer not change the API definition. (unsigned)LCORE_ID_ANY already used before. > > Regards, > Olivier
[dpdk-dev] [PATCH v4 14/17] mempool: add support to non-EAL thread
> -Original Message- > From: Olivier MATZ [mailto:olivier.matz at 6wind.com] > Sent: Monday, February 09, 2015 4:01 AM > To: Liang, Cunming; dev at dpdk.org > Subject: Re: [dpdk-dev] [PATCH v4 14/17] mempool: add support to non-EAL > thread > > Hi, > > On 02/02/2015 03:02 AM, Cunming Liang wrote: > > For non-EAL thread, bypass per lcore cache, directly use ring pool. > > It allows using rte_mempool in either EAL thread or any user pthread. > > As in non-EAL thread, it directly rely on rte_ring and it's none preemptive. > > It doesn't suggest to run multi-pthread/cpu which compete the rte_mempool. > > It will get bad performance and has critical risk if scheduling policy is > > RT. > > > > Signed-off-by: Cunming Liang > > --- > > lib/librte_mempool/rte_mempool.h | 18 +++--- > > 1 file changed, 11 insertions(+), 7 deletions(-) > > > > diff --git a/lib/librte_mempool/rte_mempool.h > b/lib/librte_mempool/rte_mempool.h > > index 3314651..4845f27 100644 > > --- a/lib/librte_mempool/rte_mempool.h > > +++ b/lib/librte_mempool/rte_mempool.h > > @@ -198,10 +198,12 @@ struct rte_mempool { > > * Number to add to the object-oriented statistics. > > */ > > #ifdef RTE_LIBRTE_MEMPOOL_DEBUG > > -#define __MEMPOOL_STAT_ADD(mp, name, n) do { \ > > - unsigned __lcore_id = rte_lcore_id(); \ > > - mp->stats[__lcore_id].name##_objs += n; \ > > - mp->stats[__lcore_id].name##_bulk += 1; \ > > +#define __MEMPOOL_STAT_ADD(mp, name, n) do {\ > > + unsigned __lcore_id = rte_lcore_id(); \ > > + if (__lcore_id < RTE_MAX_LCORE) { \ > > + mp->stats[__lcore_id].name##_objs += n; \ > > + mp->stats[__lcore_id].name##_bulk += 1; \ > > + } \ > > Does it mean that we have no statistics for non-EAL threads? > (same question for rings and timers in the next patches) [LCM] Yes, it is in this patch set, mainly focus on EAL thread and make sure no running issue on non-EAL thread. For full non-EAL function, will have other patch set to enhance non-EAL thread as the 2nd step. 
> > > > } while(0) > > #else > > #define __MEMPOOL_STAT_ADD(mp, name, n) do {} while(0) > > @@ -767,8 +769,9 @@ __mempool_put_bulk(struct rte_mempool *mp, void > * const *obj_table, > > __MEMPOOL_STAT_ADD(mp, put, n); > > > > #if RTE_MEMPOOL_CACHE_MAX_SIZE > 0 > > - /* cache is not enabled or single producer */ > > - if (unlikely(cache_size == 0 || is_mp == 0)) > > + /* cache is not enabled or single producer or none EAL thread */ > > + if (unlikely(cache_size == 0 || is_mp == 0 || > > +lcore_id >= RTE_MAX_LCORE)) > > goto ring_enqueue; > > > > /* Go straight to ring if put would overflow mem allocated for cache */ > > @@ -952,7 +955,8 @@ __mempool_get_bulk(struct rte_mempool *mp, void > **obj_table, > > uint32_t cache_size = mp->cache_size; > > > > /* cache is not enabled or single consumer */ > > - if (unlikely(cache_size == 0 || is_mc == 0 || n >= cache_size)) > > + if (unlikely(cache_size == 0 || is_mc == 0 || > > +n >= cache_size || lcore_id >= RTE_MAX_LCORE)) > > goto ring_dequeue; > > > > cache = &mp->local_cache[lcore_id]; > > > > What is the performance impact of adding this test? [LCM] By perf in unit test, it's almost the same. But haven't measure EAL thread and non-EAL thread share the same mempool. > > > Regards, > Olivier
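The extra test whose cost is being asked about can be isolated like this. It is only a sketch of the branch condition from the quoted `__mempool_put_bulk()` hunk: the function name and enum are hypothetical, and `DEMO_MAX_LCORE` is an assumed stand-in for `RTE_MAX_LCORE`.

```c
#include <assert.h>

#define DEMO_MAX_LCORE    128u             /* assumed value, not from the thread */
#define DEMO_LCORE_ID_ANY ((unsigned)-1)   /* what a non-EAL thread reports */

enum demo_put_path { DEMO_PATH_CACHE, DEMO_PATH_RING };

/*
 * Mirrors the condition in the patch: go straight to the ring when the
 * cache is disabled, when the pool is single-producer, or when the caller
 * has no valid lcore_id to index the per-lcore cache with.
 */
static enum demo_put_path
demo_choose_put_path(unsigned cache_size, int is_mp, unsigned lcore_id)
{
    if (cache_size == 0 || is_mp == 0 || lcore_id >= DEMO_MAX_LCORE)
        return DEMO_PATH_RING;

    return DEMO_PATH_CACHE;
}
```

For an EAL thread the added comparison is a single compare against a constant, which is consistent with the "almost the same" unit-test result reported in the reply.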
[dpdk-dev] [PATCH v4 01/17] eal: add cpuset into per EAL thread lcore_config
> -Original Message- > From: Olivier MATZ [mailto:olivier.matz at 6wind.com] > Sent: Tuesday, February 10, 2015 1:07 AM > To: Liang, Cunming; dev at dpdk.org > Subject: Re: [dpdk-dev] [PATCH v4 01/17] eal: add cpuset into per EAL thread > lcore_config > > Hi, > > On 02/09/2015 12:33 PM, Liang, Cunming wrote: > >> On 02/02/2015 03:02 AM, Cunming Liang wrote: > >>> The patch adds 'cpuset' into per-lcore configure 'lcore_config[]', > >>> as the lcore no longer always 1:1 pinning with physical cpu. > >>> The lcore now stands for a EAL thread rather than a logical cpu. > >>> > >>> It doesn't change the default behavior of 1:1 mapping, but allows to > >>> affinity the EAL thread to multiple cpus. > >>> > >>> [...] > >>> diff --git a/lib/librte_eal/bsdapp/eal/eal_memory.c > >> b/lib/librte_eal/bsdapp/eal/eal_memory.c > >>> index 65ee87d..a34d500 100644 > >>> --- a/lib/librte_eal/bsdapp/eal/eal_memory.c > >>> +++ b/lib/librte_eal/bsdapp/eal/eal_memory.c > >>> @@ -45,6 +45,8 @@ > >>> #include "eal_internal_cfg.h" > >>> #include "eal_filesystem.h" > >>> > >>> +/* avoid re-defined against with freebsd header */ > >>> +#undef PAGE_SIZE > >>> #define PAGE_SIZE (sysconf(_SC_PAGESIZE)) > >> > >> I don't see the link with the patch. Should this go somewhere else? > > Maybe you missed this one. [LCM] Yes, I missed this one. I agree to move to a separate one and remove undef but rename the PAGE_SIZE to EAL_PAGE_SIZE. > > > >>> diff --git a/lib/librte_eal/common/include/rte_lcore.h > >> b/lib/librte_eal/common/include/rte_lcore.h > >>> index 49b2c03..4c7d6bb 100644 > >>> --- a/lib/librte_eal/common/include/rte_lcore.h > >>> +++ b/lib/librte_eal/common/include/rte_lcore.h > >>> @@ -50,6 +50,13 @@ extern "C" { > >>> > >>> #define LCORE_ID_ANY -1/**< Any lcore. 
*/ > >>> > >>> +#if defined(__linux__) > >>> + typedef cpu_set_t rte_cpuset_t; > >>> +#elif defined(__FreeBSD__) > >>> +#include > >>> + typedef cpuset_t rte_cpuset_t; > >>> +#endif > >>> + > >> > >> Should we also define RTE_CPU_SETSIZE? > >> For linux, should be included? > > [LCM] It uses the fix size cpuset, won't use CPU_ALLOC() to get the pointer > > of > cpuset. > > The RTE_CPU_SETSIZE always equal to sizeof(rte_cpuset_t). > > The advantage of using CPU_ALLOC() is to avoid issues when the number > of core will be higher than 1024. I agree it's probably a bit early > to think about this, but it could happen soon :) > > > >> If I understand well, after the patch series, the user of > >> rte_thread_set_affinity() and rte_thread_get_affinity() are > >> supposed to use the macros from sched.h to access to this > >> cpuset parameter. So I'm wondering if it's not better to > >> use cpu_set_t from libc instead of redefining rte_cpuset_t. > >> > >> To reword my question: what is the purpose of redefining > >> cpu_set_t in rte_cpuset_t if we still need to use all the > >> libc API to access to it? > > [LCM] In linux the type is *cpu_set_t*, but in freebsd it's *cpuset_t*. > > The purpose of *rte_cpuset_t* is to make the consistent type definition in > > EAL, > and to avoid lots of #ifdef for this diff. > > In either linux or freebsd, it still can use the MACRO in libc to set the > rte_cpuset_t. > > OK, it makes sense then. I did not notice the difference between linux > and bsd.
[dpdk-dev] [PATCH v4 03/17] eal: fix wrong strnlen() return value in 32bit icc
> -----Original Message-----
> From: Olivier MATZ [mailto:olivier.matz at 6wind.com]
> Sent: Tuesday, February 10, 2015 1:13 AM
> To: Liang, Cunming; dev at dpdk.org
> Subject: Re: [dpdk-dev] [PATCH v4 03/17] eal: fix wrong strnlen() return value in 32bit icc
>
> Hi,
>
> On 02/09/2015 12:57 PM, Liang, Cunming wrote:
> >>> @@ -469,7 +469,7 @@ eal_parse_lcores(const char *lcores)
> >>>      /* Remove all blank characters ahead and after */
> >>>      while (isblank(*lcores))
> >>>          lcores++;
> >>> -    i = strnlen(lcores, sysconf(_SC_ARG_MAX));
> >>> +    i = strnlen(lcores, PATH_MAX);
> >>>      while ((i > 0) && isblank(lcores[i - 1]))
> >>>          i--;
> >>>
> >>>
> >>
> >> I think PATH_MAX is not equivalent to _SC_ARG_MAX.
> >>
> >> But the main question is: why do we need to use strnlen() here instead
> >> of strlen? We can expect that argv[] pointers are always nul-terminated.
> >> Replacing them by strlen() would probably also solve the icc issue.
> > [LCM] You're right, here strlen() also solve icc issue and no risk for argv[].
> > But follows practice suggestion, keeping using those with 'n' function in DPDK is not bad.
> > There's additional two reason to keep strnlen and PATH_MAX.
> > 1. PATH_MAX is defined as 4096 which is enough as our input. It doesn't matter to be _SC_ARG_MAX or not.
>
> PATH_MAX is 4096 but it's not related to the maximum argument length.
>
> > 2. strnlen and PATH_MAX already used in eal_parse_coremask, to keep the style consistent in '-l' and '--lcores'.
>
> I don't think it's a valid argument.
>
> What is the problem of using strlen()? It looks it solves all the
> issues. Using strlen on valid strings is not a security issue.
[LCM] All right, I buy in your point.
>
>
> Regards,
> Olivier
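A small sketch of the blank-trimming step with plain `strlen()`, as agreed at the end of this exchange. `demo_trim_blanks` is a hypothetical helper, not the body of `eal_parse_lcores()`.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/*
 * Hypothetical helper: skip leading blanks in s, store the first
 * non-blank position in *start, and return the length of the string
 * from there without its trailing blanks.
 */
static size_t
demo_trim_blanks(const char *s, const char **start)
{
    size_t i;

    while (isblank((unsigned char)*s))
        s++;

    i = strlen(s);  /* argv[] strings are NUL-terminated: no cap needed */
    while (i > 0 && isblank((unsigned char)s[i - 1]))
        i--;

    *start = s;
    return i;
}
```

Because no length cap is involved, the question of whether `PATH_MAX` or `_SC_ARG_MAX` is the right bound does not arise.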
[dpdk-dev] [PATCH v4 05/17] eal: new TLS definition and API declaration
> -Original Message- > From: Olivier MATZ [mailto:olivier.matz at 6wind.com] > Sent: Tuesday, February 10, 2015 1:26 AM > To: Liang, Cunming; dev at dpdk.org > Subject: Re: [dpdk-dev] [PATCH v4 05/17] eal: new TLS definition and API > declaration > > Hi, > > On 02/09/2015 01:45 PM, Liang, Cunming wrote: > >>> +/** > >>> + * Dump the current pthread cpuset. > >>> + * This function is private to EAL. > >>> + * > >>> + * @param str > >>> + * The string buffer the cpuset will dump to. > >>> + * @param size > >>> + * The string buffer size. > >>> + */ > >>> +#define CPU_STR_LEN256 > >>> +void > >>> +eal_thread_dump_affinity(char str[], unsigned size); > >> > >> Although it's equivalent for function arguments, I think "char *str" is > >> usually preferred over "char str[]". See for instance in snprintf() or > >> fgets(). > > [LCM] Accept. > >> > >> What is the purpose of CPU_STR_LEN? > > [LCM] For default quick reference for str[] definition used in > > dump_affinity() > > So the API comment of the function is not placed at the right > place. > > A comment "Default buffer size to use with eal_thread_dump_affinity()" > should be added above CPU_STR_LEN. Also, it could be renamed in > RTE_CPU_STR_LEN or RTE_CPU_AFFINITY_STR_LEN. [LCM] Got you. > > > > >>> @@ -80,7 +81,9 @@ struct lcore_config { > >>> */ > >>> extern struct lcore_config lcore_config[RTE_MAX_LCORE]; > >>> > >>> -RTE_DECLARE_PER_LCORE(unsigned, _lcore_id); /**< Per core "core id". */ > >>> +RTE_DECLARE_PER_LCORE(unsigned, _lcore_id); /**< Per thread "lcore id". > >> */ > >>> +RTE_DECLARE_PER_LCORE(unsigned, _socket_id); /**< Per thread "socket > id". > >> */ > >>> +RTE_DECLARE_PER_LCORE(rte_cpuset_t, _cpuset); /**< Per thread > "cpuset". > >> */ > >>> > >>> /** > >>> * Return the ID of the execution unit we are running on. 
> >>> @@ -146,7 +149,7 @@ rte_lcore_index(int lcore_id) > >>> static inline unsigned > >>> rte_socket_id(void) > >>> { > >>> - return lcore_config[rte_lcore_id()].socket_id; > >>> + return RTE_PER_LCORE(_socket_id); > >>> } > >> > >> I don't see where the _socket_id variable is assigned. I think there > >> is probably an issue with the splitting of the patches. > > [LCM] The value initializes as SOCKET_ID_ANY when RTE_DEFINE_PER_LCORE(). > > And updated in eal_thread_set_affinity() for EAL thread and > rte_thread_set_affinity() for non-EAL thread. > > This is done in a later patches: > > "eal: set _lcore_id and _socket_id to (-1) by default" > "eal: apply affinity of EAL thread by assigned cpuset" > > That's why I said there is probably an issue with the ordering > of the patches as these values are used here but initialized > later in the series. [LCM] Will reorder them in next version.
[dpdk-dev] [PATCH v4 06/17] eal: add eal_common_thread.c for common thread API
> -Original Message- > From: Olivier MATZ [mailto:olivier.matz at 6wind.com] > Sent: Tuesday, February 10, 2015 1:30 AM > To: Liang, Cunming; dev at dpdk.org > Subject: Re: [dpdk-dev] [PATCH v4 06/17] eal: add eal_common_thread.c for > common thread API > > Hi, > > On 02/09/2015 02:12 PM, Liang, Cunming wrote: > >>> +int > >>> +rte_thread_get_affinity(rte_cpuset_t *cpusetp) > >>> +{ > >>> + if (!cpusetp) > >>> + return -1; > >> > >> Same here. This is the only reason why rte_thread_get_affinity() could > >> fail. Removing this test would allow to change the API to return void > >> instead. It will avoid a useless test below in > >> eal_thread_dump_affinity(). > > [LCM] The cpusetp is used as destination of memcpy and the function suppose > an EAL API. > > I don't think it's a good idea to remove the check, do you ? > > I know we often have debate on this subject on the list. My personal > opinion is that checking a NULL pointer in these cases is useless > because the user is suppose to give a non-NULL pointer. Returning > an error will result in managing an error for something that cannot > happen. > > On the other hand, adding an assert() (or the dpdk equivalent) would > be a good idea. [LCM] Ok, I see. Will update it. > > > >> > >>> + > >>> + rte_memcpy(cpusetp, &RTE_PER_LCORE(_cpuset), > >>> +sizeof(rte_cpuset_t)); > >>> + > >>> + return 0; > >>> +} > >>> + > >>> +void > >>> +eal_thread_dump_affinity(char str[], unsigned size) > >>> +{ > >>> + rte_cpuset_t cpuset; > >>> + unsigned cpu; > >>> + int ret; > >>> + unsigned int out = 0; > >>> + > >>> + if (rte_thread_get_affinity(&cpuset) < 0) { > >>> + str[0] = '\0'; > >>> + return; > >>> + } > >> > >> This one could be removed it the (== NULL) test is removed. 
> >> > >>> + > >>> + for (cpu = 0; cpu < RTE_MAX_LCORE; cpu++) { > >>> + if (!CPU_ISSET(cpu, &cpuset)) > >>> + continue; > >>> + > >>> + ret = snprintf(str + out, > >>> +size - out, "%u,", cpu); > >>> + if (ret < 0 || (unsigned)ret >= size - out) > >>> + break; > >> > >> On the contrary, I think here returning an error to the user > >> would be useful so he can knows that the dump is not complete. > > [LCM] accept. > >> > >> > >> Regards, > >> Olivier
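A sketch of the direction agreed in this message: replace the NULL check and error return with an assertion, so the getter can return void. The `demo_` names and the fixed-size `demo_cpuset_t` are hypothetical stand-ins for `rte_thread_get_affinity()`, `rte_cpuset_t` and the per-thread `_cpuset`. Plain `assert()` is used here; the thread leaves open which DPDK equivalent would be chosen.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical fixed-size stand-in for rte_cpuset_t. */
typedef struct {
    unsigned long bits[16];
} demo_cpuset_t;

/* Per-thread copy of the cpuset (GCC/Clang __thread). */
static __thread demo_cpuset_t demo_tls_cpuset;

/* Helper for the example only: record the calling thread's cpuset. */
static void
demo_thread_store_cpuset(const demo_cpuset_t *src)
{
    assert(src != NULL);
    memcpy(&demo_tls_cpuset, src, sizeof(*src));
}

/*
 * A NULL pointer is a caller bug, so it is asserted rather than reported;
 * with no failure mode left, the function can return void.
 */
static void
demo_thread_get_affinity(demo_cpuset_t *cpusetp)
{
    assert(cpusetp != NULL);
    memcpy(cpusetp, &demo_tls_cpuset, sizeof(*cpusetp));
}
```

A caller such as the affinity dump routine then no longer needs to test a return value before using the copied cpuset.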
[dpdk-dev] [PATCH v4 12/17] eal: set _lcore_id and _socket_id to (-1) by default
> -----Original Message-----
> From: Olivier MATZ [mailto:olivier.matz at 6wind.com]
> Sent: Tuesday, February 10, 2015 1:49 AM
> To: Liang, Cunming; dev at dpdk.org
> Subject: Re: [dpdk-dev] [PATCH v4 12/17] eal: set _lcore_id and _socket_id to (-1) by default
>
> Hi,
>
> On 02/09/2015 03:24 PM, Liang, Cunming wrote:
> >>> --- a/lib/librte_eal/linuxapp/eal/eal_thread.c
> >>> +++ b/lib/librte_eal/linuxapp/eal/eal_thread.c
> >>> @@ -57,8 +57,8 @@
> >>>  #include "eal_private.h"
> >>>  #include "eal_thread.h"
> >>>
> >>> -RTE_DEFINE_PER_LCORE(unsigned, _lcore_id);
> >>> -RTE_DEFINE_PER_LCORE(unsigned, _socket_id);
> >>> +RTE_DEFINE_PER_LCORE(unsigned, _lcore_id) = (unsigned)LCORE_ID_ANY;
> >>> +RTE_DEFINE_PER_LCORE(unsigned, _socket_id) = (unsigned)SOCKET_ID_ANY;
> >>>  RTE_DEFINE_PER_LCORE(rte_cpuset_t, _cpuset);
> >>
> >> As far as I understand, now a rte_lcore_id() can return LCORE_ID_ANY.
> >> This should be modified in the rte_lcore_id() API comments.
> >>
> >> Same for rte_socket_id().
> > [LCM] accept.
> >>
> >> I also wonder if the API of these functions should be modified to
> >> return an int instead of an unsigned as LCORE_ID_ANY is -1.
> > [LCM] I prefer not change the API definition. (unsigned)LCORE_ID_ANY already used before.
>
> OK
>
> And what about directly defining the following?
>
>   #define LCORE_ID_ANY ((unsigned)-1)
>
>
> It would avoid the casts.
[LCM] Good point, will update it.
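A short sketch of how the proposed definition removes the casts at the use sites. Only the `LCORE_ID_ANY` define comes from the message; the `demo_` names and `DEMO_MAX_LCORE` (an assumed stand-in for `RTE_MAX_LCORE`) are hypothetical.

```c
#include <assert.h>

#define LCORE_ID_ANY ((unsigned)-1)   /* definition proposed in the thread */
#define DEMO_MAX_LCORE 128u           /* assumed value, not from the thread */

/* Per-thread lcore id; threads not launched by EAL keep the default. */
static __thread unsigned demo_lcore_id = LCORE_ID_ANY;

static void
demo_lcore_id_set(unsigned id)
{
    demo_lcore_id = id;
}

static unsigned
demo_lcore_id_get(void)
{
    return demo_lcore_id;
}

/* LCORE_ID_ANY is UINT_MAX, so one range check covers the non-EAL case. */
static int
demo_is_eal_thread(void)
{
    return demo_lcore_id_get() < DEMO_MAX_LCORE;
}
```

Both the initializer and any comparison against `LCORE_ID_ANY` now work on `unsigned` values directly.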
[dpdk-dev] [PATCH v4 08/17] eal: apply affinity of EAL thread by assigned cpuset
> -Original Message- > From: Olivier MATZ [mailto:olivier.matz at 6wind.com] > Sent: Tuesday, February 10, 2015 1:37 AM > To: Liang, Cunming; dev at dpdk.org > Subject: Re: [dpdk-dev] [PATCH v4 08/17] eal: apply affinity of EAL thread by > assigned cpuset > > Hi, > > On 02/09/2015 02:48 PM, Liang, Cunming wrote: > >> -Original Message- > >> From: Olivier MATZ [mailto:olivier.matz at 6wind.com] > >> Sent: Monday, February 09, 2015 4:01 AM > >> To: Liang, Cunming; dev at dpdk.org > >> Subject: Re: [dpdk-dev] [PATCH v4 08/17] eal: apply affinity of EAL thread > >> by > >> assigned cpuset > >> > >> Hi, > >> > >> On 02/02/2015 03:02 AM, Cunming Liang wrote: > >>> EAL threads use assigned cpuset to set core affinity during startup. > >>> It keeps 1:1 mapping, if no '--lcores' option is used. > >>> > >>> [...] > >>> > >>> lib/librte_eal/bsdapp/eal/eal.c | 13 --- > >>> lib/librte_eal/bsdapp/eal/eal_thread.c | 63 > >>> +- > >>> lib/librte_eal/linuxapp/eal/eal.c| 7 +++- > >>> lib/librte_eal/linuxapp/eal/eal_thread.c | 67 > >>> +++- > >>> 4 files changed, 54 insertions(+), 96 deletions(-) > >>> > >>> diff --git a/lib/librte_eal/bsdapp/eal/eal.c > >>> b/lib/librte_eal/bsdapp/eal/eal.c > >>> index 69f3c03..98c5a83 100644 > >>> --- a/lib/librte_eal/bsdapp/eal/eal.c > >>> +++ b/lib/librte_eal/bsdapp/eal/eal.c > >>> @@ -432,6 +432,7 @@ rte_eal_init(int argc, char **argv) > >>> int i, fctret, ret; > >>> pthread_t thread_id; > >>> static rte_atomic32_t run_once = RTE_ATOMIC32_INIT(0); > >>> + char cpuset[CPU_STR_LEN]; > >>> > >>> if (!rte_atomic32_test_and_set(&run_once)) > >>> return -1; > >>> @@ -502,13 +503,17 @@ rte_eal_init(int argc, char **argv) > >>> if (rte_eal_pci_init() < 0) > >>> rte_panic("Cannot init PCI\n"); > >>> > >>> - RTE_LOG(DEBUG, EAL, "Master core %u is ready (tid=%p)\n", > >>> - rte_config.master_lcore, thread_id); > >>> - > >>> eal_check_mem_on_local_socket(); > >>> > >>> rte_eal_mcfg_complete(); > >>> > >>> + eal_thread_init_master(rte_config.master_lcore); > 
>>> + > >>> + eal_thread_dump_affinity(cpuset, CPU_STR_LEN); > >>> + > >>> + RTE_LOG(DEBUG, EAL, "Master lcore %u is ready > (tid=%p;cpuset=[%s])\n", > >>> + rte_config.master_lcore, thread_id, cpuset); > >>> + > >>> if (rte_eal_dev_init() < 0) > >>> rte_panic("Cannot init pmd devices\n"); > >>> > >>> @@ -532,8 +537,6 @@ rte_eal_init(int argc, char **argv) > >>> rte_panic("Cannot create thread\n"); > >>> } > >>> > >>> - eal_thread_init_master(rte_config.master_lcore); > >>> - > >>> /* > >>>* Launch a dummy function on all slave lcores, so that master lcore > >>>* knows they are all ready when this function returns. > >> > >> I wonder if changing this may have an impact on third-party drivers > >> that already use a management thread. Before the patch, the init() > >> function of the external library was called with default affinities, > >> and now it's called with the affinity from master lcore. > >> > >> I think it should at least be noticed in the commit log. > >> > >> Why are you doing this change? (I don't say it's a bad change, but > >> I don't understand why you are doing it here) > > [LCM] To be honest, the main purpose is I don't found any reason to have > linuxapp and freebsdapp in different init sequence. > > I means in linux it init_master before dev_init(), but in freebsd it > > reverse. > > > I agree that's something we should fix. > > > > And as the default value of TLS already changes, if dev_init() first and > > using > those TLS, the result will be not in an EAL thread. > > But actually they're in the EAL master thread. So I prefer to do the change > follows linuxapp sequence. > > That makes sense. Is it possible to have this reordering in a separate > patch? The titl
[dpdk-dev] [PATCH v4 11/17] log: fix the gap to support non-EAL thread
> -Original Message- > From: Olivier MATZ [mailto:olivier.matz at 6wind.com] > Sent: Tuesday, February 10, 2015 1:45 AM > To: Liang, Cunming; dev at dpdk.org > Subject: Re: [dpdk-dev] [PATCH v4 11/17] log: fix the gap to support non-EAL > thread > > Hi, > > On 02/09/2015 03:19 PM, Liang, Cunming wrote: > >>> --- a/lib/librte_eal/common/include/rte_log.h > >>> +++ b/lib/librte_eal/common/include/rte_log.h > >>> @@ -144,6 +144,11 @@ uint32_t rte_get_log_level(void); > >>> void rte_set_log_type(uint32_t type, int enable); > >>> > >>> /** > >>> + * Get the global log type. > >>> + */ > >>> +uint32_t rte_get_log_type(void); > >>> + > >>> +/** > >>> * Get the current loglevel for the message being processed. > >>> * > >>> * Before calling the user-defined stream for logging, the log > >>> > >> > >> Wouldn't it be better to change the variable: > >> static struct log_cur_msg log_cur_msg[RTE_MAX_LCORE]; > >> into a pthread (tls) variable? > >> > >> With your patch, the log level and log type are not saved for > >> non-EAL threads. If TLS were used, I think it would work in any case. > > [LCM] Good point. But for this patch set, still suppose not involve big > > impact to > EAL thread. > > For improve non-EAL thread, we'll have a separate patch set for it. > > OK, that's fine > > Will it be for 2.0 or later? [LCM] Will be in 2.1 I suppose, together with the patch for mempool cache to support non-EAL thread.
[dpdk-dev] [PATCH v4 14/17] mempool: add support to non-EAL thread
> -Original Message- > From: Olivier MATZ [mailto:olivier.matz at 6wind.com] > Sent: Tuesday, February 10, 2015 1:52 AM > To: Liang, Cunming; dev at dpdk.org > Subject: Re: [dpdk-dev] [PATCH v4 14/17] mempool: add support to non-EAL > thread > > Hi, > > On 02/09/2015 03:41 PM, Liang, Cunming wrote: > >>> #ifdef RTE_LIBRTE_MEMPOOL_DEBUG > >>> -#define __MEMPOOL_STAT_ADD(mp, name, n) do { \ > >>> - unsigned __lcore_id = rte_lcore_id(); \ > >>> - mp->stats[__lcore_id].name##_objs += n; \ > >>> - mp->stats[__lcore_id].name##_bulk += 1; \ > >>> +#define __MEMPOOL_STAT_ADD(mp, name, n) do {\ > >>> + unsigned __lcore_id = rte_lcore_id(); \ > >>> + if (__lcore_id < RTE_MAX_LCORE) { \ > >>> + mp->stats[__lcore_id].name##_objs += n; \ > >>> + mp->stats[__lcore_id].name##_bulk += 1; \ > >>> + } \ > >> > >> Does it mean that we have no statistics for non-EAL threads? > >> (same question for rings and timers in the next patches) > > [LCM] Yes, it is in this patch set, mainly focus on EAL thread and make > > sure no > running issue on non-EAL thread. > > For full non-EAL function, will have other patch set to enhance non-EAL > > thread > as the 2nd step. > > OK > > >>> @@ -952,7 +955,8 @@ __mempool_get_bulk(struct rte_mempool *mp, > void > >> **obj_table, > >>> uint32_t cache_size = mp->cache_size; > >>> > >>> /* cache is not enabled or single consumer */ > >>> - if (unlikely(cache_size == 0 || is_mc == 0 || n >= cache_size)) > >>> + if (unlikely(cache_size == 0 || is_mc == 0 || > >>> + n >= cache_size || lcore_id >= RTE_MAX_LCORE)) > >>> goto ring_dequeue; > >>> > >>> cache = &mp->local_cache[lcore_id]; > >>> > >> > >> What is the performance impact of adding this test? > > [LCM] By perf in unit test, it's almost the same. But haven't measure EAL > > thread > and non-EAL thread share the same mempool. > > > When you say "unit test", are you talking about mempool tests from > "make test"? Do you have some numbers to share? [LCM] I means DPDK app/test, run mempool_perf_test. 
I will add the numbers in v5.
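To make the behaviour discussed above concrete, here is a minimal sketch of a guarded per-lcore statistics macro. It only assumes what the quoted diff shows: statistics are kept per lcore, and a thread whose id is not below the maximum is skipped. The names struct pool, POOL_STAT_ADD and cur_lcore_id are hypothetical, and C11 _Thread_local stands in for the DPDK per-lcore TLS macros.

```c
#define MAX_LCORE 4u                    /* stands in for RTE_MAX_LCORE */
#define LCORE_ID_ANY ((unsigned)-1)

struct pool_stats {
	unsigned long put_objs;
	unsigned long put_bulk;
};

struct pool {
	struct pool_stats stats[MAX_LCORE];
};

/* Hypothetical per-thread id; threads that never set it keep LCORE_ID_ANY. */
static _Thread_local unsigned cur_lcore_id = LCORE_ID_ANY;

/* Update the per-lcore counters only when the caller owns a valid slot. */
#define POOL_STAT_ADD(mp, name, n) do {                         \
		unsigned lcore_id_ = cur_lcore_id;              \
		if (lcore_id_ < MAX_LCORE) {                    \
			(mp)->stats[lcore_id_].name##_objs += (n); \
			(mp)->stats[lcore_id_].name##_bulk += 1;   \
		}                                               \
	} while (0)
```

A thread without a valid lcore id updates nothing, which is why non-EAL threads have no statistics in this patch set.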
[dpdk-dev] [PATCH v2 0/4] DPDK memcpy optimization
> -Original Message- > From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Zhihong Wang > Sent: Thursday, January 29, 2015 10:39 AM > To: dev at dpdk.org > Subject: [dpdk-dev] [PATCH v2 0/4] DPDK memcpy optimization > > This patch set optimizes memcpy for DPDK for both SSE and AVX platforms. > It also extends memcpy test coverage with unaligned cases and more test > points. > > Optimization techniques are summarized below: > > 1. Utilize full cache bandwidth > > 2. Enforce aligned stores > > 3. Apply load address alignment based on architecture features > > 4. Make load/store address available as early as possible > > 5. General optimization techniques like inlining, branch reducing, prefetch > pattern access > > -- > Changes in v2: > > 1. Reduced constant test cases in app/test/test_memcpy_perf.c for fast build > > 2. Modified macro definition for better code readability & safety > > Zhihong Wang (4): > app/test: Disabled VTA for memcpy test in app/test/Makefile > app/test: Removed unnecessary test cases in app/test/test_memcpy.c > app/test: Extended test coverage in app/test/test_memcpy_perf.c > lib/librte_eal: Optimized memcpy in arch/x86/rte_memcpy.h for both SSE > and AVX platforms > > app/test/Makefile | 6 + > app/test/test_memcpy.c | 52 +- > app/test/test_memcpy_perf.c| 220 --- > .../common/include/arch/x86/rte_memcpy.h | 680 +++- > - > 4 files changed, 654 insertions(+), 304 deletions(-) > > -- > 1.9.3 Acked-by: Cunming Liang
[dpdk-dev] [PATCH v4 07/17] eal: add rte_gettid() to acquire unique system tid
> -----Original Message-----
> From: Olivier MATZ [mailto:olivier.matz at 6wind.com]
> Sent: Monday, February 09, 2015 4:01 AM
> To: Liang, Cunming; dev at dpdk.org
> Subject: Re: [dpdk-dev] [PATCH v4 07/17] eal: add rte_gettid() to acquire unique system tid
>
> Hi,
>
> On 02/02/2015 03:02 AM, Cunming Liang wrote:
> > The rte_gettid() wraps the linux and freebsd syscall gettid().
> > It provides a persistent unique thread id for the calling thread.
> > It will save the unique id in TLS on the first time.
> >
> > [...]
> >
> > +/**
> > + * A wrap API for syscall gettid.
> > + *
> > + * @return
> > + *   On success, returns the thread ID of calling process.
> > + *   It always successful.
> > + */
> > +int rte_sys_gettid(void);
> > +
> > +/**
> > + * Get system unique thread id.
> > + *
> > + * @return
> > + *   On success, returns the thread ID of calling process.
> > + *   It always successful.
> > + */
> > +static inline int rte_gettid(void)
> > +{
> > +	static RTE_DEFINE_PER_LCORE(int, _thread_id) = -1;
> > +	if (RTE_PER_LCORE(_thread_id) == -1)
> > +		RTE_PER_LCORE(_thread_id) = rte_sys_gettid();
> > +	return RTE_PER_LCORE(_thread_id);
> > +}
>
> Instead of doing the test each time rte_gettid() is called, why not
> having 2 functions:
>   rte_init_tid() -> assign the per_lcore variable
>   rte_gettid() -> return the per_lcore variable

[LCM] rte_gettid() is mainly used in the recursive spinlock. For a non-EAL thread, we don't expect a new user thread to have to call anything explicitly. The purpose of calling it during EAL thread init is to lower the overhead of the first call for EAL threads.

>
> Regards,
> Olivier
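The lazy caching pattern in the quoted rte_gettid() can be sketched without DPDK headers. Everything here is an assumption made for illustration: fake_sys_gettid() replaces the real syscall wrapper so the number of calls can be observed, and C11 _Thread_local replaces RTE_DEFINE_PER_LCORE.

```c
/* Counts how often the (fake) syscall wrapper is invoked. */
static int fake_sys_gettid_calls;

/* Hypothetical replacement for rte_sys_gettid(); returns a fixed id. */
static int
fake_sys_gettid(void)
{
	fake_sys_gettid_calls++;
	return 4242;
}

/* Query the thread id once per thread, then serve it from TLS. */
static inline int
cached_gettid(void)
{
	static _Thread_local int tid = -1;

	if (tid == -1)
		tid = fake_sys_gettid();
	return tid;
}
```

Any thread can call cached_gettid() without an explicit init step; the price is one compare on each call, which is the trade-off debated above.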
[dpdk-dev] [PATCH v4 17/17] timer: add support to non-EAL thread
> -Original Message- > From: Olivier MATZ [mailto:olivier.matz at 6wind.com] > Sent: Wednesday, February 11, 2015 1:45 AM > To: Liang, Cunming; dev at dpdk.org > Subject: Re: [dpdk-dev] [PATCH v4 17/17] timer: add support to non-EAL thread > > Hi, > > On 02/02/2015 03:02 AM, Cunming Liang wrote: > > Allow to setup timers only for EAL (lcore) threads (__lcore_id < > MAX_LCORE_ID). > > E.g. ? dynamically created thread will be able to reset/stop timer for lcore > thread, > > but it will be not allowed to setup timer for itself or another non-lcore > > thread. > > rte_timer_manage() for non-lcore thread would simply do nothing and return > straightway. > > > > Signed-off-by: Cunming Liang > > --- > > lib/librte_timer/rte_timer.c | 40 +++-- > --- > > lib/librte_timer/rte_timer.h | 2 +- > > 2 files changed, 32 insertions(+), 10 deletions(-) > > > > diff --git a/lib/librte_timer/rte_timer.c b/lib/librte_timer/rte_timer.c > > index 269a992..601c159 100644 > > --- a/lib/librte_timer/rte_timer.c > > +++ b/lib/librte_timer/rte_timer.c > > @@ -79,9 +79,10 @@ static struct priv_timer priv_timer[RTE_MAX_LCORE]; > > > > /* when debug is enabled, store some statistics */ > > #ifdef RTE_LIBRTE_TIMER_DEBUG > > -#define __TIMER_STAT_ADD(name, n) do { \ > > - unsigned __lcore_id = rte_lcore_id(); \ > > - priv_timer[__lcore_id].stats.name += (n); \ > > +#define __TIMER_STAT_ADD(name, n) do { > > \ > > + unsigned __lcore_id = rte_lcore_id(); \ > > + if (__lcore_id < RTE_MAX_LCORE) \ > > + priv_timer[__lcore_id].stats.name += (n); \ > > } while(0) > > #else > > #define __TIMER_STAT_ADD(name, n) do {} while(0) > > @@ -127,15 +128,26 @@ timer_set_config_state(struct rte_timer *tim, > > unsigned lcore_id; > > > > lcore_id = rte_lcore_id(); > > + if (lcore_id >= RTE_MAX_LCORE) > > + lcore_id = LCORE_ID_ANY; > > Is this still valid? > In my understanding, rte_lcore_id() was returning the core id or > LCORE_ID_ANY if it's a non-EAL thread. 
[LCM] It's a nice to have one, in case lcore_id got an invalid number. We can add a assert to replace these two line. > > > > > /* wait that the timer is in correct status before update, > > * and mark it as being configured */ > > while (success == 0) { > > prev_status.u32 = tim->status.u32; > > > > + /* > > +* prevent race condition of non-EAL threads > > +* to update the timer. When 'owner == LCORE_ID_ANY', > > +* it means updated by a non-EAL thread. > > +*/ > > + if (lcore_id == (unsigned)LCORE_ID_ANY && > > + (uint16_t)lcore_id == prev_status.owner) > > + return -1; > > + > > Are you sure this is required? > > I think prev_status.owner can be LCORE_ID_ANY only in config state, > as a timer cannot be scheduled on a non-EAL thread. And there is > already a test that returns -1 if state is CONFIG. [LCM] Good point, whenever prev_status.owner == LCORE_ID_ANY, the prev_status.state must be RTE_TIMER_CONFIG. Make sense to me to remove the condition check. > > > > /* timer is running on another core, exit */ > > if (prev_status.state == RTE_TIMER_RUNNING && > > - (unsigned)prev_status.owner != lcore_id) > > + prev_status.owner != (uint16_t)lcore_id) > > return -1; > > > > /* timer is being configured on another core */ > > @@ -366,9 +378,13 @@ __rte_timer_reset(struct rte_timer *tim, uint64_t > expire, > > > > /* round robin for tim_lcore */ > > if (tim_lcore == (unsigned)LCORE_ID_ANY) { > > - tim_lcore = rte_get_next_lcore(priv_timer[lcore_id].prev_lcore, > > - 0, 1); > > - priv_timer[lcore_id].prev_lcore = tim_lcore; > > + if (lcore_id < RTE_MAX_LCORE) { > > if (lcore_id != LCORE_ID_ANY) ? [LCM] Accept. > > > > + tim_lcore = rte_get_next_lcore( > > + priv_timer[lcore_id].prev_lcore, > > + 0, 1); > > + priv_timer[lcore_id].prev_lcore = tim_lcore; > > + } else > > +
[dpdk-dev] [PATCH v4 17/17] timer: add support to non-EAL thread
Hi,

> -----Original Message-----
> From: Olivier MATZ [mailto:olivier.matz at 6wind.com]
> Sent: Thursday, February 12, 2015 1:22 AM
> To: Liang, Cunming; dev at dpdk.org
> Subject: Re: [dpdk-dev] [PATCH v4 17/17] timer: add support to non-EAL thread
>
> Hi,
>
> On 02/11/2015 07:25 AM, Liang, Cunming wrote:
> >>> +			tim_lcore = rte_get_next_lcore(
> >>> +				priv_timer[lcore_id].prev_lcore,
> >>> +				0, 1);
> >>> +			priv_timer[lcore_id].prev_lcore = tim_lcore;
> >>> +		} else
> >>> +			tim_lcore = rte_get_next_lcore(LCORE_ID_ANY, 0, 1);
> >>
> >> I think the following line:
> >> tim_lcore = rte_get_next_lcore(LCORE_ID_ANY, 0, 1);
> >> Will return the first enabled core.
> >>
> >> Maybe using rte_get_master_lcore() is clearer?
> > [LCM] It does not have to be the master lcore.
> > Any available lcore is fine, so I think it makes sense to just use the
> > first enabled core.
>
> Yes I agree it does not need to be the master lcore, but until recently
> the definition of the master lcore was "the first enabled core".
>
> I was thinking rte_get_master_lcore() is easier to understand
> than rte_get_next_lcore(LCORE_ID_ANY, 0, 1). If you still prefer
> to keep the second one, can you add a comment saying something like
> "non-EAL thread do not run rte_timer_manage(), so schedule the timer
> on the first enabled lcore"?

[LCM] That makes sense, I will add it. Thanks.

>
> Thanks,
> Olivier
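A toy model of the selection logic discussed above, to show why passing LCORE_ID_ANY yields the first enabled lcore. The names next_enabled_lcore(), pick_timer_lcore(), lcore_enabled and prev_lcore are hypothetical; in particular next_enabled_lcore() is not rte_get_next_lcore() and does not share its signature, it only mimics a wrapping search.

```c
#define MAX_LCORE 8u                    /* stands in for RTE_MAX_LCORE */
#define LCORE_ID_ANY ((unsigned)-1)

/* Hypothetical enabled-lcore map: lcores 1, 3 and 4 are enabled. */
static const int lcore_enabled[MAX_LCORE] = { 0, 1, 0, 1, 1, 0, 0, 0 };

/* Per-caller round-robin state, as priv_timer[].prev_lcore in the patch. */
static unsigned prev_lcore[MAX_LCORE];

/* Next enabled lcore after 'i', wrapping around; MAX_LCORE if none. */
static unsigned
next_enabled_lcore(unsigned i)
{
	unsigned n;

	for (n = 0; n < MAX_LCORE; n++) {
		/* LCORE_ID_ANY + 1 wraps to 0, so the search starts at 0. */
		i = (i + 1) % MAX_LCORE;
		if (lcore_enabled[i])
			return i;
	}
	return MAX_LCORE;
}

/* Choose the lcore that will run a timer reset from 'caller'. */
static unsigned
pick_timer_lcore(unsigned caller)
{
	unsigned target;

	if (caller < MAX_LCORE) {
		/* EAL thread: round robin from its own previous choice. */
		target = next_enabled_lcore(prev_lcore[caller]);
		prev_lcore[caller] = target;
	} else {
		/* non-EAL thread: no private state, use the first enabled lcore. */
		target = next_enabled_lcore(LCORE_ID_ANY);
	}
	return target;
}
```

An EAL thread cycles through the enabled lcores, while every call from a non-EAL thread lands on the same first enabled lcore.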
[dpdk-dev] [PATCH v2 3/5] igb: enable rx queue interrupts for PF
> -Original Message- > From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Zhou Danny > Sent: Tuesday, February 03, 2015 4:18 PM > To: dev at dpdk.org > Subject: [dpdk-dev] [PATCH v2 3/5] igb: enable rx queue interrupts for PF > > v2 changes > - Consolidate review comments related to coding style > > The patch does below for igb PF: > - Setup NIC to generate MSI-X interrupts > - Set the IVAR register to map interrupt causes to vectors > - Implement interrupt enable/disable functions > > Signed-off-by: Danny Zhou > Tested-by: Yong Liu > --- > lib/librte_pmd_e1000/e1000/e1000_hw.h | 3 + > lib/librte_pmd_e1000/e1000_ethdev.h | 6 + > lib/librte_pmd_e1000/igb_ethdev.c | 230 > ++ > 3 files changed, 214 insertions(+), 25 deletions(-) > > diff --git a/lib/librte_pmd_e1000/e1000/e1000_hw.h > b/lib/librte_pmd_e1000/e1000/e1000_hw.h > index 4dd92a3..9b999ec 100644 > --- a/lib/librte_pmd_e1000/e1000/e1000_hw.h > +++ b/lib/librte_pmd_e1000/e1000/e1000_hw.h > @@ -780,6 +780,9 @@ struct e1000_mac_info { > u16 mta_reg_count; > u16 uta_reg_count; > > + u32 max_rx_queues; > + u32 max_tx_queues; > + [LCM] It can be avoid to define new things in share code. The max_rx/tx_queues in mac info only used by eth_igb_configure_msix_intr(). It the input param won't be limit to 'hw'. If using rte_eth_dev as input, you can get all the info from dev_info. The risk in share code is, on next time code merging, it won't care the change here do. 
> /* Maximum size of the MTA register table in all supported adapters */ > #define MAX_MTA_REG 128 > u32 mta_shadow[MAX_MTA_REG]; > diff --git a/lib/librte_pmd_e1000/e1000_ethdev.h > b/lib/librte_pmd_e1000/e1000_ethdev.h > index d155e77..713ca11 100644 > --- a/lib/librte_pmd_e1000/e1000_ethdev.h > +++ b/lib/librte_pmd_e1000/e1000_ethdev.h > @@ -34,6 +34,8 @@ > #ifndef _E1000_ETHDEV_H_ > #define _E1000_ETHDEV_H_ > > +#include > + > /* need update link, bit flag */ > #define E1000_FLAG_NEED_LINK_UPDATE (uint32_t)(1 << 0) > #define E1000_FLAG_MAILBOX (uint32_t)(1 << 1) > @@ -105,10 +107,14 @@ > #define E1000_FTQF_QUEUE_SHIFT 16 > #define E1000_FTQF_QUEUE_ENABLE 0x0100 > > +/* maximum number of other interrupts besides Rx & Tx interrupts */ > +#define E1000_MAX_OTHER_INTR 1 > + > /* structure for interrupt relative data */ > struct e1000_interrupt { > uint32_t flags; > uint32_t mask; > + rte_spinlock_t lock; > }; > > /* local vfta copy */ > diff --git a/lib/librte_pmd_e1000/igb_ethdev.c > b/lib/librte_pmd_e1000/igb_ethdev.c > index 2a268b8..7d9b103 100644 > --- a/lib/librte_pmd_e1000/igb_ethdev.c > +++ b/lib/librte_pmd_e1000/igb_ethdev.c > @@ -97,6 +97,7 @@ static int eth_igb_flow_ctrl_get(struct rte_eth_dev *dev, > static int eth_igb_flow_ctrl_set(struct rte_eth_dev *dev, > struct rte_eth_fc_conf *fc_conf); > static int eth_igb_lsc_interrupt_setup(struct rte_eth_dev *dev); > +static int eth_igb_rxq_interrupt_setup(struct rte_eth_dev *dev); > static int eth_igb_interrupt_get_status(struct rte_eth_dev *dev); > static int eth_igb_interrupt_action(struct rte_eth_dev *dev); > static void eth_igb_interrupt_handler(struct rte_intr_handle *handle, > @@ -191,6 +192,14 @@ static int eth_igb_filter_ctrl(struct rte_eth_dev *dev, >enum rte_filter_op filter_op, >void *arg); > > +static int eth_igb_rx_queue_intr_enable(struct rte_eth_dev *dev, uint16_t > queue_id); > +static int eth_igb_rx_queue_intr_disable(struct rte_eth_dev *dev, uint16_t > queue_id); > +static void 
eth_igb_assign_msix_vector(struct e1000_hw *hw, int8_t direction, > + uint8_t queue, uint8_t msix_vector); > +static void eth_igb_configure_msix_intr(struct e1000_hw *hw); > +static void eth_igb_write_ivar(struct e1000_hw *hw, uint8_t msix_vector, > + uint8_t index, uint8_t offset); > + > /* > * Define VF Stats MACRO for Non "cleared on read" register > */ > @@ -250,6 +259,8 @@ static struct eth_dev_ops eth_igb_ops = { > .vlan_tpid_set= eth_igb_vlan_tpid_set, > .vlan_offload_set = eth_igb_vlan_offload_set, > .rx_queue_setup = eth_igb_rx_queue_setup, > + .rx_queue_intr_enable = eth_igb_rx_queue_intr_enable, > + .rx_queue_intr_disable = eth_igb_rx_queue_intr_disable, > .rx_queue_release = eth_igb_rx_queue_release, > .rx_queue_count = eth_igb_rx_queue_count, > .rx_descriptor_done = eth_igb_rx_descriptor_done, > @@ -592,6 +603,16 @@ eth_igb_dev_init(__attribute__((unused)) struct > eth_driver *eth_drv, >eth_dev->data->port_id, pci_dev->id.vendor_id, >pci_dev->id.device_id); > > + /* set max interrupt vfio request */ > + struct rte_eth_dev_info dev_info; > + > + memset(&dev_info, 0, sizeof(dev_info)); > + eth_igb_i
[dpdk-dev] [PATCH v5 18/19] ring: add sched_yield to avoid spin forever
Hi, > -Original Message- > From: Olivier MATZ [mailto:olivier.matz at 6wind.com] > Sent: Thursday, February 12, 2015 7:16 PM > To: Liang, Cunming; dev at dpdk.org > Cc: Ananyev, Konstantin > Subject: Re: [PATCH v5 18/19] ring: add sched_yield to avoid spin forever > > Hi, > > On 02/12/2015 09:16 AM, Cunming Liang wrote: > > Add a sched_yield() syscall if the thread spins for too long, waiting other > > thread > to finish its operations on the ring. > > That gives pre-empted thread a chance to proceed and finish with ring > enqnue/dequeue operation. > > The purpose is to reduce contention on the ring. By ring_perf_test, it > > doesn't > shows additional perf penalty. > > > > Signed-off-by: Cunming Liang > > --- > > v5 changes: > > add RTE_RING_PAUSE_REP to config file > > > > v4 changes: > > update and add more comments on sched_yield() > > > > v3 changes: > > new patch adding sched_yield() in rte_ring to avoid long spin > > > > config/common_bsdapp | 1 + > > config/common_linuxapp | 1 + > > lib/librte_ring/rte_ring.h | 31 +++ > > 3 files changed, 29 insertions(+), 4 deletions(-) > > > > diff --git a/config/common_bsdapp b/config/common_bsdapp > > index 57bacb8..52c5143 100644 > > --- a/config/common_bsdapp > > +++ b/config/common_bsdapp > > @@ -234,6 +234,7 @@ CONFIG_RTE_PMD_PACKET_PREFETCH=y > > CONFIG_RTE_LIBRTE_RING=y > > CONFIG_RTE_LIBRTE_RING_DEBUG=n > > CONFIG_RTE_RING_SPLIT_PROD_CONS=n > > +CONFIG_RTE_RING_PAUSE_REP=n > > Maybe it's better to use CONFIG_RTE_RING_PAUSE_REP=0 instead? > If I understand well, it has to be set to an integer value to > enable it, am I correct? [LCM] If RTE_RING_PAUSE_REP=N (no define), by default will use 0. If it's set to 'y'(=1), will issue yield in the most frequent rate. It also can set as integer to assign any number. All cases works for this configure. One point is in configure file, just demonstrate the default way to use it. It can't prevent to use anything unexpected. Except we rule the n & y illegal for this option. 
Its meaningful values can be described in the documentation.

>
> Thanks,
> Olivier
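A sketch of the spin-then-yield idea described in the commit message, assuming a repetition threshold like RTE_RING_PAUSE_REP where 0 disables the yield. spin_until() and countdown() are hypothetical helpers; a predicate callback replaces the ring tail check so the loop can be exercised from a single thread. sched_yield() is the POSIX call from <sched.h>.

```c
#include <sched.h>

/*
 * Spin until done(arg) returns non-zero. After every 'pause_rep' failed
 * polls, give up the CPU with sched_yield() so that a pre-empted peer can
 * finish its ring operation. pause_rep == 0 disables yielding.
 * Returns the number of yields issued.
 */
static unsigned
spin_until(int (*done)(void *), void *arg, unsigned pause_rep)
{
	unsigned rep = 0, yields = 0;

	while (!done(arg)) {
		if (pause_rep != 0 && ++rep == pause_rep) {
			rep = 0;
			sched_yield();
			yields++;
		}
		/* otherwise a CPU pause/relax hint would go here */
	}
	return yields;
}

/* Test predicate: reports "not done" until the counter reaches zero. */
static int
countdown(void *arg)
{
	unsigned *left = arg;

	if (*left == 0)
		return 1;
	(*left)--;
	return 0;
}
```

With nine failed polls and a threshold of 3 the loop yields three times; with a threshold of 0 it never yields and behaves like the original busy wait.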
[dpdk-dev] [PATCH v5 19/19] timer: add support to non-EAL thread
> -Original Message- > From: Ananyev, Konstantin > Sent: Thursday, February 12, 2015 9:54 PM > To: Liang, Cunming; dev at dpdk.org > Cc: olivier.matz at 6wind.com > Subject: RE: [PATCH v5 19/19] timer: add support to non-EAL thread > > Hi lads, > > > -Original Message- > > From: Liang, Cunming > > Sent: Thursday, February 12, 2015 8:17 AM > > To: dev at dpdk.org > > Cc: Ananyev, Konstantin; olivier.matz at 6wind.com; Liang, Cunming > > Subject: [PATCH v5 19/19] timer: add support to non-EAL thread > > > > Allow to setup timers only for EAL (lcore) threads (__lcore_id < > MAX_LCORE_ID). > > E.g. ? dynamically created thread will be able to reset/stop timer for lcore > thread, > > but it will be not allowed to setup timer for itself or another non-lcore > > thread. > > rte_timer_manage() for non-lcore thread would simply do nothing and return > straightway. > > > > Signed-off-by: Cunming Liang > > --- > > v5 changes: > >add assert in rte_timer_manage > >remove duplicate check in timer_set_config_state > > > > lib/librte_timer/rte_timer.c | 31 ++- > > lib/librte_timer/rte_timer.h | 4 ++-- > > 2 files changed, 24 insertions(+), 11 deletions(-) > > > > diff --git a/lib/librte_timer/rte_timer.c b/lib/librte_timer/rte_timer.c > > index 269a992..fa43fa9 100644 > > --- a/lib/librte_timer/rte_timer.c > > +++ b/lib/librte_timer/rte_timer.c > > @@ -35,6 +35,7 @@ > > #include > > #include > > #include > > +#include > > > > #include > > #include > > @@ -79,9 +80,10 @@ static struct priv_timer priv_timer[RTE_MAX_LCORE]; > > > > /* when debug is enabled, store some statistics */ > > #ifdef RTE_LIBRTE_TIMER_DEBUG > > -#define __TIMER_STAT_ADD(name, n) do { \ > > - unsigned __lcore_id = rte_lcore_id(); \ > > - priv_timer[__lcore_id].stats.name += (n); \ > > +#define __TIMER_STAT_ADD(name, n) do { > > \ > > + unsigned __lcore_id = rte_lcore_id(); \ > > + if (__lcore_id < RTE_MAX_LCORE) \ > > + priv_timer[__lcore_id].stats.name += (n); \ > > } while(0) > > #else > > #define 
__TIMER_STAT_ADD(name, n) do {} while(0) > > @@ -135,7 +137,7 @@ timer_set_config_state(struct rte_timer *tim, > > > > /* timer is running on another core, exit */ > > if (prev_status.state == RTE_TIMER_RUNNING && > > - (unsigned)prev_status.owner != lcore_id) > > + prev_status.owner != (uint16_t)lcore_id) > > return -1; > > > > /* timer is being configured on another core */ > > @@ -366,9 +368,15 @@ __rte_timer_reset(struct rte_timer *tim, uint64_t > expire, > > > > /* round robin for tim_lcore */ > > if (tim_lcore == (unsigned)LCORE_ID_ANY) { > > - tim_lcore = rte_get_next_lcore(priv_timer[lcore_id].prev_lcore, > > - 0, 1); > > - priv_timer[lcore_id].prev_lcore = tim_lcore; > > + if (lcore_id != LCORE_ID_ANY) { > > + tim_lcore = rte_get_next_lcore( > > + priv_timer[lcore_id].prev_lcore, > > + 0, 1); > > + priv_timer[lcore_id].prev_lcore = tim_lcore; > > + } else > > + /* non-EAL thread do not run rte_timer_manage(), > > +* so schedule the timer on the first enabled lcore. */ > > + tim_lcore = rte_get_next_lcore(LCORE_ID_ANY, 0, 1); > > } > > > > /* wait that the timer is in correct status before update, > > @@ -378,7 +386,8 @@ __rte_timer_reset(struct rte_timer *tim, uint64_t > expire, > > return -1; > > > > __TIMER_STAT_ADD(reset, 1); > > - if (prev_status.state == RTE_TIMER_RUNNING) { > > + if (prev_status.state == RTE_TIMER_RUNNING && > > + lcore_id != LCORE_ID_ANY) { > > priv_timer[lcore_id].updated = 1; > > } > > > > @@ -455,7 +464,8 @@ rte_timer_stop(struct rte_timer *tim) > > return -1; > > > > __TIMER_STAT_ADD(stop, 1); > > - if (prev_status.state == RTE_TIMER_RUNNING) { > > + if (prev_status.state == RTE_TIMER_RUNNING && > > + lcore_id != LCORE_ID_ANY) { >
[dpdk-dev] [PATCH v2 4/5] eal: add per rx queue interrupt handling based on VFIO
Hi, > -Original Message- > From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Zhou Danny > Sent: Tuesday, February 03, 2015 4:19 PM > To: dev at dpdk.org > Subject: [dpdk-dev] [PATCH v2 4/5] eal: add per rx queue interrupt handling > based on VFIO > [...] > #include "eal_private.h" > #include "eal_vfio.h" > @@ -127,6 +128,7 @@ static pthread_t intr_thread; > #ifdef VFIO_PRESENT > > #define IRQ_SET_BUF_LEN (sizeof(struct vfio_irq_set) + sizeof(int)) > +#define MSIX_IRQ_SET_BUF_LEN (sizeof(struct vfio_irq_set) + sizeof(int) * > (VFIO_MAX_QUEUE_ID + 1)) [LCM] Does it better to add comment for '+1' which is the max other interrupts besides rxtx. > > /* enable legacy (INTx) interrupts */ > static int > @@ -221,7 +223,7 @@ vfio_disable_intx(struct rte_intr_handle *intr_handle) { > /* enable MSI-X interrupts */ [LCM] typo on comment. 'enable MSI interrupts' ? > > @@ -292,8 +284,8 @@ vfio_disable_msi(struct rte_intr_handle *intr_handle) { > /* enable MSI-X interrupts */ > static int > vfio_enable_msix(struct rte_intr_handle *intr_handle) { > - int len, ret; > - char irq_set_buf[IRQ_SET_BUF_LEN]; > + int len, ret, max_intr; > + char irq_set_buf[MSIX_IRQ_SET_BUF_LEN]; > struct vfio_irq_set *irq_set; > int *fd_ptr; > > @@ -301,12 +293,19 @@ vfio_enable_msix(struct rte_intr_handle *intr_handle) > { > > irq_set = (struct vfio_irq_set *) irq_set_buf; > irq_set->argsz = len; > - irq_set->count = 1; > + if ((!intr_handle->max_intr) || > + (intr_handle->max_intr > VFIO_MAX_QUEUE_ID)) > + max_intr = VFIO_MAX_QUEUE_ID + 1; > + else > + max_intr = intr_handle->max_intr; > + > + irq_set->count = max_intr; > irq_set->flags = VFIO_IRQ_SET_DATA_EVENTFD | > VFIO_IRQ_SET_ACTION_TRIGGER; > irq_set->index = VFIO_PCI_MSIX_IRQ_INDEX; > irq_set->start = 0; > fd_ptr = (int *) &irq_set->data; > - *fd_ptr = intr_handle->fd; > + memcpy(fd_ptr, intr_handle->queue_fd, sizeof(intr_handle- > >queue_fd)); > + fd_ptr[max_intr - 1] = intr_handle->fd; > > ret = ioctl(intr_handle->vfio_dev_fd, 
VFIO_DEVICE_SET_IRQS, irq_set); > > @@ -316,22 +315,6 @@ vfio_enable_msix(struct rte_intr_handle *intr_handle) { > return -1; > } > > - /* manually trigger interrupt to enable it */ > - memset(irq_set, 0, len); > - len = sizeof(struct vfio_irq_set); > - irq_set->argsz = len; > - irq_set->count = 1; > - irq_set->flags = VFIO_IRQ_SET_DATA_NONE | > VFIO_IRQ_SET_ACTION_TRIGGER; > - irq_set->index = VFIO_PCI_MSIX_IRQ_INDEX; > - irq_set->start = 0; > - > - ret = ioctl(intr_handle->vfio_dev_fd, VFIO_DEVICE_SET_IRQS, irq_set); > - > - if (ret) { > - RTE_LOG(ERR, EAL, "Error triggering MSI-X interrupts for fd > %d\n", > - intr_handle->fd); > - return -1; > - } > return 0; > } > > @@ -339,7 +322,7 @@ vfio_enable_msix(struct rte_intr_handle *intr_handle) { > static int > vfio_disable_msix(struct rte_intr_handle *intr_handle) { > struct vfio_irq_set *irq_set; > - char irq_set_buf[IRQ_SET_BUF_LEN]; > + char irq_set_buf[MSIX_IRQ_SET_BUF_LEN]; > int len, ret; > > len = sizeof(struct vfio_irq_set); > @@ -824,3 +807,119 @@ rte_eal_intr_init(void) > return -ret; > } > > +static void > +eal_intr_process_rx_interrupts(uint8_t port_id, struct epoll_event *events, > int > nfds) > +{ > + int n, bytes_read; > + union rte_intr_read_buffer buf; > + struct rte_intr_handle intr_handle = rte_eth_devices[port_id].pci_dev- > >intr_handle; [LCM] column number large than 80. > + > + for (n = 0; n < nfds; n++) { > + /* set the length to be read for different handle type */ > + switch (intr_handle.type) { > + case RTE_INTR_HANDLE_UIO: > + bytes_read = sizeof(buf.uio_intr_count); > + break; > + case RTE_INTR_HANDLE_ALARM: > + bytes_read = sizeof(buf.timerfd_num); > + break; > +#ifdef VFIO_PRESENT > + case RTE_INTR_HANDLE_VFIO_MSIX: > + case RTE_INTR_HANDLE_VFIO_MSI: > + case RTE_INTR_HANDLE_VFIO_LEGACY: > + bytes_read = sizeof(buf.vfio_intr_count); > + break; > +#endif > + default: > + bytes_read = 1; > + break; > + } > + > + /** > + * read out to clear the ready-to-be-read flag > + * for epoll_wait. 
> + */ > + bytes_read = read(events[n].data.fd, &buf, bytes_read); > + if (bytes_read < 0) > + RTE_LOG(ERR, EAL, "Error reading from file " > + "descriptor %d: %s\n", events[n].data.fd, > + strerror(errno)); >
[dpdk-dev] [PATCH v6 12/19] malloc: fix the issue of SOCKET_ID_ANY
Hi,

> -----Original Message-----
> From: Neil Horman [mailto:nhorman at tuxdriver.com]
> Sent: Saturday, February 14, 2015 1:57 AM
> To: Liang, Cunming
> Cc: dev at dpdk.org
> Subject: Re: [dpdk-dev] [PATCH v6 12/19] malloc: fix the issue of SOCKET_ID_ANY
>
> On Fri, Feb 13, 2015 at 09:38:14AM +0800, Cunming Liang wrote:
> > Add check for rte_socket_id(), avoid get unexpected return like (-1).
> >
> > Signed-off-by: Cunming Liang
> > ---
> > lib/librte_malloc/malloc_heap.h | 7 ++-
> > 1 file changed, 6 insertions(+), 1 deletion(-)
> >
> > diff --git a/lib/librte_malloc/malloc_heap.h b/lib/librte_malloc/malloc_heap.h
> > index b4aec45..a47136d 100644
> > --- a/lib/librte_malloc/malloc_heap.h
> > +++ b/lib/librte_malloc/malloc_heap.h
> > @@ -44,7 +44,12 @@ extern "C" {
> > static inline unsigned
> > malloc_get_numa_socket(void)
> > {
> > -	return rte_socket_id();
> > +	unsigned socket_id = rte_socket_id();
> > +
> > +	if (socket_id == (unsigned)SOCKET_ID_ANY)
> > +		return 0;
> > +
> > +	return socket_id;
> Why is -1 unexpected? Isn't it reasonable to assume that some memory is
> equidistant from all cpu numa nodes?

[LCM] A piece of memory is always allocated as a whole from one specific NUMA node, never partly from one node and partly from another. If no specific NUMA node is requested (SOCKET_ID_ANY/-1), the allocator first asks for the NUMA node that the current core belongs to; that is when 'malloc_get_numa_socket()' is called. Previously, when a 1:1 thread/core mapping was assumed and the default value was 0, it always returned a value other than -1. Now rte_socket_id() may return -1, either when the pthread runs on multiple cores that do not belong to a single NUMA node, or when _socket_id has not been assigned yet and still holds the default value (-1). So if the current _socket_id is -1, we just pick the first node as the candidate. I should probably add more comments about this.

>
> Neil
>
> > }
> >
> > void *
> > --
> > 1.8.1.4
> >
> >
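The fallback explained above can be reduced to a few lines. This sketch takes the socket id as a parameter instead of calling rte_socket_id(), so pick_numa_socket() is a hypothetical stand-in for malloc_get_numa_socket(), written only to show the mapping.

```c
#define SOCKET_ID_ANY (-1)

/*
 * Map the calling thread's socket id to a concrete NUMA node.
 * SOCKET_ID_ANY (thread spans several nodes, or id not assigned yet)
 * falls back to node 0, the first candidate.
 */
static unsigned
pick_numa_socket(unsigned socket_id)
{
	if (socket_id == (unsigned)SOCKET_ID_ANY)
		return 0;

	return socket_id;
}
```

A thread with a known socket keeps it, and anything else starts its allocation attempt on node 0.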
[dpdk-dev] [PATCH v6 06/19] eal: new TLS definition and API declaration
Hi, > -Original Message- > From: Neil Horman [mailto:nhorman at tuxdriver.com] > Sent: Friday, February 13, 2015 9:58 PM > To: Liang, Cunming > Cc: dev at dpdk.org > Subject: Re: [dpdk-dev] [PATCH v6 06/19] eal: new TLS definition and API > declaration > > On Fri, Feb 13, 2015 at 09:38:08AM +0800, Cunming Liang wrote: > > 1. add two TLS *_socket_id* and *_cpuset* > > 2. add two external API rte_thread_set/get_affinity > > 3. add one internal API eal_thread_dump_affinity > > > > Signed-off-by: Cunming Liang > > --- > > v5 changes: > >add comments for RTE_CPU_AFFINITY_STR_LEN > >update comments for eal_thread_dump_affinity() > >return void for rte_thread_get_affinity() > >move rte_socket_id() change to another patch > > > > lib/librte_eal/bsdapp/eal/eal_thread.c| 2 ++ > > lib/librte_eal/common/eal_thread.h| 36 > +++ > > lib/librte_eal/common/include/rte_lcore.h | 26 +- > > lib/librte_eal/linuxapp/eal/eal_thread.c | 2 ++ > > 4 files changed, 65 insertions(+), 1 deletion(-) > > > > diff --git a/lib/librte_eal/bsdapp/eal/eal_thread.c > b/lib/librte_eal/bsdapp/eal/eal_thread.c > > index ab05368..10220c7 100644 > > --- a/lib/librte_eal/bsdapp/eal/eal_thread.c > > +++ b/lib/librte_eal/bsdapp/eal/eal_thread.c > > @@ -56,6 +56,8 @@ > > #include "eal_thread.h" > > > > RTE_DEFINE_PER_LCORE(unsigned, _lcore_id); > > +RTE_DEFINE_PER_LCORE(unsigned, _socket_id); > > +RTE_DEFINE_PER_LCORE(rte_cpuset_t, _cpuset); > > > > /* > > * Send a message to a slave lcore identified by slave_id to call a > > diff --git a/lib/librte_eal/common/eal_thread.h > b/lib/librte_eal/common/eal_thread.h > > index f1ce0bd..e4e76b9 100644 > > --- a/lib/librte_eal/common/eal_thread.h > > +++ b/lib/librte_eal/common/eal_thread.h > > @@ -34,6 +34,8 @@ > > #ifndef EAL_THREAD_H > > #define EAL_THREAD_H > > > > +#include > > + > > /** > > * basic loop of thread, called for each thread by eal_init(). 
> > * > > @@ -61,4 +63,38 @@ void eal_thread_init_master(unsigned lcore_id); > > */ > > unsigned eal_cpu_socket_id(unsigned cpu_id); > > > > +/** > > + * Get the NUMA socket id from cpuset. > > + * This function is private to EAL. > > + * > > + * @param cpusetp > > + * The point to a valid cpu set. > > + * @return > > + * socket_id or SOCKET_ID_ANY > > + */ > > +int eal_cpuset_socket_id(rte_cpuset_t *cpusetp); > > + > > +/** > > + * Default buffer size to use with eal_thread_dump_affinity() > > + */ > > +#define RTE_CPU_AFFINITY_STR_LEN256 > > + > > +/** > > + * Dump the current pthread cpuset. > > + * This function is private to EAL. > > + * > > + * Note: > > + * If the dump size is greater than the size of given buffer, > > + * the string will be truncated and with '\0' at the end. > > + * > > + * @param str > > + * The string buffer the cpuset will dump to. > > + * @param size > > + * The string buffer size. > > + * @return > > + * 0 for success, -1 if truncation happens. > > + */ > > +int > > +eal_thread_dump_affinity(char *str, unsigned size); > > + > > #endif /* EAL_THREAD_H */ > > diff --git a/lib/librte_eal/common/include/rte_lcore.h > b/lib/librte_eal/common/include/rte_lcore.h > > index 4c7d6bb..33f558e 100644 > > --- a/lib/librte_eal/common/include/rte_lcore.h > > +++ b/lib/librte_eal/common/include/rte_lcore.h > > @@ -80,7 +80,9 @@ struct lcore_config { > > */ > > extern struct lcore_config lcore_config[RTE_MAX_LCORE]; > > > > -RTE_DECLARE_PER_LCORE(unsigned, _lcore_id); /**< Per core "core id". */ > > +RTE_DECLARE_PER_LCORE(unsigned, _lcore_id); /**< Per thread "lcore id". > */ > > +RTE_DECLARE_PER_LCORE(unsigned, _socket_id); /**< Per thread "socket id". > */ > > +RTE_DECLARE_PER_LCORE(rte_cpuset_t, _cpuset); /**< Per thread "cpuset". > */ > > > > /** > > * Return the ID of the execution unit we are running on. 
> > @@ -229,6 +231,28 @@ rte_get_next_lcore(unsigned i, int skip_master, int > wrap) > > i > i = rte_get_next_lcore(i, 1, 0)) > > > > +/** > > + * Set core affinity of the current thread. > > + * Support both EAL and none-EAL thread and update TLS. > > + * > > + * @param cpusetp > > + * Point to cpu_set_t for s
[dpdk-dev] [PATCH v6 05/19] eal: add support parsing socket_id from cpuset
> -----Original Message-----
> From: Neil Horman [mailto:nhorman at tuxdriver.com]
> Sent: Friday, February 13, 2015 9:52 PM
> To: Liang, Cunming
> Cc: dev at dpdk.org
> Subject: Re: [dpdk-dev] [PATCH v6 05/19] eal: add support parsing socket_id from cpuset
>
> On Fri, Feb 13, 2015 at 09:38:07AM +0800, Cunming Liang wrote:
> > It returns the socket_id if all cpus in the cpuset belongs
> > to the same NUMA node, otherwise it will return SOCKET_ID_ANY.
> >
> > Signed-off-by: Cunming Liang
> > ---
> > v5 changes:
> >   expose cpu_socket_id as eal_cpu_socket_id for linuxapp
> >   eal_cpuset_socket_id() remove static inline and move to c file
> >
> >  lib/librte_eal/bsdapp/eal/eal_lcore.c   |  7 +++
> >  lib/librte_eal/common/eal_thread.h      | 11 +++
> >  lib/librte_eal/linuxapp/eal/eal_lcore.c |  7 ---
> >  3 files changed, 22 insertions(+), 3 deletions(-)
> >
> > diff --git a/lib/librte_eal/bsdapp/eal/eal_lcore.c b/lib/librte_eal/bsdapp/eal/eal_lcore.c
> > index 72f8ac2..162fb4f 100644
> > --- a/lib/librte_eal/bsdapp/eal/eal_lcore.c
> > +++ b/lib/librte_eal/bsdapp/eal/eal_lcore.c
> > @@ -41,6 +41,7 @@
> >  #include
> >
> >  #include "eal_private.h"
> > +#include "eal_thread.h"
> >
> >  /* No topology information available on FreeBSD including NUMA info */
> >  #define cpu_core_id(X) 0
> > @@ -112,3 +113,9 @@ rte_eal_cpu_init(void)
> >
> >  	return 0;
> >  }
> > +
> > +unsigned
> > +eal_cpu_socket_id(__rte_unused unsigned cpu_id)
> > +{
> > +	return cpu_socket_id(cpu_id);
> > +}
> > diff --git a/lib/librte_eal/common/eal_thread.h b/lib/librte_eal/common/eal_thread.h
> > index b53b84d..f1ce0bd 100644
> > --- a/lib/librte_eal/common/eal_thread.h
> > +++ b/lib/librte_eal/common/eal_thread.h
> > @@ -50,4 +50,15 @@ __attribute__((noreturn)) void *eal_thread_loop(void *arg);
> >   */
> >  void eal_thread_init_master(unsigned lcore_id);
> >
> > +/**
> > + * Get the NUMA socket id from cpu id.
> > + * This function is private to EAL.
> > + *
> > + * @param cpu_id
> > + *   The logical process id.
> > + * @return
> > + *   socket_id or SOCKET_ID_ANY
> > + */
> > +unsigned eal_cpu_socket_id(unsigned cpu_id);
> > +
> >  #endif /* EAL_THREAD_H */
> > diff --git a/lib/librte_eal/linuxapp/eal/eal_lcore.c b/lib/librte_eal/linuxapp/eal/eal_lcore.c
> > index 29615f8..ef8c433 100644
> > --- a/lib/librte_eal/linuxapp/eal/eal_lcore.c
> > +++ b/lib/librte_eal/linuxapp/eal/eal_lcore.c
> > @@ -45,6 +45,7 @@
> >
> >  #include "eal_private.h"
> >  #include "eal_filesystem.h"
> > +#include "eal_thread.h"
> >
> >  #define SYS_CPU_DIR "/sys/devices/system/cpu/cpu%u"
> >  #define CORE_ID_FILE "topology/core_id"
> > @@ -71,8 +72,8 @@ cpu_detected(unsigned lcore_id)
> >   * Note: physical package id != NUMA node, but we use it as a
> >   * fallback for kernels which don't create a nodeY link
> >   */
> > -static unsigned
> > -cpu_socket_id(unsigned lcore_id)
> > +unsigned
> > +eal_cpu_socket_id(unsigned lcore_id)
> If you want to export this symbol, then you need to add it to the version map.

[LCM] These are all EAL-internal functions; I don't plan to expose them as EAL API.

> Neil
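The rule stated in the commit message (return the socket id only when every CPU in the set sits on the same NUMA node, otherwise SOCKET_ID_ANY) can be sketched as follows. Everything here is illustrative: the cpuset is modelled as a plain 64-bit mask instead of `rte_cpuset_t`, and `sketch_cpu_socket_id()` hard-codes a made-up topology in place of `eal_cpu_socket_id()`.

```c
#include <stdint.h>

#define SOCKET_ID_ANY   (-1)
#define SKETCH_MAX_CPUS 64u

/* Hypothetical cpuset: bit N set means CPU N is in the set. */
typedef uint64_t sketch_cpuset_t;

/* Made-up topology for the example: CPUs 0-3 on node 0, the rest on node 1. */
static int
sketch_cpu_socket_id(unsigned cpu_id)
{
	return cpu_id < 4 ? 0 : 1;
}

/* Return the common socket id of all CPUs in the set, else SOCKET_ID_ANY. */
static int
sketch_cpuset_socket_id(sketch_cpuset_t set)
{
	int socket_id = SOCKET_ID_ANY;
	unsigned cpu;

	for (cpu = 0; cpu < SKETCH_MAX_CPUS; cpu++) {
		int sid;

		if (!(set & ((sketch_cpuset_t)1 << cpu)))
			continue;

		sid = sketch_cpu_socket_id(cpu);
		if (socket_id == SOCKET_ID_ANY)
			socket_id = sid;	/* first CPU seen */
		else if (socket_id != sid)
			return SOCKET_ID_ANY;	/* set spans two nodes */
	}

	return socket_id;	/* an empty set also yields SOCKET_ID_ANY */
}
```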
[dpdk-dev] [PATCH v6 04/19] eal: fix wrong strnlen() return value in 32bit icc
> -----Original Message-----
> From: Neil Horman [mailto:nhorman at tuxdriver.com]
> Sent: Saturday, February 14, 2015 1:55 AM
> To: Olivier MATZ
> Cc: Liang, Cunming; dev at dpdk.org
> Subject: Re: [dpdk-dev] [PATCH v6 04/19] eal: fix wrong strnlen() return value in 32bit icc
>
> On Fri, Feb 13, 2015 at 03:05:44PM +0100, Olivier MATZ wrote:
> > Hi Neil,
> >
> > On 02/13/2015 02:49 PM, Neil Horman wrote:
> > > On Fri, Feb 13, 2015 at 09:38:06AM +0800, Cunming Liang wrote:
> > >> The problem is that strnlen() here may return invalid value with 32bit icc.
> > >> (actually it returns its second parameter, e.g: sysconf(_SC_ARG_MAX)).
> > >> It starts to manifest when max_len parameter is > 2M and using icc -m32 -O2 (or above).
> > >>
> > >> Suggested-by: Konstantin Ananyev
> > >> Signed-off-by: Cunming Liang
> > >> ---
> > >> v5 changes:
> > >>   using strlen instead of strnlen.
> > >>
> > >>  lib/librte_eal/common/eal_common_options.c | 6 +++---
> > >>  1 file changed, 3 insertions(+), 3 deletions(-)
> > >>
> > >> diff --git a/lib/librte_eal/common/eal_common_options.c b/lib/librte_eal/common/eal_common_options.c
> > >> index 178e303..9cf2faa 100644
> > >> --- a/lib/librte_eal/common/eal_common_options.c
> > >> +++ b/lib/librte_eal/common/eal_common_options.c
> > >> @@ -167,7 +167,7 @@ eal_parse_coremask(const char *coremask)
> > >>  	if (coremask[0] == '0' && ((coremask[1] == 'x')
> > >>  		|| (coremask[1] == 'X')))
> > >>  		coremask += 2;
> > >> -	i = strnlen(coremask, PATH_MAX);
> > >> +	i = strlen(coremask);
> > > This is crash prone. If coremask is passed in without a trailing null pointer,
> > > strlen will return a huge value that can overrun the array.
> >
> > We discussed that in a previous thread:
> > http://dpdk.org/ml/archives/dev/2015-February/012552.html
> >
> > coremask is always a valid nul-terminated string as it comes from
> > argv[] table.
> > It is not a memory fragment that is controlled by a user, so I don't
> > think using strnlen() instead of strlen() would solve any issue.
> >
> Thats absolutely false, you can't in any way make that assertion.
> eal_parse_common_option is a public API call. An application can construct its
> own string to pass into the parser. The test applications all use the command
> line functions so its not a visible issue from the test apps, but you can't
> assume what the test apps do is what everyone will do. It would be one thing if
> you could make the parse_common_option function private, but with the current
> layout you can't so you have to be ready for garbage input.
>
> Neil

[LCM] That sounds reasonable to me. I'll roll back the code and use strnlen(coremask, ARG_MAX) instead.

> > Regards,
> > Olivier
[dpdk-dev] [PATCH v6 00/19] support multi-pthread per core
Hi, > -Original Message- > From: Olivier MATZ [mailto:olivier.matz at 6wind.com] > Sent: Friday, February 13, 2015 6:06 PM > To: Liang, Cunming; dev at dpdk.org > Subject: Re: [dpdk-dev] [PATCH v6 00/19] support multi-pthread per core > > Hi, > > On 02/13/2015 02:38 AM, Cunming Liang wrote: > > v6 changes: > > rename RTE_RING_PAUSE_REP(_COUNT) and set default to 0 > > rollback to use RTE_MAX_LCORE when checking valid lcore_id for EAL thread > > > > v5 changes: > > reorder some patch and split into addtional two patches > > rte_thread_get_affinity() return type change to avoid > > add RTE_RING_PAUSE_REP into config and by default turn off > > > > v4 changes: > > new patch fixing strnlen() invalid return in 32bit icc [03/17] > > update and add more comments on sched_yield() [16/17] > > > > v3 changes: > > new patch adding sched_yield() in rte_ring to avoid long spin [16/17] > > > > v2 changes: > > add '-' support for EAL option '--lcores' [02/17] > > > > The patch series contain the enhancements of EAL and fixes for libraries > > to run multi-pthreads(either EAL or non-EAL thread) per physical core. > > Two major changes list as below: > > - Extend the core affinity of each EAL thread to 1:n. > > Each lcore stands for a EAL thread rather than a logical core. > > The change adds new EAL option to allow static lcore to cpuset assginment. > > Then a lcore(EAL thread) affinity to a cpuset, original 1:1 mapping is > > the special > case. > > - Fix the libraries to allow running on any non-EAL thread. > > It fix the gaps running libraries in non-EAL thread(dynamic created by > > user). > > Each fix libraries take care the case of rte_lcore_id() >= RTE_MAX_LCORE. > > > > Thanks a million for the comments from Konstantin, Bruce, Mirek and Stephen > in RFC review. 
> > > > Cunming Liang (19): > > eal: add cpuset into per EAL thread lcore_config > > eal: fix PAGE_SIZE redefine complaint on freebsd > > eal: new eal option '--lcores' for cpu assignment > > eal: fix wrong strnlen() return value in 32bit icc > > eal: add support parsing socket_id from cpuset > > eal: new TLS definition and API declaration > > eal: add eal_common_thread.c for common thread API > > eal: standardize init sequence between linux and bsd > > eal: add rte_gettid() to acquire unique system tid > > eal: apply affinity of EAL thread by assigned cpuset > > enic: fix re-define freebsd compile complain > > malloc: fix the issue of SOCKET_ID_ANY > > log: fix the gap to support non-EAL thread > > eal: set _lcore_id and _socket_id to (-1) by default > > eal: fix recursive spinlock in non-EAL thraed > > mempool: add support to non-EAL thread > > ring: add support to non-EAL thread > > ring: add sched_yield to avoid spin forever > > timer: add support to non-EAL thread > > > > config/common_bsdapp | 1 + > > config/common_linuxapp | 1 + > > lib/librte_eal/bsdapp/eal/Makefile | 1 + > > lib/librte_eal/bsdapp/eal/eal.c| 14 +- > > lib/librte_eal/bsdapp/eal/eal_lcore.c | 14 + > > lib/librte_eal/bsdapp/eal/eal_memory.c | 8 +- > > lib/librte_eal/bsdapp/eal/eal_thread.c | 77 ++ > > lib/librte_eal/common/eal_common_log.c | 17 +- > > lib/librte_eal/common/eal_common_options.c | 308 > - > > lib/librte_eal/common/eal_common_thread.c | 150 ++ > > lib/librte_eal/common/eal_options.h| 2 + > > lib/librte_eal/common/eal_thread.h | 47 > > .../common/include/generic/rte_spinlock.h | 4 +- > > lib/librte_eal/common/include/rte_eal.h| 27 ++ > > lib/librte_eal/common/include/rte_lcore.h | 40 ++- > > lib/librte_eal/common/include/rte_log.h| 5 + > > lib/librte_eal/linuxapp/eal/Makefile | 4 + > > lib/librte_eal/linuxapp/eal/eal.c | 8 +- > > lib/librte_eal/linuxapp/eal/eal_lcore.c| 15 +- > > lib/librte_eal/linuxapp/eal/eal_thread.c | 77 ++ > > lib/librte_malloc/malloc_heap.h| 7 +- > > 
lib/librte_mempool/rte_mempool.h | 18 +- > > lib/librte_pmd_enic/enic.h | 4 +- > > lib/librte_pmd_enic/enic_compat.h | 2 +- > &g
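The cover letter says each fixed library has to handle `rte_lcore_id() >= RTE_MAX_LCORE`, which is what a non-EAL thread reports once `_lcore_id` defaults to (-1). A minimal sketch of that check, with hypothetical names throughout (`SKETCH_MAX_LCORE`, `struct sketch_pool`, `sketch_pool_get()`); it does not reproduce the actual mempool, ring or timer changes, which are not shown in this thread.

```c
#define SKETCH_MAX_LCORE    4u
#define SKETCH_LCORE_ID_ANY ((unsigned)-1)	/* what a non-EAL thread would report */

/* Hypothetical pool with one private slot per EAL thread plus a shared path. */
struct sketch_pool {
	unsigned cache_gets[SKETCH_MAX_LCORE];	/* per-lcore fast path */
	unsigned shared_gets;			/* path usable from any thread */
};

/*
 * Use the per-lcore slot only when lcore_id is a valid index.
 * Returns 1 if the per-lcore path was taken, 0 for the shared path.
 */
static int
sketch_pool_get(struct sketch_pool *p, unsigned lcore_id)
{
	if (lcore_id < SKETCH_MAX_LCORE) {
		p->cache_gets[lcore_id]++;
		return 1;
	}

	/* Non-EAL thread: indexing cache_gets[] here would be out of bounds. */
	p->shared_gets++;
	return 0;
}
```

A real shared path would also need its own synchronization; the sketch only shows the bounds check that keeps a non-EAL thread away from the per-lcore array.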
[dpdk-dev] [PATCH v6 06/19] eal: new TLS definition and API declaration
> -----Original Message-----
> From: Neil Horman [mailto:nhorman at tuxdriver.com]
> Sent: Sunday, February 15, 2015 1:17 PM
> To: Liang, Cunming
> Cc: dev at dpdk.org
> Subject: Re: [dpdk-dev] [PATCH v6 06/19] eal: new TLS definition and API declaration
>
[...]
> > > >
> > > >  RTE_DEFINE_PER_LCORE(unsigned, _lcore_id);
> > > > +RTE_DEFINE_PER_LCORE(unsigned, _socket_id);
> > > > +RTE_DEFINE_PER_LCORE(rte_cpuset_t, _cpuset);
> > > >
> > > >  /*
> > > >   * Send a message to a slave lcore identified by slave_id to call a
> > > > --
> > > > 1.8.1.4
> > > >
> > >
> > > All of these exported functions need to be exported in the version map. Also, I
> > > don't think its a good idea to simply expose the per lcore cpuset variables. It
> > > would be far better to create an api around them
> > [LCM] Thanks for the reminder, I hadn't taken care of the version map.
> > rte_thread_set/get_affinity() are the API around _cpuset, so do you suggest we don't put 'per_lcore__cpuset' into rte_eal_version.map?
> > On this point I agree with you, and I think we'd better not expose 'per_lcore__socket_id' either. What do you think?
> Yes, absolutely, you should wrap some API around them, and make them defined
> symbols, only inline them if they're going to be in the hot path.

[LCM] _socket_id is wrapped by rte_socket_id() and rte_thread_set_affinity(). rte_socket_id() is defined as inline in rte_lcore.h. So in the end only two functions, rte_thread_set_affinity() and rte_thread_get_affinity(), are added to the version map. All of this is updated in v7.

>
> Thanks
> Neil
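Neil's suggestion, keeping the thread-local variable private and exporting functions around it, might look like the sketch below. The names are hypothetical and `__thread` is the GCC/Clang storage-class extension, used here as an assumption about the toolchain; this is not the `RTE_DEFINE_PER_LCORE` macro or the v7 code.

```c
/* Thread-local state stays file-private: no per-thread symbol is exported. */
static __thread int sketch_socket_id_tls = -1;	/* -1: not assigned yet */

/* Exported accessor; only this function would be listed in a version map. */
int
sketch_thread_get_socket_id(void)
{
	return sketch_socket_id_tls;
}

/* Exported setter; callers never touch the TLS variable directly. */
void
sketch_thread_set_socket_id(int socket_id)
{
	sketch_socket_id_tls = socket_id;
}
```

With this shape the storage of the value can change later without breaking the exported ABI, which is the point of wrapping it. The trade-off raised in the thread is call overhead, hence the advice to inline only accessors that sit on the hot path.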
[dpdk-dev] [PATCH v1] test: add ut for eal flags --lcores
Hi, > -Original Message- > From: Qiu, Michael > Sent: Sunday, February 15, 2015 2:59 PM > To: Liang, Cunming; dev at dpdk.org > Subject: Re: [dpdk-dev] [PATCH v1] test: add ut for eal flags --lcores > > Hi, Steve > > Why not post this patch within your enabling EAL "--lcores" option patch > set? > As it is not merged yet. > [LCM] As that patch series already go through several round review, I won't expect to involve the totally new update on it. > Just a suggestion, depends you. > > Thanks, > Michael > On 2/15/2015 1:48 PM, Cunming Liang wrote: > > The patch add unit test for the new eal option "--lcores". > > > > Signed-off-by: Cunming Liang > > --- > > It depends on the previous patch which enabling EAL "--lcores" option. > > http://dpdk.org/ml/archives/dev/2015-February/013204.html > > > > app/test/test_eal_flags.c | 95 > --- > > 1 file changed, 81 insertions(+), 14 deletions(-) > > > > diff --git a/app/test/test_eal_flags.c b/app/test/test_eal_flags.c > > index 0a8269c..0352f87 100644 > > --- a/app/test/test_eal_flags.c > > +++ b/app/test/test_eal_flags.c > > @@ -512,47 +512,114 @@ test_missing_c_flag(void) > > > > /* -c flag but no coremask value */ > > const char *argv1[] = { prgname, prefix, mp_flag, "-n", "3", "-c"}; > > - /* No -c or -l flag at all */ > > + /* No -c, -l or --lcores flag at all */ > > const char *argv2[] = { prgname, prefix, mp_flag, "-n", "3"}; > > /* bad coremask value */ > > - const char *argv3[] = { prgname, prefix, mp_flag, "-n", "3", "-c", > > "error" }; > > + const char *argv3[] = { prgname, prefix, mp_flag, > > + "-n", "3", "-c", "error" }; > > /* sanity check of tests - valid coremask value */ > > - const char *argv4[] = { prgname, prefix, mp_flag, "-n", "3", "-c", "1" > > }; > > + const char *argv4[] = { prgname, prefix, mp_flag, > > + "-n", "3", "-c", "1" }; > > /* -l flag but no corelist value */ > > - const char *argv5[] = { prgname, prefix, mp_flag, "-n", "3", "-l"}; > > - const char *argv6[] = { prgname, prefix, mp_flag, 
"-n", "3", "-l", " " > > }; > > + const char *argv5[] = { prgname, prefix, mp_flag, > > + "-n", "3", "-l"}; > > + const char *argv6[] = { prgname, prefix, mp_flag, > > + "-n", "3", "-l", " " }; > > /* bad corelist values */ > > - const char *argv7[] = { prgname, prefix, mp_flag, "-n", "3", "-l", > > "error" }; > > - const char *argv8[] = { prgname, prefix, mp_flag, "-n", "3", "-l", "1-" > > }; > > - const char *argv9[] = { prgname, prefix, mp_flag, "-n", "3", "-l", "1," > > }; > > - const char *argv10[] = { prgname, prefix, mp_flag, "-n", "3", "-l", > > "1#2" }; > > + const char *argv7[] = { prgname, prefix, mp_flag, > > + "-n", "3", "-l", "error" }; > > + const char *argv8[] = { prgname, prefix, mp_flag, > > + "-n", "3", "-l", "1-" }; > > + const char *argv9[] = { prgname, prefix, mp_flag, > > + "-n", "3", "-l", "1," }; > > + const char *argv10[] = { prgname, prefix, mp_flag, > > +"-n", "3", "-l", "1#2" }; > > /* sanity check test - valid corelist value */ > > - const char *argv11[] = { prgname, prefix, mp_flag, "-n", "3", "-l", > > "1-2,3" }; > > + const char *argv11[] = { prgname, prefix, mp_flag, > > +"-n", "3", "-l", "1-2,3" }; > > + > > + /* --lcores flag but no lcores value */ > > + const char *argv12[] = { prgname, prefix, mp_flag, > > +"-n", "3", "--lcores" }; > > + const char *argv13[]
[dpdk-dev] [PATCH v6 12/19] malloc: fix the issue of SOCKET_ID_ANY
Hi,

> -----Original Message-----
> From: Neil Horman [mailto:nhorman at tuxdriver.com]
> Sent: Sunday, February 15, 2015 10:09 PM
> To: Liang, Cunming
> Cc: dev at dpdk.org
> Subject: Re: [dpdk-dev] [PATCH v6 12/19] malloc: fix the issue of SOCKET_ID_ANY
>
> On Sun, Feb 15, 2015 at 12:43:03AM +, Liang, Cunming wrote:
> > Hi,
> >
> > > -----Original Message-----
> > > From: Neil Horman [mailto:nhorman at tuxdriver.com]
> > > Sent: Saturday, February 14, 2015 1:57 AM
> > > To: Liang, Cunming
> > > Cc: dev at dpdk.org
> > > Subject: Re: [dpdk-dev] [PATCH v6 12/19] malloc: fix the issue of SOCKET_ID_ANY
> > >
> > > On Fri, Feb 13, 2015 at 09:38:14AM +0800, Cunming Liang wrote:
> > > > Add check for rte_socket_id(), avoid get unexpected return like (-1).
> > > >
> > > > Signed-off-by: Cunming Liang
> > > > ---
> > > >  lib/librte_malloc/malloc_heap.h | 7 ++-
> > > >  1 file changed, 6 insertions(+), 1 deletion(-)
> > > >
> > > > diff --git a/lib/librte_malloc/malloc_heap.h b/lib/librte_malloc/malloc_heap.h
> > > > index b4aec45..a47136d 100644
> > > > --- a/lib/librte_malloc/malloc_heap.h
> > > > +++ b/lib/librte_malloc/malloc_heap.h
> > > > @@ -44,7 +44,12 @@ extern "C" {
> > > >  static inline unsigned
> > > >  malloc_get_numa_socket(void)
> > > >  {
> > > > -	return rte_socket_id();
> > > > +	unsigned socket_id = rte_socket_id();
> > > > +
> > > > +	if (socket_id == (unsigned)SOCKET_ID_ANY)
> > > > +		return 0;
> > > > +
> > > > +	return socket_id;
> > > Why is -1 unexpected? Isn't it reasonable to assume that some memory is
> > > equidistant from all cpu numa nodes?
> > [LCM] A given allocation always comes entirely from one specific NUMA node; it is never split with one part from one node and another part from a different node.
> > If no specific NUMA node is requested (SOCKET_ID_ANY, i.e. -1), the allocator first asks for the NUMA node that the current core belongs to.
> > 'malloc_get_numa_socket()' is called at that point. Back when a 1:1 thread/core mapping was assumed and the default value was 0, rte_socket_id() never returned -1.
> > Now rte_socket_id() may return -1, either when the pthread runs on multiple cores that do not all belong to one NUMA node, or when _socket_id has not been assigned yet and still holds its default value of (-1). So if the current _socket_id is -1, we just pick the first node as the candidate. I should probably add more comments for this.
> >
> Ok, but doesn't that provide an abnormal bias for node 0? I was thinking it
> might be better to be honest with the application so that it can choose a node
> according to its own policy.

[LCM] Personally, I like the idea of letting the application make the decision.
We could either add a simple default configuration, or define a more flexible policy for SOCKET_ID_ANY, such as:
1) use the assigned default socket_id;
2) use the current socket_id, and if that fails go to 1);
3) (weighted) round robin across the malloc_heaps;
4) use the current socket_id, and if that fails go to 3);
and so on.

On the other hand, well-tuned applications are usually NUMA friendly. Instead of using SOCKET_ID_ANY, they most often pass the expected socket_id explicitly.
Apart from obtaining the real, valid current socket_id, such a policy does not help affinity; it mainly helps memory utilization.

I guess the worry comes from this case: after a lot of memory has been allocated on socket 0, a new memzone_reserve that has to be served from socket 0 fails.
In that case, neither changing the default NUMA node nor balancing the allocations solves the problem; they only postpone it.
That is because explicitly assigned allocations (memzone_reserve, or malloc with a specified socket_id) may not be evenly balanced.
Conversely, if all the necessary memzones are reserved first, then even if malloc fails on the default socket it will try to allocate from another NUMA node.

I think this is out of the scope of this patch series.
For the moment, the simplest approach, taking node 0 as the default socket_id, is not bad. For anything more, we can post a separate patch and involve more people in the discussion. Thanks.

> Neil
>
> > > Neil
> > >
> > > > }
> > > >
> > > > void *
> > > > --
> > > > 1.8.1.4
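The four candidate policies listed in the reply could be expressed as a small selector like the one below. This is purely illustrative and was not proposed as code in the thread: all names are hypothetical, "if fail" is read here as "the current socket is unknown" (it could equally mean "the allocation failed"), and the round robin is unweighted.

```c
#define SOCKET_ID_ANY (-1)

/* Hypothetical policy names mirroring options 1) to 4) in the reply. */
enum sketch_policy {
	SKETCH_USE_DEFAULT,		/* 1) always the configured default */
	SKETCH_CURRENT_OR_DEFAULT,	/* 2) current socket, else 1) */
	SKETCH_ROUND_ROBIN,		/* 3) rotate across all heaps */
	SKETCH_CURRENT_OR_RR		/* 4) current socket, else 3) */
};

struct sketch_ctx {
	enum sketch_policy policy;
	int default_socket;
	unsigned num_sockets;	/* must be > 0 */
	unsigned next_rr;	/* round-robin cursor */
};

/* Hand out socket ids 0, 1, ..., num_sockets - 1, 0, ... in turn. */
static int
sketch_next_rr(struct sketch_ctx *c)
{
	int s = (int)(c->next_rr % c->num_sockets);

	c->next_rr++;
	return s;
}

/* Choose the socket to try first for a SOCKET_ID_ANY request. */
static int
sketch_pick_socket(struct sketch_ctx *c, int current_socket)
{
	switch (c->policy) {
	case SKETCH_CURRENT_OR_DEFAULT:
		return current_socket != SOCKET_ID_ANY ?
			current_socket : c->default_socket;
	case SKETCH_ROUND_ROBIN:
		return sketch_next_rr(c);
	case SKETCH_CURRENT_OR_RR:
		return current_socket != SOCKET_ID_ANY ?
			current_socket : sketch_next_rr(c);
	case SKETCH_USE_DEFAULT:
	default:
		return c->default_socket;
	}
}
```

Setting `policy = SKETCH_CURRENT_OR_DEFAULT` with `default_socket = 0` reproduces the behaviour of the patch under discussion.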
[dpdk-dev] [PATCH v7 04/19] eal: fix wrong strnlen() return value in 32bit icc
> -----Original Message-----
> From: Olivier MATZ [mailto:olivier.matz at 6wind.com]
> Sent: Monday, February 16, 2015 10:52 PM
> To: Liang, Cunming; dev at dpdk.org
> Cc: Ananyev, Konstantin; nhorman at tuxdriver.com
> Subject: Re: [PATCH v7 04/19] eal: fix wrong strnlen() return value in 32bit icc
>
> Hi,
>
> On 02/15/2015 04:15 AM, Cunming Liang wrote:
> > The problem is that strnlen() here may return invalid value with 32bit icc.
> > (actually it returns its second parameter, e.g: sysconf(_SC_ARG_MAX)).
> > It starts to manifest when max_len parameter is > 2M and using icc -m32 -O2 (or above).
> >
> > Suggested-by: Konstantin Ananyev
> > Signed-off-by: Cunming Liang
>
> Sorry but I don't think using strnlen() is appropriate here. See
> http://dpdk.org/ml/archives/dev/2015-February/013309.html
>
> I still don't agree that we should use strnlen(coremask, ARG_MAX).
>
> The API of eal_parse_coremask() requires that a valid string is passed
> as an argument, so strlen() is perfectly fine. It's up to the caller to
> ensure that the string is valid.

[LCM] Personally, I think either strlen() or strnlen(str, EAL_ARG_MAX) is ok.
Neil's point is that 'eal_parse_common_option()' is externally visible as an EAL global function, so it should itself guard against dirty input. That is a secure-programming concern: strnlen() is intended to be used only to calculate the size of incoming untrusted data in a buffer of known size.
Your point is that strlen() is enough, because the function is only used inside EAL and so far all input comes from optarg, which is trusted data from getopt_long().
I've added Thomas to the cc list. I'll submit a v8 so that a patch series is ready for either choice.

>
> Using strnlen(coremask, ARG_MAX) in eal_parse_coremask() with an
> arbitrary length does not protect from having a segfault in case the
> string is invalid and the caller's buffer length is < ARG_MAX.
>

[LCM] I'm afraid I don't get your point here. It causes a segfault only if the input string is NULL, doesn't it? As that case is already checked, using strnlen() does protect against an unterminated string.

>
> This would still be true even if eal_parse_coremask() is public.
>
> Regards,
> Olivier
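The disagreement above turns on what bound is passed to the length check. The sketch below illustrates Olivier's caveat under stated assumptions: it is a hypothetical helper, not DPDK code, and it uses standard `memchr()` rather than `strnlen()`. The scan is only safe when the bound is the real size of the caller's buffer; a larger fixed bound such as ARG_MAX would let the scan run past a short, unterminated buffer.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: length of the string in buf, or -1 if buf is NULL
 * or no NUL terminator exists within the first buf_len bytes.
 * buf_len must be the actual size of the caller's buffer.
 */
static long
sketch_bounded_len(const char *buf, size_t buf_len)
{
	const char *end;

	if (buf == NULL)
		return -1;

	/* memchr never reads beyond buf_len bytes */
	end = memchr(buf, '\0', buf_len);
	if (end == NULL)
		return -1;	/* unterminated within the buffer */

	return (long)(end - buf);
}
```

A parser with the signature `f(const char *coremask)` cannot know `buf_len`, which is why a bounded check inside it can only use an arbitrary constant.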
[dpdk-dev] [PATCH v3 00/16] unified packet type
> -----Original Message-----
> From: Zhang, Helin
> Sent: Tuesday, February 17, 2015 2:59 PM
> To: dev at dpdk.org
> Cc: Cao, Waterman; Liang, Cunming; Liu, Jijiang; Ananyev, Konstantin; Richardson, Bruce; Zhang, Helin
> Subject: [PATCH v3 00/16] unified packet type
>
> Currently only 6 bits which are stored in ol_flags are used to indicate the
> packet types. This is not enough, as some NIC hardware can recognize quite
> a lot of packet types, e.g i40e hardware can recognize more than 150 packet
> types. Hiding those packet types hides hardware offload capabilities which
> could be quite useful for improving performance and for end users. So
> unified packet types are needed to support all possible PMDs. The 16-bit
> packet_type in the mbuf structure can be changed to 32 bits and used for this
> purpose. In addition, all packet types stored in the ol_flags field should be
> removed entirely, and 6 bits of ol_flags can be saved as the benefit.
>
> Initially, 32 bits of packet_type can be divided into several sub fields to
> indicate different packet type information of a packet. The initial design
> is to divide those bits into fields for L2 types, L3 types, L4 types, tunnel
> types, inner L2 types, inner L3 types and inner L4 types. All PMDs should
> translate the offloaded packet types into these 7 fields of information, for
> user applications.
>
> v2 changes:
> * Enlarged the packet_type field from 16 bits to 32 bits.
> * Redefined the packet type sub-fields.
> * Updated the 'struct rte_kni_mbuf' for KNI according to the mbuf changes.
> * Used redefined packet types and enlarged packet_type field for all PMDs
>   and corresponding applications.
> * Removed changes in bond and its relevant application, as there is no need
>   at all according to the recent bond changes.
>
> v3 changes:
> * Put the mbuf layout changes into a single patch.
> * Put vector ixgbe changes right after mbuf changes.
> * Disabled vector ixgbe PMD by default, as mbuf layout changed, and then
>   re-enabled it after vector ixgbe PMD updated.
> * Put the definitions of unified packet type into a single patch.
> * Minor bug fixes and enhancements in l3fwd example.
>
> Helin Zhang (16):
>   mbuf: redefinition of packet_type in rte_mbuf
>   ixgbe: support of unified packet type for vector
>   mbuf: add definitions of unified packet types
>   e1000: support of unified packet type
>   ixgbe: support of unified packet type
>   i40e: support of unified packet type
>   enic: support of unified packet type
>   vmxnet3: support of unified packet type
>   app/test-pipeline: support of unified packet type
>   app/testpmd: support of unified packet type
>   examples/ip_fragmentation: support of unified packet type
>   examples/ip_reassembly: support of unified packet type
>   examples/l3fwd-acl: support of unified packet type
>   examples/l3fwd-power: support of unified packet type
>   examples/l3fwd: support of unified packet type
>   mbuf: remove old packet type bit masks
>
>  app/test-pipeline/pipeline_hash.c                  |   7 +-
>  app/test-pmd/csumonly.c                            |  10 +-
>  app/test-pmd/rxonly.c                              |   9 +-
>  examples/ip_fragmentation/main.c                   |   7 +-
>  examples/ip_reassembly/main.c                      |   7 +-
>  examples/l3fwd-acl/main.c                          |  19 +-
>  examples/l3fwd-power/main.c                        |   5 +-
>  examples/l3fwd/main.c                              |  71 +-
>  .../linuxapp/eal/include/exec-env/rte_kni_common.h |   4 +-
>  lib/librte_mbuf/rte_mbuf.c                         |   6 -
>  lib/librte_mbuf/rte_mbuf.h                         | 127 +++-
>  lib/librte_pmd_e1000/igb_rxtx.c                    |  98 ++-
>  lib/librte_pmd_enic/enic_main.c                    |  14 +-
>  lib/librte_pmd_i40e/i40e_rxtx.c                    | 786 ++---
>  lib/librte_pmd_ixgbe/ixgbe_rxtx.c                  | 146 +++-
>  lib/librte_pmd_ixgbe/ixgbe_rxtx_vec.c              |  49 +-
>  lib/librte_pmd_vmxnet3/vmxnet3_rxtx.c              |   4 +-
>  17 files changed, 921 insertions(+), 448 deletions(-)
>
> --
> 1.9.3

Acked-by: Cunming Liang
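The cover letter describes a 32-bit packet_type split into seven sub-fields but does not give their widths in this message. The sketch below assumes 4 bits per field (28 bits used, 4 spare) only to show the shift-and-mask mechanics; the field order, widths and all names are hypothetical, not the layout defined by the patch set.

```c
#include <stdint.h>

/* Hypothetical field order, lowest bits first. */
enum sketch_ptype_field {
	SKETCH_L2 = 0,
	SKETCH_L3,
	SKETCH_L4,
	SKETCH_TUNNEL,
	SKETCH_INNER_L2,
	SKETCH_INNER_L3,
	SKETCH_INNER_L4
};

#define SKETCH_FIELD_BITS 4u	/* assumed width of every sub-field */
#define SKETCH_FIELD_MASK 0xFu

/* Return ptype with one sub-field replaced by value (truncated to 4 bits). */
static uint32_t
sketch_ptype_set(uint32_t ptype, enum sketch_ptype_field field, uint32_t value)
{
	unsigned shift = (unsigned)field * SKETCH_FIELD_BITS;

	ptype &= ~((uint32_t)SKETCH_FIELD_MASK << shift);
	return ptype | ((value & SKETCH_FIELD_MASK) << shift);
}

/* Extract one sub-field from ptype. */
static uint32_t
sketch_ptype_get(uint32_t ptype, enum sketch_ptype_field field)
{
	unsigned shift = (unsigned)field * SKETCH_FIELD_BITS;

	return (ptype >> shift) & SKETCH_FIELD_MASK;
}
```

An application can then test one layer without decoding the others, for example checking only the L3 field before a route lookup.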
[dpdk-dev] [PATCH v3 0/5] Interrupt mode PMD
Hi,

> -----Original Message-----
> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Zhou Danny
> Sent: Tuesday, February 17, 2015 9:47 PM
> To: dev at dpdk.org
> Subject: [dpdk-dev] [PATCH v3 0/5] Interrupt mode PMD
>
> v3 changes
> - Add return value for interrupt enable/disable functions
> - Move spinlock from PMD to L3fwd-power
> - Remove unnecessary variables in e1000_mac_info
> - Fix miscellaneous review comments
>
> v2 changes
> - Fix compilation issue in Makefile for missed header file.
> - Consolidate internal and community review comments of v1 patch set.
>
> The patch series introduce low-latency one-shot rx interrupt into DPDK with
> polling and interrupt mode switch control example.
>
> DPDK userspace interrupt notification and handling mechanism is based on UIO
> with below limitation:
> 1) It is designed to handle LSC interrupt only with inefficient suspended
>    pthread wakeup procedure (e.g. UIO wakes up LSC interrupt handling thread
>    which then wakes up DPDK polling thread). In this way, it introduces
>    non-deterministic wakeup latency for DPDK polling thread as well as packet
>    latency if it is used to handle Rx interrupt.
> 2) UIO only supports a single interrupt vector which has to be shared by
>    LSC interrupt and interrupts assigned to dedicated rx queues.
>
> This patchset includes below features:
> 1) Enable one-shot rx queue interrupt in ixgbe PMD(PF & VF) and igb PMD(PF only).
> 2) Build on top of the VFIO mechanism instead of UIO, so it could support
>    up to 64 interrupt vectors for rx queue interrupts.
> 3) Have 1 DPDK polling thread handle per Rx queue interrupt with a dedicated
>    VFIO eventfd, which eliminates non-deterministic pthread wakeup latency in
>    user space.
> 4) Demonstrate interrupts control APIs and userspace NAPI-like polling/interrupt
>    switch algorithms in L3fwd-power example.
>
> Known limitations:
> 1) It does not work for UIO due to a single interrupt eventfd shared by LSC
>    and rx queue interrupt handlers causes a mess.
> 2) LSC interrupt is not supported by VF driver, so it is by default disabled
>    in L3fwd-power now. Feel free to turn it on if you want to support both LSC
>    and rx queue interrupts on a PF.
>
> Danny Zhou (5):
>   ethdev: add rx interrupt enable/disable functions
>   ixgbe: enable rx queue interrupts for both PF and VF
>   igb: enable rx queue interrupts for PF
>   eal: add per rx queue interrupt handling based on VFIO
>   l3fwd-power: enable one-shot rx interrupt and polling/interrupt mode switch
>
>  examples/l3fwd-power/main.c                        | 153 ++---
>  lib/librte_eal/common/include/rte_eal.h            |  12 +
>  lib/librte_eal/linuxapp/eal/Makefile               |   1 +
>  lib/librte_eal/linuxapp/eal/eal_interrupts.c       | 190 ---
>  lib/librte_eal/linuxapp/eal/eal_pci_vfio.c         |  12 +-
>  .../linuxapp/eal/include/exec-env/rte_interrupts.h |   4 +
>  lib/librte_ether/rte_ethdev.c                      |  43 +++
>  lib/librte_ether/rte_ethdev.h                      |  57
>  lib/librte_pmd_e1000/e1000_ethdev.h                |   3 +
>  lib/librte_pmd_e1000/igb_ethdev.c                  | 228 +++--
>  lib/librte_pmd_ixgbe/ixgbe_ethdev.c                | 365 -
>  lib/librte_pmd_ixgbe/ixgbe_ethdev.h                |   6 +
>  13 files changed, 963 insertions(+), 114 deletions(-)
>
> --
> 1.8.1.4

Acked-by: Cunming Liang
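The polling/interrupt switch mentioned in feature 4) can be pictured as a small state machine: keep polling while packets arrive, and arm the one-shot interrupt after a run of empty polls. The sketch below is an assumption about the general shape of such an algorithm; the threshold, the names and the exact transitions are hypothetical and are not taken from the l3fwd-power patch, which is not shown in this thread.

```c
#define SKETCH_IDLE_THRESHOLD 3u	/* assumed number of empty polls before sleeping */

enum sketch_mode {
	SKETCH_POLLING,		/* busy-poll the rx queue */
	SKETCH_WAIT_INTERRUPT	/* arm the one-shot interrupt and block */
};

struct sketch_rxq_state {
	unsigned idle_polls;
	enum sketch_mode mode;
};

/* Update the state after one rx burst that returned nb_rx packets. */
static enum sketch_mode
sketch_after_poll(struct sketch_rxq_state *st, unsigned nb_rx)
{
	if (nb_rx > 0) {
		st->idle_polls = 0;
		st->mode = SKETCH_POLLING;
	} else if (++st->idle_polls >= SKETCH_IDLE_THRESHOLD) {
		st->mode = SKETCH_WAIT_INTERRUPT;
	}

	return st->mode;
}

/* Called when the queue's one-shot interrupt fires: go back to polling. */
static void
sketch_on_interrupt(struct sketch_rxq_state *st)
{
	st->idle_polls = 0;
	st->mode = SKETCH_POLLING;
}
```

Because the interrupt is one-shot, it has to be re-armed each time the state machine enters the waiting mode; where that happens in the driver is outside this sketch.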
[dpdk-dev] [PATCH v1] afpacket: fix critical issue reported by klocwork
> -Original Message- > From: John W. Linville [mailto:linville at tuxdriver.com] > Sent: Saturday, February 21, 2015 2:39 AM > To: Thomas Monjalon > Cc: Liang, Cunming; dev at dpdk.org; John Linville > Subject: Re: [dpdk-dev] [PATCH v1] afpacket: fix critical issue reported by > klocwork > > On Fri, Feb 20, 2015 at 11:19:59AM +0100, Thomas Monjalon wrote: > > Hi Cunming, > > > > You would have more chance to have a review by CC'ing John. > > I checked your patch and have a comment below. > > > > 2015-02-12 17:08, Cunming Liang: > > > Klocwork report 'req' might be used uninitialized. > > > In some cases it can 'goto error' when '*internals' not been set. > > > The result is unexpected checking the value of '*internals'. > > > > > > Signed-off-by: Cunming Liang > > > --- > > > lib/librte_pmd_af_packet/rte_eth_af_packet.c | 4 +++- > > > 1 file changed, 3 insertions(+), 1 deletion(-) > > > > > > diff --git a/lib/librte_pmd_af_packet/rte_eth_af_packet.c > b/lib/librte_pmd_af_packet/rte_eth_af_packet.c > > > index 1ffe1cd..185607d 100644 > > > --- a/lib/librte_pmd_af_packet/rte_eth_af_packet.c > > > +++ b/lib/librte_pmd_af_packet/rte_eth_af_packet.c > > > @@ -439,13 +439,15 @@ rte_pmd_init_internals(const char *name, > > > size_t ifnamelen; > > > unsigned k_idx; > > > struct sockaddr_ll sockaddr; > > > - struct tpacket_req *req; > > > + struct tpacket_req *req = NULL; > > > > If *internals is set to NULL, there should be no case where req used > > and undefined. [LCM] Agreed, which is why I also add '*internals = NULL' below. > > I agree -- it looks to me like req is protected by checking for > *internals == NULL. I don't think this patch is necessary. [LCM] The major piece of the patch is the added '*internals = NULL;' assignment. 
> > > > struct pkt_rx_queue *rx_queue; > > > struct pkt_tx_queue *tx_queue; > > > int rc, qsockfd, tpver, discard; > > > unsigned int i, q, rdsize; > > > int fanout_arg __rte_unused, bypass __rte_unused; > > > > > > + *internals = NULL; > > > + > > > for (k_idx = 0; k_idx < kvlist->count; k_idx++) { > > > pair = &kvlist->pairs[k_idx]; > > > if (strstr(pair->key, ETH_AF_PACKET_IFACE_ARG) != NULL) > > > > > > > > > > > > > -- > John W. Linville Someday the world will need a hero, and you > linville at tuxdriver.com might be all we have. Be ready.
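The point under discussion (clear the output pointer before the first 'goto error', so the error path can test it safely) can be shown with a small stand-alone sketch. This is not the af_packet code: struct internals, init_internals and the fail_early flag are hypothetical stand-ins, and the caller is deliberately assumed not to have initialised the pointer.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for the driver's private data. */
struct internals {
	int *rings;
};

/*
 * Hypothetical init routine. fail_early simulates an error that occurs
 * before *out has been assigned by this function.
 */
static int
init_internals(struct internals **out, int fail_early)
{
	assert(out != NULL);
	*out = NULL; /* from here on the error path may test *out safely */

	if (fail_early)
		goto error;

	*out = calloc(1, sizeof(**out));
	if (*out == NULL)
		goto error;

	(*out)->rings = calloc(4, sizeof(int));
	if ((*out)->rings == NULL)
		goto error;

	return 0;

error:
	/* Without the assignment at the top, *out could hold caller garbage. */
	if (*out != NULL) {
		free((*out)->rings);
		free(*out);
		*out = NULL;
	}
	return -1;
}
```

If every caller already passes a pointer set to NULL, the assignment is redundant at run time, but it makes the function safe on its own, which is the kind of property a static analyser checks.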
[dpdk-dev] [PATCH v8 00/19] support multi-pthread per core
> -Original Message- > From: Thomas Monjalon [mailto:thomas.monjalon at 6wind.com] > Sent: Wednesday, February 25, 2015 2:53 AM > To: Liang, Cunming > Cc: Ananyev, Konstantin; dev at dpdk.org; olivier.matz at 6wind.com; > nhorman at tuxdriver.com > Subject: Re: [PATCH v8 00/19] support multi-pthread per core > > > > v8 changes: > > > keep using strlen for trusted input string > > > > > > v7 changes: > > > update EAL version map for new public EAL API > > > rollback to use strnlen() passing EAL core option > > > > > > v6 changes: > > > rename RTE_RING_PAUSE_REP(_COUNT) and set default to 0 > > > rollback to use RTE_MAX_LCORE when checking valid lcore_id for EAL > > > thread > > > > > > v5 changes: > > > reorder some patch and split into addtional two patches > > > rte_thread_get_affinity() return type change to avoid > > > add RTE_RING_PAUSE_REP into config and by default turn off > > > > > > v4 changes: > > > new patch fixing strnlen() invalid return in 32bit icc [03/17] > > > update and add more comments on sched_yield() [16/17] > > > > > > v3 changes: > > > new patch adding sched_yield() in rte_ring to avoid long spin [16/17] > > > > > > v2 changes: > > > add '-' support for EAL option '--lcores' [02/17] > > > > > > The patch series contain the enhancements of EAL and fixes for libraries > > > to run multi-pthreads(either EAL or non-EAL thread) per physical core. > > > Two major changes list as below: > > > - Extend the core affinity of each EAL thread to 1:n. > > > Each lcore stands for a EAL thread rather than a logical core. > > > The change adds new EAL option to allow static lcore to cpuset > > > assginment. > > > Then a lcore(EAL thread) affinity to a cpuset, original 1:1 mapping is > > > the > special case. > > > - Fix the libraries to allow running on any non-EAL thread. > > > It fix the gaps running libraries in non-EAL thread(dynamic created by > > > user). > > > Each fix libraries take care the case of rte_lcore_id() >= > > > RTE_MAX_LCORE. 
> > > > > > Thanks a million for the comments from Konstantin, Bruce, Mirek and > Stephen in RFC review. > > > > > > Cunming Liang (19): > > > eal: add cpuset into per EAL thread lcore_config > > > eal: fix PAGE_SIZE redefine complaint on freebsd > > > eal: new eal option '--lcores' for cpu assignment > > > eal: fix wrong strnlen() return value in 32bit icc > > > eal: add public function parsing socket_id from cpu_id > > > eal: new TLS definition and API declaration > > > eal: add eal_common_thread.c for common thread API > > > eal: standardize init sequence between linux and bsd > > > eal: add rte_gettid() to acquire unique system tid > > > eal: apply affinity of EAL thread by assigned cpuset > > > enic: fix re-define freebsd compile complain > > > malloc: fix the issue of SOCKET_ID_ANY > > > log: fix the gap to support non-EAL thread > > > eal: set _lcore_id and _socket_id to (-1) by default > > > eal: fix recursive spinlock in non-EAL thraed > > > mempool: add support to non-EAL thread > > > ring: add support to non-EAL thread > > > ring: add sched_yield to avoid spin forever > > > timer: add support to non-EAL thread > > > > Acked-by: Konstantin Ananyev > > I tried to fix many english typos. Please consider it during reviews. > Cunming, you'll repeat 10 times "non-EAL threads compute more than none" ;) > > Applied, thanks [Liang, Cunming] Thanks, Thomas. I'll take care of it next time. :) > > My main concern in this patchset is about naming. Now lcore means thread > in many places. I would prefer to have a cleanup to use right term at > right place, even if it requires breaking API. [Liang, Cunming] 'lcore' is used only to mean an EAL thread. Compared to the legacy usage, the difference is that such an lcore (logical core) may have affinity to more than one physical core. If we extend the meaning of 'lcore' a bit (as the prog_guide doc says, let's consider that a logical core stands for an EAL thread), it then makes sense to keep the original APIs. 
That helps existing apps migrate transparently. > > Are we going to deprecate the fresh option -l in favor of --lcores/--threads? [Liang, Cunming] I think so, since '--lcores' already covers the '-l' pattern. Shall we mark it as deprecated and remove it in the next version?
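The cover letter quoted above says the fixed libraries handle the case of rte_lcore_id() >= RTE_MAX_LCORE, with _lcore_id defaulting to (-1). A rough stand-alone sketch of that pattern follows. It does not use the EAL: MAX_LCORE, LCORE_ID_NONE, thread_set_lcore and account_one are hypothetical stand-ins, and __thread and __atomic_fetch_add are GCC/Clang extensions assumed to be available.

```c
#include <assert.h>
#include <limits.h>

#define MAX_LCORE 128u         /* stand-in for RTE_MAX_LCORE */
#define LCORE_ID_NONE UINT_MAX /* stand-in for the (-1) default id */

/* Per-thread id; a thread that was never registered keeps the default. */
static __thread unsigned int lcore_id = LCORE_ID_NONE;

static int per_lcore_count[MAX_LCORE]; /* hypothetical per-lcore data */
static int shared_count;               /* fallback shared by other threads */

/* Called by a registered thread once it has been given an id. */
static void
thread_set_lcore(unsigned int id)
{
	assert(id < MAX_LCORE);
	lcore_id = id;
}

/*
 * Library routine: returns 1 if the lock-free per-lcore path was used,
 * 0 if the caller had no valid id and the shared path was taken.
 */
static int
account_one(void)
{
	if (lcore_id >= MAX_LCORE) {
		/* no private slot: update the shared counter atomically */
		__atomic_fetch_add(&shared_count, 1, __ATOMIC_RELAXED);
		return 0;
	}
	per_lcore_count[lcore_id]++;
	return 1;
}
```

The design choice illustrated is only the range check: an id outside [0, MAX_LCORE) must never be used as an array index, so a library has to provide some slower shared path for such threads.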
[dpdk-dev] [PATCH v1] doc: prog guide update for eal multi-pthread
I'm afraid not yet, so I would appreciate any revision suggestions. > -Original Message- > From: Thomas Monjalon [mailto:thomas.monjalon at 6wind.com] > Sent: Wednesday, February 25, 2015 3:11 AM > To: Liang, Cunming > Cc: dev at dpdk.org > Subject: Re: [dpdk-dev] [PATCH v1] doc: prog guide update for eal > multi-pthread > > 2015-02-16 15:34, Cunming Liang: > > The patch add the multi-pthread section under EAL chapter of prog_guide. > > > > Signed-off-by: Cunming Liang > > I guess this documentation has been co-written with a native english? > > Applied, thanks
[dpdk-dev] Missing symbol error
Hi, You're right, it's missing from the version map. I will send a patch to fix it. Thanks. > -Original Message- > From: Tetsuya Mukawa [mailto:mukawa at igel.co.jp] > Sent: Wednesday, February 25, 2015 10:49 AM > To: dev at dpdk.org > Cc: Liang, Cunming > Subject: Missing symbol error > > Hi, > > I've got following error when I enable CONFIG_RTE_BUILD_SHARED_LIB. > > dpdk/x86_64-native-linuxapp-gcc/lib/libethdev.so: undefined reference to > `per_lcore__socket_id' > collect2: error: ld returned 1 exit status > make[5]: *** [dump_cfg] Error 1 > make[4]: *** [dump_cfg] Error 2 > make[4]: *** Waiting for unfinished jobs > > > It seems after applying below commit, this issue is occurred. > 8baacdd... eal: apply thread affinity by assigned cpuset > > Thanks, > Tetsuya
[dpdk-dev] ixgbe vector mode not working.
Hi Stephen, I tried the latest master branch with testpmd, using 2 rxq and 2 txq as below, with the vector PMD on both rx and tx. I can't reproduce it. I checked your log: on the tx side, it looks like the tx vector path is not enabled (it shows vpmd on rx, spmd on tx). Would you share the params below from your app? RX desc=128 - RX free threshold=32 TX desc=512 - TX free threshold=32 TX RS bit threshold=32 - TXQ flags=0xf01 As your case uses 2 rxq and 1 txq, would you explain the traffic flow between them? Does one thread poll packets from each rxq and send them to the specified txq? ./x86_64-native-linuxapp-gcc/app/testpmd -c 0xff00 -n 4 -- -i --coremask=f000 --txfreet=32 --rxfreet=32 --txqflags=0xf01 --txrst=32 --rxq=2 --txq=2 --numa [...] Configuring Port 0 (socket 1) PMD: ixgbe_dev_tx_queue_setup(): sw_ring=0x7f99cace9ac0 hw_ring=0x7f99c9c3f480 dma_addr=0x1fdd83f480 PMD: set_tx_function(): Using simple tx code path PMD: set_tx_function(): Vector tx enabled. PMD: ixgbe_dev_tx_queue_setup(): sw_ring=0x7f99cace7980 hw_ring=0x7f99c9c4f480 dma_addr=0x1fdd84f480 PMD: set_tx_function(): Using simple tx code path PMD: set_tx_function(): Vector tx enabled. PMD: ixgbe_dev_rx_queue_setup(): sw_ring=0x7f99cace7100 hw_ring=0x7f99c9c5f480 dma_addr=0x1fdd85f480 PMD: ixgbe_dev_rx_queue_setup(): Rx Burst Bulk Alloc Preconditions are satisfied. Rx Burst Bulk Alloc function will be used on port=0, queue=0. PMD: ixgbe_dev_rx_queue_setup(): Vector rx enabled, please make sure RX burst size no less than 32. PMD: ixgbe_dev_rx_queue_setup(): sw_ring=0x7f99cace6880 hw_ring=0x7f99c9c6f500 dma_addr=0x1fdd86f500 PMD: ixgbe_dev_rx_queue_setup(): Rx Burst Bulk Alloc Preconditions are satisfied. Rx Burst Bulk Alloc function will be used on port=0, queue=1. PMD: ixgbe_dev_rx_queue_setup(): Vector rx enabled, please make sure RX burst size no less than 32. 
Port 0: 90:E2:BA:30:A0:75 Configuring Port 1 (socket 1) PMD: ixgbe_dev_tx_queue_setup(): sw_ring=0x7f99cace4540 hw_ring=0x7f99c9c7f580 dma_addr=0x1fdd87f580 PMD: set_tx_function(): Using simple tx code path PMD: set_tx_function(): Vector tx enabled. PMD: ixgbe_dev_tx_queue_setup(): sw_ring=0x7f99cace2400 hw_ring=0x7f99c9c8f580 dma_addr=0x1fdd88f580 PMD: set_tx_function(): Using simple tx code path PMD: set_tx_function(): Vector tx enabled. PMD: ixgbe_dev_rx_queue_setup(): sw_ring=0x7f99cace1b80 hw_ring=0x7f99c9c9f580 dma_addr=0x1fdd89f580 PMD: ixgbe_dev_rx_queue_setup(): Rx Burst Bulk Alloc Preconditions are satisfied. Rx Burst Bulk Alloc function will be used on port=1, queue=0. PMD: ixgbe_dev_rx_queue_setup(): Vector rx enabled, please make sure RX burst size no less than 32. PMD: ixgbe_dev_rx_queue_setup(): sw_ring=0x7f99cace1300 hw_ring=0x7f99c9caf600 dma_addr=0x1fdd8af600 PMD: ixgbe_dev_rx_queue_setup(): Rx Burst Bulk Alloc Preconditions are satisfied. Rx Burst Bulk Alloc function will be used on port=1, queue=1. PMD: ixgbe_dev_rx_queue_setup(): Vector rx enabled, please make sure RX burst size no less than 32. Port 1: 90:E2:BA:06:90:59 Checking link statuses... Port 0 Link Up - speed 1 Mbps - full-duplex Port 1 Link Up - speed 1 Mbps - full-duplex Done testpmd> show config rxtx io packet forwarding - CRC stripping disabled - packets/burst=32 nb forwarding cores=4 - nb forwarding ports=2 RX queues=2 - RX desc=128 - RX free threshold=32 RX threshold registers: pthresh=8 hthresh=8 wthresh=0 TX queues=2 - TX desc=512 - TX free threshold=32 TX threshold registers: pthresh=32 hthresh=0 wthresh=0 TX RS bit threshold=32 - TXQ flags=0xf01 -Cunming > -Original Message- > From: Stephen Hemminger [mailto:stephen at networkplumber.org] > Sent: Wednesday, February 25, 2015 8:16 AM > To: Nemeth, Balazs; Richardson, Bruce; Liang, Cunming; Neil Horman > Cc: dev at dpdk.org > Subject: ixgbe vector mode not working. 
> > The ixgbe driver (from 1.8 or 2.0) works fine in normal (non-vectored) mode. > But when vector mode is enabled, it gets a few packets through then hangs. > We use 2 Rx queues and 1 Tx queue per interface. > > Devices: > 01:00.0 Ethernet controller: Intel Corporation 82599EB 10-Gigabit SFI/SFP+ > Network Connection (rev 01) > 02:00.0 Ethernet controller: Intel Corporation Ethernet Controller 10-Gigabit > X540- > AT2 (rev 01) > > Log: > EAL: probe driver: 8086:10fb rte_ixgbe_pmd > PMD: eth_ixgbe_dev_init(): MAC: 2, PHY: 17, SFP+: 5 > PMD: eth_ixgbe_dev_init(): port 0 vendorID=0x8086 deviceID=0x10fb > EAL: probe driver: 8086:1528 rte_ixgbe_pmd > PMD: eth_ixgbe_dev_init(): MAC: 4, PHY: 3 > PMD: eth_ixgbe_dev_init(): port 1 vendorID=0x8086 deviceID=0x1528 > [0.43] DATAPLANE: Port 0 rte_ixgbe_pmd on socket 0 > [0.53] DATAPLANE: Port 1 rte_ixgbe_pmd on socket 0 > [0.031638] PMD: ixgbe_dev_rx_queue_setup(): sw_ring=0x7fc5ac6a1b40 > hw_ring=0x7fc5ab548300 dma_addr=0x67348300 > [0.031647]
[dpdk-dev] ixgbe vector mode not working.
Hi Stephen, Thanks for the info. With rxd=4000, I can reproduce it. At that point, it runs out of mbufs. I'll follow up on this issue. > -Original Message- > From: Stephen Hemminger [mailto:stephen at networkplumber.org] > Sent: Wednesday, February 25, 2015 3:37 PM > To: Liang, Cunming > Cc: Nemeth, Balazs; Richardson, Bruce; Neil Horman; dev at dpdk.org > Subject: Re: ixgbe vector mode not working. > > On Wed, 25 Feb 2015 04:55:09 + > "Liang, Cunming" wrote: > > > Hi Stephen, > > > > I tried on the latest master branch with testpmd. > > 2 rxq and 2 txq as below, vector pmd on both rx and tx. I can't reproduced > > it. > > I checked your log, on tx side, it looks the tx vector haven't enabled. (it > > shows > vpmd on rx, spmd on tx). > > Would you help to share the below params in your app ? > > RX desc=128 - RX free threshold=32 > > TX desc=512 - TX free threshold=32 > > TX RS bit threshold=32 - TXQ flags=0xf01 > > As in your case which using 2 rxq and 1 txq, would you explain the traffic > > flow > between them. > > One thread polling packets from each rxq and send to the specified txq ? > > Basic thread model of application is same as examples/qos_sched. > > On ixgbe: > RX desc = 4000 - RX free threshold=32 > TX desc = 512 - TX free threshold=0 so driver sets default of 32 > > I was setting rx/tx conf but since examples don't went away from that. [LCM] All these params are defined in rte_eth_rxconf/rte_eth_txconf, which are used during rte_eth_rx/tx_queue_setup. If you don't care about a value and assign nothing to it, it takes the per-device default. For ixgbe, the default_txconf will use the vpmd. In your log, it doesn't, which is why I asked for these params. > > The whole RX/TX tuning parameters are a very poor programming model only > a hardware engineer could love. Requiring the application to look at > driver string and choose the magic parameter settings, is in my opinion > an indication of using incorrect abstraction. 
[LCM] It's not necessary for the application to look at such parameters. As you said, they are only for RX/TX tuning. When tuning, it makes sense to understand what these parameters mean.
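Since the reproduction above came down to the pool running out of mbufs once rxd was raised to 4000, a back-of-the-envelope pool sizing helper may be useful. This is only a rule-of-thumb sketch and not something stated in the thread: it assumes every Rx and Tx descriptor can hold an mbuf at the same time, and that each lcore may additionally keep a mempool cache plus one burst in flight. struct port_cfg and min_mbufs are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-port queue layout; field names are illustrative. */
struct port_cfg {
	uint16_t nb_rxq; /* Rx queues on the port */
	uint16_t nb_rxd; /* descriptors per Rx queue */
	uint16_t nb_txq; /* Tx queues on the port */
	uint16_t nb_txd; /* descriptors per Tx queue */
};

/*
 * Lower bound on the number of mbufs a shared pool should hold so that
 * all rings can be full at once (assumption, see the note above).
 */
static uint32_t
min_mbufs(const struct port_cfg *ports, size_t nb_ports,
	  uint32_t nb_lcores, uint32_t cache_size, uint32_t burst)
{
	uint32_t total = 0;
	size_t i;

	assert(ports != NULL || nb_ports == 0);
	for (i = 0; i < nb_ports; i++) {
		total += (uint32_t)ports[i].nb_rxq * ports[i].nb_rxd;
		total += (uint32_t)ports[i].nb_txq * ports[i].nb_txd;
	}
	/* per-lcore mempool cache plus one burst held by the application */
	total += nb_lcores * (cache_size + burst);
	return total;
}
```

With the figures quoted in this thread (2 ports, 2 Rx queues of 4000 descriptors and 1 Tx queue of 512 descriptors each) and, as pure assumptions, 4 lcores, a cache of 256 and a burst of 32, the helper returns 2 * (8000 + 512) + 4 * 288 = 18176.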
[dpdk-dev] [PATCH v1] afpacket: fix critical issue reported by klocwork
> -Original Message- > From: Thomas Monjalon [mailto:thomas.monjalon at 6wind.com] > Sent: Wednesday, February 25, 2015 4:46 PM > To: Liang, Cunming > Cc: John W. Linville; dev at dpdk.org; John Linville > Subject: Re: [dpdk-dev] [PATCH v1] afpacket: fix critical issue reported by > klocwork > > 2015-02-25 00:57, Liang, Cunming: > > From: John W. Linville [mailto:linville at tuxdriver.com] > > > On Fri, Feb 20, 2015 at 11:19:59AM +0100, Thomas Monjalon wrote: > > > > 2015-02-12 17:08, Cunming Liang: > > > > > --- a/lib/librte_pmd_af_packet/rte_eth_af_packet.c > > > > > +++ b/lib/librte_pmd_af_packet/rte_eth_af_packet.c > > > > > @@ -439,13 +439,15 @@ rte_pmd_init_internals(const char *name, > > > > > size_t ifnamelen; > > > > > unsigned k_idx; > > > > > struct sockaddr_ll sockaddr; > > > > > - struct tpacket_req *req; > > > > > + struct tpacket_req *req = NULL; > > > > > > > > If *internals is set to NULL, there should be no case where req used > > > > and undefined. > > > > [LCM] Agree, so that's why I add '*internals = NULL' below as well. > > > > > > I agree -- it looks to me like req is protected by checking for > > > *internals == NULL. I don't think this patch is necessary. > > > > [LCM] The major piece of the patch is add setting for '*internals=NULL;'. > > Yes understood, but it is already initialized to NULL before calling > rte_pmd_init_internals(): > http://dpdk.org/browse/dpdk/tree/lib/librte_pmd_af_packet/rte_eth_af_packet.c#n706 [LCM] I see. The complaint comes from klocwork, so either adding 'internals=NULL' or adding a comment would help avoid the same report on the next scan. What do you think?
[dpdk-dev] [PATCH] eal: Clean up export of per_lcore__socket_id
Hi Neil, Thanks for the cleanup. Would it be better to move rte_socket_id() to eal_common_thread.c? As it simply returns _socket_id, there is no need to keep two copies, one each in linux and bsd. -Cunming > -Original Message- > From: Neil Horman [mailto:nhorman at tuxdriver.com] > Sent: Wednesday, February 25, 2015 10:34 PM > To: dev at dpdk.org > Cc: thomas.monjalon at 6wind.com; Liang, Cunming; Neil Horman > Subject: [PATCH] eal: Clean up export of per_lcore__socket_id > > Theres no need to export this variable. Its set and queried from an API call > that doesn't exist in the hot path. Instead just export the rte_socket_id > symbol and make the variable private to protect it from type changes. We > should > do this with the other exported variables too, but I think its too late in the > release cycle to do that. > > tested using distributor_autotest (which uses rte_socket_id), successfully. > Only tested on linux, as I don't currently have a bsd system spun up, but the > changes are symmetric, and should be fine > > Signed-off-by: Neil Horman > --- > lib/librte_eal/bsdapp/eal/eal_thread.c | 5 + > lib/librte_eal/bsdapp/eal/rte_eal_version.map | 2 +- > lib/librte_eal/common/eal_common_thread.c | 2 ++ > lib/librte_eal/common/include/rte_lcore.h | 7 +-- > lib/librte_eal/linuxapp/eal/eal_thread.c| 5 + > lib/librte_eal/linuxapp/eal/rte_eal_version.map | 2 +- > 6 files changed, 15 insertions(+), 8 deletions(-) > > diff --git a/lib/librte_eal/bsdapp/eal/eal_thread.c > b/lib/librte_eal/bsdapp/eal/eal_thread.c > index ca95c72..5e6eea9 100644 > --- a/lib/librte_eal/bsdapp/eal/eal_thread.c > +++ b/lib/librte_eal/bsdapp/eal/eal_thread.c > @@ -60,6 +60,11 @@ RTE_DEFINE_PER_LCORE(unsigned, _lcore_id) = > LCORE_ID_ANY; > RTE_DEFINE_PER_LCORE(unsigned, _socket_id) = (unsigned)SOCKET_ID_ANY; > RTE_DEFINE_PER_LCORE(rte_cpuset_t, _cpuset); > > +unsigned rte_socket_id(void) > +{ > + return RTE_PER_LCORE(_socket_id); > +} > + > /* > * Send a message to a slave lcore identified by slave_id to 
call a > * function f with argument arg. Once the execution is done, the > diff --git a/lib/librte_eal/bsdapp/eal/rte_eal_version.map > b/lib/librte_eal/bsdapp/eal/rte_eal_version.map > index 17515a9..d83524d 100644 > --- a/lib/librte_eal/bsdapp/eal/rte_eal_version.map > +++ b/lib/librte_eal/bsdapp/eal/rte_eal_version.map > @@ -10,7 +10,6 @@ DPDK_2.0 { > pci_driver_list; > per_lcore__lcore_id; > per_lcore__rte_errno; > - per_lcore__socket_id; > rte_cpu_check_supported; > rte_cpu_get_flag_enabled; > rte_cycles_vmware_tsc_map; > @@ -82,6 +81,7 @@ DPDK_2.0 { > rte_set_log_level; > rte_set_log_type; > rte_snprintf; > + rte_socket_id; > rte_strerror; > rte_strsplit; > rte_sys_gettid; > diff --git a/lib/librte_eal/common/eal_common_thread.c > b/lib/librte_eal/common/eal_common_thread.c > index f4d9892..4010eab 100644 > --- a/lib/librte_eal/common/eal_common_thread.c > +++ b/lib/librte_eal/common/eal_common_thread.c > @@ -46,6 +46,8 @@ > > #include "eal_thread.h" > > +RTE_DECLARE_PER_LCORE(unsigned , _socket_id); > + > int eal_cpuset_socket_id(rte_cpuset_t *cpusetp) > { > unsigned cpu = 0; > diff --git a/lib/librte_eal/common/include/rte_lcore.h > b/lib/librte_eal/common/include/rte_lcore.h > index 20a58eb..e03264e 100644 > --- a/lib/librte_eal/common/include/rte_lcore.h > +++ b/lib/librte_eal/common/include/rte_lcore.h > @@ -81,7 +81,6 @@ struct lcore_config { > extern struct lcore_config lcore_config[RTE_MAX_LCORE]; > > RTE_DECLARE_PER_LCORE(unsigned, _lcore_id); /**< Per thread "lcore id". */ > -RTE_DECLARE_PER_LCORE(unsigned, _socket_id); /**< Per thread "socket id". > */ > RTE_DECLARE_PER_LCORE(rte_cpuset_t, _cpuset); /**< Per thread "cpuset". 
*/ > > /** > @@ -145,11 +144,7 @@ rte_lcore_index(int lcore_id) > * @return > * the ID of current lcoreid's physical socket > */ > -static inline unsigned > -rte_socket_id(void) > -{ > - return RTE_PER_LCORE(_socket_id); > -} > +unsigned rte_socket_id(void); > > /** > * Get the ID of the physical socket of the specified lcore > diff --git a/lib/librte_eal/linuxapp/eal/eal_thread.c > b/lib/librte_eal/linuxapp/eal/eal_thread.c > index 5635c7d..9cacd86 100644 > --- a/lib/librte_eal/linuxapp/eal/eal_thread.c > +++ b/lib/librte_eal/linuxapp/eal/eal_thread.c > @@ -60,6 +60,11 @@ RTE_DEFINE_PER_LCORE(unsigned, _lcore_id) = > LCORE_ID_ANY; > RTE_DEFINE_PER_LCORE(unsigned, _socket_id) = (unsigned)SOCKET_ID_AN
[dpdk-dev] [PATCH v2] eal: Clean up export of per_lcore__socket_id
Hi, > -Original Message- > From: Neil Horman [mailto:nhorman at tuxdriver.com] > Sent: Thursday, February 26, 2015 8:48 PM > To: dev at dpdk.org > Cc: thomas.monjalon at 6wind.com; Liang, Cunming; Neil Horman > Subject: [PATCH v2] eal: Clean up export of per_lcore__socket_id > > Theres no need to export this variable. Its set and queried from an API call > that doesn't exist in the hot path. Instead just export the rte_socket_id > symbol and make the variable private to protect it from type changes. We > should > do this with the other exported variables too, but I think its too late in the > release cycle to do that. > > tested using distributor_autotest (which uses rte_socket_id), successfully. > Only tested on linux, as I don't currently have a bsd system spun up, but the > changes are symmetric, and should be fine > > Signed-off-by: Neil Horman > > --- > Change Notes: > > v2) Moved rte_socket_id to be a common function > --- > lib/librte_eal/bsdapp/eal/eal_thread.c | 1 - > lib/librte_eal/bsdapp/eal/rte_eal_version.map | 2 +- > lib/librte_eal/common/eal_common_thread.c | 7 +++ > lib/librte_eal/common/include/rte_lcore.h | 7 +-- > lib/librte_eal/linuxapp/eal/eal_thread.c| 1 - > lib/librte_eal/linuxapp/eal/rte_eal_version.map | 2 +- > 6 files changed, 10 insertions(+), 10 deletions(-) > > diff --git a/lib/librte_eal/bsdapp/eal/eal_thread.c > b/lib/librte_eal/bsdapp/eal/eal_thread.c > index ca95c72..3672cdb 100644 > --- a/lib/librte_eal/bsdapp/eal/eal_thread.c > +++ b/lib/librte_eal/bsdapp/eal/eal_thread.c > @@ -59,7 +59,6 @@ > RTE_DEFINE_PER_LCORE(unsigned, _lcore_id) = LCORE_ID_ANY; > RTE_DEFINE_PER_LCORE(unsigned, _socket_id) = (unsigned)SOCKET_ID_ANY; > RTE_DEFINE_PER_LCORE(rte_cpuset_t, _cpuset); > - > /* > * Send a message to a slave lcore identified by slave_id to call a > * function f with argument arg. 
Once the execution is done, the > diff --git a/lib/librte_eal/bsdapp/eal/rte_eal_version.map > b/lib/librte_eal/bsdapp/eal/rte_eal_version.map > index 17515a9..d83524d 100644 > --- a/lib/librte_eal/bsdapp/eal/rte_eal_version.map > +++ b/lib/librte_eal/bsdapp/eal/rte_eal_version.map > @@ -10,7 +10,6 @@ DPDK_2.0 { > pci_driver_list; > per_lcore__lcore_id; > per_lcore__rte_errno; > - per_lcore__socket_id; > rte_cpu_check_supported; > rte_cpu_get_flag_enabled; > rte_cycles_vmware_tsc_map; > @@ -82,6 +81,7 @@ DPDK_2.0 { > rte_set_log_level; > rte_set_log_type; > rte_snprintf; > + rte_socket_id; > rte_strerror; > rte_strsplit; > rte_sys_gettid; > diff --git a/lib/librte_eal/common/eal_common_thread.c > b/lib/librte_eal/common/eal_common_thread.c > index f4d9892..2405e93 100644 > --- a/lib/librte_eal/common/eal_common_thread.c > +++ b/lib/librte_eal/common/eal_common_thread.c > @@ -46,6 +46,13 @@ > > #include "eal_thread.h" > > +RTE_DECLARE_PER_LCORE(unsigned , _socket_id); > + > +unsigned rte_socket_id(void) > +{ > + return RTE_PER_LCORE(_socket_id); > +} > + > int eal_cpuset_socket_id(rte_cpuset_t *cpusetp) > { > unsigned cpu = 0; > diff --git a/lib/librte_eal/common/include/rte_lcore.h > b/lib/librte_eal/common/include/rte_lcore.h > index 20a58eb..e03264e 100644 > --- a/lib/librte_eal/common/include/rte_lcore.h > +++ b/lib/librte_eal/common/include/rte_lcore.h > @@ -81,7 +81,6 @@ struct lcore_config { > extern struct lcore_config lcore_config[RTE_MAX_LCORE]; > > RTE_DECLARE_PER_LCORE(unsigned, _lcore_id); /**< Per thread "lcore id". */ > -RTE_DECLARE_PER_LCORE(unsigned, _socket_id); /**< Per thread "socket id". > */ > RTE_DECLARE_PER_LCORE(rte_cpuset_t, _cpuset); /**< Per thread "cpuset". 
*/ > > /** > @@ -145,11 +144,7 @@ rte_lcore_index(int lcore_id) > * @return > * the ID of current lcoreid's physical socket > */ > -static inline unsigned > -rte_socket_id(void) > -{ > - return RTE_PER_LCORE(_socket_id); > -} > +unsigned rte_socket_id(void); > > /** > * Get the ID of the physical socket of the specified lcore > diff --git a/lib/librte_eal/linuxapp/eal/eal_thread.c > b/lib/librte_eal/linuxapp/eal/eal_thread.c > index 5635c7d..65bcbe3 100644 > --- a/lib/librte_eal/linuxapp/eal/eal_thread.c > +++ b/lib/librte_eal/linuxapp/eal/eal_thread.c > @@ -59,7 +59,6 @@ > RTE_DEFINE_PER_LCORE(unsigned, _lcore_id) = LCORE_ID_ANY; > RTE_DEFINE_PER_LCORE(unsigned, _socket_id) = (unsigned)SOCKET_ID_ANY; > RTE_DEFINE_PER_LCORE(rte_cpuset_t, _cpuset); > - > /* > * Send a
[dpdk-dev] [PATCH] virtio: Fix compilation issue on freebsd
Hi, > -Original Message- > From: Ouyang, Changchun > Sent: Friday, February 27, 2015 10:30 AM > To: dev at dpdk.org > Cc: Liang, Cunming; Cao, Waterman; Ouyang, Changchun > Subject: [PATCH] virtio: Fix compilation issue on freebsd > > This patch fixes the compilation issue on freebsd: > > /root/qwan/tmp/dpdk_org/lib/librte_pmd_virtio/virtio_ethdev.c: In function > 'virtio_resource_init': > /root/qwan/tmp/dpdk_org/lib/librte_pmd_virtio/virtio_ethdev.c:1071:56: error: > unused parameter 'pci_dev' [-Werror=unused-parameter] static int > virtio_resource_init(struct rte_pci_device *pci_dev) > ^ > cc1: all warnings being treated as errors > > Signed-off-by: Changchun Ouyang > --- > lib/librte_pmd_virtio/virtio_ethdev.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/lib/librte_pmd_virtio/virtio_ethdev.c > b/lib/librte_pmd_virtio/virtio_ethdev.c > index 9eb0217..88ecd57 100644 > --- a/lib/librte_pmd_virtio/virtio_ethdev.c > +++ b/lib/librte_pmd_virtio/virtio_ethdev.c > @@ -1068,7 +1068,7 @@ virtio_has_msix(const struct rte_pci_addr *loc > __rte_unused) > return 0; > } > > -static int virtio_resource_init(struct rte_pci_device *pci_dev) > +static int virtio_resource_init(struct rte_pci_device *pci_dev __rte_unused) > { > /* no setup required */ > return 0; > -- > 1.8.4.2 Acked-by: Cunming Liang
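For context on the one-line fix above: with -Werror=unused-parameter, a stub that ignores its argument fails to build unless the parameter is marked as intentionally unused. A stand-alone sketch of the same pattern follows. my_unused is a local stand-in written with the GCC/Clang attribute (to my knowledge DPDK's __rte_unused expands to the same attribute); struct pci_device, resource_init and probe are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

/* Local stand-in for an "intentionally unused" marker (GCC/Clang). */
#define my_unused __attribute__((__unused__))

struct pci_device; /* opaque, hypothetical device type */

/*
 * Platform stub: nothing to set up here, so the parameter is unused.
 * The attribute keeps -Wunused-parameter quiet without changing the
 * function's signature.
 */
static int
resource_init(struct pci_device *dev my_unused)
{
	/* no setup required */
	return 0;
}

/* Hypothetical caller relying on the stub's contract. */
static int
probe(struct pci_device *dev)
{
	int ret = resource_init(dev);

	assert(ret <= 0); /* sketch contract: 0 on success, negative on error */
	return ret;
}
```

Keeping the parameter in the signature, rather than dropping it, lets the stub share one prototype with the platform that does need the device pointer.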
[dpdk-dev] [PATCH v6 3/8] eal/bsd: dummy for new intr definition
From: David Marchand [mailto:david.march...@6wind.com] Sent: Friday, February 27, 2015 6:00 PM To: Liang, Cunming Cc: dev at dpdk.org; Stephen Hemminger; Thomas Monjalon Subject: Re: [PATCH v6 3/8] eal/bsd: dummy for new intr definition Hello, On Fri, Feb 27, 2015 at 5:56 AM, Cunming Liang <cunming.liang at intel.com> wrote: diff --git a/lib/librte_eal/bsdapp/eal/include/exec-env/rte_interrupts.h b/lib/librte_eal/bsdapp/eal/include/exec-env/rte_interrupts.h index 87a9cf6..b114aac 100644 --- a/lib/librte_eal/bsdapp/eal/include/exec-env/rte_interrupts.h +++ b/lib/librte_eal/bsdapp/eal/include/exec-env/rte_interrupts.h @@ -38,6 +38,8 @@ #ifndef _RTE_LINUXAPP_INTERRUPTS_H_ #define _RTE_LINUXAPP_INTERRUPTS_H_ +#define VFIO_MAX_RXTX_INTR_ID 32 + enum rte_intr_handle_type { RTE_INTR_HANDLE_UNKNOWN = 0, RTE_INTR_HANDLE_UIO, /**< uio device handle */ @@ -49,6 +51,8 @@ enum rte_intr_handle_type { struct rte_intr_handle { int fd; /**< file descriptor */ enum rte_intr_handle_type type; /**< handle type */ + int max_intr; /**< max interrupt requested */ + uint32_t vec_num[VFIO_MAX_QUEUE_ID]; /**< rxtx intr vector number */ }; No need to add those since this is not supported for bsd. [Liang, Cunming] max_intr is used in dev_init for the pci_dev->intr_handle init. vec_num is used in the ethdev API rx_intr_vec_get. Without them, a BSD macro would have to be used at each place that references them. As they're quite generic, even bsd will require either max_intr or the vec mapping table. -- David Marchand
[dpdk-dev] [PATCH v6 2/8] eal/linux: add rx queue interrupt FDs to intr handle struct
From: David Marchand [mailto:david.march...@6wind.com] Sent: Friday, February 27, 2015 6:33 PM To: Liang, Cunming Cc: dev at dpdk.org; Stephen Hemminger; Thomas Monjalon; Zhou, Danny Subject: Re: [PATCH v6 2/8] eal/linux: add rx queue interrupt FDs to intr handle struct Hello, On Fri, Feb 27, 2015 at 5:56 AM, Cunming Liang <cunming.liang at intel.com> wrote: Per vector event fd will store in rte_intr_handle during init. Device drivers take responsibility to fill queue-vec mapping table(vec_num[]). Signed-off-by: Danny Zhou <danny.zhou at intel.com> Signed-off-by: Cunming Liang <cunming.liang at intel.com> --- v6 changes: - add mapping table between irq vector number and queue id. diff --git a/lib/librte_eal/linuxapp/eal/include/exec-env/rte_interrupts.h b/lib/librte_eal/linuxapp/eal/include/exec-env/rte_interrupts.h index 6a159c7..9f45377 100644 --- a/lib/librte_eal/linuxapp/eal/include/exec-env/rte_interrupts.h +++ b/lib/librte_eal/linuxapp/eal/include/exec-env/rte_interrupts.h @@ -38,6 +38,9 @@ #ifndef _RTE_LINUXAPP_INTERRUPTS_H_ #define _RTE_LINUXAPP_INTERRUPTS_H_ +#define VFIO_MAX_RXTX_INTR_ID 32 +#define VFIO_MAX_QUEUE_ID VFIO_MAX_RXTX_INTR_ID + This is a little weird to talk about vfio here. This file is "generic". Ok, you will store vfio eventfds here, but vfio is an implementation, not the abstraction. [Liang, Cunming] Looking at rte_intr_handle_type, it includes UIO/VFIO_LEGACY/VFIO_MSI/VFIO_MSIX. I agree that VFIO is an implementation, but the combination of different types is a kind of 'abstraction'. So having some VFIO-specific fields for interrupt mapping in rte_intr_handle (which acts like a multiplexer) seems reasonable to me. -- David Marchand
[dpdk-dev] [PATCH v6 4/8] eal/linux: add per rx queue interrupt handling based on VFIO
Hi, From: David Marchand [mailto:david.march...@6wind.com] Sent: Friday, February 27, 2015 6:34 PM To: Liang, Cunming Cc: dev at dpdk.org; Stephen Hemminger; Thomas Monjalon; Zhou, Danny Subject: Re: [PATCH v6 4/8] eal/linux: add per rx queue interrupt handling based on VFIO I am not really comfortable with this api. This is just creating something on top of the standard epoll api with limitations. In the end, we could just use an external lib that does this already. [Liang, Cunming] Not really, I think. We try to protect the data inside 'rte_intr_handle'; the user is not expected to understand what is defined inside 'rte_intr_handle'. It would be better to typedef 'rte_intr_handle' as a raw integer ID, with a function to get it from an ethdev. Then all the interrupt API is built around it. It provides a common approach to rxtx interrupt processing for PCI NIC devices. As for the limitations, we can fix them step by step. So ok, this will work for your limited use case, but this will not be really useful for anything else. Not sure it has its place in eal, this is more an example to me. [Liang, Cunming] By 'limited use case', do you mean only for rxtx? It isn't expected to provide a generic event mechanism (like libev/libevent does), but a simple way to allow a PMD to work with DMA interrupts. It is mainly an abstraction for rx interrupt purposes. I would appreciate it if you could help list more useful cases. On Fri, Feb 27, 2015 at 5:56 AM, Cunming Liang <cunming.liang at intel.com> wrote: This patch does below: - Create multiple VFIO eventfd for rx queues. - Handle per rx queue interrupt. - Eliminate unnecessary suspended DPDK polling thread wakeup mechanism for rx interrupt by allowing polling thread epoll_wait rx queue interrupt notification. Signed-off-by: Danny Zhou <danny.zhou at intel.com> Signed-off-by: Cunming Liang <cunming.liang at intel.com> --- v6 changes - split rte_intr_wait_rx_pkt into two function, wait and set. 
- rewrite rte_intr_rx_wait/rte_intr_rx_set to remove queue visibility on eal. - rte_intr_rx_wait to support multiplexing. - allow epfd as input to support flexible event fd combination. lib/librte_eal/linuxapp/eal/eal_interrupts.c| 224 +++- lib/librte_eal/linuxapp/eal/eal_pci_vfio.c | 23 ++- lib/librte_eal/linuxapp/eal/rte_eal_version.map | 2 + 3 files changed, 201 insertions(+), 48 deletions(-) diff --git a/lib/librte_eal/linuxapp/eal/eal_interrupts.c b/lib/librte_eal/linuxapp/eal/eal_interrupts.c index 8c5b834..f90c2b4 100644 --- a/lib/librte_eal/linuxapp/eal/eal_interrupts.c +++ b/lib/librte_eal/linuxapp/eal/eal_interrupts.c [snip] +static void +eal_intr_process_rxtx_interrupts(struct rte_intr_handle *intr_handle, +struct epoll_event *events, +uint32_t *vec, int nfds) +{ + int i, bytes_read; + union rte_intr_read_buffer buf; + int fd; + + for (i = 0; i < nfds; i++) { + /* set the length to be read for different handle type */ + switch (intr_handle->type) { + case RTE_INTR_HANDLE_UIO: + bytes_read = sizeof(buf.uio_intr_count); + break; + case RTE_INTR_HANDLE_ALARM: + bytes_read = sizeof(buf.timerfd_num); + break; +#ifdef VFIO_PRESENT + case RTE_INTR_HANDLE_VFIO_MSIX: + case RTE_INTR_HANDLE_VFIO_MSI: + case RTE_INTR_HANDLE_VFIO_LEGACY: + bytes_read = sizeof(buf.vfio_intr_count); + break; +#endif + default: + bytes_read = 1; + break; + } + + /** + * read out to clear the ready-to-be-read flag + * for epoll_wait. + */ + vec[i] = events[i].data.u32; + assert(vec[i] < VFIO_MAX_RXTX_INTR_ID); + + fd = intr_handle->efds[vec[i]]; + bytes_read = read(fd, &buf, bytes_read); + if (bytes_read < 0) + RTE_LOG(ERR, EAL, "Error reading from file " + "descriptor %d: %s\n", fd, strerror(errno)); + else if (bytes_read == 0) + RTE_LOG(ERR, EAL, "Read nothing from file " + "descriptor %d\n", fd); + } +} Why unconditionnally read ? You are absorbing events from the application if the application gave you an external epfd and populated it with its own fds. 
[Liang, Cunming] The vector number was checked. If an external epfd populated some event carry fd rather than a data.u32 but the value inside the valid range, it considers as a valid vector number. No matter the read success or not, it always notify the event. Do you have any sug
[dpdk-dev] [PATCH v6 2/8] eal/linux: add rx queue interrupt FDs to intr handle struct
> -Original Message-
> From: Thomas Monjalon [mailto:thomas.monjalon at 6wind.com]
> Sent: Friday, February 27, 2015 10:52 PM
> To: Liang, Cunming
> Cc: David Marchand; dev at dpdk.org; Stephen Hemminger; Zhou, Danny
> Subject: Re: [PATCH v6 2/8] eal/linux: add rx queue interrupt FDs to intr handle struct
>
> 2015-02-27 11:28, Liang, Cunming:
> > From: David Marchand [mailto:david.marchand at 6wind.com]
> > Sent: Friday, February 27, 2015 6:33 PM
> > > On Fri, Feb 27, 2015 at 5:56 AM, Cunming Liang wrote:
> > > > --- a/lib/librte_eal/linuxapp/eal/include/exec-env/rte_interrupts.h
> > > > +++ b/lib/librte_eal/linuxapp/eal/include/exec-env/rte_interrupts.h
> > > > @@ -38,6 +38,9 @@
> > > >
> > > >  #ifndef _RTE_LINUXAPP_INTERRUPTS_H_
> > > >  #define _RTE_LINUXAPP_INTERRUPTS_H_
> > > >
> > > > +#define VFIO_MAX_RXTX_INTR_ID 32
> > > > +#define VFIO_MAX_QUEUE_ID VFIO_MAX_RXTX_INTR_ID
> > >
> > > This is a little weird to talk about vfio here.
> > > This file is "generic".
> > >
> > > Ok, you will store vfio eventfds here, but vfio is an implementation,
> > > not the abstraction.
> >
> > [Liang, Cunming] If looking at the rte_intr_handle_type, it includes UIO/VFIO_LEGACY/VFIO_MSI/VFIO_MSIX.
> > I agree, VFIO is an implementation, but the different type combination is a kind of 'abstraction'.
> > So in rte_intr_handle (like a multiplexing), some specified field for vfio interrupter mapping, I feel it's reasonable.
>
> Not sure to understand. Are we trying to mask the different kernel drivers
> from an application point of view, and provide a generic interrupt mechanism?
> If yes, why some VFIO constants are needed?
> I'm not saying that the current implementation is perfect, but we should try
> to improve it.

[LCM] VFIO_MAX_RXTX_INTR_ID is easy to fix: it can move to a private interrupt header file, as it is only used inside EAL.
VFIO_MAX_QUEUE_ID can be removed, with vec_num[] created dynamically by the device driver.
Sounds good?

>
> Thanks
[dpdk-dev] [PATCH v6 3/8] eal/bsd: dummy for new intr definition
> -Original Message-
> From: Thomas Monjalon [mailto:thomas.monjalon at 6wind.com]
> Sent: Friday, February 27, 2015 10:22 PM
> To: Liang, Cunming
> Cc: David Marchand; dev at dpdk.org; Stephen Hemminger
> Subject: Re: [PATCH v6 3/8] eal/bsd: dummy for new intr definition
>
> 2015-02-27 11:21, Liang, Cunming:
> > From: David Marchand [mailto:david.marchand at 6wind.com]
> > > On Fri, Feb 27, 2015 at 5:56 AM, Cunming Liang wrote:
> > > > @@ -49,6 +51,8 @@ enum rte_intr_handle_type {
> > > >
> > > >  struct rte_intr_handle {
> > > >
> > > >         int fd; /**< file descriptor */
> > > >         enum rte_intr_handle_type type; /**< handle type */
> > > >
> > > > +       int max_intr; /**< max interrupt requested */
> > > > +       uint32_t vec_num[VFIO_MAX_QUEUE_ID]; /**< rxtx intr vector number */
> > > >  };
> > >
> > > No need to add those since this is not supported for bsd.
> >
> > [Liang, Cunming] max_intr is used in dev_init for pci_dev->intr_handle init.
> > Vec_num is used in ethdev API rx_intr_vec_get. Without it, BSD macro will used for each of the reference place.
> > As they're quite generic, even bsd will require either max_intr or vec mapping table.
>
> Is it needed to build and run DPDK on FreeBSD?

[LCM] As it's an EAL change, I try to make sure FreeBSD can still build and run as normal.
[dpdk-dev] [PATCH v6 4/8] eal/linux: add per rx queue interrupt handling based on VFIO
Thanks Thomas. It's my fault: I replied directly to David's mail and didn't notice that it wasn't in plain text mode.

> -Original Message-
> From: Thomas Monjalon [mailto:thomas.monjalon at 6wind.com]
> Sent: Friday, February 27, 2015 10:13 PM
> To: Liang, Cunming
> Cc: David Marchand; dev at dpdk.org; Stephen Hemminger; Zhou, Danny
> Subject: Re: [PATCH v6 4/8] eal/linux: add per rx queue interrupt handling based on VFIO
>
> Hi Cunming,
>
> First, sorry to have to say that, but it is not easy to read discussions
> where quote marks are not used. I re-insert them for clarity.
>
> Comments below.
>
> 2015-02-27 12:22, Liang, Cunming:
> > From: David Marchand [mailto:david.marchand at 6wind.com]
> > Sent: Friday, February 27, 2015 6:34 PM
> >
> > > I am not really comfortable with this api.
> > >
> > > This is just creating something on top of the standard epoll api with
> > > limitations. In the end, we could just use an external lib that does this
> > > already.
> >
> > [Liang, Cunming] Not really, I think. We try to protect the data inside
> > 'rte_intr_handle', it doesn't expect user to understand the things defined
> > inside 'rte_intr_handle'.
> > It's better typedef 'rte_intr_handle' as a raw integer ID, having a function
> > to get it from a ethdev. Then all the interrupt api is around it.
> > It provides the common pci NIC devices rxtx interrupt processing approach.
> > For the limitations, we can fix it step by step.
> >
> > > So ok, this will work for your limited use case, but this will not be
> > > really useful for anything else.
> > > Not sure it has its place in eal, this is more an example to me.
> >
> > [Liang, Cunming] 'limited use case' do you means only for rxtx ?
> > It don't expect to provide a generic event mechanism (like libev/libevent
> > does), but a simple way to allow PMD work with DMA interrupt. It mainly
> > abstract for rx interrupt purpose. I appreciate if you could help to list
> > more useful cases.
>
> You don't expect to provide a generic event mechanism but application
> developers could need to wait for many events at once, not only Rx ones.
> That's why it's better to provide only the needed parts to use something
> generic like libevent.
> And we should avoid reinventing the wheel.

[LCM] Ok, I get you. I have a simple proposal to allow either RX events or other events to be handled in rte_intr_wait().
For the input data 'epoll_data', instead of using 'u32', let's keep using 'int fd'.
If the most significant bit is 0, event[n] stands for a fd. If it's 1, event[0]&0x stands for a vector number.
So during 'rte_intr_set', it gets the 16-bit vector number and encodes it as a 32-bit int with the most significant bit set to 1.
Then in 'rte_intr_wait', only the data.fd values with the most significant bit set to 1 are processed, and user fds are bypassed.
'rte_intr_wait(struct rte_intr_handle *intr_handle, int epfd, int *event, uint16_t num)'.
As the user can already assign an epfd, they can add any normal event fd into it.
Make sense?

>
> > > > +static void
> > > > +eal_intr_process_rxtx_interrupts(struct rte_intr_handle *intr_handle,
> > > > +                                 struct epoll_event *events,
> > > > +                                 uint32_t *vec, int nfds)
> > > > +{
> > > > +        int i, bytes_read;
> > > > +        union rte_intr_read_buffer buf;
> > > > +        int fd;
> > > > +
> > > > +        for (i = 0; i < nfds; i++) {
> > > > +                /* set the length to be read for different handle type */
> > > > +                switch (intr_handle->type) {
> > > > +                case RTE_INTR_HANDLE_UIO:
> > > > +                        bytes_read = sizeof(buf.uio_intr_count);
> > > > +                        break;
> > > > +                case RTE_INTR_HANDLE_ALARM:
> > > > +                        bytes_read = sizeof(buf.timerfd_num);
> > > > +                        break;
> > > > +#ifdef VFIO_PRESENT
> > > > +                case RTE_INTR_HANDLE_VFIO_MSIX:
> > > > +                case RTE_INTR_HANDLE_VFIO_MSI:
> > > > +                case RTE_INTR_HANDLE_VFIO_LEGACY:
> > > > +                        bytes_read = sizeof(buf.vfio_intr_count);
> > > > +                        break;
> > > > +#endif
> > > > +                default:
> > > > +                        bytes_re
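
For illustration only, the most-significant-bit tagging proposed in the [LCM] reply could look roughly like the sketch below. The helper names, the two masks, and the choice to treat the 32-bit epoll data word as unsigned are assumptions made for this example; none of them come from the patch set. The idea relies on a valid fd being non-negative, so its top bit is always 0.

```c
#include <stdint.h>

/* Hypothetical constants, not taken from the patch. */
#define INTR_VEC_FLAG 0x80000000u /* top bit set: word carries a vector number */
#define INTR_VEC_MASK 0x0000ffffu /* assumed: low 16 bits hold the vector number */

/* Tag a 16-bit vector number so it cannot be mistaken for a plain fd. */
static inline uint32_t intr_vec_encode(uint16_t vec)
{
    return INTR_VEC_FLAG | (uint32_t)vec;
}

/* A non-negative fd never has the top bit set, so this separates the two. */
static inline int intr_is_vec(uint32_t data)
{
    return (data & INTR_VEC_FLAG) != 0;
}

/* Recover the vector number from a tagged word. */
static inline uint16_t intr_vec_decode(uint32_t data)
{
    return (uint16_t)(data & INTR_VEC_MASK);
}
```

With such a scheme the wait loop would only read and report entries for which intr_is_vec() is true, and would hand any other entry back to the application untouched.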
[dpdk-dev] [PATCH v3] eal: Clean up export of per_lcore__socket_id
Hi, > -Original Message- > From: Neil Horman [mailto:nhorman at tuxdriver.com] > Sent: Friday, February 27, 2015 8:33 PM > To: dev at dpdk.org > Cc: thomas.monjalon at 6wind.com; Liang, Cunming; Neil Horman > Subject: [PATCH v3] eal: Clean up export of per_lcore__socket_id > > Theres no need to export this variable. Its set and queried from an API call > that doesn't exist in the hot path. Instead just export the rte_socket_id > symbol and make the variable private to protect it from type changes. We > should > do this with the other exported variables too, but I think its too late in the > release cycle to do that. > > tested using distributor_autotest (which uses rte_socket_id), successfully. > Only tested on linux, as I don't currently have a bsd system spun up, but the > changes are symmetric, and should be fine > > Signed-off-by: Neil Horman > > --- > Change Notes: > > v2) Moved rte_socket_id to be a common function > > v3) replaced some previously removed spaces > --- > lib/librte_eal/bsdapp/eal/rte_eal_version.map | 2 +- > lib/librte_eal/common/eal_common_thread.c | 7 +++ > lib/librte_eal/common/include/rte_lcore.h | 7 +-- > lib/librte_eal/linuxapp/eal/rte_eal_version.map | 2 +- > 4 files changed, 10 insertions(+), 8 deletions(-) > > diff --git a/lib/librte_eal/bsdapp/eal/rte_eal_version.map > b/lib/librte_eal/bsdapp/eal/rte_eal_version.map > index 17515a9..d83524d 100644 > --- a/lib/librte_eal/bsdapp/eal/rte_eal_version.map > +++ b/lib/librte_eal/bsdapp/eal/rte_eal_version.map > @@ -10,7 +10,6 @@ DPDK_2.0 { > pci_driver_list; > per_lcore__lcore_id; > per_lcore__rte_errno; > - per_lcore__socket_id; > rte_cpu_check_supported; > rte_cpu_get_flag_enabled; > rte_cycles_vmware_tsc_map; > @@ -82,6 +81,7 @@ DPDK_2.0 { > rte_set_log_level; > rte_set_log_type; > rte_snprintf; > + rte_socket_id; > rte_strerror; > rte_strsplit; > rte_sys_gettid; > diff --git a/lib/librte_eal/common/eal_common_thread.c > b/lib/librte_eal/common/eal_common_thread.c > index 
f4d9892..2405e93 100644 > --- a/lib/librte_eal/common/eal_common_thread.c > +++ b/lib/librte_eal/common/eal_common_thread.c > @@ -46,6 +46,13 @@ > > #include "eal_thread.h" > > +RTE_DECLARE_PER_LCORE(unsigned , _socket_id); > + > +unsigned rte_socket_id(void) > +{ > + return RTE_PER_LCORE(_socket_id); > +} > + > int eal_cpuset_socket_id(rte_cpuset_t *cpusetp) > { > unsigned cpu = 0; > diff --git a/lib/librte_eal/common/include/rte_lcore.h > b/lib/librte_eal/common/include/rte_lcore.h > index 20a58eb..e03264e 100644 > --- a/lib/librte_eal/common/include/rte_lcore.h > +++ b/lib/librte_eal/common/include/rte_lcore.h > @@ -81,7 +81,6 @@ struct lcore_config { > extern struct lcore_config lcore_config[RTE_MAX_LCORE]; > > RTE_DECLARE_PER_LCORE(unsigned, _lcore_id); /**< Per thread "lcore id". */ > -RTE_DECLARE_PER_LCORE(unsigned, _socket_id); /**< Per thread "socket id". > */ > RTE_DECLARE_PER_LCORE(rte_cpuset_t, _cpuset); /**< Per thread "cpuset". */ > > /** > @@ -145,11 +144,7 @@ rte_lcore_index(int lcore_id) > * @return > * the ID of current lcoreid's physical socket > */ > -static inline unsigned > -rte_socket_id(void) > -{ > - return RTE_PER_LCORE(_socket_id); > -} > +unsigned rte_socket_id(void); > > /** > * Get the ID of the physical socket of the specified lcore > diff --git a/lib/librte_eal/linuxapp/eal/rte_eal_version.map > b/lib/librte_eal/linuxapp/eal/rte_eal_version.map > index 17515a9..d83524d 100644 > --- a/lib/librte_eal/linuxapp/eal/rte_eal_version.map > +++ b/lib/librte_eal/linuxapp/eal/rte_eal_version.map > @@ -10,7 +10,6 @@ DPDK_2.0 { > pci_driver_list; > per_lcore__lcore_id; > per_lcore__rte_errno; > - per_lcore__socket_id; > rte_cpu_check_supported; > rte_cpu_get_flag_enabled; > rte_cycles_vmware_tsc_map; > @@ -82,6 +81,7 @@ DPDK_2.0 { > rte_set_log_level; > rte_set_log_type; > rte_snprintf; > + rte_socket_id; > rte_strerror; > rte_strsplit; > rte_sys_gettid; > -- > 2.1.0 Acked-by: Cunming Liang
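
As a rough illustration of the pattern in this patch (keep the per-thread variable out of the exported symbol list and export only an accessor function), here is a minimal sketch in plain C11. The names are hypothetical and _Thread_local stands in for the RTE_PER_LCORE machinery; the real patch keeps the variable defined elsewhere in EAL rather than making it static.

```c
/* Thread-local storage with internal linkage: callers outside this file
 * cannot depend on its type or layout. Hypothetical name. */
static _Thread_local unsigned per_thread_socket_id;

/* The only symbol that would need to appear in a version map. */
unsigned example_socket_id(void)
{
    return per_thread_socket_id;
}

/* Setter that thread-init code in the same library would call. */
void example_socket_id_set(unsigned id)
{
    per_thread_socket_id = id;
}
```

The cost is a function call instead of an inlined TLS load, which the commit message considers acceptable because the call is not in the hot path.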
[dpdk-dev] ixgbe vector mode not working.
Hi Stephen,

The root cause is the rx descriptor number.
As we use the code below to quickly process the rx_tail wrap, it requires the rxd value to be a power of two (2^n).
"rxq->rx_tail = (uint16_t)(rxq->rx_tail & (rxq->nb_rx_desc - 1));"
We should add more checking on the input rxd; if the check fails, we should fall back to the scalar pmd.

Thanks for the report, I'll send a fix patch soon.

-Cunming

> -Original Message-
> From: Stephen Hemminger [mailto:stephen at networkplumber.org]
> Sent: Thursday, February 26, 2015 9:07 AM
> To: Liang, Cunming
> Cc: Nemeth, Balazs; Richardson, Bruce; Neil Horman; dev at dpdk.org
> Subject: Re: ixgbe vector mode not working.
>
> On Wed, 25 Feb 2015 08:49:48 +
> "Liang, Cunming" wrote:
>
> > Hi Stephen,
> >
> > Thanks for the info, with rxd=4000, I can reproduce it.
> > On that time, it runs out of mbuf.
> > I'll follow up this issue.
>
> The first time I ran it, the code was configure rx/tx conf
> which was leftover from older versions.
>
> Second time I ran it and the same hang happened.
> Looking at mbuf pool statistics I see that it gets exhausted,
> even when extra mbuf's are added to the pool.
>
> Looks like a memory leak.
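
To make the root cause concrete, here is a small sketch of why the mask-based wrap only works for power-of-two ring sizes, plus the kind of check that could gate the vector path. The function names are made up for this example and are not the ixgbe driver's.

```c
#include <stdint.h>

/* True only for 1, 2, 4, 8, ... */
static inline int is_power_of_two(uint16_t n)
{
    return n != 0 && (n & (n - 1)) == 0;
}

/* Wrap as done in the vector rx path: valid only if nb_desc is 2^n. */
static inline uint16_t wrap_mask(uint16_t idx, uint16_t nb_desc)
{
    return (uint16_t)(idx & (nb_desc - 1));
}

/* Reference wrap that is valid for any non-zero ring size. */
static inline uint16_t wrap_mod(uint16_t idx, uint16_t nb_desc)
{
    return (uint16_t)(idx % nb_desc);
}
```

With nb_desc = 4096 both wraps agree, but with nb_desc = 4000 an index of 4000 masks to 3968 instead of wrapping to 0, so the tail no longer tracks the ring.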
[dpdk-dev] [PATCH v3 5/6] ixgbe: Config VF RSS
> -Original Message- > From: Ouyang, Changchun > Sent: Wednesday, December 24, 2014 1:23 PM > To: dev at dpdk.org > Cc: Liang, Cunming; Cao, Waterman; Ouyang, Changchun > Subject: [PATCH v3 5/6] ixgbe: Config VF RSS > > It needs config RSS and IXGBE_MRQC and IXGBE_VFPSRTYPE to enable VF RSS. > > The psrtype will determine how many queues the received packets will > distribute > to, > and the value of psrtype should depends on both facet: max VF rxq number > which > has been negotiated with PF, and the number of rxq specified in config on > guest. > > Signed-off-by: Changchun Ouyang > --- > lib/librte_pmd_ixgbe/ixgbe_pf.c | 15 +++ > lib/librte_pmd_ixgbe/ixgbe_rxtx.c | 92 > ++- > 2 files changed, 97 insertions(+), 10 deletions(-) > > diff --git a/lib/librte_pmd_ixgbe/ixgbe_pf.c b/lib/librte_pmd_ixgbe/ixgbe_pf.c > index cbb0145..9c9dad8 100644 > --- a/lib/librte_pmd_ixgbe/ixgbe_pf.c > +++ b/lib/librte_pmd_ixgbe/ixgbe_pf.c > @@ -187,6 +187,21 @@ int ixgbe_pf_host_configure(struct rte_eth_dev > *eth_dev) > IXGBE_WRITE_REG(hw, IXGBE_MPSAR_LO(hw->mac.num_rar_entries), > 0); > IXGBE_WRITE_REG(hw, IXGBE_MPSAR_HI(hw->mac.num_rar_entries), > 0); > > + /* > + * VF RSS can support at most 4 queues for each VF, even if > + * 8 queues are available for each VF, it need refine to 4 > + * queues here due to this limitation, otherwise no queue > + * will receive any packet even RSS is enabled. 
> + */ > + if (eth_dev->data->dev_conf.rxmode.mq_mode == > ETH_MQ_RX_VMDQ_RSS) { > + if (RTE_ETH_DEV_SRIOV(eth_dev).nb_q_per_pool == 8) { > + RTE_ETH_DEV_SRIOV(eth_dev).active = ETH_32_POOLS; > + RTE_ETH_DEV_SRIOV(eth_dev).nb_q_per_pool = 4; > + RTE_ETH_DEV_SRIOV(eth_dev).def_pool_q_idx = > + dev_num_vf(eth_dev) * 4; > + } > + } > + > /* set VMDq map to default PF pool */ > hw->mac.ops.set_vmdq(hw, 0, > RTE_ETH_DEV_SRIOV(eth_dev).def_vmdq_idx); > > diff --git a/lib/librte_pmd_ixgbe/ixgbe_rxtx.c > b/lib/librte_pmd_ixgbe/ixgbe_rxtx.c > index f69abda..a7c17a4 100644 > --- a/lib/librte_pmd_ixgbe/ixgbe_rxtx.c > +++ b/lib/librte_pmd_ixgbe/ixgbe_rxtx.c > @@ -3327,6 +3327,39 @@ ixgbe_alloc_rx_queue_mbufs(struct igb_rx_queue > *rxq) > } > > static int > +ixgbe_config_vf_rss(struct rte_eth_dev *dev) > +{ > + struct ixgbe_hw *hw; > + uint32_t mrqc; > + > + ixgbe_rss_configure(dev); > + > + hw = IXGBE_DEV_PRIVATE_TO_HW(dev->data->dev_private); > + > + /* MRQC: enable VF RSS */ > + mrqc = IXGBE_READ_REG(hw, IXGBE_MRQC); > + mrqc &= ~IXGBE_MRQC_MRQE_MASK; > + switch (RTE_ETH_DEV_SRIOV(dev).active) { > + case ETH_64_POOLS: > + mrqc |= IXGBE_MRQC_VMDQRSS64EN; > + break; > + > + case ETH_32_POOLS: > + case ETH_16_POOLS: > + mrqc |= IXGBE_MRQC_VMDQRSS32EN; > + break; > + > + default: > + PMD_INIT_LOG(ERR, "Invalid pool number in IOV mode"); > + return -EINVAL; > + } > + > + IXGBE_WRITE_REG(hw, IXGBE_MRQC, mrqc); > + > + return 0; > +} > + > +static int > ixgbe_dev_mq_rx_configure(struct rte_eth_dev *dev) > { > struct ixgbe_hw *hw = > @@ -3358,24 +3391,38 @@ ixgbe_dev_mq_rx_configure(struct rte_eth_dev > *dev) > default: ixgbe_rss_disable(dev); > } > } else { > - switch (RTE_ETH_DEV_SRIOV(dev).active) { > /* >* SRIOV active scheme >* FIXME if support DCB/RSS together with VMDq & SRIOV >*/ > - case ETH_64_POOLS: > - IXGBE_WRITE_REG(hw, IXGBE_MRQC, > IXGBE_MRQC_VMDQEN); > + switch (dev->data->dev_conf.rxmode.mq_mode) { > + case ETH_MQ_RX_RSS: > + case ETH_MQ_RX_VMDQ_RSS: > + 
ixgbe_config_vf_rss(dev); > break; > > - case ETH_32_POOLS: > - IXGBE_WRITE_REG(hw, IXGBE_MRQC, > IXGBE_MRQC_VMDQRT4TCEN); > - break; > + default: > + switch (RTE_ETH_DEV_SRIOV(dev).active) { [Liang, Cunming] Just a minor comment. To avoid a switch nested inside another switch, we can add an ixgbe_config_vf_default() that handles everything when no RSS/DCB is required in the multi-queue setting. Then we can move the whole 'switch(SRIOV(dev).active){...}' into it.
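
A sketch of the refactoring suggested in the comment, with each inner switch moved into its own helper so that the top-level function only dispatches on the multi-queue mode. Every enum, constant and function name below is a hypothetical stand-in: the real driver uses ETH_MQ_RX_*, ETH_*_POOLS and IXGBE_MRQC_* values, and the placeholder register values here carry no hardware meaning.

```c
#include <stdint.h>

/* Hypothetical stand-ins for the driver's enums and register values. */
enum mq_mode { MQ_RX_NONE, MQ_RX_RSS, MQ_RX_VMDQ_RSS };
enum pool_cfg { POOLS_16 = 16, POOLS_32 = 32, POOLS_64 = 64 };

#define MRQC_RSS_64   0x01u
#define MRQC_RSS_32   0x02u
#define MRQC_PLAIN_64 0x10u
#define MRQC_PLAIN_32 0x20u
#define MRQC_PLAIN_16 0x30u

/* RSS case: one switch on the pool configuration. */
static int config_vf_rss(enum pool_cfg active, uint32_t *mrqc)
{
    switch (active) {
    case POOLS_64: *mrqc = MRQC_RSS_64; return 0;
    case POOLS_32:
    case POOLS_16: *mrqc = MRQC_RSS_32; return 0;
    default:       return -1; /* invalid pool number */
    }
}

/* Non-RSS case: the switch that used to be nested lives here. */
static int config_vf_default(enum pool_cfg active, uint32_t *mrqc)
{
    switch (active) {
    case POOLS_64: *mrqc = MRQC_PLAIN_64; return 0;
    case POOLS_32: *mrqc = MRQC_PLAIN_32; return 0;
    case POOLS_16: *mrqc = MRQC_PLAIN_16; return 0;
    default:       return -1;
    }
}

/* Top level: a single flat switch on the multi-queue mode. */
static int mq_rx_configure_sriov(enum mq_mode mode, enum pool_cfg active,
                                 uint32_t *mrqc)
{
    switch (mode) {
    case MQ_RX_RSS:
    case MQ_RX_VMDQ_RSS:
        return config_vf_rss(active, mrqc);
    default:
        return config_vf_default(active, mrqc);
    }
}
```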
[dpdk-dev] [RFC PATCH 0/7] support multi-phtread per lcore
> -Original Message- > From: Ananyev, Konstantin > Sent: Friday, January 09, 2015 1:06 AM > To: Liang, Cunming; Stephen Hemminger; Richardson, Bruce > Cc: dev at dpdk.org > Subject: RE: [dpdk-dev] [RFC PATCH 0/7] support multi-phtread per lcore > > > Hi Steve, > > > -Original Message- > > From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Liang, Cunming > > Sent: Tuesday, December 23, 2014 9:52 AM > > To: Stephen Hemminger; Richardson, Bruce > > Cc: dev at dpdk.org > > Subject: Re: [dpdk-dev] [RFC PATCH 0/7] support multi-phtread per lcore > > > > > > > > > -Original Message- > > > From: Stephen Hemminger [mailto:stephen at networkplumber.org] > > > Sent: Tuesday, December 23, 2014 2:29 AM > > > To: Richardson, Bruce > > > Cc: Liang, Cunming; dev at dpdk.org > > > Subject: Re: [dpdk-dev] [RFC PATCH 0/7] support multi-phtread per lcore > > > > > > On Mon, 22 Dec 2014 09:46:03 + > > > Bruce Richardson wrote: > > > > > > > On Mon, Dec 22, 2014 at 01:51:27AM +, Liang, Cunming wrote: > > > > > ... > > > > > > I'm conflicted on this one. However, I think far more applications > > > > > > would > be > > > > > > broken > > > > > > to start having to use thread_id in place of an lcore_id than would > > > > > > be > > > broken > > > > > > by having the lcore_id no longer actually correspond to a core. > > > > > > I'm actually struggling to come up with a large number of scenarios > where > > > it's > > > > > > important to an app to determine the cpu it's running on, compared > > > > > > to > the > > > large > > > > > > number of cases where you need to have a data-structure per thread. > In > > > DPDK > > > > > > libs > > > > > > alone, you see this assumption that lcore_id == thread_id a large > number > > > of > > > > > > times. > > > > > > > > > > > > Despite the slight logical inconsistency, I think it's better to > > > > > > avoid > > > introducing > > > > > > a thread-id and continue having lcore_id representing a unique > > > > > > thread. 
> > > > > > > > > > > > /Bruce > > > > > > > > > > Ok, I understand it. > > > > > I list the implicit meaning if using lcore_id representing the unique > > > > > thread. > > > > > 1). When lcore_id less than RTE_MAX_LCORE, it still represents the > > > > > logical > > > core id. > > > > > 2). When lcore_id large equal than RTE_MAX_LCORE, it represents an > unique > > > id for thread. > > > > > 3). Most of APIs(except rte_lcore_id()) in rte_lcore.h suggest to be > > > > > used > only > > > in CASE 1) > > > > > 4). rte_lcore_id() can be used in CASE 2), but the return value no > > > > > matter > > > represent a logical core id. > > > > > > > > > > If most of us feel it's acceptable, I'll prepare for the RFC v2 base > > > > > on this > > > conclusion. > > > > > > > > > > /Cunming > > > > > > > > Sorry, I don't like that suggestion either, as having lcore_id values > > > > greater > > > > than RTE_MAX_LCORE is terrible, as how will people know how to > dimension > > > arrays > > > > to be indexes by lcore id? Given the choice, if we are not going to > > > > just use > > > > lcore_id as a generic thread id, which is always between 0 and > > > RTE_MAX_LCORE > > > > we can look to define a new thread_id variable to hold that. However, it > should > > > > have a bounded range. > > > > From an ease-of-porting perspective, I still think that the simplest > > > > option is > to > > > > use the existing lcore_id and accept the fact that it's now a thread id > > > > rather > > > > than an actual physical lcore. Question is, is would that cause us lots > > > > of > issues > > > > in the future? > > > > > > > > /Bruce > > > > > > The current rte_lcore_id() has different meaning the thread. Your proposal > will > >
[dpdk-dev] [RFC PATCH 0/7] support multi-phtread per lcore
> > BTW, one more thing: while we are on it - it is probably a good time to do > something with our interrupt thread? > It is a bit strange that we can't use rte_pktmbuf_free() or > rte_spinlock_recursive_lock() from our own interrupt/alarm handlers > > Konstantin [Liang, Cunming] I'll think about it.
[dpdk-dev] [RFC PATCH 0/7] support multi-phtread per lcore
I see. Will update soon. Thanks for all the comments. > -Original Message- > From: Richardson, Bruce > Sent: Friday, January 09, 2015 1:24 AM > To: Ananyev, Konstantin; Liang, Cunming; Stephen Hemminger > Cc: dev at dpdk.org > Subject: RE: [dpdk-dev] [RFC PATCH 0/7] support multi-phtread per lcore > > My opinion on this is that the lcore_id is rarely (if ever) used to find the > actual > core a thread is being run on. Instead it is used 99% of the time as a unique > array > index per thread, and therefore that we can keep that usage by just assigning > a > valid lcore_id to any extra threads created. The suggestion to get/set > affinities on > top of that seems a good one to me also. > > /Bruce > > -Original Message- > From: Ananyev, Konstantin > Sent: Thursday, January 8, 2015 5:06 PM > To: Liang, Cunming; Stephen Hemminger; Richardson, Bruce > Cc: dev at dpdk.org > Subject: RE: [dpdk-dev] [RFC PATCH 0/7] support multi-phtread per lcore > > > Hi Steve, > > > -Original Message- > > From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Liang, Cunming > > Sent: Tuesday, December 23, 2014 9:52 AM > > To: Stephen Hemminger; Richardson, Bruce > > Cc: dev at dpdk.org > > Subject: Re: [dpdk-dev] [RFC PATCH 0/7] support multi-phtread per > > lcore > > > > > > > > > -Original Message- > > > From: Stephen Hemminger [mailto:stephen at networkplumber.org] > > > Sent: Tuesday, December 23, 2014 2:29 AM > > > To: Richardson, Bruce > > > Cc: Liang, Cunming; dev at dpdk.org > > > Subject: Re: [dpdk-dev] [RFC PATCH 0/7] support multi-phtread per > > > lcore > > > > > > On Mon, 22 Dec 2014 09:46:03 + > > > Bruce Richardson wrote: > > > > > > > On Mon, Dec 22, 2014 at 01:51:27AM +, Liang, Cunming wrote: > > > > > ... > > > > > > I'm conflicted on this one. 
However, I think far more > > > > > > applications would be broken to start having to use thread_id > > > > > > in place of an lcore_id than would be > > > broken > > > > > > by having the lcore_id no longer actually correspond to a core. > > > > > > I'm actually struggling to come up with a large number of > > > > > > scenarios where > > > it's > > > > > > important to an app to determine the cpu it's running on, > > > > > > compared to the > > > large > > > > > > number of cases where you need to have a data-structure per > > > > > > thread. In > > > DPDK > > > > > > libs > > > > > > alone, you see this assumption that lcore_id == thread_id a > > > > > > large number > > > of > > > > > > times. > > > > > > > > > > > > Despite the slight logical inconsistency, I think it's better > > > > > > to avoid > > > introducing > > > > > > a thread-id and continue having lcore_id representing a unique > > > > > > thread. > > > > > > > > > > > > /Bruce > > > > > > > > > > Ok, I understand it. > > > > > I list the implicit meaning if using lcore_id representing the unique > > > > > thread. > > > > > 1). When lcore_id less than RTE_MAX_LCORE, it still represents > > > > > the logical > > > core id. > > > > > 2). When lcore_id large equal than RTE_MAX_LCORE, it represents > > > > > an unique > > > id for thread. > > > > > 3). Most of APIs(except rte_lcore_id()) in rte_lcore.h suggest > > > > > to be used only > > > in CASE 1) > > > > > 4). rte_lcore_id() can be used in CASE 2), but the return value > > > > > no matter > > > represent a logical core id. > > > > > > > > > > If most of us feel it's acceptable, I'll prepare for the RFC v2 > > > > > base on this > > > conclusion. > > > > > > > > > > /Cunming > > > > > > > > Sorry, I don't like that suggestion either, as having lcore_id > > > > values greater than RTE_MAX_LCORE is terrible, as how will people > > > > know how to dimension > > > arrays > > > > to be indexes by lcore id? 
Given the choice, if we are not going > > > > to just use lcore_id as a generic thre
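
The array-dimensioning concern raised in this thread can be shown with a small sketch: per-thread data is typically a fixed array indexed by lcore_id, so any id outside [0, RTE_MAX_LCORE) has no slot. EXAMPLE_MAX_LCORE and the struct below are assumptions for illustration, not DPDK definitions.

```c
#include <stdint.h>

#define EXAMPLE_MAX_LCORE 128 /* hypothetical stand-in for RTE_MAX_LCORE */

struct per_thread_stats {
    uint64_t rx_pkts;
};

/* One slot per possible id; no locking needed if each thread owns its slot. */
static struct per_thread_stats stats[EXAMPLE_MAX_LCORE];

/* Returns 0 on success, -1 if the id cannot be used as an array index. */
static int stats_add(unsigned lcore_id, uint64_t n)
{
    if (lcore_id >= EXAMPLE_MAX_LCORE)
        return -1;
    stats[lcore_id].rx_pkts += n;
    return 0;
}
```

If extra threads are given ids below the bound, code like this keeps working unchanged; ids at or above the bound would force every such array to be resized or guarded.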
[dpdk-dev] [PATCH v6 0/8] Interrupt mode PMD
Hi Stephen,

On 3/4/2015 8:52 AM, Stephen Hemminger wrote:
> On Fri, 27 Feb 2015 11:38:25 +0100
> David Marchand wrote:
>
>> Ok, so after looking at this patchset, I would say this is the right
>> direction, but still this is too limited.
>> The ethdev part and the vfio eventfds part look acceptable to me.
>> But thinking about it, I could just reuse a standard event library with the
>> eventfds I would get from ethdev without a need for a new eal api.
> I would prefer that there was just an fd and a callback.
> An application should be able to use what ever event model or library it
> wants.

[LCM] I agree, from the application's perspective it is.
As it's easy to get the RX/TX interrupt fd, nothing stops an application from doing everything with a 3rd party event library.
The improvements would probably be 1) a rte_intr_vec_to_fd() API; 2) exposing eal_intr_process_rxtx_interrupts() as a public API for the RX/TX interrupt callback.
However, it should still be possible to use the packet interrupt feature when the application doesn't choose any 3rd party event library.
That's the motivation for providing a very lightweight 'wait' EAL API. Sounds reasonable?

>
> IMHO the existing interrupt thread model is incorrectly designed and creates
> lots of opportunities for races because of that. Look at the effort it has to
> use to pass the event back to link state code.
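
For reference, the "just an fd and a callback" model could look roughly like this on the application side, using only the standard Linux eventfd and epoll calls. This is a sketch under assumptions: the function names and the callback type are invented for the example, and a plain eventfd stands in for the per-queue interrupt fd that the application would obtain from the driver.

```c
#include <stdint.h>
#include <string.h>
#include <sys/epoll.h>
#include <sys/eventfd.h>
#include <unistd.h>

/* Hypothetical callback type: fd that fired, eventfd counter, user argument. */
typedef void (*rx_intr_cb)(int fd, uint64_t count, void *arg);

/* Add an interrupt fd to an epoll instance owned by the application. */
static int watch_fd(int epfd, int fd)
{
    struct epoll_event ev;

    memset(&ev, 0, sizeof(ev));
    ev.events = EPOLLIN;
    ev.data.fd = fd;
    return epoll_ctl(epfd, EPOLL_CTL_ADD, fd, &ev);
}

/* Wait once, read each ready eventfd to clear it, then invoke the callback.
 * Returns the epoll_wait() result: number of ready fds, 0 on timeout, -1 on error. */
static int wait_and_dispatch(int epfd, int timeout_ms, rx_intr_cb cb, void *arg)
{
    struct epoll_event evs[8];
    int n = epoll_wait(epfd, evs, 8, timeout_ms);

    for (int i = 0; i < n; i++) {
        uint64_t count = 0;

        if (read(evs[i].data.fd, &count, sizeof(count)) == (ssize_t)sizeof(count))
            cb(evs[i].data.fd, count, arg);
    }
    return n;
}

/* Example callback: accumulate the number of signalled interrupts. */
static void sum_cb(int fd, uint64_t count, void *arg)
{
    (void)fd;
    *(uint64_t *)arg += count;
}
```

Because the application owns the epoll instance, it can mix these fds with sockets or timers, or hand them to libevent/libev instead of calling epoll directly.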