[dpdk-dev] Is it possible to get symmetric hash from RSS for a given flow
Hi,

This is exactly what you want:
http://www.ndsl.kaist.edu/~shinae/papers/TR-symRSS.pdf

Sangjin

On Thu, Feb 27, 2014 at 4:22 PM, Daniel Kan wrote:
> Hi,
> It appears that the hash computed from RSS is unidirectional. Hence, for a
> 5-tuple flow, packet in one direction can be routed to a queue that may be
> different than packet in the other direction. I'm wondering if there is a
> configuration or mechanism to get a symmetric hash so that all packets for a
> given flow will be in the same queue. Thanks in advance.
>
> Dan
[dpdk-dev] "No probed ethernet devices" caused by inaccurate msec_delay()
Hi,

I encountered this error message when I tried to use the testpmd application:

  Cause: No probed ethernet devices - check that CONFIG_RTE_LIBRTE_IGB_PMD=y and that CONFIG_RTE_LIBRTE_EM_PMD=y and that CONFIG_RTE_LIBRTE_IXGBE_PMD=y in your configuration file

It is caused by rte_eth_dev_count() returning 0. However, my 82599 ports are already unbound from ixgbe. (I have two Xeon X5560 (@ 2.80GHz) processors and two X520-DA2 cards.)

I googled for possible causes and came across a similar case:
http://openetworking.blogspot.com/2014/01/debugging-no-probed-ethernet-devices.html

Based on the article, I dug into the source code and found the cause in ixgbe_82599.c, ixgbe_reset_pipeline_82599():

  ...
  for (i = 0; i < 10; i++) {
      msec_delay(4);
      anlp1_reg = IXGBE_READ_REG(hw, IXGBE_ANLP1);
      if (anlp1_reg & IXGBE_ANLP1_AN_STATE_MASK)
          break;
  }
  if (!(anlp1_reg & IXGBE_ANLP1_AN_STATE_MASK)) {
      DEBUGOUT("auto negotiation not completed\n");
      ret_val = IXGBE_ERR_RESET_FAILED;
      goto reset_pipeline_out;
  }
  ...

The number of iterations (10) in the for loop was not enough. In my case it needed to be at least 12; then everything worked fine.

The issue is that msec_delay() is not very accurate on my system. The CPU frequency is read from /proc/cpuinfo, which may not reflect the actual TSC rate (ticks per second). Since I did not disable the P-State feature, /proc/cpuinfo reports 1.6GHz, but my TSC runs at 2.8GHz. As a result, msec_delay(4) waited only about 2.3 milliseconds (4 ms * 1.6/2.8), which in turn caused the failure.

I think /proc/cpuinfo is not a reliable way to get eal_tsc_resolution_hz, since the value it reports varies with the current CPU clock frequency. Forcing applications to run at the maximum frequency can be too restrictive. It would be nice if I could bypass set_tsc_freq_from_cpuinfo() in set_tsc_freq().

Thanks,
Sangjin
[dpdk-dev] "No probed ethernet devices" caused by inaccurate msec_delay()
Hi,

>> It would be nice if I can bypass set_tsc_freq_from_cpuinfo() in
>> set_tsc_freq().
>
> I think it would not solve the problem because your clock is varying and the
> TSC calibration must be updated accordingly with different values by core.

Reasonably new Intel CPUs (including Nehalem) have a constant TSC rate, regardless of the current P/C-state (see the constant_tsc and nonstop_tsc flags in /proc/cpuinfo). So updating the TSC calibration is unnecessary on those CPUs, even with a variable clock frequency.

Also, there seems to be no guarantee that the TSC rate is identical to the CPU's maximum clock frequency. While it happens to be true for Intel CPUs, this post from AMD (https://lkml.org/lkml/2005/11/4/173) says: "The rate of the invariant TSC is implementation-dependent and will likely *not* be the frequency of the processor core [...]" It would be great if someone could actually measure the TSC rate on AMD processors to verify this.

I would like to suggest two possible options:

1. If we can assume that the TSC rate always equals the maximum clock frequency, then we can use /sys/devices/system/cpu/cpu0/cpufreq/cpuinfo_max_freq instead of /proc/cpuinfo (which reflects cpuinfo_cur_freq).

2. If we can't (AMD?), we can simply get rid of set_tsc_freq_from_cpuinfo() and fall back to set_tsc_freq_from_clock() or set_tsc_freq_fallback() instead. I always get reasonably good accuracy with those two functions; the only drawback is that applications take 0.5 - 1 second longer to start up. Not sure whether that is a big deal, though.

---

Besides the TSC frequency, the 4ms * 10 delay in ixgbe_reset_pipeline_82599() seems too tight. On my system, it succeeds only after 7 (or so) iterations with a correct msec_delay(). The per-iteration delay (4ms; in the kernel ixgbe driver it is 4-8ms) and/or the number of iterations (10) should be increased, I suppose.

Sangjin
[dpdk-dev] [PATCH] timer: add lfence before TSC read
Why LFENCE, rather than CPUID? I guess LFENCE does not prevent out-of-order execution of non-load instructions across it. This link has detailed information on RDTSC, RDTSCP, and CPUID:
http://www.intel.com/content/dam/www/public/us/en/documents/white-papers/ia-32-ia-64-benchmark-code-execution-paper.pdf

Sangjin

On Mon, Jan 27, 2014 at 3:58 AM, didier.pallard wrote:
> Yes, i will add a new function that includes the lfence;
>
> for the performance penalty, we did not see noticeable performance impact on
> our full software, so we did not see any reason to use 2 functions, but it's
> certainly because we make a very limited number of calls to rdtsc and it's
> true that it is highly application dependent, so 2 functions are probably
> better. But if using the inaccurate function, you may have some hard time
> the first time you want to debug or do some precise measures, since the
> measure is not always done when expected. And generally, especially when
> debugging, you're not focusing at first on the function you're using to
> debug...
> i don't know how to do to be sure that people will be aware of the problem
> and do not lose time on the same problem, i will try to add some kind of
> warning in rte_rdtsc function itself.
> But perhaps should it be better to use the precise version as default one
> and let the optimized version with another name to be used on purpose when
> accuracy is not important; By default, i think we generally suppose a time
> reading function to be accurate...
>
> thanks
> didier
>
>
> On 01/27/2014 10:57 AM, Thomas Monjalon wrote:
>>
>> 24/01/2014 12:42, François-Frédéric Ozog:
>>>
>>> IMHO, adding the lfence for all cases is introducing an unnecessary
>>> performance penalty.
>>>
>>> What about adding rte_rdtsc_sync() or rte_rdtsc_serial() with the comment
>>> about the rdtsc instruction behavior so that developers can choose which
>>> form they want?
>>
>> Yes it could be a good idea in some cases. Didier, could you try to add
>> such function?
>>
>> But in some debugging cases we need to have high precision for almost all
>> timestamps. Here I don't know what is the smartest solution.
>>
>> Thank you for commenting. Hope we'll find a good fix.
[dpdk-dev] QoS Question
Hi,

According to "21.2.4.6.6.2", it seems that the implementation is supposed to achieve max-min fairness. In your example, the effective cap of the single active pipe should be 1Mbps, given that the total demand of the other 1999 pipes is less than 999Mbps.

Sangjin

On Mon, Apr 20, 2015 at 9:40 AM Greg Smith wrote:
> Hi DPDK team,
>
> The docs on QoS (http://dpdk.org/doc/guides/prog_guide/qos_framework.html#)
> describe the traffic class (TC) as follows:
> 1 - The TCs of the same pipe handled in strict priority order.
> 2 - Upper limit enforced per TC at the pipe level.
> 3 - Lower priority TCs able to reuse pipe bandwidth currently unused by
> higher priority TCs.
> 4 - When subport TC is oversubscribed (configuration time event), pipe TC
> upper limit is capped to a dynamically adjusted value that is shared by all
> the subport pipes.
>
> Can someone describe how and when the TC upper limit is "dynamically"
> changed?
>
> For example, assume there's a 1Gb/s port and a single 1Gb/s subport and
> 2000 pipes each of 1Mb/s (total pipes = 2Gb/s which is > the 1Gb/s subport
> which I think means "oversubscribed" as used in the doc). Each Pipe has a
> single TC.
> In that case, would each pipe be shaped to an upper limit of 0.5 Mb/s?
> What if there was no traffic on 1999 pipes, would the single active pipe
> still be limited to 0.5 Mb/s?
> What if the number of pipes changes without restarting the OS, how does
> that change the behavior?
>
> BTW, great docs overall, thanks for writing those up.
>
> Thanks,
>
> Greg Smith
[dpdk-dev] [PATCH] lpm: fix build error on g++ with -O0 option
When rte_lpm.h is used on x86, -O0 option (no optimization at all) given to g++ (not gcc) causes a compile error like this:

  error: the last argument must be an 8-bit immediate
     i24 = _mm_srli_si128(i24, sizeof(uint64_t));

-O0 option is useful for debugging and code coverage measurement, but this error prevents C++ programs from building.

This patch replaces "sizeof(uint64_t)" with a constant literal "8" to work around the issue.

Tested with g++ 5.4.1.

Signed-off-by: Sangjin Han
---
 lib/librte_lpm/rte_lpm_sse.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/lib/librte_lpm/rte_lpm_sse.h b/lib/librte_lpm/rte_lpm_sse.h
index ef33c6a..2e17df3 100644
--- a/lib/librte_lpm/rte_lpm_sse.h
+++ b/lib/librte_lpm/rte_lpm_sse.h
@@ -78,7 +78,7 @@ rte_lpm_lookupx4(const struct rte_lpm *lpm, xmm_t ip, uint32_t hop[4],
 
 	/* extract values from tbl24[] */
 	idx = _mm_cvtsi128_si64(i24);
-	i24 = _mm_srli_si128(i24, sizeof(uint64_t));
+	i24 = _mm_srli_si128(i24, 8);
 
 	ptbl = (const uint32_t *)&lpm->tbl24[(uint32_t)idx];
 	tbl[0] = *ptbl;
-- 
2.7.4
[dpdk-dev] [PATCH v2] lpm: fix build error on g++ with -O0 option
When rte_lpm.h is used on x86, -O0 option (no optimization at all) given to gcc causes a compile error like this:

  error: the last argument must be an 8-bit immediate
     i24 = _mm_srli_si128(i24, sizeof(uint64_t));

-O0 option is useful for debugging and code coverage measurement, but this error prevents DPDK programs from building.

This patch replaces "sizeof(uint64_t)" with a constant literal "8" to work around the issue.

The issue occurs on gcc/g++ versions from 4.8 to 5.

Signed-off-by: Sangjin Han
---
v2:
* Added a comment
* Updated the commit message: both gcc and g++ are affected.
---
 lib/librte_lpm/rte_lpm_sse.h | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/lib/librte_lpm/rte_lpm_sse.h b/lib/librte_lpm/rte_lpm_sse.h
index ef33c6a..7ab90b7 100644
--- a/lib/librte_lpm/rte_lpm_sse.h
+++ b/lib/librte_lpm/rte_lpm_sse.h
@@ -78,7 +78,9 @@ rte_lpm_lookupx4(const struct rte_lpm *lpm, xmm_t ip, uint32_t hop[4],
 
 	/* extract values from tbl24[] */
 	idx = _mm_cvtsi128_si64(i24);
-	i24 = _mm_srli_si128(i24, sizeof(uint64_t));
+
+	/* With -O0 option, gcc 4.8 - 5.4 fails to fold sizeof() into a constant */
+	i24 = _mm_srli_si128(i24, /* sizeof(uint64_t) */ 8);
 
 	ptbl = (const uint32_t *)&lpm->tbl24[(uint32_t)idx];
 	tbl[0] = *ptbl;
-- 
2.7.4