Why LFENCE rather than CPUID? As far as I understand, LFENCE does not prevent non-load instructions from being executed out of order across it.
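To make the question concrete, here is a rough sketch of the three variants being compared. The function names (tsc_plain, tsc_lfence, tsc_cpuid) are hypothetical and are not the DPDK API; the sketch assumes GCC or Clang on x86 and uses the compiler intrinsics from <x86intrin.h> and <cpuid.h> instead of hand-written inline assembly.

```c
#include <stdint.h>
#include <x86intrin.h>   /* __rdtsc(), _mm_lfence(); GCC/Clang, x86 only */
#include <cpuid.h>       /* __get_cpuid(); GCC/Clang */

/* Hypothetical name: plain RDTSC with no fence in front of it.
 * Cheapest, but the read is not ordered against earlier instructions. */
static inline uint64_t tsc_plain(void)
{
    return __rdtsc();
}

/* Hypothetical name: LFENCE followed by RDTSC,
 * i.e. the variant proposed in this thread. */
static inline uint64_t tsc_lfence(void)
{
    _mm_lfence();
    return __rdtsc();
}

/* Hypothetical name: CPUID followed by RDTSC,
 * i.e. the serializing variant described in the Intel paper. */
static inline uint64_t tsc_cpuid(void)
{
    unsigned int eax, ebx, ecx, edx;

    /* Leaf 0 is always available; the register values are ignored,
     * CPUID is executed only for its serializing side effect. */
    __get_cpuid(0, &eax, &ebx, &ecx, &edx);
    return __rdtsc();
}
```

Having all three side by side would also make it easy to measure the per-call cost of each in a tight loop before deciding which one should be the default.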
This link has detailed information on RDTSC, RDTSCP, and CPUID:
http://www.intel.com/content/dam/www/public/us/en/documents/white-papers/ia-32-ia-64-benchmark-code-execution-paper.pdf

Sangjin

On Mon, Jan 27, 2014 at 3:58 AM, didier.pallard <didier.pallard at 6wind.com> wrote:
> Yes, I will add a new function that includes the lfence.
>
> As for the performance penalty, we did not see any noticeable performance
> impact on our full software, so we did not see any reason to use 2 functions,
> but that is certainly because we make a very limited number of calls to rdtsc.
> It is true that this is highly application dependent, so 2 functions are
> probably better. But if you use the inaccurate function, you may have a hard
> time the first time you want to debug or take precise measurements, since the
> measurement is not always taken when expected. And generally, especially when
> debugging, you are not focusing at first on the function you are using to
> debug...
> I don't know how to make sure that people will be aware of the problem
> and do not lose time on it as well; I will try to add some kind of
> warning in the rte_rdtsc function itself.
> But perhaps it would be better to use the precise version as the default one
> and give the optimized version another name, to be used on purpose when
> accuracy is not important. By default, I think we generally expect a time
> reading function to be accurate...
>
> thanks
> didier
>
>
> On 01/27/2014 10:57 AM, Thomas Monjalon wrote:
>>
>> 24/01/2014 12:42, François-Frédéric Ozog:
>>>
>>> IMHO, adding the lfence for all cases is introducing an unnecessary
>>> performance penalty.
>>>
>>> What about adding rte_rdtsc_sync() or rte_rdtsc_serial() with the comment
>>> about the rdtsc instruction behavior so that developers can choose which
>>> form they want?
>>
>> Yes, it could be a good idea in some cases. Didier, could you try to add
>> such a function?
>>
>> But in some debugging cases we need to have high precision for almost all
>> timestamps. Here I don't know what the smartest solution is.
>>
>> Thank you for commenting. Hope we'll find a good fix.