> On Jul 24, 2018, at 6:31 AM, Van Haaren, Harry <harry.van.haa...@intel.com> > wrote: > >> From: dev [mailto:dev-boun...@dpdk.org] On Behalf Of Morten Brørup >> Sent: Tuesday, July 24, 2018 9:14 AM >> To: Olivier Matz <olivier.m...@6wind.com>; Wiles, Keith >> <keith.wi...@intel.com> >> Cc: Honnappa Nagarahalli <honnappa.nagaraha...@arm.com>; dev@dpdk.org >> Subject: Re: [dpdk-dev] rte_mbuf library likely()/unlikely() >> >>> -----Original Message----- >>> From: dev [mailto:dev-boun...@dpdk.org] On Behalf Of Olivier Matz >>> Sent: Tuesday, July 24, 2018 9:29 AM >>> >>> Hi, >>> >>> On Mon, Jul 23, 2018 at 10:40:03PM +0000, Wiles, Keith wrote: >>>> >>>> >>>>> On Jul 23, 2018, at 2:09 PM, Morten Brørup >>> <m...@smartsharesystems.com> wrote: >>>>> >>>>> I haven't performance tested, but they are compiler branch >>> prediction hints pointing out the most likely execution path, so I >>> expect them to have a positive effect. >>>> >>>> We really need to make sure this provides any performance improvement >>> and that means it needs to be tested on a number of systems. Can you >>> please do some performance testing or see if we can get the guys doing >>> DPDK performance testing to first give this a try? This area is very >>> sensitive to tweaking. >>> >>> I agree we should be driven by performance improvements. >> >> Which is why I suggested these changes. Theoretically, they will provide a >> performance improvement. The most likely execution path is obvious from code >> review. > > >>> I remember a discussion with Bruce on the ML saying that hardware >>> branch predictors generally do a good job. >> >> They do, and it is very well documented. E.g. here's a really interesting >> historical review about branch predictors: >> https://danluu.com/branch-prediction/ >> >> However, just because hardware branch predictors are pretty good, I don't >> think we should remove or stop adding likely()/unlikely() and other branch >> prediction hints. The hints still add value, both for execution speed and >> for source code readability. > > > Although "likely" and "unlikely" sound like they predict branch-taken > or branch-not-taken, in practice the UNLIKELY() macro expands to > __builtin_expect(). > > The compiler uses __builtin_expect() to understand what code is expected to > run most of the time. In *practice* what this means is that the code that is > not expected to run gets moved "far away" in the binary itself. The compiler > can layout code to be better with the "static branch prediction" methods (see > below). > > Think of UNLIKELY() not as helping a runtime branch predictor, it doesn't. > Think of UNLIKELY() as a method to move error-handling instructions to cold > instruction-cache-lines. > > To put it another way, using UNLIKELY() puts the unlikely branch code far > away, > and there's a chance we're going to pay an i-cache miss for that. > > As a summary: > UNLIKELY() for *error handling* is good, it moves cold instructions out of > hot i-cache > UNLIKELY() on anything that is valid to happen (e.g. ring dequeue == 0) is > not advised, > as we could cause i-cache misses on the data path. > > >> Please also refer to the other link I provided about GCC branches. It >> basically says that GCC treats an If-sentence like this: >> >> If (Condition) Then >> Expect to execute this >> Else >> Do not expect to execute this >> >> So if we don't want unlikely() around an if-condition which probably >> evaluates to false, we should rewrite the execution order accordingly. > > In the case of error handling, this is correct - and the compiler will handle > it. > If both taken and not-taken occur at runtime, let the runtime branch > predictor handle it. > > >> Although hardware branch predictors help a lot most of the time, the >> likely()/unlikely() still helps the first time the CPU executes the branch >> instruction. > > There is a static branch prediction scheme that predicts branches that are > executing for the first time. It is detailed in the Software Optimization > Manual, > Section 3.4 "Optimizing the front end", Page 103 "Static Branch Prediction > Algorithm": > > https://www.intel.com/content/dam/www/public/us/en/documents/manuals/64-ia-32-architectures-optimization-manual.pdf > > UNLIKELY() tells GCC which layout to use - so it does help somewhat, but the > price of a single i-cache miss is much higher than multiple branch-misses... > > >> Furthermore, I'm very well aware of the rule of thumb for adding >> likely()/unlikely(): Don't add one if it isn't correct almost every time the >> branch is considered. > > I would word this stronger: if both taken and not-taken are *valid*, don't > use UNLIKELY(). > Use UNLIKELY() for error handling cases only. > > >> How much more compiler branch prediction hints adds to hardware compiler >> branch prediction is a somewhat academic discussion. But Honnappa and Keith >> are right: Performance improvements should be performance tested. > > +1 that ultimately it comes down to measured numbers, on a variety of > platforms.. > > ..but micro-benchmarks are not enough either. What if the whole micro-bench > fits into > i-cache, it looks like there is no difference in perf. Then apply these > UNLIKELY() > changes to a real world app, and the i-cache misses may become much more > expensive. > > >> Unfortunately, I don't have the equipment or resources to perform a usable >> performance test, so I submitted the changes to the mailing list for code >> review instead. And I'm certainly getting code reviewed now. :-) >> >> From a code review perspective, someone else than me might observe that the >> exception handling execution path is "missing" the unlikely() hint, so I >> would argue that code readability is an argument for adding it - unless >> performance testing shows a slowdown. >> >> >> Med venlig hilsen / kind regards >> - Morten Brørup > > Unless you see a significant branch-miss bottleneck somewhere in the code, > my opinion is that we are trying to micro-optimize code, when there are > probably > larger problems to solve.
+1 Regards, Keith