Re: [PATCH] Optimize BB sorting in domwalk

Jeff Law Mon, 24 Jul 2017 15:15:07 -0700

On 07/24/2017 05:29 AM, Alexander Monakov wrote:
> Profiling uses of qsort in GCC reveals that a significant portion of calls
> comes from domwalk.c where child nodes in dominator tree are reordered
> according to postorder numbering.  However we know that in dominator trees
> the vast majority of nodes have 0, 2 or 3 children, and handling those
> cases separately allows to avoid qsort overhead.
> 
> I don't have trunk figures handy, but on gcc-6 release-checking tramp3d
> compilation this would eliminate about 63% of _all_ qsort calls by count,
> or about 27% by time.  Call count profile looks like:
> 
> N=0   N=1     N=2     N=3     N=4     N=5     N=6     ...     Total
> 16111   9406    228607  91870   20154   8317    3823            406971
> 
> With this patch it would look like
> 
> N=0   N=1     N=2     N=3     N=4     N=5     N=6     ...     Total
> 16111   9406    32200   34132   16966   8022    3702            149177
> 
> While at it, I've also added a clarifying comment to bb_postorder (it
> actually gives reverse postorder), simplified casts in cmp_bb_postorder
> and changed it to compute the result in a branchless fashion.
> 
> Bootstrapped and regtested on x86-64, also verified that gcc/*.o files are
> unchanged except for domwalk.o and *-checksum.o.  OK for trunk?
> 
>       * domwalk.c (cmp_bb_postorder): Simplify.
>       (sort_bbs_postorder): New function.  Use it...
>       (dom_walker::walk): ...here to optimize common cases.
Is it really beneficial to write cmp_bb_postorder without simple and
trivial to understand conditionals?  And if it is, then doesn't that
actually suggest that we should have a match.pd pattern to turn the
branchy code into a branchless sequence?


As Uli noted, we should be using std::swap.

Can you please repost ?

THanks,
Jeff

Re: [PATCH] Optimize BB sorting in domwalk

Reply via email to