On 07/24/2017 05:29 AM, Alexander Monakov wrote: > Profiling uses of qsort in GCC reveals that a significant portion of calls > comes from domwalk.c where child nodes in dominator tree are reordered > according to postorder numbering. However we know that in dominator trees > the vast majority of nodes have 0, 2 or 3 children, and handling those > cases separately allows to avoid qsort overhead. > > I don't have trunk figures handy, but on gcc-6 release-checking tramp3d > compilation this would eliminate about 63% of _all_ qsort calls by count, > or about 27% by time. Call count profile looks like: > > N=0 N=1 N=2 N=3 N=4 N=5 N=6 ... Total > 16111 9406 228607 91870 20154 8317 3823 406971 > > With this patch it would look like > > N=0 N=1 N=2 N=3 N=4 N=5 N=6 ... Total > 16111 9406 32200 34132 16966 8022 3702 149177 > > While at it, I've also added a clarifying comment to bb_postorder (it > actually gives reverse postorder), simplified casts in cmp_bb_postorder > and changed it to compute the result in a branchless fashion. > > Bootstrapped and regtested on x86-64, also verified that gcc/*.o files are > unchanged except for domwalk.o and *-checksum.o. OK for trunk? > > * domwalk.c (cmp_bb_postorder): Simplify. > (sort_bbs_postorder): New function. Use it... > (dom_walker::walk): ...here to optimize common cases. Is it really beneficial to write cmp_bb_postorder without simple and trivial to understand conditionals? And if it is, then doesn't that actually suggest that we should have a match.pd pattern to turn the branchy code into a branchless sequence?
As Uli noted, we should be using std::swap. Can you please repost ? THanks, Jeff