https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80155
--- Comment #38 from prathamesh3492 at gcc dot gnu.org ---
Hi,
The issue can be reproduced exactly with pr77445-2.c; I am testing with
is_digit() made noinline.

* Reordering SINK before PRE

Data for building SPEC2006 with sink run before pre:
Number of statements sunk: +2677 (~ +14%)
Number of total PRE insertions: -3971 (~ -1%)

On the private embedded benchmark suite, there is overall no significant
difference. Not sure how helpful this is. Is there a way to get the number
of registers spilled from the lra dump or the assembly? I would like to see
the effect of reordering the passes on spills. (A crude assembly-based
proxy is sketched in the P.S. below.)

Reordering sink before pre seems to regress no-scevccp-outer-22.c and
ssa-dom-thread-7.c, and several SVE tests on aarch64:
http://people.linaro.org/~christophe.lyon/cross-validation/gcc-test-patches/262002-sink-pre/aarch64-none-linux-gnu/diff-gcc-rh60-aarch64-none-linux-gnu-default-default-default.txt

There also seems to be some interplay between hoisting and forwprop.
Disabling forwprop3 and forwprop4 seems to eliminate the spill too.
However, as Bin pointed out on the list, forwprop also helps to reduce
register pressure for this case by mem_ref folding
(forward_propagate_addr_expr).

* Jump threading cost models

The jump threading pass increases the size of this case from 38 to 79
basic blocks. I wonder whether that makes the function a resource hog,
eventually leading to the extra spill? Disabling the jump threading pass
eliminates the spill.

I looked a bit into fine-tuning the jump threading cost models for
cortex-m7. Strangely, setting max-jump-thread-duplication-stmts to 20 and
fsm-scale-path-stmts to 3 (exact flags in the P.S. below) not only removes
the spill but also results in 9 more hoistings! I am investigating why this
results in improved performance. However, it regresses ssa-dom-thread-7.c:
http://people.linaro.org/~christophe.lyon/cross-validation/gcc-test-patches/262539-jump-thread-cost-models/aarch64-none-elf/diff-gcc-rh60-aarch64-none-elf-default-default-default.txt

* Stop-gap measure for hoisting ?

As a stop-gap measure, would it make sense to "localize" hoisting within a
"large" loop (based on loop->num_nodes?) by refusing to hoist expressions
computed outside the loop? My assumption is that hoisting increases the
live range of an expression that was previously computed in a block outside
the loop but is brought inside it by hoisting, since we would then need to
consider paths along the loop as well when estimating its live range. A
cheap way to test this might be to check whether the block's post-dominator
also lies within the same loop, since that would ensure all paths from the
block to EXIT stay inside the loop (roughly the shape sketched in the P.S.
below).

I created a patch for this
(http://people.linaro.org/~prathamesh.kulkarni/pdom.diff), which removes
the spill but regresses pr77445-2.c (which is how I stumbled on that test).
The underlying issue there does not seem particularly related to hoisting,
though, so I am not sure this "heuristic" makes much sense.

* Live range shrinking pass

There was some discussion on the list about an inter-block live-range
shrinking GIMPLE pass (https://gcc.gnu.org/ml/gcc/2018-05/msg00260.html),
which would run just before expand. I would be grateful for suggestions on
how to get started with it. I realize this would be pretty hard, but I
would like to give it a try.

Thanks,
Prathamesh
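
P.S. A few sketches, in case they help reproduce the experiments above.

To approximate spill counts without digging through the RA dumps, one crude
proxy is to count sp-relative loads/stores in the generated assembly. This
only gives an upper bound (it also picks up argument and local-variable
traffic, and the FP variants), and the helper below is just a quick
standalone sketch of that idea, not part of any patch:

/* spillcount.cc: rough proxy for spill traffic on ARM.  Counts ldr/str
   instructions (including ldrb/strh/vldr etc.) whose address is
   sp-relative, e.g. "str r3, [sp, #12]".  Not an exact spill count.  */

#include <fstream>
#include <iostream>
#include <string>

int
main (int argc, char **argv)
{
  if (argc != 2)
    {
      std::cerr << "usage: " << argv[0] << " file.s\n";
      return 1;
    }

  std::ifstream in (argv[1]);
  std::string line;
  unsigned loads = 0, stores = 0;

  while (std::getline (in, line))
    {
      /* Only look at sp-relative addresses.  */
      if (line.find ("[sp") == std::string::npos)
        continue;
      /* "ldr" also matches ldrb/ldrh/vldr; likewise for "str".  */
      if (line.find ("ldr") != std::string::npos)
        loads++;
      else if (line.find ("str") != std::string::npos)
        stores++;
    }

  std::cout << "sp-relative loads: " << loads
            << ", stores: " << stores << "\n";
  return 0;
}

Comparing these counts on the .s files before and after a pass reordering
at least shows the direction of the change, even if it is not a precise
spill count.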
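
For the jump threading experiment, the params translate to an invocation
along these lines (the driver name, -O2 and -mcpu are placeholders for
whatever the benchmark actually uses):

arm-none-eabi-gcc -O2 -mcpu=cortex-m7 \
    --param max-jump-thread-duplication-stmts=20 \
    --param fsm-scale-path-stmts=3 -S test.c

And, if I have the pass instance numbers right, the forwprop experiment
corresponds to -fdisable-tree-forwprop3 -fdisable-tree-forwprop4.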
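
For the pdom heuristic, the shape of the check is roughly the following
GCC-internal sketch. The actual patch is at the pdom.diff link above;
hoisting_stays_local_p and the num_nodes threshold here are made up for
illustration, and the usual includes (config.h, system.h, coretypes.h,
backend.h, cfgloop.h) plus valid CDI_POST_DOMINATORS info are assumed:

/* Return true if hoisting an expression into BLOCK can be considered
   "loop-local": either BLOCK is not inside a large loop, or BLOCK's
   immediate post-dominator lies inside the same loop, so (per the
   assumption above) the hoisted value's live range should not extend
   across the loop exit.  */

static bool
hoisting_stays_local_p (basic_block block)
{
  struct loop *loop = block->loop_father;

  /* Not in any loop, or the loop is small: nothing to localize.
     32 is an arbitrary threshold on loop->num_nodes, purely for
     illustration.  */
  if (loop_outer (loop) == NULL || loop->num_nodes < 32)
    return true;

  basic_block pdom = get_immediate_dominator (CDI_POST_DOMINATORS, block);
  return pdom && flow_bb_inside_loop_p (loop, pdom);
}

A caller in the hoisting code would then refuse insertions into BLOCK
whenever this returns false.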