Hi! On 2021-03-17T08:46:16+0100, Richard Biener <richard.guent...@gmail.com> wrote: > On Wed, Mar 17, 2021 at 12:25 AM Thomas Schwinge > <tho...@codesourcery.com> wrote: >> This is a prototype/"initial hack" for a very simple implementation of >> <https://gcc.gnu.org/PR90591> "Avoid unnecessary data transfer out of OMP >> construct". (... to be completed and posted later.) So simple that it >> will easily fail (gracefully, of course), but yet is effective for a lot >> of real-world code: >> >> subroutine [...] >> [...] >> real([...]) xx !temporary variable, for distance calculation >> [...] >> !$acc kernels pcopyin(x, zii) reduction(+:eva) ! implicit 'copy(xx)' for >> scalar used inside region; established during gimplification >> do 100 i=0,n-1 >> evx=0.0d0 >> do 90 j=0,n-1 >> xx=abs(x(1,i)-x(1,j)) >> [...] >> !$acc end kernels >> [...] >> ['xx' never used here] >> end subroutine [...] >> >> Inside 'kernels', we'd like to automatically parallelize loops (we've >> basically got that working; analysis by Graphite etc.), but the problem >> is that given "implicit 'copy(xx)' for scalar used inside region", when >> Graphite later looks at the outlined 'kernels' region's function, it must >> assume that 'xx' is still live after the OpenACC 'kernels' construct -- >> and thus cannot treat it as a thread-private temporary, cannot >> parallelize. >> >> Now, walking each function backwards (!), I'm taking note of any >> variables' uses, and if I reach an 'kernels' construct, but have not seen >> a use of 'xx', I may then optimize 'copy(xx)' -> 'firstprivate(xx)', >> enabling Graphite to do its thing. (Other such optimizations may be >> added later.) (This is inspired by Jakub's commit >> 1a80d6b87d81c3f336ab199a901cf80ae349c335 "re PR tree-optimization/68128 >> (A huge regression in Parboil v2.5 OpenMP CUTCP test (2.5 times lower >> performance))".) >> >> I've now got a simple 'callback_op', which for '!is_lhs' looks at >> 'get_base_address ([op])', and if that 'var' is contained in the set of >> current candidates (initialized per containg 'bind's, which we enter >> first, even if walking a 'gimple_seq' backwards), removes that 'var' as a >> candidate for such optimization. (Plus some "details", of couse.) This >> seems to work fine, as far as I can tell. :-) > > It might then still fail for x = a[var] when you are interested in 'var'.
That already works as expected: gimple_assign <array_ref, x, a[var], NULL, NULL> ..., and in the debug log ('OP' is 'get_base_address (OPERAND)'), I see: ##### LOOKING INTO OPERAND: <array_ref 0x7ffff67af3b8> ###### OP: <var_decl 0x7ffff7feeb40 a> ####### WASN'T CANDIDATE ##### LOOKING INTO OPERAND: <var_decl 0x7ffff7feeb40 a> ###### OP: <var_decl 0x7ffff7feeb40 a> ####### WASN'T CANDIDATE ##### LOOKING INTO OPERAND: <var_decl 0x7ffff7feebd0 var> ###### OP: <var_decl 0x7ffff7feebd0 var> ####### NO LONGER CANDIDATE ##### LOOKING INTO OPERAND: <var_decl 0x7ffff7feec60 x> ###### IGNORED: IS_LHS Notice 'var' "NO LONGER CANDIDATE". Well, I cross-checked, and I'm getting this behavior (deep scanning) because I'm (intentionally) using (default) '*walk_subtrees = 1' in my 'callback_op'. Indeed if I specify '*walk_subtrees = 0', the debug log changes as follows: ##### LOOKING INTO OPERAND: <array_ref 0x7ffff67af3b8> ###### OP: <var_decl 0x7ffff7feeb40 a> ####### WASN'T CANDIDATE -##### LOOKING INTO OPERAND: <var_decl 0x7ffff7feeb40 a> -###### OP: <var_decl 0x7ffff7feeb40 a> -####### WASN'T CANDIDATE -##### LOOKING INTO OPERAND: <var_decl 0x7ffff7feebd0 var> -###### OP: <var_decl 0x7ffff7feebd0 var> -####### NO LONGER CANDIDATE ##### LOOKING INTO OPERAND: <var_decl 0x7ffff7feec60 x> ###### IGNORED: IS_LHS ..., and your projected mis-optimization: +##### OPTIMIZING map(force_tofrom:var [len: 4]) -> firstprivate(var) If I continue to use (default) '*walk_subtrees = 1', then I might not actually need to use 'get_base_address', because we get all the individual sub operands visited. > I think you want to use walk_gimple_stmt and provide walk_tree_fn which > will recurse into the complex tree operands (also making get_base_address > unnecessary). I'm already using 'walk_gimple_stmt' via 'walk_gimple_seq', and I'll (later) look into removing 'get_base_address', and also look up 'walk_tree_fn'. Ah, wait, what you suggested with 'walk_tree_fn' is actually what I've already got; in my last email I said "I've now got a simple 'callback_op' [...]". Seems we're converging! ;-) Grüße Thomas >> Of course, the eventual IPA-based solution (see PR90591, PR94693, etc.) >> will be much better -- but we need something now. >> >> >> Grüße >> Thomas ----------------- Mentor Graphics (Deutschland) GmbH, Arnulfstrasse 201, 80634 München Registergericht München HRB 106955, Geschäftsführer: Thomas Heurung, Frank Thürauf