Hi! Thanks, Michael, and again Richard for your quick responses.
On 2021-03-16T15:25:10+0000, Michael Matz <m...@suse.de> wrote: > On Tue, 16 Mar 2021, Thomas Schwinge wrote: > >> >>Indeed, given (Fortran) 'zzz = 1', we produce GIMPLE: >> >> >> >> gimple_assign <real_cst, zzz, 1.0e+0, NULL, NULL> >> >> >> >>..., and calling 'walk_stmt_load_store_addr_ops' on that, I see, as >> >>expected, the 'visit_store' callback invoked, with 'rhs' and 'arg': >> >>'<var_decl zzz>'. >> >> >> >>However, given (Fortran) 'zzz = r + r2', we produce GIMPLE: >> >> >> >> gimple_assign <plus_expr, zzz, r, r2, NULL> > > But that's pre-ssa form. After writing into SSA 'zzz' will be replaced by > an SSA name, and the actual store into 'zzz' will happen in a store > instruction. Oh, I see, and... >> >>..., and calling 'walk_stmt_load_store_addr_ops' on that, I see, >> >>unexpectedly, no callback at all invoked: neither 'visit_load', nor >> >>'visit_store' (nor 'visit_address', obviously). >> > >> > The variables involved are registers. You only get called on memory >> > operands. >> >> How would I have told that from the 'walk_stmt_load_store_addr_ops' >> function description? (How to improve that one "to reflect relatity"?) >> >> But 'zzz' surely is the same in 'zzz = 1' vs. 'zzz = r + r2' -- for the >> former I *do* see the 'visit_store' callback invoked, for the latter I >> don't? > > The walk_gimple functions are intended to be used on the SSA form of > gimple (i.e. the one that it is in most of the time). Yes, "most of the time", but actually not in my case: I'm doing stuff right after gimplification (before OMP lowering)... So that's the "detail" I was missing -- sorry for not mentioning that right away. :-| As 'walk_gimple_[...]' are used during gimplification, OMP lowering, supposedly they're fine to use in non-SSA form -- but evidently some of the helper functions are not. (Might there be a way to add 'gcc_checking_assert (gimple_in_ssa_p (cfun))' or similar for that? Putting that into 'walk_stmt_load_store_addr_ops' triggers a lot, as called from 'gcc/cgraphbuild.c:cgraph_node::record_stmt_references', for example.) > And in that it's > not the case that 'zzz = 1' and 'zzz = r + r2' are similar. The former > can have memory as the lhs (that includes static variables, or indirection > through pointers), the latter can not. The lhs of a binary statement is > always an SSA name. A write to an SSA name is not a store, which is why > it's not walked for walk_stmt_load_store_addr_ops. > > Maybe it helps to look at simple C examples: [...] I see, many thanks for reminding me about these items! > If you want > to capture writes into SSA names as well ([...]) > you need the per-operand callback indeed. What I actually need is loads from/uses of actual variables (and I didn't see 'walk_stmt_load_store_addr_ops' give me these). > But that depends on > what you actually want to do. This is a prototype/"initial hack" for a very simple implementation of <https://gcc.gnu.org/PR90591> "Avoid unnecessary data transfer out of OMP construct". (... to be completed and posted later.) So simple that it will easily fail (gracefully, of course), but yet is effective for a lot of real-world code: subroutine [...] [...] real([...]) xx !temporary variable, for distance calculation [...] !$acc kernels pcopyin(x, zii) reduction(+:eva) ! implicit 'copy(xx)' for scalar used inside region; established during gimplification do 100 i=0,n-1 evx=0.0d0 do 90 j=0,n-1 xx=abs(x(1,i)-x(1,j)) [...] !$acc end kernels [...] ['xx' never used here] end subroutine [...] Inside 'kernels', we'd like to automatically parallelize loops (we've basically got that working; analysis by Graphite etc.), but the problem is that given "implicit 'copy(xx)' for scalar used inside region", when Graphite later looks at the outlined 'kernels' region's function, it must assume that 'xx' is still live after the OpenACC 'kernels' construct -- and thus cannot treat it as a thread-private temporary, cannot parallelize. Now, walking each function backwards (!), I'm taking note of any variables' uses, and if I reach an 'kernels' construct, but have not seen a use of 'xx', I may then optimize 'copy(xx)' -> 'firstprivate(xx)', enabling Graphite to do its thing. (Other such optimizations may be added later.) (This is inspired by Jakub's commit 1a80d6b87d81c3f336ab199a901cf80ae349c335 "re PR tree-optimization/68128 (A huge regression in Parboil v2.5 OpenMP CUTCP test (2.5 times lower performance))".) I've now got a simple 'callback_op', which for '!is_lhs' looks at 'get_base_address ([op])', and if that 'var' is contained in the set of current candidates (initialized per containg 'bind's, which we enter first, even if walking a 'gimple_seq' backwards), removes that 'var' as a candidate for such optimization. (Plus some "details", of couse.) This seems to work fine, as far as I can tell. :-) Of course, the eventual IPA-based solution (see PR90591, PR94693, etc.) will be much better -- but we need something now. Grüße Thomas ----------------- Mentor Graphics (Deutschland) GmbH, Arnulfstrasse 201, 80634 München Registergericht München HRB 106955, Geschäftsführer: Thomas Heurung, Frank Thürauf