On Wed, Mar 17, 2021 at 12:25 AM Thomas Schwinge <tho...@codesourcery.com> wrote:
>
> Hi!
>
> Thanks, Michael, and again Richard for your quick responses.
>
> On 2021-03-16T15:25:10+0000, Michael Matz <m...@suse.de> wrote:
> > On Tue, 16 Mar 2021, Thomas Schwinge wrote:
> >
> >> >>Indeed, given (Fortran) 'zzz = 1', we produce GIMPLE:
> >> >>
> >> >>    gimple_assign <real_cst, zzz, 1.0e+0, NULL, NULL>
> >> >>
> >> >>..., and calling 'walk_stmt_load_store_addr_ops' on that, I see, as
> >> >>expected, the 'visit_store' callback invoked, with 'rhs' and 'arg':
> >> >>'<var_decl zzz>'.
> >> >>
> >> >>However, given (Fortran) 'zzz = r + r2', we produce GIMPLE:
> >> >>
> >> >>    gimple_assign <plus_expr, zzz, r, r2, NULL>
> >
> > But that's pre-SSA form.  After rewriting into SSA form, 'zzz' will be
> > replaced by an SSA name, and the actual store into 'zzz' will happen in
> > a store instruction.
>
> Oh, I see, and...
>
> >> >>..., and calling 'walk_stmt_load_store_addr_ops' on that, I see,
> >> >>unexpectedly, no callback at all invoked: neither 'visit_load', nor
> >> >>'visit_store' (nor 'visit_address', obviously).
> >> >
> >> > The variables involved are registers.  You only get called on memory
> >> > operands.
> >>
> >> How would I have told that from the 'walk_stmt_load_store_addr_ops'
> >> function description?  (How to improve that one "to reflect reality"?)
> >>
> >> But 'zzz' surely is the same in 'zzz = 1' vs. 'zzz = r + r2' -- for the
> >> former I *do* see the 'visit_store' callback invoked, for the latter I
> >> don't?
> >
> > The walk_gimple functions are intended to be used on the SSA form of
> > GIMPLE (i.e., the form it is in most of the time).
>
> Yes, "most of the time", but actually not in my case: I'm doing stuff
> right after gimplification (before OMP lowering)...  So that's the
> "detail" I was missing -- sorry for not mentioning that right away.
> :-|
>
> As the 'walk_gimple_[...]' functions are used during gimplification and
> OMP lowering, supposedly they're fine to use on non-SSA form -- but
> evidently some of the helper functions are not.  (Might there be a way to
> add a 'gcc_checking_assert (gimple_in_ssa_p (cfun))' or similar for that?
> Putting that into 'walk_stmt_load_store_addr_ops' triggers a lot, as
> called from 'gcc/cgraphbuild.c:cgraph_node::record_stmt_references', for
> example.)
> > And in that it's
> > not the case that 'zzz = 1' and 'zzz = r + r2' are similar.  The former
> > can have memory as the lhs (that includes static variables, or indirection
> > through pointers), the latter can not.  The lhs of a binary statement is
> > always an SSA name.  A write to an SSA name is not a store, which is why
> > it's not walked by walk_stmt_load_store_addr_ops.
> >
> > Maybe it helps to look at simple C examples: [...]
>
> I see, many thanks for reminding me about these items!
> > If you want
> > to capture writes into SSA names as well ([...])
> > you need the per-operand callback indeed.
>
> What I actually need is loads from/uses of actual variables (and I didn't
> see 'walk_stmt_load_store_addr_ops' give me these).
> > But that depends on
> > what you actually want to do.
>
> This is a prototype/"initial hack" for a very simple implementation of
> <https://gcc.gnu.org/PR90591> "Avoid unnecessary data transfer out of OMP
> construct".  (... to be completed and posted later.)  So simple that it
> will easily fail (gracefully, of course), but it is nevertheless effective
> for a lot of real-world code:
>
>     subroutine [...]
>       [...]
>       real([...]) xx ! temporary variable, for distance calculation
>       [...]
> !$acc kernels pcopyin(x, zii) reduction(+:eva) ! implicit 'copy(xx)' for scalar used inside region; established during gimplification
>       do 100 i=0,n-1
>         evx=0.0d0
>         do 90 j=0,n-1
>           xx=abs(x(1,i)-x(1,j))
>           [...]
> !$acc end kernels
>       [...]
>       ['xx' never used here]
>     end subroutine [...]
>
> Inside 'kernels', we'd like to automatically parallelize loops (we've
> basically got that working; analysis by Graphite etc.), but the problem
> is that, given the "implicit 'copy(xx)' for scalar used inside region",
> when Graphite later looks at the outlined 'kernels' region's function, it
> must assume that 'xx' is still live after the OpenACC 'kernels' construct
> -- and thus cannot treat it as a thread-private temporary, cannot
> parallelize.
>
> Now, walking each function backwards (!), I'm taking note of any
> variables' uses, and if I reach a 'kernels' construct but have not seen
> a use of 'xx', I may then optimize 'copy(xx)' -> 'firstprivate(xx)',
> enabling Graphite to do its thing.  (Other such optimizations may be
> added later.)  (This is inspired by Jakub's commit
> 1a80d6b87d81c3f336ab199a901cf80ae349c335 "re PR tree-optimization/68128
> (A huge regression in Parboil v2.5 OpenMP CUTCP test (2.5 times lower
> performance))".)
>
> I've now got a simple 'callback_op' which, for '!is_lhs', looks at
> 'get_base_address ([op])', and if that 'var' is contained in the set of
> current candidates (initialized per containing 'bind', which we enter
> first, even if walking a 'gimple_seq' backwards), removes that 'var' as a
> candidate for such optimization.  (Plus some "details", of course.)  This
> seems to work fine, as far as I can tell.  :-)
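For the archive, the backward walk described above can be sketched as a small stand-alone model.  This is plain Python, not GCC code; the statement kinds and clause names are simplifications of the real GIMPLE/OpenACC representation, chosen just to show the candidate-set bookkeeping:

```python
def optimize_clauses(stmts, candidates):
    """Toy model of the backward walk: 'stmts' is a list, in source order,
    of ('use', var) or ('kernels', {var: clause}) entries.  Walking
    backwards, any use removes the variable from the candidate set; a
    'kernels' construct reached while the variable is still a candidate
    (i.e., no use was seen after it) may downgrade 'copy' to
    'firstprivate'.  Returns the (possibly optimized) clause maps, in
    source order."""
    candidates = set(candidates)
    results = []
    for kind, payload in reversed(stmts):   # walk the function backwards (!)
        if kind == 'use':
            candidates.discard(payload)     # a later use: cannot optimize
        elif kind == 'kernels':
            clauses = dict(payload)
            for var, clause in clauses.items():
                if clause == 'copy' and var in candidates:
                    clauses[var] = 'firstprivate'   # no use after the region
            results.append(clauses)
            # Uses inside the region still count as uses for anything
            # that comes before it.
            candidates.difference_update(payload)
    return list(reversed(results))


# Mirrors the Fortran example: 'xx' is only used inside the region.
stmts = [
    ('use', 'x'),
    ('kernels', {'xx': 'copy', 'x': 'pcopyin'}),
    ('use', 'eva'),                         # 'xx' never used here
]
print(optimize_clauses(stmts, {'xx', 'x', 'eva'}))
# -> [{'xx': 'firstprivate', 'x': 'pcopyin'}]
```

If 'xx' were used after the construct, the walk would see that use first and leave the 'copy' clause alone, which is the graceful-failure behavior described above.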
It might then still fail for x = a[var] when you are interested in
'var'.  I think you want to use walk_gimple_stmt and provide a
walk_tree_fn which will recurse into the complex tree operands (also
making get_base_address unnecessary).

> Of course, the eventual IPA-based solution (see PR90591, PR94693, etc.)
> will be much better -- but we need something now.
>
>
> Grüße
> Thomas
> -----------------
> Mentor Graphics (Deutschland) GmbH, Arnulfstrasse 201, 80634 München
> Registergericht München HRB 106955, Geschäftsführer: Thomas Heurung, Frank Thürauf
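The x = a[var] pitfall can be illustrated with another toy model (again plain Python, not GCC code; the tuple encoding of operand trees is invented for the example): a visitor that only inspects each operand's base address reports 'a' but never 'var', whereas recursing into the whole operand tree -- as a walk_tree_fn passed to walk_gimple_stmt would -- reports 'var' as well:

```python
# An operand is either a variable name (str) or a nested tuple such as
# ('index', base, idx), standing for base[idx].

def base_address(op):
    """Mimic a get_base_address-style visitor: follow only the base."""
    return op if isinstance(op, str) else base_address(op[1])

def bases_only(ops):
    """Variables seen when each operand is reduced to its base address."""
    return {base_address(op) for op in ops}

def walk_tree(op, found):
    """Recurse into every sub-operand, like a walk_tree_fn would."""
    if isinstance(op, str):
        found.add(op)
    else:
        for sub in op[1:]:          # op[0] is the node tag, skip it
            walk_tree(sub, found)
    return found


rhs = ('index', 'a', 'var')         # operand tree for a[var]
print(bases_only([rhs]))            # -> {'a'}       ('var' is missed)
print(sorted(walk_tree(rhs, set())))  # -> ['a', 'var']
```

So a base-address-only callback_op would let 'var' slip through as an optimization candidate even though it is still used; the full tree walk catches it.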