Hi!

On 2022-10-26T20:27:19-0600, Sandra Loosemore <san...@codesourcery.com> wrote:
> On 10/20/22 08:07, Jakub Jelinek wrote:
>> Thus, IMHO it is exactly the pass_omp_simd_clone pass where you want to
>> implement this auto-simdization discovery, guarded with
>> #ifdef ACCEL_COMPILER and the new option (which means it will be done
>> only for gcn and not on the host right now).
>
> I'm running into a practical difficulty with making this controlled by a
> static #ifdef: namely, testing.
>
> One of my test cases examines the .s output to make sure that the clones
> are emitted as local symbols and not global.  I have not been able to
> find the symbol linkage information in any of the dump files

Hmm, does none of the '-fdump-ipa-all-details' output help here either?

> and I have
> also not been able to figure out how to get a .s file from the offload
> compiler even outside of the DejaGnu test harness.  (It's possible I am
> just an extreme dummy about the latter problem, but so far none of my
> colleagues here has been able to give me a recipe either.)

Right, currently only 'scan-offload-tree-dump[...]' and
'scan-offload-rtl-dump[...]' are implemented; I assume
'scan-offload-assembler[...]' could be added without too much effort.

> On top of that, I worry that this should be tested more broadly than for
> the one target we're presently focusing on (AMD GCN), and we'll get much
> more regular test coverage if it's also enabled for x86_64 target which
> has the necessary compute_vecsize_and_simdlen target hook.
>
> I remember Carlos O'Donnell used to have a favorite mantra, "design for
> test".

Heh, I don't remember him ever saying that to me -- but maybe that's
because this is what I do anyway.  ;-P

> So, maybe generalize the new -fopenmp-target-simd-clone option
> to take a parameter to force clones to be generated on the OpenMP host
> for test purposes?  The "declare target" directive already has a clause
>
>    device_type(host|nohost|any)
>
> that defaults to "any"; maybe we could use that syntax like
>    -fopenmp-target-simd-clone=any
> and use the intersection of the two sets to determine what to
> auto-generate clones for?

Seems reasonable to me (but I'm missing a lot of context here).

There is anyway a (far-out) goal to get rid of compilation-time
'#ifdef ACCEL_COMPILER' etc., and instead make such code dependent on a
command-line flag (or some other state), so that it's possible to use
the same compiler for target (host) as well as offload target
compilation.  (For example, to simulate offloading compilation with
standard x86_64-pc-linux-gnu GCC.)
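
To make the "intersection of the two sets" idea concrete, here is a rough
sketch in plain C.  It is not GCC code: the 'device_set' enum and the
functions 'parse_device_set', 'clone_devices' and 'should_auto_clone' are
hypothetical names made up for illustration, and the 'is_offload_compiler'
parameter merely stands in for whatever run-time state might one day
replace '#ifdef ACCEL_COMPILER'.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical bit-set mirroring device_type(host|nohost|any);
   this is NOT how GCC represents the clause internally.  */
enum device_set
{
  DEV_NONE = 0,
  DEV_HOST = 1 << 0,
  DEV_NOHOST = 1 << 1,
  DEV_ANY = DEV_HOST | DEV_NOHOST
};

/* Parse "host", "nohost" or "any", e.g. the argument of an option
   spelled like -fopenmp-target-simd-clone=any.  Anything else
   yields DEV_NONE.  */
static enum device_set
parse_device_set (const char *s)
{
  if (strcmp (s, "host") == 0)
    return DEV_HOST;
  if (strcmp (s, "nohost") == 0)
    return DEV_NOHOST;
  if (strcmp (s, "any") == 0)
    return DEV_ANY;
  return DEV_NONE;
}

/* Devices for which clones may be auto-generated: those allowed by
   both the command-line option and the "declare target" clause.  */
static enum device_set
clone_devices (enum device_set option, enum device_set clause)
{
  return (enum device_set) (option & clause);
}

/* Run-time stand-in for "#ifdef ACCEL_COMPILER": the caller says
   which side this compiler invocation is compiling for.  */
static bool
should_auto_clone (enum device_set option, enum device_set clause,
                   bool is_offload_compiler)
{
  enum device_set here = is_offload_compiler ? DEV_NOHOST : DEV_HOST;
  return (clone_devices (option, clause) & here) != 0;
}
```

With something along these lines, a host-only test run could ask for
"any" (or "host") to exercise the clone generation on x86_64, while a
"nohost" setting would keep it restricted to the offload compiler.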

And/or, where you implement the logic to "make sure that the clones are
emitted as local symbols and not global", do emit some "tag" in the dump
file, and then scan for that?  Random examples that I just remembered:

'gcc/omp-offload.cc:execute_oacc_loop_designation' handling of
'OMP_CLAUSE_NOHOST', and how that's scanned (host-side) in test cases
such as 'libgomp/testsuite/libgomp.oacc-c-c++-common/routine-nohost-1.c',
'libgomp/testsuite/libgomp.oacc-fortran/routine-nohost-1.f90'.

'gcc/config/nvptx/nvptx.cc:nvptx_find_sese' doing 'fprintf (dump_file,
"SESE regions:"); [...]', and that's scanned in:

    libgomp/testsuite/libgomp.oacc-c-c++-common/nvptx-sese-1.c-/* Match {N->N(.N)+} */
    libgomp/testsuite/libgomp.oacc-c-c++-common/nvptx-sese-1.c:/* { dg-final { scan-offload-rtl-dump "SESE regions:.* \[0-9\]+{\[0-9\]+->\[0-9\]+(\\.\[0-9\]+)+}" "mach" } } */

(You'd be doing this at the 'scan-offload-tree-dump[...]' level, I
suppose.)


Grüße
 Thomas