This patch adds runtime thread count detection to auto-parallelization. The -ftree-parallelize-loops=0 option generates parallelized loops without a fixed thread count, deferring that decision to program execution time, where it is controlled by the OMP_NUM_THREADS environment variable.
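
For illustration only, here is a minimal sketch of the kind of loop the auto-parallelizer may pick up. The function name scale_add and its parameters are hypothetical and not part of the patch; whether the loop is actually parallelized depends on the pass's dependence analysis and cost model.

```c
#include <stddef.h>

/* Hypothetical kernel, not taken from the patch or its testsuite.
   Each iteration writes a distinct element of OUT and reads only A and B,
   so there are no loop-carried dependences; the restrict qualifiers tell
   the compiler the arrays do not alias.  */
void
scale_add (double *restrict out, const double *restrict a,
	   const double *restrict b, double k, size_t n)
{
  for (size_t i = 0; i < n; i++)
    out[i] = a[i] + k * b[i];
}
```

Assuming a compiler built with this patch, a program containing such a loop could be built with something like "gcc -O2 -ftree-parallelize-loops=0 file.c" and run as "OMP_NUM_THREADS=4 ./a.out", or with OMP_NUM_THREADS unset to let the OpenMP runtime choose the thread count.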

The patch changes:

1. Flag semantics:
   - Default (-1): auto-parallelization disabled.
   - 0: runtime thread detection via OMP_NUM_THREADS.
   - N>1: fixed thread count (no change to previous behavior).
2. Gate condition: allow pass execution for flag == 0 || flag > 1.
3. OpenMP builtin enablement: enable for flag >= 0 instead of > 1.
4. Thread count handling: when flag == 0, set n_threads=0 and omit the
   num_threads clause, letting the OpenMP runtime determine the thread count.
5. Profitability checks: bypass thread-count-dependent checks when
   n_threads=0.
6. Driver integration: automatically link libgomp and enable pthread
   support when -ftree-parallelize-loops=0 is used.

Bootstrap and regression tested on aarch64-linux. Compiled SPEC HPC pot3d
https://www.spec.org/hpc2021/docs/benchmarks/628.pot3d_s.html
with -ftree-parallelize-loops=0 and tested both without OMP_NUM_THREADS set
in the environment and with OMP_NUM_THREADS set to different values.

gcc/ChangeLog:

	* builtins.def (DEF_GOMP_BUILTIN): Enable OpenMP builtins for
	flag_tree_parallelize_loops >= 0.
	* common.opt (ftree-parallelize-loops): Change initial value to -1.
	* doc/invoke.texi (ftree-parallelize-loops=n): Document possible
	values for variable n.
	* gcc.cc (LINK_SPEC): Add automatic libgomp linking for
	-ftree-parallelize-loops=0.
	(GOMP_SELF_SPECS): Add automatic pthread linking for
	-ftree-parallelize-loops=0.
	* tree-parloops.cc (create_parallel_loop): Generate a
	"#pragma omp parallel" without num_threads(x) clause when n_threads
	is zero.
	(gen_parallel_loop): Use a conservative value of 2 for the
	auto-parallelization cost model in case it is a runtime check.
	(parallelize_loops): Handle flag_tree_parallelize_loops == 0 as
	n_threads = 0.
	(gate): Execute the pass when flag_tree_parallelize_loops >= 0.

gcc/testsuite/ChangeLog:

	* gcc.dg/autopar/runtime-threads-1.c: New test.

Signed-off-by: Sebastian Pop <s...@nvidia.com>

0001-tree-parloops-Enable-runtime-thread-detection-with-f-2.patch