This patch adds runtime thread count detection to auto-parallelization.
The -ftree-parallelize-loops=0 option generates parallelized loops without
specifying a fixed thread count, deferring this decision to program execution
time, where it is controlled by the OMP_NUM_THREADS environment variable.

The patch changes:

1. Flag semantics:
   - Default (-1): auto-parallelization disabled.
   - 0: runtime thread detection via OMP_NUM_THREADS.
   - N>1: fixed thread count (no change to previous behavior).

2. Gate condition: allow pass execution for flag == 0 || flag > 1.

3. OpenMP builtin enablement: enable for flag >= 0 instead of > 1.

4. Thread count handling: when flag == 0, set n_threads=0 and omit the
   num_threads clause, letting the OpenMP runtime determine the thread count.

5. Profitability checks: bypass thread-count-dependent checks when n_threads=0.

6. Driver integration: automatically link libgomp and enable pthread
   support when -ftree-parallelize-loops=0 is used.
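
The flag semantics, gate condition and thread count handling listed above can
be summarized in a small sketch.  The enum and function names below are
hypothetical and are not taken from the patch; only the mapping from flag
values to behavior follows items 1, 2 and 4.

```c
/* Illustrative only: these names do not appear in the patch.  */
enum parloops_mode
{
  PARLOOPS_DISABLED,  /* Pass does not run.  */
  PARLOOPS_RUNTIME,   /* flag == 0: no num_threads clause is emitted.  */
  PARLOOPS_FIXED      /* flag > 1: fixed thread count, as before.  */
};

/* Mirror the gate condition from item 2: flag == 0 || flag > 1.  */
static enum parloops_mode
parloops_mode_for_flag (int flag)
{
  if (flag == 0)
    return PARLOOPS_RUNTIME;
  if (flag > 1)
    return PARLOOPS_FIXED;
  /* The default of -1, and every other value that fails the gate.  */
  return PARLOOPS_DISABLED;
}

/* Thread count used for code generation.  Zero means that the OpenMP
   runtime picks the count, e.g. from OMP_NUM_THREADS (item 4).  */
static int
parloops_n_threads (int flag)
{
  return parloops_mode_for_flag (flag) == PARLOOPS_FIXED ? flag : 0;
}
```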

Bootstrapped and regression tested on aarch64-linux.  Compiled SPEC HPC pot3d
(https://www.spec.org/hpc2021/docs/benchmarks/628.pot3d_s.html) with
-ftree-parallelize-loops=0 and tested it both with OMP_NUM_THREADS unset and
with OMP_NUM_THREADS set to different values.
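
For a smaller experiment than pot3d, a loop of the following shape is the kind
of candidate the pass looks at.  This is a minimal sketch: the function name
and the file name in the comments are made up for illustration, it is not the
new test from the testsuite, and whether the loop is actually parallelized
depends on the cost model and the trip count.

```c
#include <stddef.h>

/* Hypothetical example.  Possible usage, assuming the file is saved as
   example.c together with a main function that calls axpy:

     gcc -O2 -ftree-parallelize-loops=0 example.c -o example
     ./example                      # thread count chosen by the runtime
     OMP_NUM_THREADS=4 ./example    # thread count taken from the environment

   No -fopenmp or -pthread is given, since the driver change in this patch
   is described as linking libgomp and enabling pthread support itself.  */

/* y[i] += a * x[i]: independent iterations with no loop-carried
   dependence, which is what auto-parallelization needs.  */
void
axpy (size_t n, double a, const double *restrict x, double *restrict y)
{
  for (size_t i = 0; i < n; i++)
    y[i] += a * x[i];
}
```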

gcc/ChangeLog:

        * builtins.def (DEF_GOMP_BUILTIN): Enable OpenMP builtins for
        flag_tree_parallelize_loops >= 0.
        * common.opt (ftree-parallelize-loops): Change initial value to -1.
        * doc/invoke.texi (ftree-parallelize-loops=n): Document possible
        values for variable n.
        * gcc.cc (LINK_SPEC): Add automatic libgomp linking for
        -ftree-parallelize-loops=0.
        (GOMP_SELF_SPECS): Add automatic pthread linking for
        -ftree-parallelize-loops=0.
        * tree-parloops.cc (create_parallel_loop): Generate a "#pragma omp
        parallel" without num_threads(x) clause when n_threads is zero.
        (gen_parallel_loop): Use a conservative value of 2 for the
        auto-parallelization cost model when the thread count is only
        known at run time.
        (parallelize_loops): Handle flag_tree_parallelize_loops == 0 as
        n_threads = 0.
        (gate): Execute the pass when flag_tree_parallelize_loops >= 0.

gcc/testsuite/ChangeLog:

        * gcc.dg/autopar/runtime-threads-1.c: New test.

Signed-off-by: Sebastian Pop <s...@nvidia.com>

Attachment: 0001-tree-parloops-Enable-runtime-thread-detection-with-f-2.patch