Hi Julian!

Three quick questions here:

On 2020-06-29T13:16:52-0700, Julian Brown <jul...@codesourcery.com> wrote:
> This patch implements an algorithm to lay out local data-share (LDS) space.  
> It currently works for AMD GCN.  [...]

Thanks!


> @@ -1449,8 +1634,120 @@ oacc_do_neutering (void)

> +      addr_range ar
> +     = first_fit_range (conflicts, size, align, &worker_shm_bounds);
> +
> +      splay_tree_delete (conflicts);
> +
> +      if (ar.invalid ())
> +     {
> +       unsigned HOST_WIDE_INT base;
> +       base = bounds_lo + random () % 512;
> +       base = (base + align - 1) & ~(align - 1);
> +       if (base + size > bounds_hi)
> +         error_at (UNKNOWN_LOCATION, "shared-memory region overflow");

My dice doesn't have 512 faces -- what am I to read into the expression
'random () % 512' here?  (Surely this must offend the folks of
<https://reproducible-builds.org/>.)  ;-)

> +       auto base_inrng = std::make_pair (base, false);
> +       blk_offset_map.put (BASIC_BLOCK_FOR_FN (cfun, blkno), base_inrng);
> +     }
> +      else
> +     {
> +       splay_tree_node old = splay_tree_lookup (used_ranges[blkno],
> +                                                (splay_tree_key) &ar);
> +       if (old)
> +         {
> +           fprintf (stderr, "trying to map [%d..%d] but [%d..%d] is "
> +                    "already mapped in block %d\n", (int) ar.lo,
> +                    (int) ar.hi, (int) ((addr_range *) old->key)->lo,
> +                    (int) ((addr_range *) old->key)->hi, blkno);
> +           abort ();
> +         }

Here, in a moment of inattention, I wondered if accidentally I had slid
into libgomp code; 'libgomp/target.c:gomp_map_vars_existing' has a very
similar "is already mapped" 'gomp_fatal'.  ;-) Should the whole
'if (old) { [...] }' just turn into 'gcc_asert (!old);' or should we
preserve the 'fprintf' in some way?


> --- /dev/null
> +++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/broadcast-many.c

> +  #pragma acc parallel num_gangs(1) num_workers(32) copyout(ret)
> +  {
> +    int w = 0;
> +    LOCALS2(h);
> +
> +    #pragma acc loop worker reduction(+:w)
> +    for (int i = 0; i < 32; i++)
> +      {
> +     int u = USES2(h,+);
> +     w += u;
> +      }
> +
> +    printf ("w=%d\n", w);
> +    /* { dg-output "w=2048(\n|\r\n|\r)" } */

Is there a reason for device-side 'printf' plus 'dg-output' matching
instead of 'assert (w == 2048);' or 'if (w != 2048) __builtin_abort ();'
(still device-side)?  (Same for the following instances.)


Grüße
 Thomas
-----------------
Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 
München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas 
Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht 
München, HRB 106955

Reply via email to