Re: [PATCH v2 9/9] aarch64: Handle alignment when it is bigger than BIGGEST_ALIGNMENT

Richard Sandiford Wed, 30 Oct 2024 08:02:00 -0700

Evgeny Karpov <evgeny.kar...@microsoft.com> writes:
> Tuesday, October 29, 2024
> Richard Sandiford <richard.sandif...@arm.com> wrote:
>
>> Hmm, I see.  I think this is surprising enough that it would be worth
>> a comment.  How about:
>>
>>  /* Since the assembly directive only specifies a size, and not an
>>     alignment, we need to follow the default ASM_OUTPUT_LOCAL behavior
>>     and round the size up to at least a multiple of BIGGEST_ALIGNMENT bits,
>>     so that each uninitialized object starts on such a boundary.
>>     However, we also want to allow the alignment (and thus minimum size)
>>     to exceed BIGGEST_ALIGNMENT.  */
>
> Thanks for the suggestion. It will be included in the next version of the 
> patch.
>
>> But how does using a larger size force the linker to assign a larger
>> alignment than BIGGEST_ALIGNMENT?  Is there a second limit in play?
>> 
>> Or does this patch not guarantee that the ffmpeg variable gets the
>> alignment it wants?  Is it just about suppresing the error?
>> 
>> If it's just about suppressing the error without guaranteeing the
>> requested alignment, then, yeah, I think patching ffmpeg would
>> be better.  If the patch does guarantee the alignment, then the
>> patch seems ok, but I think the comment should explain how, and
>> explain why BIGGEST_ALIGNMENT isn't larger.
>
> It looks like it generates the expected assembly code for the alignments
> and the correct object file, and it should be the expected code for FFmpeg.
>
> The alignment cannot be larger than 8192, otherwise, it will generate an 
> error.
>
> error: requested alignment ‘16384’ exceeds object file maximum 8192
>    16 | float __attribute__((aligned (1 << 14))) large_aligned_array10[3];


OK, thanks.  But...

> Here an example:
>
> float large_aligned_array[3];
> float __attribute__((aligned (8))) large_aligned_array2[3];
> float __attribute__((aligned (16))) large_aligned_array3[3];
> float __attribute__((aligned (32))) large_aligned_array4[3];
> float __attribute__((aligned (64))) large_aligned_array5[3];
> float __attribute__((aligned (128))) large_aligned_array6[3];
> float __attribute__((aligned (256))) large_aligned_array7[3];
> float __attribute__((aligned (512))) large_aligned_array8[3];
> float __attribute__((aligned (1024))) large_aligned_array9[3];
>
>
>       .align  3
>       .def    large_aligned_array;    .scl    3;      .type   0;      .endef
> large_aligned_array:
>       .space  12  // skip

Is this the construct used by the patch?  The original patch was:

+#define ASM_OUTPUT_ALIGNED_LOCAL(FILE, NAME, SIZE, ALIGNMENT)  \
+  { \
+    unsigned HOST_WIDE_INT rounded = MAX ((SIZE), 1); \
+    unsigned HOST_WIDE_INT alignment = MAX ((ALIGNMENT), BIGGEST_ALIGNMENT); \
+    rounded += (alignment / BITS_PER_UNIT) - 1; \
+    rounded = (rounded / (alignment / BITS_PER_UNIT) \
+      * (alignment / BITS_PER_UNIT)); \
+    ASM_OUTPUT_LOCAL (FILE, NAME, SIZE, rounded); \
+  }

with patch 5 defining ASM_OUTPUT_LOCAL as:

+#define ASM_OUTPUT_LOCAL(FILE, NAME, SIZE, ROUNDED)  \
+( fputs (".lcomm ", (FILE)),                   \
+  assemble_name ((FILE), (NAME)),              \
+  fprintf ((FILE), ",%u\n", (int)(ROUNDED)))

So for the change in question, I was expecting, say, a 1024-byte-aligned
float[3] to be defined using:

        .lcomm  array, 1024

If we have access to .align, couldn't we define ASM_OUTPUT_ALIGNED_LOCAL
to use that, using the style you quoted above?  Or is the .lcomm approach
needed to work with -fcommon?

Thanks,
Richard

>       .global large_aligned_array2
>       .align  3
>       .def    large_aligned_array2;   .scl    3;      .type   0;      .endef
> large_aligned_array2:
>       .space  12  // skip
>
>       .global large_aligned_array3
>       .align  4
>       .def    large_aligned_array3;   .scl    3;      .type   0;      .endef
> large_aligned_array3:
>       .space  12  // skip

>
>       .global large_aligned_array4
>       .align  5
>       .def    large_aligned_array4;   .scl    3;      .type   0;      .endef
> large_aligned_array4:
>       .space  12  // skip
>
>       .global large_aligned_array5
>       .align  6
>       .def    large_aligned_array5;   .scl    3;      .type   0;      .endef
> large_aligned_array5:
>       .space  12  // skip
>
>       .global large_aligned_array6
>       .align  7
>       .def    large_aligned_array6;   .scl    3;      .type   0;      .endef
> large_aligned_array6:
>       .space  12  // skip
>
>       .global large_aligned_array7
>       .align  8
>       .def    large_aligned_array7;   .scl    3;      .type   0;      .endef
> large_aligned_array7:
>       .space  12  // skip
>
>       .global large_aligned_array8
>       .align  9
>       .def    large_aligned_array8;   .scl    3;      .type   0;      .endef
> large_aligned_array8:
>       .space  12  // skip
>
>       .global large_aligned_array9
>       .align  10
>     .def      large_aligned_array9;   .scl    3;      .type   0;      .endef
> large_aligned_array9:
>       .space  12  // skip
>
>
> Symbols in the object file also look good.
>
> 015 00000000 SECT2  notype       External     | large_aligned_array
> 016 00000010 SECT2  notype       External     | large_aligned_array2
> 017 00000020 SECT2  notype       External     | large_aligned_array3
> 018 00000040 SECT2  notype       External     | large_aligned_array4
> 019 00000080 SECT2  notype       External     | large_aligned_array5
> 01A 00000100 SECT2  notype       External     | large_aligned_array6
> 01B 00000200 SECT2  notype       External     | large_aligned_array7
> 01C 00000400 SECT2  notype       External     | large_aligned_array8
> 01D 00000800 SECT2  notype       External     | large_aligned_array9

Re: [PATCH v2 9/9] aarch64: Handle alignment when it is bigger than BIGGEST_ALIGNMENT

Reply via email to