Hi Richard:

> I do think that local variable layout probably doesn't belong in that IPA pass
> but elsewhere (way earlier).  But my main complaint was that the diff
> doesn't show changes you made to the pass because it first and foremost
> shows moving all the code.  That makes reviewing the change impossible.
>
> So at least split the patch into one moving the pass to a separate file and
> one doing the actual changes.

Oh, I got your point, I've added few code after moving, but the diff is
impossible to show which part is new added, I'll send v2 patch soon.

> static unsigned int
> align_local_variable (tree decl, bool really_expand)
> {
>   unsigned int align;
>
>   if (TREE_CODE (decl) == SSA_NAME)
>     align = TYPE_ALIGN (TREE_TYPE (decl));
>   else
>     {
>       align = LOCAL_DECL_ALIGNMENT (decl);
>       /* Don't change DECL_ALIGN when called from estimated_stack_frame_size.
>          That is done before IPA and could bump alignment based on host
>          backend even for offloaded code which wants different
>          LOCAL_DECL_ALIGNMENT.  */
>       if (really_expand)
>         SET_DECL_ALIGN (decl, align);
>
> here we should assert that we never lower a decls alignment since earlier
> optimization already made use of the alignment of a decl (which also means
> doing layout as far as alignment is concerned earlier will improve
> generated code).
>
> The documentation of LOCAL_DECL_ALIGNMENT says the purpose is
> to increase alignment.  Only x86 implements LOCAL_DECL_ALIGNMENT,
> others just LOCAL_ALIGNMENT (x86 also implements that).  The difference
> must be quite subtle ...  Oh, and LOCAL_ALIGNMENT seems completely
> unused (besides providing the default implementation for 
> LOCAL_DECL_ALIGNMENT).
> Huh.

BTW, here is the full story for ARM and RISC-V, why they need this patch:

- simplify_builtin_call call can_store_by_pieces check it's OK to
store by pieces?
- can_store_by_pieces call targetm.use_by_pieces_infrastructure_p to
ask back-end
- targetm.use_by_pieces_infrastructure_p call by_pieces_ninsns to
calculate how many instruction needed, it was 1 for RV64*, because
alignment is changed to 8 bits from 64 bits for char array due to this
patch g:26d7a5e690169ac04acde90070b0092c41b71c7e.
- by_pieces_ninsns got 4 instead of 1, and large than MOVE_RATIO,
failed to simplify memset to load/store.

Thanks.

Reply via email to