On 1/6/25 6:03 AM, Maciej W. Rozycki wrote:
For the BWX case we have a pessimisation in `alpha_expand_block_move'
for HImode loads where we place the data loaded into a HImode register
as well, therefore losing information that indeed the data loaded has
already been zero-extended to the full DImode width of the register.
Later on when we store this data in QImode quantities into an unaligned
destination, we zero-extend it again for the purpose of right-shifting,
such as with the test case included producing code at `-O2' as follows:

        ldah $2,unaligned_src_hi($29)           !gprelhigh
        lda $1,unaligned_src_hi($2)             !gprellow
        ldwu $6,unaligned_src_hi($2)            !gprellow
        ldwu $5,2($1)
        ldwu $4,4($1)
        bis $31,$31,$31
        zapnot $6,3,$3                          # Redundant!
        ldbu $7,6($1)
        zapnot $5,3,$2                          # Redundant!
        stb $6,0($16)
        zapnot $4,3,$1                          # Redundant!
        stb $5,2($16)
        srl $3,8,$3
        stb $4,4($16)
        srl $2,8,$2
        stb $3,1($16)
        srl $1,8,$1
        stb $2,3($16)
        stb $1,5($16)
        stb $7,6($16)

The non-BWX case is unaffected, because there we use byte insertion, so
we don't care that data is held in a HImode register.

Address this by making the holding RTX a HImode subreg of the original
DImode register, which the RTL passes can then see through and eliminate
the zero-extension where otherwise required, resulting in this shortened
code:

        ldah $2,unaligned_src_hi($29)           !gprelhigh
        lda $1,unaligned_src_hi($2)             !gprellow
        ldwu $4,unaligned_src_hi($2)            !gprellow
        ldwu $3,2($1)
        ldwu $2,4($1)
        bis $31,$31,$31
        srl $4,8,$6
        ldbu $1,6($1)
        srl $3,8,$5
        stb $4,0($16)
        stb $6,1($16)
        srl $2,8,$4
        stb $3,2($16)
        stb $5,3($16)
        stb $2,4($16)
        stb $4,5($16)
        stb $1,6($16)

While at it reformat the enclosing do-while statement according to the
GNU Coding Standards, observing that in this case it does not obfuscate
the change owing to the odd original indentation.

        gcc/
        * config/alpha/alpha.cc (alpha_expand_block_move): Use a HImode
        subreg of a DImode register to hold data from an aligned HImode
        load.
Generally OK.


+           data_regs[nregs++] = gen_rtx_SUBREG (HImode, tmp, 0);
Would this be better as a gen_lowpart or something like that rather than generating the SUBREG directly? Though I guess you don't have to worry about big endian at all, so i may not matter in practice. Your call.

jeff

Reply via email to