Re: Use of vector instructions in memmov/memset expanding

Michael Zolotukhin Mon, 22 Aug 2011 01:43:59 -0700

Ping.


On 18 July 2011 15:00, Michael Zolotukhin
<michael.v.zolotuk...@gmail.com> wrote:
> Here is a summary - probably, it doesn't cover every single piece in
> the patch, but I tried to describe the major changes. I hope this will
> help you a bit - and of course I'll answer your further questions if
> they appear.
>
> The changes can be logically divided into two parts (though the two
> parts have something in common).
> The first part is the changes in the target-independent code, in the
> functions move_by_pieces() and store_by_pieces(), mostly located in expr.c.
> The second part touches ix86_expand_movmem() and ix86_expand_setmem(),
> mostly located in config/i386/i386.c.
>
> Changes in i386.c (target-dependent part):
> 1) Strategies for cases with known and unknown alignment are separated
> from each other.
> When the alignment is known at compile time, we can generate optimized
> code without libcalls.
> When it's unknown, we can sometimes (but not always) create runtime
> checks to reach the desired alignment.
> Strategies for atom, generic_32 and generic_64 were chosen according
> to a set of experiments; strategies in the other cost models are
> unchanged (their strategies for unknown alignment are copied
> from the existing strategies).
> 2) The unrolled_loop algorithm was modified: it now uses SSE move-modes
> if they're available.
> 3) The size of the data moved in one iteration has greatly increased, so
> epilogues became bigger and some changes were needed in epilogue
> generation. In some cases a special (not unrolled) loop is generated
> in the epilogue to avoid slow byte-by-byte copying (the changes in
> expand_set_or_movmem_via_loop() and the introduction of
> expand_set_or_movmem_via_loop_with_iter() are made for these cases).
> 4) Since a bigger alignment than before might be needed, prologue
> generation was also modified.
>
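To illustrate item 3, here is a rough C-level sketch of the loop structure described there. It is not code from the patch: sketch_copy() is a made-up name, the 16-byte chunk and 4x unroll factor are assumptions, and memcpy() of a fixed 16 bytes merely stands in for one SSE move.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical illustration only: copy n bytes in 16-byte chunks. */
static void sketch_copy(unsigned char *dst, const unsigned char *src, size_t n)
{
    size_t i = 0;

    /* Main loop, unrolled 4x: 64 bytes per iteration. */
    for (; i + 64 <= n; i += 64) {
        memcpy(dst + i,      src + i,      16);
        memcpy(dst + i + 16, src + i + 16, 16);
        memcpy(dst + i + 32, src + i + 32, 16);
        memcpy(dst + i + 48, src + i + 48, 16);
    }

    /* Epilogue loop, not unrolled: at most three 16-byte chunks remain. */
    for (; i + 16 <= n; i += 16)
        memcpy(dst + i, src + i, 16);

    /* Fewer than 16 bytes remain; finish byte by byte. */
    for (; i < n; i++)
        dst[i] = src[i];
}
```

Without the middle loop, up to 63 bytes would be left for the byte-by-byte tail; with it, at most 15 are.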
> Changes in expr.c (target-independent part):
> There are now two possible strategies: using aligned moves and using
> unaligned moves. A cost model was implemented for each of them, and the
> choice is made according to the cost of each option. The move-mode is
> chosen by the functions widest_mode_for_unaligned_mov() and
> widest_mode_for_aligned_mov().
> Cost estimation is implemented in the functions compute_aligned_cost() and
> compute_unaligned_cost().
> The choice between the two strategies and the generation of the moves
> themselves are in the function move_by_pieces().
>
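As a toy model of the aligned-versus-unaligned choice (an assumption-laden sketch, not the cost model from the patch): count the moves each strategy would need and weight them by a per-move cost. All names below are hypothetical, and the sketch assumes that widest is a power of two, misalign < widest, and len >= widest.

```c
#include <stddef.h>

/* Moves needed to cover len bytes with power-of-two pieces,
   starting from pieces of size widest and halving down to 1. */
static unsigned count_pieces(size_t len, size_t widest)
{
    unsigned n = 0;
    for (size_t w = widest; w >= 1; w /= 2) {
        n += (unsigned)(len / w);
        len %= w;
    }
    return n;
}

/* Aligned strategy: narrow prologue moves up to the next widest-byte
   boundary, then aligned pieces for the rest. */
static unsigned sketch_aligned_moves(size_t len, size_t widest, size_t misalign)
{
    size_t prologue = (widest - misalign) % widest;
    return count_pieces(prologue, widest / 2)
         + count_pieces(len - prologue, widest);
}

/* Unaligned strategy: widest pieces from the very first byte. */
static unsigned sketch_unaligned_moves(size_t len, size_t widest)
{
    return count_pieces(len, widest);
}

/* Nonzero if the unaligned strategy is strictly cheaper, given
   assumed per-move costs for aligned and unaligned moves. */
static int sketch_prefer_unaligned(size_t len, size_t widest, size_t misalign,
                                   unsigned aligned_cost, unsigned unaligned_cost)
{
    return sketch_unaligned_moves(len, widest) * unaligned_cost
         < sketch_aligned_moves(len, widest, misalign) * aligned_cost;
}
```

For 64 bytes with 16-byte moves and a misalignment of 1, the aligned strategy needs 8 moves in this model (4 prologue, 3 wide, 1 trailing byte) and the unaligned one needs 4, so the outcome depends on how much more an unaligned move is assumed to cost.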
> The function store_by_pieces() calls set_by_pieces_1() instead of
> store_by_pieces_1() in the memset case (I needed to introduce
> set_by_pieces_1() to separate the memset case from the others:
> store_by_pieces_1() is sometimes called for strcpy and some other
> functions, not only for memset).
>
> set_by_pieces_1() estimates the costs of the aligned and unaligned
> strategies (as in move_by_pieces()) and generates moves for memset. A
> single move is generated via generate_move_with_mode(). The first time
> it is called, a promoted value (a register filled with copies of the
> one-byte memset argument) is generated; later calls reuse this value.
>
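The "promoted value" idea can be shown in plain C. This is a minimal sketch with hypothetical names, not the patch's generate_move_with_mode(): the byte is replicated into a 64-bit word once, and every wide store then reuses that word.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Replicate one byte into all eight bytes of a 64-bit word. */
static uint64_t sketch_promote_byte(unsigned char c)
{
    return (uint64_t)c * UINT64_C(0x0101010101010101);
}

/* Hypothetical memset-like fill built on the promoted value. */
static void sketch_memset(unsigned char *dst, unsigned char c, size_t n)
{
    /* Computed once, reused by every wide store below. */
    const uint64_t promoted = sketch_promote_byte(c);
    size_t i = 0;

    /* Wide stores: eight bytes at a time. */
    for (; i + sizeof promoted <= n; i += sizeof promoted)
        memcpy(dst + i, &promoted, sizeof promoted);

    /* Remaining bytes. */
    for (; i < n; i++)
        dst[i] = c;
}
```

The multiplication by 0x0101010101010101 is a common way to broadcast a byte across a scalar register; a vector register would need a broadcast or shuffle instead.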
> Changes in MD-files:
> To generate promoted values, I made some changes in
> promote_duplicated_reg() and promote_duplicated_reg_to_size(). Expands
> for vec_dup4si and vec_dupv2di were introduced for this too (these
> expands differ from the corresponding define_insns: the existing
> define_insns work only with registers, while the new expands can process
> a memory operand as well).
>
> Some code was added to allow generation of MOVQ (with SSE registers).
> Such moves aren't the usual ones, because they use only half of an
> xmm register.
> These moves had to be generated explicitly, so I added a
> simple expand to sse.md.
>
>
> On 16 July 2011 03:24, Jan Hubicka <hubi...@ucw.cz> wrote:
>>> > A new algorithm for move-mode selection is implemented for move_by_pieces and
>>> > store_by_pieces.
>>> > The x86-specific ix86_expand_movmem and ix86_expand_setmem are also changed in
>>> > a similar way, and the x86 cost-model parameters are slightly changed to support
>>> > this. This implementation checks whether the array's alignment is known at compile
>>> > time and chooses the expanding algorithm and move-mode accordingly.
>>
>> Can you give some summary of the changes you made?  It would make it a lot
>> easier to review if it was broken up into the generic changes (with the
>> rationale for why they are needed) and the i386 backend changes that I
>> could review then.
>>
>> From a first pass through the patch I don't quite see the need for, e.g.,
>> adding new move patterns when we can output all kinds of SSE moves already.
>> I will look more into the patch to see if I can come up with useful comments.
>>
>> Honza
>>
>
