Ping.
On 18 July 2011 15:00, Michael Zolotukhin <michael.v.zolotuk...@gmail.com> wrote:
> Here is a summary - probably, it doesn't cover every single piece in
> the patch, but I tried to describe the major changes. I hope this will
> help you a bit - and of course I'll answer your further questions if
> they appear.
>
> The changes could be logically divided into two parts (though these
> parts have something in common).
> The first part is changes in the target-independent part, in functions
> move_by_pieces() and store_by_pieces() - mostly located in expr.c.
> The second part touches ix86_expand_movmem() and ix86_expand_setmem()
> - mostly located in config/i386/i386.c.
>
> Changes in i386.c (target-dependent part):
> 1) Strategies for cases with known and unknown alignment are separated
> from each other.
> When alignment is known at compile time, we could generate optimized
> code without libcalls.
> When it's unknown, we sometimes could create runtime checks to reach
> the desired alignment, but not always.
> Strategies for atom and generic_32, generic_64 were chosen according
> to a set of experiments; strategies in other
> cost models are unchanged (strategies for unknown alignment are copied
> from existing strategies).
> 2) The unrolled_loop algorithm was modified - now it uses SSE move-modes,
> if they're available.
> 3) As the size of data moved in one iteration greatly increased,
> epilogues became bigger, so some changes were needed in epilogue
> generation. In some cases a special loop (not unrolled) is generated
> in the epilogue to avoid slow copying by bytes (changes in
> expand_set_or_movmem_via_loop() and the introduction of
> expand_set_or_movmem_via_loop_with_iter() were made for these cases).
> 4) As bigger alignment might be needed than previously, prologue
> generation was also modified.
>
> Changes in expr.c (target-independent part):
> There are two possible strategies now: use of aligned and unaligned
> moves.
> For each of them a cost model was implemented and the choice is
> made according to the cost of each option. Move-mode choice is made by
> functions widest_mode_for_unaligned_mov() and
> widest_mode_for_aligned_mov().
> Cost estimation is implemented in functions compute_aligned_cost() and
> compute_unaligned_cost().
> Choice between these two strategies and the generation of moves
> themselves are in function move_by_pieces().
>
> Function store_by_pieces() calls set_by_pieces_1() instead of
> store_by_pieces_1(), if this is the memset case (I needed to introduce
> set_by_pieces_1 to separate the memset case from others -
> store_by_pieces_1 is sometimes called for strcpy and some other
> functions, not only for memset).
>
> set_by_pieces_1() estimates costs of aligned and unaligned strategies
> (as in move_by_pieces()) and generates moves for memset. A single move
> is generated via
> generate_move_with_mode(). If it's called for the first time, a promoted value
> (a register filled with the one-byte value of the memset argument) is generated
> - later calls reuse this value.
>
> Changes in MD-files:
> For generation of promoted values, I made some changes in
> promote_duplicated_reg() and promote_duplicated_reg_to_size(). Expands
> for vec_dup4si and vec_dupv2di were introduced for this too (these
> expands differ from the corresponding define_insns - the existing define_insns
> work only with registers, while the new expands could process a memory
> operand as well).
>
> Some code was added to allow generation of MOVQ (with SSE registers)
> - such moves aren't usual ones, because they use only half of
> an xmm register.
> There was a need to generate such moves explicitly, so I added a
> simple expand to sse.md.
>
>
> On 16 July 2011 03:24, Jan Hubicka <hubi...@ucw.cz> wrote:
>>> > New algorithm for move-mode selection is implemented for move_by_pieces,
>>> > store_by_pieces.
>>> > x86-specific ix86_expand_movmem and ix86_expand_setmem are also changed in
>>> > similar way, x86 cost-models parameters are slightly changed to support
>>> > this. This implementation checks if array's alignment is known at compile
>>> > time and chooses expanding algorithm and move-mode according to it.
>>
>> Can you give some summary of changes you made? It would make it a lot easier
>> to review if it was broken up into the generic changes (with rationale why
>> they are needed) and i386 backend changes that I could review then.
>>
>> From first pass through the patch I don't quite see the need for e.g. adding
>> new move patterns when we can output all kinds of SSE moves already. Will
>> look more into the patch to see if I can come up with useful comments.
>>
>> Honza
>>
>
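
For anyone following the thread, here is a minimal sketch of the aligned-versus-unaligned choice described in the expr.c part of the summary. It is not code from the patch: every name prefixed with toy_ is hypothetical, the per-move costs are made-up constants, and the assumption that an unaligned move costs twice an aligned one is only for illustration. Widths are in bytes and are rounded down to a power of two.

```c
#include <stddef.h>

/* Hypothetical per-move costs; a real cost model would be target-specific. */
enum { TOY_ALIGNED_MOVE_COST = 1, TOY_UNALIGNED_MOVE_COST = 2 };

enum toy_strategy { TOY_ALIGNED, TOY_UNALIGNED };

/* Largest power of two that is <= limit (limit must be >= 1). */
static size_t toy_floor_pow2(size_t limit)
{
    size_t w = 1;
    while (w * 2 <= limit)
        w *= 2;
    return w;
}

/* Number of moves needed to cover len bytes, using the widest piece
   first and halving the width for the remainder. */
static size_t toy_count_moves(size_t len, size_t widest)
{
    size_t moves = 0;
    for (size_t w = widest; w >= 1 && len > 0; w /= 2) {
        moves += len / w;
        len %= w;
    }
    return moves;
}

/* Aligned strategy: the move width is capped by the known alignment. */
static size_t toy_aligned_cost(size_t len, size_t align, size_t max_width)
{
    size_t widest = toy_floor_pow2(align < max_width ? align : max_width);
    return toy_count_moves(len, widest) * TOY_ALIGNED_MOVE_COST;
}

/* Unaligned strategy: always the widest mode, but each move costs more. */
static size_t toy_unaligned_cost(size_t len, size_t max_width)
{
    return toy_count_moves(len, toy_floor_pow2(max_width))
           * TOY_UNALIGNED_MOVE_COST;
}

/* Pick the cheaper strategy; ties go to aligned moves. */
static enum toy_strategy toy_choose(size_t len, size_t align, size_t max_width)
{
    return toy_unaligned_cost(len, max_width)
               < toy_aligned_cost(len, align, max_width)
           ? TOY_UNALIGNED
           : TOY_ALIGNED;
}
```

With these toy numbers, copying 64 bytes with 16-byte moves available picks unaligned moves when only 1-byte alignment is known, and aligned moves when 16-byte alignment is known.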