Here is a summary. It probably doesn't cover every single piece of the patch, but I tried to describe the major changes. I hope this helps a bit, and of course I'll answer any further questions you have.
The changes can be logically divided into two parts (though these parts have something in common). The first part is the changes in the target-independent code, in the functions move_by_pieces() and store_by_pieces(), mostly located in expr.c. The second part touches ix86_expand_movmem() and ix86_expand_setmem(), mostly located in config/i386/i386.c.

Changes in i386.c (target-dependent part):

1) Strategies for cases with known and unknown alignment are separated from each other. When the alignment is known at compile time, we can generate optimized code without libcalls. When it's unknown, we can sometimes emit runtime checks to reach the desired alignment, but not always. Strategies for atom, generic_32 and generic_64 were chosen according to a set of experiments; strategies in the other cost models are unchanged (strategies for unknown alignment are copied from the existing strategies).

2) The unrolled_loop algorithm was modified: it now uses SSE move modes if they're available.

3) Since the size of the data moved in one iteration has greatly increased, epilogues became bigger, so some changes were needed in epilogue generation. In some cases a special (not unrolled) loop is generated in the epilogue to avoid slow byte-by-byte copying (the changes in expand_set_or_movmem_via_loop() and the introduction of expand_set_or_movmem_via_loop_with_iter() were made for these cases).

4) Since a bigger alignment than before might be needed, prologue generation was also modified.

Changes in expr.c (target-independent part):

There are two possible strategies now: using aligned moves and using unaligned moves. A cost model was implemented for each of them, and the choice is made according to the cost of each option. The move mode is chosen by the functions widest_mode_for_unaligned_mov() and widest_mode_for_aligned_mov(). Cost estimation is implemented in the functions compute_aligned_cost() and compute_unaligned_cost(). The choice between these two strategies and the generation of the moves themselves are in the function move_by_pieces().
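To illustrate the idea of choosing between the two strategies by cost, here is a minimal standalone sketch in C. It is not the code from the patch: count_moves(), aligned_cost(), unaligned_cost() and choose_strategy() are hypothetical names, and the flat per-move penalty for unaligned accesses is an assumption made only for this example. The real cost functions in expr.c work on machine modes and target costs.

```c
#include <stddef.h>

/* Hypothetical strategy tags, used only in this sketch. */
enum piece_strategy { PIECES_ALIGNED, PIECES_UNALIGNED };

/* Number of moves needed to cover LEN bytes when the widest usable
   move is MAX_WIDTH bytes (a power of two); the tail is handled with
   successively narrower power-of-two moves.  */
static unsigned count_moves(size_t len, size_t max_width)
{
    unsigned n = 0;
    for (size_t w = max_width; w >= 1 && len > 0; w /= 2) {
        n += (unsigned)(len / w);
        len %= w;
    }
    return n;
}

/* Aligned strategy: the move width is limited by the known alignment
   ALIGN (in bytes) and by the widest mode WIDEST; each move costs 1.  */
static unsigned aligned_cost(size_t len, size_t align, size_t widest)
{
    size_t w = align < widest ? align : widest;
    return count_moves(len, w);
}

/* Unaligned strategy: the widest mode can always be used, but every
   move is assumed to cost PENALTY instead of 1.  */
static unsigned unaligned_cost(size_t len, size_t widest, unsigned penalty)
{
    return count_moves(len, widest) * penalty;
}

/* Pick the cheaper option; ties go to the aligned strategy.  */
static enum piece_strategy choose_strategy(size_t len, size_t align,
                                           size_t widest, unsigned penalty)
{
    return unaligned_cost(len, widest, penalty)
               < aligned_cost(len, align, widest)
           ? PIECES_UNALIGNED
           : PIECES_ALIGNED;
}
```

With these assumptions, a 64-byte copy with 4-byte known alignment, 16-byte widest moves and a penalty of 2 costs 16 in the aligned variant and 8 in the unaligned one, so the unaligned strategy is picked. With 16-byte alignment the aligned variant costs 4 and wins.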
The function store_by_pieces() calls set_by_pieces_1() instead of store_by_pieces_1() in the memset case (I needed to introduce set_by_pieces_1() to separate the memset case from the others: store_by_pieces_1() is sometimes called for strcpy and some other functions, not only for memset). set_by_pieces_1() estimates the costs of the aligned and unaligned strategies (as in move_by_pieces()) and generates the moves for memset. A single move is generated via generate_move_with_mode(). The first time it's called, a promoted value (a register filled with the one-byte value of the memset argument) is generated; later calls reuse this value.

Changes in MD-files:

For the generation of promoted values, I made some changes in promote_duplicated_reg() and promote_duplicated_reg_to_size(). Expands for vec_dup4si and vec_dupv2di were introduced for this too (these expands differ from the corresponding define_insns: the existing define_insns work only with registers, while the new expands can process a memory operand as well).

Some code was added to allow the generation of MOVQ (with SSE registers). Such moves aren't usual ones, because they use only half of an xmm register. There was a need to generate such moves explicitly, so I added a simple expand to sse.md.

On 16 July 2011 03:24, Jan Hubicka <hubi...@ucw.cz> wrote:
>> > New algorithm for move-mode selection is implemented for move_by_pieces,
>> > store_by_pieces.
>> > x86-specific ix86_expand_movmem and ix86_expand_setmem are also changed in
>> > similar way, x86 cost-models parameters are slightly changed to support
>> > this. This implementation checks if array's alignment is known at compile
>> > time and chooses expanding algorithm and move-mode according to it.
>
> Can you give some sumary of changes you made? It would make it a lot easier to
> review if it was broken up int the generic changes (with rationaly why they are
> needed) and i386 backend changes that I could review then.
>
> From first pass through the patch I don't quite see the need for i.e. adding
> new move patterns when we can output all kinds of SSE moves already. Will look
> more into the patch to see if I can come up with useful comments.
>
> Honza