I found and fixed another problem in the latest memcpy/memset changes. With this fix, all the failing tests mentioned in #51134 now pass, and bootstraps are also OK. However, I still see failures in 32-bit make check, so it would probably be better to revert the changes until those failures are fixed.
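
For anyone not familiar with the atom_cost tables touched by the quoted patch below, here is a minimal sketch in C of how such a size-threshold table can be consulted, including the fall-back that the patch's comment describes ("fall back into non-SSE unrolled loop variant if that fails"). The names alg_table, size_entry, pick_alg and sse_usable are hypothetical simplifications, not the real declarations in i386.c, and the real selection logic in GCC is more involved than this.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the strategy tables in i386.c.
   These are NOT the real GCC declarations.  */
enum alg { ALG_LIBCALL, ALG_UNROLLED_LOOP, ALG_SSE_LOOP };

struct size_entry {
  long max;      /* Largest block size covered; -1 means "any size".  */
  enum alg alg;  /* Strategy to use for blocks up to MAX bytes.  */
};

#define MAX_ENTRIES 4

struct alg_table {
  enum alg unknown_size;               /* Used when the size is unknown.  */
  struct size_entry size[MAX_ENTRIES]; /* Checked in order.  */
};

/* Return the first strategy whose entry covers SIZE.  An SSE entry is
   skipped when SSE_USABLE is zero, so the next entry with the same
   bound acts as the fall-back.  A negative SIZE means "unknown".  */
enum alg
pick_alg (const struct alg_table *t, long size, int sse_usable)
{
  assert (t != NULL);
  if (size < 0)
    return t->unknown_size;
  for (size_t i = 0; i < MAX_ENTRIES; i++)
    {
      const struct size_entry *e = &t->size[i];
      int covers = (e->max == -1 || size <= e->max);
      if (!covers)
        continue;
      if (e->alg == ALG_SSE_LOOP && !sse_usable)
        continue;  /* Try the next entry instead.  */
      return e->alg;
    }
  return ALG_LIBCALL;  /* Nothing matched: call the library routine.  */
}

/* Shape of one table row before and after the quoted patch.  */
const struct alg_table before_patch = {
  ALG_LIBCALL,
  {{4096, ALG_SSE_LOOP}, {4096, ALG_UNROLLED_LOOP}, {-1, ALG_LIBCALL}}
};

const struct alg_table after_patch = {
  ALG_LIBCALL,
  {{4096, ALG_UNROLLED_LOOP}, {-1, ALG_LIBCALL}}
};
```

In this sketch, a 1024-byte block selects the SSE loop from the pre-patch row when SSE is usable and the unrolled loop otherwise. With the post-patch row it always selects the unrolled loop.
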

On 21 November 2011 20:36, Michael Zolotukhin <michael.v.zolotuk...@gmail.com> wrote:
> Hi,
>
> Continuing the investigation of the bootstrap failures, I found the next
> problem (besides the problem with unknown alignment described above):
> there is a mess with size_needed and epilogue_size_needed when we
> generate an epilogue loop which also uses SSE moves but is not unrolled -
> that's probably the reason for the failures we saw.
>
> Please check the attached patch - though the full testing isn't over
> yet. Bootstraps seem to be OK, as does the arrayarg.f90 test (with
> sse_loop enabled).
>
> On 19 November 2011 05:38, Jan Hubicka <hubi...@ucw.cz> wrote:
>>> Given that x86 memset/memcpy is still broken, I think we should revert
>>> it for now.
>>
>> Well, looking into the code, the SSE alignment issues need work - the
>> alignment test merely tests whether some alignment is known, not whether
>> 16-byte alignment is known; that is the cause of the failures in the
>> 32-bit bootstrap. I originally convinced myself that this is safe since
>> we shoot for unaligned loads/stores anyway.
>>
>> I've committed the following patch that disables SSE codegen and
>> unbreaks the Atom bootstrap. This seems more sensible to me given that
>> the patch accumulated some good improvements on the non-SSE path as well,
>> and we can return to the SSE alignment issues incrementally. There is
>> still a failure in the Fortran testcase that I am convinced is a
>> previously latent issue.
>>
>> I will be offline tomorrow. If there are further serious problems, just
>> feel free to revert the changes and we can look into them for the next
>> stage1.
>>
>> Honza
>>
>>         * i386.c (atom_cost): Disable SSE loop until alignment issues are
>>         fixed.
>> Index: i386.c
>> ===================================================================
>> --- i386.c      (revision 181479)
>> +++ i386.c      (working copy)
>> @@ -1783,18 +1783,18 @@ struct processor_costs atom_cost = {
>>    /* stringop_algs for memcpy.
>>       SSE loops works best on Atom, but fall back into non-SSE unrolled loop variant
>>       if that fails.  */
>> -  {{{libcall, {{4096, sse_loop}, {4096, unrolled_loop}, {-1, libcall}}}, /* Known alignment.  */
>> -    {libcall, {{4096, sse_loop}, {4096, unrolled_loop}, {-1, libcall}}}},
>> -   {{libcall, {{2048, sse_loop}, {2048, unrolled_loop}, {-1, libcall}}}, /* Unknown alignment.  */
>> -    {libcall, {{2048, sse_loop}, {2048, unrolled_loop},
>> +  {{{libcall, {{4096, unrolled_loop}, {-1, libcall}}}, /* Known alignment.  */
>> +    {libcall, {{4096, unrolled_loop}, {-1, libcall}}}},
>> +   {{libcall, {{2048, unrolled_loop}, {-1, libcall}}}, /* Unknown alignment.  */
>> +    {libcall, {{2048, unrolled_loop},
>>                {-1, libcall}}}}},
>>
>>    /* stringop_algs for memset.  */
>> -  {{{libcall, {{4096, sse_loop}, {4096, unrolled_loop}, {-1, libcall}}}, /* Known alignment.  */
>> -    {libcall, {{4096, sse_loop}, {4096, unrolled_loop}, {-1, libcall}}}},
>> -   {{libcall, {{1024, sse_loop}, {1024, unrolled_loop}, /* Unknown alignment.  */
>> +  {{{libcall, {{4096, unrolled_loop}, {-1, libcall}}}, /* Known alignment.  */
>> +    {libcall, {{4096, unrolled_loop}, {-1, libcall}}}},
>> +   {{libcall, {{1024, unrolled_loop}, /* Unknown alignment.  */
>>                {-1, libcall}}},
>> -    {libcall, {{2048, sse_loop}, {2048, unrolled_loop},
>> +    {libcall, {{2048, unrolled_loop},
>>                {-1, libcall}}}}},
>>    1, /* scalar_stmt_cost.  */
>>    1, /* scalar load_cost.  */
>
>
>
> --
> ---
> Best regards,
> Michael V. Zolotukhin,
> Software Engineer
> Intel Corporation.

--
---
Best regards,
Michael V. Zolotukhin,
Software Engineer
Intel Corporation.