------- Comment #1 from falk at debian dot org 2006-09-24 19:52 -------

For this test case:

void f(double *pds, double *pdd, unsigned long len) {
    while (len >= 8*sizeof(double)) {
        register double r1,r2,r3,r4;
        r1 = *pds++;
        r2 = *pds++;
        r3 = *pds++;
        r4 = *pds++;
        *pdd++ = r1;
        *pdd++ = r2;
        *pdd++ = r3;
        *pdd++ = r4;
    }
}

gcc starting from 4.0 produces this:

.L3:
        fldds -16(%r26),%fr22
        fldds -8(%r26),%fr23
        fldds 0(%r26),%fr24
        fldds 8(%r26),%fr25
        ldo 32(%r26),%r26
        fstds %fr22,-16(%r25)
        fstds %fr23,-8(%r25)
        fstds %fr24,0(%r25)
        fstds %fr25,8(%r25)
        b .L3

which I suspect is actually better, since it avoids dependencies between
the loads. But I'm not familiar with hppa, can anybody comment?

-- 

falk at debian dot org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
      Known to fail|                            |3.4.2 4.1.2

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=17264