On 2018-07-19 17:26, Rostislav Pehlivanov wrote:
> On 19 July 2018 at 15:52, James Darnley <jdarn...@obe.tv> wrote:
>
>> int32_t *b1, int32_t *b2, int
>>          b1[i] = COMPOSE_DIRAC53iH0(b0[i], b1[i], b2[i]);
>>  }
>>
>> +static void dd97_vertical_hi_sse2(int32_t *b0, int32_t *b1, int32_t *b2,
>> +                                  int32_t *b3, int32_t *b4, int width)
>> +{
>> +    int i = width & ~3;
>> +    ff_dd97_vertical_hi_sse2(b0, b1, b2, b3, b4, i);
>> +    for(; i<width; i++)
>> +        b2[i] = COMPOSE_DD97iH0(b0[i], b1[i], b2[i], b3[i], b4[i]);
>> +
>> +}
>>
>
>
> This, along with the rest of the patchset: what's up with the hybrid
> implementations? Couldn't you put the second part in the asm code as well?
> Now there are 2 function calls instead of 1.

The 8-bit code does this and I just followed its lead. I believe this is
done because we cannot write junk data beyond what we think is the end of
the line: this might be one of the higher depths, and the coeffs for the
next level sit beyond the end of the line.

But now it has just occurred to me that maybe you meant "why didn't you do
the scalar operations in SIMD?" Is that what you meant? The answer is that
it didn't occur to me at the time. Aside from that, I always write do-while
loops in assembly because I can usually guarantee 1 run of the block.

I can certainly look at making that change.
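
For reference, the hybrid pattern under discussion can be sketched in plain C.
This is only an illustration, not the code from the patch: lift_stub,
kernel_blocks_of_4 and vertical_hi_hybrid are hypothetical names, lift_stub is
a placeholder and not the real COMPOSE_DD97iH0 macro, and the "kernel" is
ordinary C standing in for the SSE2 routine. It shows the two points above:
the tail loop keeps writes inside the line, and a do-while kernel assumes at
least one pass.

```c
#include <assert.h>
#include <stdint.h>

/* Placeholder lifting step (assumption): NOT the real COMPOSE_DD97iH0. */
static int32_t lift_stub(int32_t b0, int32_t b1, int32_t b2,
                         int32_t b3, int32_t b4)
{
    return b2 + b1 + b3 - b0 - b4;
}

/* Stand-in for the SIMD routine: 4 coefficients per iteration.
 * The do-while shape mirrors an asm loop that always runs once,
 * so n must be a non-zero multiple of 4. */
static void kernel_blocks_of_4(const int32_t *b0, const int32_t *b1,
                               int32_t *b2, const int32_t *b3,
                               const int32_t *b4, int n)
{
    int i = 0;
    assert(n > 0 && n % 4 == 0);
    do {
        for (int k = 0; k < 4; k++)
            b2[i + k] = lift_stub(b0[i + k], b1[i + k], b2[i + k],
                                  b3[i + k], b4[i + k]);
        i += 4;
    } while (i < n);
}

/* Hybrid wrapper: block kernel for the bulk, scalar loop for the tail. */
static void vertical_hi_hybrid(const int32_t *b0, const int32_t *b1,
                               int32_t *b2, const int32_t *b3,
                               const int32_t *b4, int width)
{
    int i = width & ~3;        /* largest multiple of 4 <= width */
    if (i > 0)                 /* guard added for this sketch only */
        kernel_blocks_of_4(b0, b1, b2, b3, b4, i);
    for (; i < width; i++)     /* scalar tail: at most 3 coefficients */
        b2[i] = lift_stub(b0[i], b1[i], b2[i], b3[i], b4[i]);
}
```

With width = 6, the kernel handles indices 0 to 3 and the scalar loop handles
4 and 5; nothing at index 6 or later is touched, which is what protects
whatever is stored after the end of the line.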