On 6/27/2016 8:53 AM, Rostislav Pehlivanov wrote:
> I've attached another patch which should work fine now.
> I did this after the put_signed_rect so it does require the first patch,
> but if this patch is okay I'll amend and tidy things before I push.
> For some reason changing dstq to be stored at r4 or r3 broke it and I've no
> idea why. Neither is used after loading m2 and m3. Should work on x86_32
> now, but I'm wondering why I can't save that register.

[...]

> diff --git a/libavcodec/x86/diracdsp.asm b/libavcodec/x86/diracdsp.asm
> index c5cc530..4bc8b2d 100644
> --- a/libavcodec/x86/diracdsp.asm
> +++ b/libavcodec/x86/diracdsp.asm
> @@ -266,9 +266,45 @@ HPEL_FILTER sse2
>  ADD_OBMC 32, sse2
>  ADD_OBMC 16, sse2
>
> -%if ARCH_X86_64 == 1
>  INIT_XMM sse4
>
> +; void dequant_subband_32(uint8_t *src, uint8_t *dst, ptrdiff_t stride,
> const int qf, const int qs, int tot_v, int tot_h)
> +cglobal dequant_subband_32, 7, 8, 4, src, dst, stride, qf, qs, tot_v, tot_h

x86_32 has 8 gprs but you can only use 7 as the last one is reserved to keep
the stack pointer.

> +
> +    movd         m2, qfd
> +    movd         m3, qsd
> +    SPLATD       m2
> +    SPLATD       m3
> +    mov          r4, tot_hq
> +    mov          r7, dstq
> +
> +    .loop_v:
> +    mov          tot_hq, r4
> +    mov          dstq, r7
> +
> +    .loop_h:
> +    movu         m0, [srcq]
> +
> +    pabsd        m1, m0
> +    pmulld       m1, m2
> +    paddd        m1, m3
> +    psrld        m1, 2
> +    psignd       m1, m0
> +
> +    movu         [dstq], m1
> +
> +    add          srcq, mmsize
> +    add          dstq, mmsize
> +    sub          tot_hd, 4
> +    jg           .loop_h
> +
> +    add          r7, strideq
> +    dec          tot_vd
> +    jg           .loop_v
> +
> +    RET

I'm not sure why you say using r3 instead of r7 here didn't work for you. I
just tried it (after applying all patches up to 6/10) and fate at least still
passes, on both x86_64 and x86_32.
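
For anyone checking the quoted loop against a scalar version, here is a rough
C sketch of what the SSE4 code appears to compute per coefficient. It is a
model written from the quoted assembly, not FFmpeg's C implementation. The
name dequant_subband_32_ref is hypothetical, and the sketch assumes 32-bit
coefficients, a packed src, a dst stride given in bytes, and a tot_h that is a
positive multiple of 4 (the asm steps four lanes per iteration).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical scalar model of the quoted SSE4 loop; not FFmpeg's code. */
static void dequant_subband_32_ref(const uint8_t *src, uint8_t *dst,
                                   ptrdiff_t stride, int qf, int qs,
                                   int tot_v, int tot_h)
{
    /* Assumption: both asm loops run at least once, 4 lanes per step. */
    assert(tot_v > 0 && tot_h > 0 && tot_h % 4 == 0);

    for (int y = 0; y < tot_v; y++) {
        /* Mirrors "add r7, strideq": each output row starts one stride on. */
        uint8_t *row = dst + (ptrdiff_t)y * stride;

        for (int x = 0; x < tot_h; x++) {
            int32_t c;
            memcpy(&c, src, sizeof c);   /* movu m0, [srcq] */
            src += sizeof c;             /* src is packed, never rewound */

            /* pabsd: take the magnitude as unsigned (safe for INT32_MIN). */
            uint32_t a = c < 0 ? 0u - (uint32_t)c : (uint32_t)c;

            /* pmulld, paddd, psrld 2: wrapping 32-bit math, logical shift. */
            a = (a * (uint32_t)qf + (uint32_t)qs) >> 2;

            /* psignd: negate if c < 0, zero if c == 0, keep otherwise. */
            int32_t m = (int32_t)a;      /* a <= 0x3FFFFFFF after the shift */
            int32_t out = c < 0 ? -m : (c == 0 ? 0 : m);

            memcpy(row + (size_t)x * sizeof out, &out, sizeof out);
        }
    }
}
```

With qf = 6 and qs = 3, a coefficient of 5 maps to (5*6 + 3) >> 2 = 8, -5 maps
to -8, and 0 stays 0 because of the psignd step.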