This is just another implementation, one that uses the C1 (floating-point) registers, and the patch makes the functions more readable. I think using the C1 registers may reduce the pressure on the general-purpose registers. gsldlc1 and gsldrc1 are similar to ldl and ldr; they differ only in which register file they load into.
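For reference, here is a rough portable C sketch of what both versions of the function compute. It is not part of the patch: the names load_u64_unaligned, store_u64_unaligned and pred16x16_vertical_ref are made up for illustration, and memcpy stands in for the left/right load and store pairs (ldl/ldr, sdl/sdr or gsldlc1/gsldrc1, gssdlc1/gssdrc1), assuming those pairs are only used to do unaligned 64-bit accesses.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: the effect of an ldl/ldr (or gsldlc1/gsldrc1) pair,
 * i.e. a 64-bit load from an address that may be unaligned. */
static uint64_t load_u64_unaligned(const uint8_t *p)
{
    uint64_t v;
    memcpy(&v, p, sizeof(v));
    return v;
}

/* Hypothetical helper: the effect of an sdl/sdr (or gssdlc1/gssdrc1) pair. */
static void store_u64_unaligned(uint8_t *p, uint64_t v)
{
    memcpy(p, &v, sizeof(v));
}

/* Illustrative reference for the 16x16 vertical predictor: read the 16 bytes
 * of the row above the block once, then write them to each of the 16 rows. */
static void pred16x16_vertical_ref(uint8_t *src, ptrdiff_t stride)
{
    const uint8_t *top = src - stride;           /* row above the block */
    uint64_t lo = load_u64_unaligned(top);       /* bytes 0..7  */
    uint64_t hi = load_u64_unaligned(top + 8);   /* bytes 8..15 */

    for (int i = 0; i < 16; i++) {
        store_u64_unaligned(src,     lo);
        store_u64_unaligned(src + 8, hi);
        src += stride;                           /* next row */
    }
}
```

The two assembly versions differ only in where lo and hi live: in general-purpose registers ($4, $5) in the old code, in C1 registers ($f2, $f4) in the new code.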
At 2015-08-06 05:29:58, 周晓勇 <zhouxiaoy...@loongson.cn> wrote:
> Hi,
>
> On Tue, Aug 4, 2015 at 8:05 AM, 周晓勇 <zhouxiaoy...@loongson.cn> wrote:
>
> > From 71478e642fac00b12b313723ee83acdfef732fd1 Mon Sep 17 00:00:00 2001
> > From: ZhouXiaoyong <zhouxiaoy...@loongson.cn>
> > Date: Tue, 4 Aug 2015 16:28:02 +0800
> > Subject: [PATCH 1/2] avcodec: loongson optimized h264pred with mmi v2
> >
> >
> > Signed-off-by: ZhouXiaoyong <zhouxiaoy...@loongson.cn>
> > ---
> >  libavcodec/mips/h264pred_init_mips.c |   1 -
> >  libavcodec/mips/h264pred_mips.h      |   7 +-
> >  libavcodec/mips/h264pred_mmi.c       | 459 +++++++++++++++++------------------
> >  3 files changed, 226 insertions(+), 241 deletions(-)
>
> [..]
>
> >  void ff_pred16x16_vertical_8_mmi(uint8_t *src, ptrdiff_t stride)
> >  {
> >      __asm__ volatile (
> > -        "dsubu $2, %0, %1                   \r\n"
> > -        "daddu $3, %0, $0                   \r\n"
> > -        "ldl $4, 7($2)                      \r\n"
> > -        "ldr $4, 0($2)                      \r\n"
> > -        "ldl $5, 15($2)                     \r\n"
> > -        "ldr $5, 8($2)                      \r\n"
> > -        "dli $6, 0x10                       \r\n"
> > +        "dli $8, 16                         \r\n"
> > +        "gsldlc1 $f2, 7(%[srcA])            \r\n"
> > +        "gsldrc1 $f2, 0(%[srcA])            \r\n"
> > +        "gsldlc1 $f4, 15(%[srcA])           \r\n"
> > +        "gsldrc1 $f4, 8(%[srcA])            \r\n"
> >          "1:                                 \r\n"
> > -        "sdl $4, 7($3)                      \r\n"
> > -        "sdr $4, 0($3)                      \r\n"
> > -        "sdl $5, 15($3)                     \r\n"
> > -        "sdr $5, 8($3)                      \r\n"
> > -        "daddu $3, %1                       \r\n"
> > -        "daddiu $6, -1                      \r\n"
> > -        "bnez $6, 1b                        \r\n"
> > -        ::"r"(src),"r"(stride)
> > -        : "$2","$3","$4","$5","$6","memory"
> > +        "gssdlc1 $f2, 7(%[src])             \r\n"
> > +        "gssdrc1 $f2, 0(%[src])             \r\n"
> > +        "gssdlc1 $f4, 15(%[src])            \r\n"
> > +        "gssdrc1 $f4, 8(%[src])             \r\n"
> > +        "daddu %[src], %[src], %[stride]    \r\n"
> > +        "daddi $8, $8, -1                   \r\n"
> > +        "bnez $8, 1b                        \r\n"
> > +        : [src]"+&r"(src)
> > +        : [stride]"r"(stride),[srcA]"r"(src-stride)
> > +        : "$8","$f2","$f4"
> >      );
> >  }
> >
> So... I'm confused. You're replacing one type of optimizations with
> another. What happened? Was the old optimization bad? Was it for an old cpu
> type and is yours for a newer one? Something else?
>
> Ronald
> _______________________________________________
> ffmpeg-devel mailing list
> ffmpeg-devel@ffmpeg.org
> http://ffmpeg.org/mailman/listinfo/ffmpeg-devel