At 2019-12-04 16:51:52, "Paul B Mahol" <one...@gmail.com> wrote:
>On 12/4/19, Song, Ruiling <ruiling.s...@intel.com> wrote:
>>> -----Original Message-----
>>> From: ffmpeg-devel <ffmpeg-devel-boun...@ffmpeg.org> On Behalf Of chen
>>> >> At 2019-12-03 15:52:07, xuju...@sjtu.edu.cn wrote:
>>> >> >From: Xu Jun <xuju...@sjtu.edu.cn>
>>> >[...]
>>> >> >+
>>> >> >+    cvtdq2ps  m4, m4
>>> >> >+    mulps     m4, m0     ; sum *= rdiv
>>> >> >+    addps     m4, m1     ; sum += bias
>>> >>
>>> >> >+    addps     m4, m5     ; sum += 0.5
>>> >> I don't know how about precision mismatch if we pre-compute (bias+0.5)
>>>
>>> >I think it is hard to prove it is safe to do pre-compute.
>>> Agree, I also worried precision issue since float operator is execute order dependent.
>>> How about ROUNDPS?
>> Seems no exactly match.

Funny. I guess that is some other issue, such as a mistake in the instruction's imm field.

>>> >> >+    cvttps2dq m4, m4
>>> >> >+    packssdw  m4, m4
>>> >> >+    packuswb  m4, m4
>>> >> >+    movss     [dstq + dst_offq], m4
>>> >> >+    add       c_offq, mmsize/4
>>> >> >+    add       dst_offq, mmsize/4
>>> >> >+
>>> >> >+    add       off16q, mmsize/4
>>> >> >+    cmp       off16q, widthq
>>> >> >+    jl        .loop16
>>> >> >+
>>> >> >+    add       widthq, rq
>>> >> >+    cmp       off16q, widthq
>>> >> >+    jge       .paraend
>>> >> >+
>>> >>
>>> >> >+ .loopr:
>>> >> no idea about this loop, if we can read beyond, we can reuse above SIMD code
>>> >Reuse above SIMD code may write to the memory that does not belong to this slice-thread.
>>>
>>> >IMO, the code to handle remainder columns is still necessary.
>>>
>>>
>>> Depends on algorithm & size,
>>> For example width=23
>>> Process #0 [0:15]
>>> Process #1 [7:22]
>>> Both of them is multiple of 16
>> Sounds interesting. But FFmpeg does not do like this now.
>> One question is will this get a penalty for writing to same address of
>> memory (both are writing to 7-15) from different threads?
>
>Yes, and even bad results may happen.
>

Sorry, that was my fault, I did not explain it clearly. Each "Process #x" is one step of the loop, not a separate thread. I assume the function has to be atomic anyway, so how threads that touch the same address area are arranged is decided outside of it.
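
To make the width=23 example concrete, here is a minimal C sketch of the overlapping last step. It is not FFmpeg code: BLOCK, process_block() and process_row() are hypothetical names, and the per-pixel operation is only a placeholder for the real convolution. It assumes src and dst are different buffers, so recomputing columns 7-15 stores the same values again, and that width is at least one full block.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define BLOCK 16 /* stands in for one 16-pixel SIMD step */

/* Hypothetical full-width kernel: always handles exactly BLOCK pixels.
 * The arithmetic is a placeholder, not the convolution from the patch. */
static void process_block(uint8_t *dst, const uint8_t *src)
{
    for (int i = 0; i < BLOCK; i++)
        dst[i] = (uint8_t)((src[i] + 1) / 2);
}

/* Process one row with full blocks only. If width is not a multiple of
 * BLOCK, the last step is moved back so that it ends exactly at width,
 * overlapping the previous step instead of running a scalar tail loop.
 * Returns the number of block steps executed. */
static int process_row(uint8_t *dst, const uint8_t *src, size_t width)
{
    size_t x = 0;
    int steps = 0;

    assert(width >= BLOCK); /* narrower rows still need a scalar path */

    for (; x + BLOCK <= width; x += BLOCK) {
        process_block(dst + x, src + x);
        steps++;
    }
    if (x < width) {
        /* e.g. width = 23: this step covers [7:22] */
        x = width - BLOCK;
        process_block(dst + x, src + x);
        steps++;
    }
    return steps;
}
```

With width = 23, process_row() runs two steps, [0:15] and [7:22], and never reads or writes past column 22. Both steps run in the same call, so the overlap is within one thread; a slice boundary between two threads is a separate question.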