Re: [FFmpeg-devel] [PATCH] lavc/h264chroma: RISC-V V add motion compensation for 8x8 chroma blocks

2023-05-22 Thread Arnie Chang
On Sat, May 20, 2023 at 1:12 AM Rémi Denis-Courmont wrote: > > +li t4, 0 > > +li t2, 0 > > +addi a5, t3, 1 > > +slli t3, a2, 2 > > +.LBB0_3: # if (xy != 0) > > +add a4, a1, t4 > > +vsetvli zero, a5, e8, m1, ta, ma > > +a

[FFmpeg-devel] [PATCH v3] lavc/h264chroma: RISC-V V add motion compensation for 8x8 chroma blocks

2023-05-22 Thread Arnie Chang
Optimize the put and avg filtering for 8x8 chroma blocks Signed-off-by: Arnie Chang --- V3: 1. Use a macro to extract repetitive segments 2. Fix coding style issues 3. Use macros in riscv/asm.S to handle function declarations 4. Replace vslidedown with vslide1down checkasm: using random seed
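For readers unfamiliar with the "put" and "avg" filtering these patches vectorize, here is a minimal scalar sketch in C of H.264 bilinear chroma interpolation. It is not the code from the patch: the function names are hypothetical, and the sketch assumes 8-bit samples and a source buffer with one extra column and one extra row available, since it always reads the right and lower neighbours.

```c
#include <stddef.h>
#include <stdint.h>

/* One bilinearly interpolated sample; x and y are offsets in eighths (0..7).
 * The four weights always sum to 64, hence the +32 rounding and the >> 6. */
static int chroma_sample(const uint8_t *src, ptrdiff_t stride, int x, int y)
{
    const int a = (8 - x) * (8 - y);
    const int b = x * (8 - y);
    const int c = (8 - x) * y;
    const int d = x * y;

    return (a * src[0] + b * src[1] +
            c * src[stride] + d * src[stride + 1] + 32) >> 6;
}

/* "put": store the interpolated w x h block (hypothetical name). */
static void put_chroma_ref(uint8_t *dst, const uint8_t *src, ptrdiff_t stride,
                           int w, int h, int x, int y)
{
    for (int i = 0; i < h; i++, dst += stride, src += stride)
        for (int j = 0; j < w; j++)
            dst[j] = (uint8_t)chroma_sample(src + j, stride, x, y);
}

/* "avg": rounding average of the interpolated block with what is already
 * in dst (hypothetical name). */
static void avg_chroma_ref(uint8_t *dst, const uint8_t *src, ptrdiff_t stride,
                           int w, int h, int x, int y)
{
    for (int i = 0; i < h; i++, dst += stride, src += stride)
        for (int j = 0; j < w; j++)
            dst[j] = (uint8_t)((dst[j] +
                                chroma_sample(src + j, stride, x, y) + 1) >> 1);
}
```

Calling `put_chroma_ref(dst, src, stride, 8, 8, x, y)` corresponds to the 8x8 case discussed in this thread; the same body would cover narrower blocks by changing `w`.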

[FFmpeg-devel] [PATCH v4] lavc/h264chroma: RISC-V V add motion compensation for 8x8 chroma blocks

2023-05-23 Thread Arnie Chang
Optimize the put and avg filtering for 8x8 chroma blocks Signed-off-by: Arnie Chang --- v4: Assembly portion: 1. Fix issues raised during the code review 2. Initialize vxrm to ensure the rounding mode is as expected Non-asm: 1. Put the function declarations in h264_chroma_init_riscv.c checkasm
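Why initializing vxrm matters can be illustrated with a small scalar model. The snippet does not say which instruction depends on the rounding mode, so the sketch below assumes an unsigned averaging add of the `vaaddu` kind, which computes (a + b) >> 1 and rounds according to vxrm. The enum and function names are hypothetical; the mode encodings follow the RISC-V V specification (0 = rnu, 1 = rne, 2 = rdn, 3 = rod).

```c
#include <stdint.h>

/* vxrm encodings as defined by the RISC-V V extension. */
enum vxrm_mode { VXRM_RNU = 0, VXRM_RNE = 1, VXRM_RDN = 2, VXRM_ROD = 3 };

/* Scalar model of an unsigned averaging add: (a + b) >> 1, with the
 * rounding increment r chosen by the fixed-point rounding mode. */
static uint8_t avg_u8(uint8_t a, uint8_t b, enum vxrm_mode mode)
{
    const unsigned v    = (unsigned)a + b;
    const unsigned bit0 = v & 1;        /* the bit shifted out */
    const unsigned bit1 = (v >> 1) & 1; /* LSB of the truncated result */
    unsigned r;

    switch (mode) {
    case VXRM_RNU: r = bit0;         break; /* round to nearest, ties up */
    case VXRM_RNE: r = bit0 & bit1;  break; /* round to nearest, ties to even */
    case VXRM_ROD: r = bit0 & !bit1; break; /* round to odd (jam) */
    default:       r = 0;            break; /* rdn: truncate */
    }
    return (uint8_t)((v >> 1) + r);
}
```

Only the rnu mode reproduces the (a + b + 1) >> 1 rounding of the usual scalar averaging; if vxrm were left at whatever value the caller happened to set, odd sums could come out one lower.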

[FFmpeg-devel] [PATCH v5] lavc/h264chroma: RISC-V V add motion compensation for 8x8 chroma blocks

2023-05-25 Thread Arnie Chang
Optimize the put and avg filtering for 8x8 chroma blocks Signed-off-by: Arnie Chang --- v5: Fix the mulw issue raised during the v4 review checkasm: using random seed 1900907821 RVVi32: - h264chroma.chroma_mc [OK] checkasm: all 2 tests passed avg_h264_chroma_mc1_8_c: 1821.5

Re: [FFmpeg-devel] [PATCH v5] lavc/h264chroma: RISC-V V add motion compensation for 8x8 chroma blocks

2023-05-28 Thread Arnie Chang
I think the patch has resolved all the issues raised during the code review. If there are no further considerations, may I inquire about who could assist me in pushing the patch? On Thu, May 25, 2023 at 8:33 PM Arnie Chang wrote: > Optimize the put and avg filtering for 8x8 chroma blo

[FFmpeg-devel] [PATCH] lavc/h264chroma: RISC-V V add motion compensation for 4xH and 2xH chroma blocks

2023-06-09 Thread Arnie Chang
Optimize the put and avg filtering for 4xH and 2xH blocks Signed-off-by: Arnie Chang --- checkasm: using random seed 3475799765 RVVi32: - h264chroma.chroma_mc [OK] checkasm: all 6 tests passed avg_h264_chroma_mc1_8_c: 1821.5 avg_h264_chroma_mc1_8_rvv_i32: 466.5 avg_h264_chroma_mc2_8_c: 939.2

Re: [FFmpeg-devel] [PATCH] lavc/h264chroma: RISC-V V add motion compensation for 4xH and 2xH chroma blocks

2023-06-10 Thread Arnie Chang
On Sat, Jun 10, 2023 at 10:55 PM Lynne wrote: > Why do they all have the same timing? > The processing procedure for these workloads is the same, except for the difference in block width. (8xH, 4xH, 2xH) So, the number of instructions remains constant. Since these workloads handle a small amount

Re: [FFmpeg-devel] [PATCH] lavc/h264chroma: RISC-V V add motion compensation for 4xH and 2xH chroma blocks

2023-06-12 Thread Arnie Chang
On Mon, Jun 12, 2023 at 10:59 PM Rémi Denis-Courmont wrote: > It would seem simpler and more intuitive to just use `.if` here. (Ditto below.) > Hi, do you mean using .if to modify this line of code? +vsetivli t3, \width, e8, m1, ta, mu

Re: [FFmpeg-devel] [PATCH] lavc/h264chroma: RISC-V V add motion compensation for 4xH and 2xH chroma blocks

2023-06-15 Thread Arnie Chang
On Wed, Jun 14, 2023 at 11:57 PM Rémi Denis-Courmont wrote: > It looks like \width is only ever used as AVL. You could advantageously pass it as a run-time argument to an internal function, and spare the instruction cache, instead of instantiating otherwise identical code thrice. > Since

[FFmpeg-devel] [PATCH v2] lavc/h264chroma: RISC-V V add motion compensation for 4xH and 2xH chroma blocks

2023-06-19 Thread Arnie Chang
Optimize the put and avg filtering for 4xH and 2xH blocks Signed-off-by: Arnie Chang --- V2: 1. Change \width to a run-time argument 2. Call an internal function instead of instantiating similar code three times RVVi32: - h264chroma.chroma_mc [OK] checkasm: all 6 tests passed
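The restructuring described in the V2 notes can be sketched in C as follows. This is only an illustration of the pattern, not the patch's assembly: the function names are hypothetical, and the shared body is reduced to the rounding-average step to keep it short. The point is that the width becomes an ordinary argument, so only one copy of the loop body exists and each entry point is a thin wrapper.

```c
#include <stddef.h>
#include <stdint.h>

/* Shared body: the block width is a run-time parameter, so the loop is
 * emitted once instead of once per width (hypothetical name). */
static void avg_block_internal(uint8_t *dst, const uint8_t *src,
                               ptrdiff_t stride, int h, int width)
{
    for (int i = 0; i < h; i++, dst += stride, src += stride)
        for (int j = 0; j < width; j++)
            dst[j] = (uint8_t)((dst[j] + src[j] + 1) >> 1);
}

/* Thin per-width entry points that only supply the width. */
static void avg_block8(uint8_t *dst, const uint8_t *src, ptrdiff_t stride, int h)
{
    avg_block_internal(dst, src, stride, h, 8);
}

static void avg_block4(uint8_t *dst, const uint8_t *src, ptrdiff_t stride, int h)
{
    avg_block_internal(dst, src, stride, h, 4);
}

static void avg_block2(uint8_t *dst, const uint8_t *src, ptrdiff_t stride, int h)
{
    avg_block_internal(dst, src, stride, h, 2);
}
```

In the vector version the width would presumably feed the vector length setting (the AVL mentioned in the review) rather than a loop bound, which is why a single body can serve 8xH, 4xH, and 2xH blocks.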

Re: [FFmpeg-devel] [PATCH v2] lavc/h264chroma: RISC-V V add motion compensation for 4xH and 2xH chroma blocks

2023-07-24 Thread Arnie Chang
It appears that all the issues raised during the review have been fixed, and there have been no additional comments for over a month. Could I kindly request assistance in pushing the patch? On Mon, Jun 19, 2023 at 9:06 PM Arnie Chang wrote: > Optimize the put and avg filtering for 4xH and

[FFmpeg-devel] [PATCH 0/5] RISC-V: Improve H264 decoding performance using RVV intrinsic

2023-05-09 Thread Arnie Chang
code in the configure file Patch2: optimize chroma motion compensation Patch3: optimize luma motion compensation Patch4: optimize dsp functions, such as IDCT, in-loop filtering, and weighted filtering Patch5: optimize intra prediction Arnie Chang (5): configure: Add detection of RISC-V vector

[FFmpeg-devel] [PATCH 1/5] configure: Add detection of RISC-V vector intrinsic support

2023-05-09 Thread Arnie Chang
Check whether the toolchain supports RISC-V vector intrinsics, then update the HAVE_INTRINSICS_RVV flag in config.h Signed-off-by: Arnie Chang --- configure | 2 ++ 1 file changed, 2 insertions(+) diff --git a/configure b/configure index bb7be67676..883bee1e34 100755 --- a/configure

[FFmpeg-devel] [PATCH 2/5] lavc/h264chroma: Add vectorized implementation of chroma MC for RISC-V

2023-05-09 Thread Arnie Chang
Optimize chroma motion compensation using RISC-V vector intrinsics, resulting in an average 13% FPS improvement on 720P videos. Signed-off-by: Arnie Chang --- libavcodec/h264chroma.c | 2 + libavcodec/h264chroma.h | 1 + libavcodec/riscv/Makefile

[FFmpeg-devel] [PATCH 4/5] lavc/h264dsp: Add vectorized implementation of DSP functions for RISC-V

2023-05-09 Thread Arnie Chang
of 1.49x. Signed-off-by: Arnie Chang --- libavcodec/h264dsp.c | 2 + libavcodec/h264dsp.h | 3 +- libavcodec/riscv/Makefile | 4 + libavcodec/riscv/h264_dsp_init_riscv.c | 68 +++ libavcodec/riscv/h264_idct.c | 482

[FFmpeg-devel] [PATCH 5/5] lavc/h264pred: Add vectorized implementation of intra prediction for RISC-V

2023-05-09 Thread Arnie Chang
Optimize intra prediction using RISC-V vector intrinsics. Although the intra prediction in the decoder is not a computational hotspot, the FPS has further improved by 1% after vectorizing this part, as measured on 720P videos. Signed-off-by: Arnie Chang --- libavcodec/h264pred.c

Re: [FFmpeg-devel] [PATCH 0/5] RISC-V: Improve H264 decoding performance using RVV intrinsic

2023-05-10 Thread Arnie Chang
decoder on RISC-V can work on the assembly and decide whether to refer to intrinsic code. I believe this would be a good starting point for future optimization. On Wed, May 10, 2023 at 12:51 AM Rémi Denis-Courmont wrote: > Hi, > > Le tiistaina 9. toukokuuta 2023, 12.50.25 EEST Arni

[FFmpeg-devel] [PATCH] lavc/h264chroma: RISC-V V add motion compensation for 8x8 chroma blocks

2023-05-17 Thread Arnie Chang
Optimize the put and avg filtering for 8x8 chroma blocks Signed-off-by: Arnie Chang --- libavcodec/h264chroma.c | 2 + libavcodec/h264chroma.h | 1 + libavcodec/riscv/Makefile | 3 + libavcodec/riscv/h264_chroma_init_riscv.c | 39

[FFmpeg-devel] [PATCH v2] lavc/h264chroma: RISC-V V add motion compensation for 8x8 chroma blocks

2023-05-17 Thread Arnie Chang
Optimize the put and avg filtering for 8x8 chroma blocks Signed-off-by: Arnie Chang --- libavcodec/h264chroma.c | 2 + libavcodec/h264chroma.h | 1 + libavcodec/riscv/Makefile | 3 + libavcodec/riscv/h264_chroma_init_riscv.c | 39

Re: [FFmpeg-devel] [PATCH] lavc/h264chroma: RISC-V V add motion compensation for 8x8 chroma blocks

2023-05-18 Thread Arnie Chang
On Wed, May 17, 2023 at 10:54 PM Lynne wrote: > > Finally, run: > make checkasm && ./tests/checkasm/checkasm --bench > and report on the timings for both the C and assembly versions. > If you've made a mistake somewhere (forgot to restore stack, or a callee-saved register, or your function p