On Sat, May 20, 2023 at 1:12 AM Rémi Denis-Courmont wrote:
> > +    li t4, 0
> > +    li t2, 0
> > +    addi a5, t3, 1
> > +    slli t3, a2, 2
> > +.LBB0_3:    # if (xy != 0)
> > +    add a4, a1, t4
> > +    vsetvli zero, a5, e8, m1, ta, ma
> > +a
Optimize the put and avg filtering for 8x8 chroma blocks
Signed-off-by: Arnie Chang
---
V3:
1. Use a macro to extract repetitive segments
2. Fix coding style issues
3. Use macros in riscv/asm.S to handle function declarations
4. Replace vslidedown with vslide1down
checkasm: using random seed
Optimize the put and avg filtering for 8x8 chroma blocks
Signed-off-by: Arnie Chang
---
v4:
Assembly portion:
1. Fix issues raised during the code review
2. Initialize vxrm to ensure the rounding mode is as expected
Non-asm:
1. Put the function declarations in h264_chroma_init_riscv.c
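A rough illustration of why the rounding mode in item 2 of the assembly changes matters. The avg variant merges the filtered value into dst with a round-half-up average, (a + b + 1) >> 1. An RVV averaging add such as vaaddu gives that result only when vxrm selects round-to-nearest-up; with round-down it truncates instead. The C sketch below is not taken from the patch. The helper names are hypothetical, and the snippet above does not show which instruction the patch actually uses.

```c
#include <stdint.h>

/* Hypothetical helper: average with round-half-up, i.e. the result expected
 * from the avg variant (assumed to correspond to vxrm = rnu). */
static uint8_t avg_round_up(uint8_t a, uint8_t b)
{
    /* Operands are promoted to int, so the sum cannot overflow. */
    return (uint8_t)((a + b + 1) >> 1);
}

/* Hypothetical helper: truncating average, i.e. what a round-down
 * mode (assumed to correspond to vxrm = rdn) would produce instead. */
static uint8_t avg_truncate(uint8_t a, uint8_t b)
{
    return (uint8_t)((a + b) >> 1);
}
```

The two helpers differ whenever a + b is odd, for example avg_round_up(1, 2) is 2 while avg_truncate(1, 2) is 1, so an uninitialized rounding mode could show up as off-by-one mismatches in checkasm.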
checkasm
Optimize the put and avg filtering for 8x8 chroma blocks
Signed-off-by: Arnie Chang
---
v5:
Fix the mulw issue raised during the v4 review
checkasm: using random seed 1900907821
RVVi32:
- h264chroma.chroma_mc [OK]
checkasm: all 2 tests passed
avg_h264_chroma_mc1_8_c: 1821.5
I think the patch has resolved all the issues raised during the code review.
If there are no further considerations,
may I inquire about who could assist me in pushing the patch?
On Thu, May 25, 2023 at 8:33 PM Arnie Chang wrote:
> Optimize the put and avg filtering for 8x8 chroma blo
Optimize the put and avg filtering for 4xH and 2xH blocks
Signed-off-by: Arnie Chang
---
checkasm: using random seed 3475799765
RVVi32:
- h264chroma.chroma_mc [OK]
checkasm: all 6 tests passed
avg_h264_chroma_mc1_8_c: 1821.5
avg_h264_chroma_mc1_8_rvv_i32: 466.5
avg_h264_chroma_mc2_8_c: 939.2
On Sat, Jun 10, 2023 at 10:55 PM Lynne wrote:
> Why do they all have the same timing?
>
The processing procedure for these workloads is the same,
except for the difference in block width. (8xH, 4xH, 2xH)
So, the number of instructions remains constant.
Since these workloads handle a small amount
On Mon, Jun 12, 2023 at 10:59 PM Rémi Denis-Courmont
wrote:
> It would seem simpler and more intuitive to just use `.if` here.
> (Ditto
> below.)
>
Hi,
Do you mean using .if to modify this line of code?
+    vsetivli t3, \width, e8, m1, ta, mu
On Wed, Jun 14, 2023 at 11:57 PM Rémi Denis-Courmont
wrote:
> It looks like \width is only ever used as AVL. You could advantageously
> pass
> it as a run-time argument to an internal function, and spare the
> instruction
> cache, instead of instantiating otherwise identical code thrice.
>
Since
Optimize the put and avg filtering for 4xH and 2xH blocks
Signed-off-by: Arnie Chang
---
V2:
1. Change \width to a run-time argument
2. Call an internal function instead of instantiating similar code three
times
RVVi32:
- h264chroma.chroma_mc [OK]
checkasm: all 6 tests passed
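The V2 change above (one internal function that takes the width at run time, instead of three instantiations) can be sketched in plain C. This is only an illustration of the structure under stated assumptions, not the patch's RVV assembly. The function names are hypothetical, and the filter is the standard H.264 bilinear chroma interpolation with weights A, B, C, D derived from the fractional offsets x and y.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical shared kernel: width is an ordinary run-time argument, so the
 * 8xH, 4xH and 2xH entry points all reuse the same code.
 * Note: this simplified version always reads one extra row and one extra
 * column of src, so the caller must provide that padding. */
static void put_chroma_mc_any(uint8_t *dst, const uint8_t *src,
                              ptrdiff_t stride, int width, int h,
                              int x, int y)
{
    const int A = (8 - x) * (8 - y);
    const int B = x * (8 - y);
    const int C = (8 - x) * y;
    const int D = x * y;

    for (int i = 0; i < h; i++) {
        for (int j = 0; j < width; j++) {
            /* Weights sum to 64, hence the +32 rounding term and >> 6. */
            dst[j] = (uint8_t)((A * src[j] + B * src[j + 1] +
                                C * src[j + stride] +
                                D * src[j + stride + 1] + 32) >> 6);
        }
        dst += stride;
        src += stride;
    }
}

/* Thin wrappers: only the width constant differs between them. */
static void put_chroma_mc8(uint8_t *dst, const uint8_t *src,
                           ptrdiff_t stride, int h, int x, int y)
{
    put_chroma_mc_any(dst, src, stride, 8, h, x, y);
}

static void put_chroma_mc4(uint8_t *dst, const uint8_t *src,
                           ptrdiff_t stride, int h, int x, int y)
{
    put_chroma_mc_any(dst, src, stride, 4, h, x, y);
}

static void put_chroma_mc2(uint8_t *dst, const uint8_t *src,
                           ptrdiff_t stride, int h, int x, int y)
{
    put_chroma_mc_any(dst, src, stride, 2, h, x, y);
}
```

Keeping a single kernel trades a few instructions of call overhead for a smaller code footprint, which is the instruction-cache argument made in the review.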
It appears that all the issues raised during the review have been fixed,
and there have been no additional comments for over 1 month.
Could I kindly request assistance in pushing the patch?
On Mon, Jun 19, 2023 at 9:06 PM Arnie Chang wrote:
> Optimize the put and avg filtering for 4xH and
code in the configure file
Patch2: optimize chroma motion compensation
Patch3: optimize luma motion compensation
Patch4: optimize DSP functions, such as IDCT, in-loop filtering, and weighted
filtering
Patch5: optimize intra prediction
Arnie Chang (5):
configure: Add detection of RISC-V vector
Check whether the toolchain supports RISC-V vector intrinsics, then
update the HAVE_INTRINSICS_RVV flag in config.h
Signed-off-by: Arnie Chang
---
configure | 2 ++
1 file changed, 2 insertions(+)
diff --git a/configure b/configure
index bb7be67676..883bee1e34 100755
--- a/configure
Optimize chroma motion compensation using RISC-V vector intrinsics,
resulting in an average 13% FPS improvement on 720P videos.
Signed-off-by: Arnie Chang
---
libavcodec/h264chroma.c | 2 +
libavcodec/h264chroma.h | 1 +
libavcodec/riscv/Makefile
of 1.49x.
Signed-off-by: Arnie Chang
---
libavcodec/h264dsp.c | 2 +
libavcodec/h264dsp.h | 3 +-
libavcodec/riscv/Makefile | 4 +
libavcodec/riscv/h264_dsp_init_riscv.c | 68 +++
libavcodec/riscv/h264_idct.c | 482
Optimize intra prediction using RISC-V vector intrinsics.
Although the intra prediction in the decoder is not a computational hotspot,
the FPS has further improved by 1% after vectorizing this part, as measured on
720P videos.
Signed-off-by: Arnie Chang
---
libavcodec/h264pred.c
decoder on
RISC-V can work on the assembly and decide whether to refer to intrinsic
code.
I believe this would be a good starting point for future optimization.
On Wed, May 10, 2023 at 12:51 AM Rémi Denis-Courmont
wrote:
> Hi,
>
> Le tiistaina 9. toukokuuta 2023, 12.50.25 EEST Arni
Optimize the put and avg filtering for 8x8 chroma blocks
Signed-off-by: Arnie Chang
---
libavcodec/h264chroma.c | 2 +
libavcodec/h264chroma.h | 1 +
libavcodec/riscv/Makefile | 3 +
libavcodec/riscv/h264_chroma_init_riscv.c | 39
Optimize the put and avg filtering for 8x8 chroma blocks
Signed-off-by: Arnie Chang
---
libavcodec/h264chroma.c | 2 +
libavcodec/h264chroma.h | 1 +
libavcodec/riscv/Makefile | 3 +
libavcodec/riscv/h264_chroma_init_riscv.c | 39
On Wed, May 17, 2023 at 10:54 PM Lynne wrote:
>
> Finally, run:
> make checkasm && ./tests/checkasm/checkasm --bench
> and report on the timings for both the C and assembly versions.
> If you've made a mistake somewhere, (forgot to restore stack, or a
> callee-saved register,
> or your function p