rv40dsp: cover more cases

Ronald S. Bultje Thu, 05 Dec 2024 04:45:24 -0800

Hi,

Christophe asked me to chime in.


On Wed, Dec 4, 2024 at 4:14 AM <uk7b-at-foxmail....@ffmpeg.org> wrote:

> --- a/tests/checkasm/rv40dsp.c
> +++ b/tests/checkasm/rv40dsp.c
> @@ -27,7 +27,7 @@
>  #define randomize_buffers()                  \
>      do {                                     \
>          for (int i = 0; i < 16*18*2; i++)    \
> -            src[i] = rnd() & 0x3;            \
> +            src[i] = rnd() & 0xff;           \
>      } while (0)
>
>  static void check_chroma_mc(void)
>

This is correct.


> @@ -47,8 +47,8 @@ static void check_chroma_mc(void)
>  #define CHECK_CHROMA_MC(name)                                                             \
>          do {                                                                              \
>              if (check_func(h.name##_pixels_tab[size], #name "_mc%d", 1 << (3 - size))) { \
> -                for (int x = 0; x < 2; x++) {                                             \
> -                    for (int y = 0; y < 2; y++) {                                         \
> +                for (int x = 0; x < 8; x++) {                                             \
> +                    for (int y = 0; y < 8; y++) {                                         \
>                          memcpy(dst0, src, 16 * 18);                                       \
>                          memcpy(dst1, src, 16 * 18);                                       \
>                          call_ref(dst0, src, 16, 16, x, y);                                \
> --
> 2.47.1
>

This is theoretically correct, but it's very inefficient: it runs 64 (x, y)
combinations per function instead of 4. If you look at the x86 mc8 mmx, for
example, you'll notice that there are three codepaths: one for x==0&&y==0
(no subpel), one for 1D filtering (mx!=0 ^ my!=0) and one for 2D filtering.
It can also be implemented branchlessly (I believe that's what mc4 does).
But it's highly unusual to have special codepaths for mx=3 vs. mx=4, for
example.
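
To make the three cases concrete, here is a minimal sketch of how an (mx, my)
pair maps onto them. The enum and function names are hypothetical and exist
only for illustration; they are not taken from the FFmpeg sources.

```c
/* Hypothetical labels for the three codepaths described above. */
enum mc_path { MC_COPY, MC_1D, MC_2D };

/* Classify a subpel offset pair the way a three-path implementation
 * would dispatch it. Any non-zero value selects the same path, which is
 * why testing every value from 1 to 7 in a single run adds little. */
static enum mc_path classify_mc_path(int mx, int my)
{
    if (mx == 0 && my == 0)
        return MC_COPY;   /* no subpel offset: no filtering needed */
    if ((mx != 0) ^ (my != 0))
        return MC_1D;     /* exactly one direction is filtered */
    return MC_2D;         /* both directions are filtered */
}
```

With this mapping, (3, 0) and (4, 0) land on the same path, as do (3, 5) and
(4, 1).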

So the optimal way to test this is to keep the original loop bounds, but
pick the non-zero offsets at random, like this:
for (int x = 0, mx = 0; x < 2; x++, mx = 1 + (rnd() % 7)) {
    for (int y = 0, my = 0; y < 2; y++, my = 1 + (rnd() % 7)) {
        [..]
        call_ref(dst0, src, 16, 16, mx, my);
        [..]
    }
}

This limits the number of tests to 4, while still covering all cases over
multiple checkasm runs, which is the correct way to test this.
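
As a rough standalone illustration of which offsets that loop produces, the
sketch below records the (mx, my) pairs instead of calling the DSP functions.
demo_rnd() is a stand-in for checkasm's rnd(), and struct mc_offset and
sample_offsets() are made-up names; none of them exist in checkasm.

```c
#include <stdlib.h>

struct mc_offset { int mx, my; };

/* Stand-in for checkasm's rnd(); assumption: any random source will do. */
static unsigned demo_rnd(void) { return (unsigned)rand(); }

/* Record the offsets the loop would pass to call_ref()/call_new().
 * Returns the number of pairs written to out (always 4). */
static int sample_offsets(struct mc_offset out[4])
{
    int n = 0;
    for (int x = 0, mx = 0; x < 2; x++, mx = 1 + (demo_rnd() % 7)) {
        for (int y = 0, my = 0; y < 2; y++, my = 1 + (demo_rnd() % 7)) {
            out[n].mx = mx;   /* 0 on the first outer pass, then 1..7 */
            out[n].my = my;   /* 0 on the first inner pass, then 1..7 */
            n++;
        }
    }
    return n;
}
```

Each call yields one pair per case: (0, 0), (0, non-zero), (non-zero, 0) and
(non-zero, non-zero), with the non-zero values changing from run to run.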

Ronald
_______________________________________________
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
https://ffmpeg.org/mailman/listinfo/ffmpeg-devel

To unsubscribe, visit link above, or email
ffmpeg-devel-requ...@ffmpeg.org with subject "unsubscribe".

Re: [FFmpeg-devel] [PATCH v2 1/2] checkasm/rv40dsp: cover more cases
