Hi, Christophe asked me to chime in.
On Wed, Dec 4, 2024 at 4:14 AM <uk7b-at-foxmail....@ffmpeg.org> wrote:

> --- a/tests/checkasm/rv40dsp.c
> +++ b/tests/checkasm/rv40dsp.c
> @@ -27,7 +27,7 @@
>  #define randomize_buffers() \
>      do { \
>          for (int i = 0; i < 16*18*2; i++) \
> -            src[i] = rnd() & 0x3; \
> +            src[i] = rnd() & 0xff; \
>      } while (0)
>
>  static void check_chroma_mc(void)

This is correct.

> @@ -47,8 +47,8 @@ static void check_chroma_mc(void)
>  #define CHECK_CHROMA_MC(name) \
>      do { \
>          if (check_func(h.name##_pixels_tab[size], #name "_mc%d", 1 << (3 - size))) { \
> -            for (int x = 0; x < 2; x++) { \
> -                for (int y = 0; y < 2; y++) { \
> +            for (int x = 0; x < 8; x++) { \
> +                for (int y = 0; y < 8; y++) { \
>                      memcpy(dst0, src, 16 * 18); \
>                      memcpy(dst1, src, 16 * 18); \
>                      call_ref(dst0, src, 16, 16, x, y); \
> --
> 2.47.1

This is theoretically correct, but it's very inefficient. If you look at
the x86 mc8 mmx, for example, you'll notice that there are three codepaths:
one for x==0&&y==0 (no subpel), one for 1D filtering (mx!=0 ^ my!=0) and
one for 2D filtering. It can also be implemented branchless (I believe
that's what mc4 does). But it's highly unusual to have special codepaths
for mx=3 vs. mx=4, for example.

So the optimal way to test this is to keep the original loop bounds, but
change the loops as follows:

    for (int x = 0, mx = 0; x < 2; x++, mx = 1 + (rnd() % 7)) {
        for (int y = 0, my = 0; y < 2; y++, my = 1 + (rnd() % 7)) {
            [..]
            call_ref(dst0, src, 16, 16, mx, my);
            [..]
        }
    }

The first iteration of each loop uses 0 and the second uses a random value
in 1..7, so every run hits the no-subpel case, both 1D cases and the 2D
case. This limits the number of tests to 4, while still covering all
mx/my values over multiple checkasm runs, which is the correct way to
test this.

Ronald
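For illustration, here is a minimal standalone sketch of the loop pattern suggested above, outside of checkasm. The names `rnd_stub`, `filter_path` and `run_cases` are hypothetical and exist only for this example; `rnd_stub` stands in for checkasm's `rnd()`, and `filter_path` only models the three codepaths described for the x86 mc8 version, not any real DSP function.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for checkasm's rnd(); assumption for this sketch. */
static unsigned rnd_stub(void)
{
    return (unsigned)rand();
}

/* Models the three codepaths: 0 = no subpel, 1 = 1D filter, 2 = 2D filter. */
static int filter_path(int mx, int my)
{
    if (!mx && !my)
        return 0;
    if (!mx || !my)   /* exactly one of mx/my is nonzero here */
        return 1;
    return 2;
}

/* Runs the 2x2 loop once and records, per iteration, the (mx, my) pair
 * and the codepath it would exercise. Returns the iteration count. */
static int run_cases(int paths[4], int mxs[4], int mys[4])
{
    int n = 0;

    for (int x = 0, mx = 0; x < 2; x++, mx = 1 + (rnd_stub() % 7)) {
        for (int y = 0, my = 0; y < 2; y++, my = 1 + (rnd_stub() % 7)) {
            assert(n < 4);
            mxs[n]   = mx;
            mys[n]   = my;
            paths[n] = filter_path(mx, my);
            n++;
        }
    }
    return n;
}
```

Calling `run_cases` from a small `main` and printing the recorded pairs should show four iterations per run: (0, 0), (0, r), (r, 0) and (r, r'), with the nonzero values changing depending on the seed.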