> On Jun 3, 2024, at 16:07, Martin Storsjö <mar...@martin.st> wrote:
>
> On Mon, 3 Jun 2024, Zhao Zhili wrote:
>
>> diff --git a/libswscale/aarch64/input.S b/libswscale/aarch64/input.S
>> new file mode 100644
>> index 0000000000..0a46475723
>> --- /dev/null
>> +++ b/libswscale/aarch64/input.S
>> @@ -0,0 +1,229 @@
>> +/*
>> + * Copyright (c) 2024 Zhao Zhili <quinkbl...@foxmail.com>
>> + *
>> + * This file is part of FFmpeg.
>> + *
>> + * FFmpeg is free software; you can redistribute it and/or
>> + * modify it under the terms of the GNU Lesser General Public
>> + * License as published by the Free Software Foundation; either
>> + * version 2.1 of the License, or (at your option) any later version.
>> + *
>> + * FFmpeg is distributed in the hope that it will be useful,
>> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
>> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
>> + * Lesser General Public License for more details.
>> + *
>> + * You should have received a copy of the GNU Lesser General Public
>> + * License along with FFmpeg; if not, write to the Free Software
>> + * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
>> + */
>> +
>> +#include "libavutil/aarch64/asm.S"
>> +
>> +.macro rgb24_to_yuv_load_rgb, src
>> + ld3.16b { v16, v17, v18 }, [\src]
>> + ushll.8h v19, v16, #0 // v19: r
>> + ushll.8h v20, v17, #0 // v20: g
>> + ushll.8h v21, v18, #0 // v21: b
>> + ushll2.8h v22, v16, #0 // v22: r
>> + ushll2.8h v23, v17, #0 // v23: g
>> + ushll2.8h v24, v18, #0 // v24: b
>
> Don't use this nonstandard, Apple specific aarch64 syntax. This was used by
> Apple tools at the start, when the proper standardized aarch64 syntax wasn't
> quite settled yet, and it is still accepted. (And apparently this is still
> the preferred form to disassemble things in, for apple platforms.)
>
> With this syntax, the assembly is rejected by GNU binutils and MSVC.
>
>> +function ff_rgb24ToY_neon, export=1
>> + cmp w4, #0 // check width > 0
>> + b.le 4f
>> +
>> + ldp w10, w11, [x5], #8 // w10: ry, w11: gy
>> + dup v0.8H, w10
>> + dup v1.8H, w11
>> + ldr w12, [x5] // w12: by
>> + dup v2.8H, w12
>
> Don't use uppercase .8H for field layout configurations, we prefer to stick
> to all lowercase here - see 184103b3105f02f1189fa0047af4269e027dfbd6. The
> same goes for a number of places in this patch.
>
>> + add w9, w9, #1 // i++
>> + add x3, x3, #6 // src += 6
>> +3:
>> + cmp w9, w5
>> + b.lt 2b
>> +4:
>
> Incorrect indentation for the cmp/b.lt instructions here.
>
>
> I have set up a bunch of github actions for testing aarch64 assembly - see
> https://github.com/mstorsjo/ffmpeg/commits/gha-aarch64. If you have a github
> account, grab a copy of this branch into your repo, add your own commits on
> top, and push to your fork (and if necessary, activate running the actions),
> then you should get a wide testing of your patches.
>
> See https://github.com/mstorsjo/FFmpeg/actions/runs/9346228714 for one
> example run of these actions with your patches.

Wow, that's very helpful. Here are the action results for the updated patch:

https://github.com/quink-black/FFmpeg/actions/runs/9350348848
https://ffmpeg.org/pipermail/ffmpeg-devel/2024-June/328786.html

The test still fails on x86, but succeeds on all arm64 platforms and on
LoongArch. I tried calling rgb24ToY_c and ff_rgb24ToY_avx directly and
comparing the results; they don't match. I'm confused.

>
> // Martin
>
> _______________________________________________
> ffmpeg-devel mailing list
> ffmpeg-devel@ffmpeg.org
> https://ffmpeg.org/mailman/listinfo/ffmpeg-devel
>
> To unsubscribe, visit link above, or email
> ffmpeg-devel-requ...@ffmpeg.org with subject "unsubscribe".
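
One way to narrow down a C-versus-SIMD mismatch like the one described above
is to run both paths over the same input row and report the first sample that
differs. The following is only a minimal sketch under stated assumptions: the
names ref_rgb24_to_y, first_mismatch and SKETCH_SHIFT are hypothetical, the
15-bit fixed-point scale and the offset/rounding terms are modeled on a
typical scalar RGB-to-luma path rather than copied from libswscale, and the
coefficients are passed in by the caller.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: coefficients are 15-bit fixed point. This is a stand-in
 * for whatever scale the real coefficient table uses. */
#define SKETCH_SHIFT 15

/* Hypothetical scalar reference: one 16-bit luma sample per packed
 * RGB24 pixel, using caller-supplied ry/gy/by coefficients. */
static void ref_rgb24_to_y(int16_t *dst, const uint8_t *src, int width,
                           int32_t ry, int32_t gy, int32_t by)
{
    assert(dst && src && width >= 0);
    for (int i = 0; i < width; i++) {
        int r = src[3 * i + 0];
        int g = src[3 * i + 1];
        int b = src[3 * i + 2];
        /* Weighted sum, plus an offset and a rounding term, then a
         * shift down to the 16-bit intermediate range. */
        dst[i] = (int16_t)((ry * r + gy * g + by * b
                            + (32 << (SKETCH_SHIFT - 1))
                            + (1 << (SKETCH_SHIFT - 7)))
                           >> (SKETCH_SHIFT - 6));
    }
}

/* Return the index of the first differing sample, or -1 if both rows
 * are identical over the given width. */
static int first_mismatch(const int16_t *a, const int16_t *b, int width)
{
    for (int i = 0; i < width; i++)
        if (a[i] != b[i])
            return i;
    return -1;
}
```

To use it, write the output of each implementation into its own buffer for
the same source row and call first_mismatch(c_out, simd_out, width). A result
of -1 means the rows agree; any other value is the pixel index to inspect,
which helps separate a rounding difference from a wrong coefficient or an
off-by-one in the tail handling.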