On 20 June 2024 18:02:31 GMT+02:00, Zhao Zhili <quinkbl...@foxmail.com> wrote:
>
>> On Jun 20, 2024, at 20:49, Martin Storsjö <mar...@martin.st> wrote:
>>
>> On Thu, 20 Jun 2024, Zhao Zhili wrote:
>>
>>>> On Jun 19, 2024, at 20:05, Rémi Denis-Courmont <r...@remlab.net> wrote:
>>>> On 19 June 2024 11:24:28 GMT+02:00, Zhao Zhili <quinkbl...@foxmail.com> wrote:
>>>>>> On Jun 19, 2024, at 15:07, Rémi Denis-Courmont <r...@remlab.net> wrote:
>>>>>> On 15 June 2024 11:57:18 GMT+02:00, Zhao Zhili <quinkbl...@foxmail.com> wrote:
>>>>>>> diff --git a/libswscale/aarch64/input.S b/libswscale/aarch64/input.S
>>>>>>> index 2b956fe5c2..37f1158504 100644
>>>>>>> --- a/libswscale/aarch64/input.S
>>>>>>> +++ b/libswscale/aarch64/input.S
>>>>>>> @@ -20,8 +20,12 @@
>>>>>>>  #include "libavutil/aarch64/asm.S"
>>>>>>> -.macro rgb_to_yuv_load_rgb src
>>>>>>> +.macro rgb_to_yuv_load_rgb src, element=3
>>>>>>> +    .if \element == 3
>>>>>>>          ld3 { v16.16b, v17.16b, v18.16b }, [\src]
>>>>>>> +    .else
>>>>>>> +        ld4 { v16.16b, v17.16b, v18.16b, v19.16b }, [\src]
>>>>>>> +    .endif
>>>>>>>          uxtl v19.8h, v16.8b // v19: r
>>>>>>>          uxtl v20.8h, v17.8b // v20: g
>>>>>>>          uxtl v21.8h, v18.8b // v21: b
>>>>>>> @@ -43,7 +47,7 @@
>>>>>>>          sqshrn2 \dst\().8h, \dst2\().4s, \right_shift // dst_higher_half = dst2 >> right_shift
>>>>>>>  .endm
>>>>>>> -.macro rgbToY bgr
>>>>>>> +.macro rgbToY bgr, element=3
>>>>>> AFAICT, you don't need a macro parameter for component order. Just
>>>>>> swap red and blue coefficients in the prologue and then run the
>>>>>> bit-exact same loops for bgr/rgb, rgba/bgra and argb/abgr. This adds one
>>>>>> branch in the prologue but that's mostly negligible compared to the loop.
>>>>> I'm not sure where to add the branch. Could you elaborate? Do you mean
>>>>> load coefficients first like the following:
>>>>> function ff_bgr24ToUV_half_neon, export=1
>>>>>         ldr w12, [x6, #12]
>>>>>         ldr w11, [x6, #16]
>>>>>         ldr w10, [x6, #20]
>>>>>         ldr w15, [x6, #24]
>>>>>         ldr w14, [x6, #28]
>>>>>         ldr w13, [x6, #32]
>>>>>         rgbToUV_half
>>>>> endfunc
>>>> Hmm, no. You need to jump past the loading of red and blue coefficients.
>>>> It might help to load green coefficients last.
>>>> By the way, I think you can use LDP instead of LDR.
>>>
>>> Patch v2 replaces LDR by LDP, so the "jump past the loading of red and
>>> blue coefficients" doesn't apply now.
>>
>> Rémi's point is that you don't need to duplicate the whole function, when
>> the only thing you're changing is a couple of instructions in the prologue
>> of the function. By reusing the actual bulk of the function, you save on
>> binary size.
>
>Thank you for the detailed examples. I missed that the key point here is to
>save binary size.
>
>I have seen a similar example of fall-through in riscv/input_rvv.S. Is it well
>defined to jump to a local label in another function?

Falling through is well defined as long as we don't use function sections, that is, as long as each function is not placed in its own section.

Jumping to a label inside another function is also well defined, since the assembler has no notion of what a function is: `func` and `endfunc` are just FFmpeg macros for defining symbols.
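
To make the idea concrete, here is a rough sketch in plain GNU assembler syntax for AArch64. It is not the actual patch: the symbol names, the register assignment and the coefficient layout (x6 pointing at three 32-bit values in the order red, green, blue) are all assumptions made up for illustration, and the loop body is elided. The BGR entry point loads the red and blue coefficients swapped, then branches to a numeric local label inside the RGB function, so both entry points share the green load and everything after it.

```asm
// Hypothetical sketch only: names, registers and the table layout
// (x6 -> { ry, gy, by } as 32-bit ints) are assumptions, not FFmpeg's.
        .text

        .globl  sketch_bgr_to_y
        .type   sketch_bgr_to_y, %function
sketch_bgr_to_y:
        ldr     w12, [x6]               // channel 2 is red for BGR: w12 = ry
        ldr     w10, [x6, #8]           // channel 0 is blue for BGR: w10 = by
        b       1f                      // jump past the RGB loads below
        .size   sketch_bgr_to_y, . - sketch_bgr_to_y

        .globl  sketch_rgb_to_y
        .type   sketch_rgb_to_y, %function
sketch_rgb_to_y:
        ldr     w10, [x6]               // channel 0 is red for RGB: w10 = ry
        ldr     w12, [x6, #8]           // channel 2 is blue for RGB: w12 = by
1:
        ldr     w11, [x6, #4]           // green coefficient, loaded last by both paths
        // ... the shared conversion loop would go here ...
        ret
        .size   sketch_rgb_to_y, . - sketch_rgb_to_y
```

The `1f` reference resolves to the next `1:` in the file regardless of which symbol precedes it, which is why the branch from one "function" into the other assembles without complaint. Unlike a fall-through, the explicit branch does not rely on the two bodies being laid out back to back.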