On 2023/02/06 4:16, Richard Henderson wrote:
On 2/5/23 08:44, BALATON Zoltan wrote:
On Sun, 5 Feb 2023, Richard Henderson wrote:
On 2/4/23 06:57, BALATON Zoltan wrote:
This has just bounced, I hoped to still be able to post after
moderation but now I'm resending it after subscribing to the pixman
list. Meanwhile I've found this ticket as well:
https://gitlab.freedesktop.org/pixman/pixman/-/merge_requests/71
See the rest of the message below. Looks like this is being worked
on but I'm not sure how far it is from getting resolved. Any info on
that?
Please try this:
https://gitlab.freedesktop.org/rth7680/pixman/-/tree/general
It provides a pure C version for ultimate fallback.
Unfortunately, there are no test cases for this, nor documentation.
It can share the implementation with fast_composite_src_memcpy().
fast_composite_src_memcpy() should be well-tested by the tests for
pixman_image_composite(). arm-neon does something similar, so we can
trust that fast_composite_src_memcpy() works as blt.
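
To illustrate why a memcpy-style source copy can double as blt, here is
a rough sketch of a row-by-row rectangle copy. blt32() is a hypothetical
name, not pixman's API, and the sketch assumes 32 bpp buffers, strides
counted in uint32_t units, and non-overlapping source and destination:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper, not part of pixman: copy a width x height
 * rectangle from src to dst, one memcpy per row.
 * Assumes 32 bpp, strides in uint32_t units, no overlap. */
static void blt32(const uint32_t *src, uint32_t *dst,
                  int src_stride, int dst_stride,
                  int src_x, int src_y, int dst_x, int dst_y,
                  int width, int height)
{
    assert(src != NULL && dst != NULL && width >= 0 && height >= 0);

    const uint32_t *s = src + (ptrdiff_t)src_y * src_stride + src_x;
    uint32_t *d = dst + (ptrdiff_t)dst_y * dst_stride + dst_x;

    for (int j = 0; j < height; j++) {
        memcpy(d, s, (size_t)width * sizeof *d);
        s += src_stride;
        d += dst_stride;
    }
}
```

Nothing here is specific to compositing, which is the point: a plain
source copy and a blt do the same work on the same layout.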
Thanks, I don't have hardware to test this but maybe Akihiko or
somebody else here can try. Do you think pixman_fill won't have the
same problem? It seems to have at least a fast_path implementation but
I'm not sure how pixman selects these.
For fill, I think the fast_path implementation should work, so long as
it isn't disabled via environment variable. I'm not sure why that is,
and why _fast_path isn't part of _general.
The implementation of fill should be moved to pixman-general.c but the
rest of pixman-fast-path.c shouldn't be.
By isolating the non-essential fast-path code to pixman-fast-path.c, you
can disable it with the environment variable when you are not confident
in the implementation, and that may help debugging. However, if
pixman-fast-path.c has some essential code like the implementation of
fill, the utility of the environment variable will be impaired as
setting the environment variable may break things.
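
A toy model of the problem, in case it helps. All names below (impl_t,
chain_fill, build_chain, the "fast" and "general" layers) are made up
for illustration and are not pixman's real types. The disabled list
stands in for whatever the environment variable would contain, e.g. the
result of getenv() on a hypothetical DEMO_DISABLE:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical miniature of an implementation chain. */
typedef struct impl impl_t;
typedef int (*fill_func_t)(uint32_t *bits, int stride,
                           int x, int y, int w, int h, uint32_t v);

struct impl {
    const char *name;
    const impl_t *fallback; /* next layer to try, or NULL */
    fill_func_t fill;       /* NULL if this layer has no fill */
};

/* The only fill in this toy chain lives in the "fast" layer. */
static int fast_fill(uint32_t *bits, int stride,
                     int x, int y, int w, int h, uint32_t v)
{
    assert(bits != NULL);
    for (int j = 0; j < h; j++)
        for (int i = 0; i < w; i++)
            bits[(ptrdiff_t)(y + j) * stride + x + i] = v;
    return 1;
}

static const impl_t general_impl = { "general", NULL, NULL };
static const impl_t fast_impl = { "fast", &general_impl, fast_fill };

/* Skip the "fast" layer if its name appears in the disabled list. */
static const impl_t *build_chain(const char *disabled)
{
    if (disabled != NULL && strstr(disabled, "fast") != NULL)
        return &general_impl;
    return &fast_impl;
}

/* Walk the chain until some layer handles the fill; 0 if none does. */
static int chain_fill(const impl_t *top, uint32_t *bits, int stride,
                      int x, int y, int w, int h, uint32_t v)
{
    for (const impl_t *i = top; i != NULL; i = i->fallback)
        if (i->fill != NULL && i->fill(bits, stride, x, y, w, h, v))
            return 1;
    return 0;
}
```

With build_chain(NULL) the fill succeeds; with build_chain("fast") it
returns 0 because the last layer has nothing to fall back on. Giving
the "general" layer its own fill would make the switch safe to use.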
Indeed, the fast_path implementation of fill should be easily vectorized
by the compiler. I would expect it to be competitive with an assembly
implementation. I would expect the implementation chain design to only
be useful when multiple vector implementations are supported and
selected at runtime -- e.g. the x86 SSE2 vs SSSE3 stuff.
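
For reference, the kind of loop I have in mind is roughly the
following. This is a sketch, not the actual pixman code: fill32() is a
hypothetical name, and it assumes 32 bpp with the stride counted in
uint32_t units:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical 32 bpp rectangle fill, not pixman's implementation.
 * stride is in uint32_t units (an assumption of this sketch). */
static void fill32(uint32_t *bits, int stride,
                   int x, int y, int width, int height,
                   uint32_t filler)
{
    assert(bits != NULL && width >= 0 && height >= 0);

    uint32_t *row = bits + (ptrdiff_t)y * stride + x;

    for (int j = 0; j < height; j++) {
        /* Contiguous stores of one value: a simple pattern for a
         * compiler's auto-vectorizer. */
        for (int i = 0; i < width; i++)
            row[i] = filler;
        row += stride;
    }
}
```

The inner loop has no dependencies between iterations, so whether it
gets vectorized depends mostly on the compiler and optimization level.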
r~