Hello gcc
I have been looking at optimizations of pixel-format conversion recently and
have noticed that gcc does not take advantage of the SSE4a extrq, BMI1 bextr,
TBM bextri or BMI2 pext instructions when they could be useful.
As far as I can tell it should not be that hard. A bextr expression can
typically be recognized as ((x >> s) & mask) or ((x << s1) >> s2). But I am
unsure where to do such matching, since the mask needs to have a specific form
(a contiguous run of low bits, i.e. 2^n - 1) to be valid for bextr, so it
seems it needs to be done before instruction selection.
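To make the two patterns concrete, here is a minimal sketch in portable C. The
names bextr_model, is_low_bit_mask and mask_length are hypothetical helpers
made up for illustration; they are not GCC internals and not the intrinsic
itself. The sketch assumes 32-bit operands. With that assumption,
((x >> s) & mask) corresponds to start = s, len = number of bits in mask, and
((x << s1) >> s2) with s2 >= s1 corresponds to start = s2 - s1, len = 32 - s2.

```c
#include <stdint.h>
#include <stdbool.h>

/* Portable model of a 32-bit bit-field extract: returns `len` bits of x
   starting at bit `start`. Hypothetical helper for illustration only. */
static uint32_t bextr_model(uint32_t x, unsigned start, unsigned len)
{
    if (start >= 32 || len == 0)
        return 0;
    x >>= start;
    return len >= 32 ? x : x & ((UINT32_C(1) << len) - 1);
}

/* A mask qualifies only if it is a contiguous run of low bits,
   i.e. it has the form 2^n - 1 with n > 0. */
static bool is_low_bit_mask(uint32_t mask)
{
    return mask != 0 && (mask & (mask + 1)) == 0;
}

/* Field length of a qualifying mask: the number of set bits. */
static unsigned mask_length(uint32_t mask)
{
    unsigned n = 0;
    while (mask) {
        n++;
        mask >>= 1;
    }
    return n;
}
```

On hardware with BMI1, _bextr_u32(x, start, len) from <immintrin.h> (built
with -mbmi) should behave like bextr_model for in-range arguments.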
Secondly, the bextr instruction in itself only replaces two already fast
instructions, so the gain is very minor (unless extracting variable bit-fields,
which is harder to recognize). The real optimization comes from being able to
use pext (parallel bit extract), which can implement several bextr expressions
in parallel.
So, where would be the right place to implement such instructions? Would it
make sense to recognize bextr early, before we get to i386 code, or would it be
better to recognize it late? And where do I put such instruction selection
optimizations?
Motivating example:
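unsigned rgb32_to_rgb16(unsigned rgb32) {
    /* Keep the top 5/6/5 bits of the 8-bit red, green and blue channels. */
    unsigned char red = (rgb32 >> 19) & 0x1f;
    unsigned char green = (rgb32 >> 10) & 0x3f;
    unsigned char blue = (rgb32 >> 3) & 0x1f;
    return (red << 11) | (green << 5) | blue;
}
can be implemented as pext(rgb32, 0x00f8fcf8), since the mask selects bits
19-23, 10-15 and 3-7 and pext packs them contiguously in the same order.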
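As a sanity check of that equivalence, here is a minimal sketch with a
portable model of pext. The names pext_model, RGB565_PEXT_MASK and the two
rgb32_to_rgb16_* functions are hypothetical, chosen for illustration. The
sketch assumes the conventional RGB888 to RGB565 layout, where blue is taken
from bits 3-7, so the mask is just the OR of each field's mask shifted back
to its source position.

```c
#include <stdint.h>

/* Portable model of 32-bit parallel bit extract: gathers the bits of src
   selected by mask and packs them into the low bits of the result. */
static uint32_t pext_model(uint32_t src, uint32_t mask)
{
    uint32_t result = 0;
    uint32_t out_bit = 1;
    while (mask) {
        uint32_t lowest = mask & (~mask + 1);  /* lowest set bit of mask */
        if (src & lowest)
            result |= out_bit;
        out_bit <<= 1;
        mask &= mask - 1;                      /* clear that bit */
    }
    return result;
}

/* Shift-and-mask reference, assuming blue comes from bits 3..7. */
static uint32_t rgb32_to_rgb16_scalar(uint32_t rgb32)
{
    uint32_t red = (rgb32 >> 19) & 0x1f;
    uint32_t green = (rgb32 >> 10) & 0x3f;
    uint32_t blue = (rgb32 >> 3) & 0x1f;
    return (red << 11) | (green << 5) | blue;
}

/* Each field mask moved back to its source position: 0x00f8fcf8. */
#define RGB565_PEXT_MASK \
    ((UINT32_C(0x1f) << 19) | (UINT32_C(0x3f) << 10) | (UINT32_C(0x1f) << 3))

static uint32_t rgb32_to_rgb16_pext(uint32_t rgb32)
{
    return pext_model(rgb32, RGB565_PEXT_MASK);
}
```

This only works as a single pext because the fields appear in the output in
the same order as in the input and are packed without gaps. On hardware with
BMI2, _pext_u32 from <immintrin.h> (built with -mbmi2) should be usable in
place of pext_model.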
Best regards
Allan Sandfeld