On Tue, 26 Jul 2022, Andre Vieira (lists) wrote:

> Hi,
> 
> This is a RFC for my prototype for bitfield read vectorization. This patch
> enables bit-field read vectorization by removing the rejection of bit-field
> read's during DR analysis and by adding two vect patterns. The first one
> transforms TREE_COMPONENT's with BIT_FIELD_DECL's into BIT_FIELD_REF's, this
> is a temporary one as I believe there are plans to do this lowering earlier in
> the compiler. To avoid having to wait for that to happen we decided to
> implement this temporary vect pattern.
> The second one looks for conversions of the result of BIT_FIELD_REF's from a
> 'partial' type to a 'full-type' and transforms it into a 'full-type' load
> followed by the necessary shifting and masking.
> 
> The patch is not perfect, one thing I'd like to change for instance is the way
> the 'full-type' load is represented. I currently abuse the fact that the
> vectorizer transforms the original TREE_COMPONENT with a BIT_FIELD_DECL into a
> full-type vector load, because it uses the smallest mode necessary for that
> precision. The reason I do this is because I was struggling to construct a
> MEM_REF that the vectorizer would accept and this for some reason seemed to
> work ... I'd appreciate some pointers on how to do this properly :)
> 
> Another aspect that I haven't paid much attention to yet is costing, I've
> noticed some testcases fail to vectorize due to costing where I think it might
> be wrong, but like I said, I haven't paid much attention to it.
> 
> Finally another aspect I'd eventually like to tackle is the sinking of the
> masking when possible, for instance in bit-field-read-3.c the 'masking' does
> not need to be inside the loop because we are doing bitwise operations. Though
> I suspect we are better off looking at things like this elsewhere, maybe where
> we do codegen for the reduction... Haven't looked at this at all yet.
> 
> Let me know if you believe this is a good approach? I've ran regression tests
> and this hasn't broken anything so far...

I don't think this is a good approach for what you gain and how 
necessarily limited it will be.  Similar to the recent experiment with
handling _Complex loads/stores this is much better tackled by lowering
things earlier (it will be lowered at RTL expansion time).

One place to do this experimentation would be to piggy-back on the
if-conversion pass so the lowering would happen only on the
vectorized code path.  Note that the desired lowering would look like
the following for reads:

  _1 = a.b;

to

  _2 = a.<representative for b>;
  _1 = BIT_FIELD_REF <2, ...>; // extract bits

and for writes:

  a.b = _1;

to

  _2 = a.<representative for b>;
  _3 = BIT_INSERT_EXPR <_2, _1, ...>; // insert bits
  a.<representative for b> = _3;

so you trade now handled loads/stores with not handled
BIT_FIELD_REF / BIT_INSERT_EXPR which you would then need to
pattern match to shifts and logical ops in the vectorizer.

There's a separate thing of actually promoting all uses, for
example

struct { long long x : 33; } a;

 a.a = a.a + 1;

will get you 33bit precision adds (for bit fields less than 32bits
they get promoted to int but not for larger bit fields).  RTL
expansion again will rewrite this into larger ops plus masking.

So I think the time is better spent in working on the lowering of
bitfield accesses, if sufficiently separated it could be used
from if-conversion by working on loop SEME regions.  The patches
doing previous implementations are probably not too useful anymore
(I find one from 2011 and one from 2016, both pre-dating BIT_INSERT_EXPR)

Richard.

Reply via email to