On Tue, Aug 27, 2024 at 12:25 PM Richard Biener <richard.guent...@gmail.com>
wrote:

> On Mon, Aug 26, 2024 at 5:26 PM Matevos Mehrabyan
> <matevosmehrab...@gmail.com> wrote:
> >
> >
> >
> > On Mon, Aug 26, 2024 at 2:44 AM Jeff Law <jeffreya...@gmail.com> wrote:
> >>
> >>
> >>
> >> On 8/20/24 5:41 AM, Richard Biener wrote:
> >>
> >> >
> >> > So the store-merging variant IIRC tracks a single overall source
> >> > only (unless it was extended and I missed that) and operates at
> >> > a byte granularity.  I did want to extend it to support vector
> shuffles
> >> > at one point (with two sources then), but didn't get to it.  The
> >> > current implementation manages to be quite efficient - mainly due
> >> > to the restriction to a single source I guess.
> >> >
> >> > How does that compare to the symbolic execution engine?
> >> >
> >> > What can the symbolic execution engine handle?  The store-merging
> >> > machinery can only handle plain copies, can the symbolic
> >> > execution engine tell that for example bits 3-7 are bits 1-5 from A
> >> > plus constant 9 with appropriately truncated result?
> >> Conceptually this is the kind of thing it's supposed to handle, but
> >> there may be implementation details that are missing for the case you
> want.
> >>
> >> More importantly, the execution engine has a limited set of expressions
> >> it knows how to evaluate, so there's a reasonable chance if you feed it
> >> something more general than what is typically seen in a CRC loop that
> >> it's going to give up because it doesn't know how to handle more than
> >> just a few operators.
> >>
> >>
> >
> > By using this symbolic execution engine, you can determine that bits 3-7
> are bits 1-5 from A.
> > I think the documentation will help others to understand how it works
> and what it does.
> > Since the documentation is not ready, here is a simple demo example:
> > For the following code:
> >
> >     foo(byte A) {
> >         byte tmp = A ^ 5;
> >         byte result = tmp << 2;
> >         result = result | 4;
> >         return result;
> >     }
> >
> > the symbolic executor would:
> >
> > define(A);  // A = <A7, A6, A5, A4, A3, A2, A1, A0>
> > // Here, each bit of A is mapped to its origin A. So A[3]->get_origin()
> will return A.
> > // Besides that, each bit has an index field that denotes its initial
> position.
> > // So A[3]->get_index() will return 3 even if it is moved or assigned to
> another variable.
> > xor(tmp, A, 5);  // tmp = <A7 ^ 0, A6 ^ 0, A5 ^ 0, A4 ^ 0, A3 ^ 0, A2 ^
> 1, A1 ^ 0, A0 ^ 1>
> > shift_left(result, tmp, 2);  // result = <A5 ^ 0, A4 ^ 0, A3 ^ 0, A2 ^
> 1, A1 ^ 0, A0 ^ 1,0,0>
> > or(result, result, 4);  // result = <A5 ^ 0, A4 ^ 0, A3 ^ 0, A2 ^ 1, A1
> ^ 0, 1,0,0>, set result[2] = 1
> >
> > After these operations, we can examine the result and see that bits 3-7
> of the result are 1-5 bits of the A argument.
> > For example, result[4] is the (A2 ^ 1) xor expression (can be checked by
> is_a<bit_xor_expression>),
> > so it has left and right operands: one of them is the A2 symbolic bit,
> and the other is the constant 1.
> > So result[4]->get_left()->get_origin() will return A and
> result[4]->get_left()->get_index() will return 2
> > as its initial bit position was that.
> >
> > The symbolic executor supports few operations, it may need to be
> extended to use elsewhere.
> > Supported operations: AND, OR, XOR, SHIFT_RIGHT, SHIFT_LEFT, ADD, SUB,
> MUL, and COMPLEMENT.
>
> OK, so it seems it should be able to handle what the bswap pass
> requires as well (just with unnecessary
> bit precision and possibly some memory/compile-time overhead).  The
> bswap pass also handles
> {L,R}ROTATE_EXPR but that should be trivially to add if you can handle
> shifts.  It can also handle
> conversions (zero-/sign-extend and truncate), those should be easy as well.


Yes, the support can be added.



>

Can it handle
>
>   tmp = A & 0x00ff00ff00ff0000;
>   tmp2 = B & 0xff00ff00ff00ffff;
>   result = A | B;
>
> ?  That is, make recognizing a blend (or as extension a shuffle) of
> two sources into one?


Yes, it can handle these kinds of cases too.
In this case we will have
tmp = <0, 0, A13, A12, 0, 0, A10, A9, 0, 0, A6, A5, 0,0,0,0> (Simple
optimizations, such as A15 & 0 -> 0 are performed in place.)
tmp2 = <B15, B14, 0, 0, B11, B10, 0, 0, B7, B6, 0, 0, B3, B2, B1, B0>
result = <B15, B14, A13, A12, ...., B1, B0>


>
I would guess that parameterizing the engine on the granularity (byte
> vs. bit) would be
> possible to implement as well as possibly making the granularity
> variable as to "split"
> bits only when necessary?  I'm thinking of the cost of simulating a
> whole function "forward",
> keeping a lattice of SSA name -> symbolic execution result here.  Cost
> in terms of
> both memory and compile-time.
>

Yes, parameterizing the engine based on granularity (byte vs. bit) is
possible, but it would require significant changes to the code.



> Note it shouldn't be a requirement for you to merge the bswap
> byte-tracking code but
> it would be good to have the symbolic execution engine extensible
> enough to eventually
> cover what bswap does today and make the long-wanted extension of
> recognizing
> two-source vector permutes possible.
>

In the future, we are ready to add if needed.

Best regards,
Mariam


Richard.
>
> >>
> >>
> >> >
> >> > Note we should always have an eye on compile-time complexity,
> >> > GCC does get used on multi-megabyte machine-generated sources
> >> > that tend to look very uniform - variants with loops and bit
> operations
> >> > supported by symbolic execution would not catch me in surprise.
> >> Which is why it's a two phase recognition.  It uses simple tests to
> >> filter out the vast majority of loops, leaving just a few that have a
> >> good chance of being a CRC for the more expensive verification step
> >> using the symbolic execution engine.
> >>
> >> jeff
> >>
> >
> > Best Regards,
> > Matevos.
>

Reply via email to