On Mon, Aug 26, 2024 at 5:26 PM Matevos Mehrabyan
<matevosmehrab...@gmail.com> wrote:
>
>
>
> On Mon, Aug 26, 2024 at 2:44 AM Jeff Law <jeffreya...@gmail.com> wrote:
>>
>>
>>
>> On 8/20/24 5:41 AM, Richard Biener wrote:
>>
>> >
>> > So the store-merging variant IIRC tracks a single overall source
>> > only (unless it was extended and I missed that) and operates at
>> > a byte granularity.  I did want to extend it to support vector shuffles
>> > at one point (with two sources then), but didn't get to it.  The
>> > current implementation manages to be quite efficient - mainly due
>> > to the restriction to a single source I guess.
>> >
>> > How does that compare to the symbolic execution engine?
>> >
>> > What can the symbolic execution engine handle?  The store-merging
>> > machinery can only handle plain copies, can the symbolic
>> > execution engine tell that for example bits 3-7 are bits 1-5 from A
>> > plus constant 9 with appropriately truncated result?
>> Conceptually this is the kind of thing it's supposed to handle, but
>> there may be implementation details that are missing for the case you want.
>>
>> More importantly, the execution engine has a limited set of expressions
>> it knows how to evaluate, so there's a reasonable chance if you feed it
>> something more general than what is typically seen in a CRC loop that
>> it's going to give up because it doesn't know how to handle more than
>> just a few operators.
>>
>>
>
> By using this symbolic execution engine, you can determine that bits 3-7 are 
> bits 1-5 from A.
> I think the documentation will help others to understand how it works and 
> what it does.
> Since the documentation is not ready, here is a simple demo example:
> For the following code:
>
>     foo(byte A) {
>         byte tmp = A ^ 5;
>         byte result = tmp << 2;
>         result = result | 4;
>         return result;
>     }
>
> the symbolic executor would:
>
> define(A);  // A = <A7, A6, A5, A4, A3, A2, A1, A0>
> // Here, each bit of A is mapped to its origin A. So A[3]->get_origin() will 
> return A.
> // Besides that, each bit has an index field that denotes its initial 
> position.
> // So A[3]->get_index() will return 3 even if it is moved or assigned to 
> another variable.
> xor(tmp, A, 5);  // tmp = <A7 ^ 0, A6 ^ 0, A5 ^ 0, A4 ^ 0, A3 ^ 0, A2 ^ 1, A1 
> ^ 0, A0 ^ 1>
> shift_left(result, tmp, 2);  // result = <A5 ^ 0, A4 ^ 0, A3 ^ 0, A2 ^ 1, A1 
> ^ 0, A0 ^ 1,0,0>
> or(result, result, 4);  // result = <A5 ^ 0, A4 ^ 0, A3 ^ 0, A2 ^ 1, A1 ^ 0, 
> 1,0,0>, set result[2] = 1
>
> After these operations, we can examine the result and see that bits 3-7 of 
> the result are 1-5 bits of the A argument.
> For example, result[4] is the (A2 ^ 1) xor expression (can be checked by 
> is_a<bit_xor_expression>),
> so it has left and right operands: one of them is the A2 symbolic bit, and 
> the other is the constant 1.
> So result[4]->get_left()->get_origin() will return A and 
> result[4]->get_left()->get_index() will return 2
> as its initial bit position was that.
>
> The symbolic executor supports few operations, it may need to be extended to 
> use elsewhere.
> Supported operations: AND, OR, XOR, SHIFT_RIGHT, SHIFT_LEFT, ADD, SUB, MUL, 
> and COMPLEMENT.

OK, so it seems it should be able to handle what the bswap pass
requires as well (just with unnecessary
bit precision and possibly some memory/compile-time overhead).  The
bswap pass also handles
{L,R}ROTATE_EXPR but that should be trivially to add if you can handle
shifts.  It can also handle
conversions (zero-/sign-extend and truncate), those should be easy as well.

Can it handle

  tmp = A & 0x00ff00ff00ff0000;
  tmp2 = B & 0xff00ff00ff00ffff;
  result = A | B;

?  That is, make recognizing a blend (or as extension a shuffle) of
two sources into one?

I would guess that parameterizing the engine on the granularity (byte
vs. bit) would be
possible to implement as well as possibly making the granularity
variable as to "split"
bits only when necessary?  I'm thinking of the cost of simulating a
whole function "forward",
keeping a lattice of SSA name -> symbolic execution result here.  Cost
in terms of
both memory and compile-time.

Note it shouldn't be a requirement for you to merge the bswap
byte-tracking code but
it would be good to have the symbolic execution engine extensible
enough to eventually
cover what bswap does today and make the long-wanted extension of recognizing
two-source vector permutes possible.

Richard.

>>
>>
>> >
>> > Note we should always have an eye on compile-time complexity,
>> > GCC does get used on multi-megabyte machine-generated sources
>> > that tend to look very uniform - variants with loops and bit operations
>> > supported by symbolic execution would not catch me in surprise.
>> Which is why it's a two phase recognition.  It uses simple tests to
>> filter out the vast majority of loops, leaving just a few that have a
>> good chance of being a CRC for the more expensive verification step
>> using the symbolic execution engine.
>>
>> jeff
>>
>
> Best Regards,
> Matevos.

Reply via email to