On Mon, Aug 26, 2024 at 2:44 AM Jeff Law <jeffreya...@gmail.com> wrote:

>
>
> On 8/20/24 5:41 AM, Richard Biener wrote:
>
> >
> > So the store-merging variant IIRC tracks a single overall source
> > only (unless it was extended and I missed that) and operates at
> > a byte granularity.  I did want to extend it to support vector shuffles
> > at one point (with two sources then), but didn't get to it.  The
> > current implementation manages to be quite efficient - mainly due
> > to the restriction to a single source I guess.
> >
> > How does that compare to the symbolic execution engine?
> >
> > What can the symbolic execution engine handle?  The store-merging
> > machinery can only handle plain copies, can the symbolic
> > execution engine tell that for example bits 3-7 are bits 1-5 from A
> > plus constant 9 with appropriately truncated result?
> Conceptually this is the kind of thing it's supposed to handle, but
> there may be implementation details that are missing for the case you want.
>
> More importantly, the execution engine has a limited set of expressions
> it knows how to evaluate, so there's a reasonable chance if you feed it
> something more general than what is typically seen in a CRC loop that
> it's going to give up because it doesn't know how to handle more than
> just a few operators.
>
>
>
By using this symbolic execution engine, you can determine that bits 3-7
are bits 1-5 from A.
I think the documentation will help others to understand how it works and
what it does.
Since the documentation is not ready, here is a simple demo example:
For the following code:

    foo(byte A) {
        byte tmp = A ^ 5;
        byte result = tmp << 2;
        result = result | 4;
        return result;
    }

the symbolic executor would:

define(A);  // A = <A7, A6, A5, A4, A3, A2, A1, A0>
// Here, each bit of A is mapped to its origin A. So
A[3]->get_origin() will return A.
// Besides that, each bit has an index field that denotes its initial
position.
// So A[3]->get_index() will return 3 even if it is moved or assigned to
another variable.
xor(tmp, A, 5);  // tmp = <A7 ^ 0, A6 ^ 0, A5 ^ 0, A4 ^ 0, A3 ^ 0, A2 ^ 1, A
1 ^ 0, A0 ^ 1>
shift_left(result, tmp, 2);  // result = <A5 ^ 0, A4 ^ 0, A3 ^ 0, A2 ^ 1, A
1 ^ 0, A0 ^ 1,0,0>
or(result, result, 4);  // result = <A5 ^ 0, A4 ^ 0, A3 ^ 0, A2 ^ 1, A1 ^
0, 1,0,0>, set result[2] = 1

After these operations, we can examine the result and see that bits 3-7 of
the result are 1-5 bits of the A argument.
For example, result[4] is the (A2 ^ 1) xor expression (can be checked by
is_a<bit_xor_expression>),
so it has left and right operands: one of them is the A2 symbolic bit, and
the other is the constant 1.
So result[4]->get_left()->get_origin() will return A and
result[4]->get_left()->get_index() will return 2
as its initial bit position was that.

The symbolic executor supports few operations, it may need to be extended
to use elsewhere.
Supported operations: AND, OR, XOR, SHIFT_RIGHT, SHIFT_LEFT, ADD, SUB, MUL,
and COMPLEMENT.


>
> >
> > Note we should always have an eye on compile-time complexity,
> > GCC does get used on multi-megabyte machine-generated sources
> > that tend to look very uniform - variants with loops and bit operations
> > supported by symbolic execution would not catch me in surprise.
> Which is why it's a two phase recognition.  It uses simple tests to
> filter out the vast majority of loops, leaving just a few that have a
> good chance of being a CRC for the more expensive verification step
> using the symbolic execution engine.
>
> jeff
>
>
Best Regards,
Matevos.

Reply via email to