On Mon, Aug 26, 2024 at 2:44 AM Jeff Law <jeffreya...@gmail.com> wrote:
> > > On 8/20/24 5:41 AM, Richard Biener wrote: > > > > > So the store-merging variant IIRC tracks a single overall source > > only (unless it was extended and I missed that) and operates at > > a byte granularity. I did want to extend it to support vector shuffles > > at one point (with two sources then), but didn't get to it. The > > current implementation manages to be quite efficient - mainly due > > to the restriction to a single source I guess. > > > > How does that compare to the symbolic execution engine? > > > > What can the symbolic execution engine handle? The store-merging > > machinery can only handle plain copies, can the symbolic > > execution engine tell that for example bits 3-7 are bits 1-5 from A > > plus constant 9 with appropriately truncated result? > Conceptually this is the kind of thing it's supposed to handle, but > there may be implementation details that are missing for the case you want. > > More importantly, the execution engine has a limited set of expressions > it knows how to evaluate, so there's a reasonable chance if you feed it > something more general than what is typically seen in a CRC loop that > it's going to give up because it doesn't know how to handle more than > just a few operators. > > > By using this symbolic execution engine, you can determine that bits 3-7 are bits 1-5 from A. I think the documentation will help others to understand how it works and what it does. Since the documentation is not ready, here is a simple demo example: For the following code: foo(byte A) { byte tmp = A ^ 5; byte result = tmp << 2; result = result | 4; return result; } the symbolic executor would: define(A); // A = <A7, A6, A5, A4, A3, A2, A1, A0> // Here, each bit of A is mapped to its origin A. So A[3]->get_origin() will return A. // Besides that, each bit has an index field that denotes its initial position. // So A[3]->get_index() will return 3 even if it is moved or assigned to another variable. xor(tmp, A, 5); // tmp = <A7 ^ 0, A6 ^ 0, A5 ^ 0, A4 ^ 0, A3 ^ 0, A2 ^ 1, A 1 ^ 0, A0 ^ 1> shift_left(result, tmp, 2); // result = <A5 ^ 0, A4 ^ 0, A3 ^ 0, A2 ^ 1, A 1 ^ 0, A0 ^ 1,0,0> or(result, result, 4); // result = <A5 ^ 0, A4 ^ 0, A3 ^ 0, A2 ^ 1, A1 ^ 0, 1,0,0>, set result[2] = 1 After these operations, we can examine the result and see that bits 3-7 of the result are 1-5 bits of the A argument. For example, result[4] is the (A2 ^ 1) xor expression (can be checked by is_a<bit_xor_expression>), so it has left and right operands: one of them is the A2 symbolic bit, and the other is the constant 1. So result[4]->get_left()->get_origin() will return A and result[4]->get_left()->get_index() will return 2 as its initial bit position was that. The symbolic executor supports few operations, it may need to be extended to use elsewhere. Supported operations: AND, OR, XOR, SHIFT_RIGHT, SHIFT_LEFT, ADD, SUB, MUL, and COMPLEMENT. > > > > > Note we should always have an eye on compile-time complexity, > > GCC does get used on multi-megabyte machine-generated sources > > that tend to look very uniform - variants with loops and bit operations > > supported by symbolic execution would not catch me in surprise. > Which is why it's a two phase recognition. It uses simple tests to > filter out the vast majority of loops, leaving just a few that have a > good chance of being a CRC for the more expensive verification step > using the symbolic execution engine. > > jeff > > Best Regards, Matevos.