On Tue, Aug 27, 2024 at 12:25 PM Richard Biener <richard.guent...@gmail.com> wrote:
> On Mon, Aug 26, 2024 at 5:26 PM Matevos Mehrabyan > <matevosmehrab...@gmail.com> wrote: > > > > > > > > On Mon, Aug 26, 2024 at 2:44 AM Jeff Law <jeffreya...@gmail.com> wrote: > >> > >> > >> > >> On 8/20/24 5:41 AM, Richard Biener wrote: > >> > >> > > >> > So the store-merging variant IIRC tracks a single overall source > >> > only (unless it was extended and I missed that) and operates at > >> > a byte granularity. I did want to extend it to support vector > shuffles > >> > at one point (with two sources then), but didn't get to it. The > >> > current implementation manages to be quite efficient - mainly due > >> > to the restriction to a single source I guess. > >> > > >> > How does that compare to the symbolic execution engine? > >> > > >> > What can the symbolic execution engine handle? The store-merging > >> > machinery can only handle plain copies, can the symbolic > >> > execution engine tell that for example bits 3-7 are bits 1-5 from A > >> > plus constant 9 with appropriately truncated result? > >> Conceptually this is the kind of thing it's supposed to handle, but > >> there may be implementation details that are missing for the case you > want. > >> > >> More importantly, the execution engine has a limited set of expressions > >> it knows how to evaluate, so there's a reasonable chance if you feed it > >> something more general than what is typically seen in a CRC loop that > >> it's going to give up because it doesn't know how to handle more than > >> just a few operators. > >> > >> > > > > By using this symbolic execution engine, you can determine that bits 3-7 > are bits 1-5 from A. > > I think the documentation will help others to understand how it works > and what it does. > > Since the documentation is not ready, here is a simple demo example: > > For the following code: > > > > foo(byte A) { > > byte tmp = A ^ 5; > > byte result = tmp << 2; > > result = result | 4; > > return result; > > } > > > > the symbolic executor would: > > > > define(A); // A = <A7, A6, A5, A4, A3, A2, A1, A0> > > // Here, each bit of A is mapped to its origin A. So A[3]->get_origin() > will return A. > > // Besides that, each bit has an index field that denotes its initial > position. > > // So A[3]->get_index() will return 3 even if it is moved or assigned to > another variable. > > xor(tmp, A, 5); // tmp = <A7 ^ 0, A6 ^ 0, A5 ^ 0, A4 ^ 0, A3 ^ 0, A2 ^ > 1, A1 ^ 0, A0 ^ 1> > > shift_left(result, tmp, 2); // result = <A5 ^ 0, A4 ^ 0, A3 ^ 0, A2 ^ > 1, A1 ^ 0, A0 ^ 1,0,0> > > or(result, result, 4); // result = <A5 ^ 0, A4 ^ 0, A3 ^ 0, A2 ^ 1, A1 > ^ 0, 1,0,0>, set result[2] = 1 > > > > After these operations, we can examine the result and see that bits 3-7 > of the result are 1-5 bits of the A argument. > > For example, result[4] is the (A2 ^ 1) xor expression (can be checked by > is_a<bit_xor_expression>), > > so it has left and right operands: one of them is the A2 symbolic bit, > and the other is the constant 1. > > So result[4]->get_left()->get_origin() will return A and > result[4]->get_left()->get_index() will return 2 > > as its initial bit position was that. > > > > The symbolic executor supports few operations, it may need to be > extended to use elsewhere. > > Supported operations: AND, OR, XOR, SHIFT_RIGHT, SHIFT_LEFT, ADD, SUB, > MUL, and COMPLEMENT. > > OK, so it seems it should be able to handle what the bswap pass > requires as well (just with unnecessary > bit precision and possibly some memory/compile-time overhead). The > bswap pass also handles > {L,R}ROTATE_EXPR but that should be trivially to add if you can handle > shifts. It can also handle > conversions (zero-/sign-extend and truncate), those should be easy as well. Yes, the support can be added. > Can it handle > > tmp = A & 0x00ff00ff00ff0000; > tmp2 = B & 0xff00ff00ff00ffff; > result = A | B; > > ? That is, make recognizing a blend (or as extension a shuffle) of > two sources into one? Yes, it can handle these kinds of cases too. In this case we will have tmp = <0, 0, A13, A12, 0, 0, A10, A9, 0, 0, A6, A5, 0,0,0,0> (Simple optimizations, such as A15 & 0 -> 0 are performed in place.) tmp2 = <B15, B14, 0, 0, B11, B10, 0, 0, B7, B6, 0, 0, B3, B2, B1, B0> result = <B15, B14, A13, A12, ...., B1, B0> > I would guess that parameterizing the engine on the granularity (byte > vs. bit) would be > possible to implement as well as possibly making the granularity > variable as to "split" > bits only when necessary? I'm thinking of the cost of simulating a > whole function "forward", > keeping a lattice of SSA name -> symbolic execution result here. Cost > in terms of > both memory and compile-time. > Yes, parameterizing the engine based on granularity (byte vs. bit) is possible, but it would require significant changes to the code. > Note it shouldn't be a requirement for you to merge the bswap > byte-tracking code but > it would be good to have the symbolic execution engine extensible > enough to eventually > cover what bswap does today and make the long-wanted extension of > recognizing > two-source vector permutes possible. > In the future, we are ready to add if needed. Best regards, Mariam Richard. > > >> > >> > >> > > >> > Note we should always have an eye on compile-time complexity, > >> > GCC does get used on multi-megabyte machine-generated sources > >> > that tend to look very uniform - variants with loops and bit > operations > >> > supported by symbolic execution would not catch me in surprise. > >> Which is why it's a two phase recognition. It uses simple tests to > >> filter out the vast majority of loops, leaving just a few that have a > >> good chance of being a CRC for the more expensive verification step > >> using the symbolic execution engine. > >> > >> jeff > >> > > > > Best Regards, > > Matevos. >