On Tue, Jun 4, 2019 at 5:26 PM Richard Biener <richard.guent...@gmail.com> wrote:
>
> On Fri, May 31, 2019 at 5:40 PM Andrew MacLeod <amacl...@redhat.com> wrote:
> >
> > On 5/29/19 7:15 AM, Richard Biener wrote:
> > > On Tue, May 28, 2019 at 4:17 PM Andrew MacLeod <amacl...@redhat.com>
> > > wrote:
> > >> On 5/27/19 9:02 AM, Richard Biener wrote:
> > >>> On Fri, May 24, 2019 at 5:50 PM Andrew MacLeod <amacl...@redhat.com>
> > >>> wrote:
> > >>>>> The above suggests that iff this is done at all it is not in GORI
> > >>>>> because those are not conditional stmts or ranges from feeding those.
> > >>>>> The machinery doing the use-def walking from stmt context also cannot
> > >>>>> come along these, so I have the suspicion that Ranger cannot handle
> > >>>>> telling us that for the stmt following above, for example
> > >>>>>
> > >>>>>   if (_5 != 0)
> > >>>>>
> > >>>>> that _5 is not zero?
> > >>>>>
> > >>>>> Can you clarify?
> > >>>> So there are 2 aspects to this.  The range-ops code for DIV_EXPR, if
> > >>>> asked for the range of op2 (), would return ~[0,0] for _5.
> > >>>> But you are also correct in that the walk backwards would not find
> > >>>> this.
> > >>>>
> > >>>> This is similar functionality to how null_derefs are currently
> > >>>> handled, and in fact could probably be done simultaneously using the
> > >>>> same code base.  I didn't bring null derefs up, but this is a good
> > >>>> time :-)
> > >>>>
> > >>>> There is a separate class used by the gori-cache which tracks the
> > >>>> non-nullness property at the block level.  It has a single API:
> > >>>> non_null_deref_p (name, bb), which determines whether there is a
> > >>>> dereference of NAME in BB, which indicates whether the range has an
> > >>>> implicit ~[0,0] range in that basic block or not.
> > >>> So when we then have
> > >>>
> > >>>   _1 = *_2; // after this _2 is non-NULL
> > >>>   _3 = _1 + 1; // _3 is non-NULL
> > >>>   _4 = *_3;
> > >>>   ...
> > >>>
> > >>> when an on-demand user asks whether _3 is non-NULL at the
> > >>> point of _4 = *_3 we don't have this information?  Since the
> > >>> per-BB caching will only say _1 is non-NULL after the BB.
> > >>> I'm also not sure whether _3 ever gets non-NULL during
> > >>> non-NULL processing of the block since walking immediate uses
> > >>> doesn't really help here?
> > >> Presumably _3 is globally non-null due to the definition being (pointer
> > >> + x) ... ie, _3 has a global range of ~[0,0] ?
> > > No, _3 is ~[0, 0] because it is derived from _1 which is ~[0, 0] and
> > > you cannot arrive at NULL by pointer arithmetic from a non-NULL pointer.
> >
> > I'm confused.
> >
> > _1 was loaded from _2 (thus asserting _2 is non-NULL).  But we have no
> > idea what the range of _1 is, so how do you assert _1 is ~[0,0] ?
>
> Bug on my part - it should offset _2:
>
>   _1 = *_2; // after this _2 is non-NULL
>   _3 = _2 + 1; // _3 is non-NULL
>   _4 = *_3;
>
> > The only way I see to determine _3 is non-NULL is through the _4 = *_3
> > statement.
>
> So how do you derive a ~[0,0] _2 and thus know that *_3 does not trap?
>
> > >
> > >>> So this seems to be a fundamental limitation [to the caching scheme],
> > >>> not sure if it is bad in practice.
> > >>>
> > >>> Or am I missing something?
> > >>>
> > >> Not missing anything.  The non-nullness property is maintained globally
> > >> at the basic block level.  Both _1 and _3 are flagged as being non-null
> > >> in the block.  Upon exit, it's a bit check.  If the global information
> > >> does not have the non-nullness property, then when a request is made
> > >> for non-nullness and the def and the use are both within the same
> > >> block, and it's flagged as being non-null in that block, then the
> > >> request is forced back to a quick walk between the def and the use to
> > >> see if there is any non-nullness introduced in between.  Yes, that
> > >> makes it a linear walk, but it's infrequent, and often short.
> > >> To the best of our knowledge
> > >> at this point anyway :-)
> > > So with the clarification above do we ever see that _3 is non-NULL?
> > > I suppose the worker processing _3 = _1 + 1 would ask for
> > > _1 non-nullness but we do not record any non-NULL-ness of _1 in
> > > this basic-block (but only at its end).  Consider stmts
> > >
> > >   _4 = (uintptr_t) _2;
> > >   _5 = _6 / _4;
> > >   _1 = *_2;
> > >   ...
> > >
> > > here at _1 we know _2 is not NULL.  But if we ask for non-NULLness
> > > of _2 at the definition of _4 we may not compute ~[0, 0] and thus
> > > conclude that _6 / _4 does not trap.
> > EVRP must look backwards to figure this out since the forward walk will
> > process _5 = _6 / _4 before it sees the dereference of _2... so how does
> > it know that _4 is non-zero without looking backwards at things after it
> > sees the dereference?  Does it actually do this?
>
> It actually doesn't do this for divisions (but I have a patch lying
> somewhere).
> During the forward walk when coming along _5 = _6 / _4; it would,
> in evrp_range_analyzer::record_ranges_from_stmt via infer_value_range,
> push a new value-range for _4.  To infer the same for _2 it would need to
> look backward, which I think it doesn't do right now but it easily could
> (I'm just trying to make up examples to understand ranger better).
> When visiting _1 = *_2 we then know _2 is not NULL.
>
> There may be more "useful" cases; for example we could derive
> ranges for shift operands based on undefinedness, and their range
> may be useful for simplifying adjacent stmts.
>
> I'm trying to discover how ranger apparently recording ranges only
> at BB boundaries interacts with ranges that become "live" at
> some point inside the BB.
>
> > >
> > > stmt-level tracking of ranges is sometimes important.  This is
> > > something the machinery cannot provide - correct?  At least not
> > > optimistically enough with ranges derived about uses.
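[To make the per-block scheme described above concrete, here is a toy sketch of the "cheap bit check, then linear walk" idea: a per-block summary records which names are dereferenced anywhere in the block, and an in-block query only walks statements when that bit is set.  Every structure here (Stmt, Block, NonNullTracker) is invented for illustration; the real gori-cache class works on GCC's bitmaps and gimple, not on these toys.]

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <set>
#include <string>
#include <vector>

// Toy statement: names the pointer it dereferences ("" if none).
struct Stmt { std::string deref; };

// Toy basic block: an ordered list of statements.
struct Block { std::vector<Stmt> stmts; };

// Per-block summary of the non-nullness property.
struct NonNullTracker
{
  std::map<const Block *, std::set<std::string> > derefed_in;

  void summarize (const Block &bb)
  {
    for (const Stmt &s : bb.stmts)
      if (!s.deref.empty ())
	derefed_in[&bb].insert (s.deref);
  }

  // The cheap bit check: is NAME known non-null on exit from BB?
  bool non_null_on_exit (const std::string &name, const Block &bb)
  {
    auto it = derefed_in.find (&bb);
    return it != derefed_in.end () && it->second.count (name) != 0;
  }

  // Is NAME known non-null just before statement index USE in BB?
  // Only when the per-block bit is set do we fall back to the linear
  // walk over the statements before USE.
  bool non_null_at (const std::string &name, const Block &bb, size_t use)
  {
    if (!non_null_on_exit (name, bb))
      return false;
    for (size_t i = 0; i < use; ++i)
      if (bb.stmts[i].deref == name)
	return true;
    return false;
  }
};
```

[The point of the two-level structure is exactly the trade-off discussed above: exits are answered by a bit check, and the linear walk only happens for in-block queries on names the summary already flags.]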
> >
> > Maybe I'm the one missing something, but in the absence of statement
> > level exception throwing via 'can_throw_non_call_exceptions' being true,
> > any assertion made anywhere in the block to an ssa_name applies to the
> > entire block, does it not?  ie, it doesn't matter if the dereference
> > happens first thing in the block or last thing, it's not going to change
> > its value within the block... it's going to be non-null throughout the
> > entire block.
>
> Well, one complication I had with the division by zero patch is that
> for _2 = _1 / _3 you can derive that _3 is not zero _after_ the stmt,
> but you may of course not assume it is not zero during evaluation
> of the stmt, because that may lead you to assume it may not trap
> and thus move it to a place it was not executed previously.
>
> > so if one statement in the block asserts that references to _2 are
> > non-null, we can assert that all references to _2 in the block are
> > non-null.  Meaning we get all these cases by knowing that the specified
> > name is non-zero throughout the block.  This also means we could know
> > things earlier in the block than a forward walk would provide.
> >
> > So with the 2 examples:
> >
> >   _1 = *_2; // after this _2 is non-NULL
> >   _3 = _1 + 1;
> >   _4 = *_3;
> >
> > both _2 and _3 are flagged as non-null in the block due to the
> > dereferences.  I'm not sure what we know about _1 from above, but we do
> > know ~[0,0] = _1 + 1, so whatever that tells you, if anything, about
> > _1 we know.  It seems to me _1 is ~[-1,-1] based on that...
> >
> > Regardless, I think we know all the same things EVRP does.
> >
> > Likewise
> >
> >   _4 = (uintptr_t) _2;
> >   _5 = _6 / _4;
> >   _1 = *_2;
> >
> > _2 will be non-null in the entire block, so _4 must also be non-null
> > and we can conclude that the divide does not trap.
>
> And move the divide up like so?
>
>   _4 = (uintptr_t) _2;
>   _5 = _6 / _4;
>   if (flag)
>     {
>       _1 = *_2;
>       ...
>     }
>
> most definitely not.
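[Richard's complication with the division can be sketched as a validity index on inferred facts: a fact derived from a statement ("_3 is nonzero because _2 = _1 / _3 did not trap") is only usable strictly *after* that statement, never at it.  The structures below are invented for illustration, not anything in GCC.]

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// A fact inferred from a statement, usable only for statements that
// execute strictly after it: using it at the statement itself could
// justify hoisting the division to a place it was never executed.
struct InferredNonZero
{
  std::string name;
  size_t from_stmt;   // index of the stmt the fact was inferred from
};

struct FactTable
{
  std::vector<InferredNonZero> facts;

  void record (const std::string &name, size_t stmt)
  {
    facts.push_back ({name, stmt});
  }

  // Is NAME known nonzero when evaluating statement AT?
  bool nonzero_at (const std::string &name, size_t at) const
  {
    for (const InferredNonZero &f : facts)
      if (f.name == name && at > f.from_stmt)
	return true;
    return false;
  }
};
```

[A per-block bit, by contrast, has no notion of "after stmt N", which is why the flat "non-null anywhere in the block" view and the stmt-ordered view can disagree about what a transform may assume.]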
>
> > now, when we set the 'can_throw_non_call_exceptions' flag, then we'd
> > have to resort to statement walks, and we cannot determine that _5 does
> > not trap anyway.  EVRP is in the same boat... it doesn't know it's not
> > going to trap either, because we may never get to the *_2.
>
> The point is not that we know *_2 does not trap - we don't!  We just know
> that if the program reaches the next stmt it did not, and thus _2 was not
> NULL _for the stmts following the dereference_.
>
> Note I'm not sure it will make a difference in practice since any
> transform on a stmt after *_2 relying on the above creates constraints on
> stmt ordering which we cannot represent in the IL, and thus the result is
> likely going to be very fragile.
>
> As said, I was just wondering if the non-NULL machinery you have
> integrates with the range machinery or is really independent
> (thus the pointer derived from the non-NULL one due to the dereference
> case above).
>
> > >>>> yes, compile-time complexity is from empirical speed timings and
> > >>>> theory-crafting from the algorithms, and that the on-entry cache
> > >>>> prevents multiple passes over things.
> > >>>>
> > >>>> we have not done a memory analysis yet, nor done anything to see if
> > >>>> we can improve it.
> > >>>> It makes very heavy use of bitmaps, which are typically quite sparse.
> > >>>> The on-entry cache is a vector of pointers to a range, initially 0,
> > >>>> and we allocate ranges as needed.  There will be an on-entry range
> > >>>> entry for any ssa-name which has been queried between the query point
> > >>>> and the definition.
> > >>> So that's similar to LIVE-on-entry, thus an SSA name with a range
> > >>> will have an on-entry range entry in each BB it dominates?
> > >>> That makes storage requirements quadratic in the worst case.
> > >> Yes, assuming a request has been made for all ssa-names everywhere
> > >> they are live.
> > > You did propose to replace [E]VRP with a walk over the whole function
> > > querying ranges and simplifying stmts, did you?
> > Yes, for the case of EVRP.  But not all use cases query every range
> > everywhere.
> >
> > It's also a bit lower than that since we cache certain types of ranges.
> > We cache a range for varying (or range for type if you prefer) of
> > whatever the ssa-name type is.  So if the range is varying everywhere,
> > we actually only have one instance of the range rather than N of them.
> > So any name that doesn't have a range reduction anywhere will only
> > create a single range instance for the entire CFG.
>
> But you still have a reference to the range in every BB dominated by the
> definition?
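[The sharing Andrew describes can be sketched as a per-type singleton for the varying range: every block that would cache "varying" for a name points at the same object, so a name whose range never improves costs one allocation for the whole CFG.  The names below are invented for the sketch and do not correspond to Ranger's actual classes.]

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>

// Toy range object; the only thing modeled is "varying" (the full
// range of the type).
struct Range
{
  std::string type;
  bool varying;
};

// Cache that hands out one shared Range per type for "varying".
struct RangeCache
{
  std::map<std::string, std::shared_ptr<Range> > varying_by_type;

  std::shared_ptr<Range> get_varying (const std::string &type)
  {
    std::shared_ptr<Range> &slot = varying_by_type[type];
    if (!slot)
      slot = std::make_shared<Range> (Range{type, true});
    return slot;
  }
};
```

[Note this only deduplicates the range objects; as Richard's follow-up question points out, each dominated block may still hold a pointer to the shared object, so the number of references can still grow with the CFG even when the number of Range allocations does not.]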
Btw, I was thinking of

unsigned char foo(long);
void baz(unsigned char c)
{
  try
    {
      long i = c;
      i += foo(i);
      i += foo(i);
      i += foo(i);
      i += foo(i);
      ... repeat N times ...
      alloca(i); // query range of i
    }
  catch (...)
    {
    }
}

where the single(!) query of the range of i on the alloca call will
populate N BBs' caches, with the Nth having ranges for all SSA defs of i,
running into quadraticness in N.  If you actually try, you probably will
have to find a persisting stmt that some user of on-demand query looks
for.  You converted some of the warning passes, so choose a stmt that
triggers it from there.

Note we've run into "interesting" testcases mostly from machine-generated
code which we still want to be able to optimize at -O1 at least with
reasonable use of time and memory (and quadraticnesses are to be avoided
like a plague - not that we don't have some).

> > I think that's the
> > most common value, so that should reduce a large number of them.  I've
> > also considered caching ranges like we cache tree constants... but I
> > haven't delved into that.  I figured if memory turns out to be a
> > problem, then we'll look at it then.
> >
> > Andrew
> >
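[The growth in that example can be counted with a back-of-envelope model: if (as hypothesized above) the k-th block's on-entry cache ends up holding a range for each of the k SSA defs of i created so far, a single query at the alloca touches 1 + 2 + ... + N entries.  This throwaway function just makes that sum explicit; it models the worry, not measured Ranger behavior.]

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical cache-entry count for the i += foo(i) chain: block k
// caches one on-entry range per def of 'i' live there, i.e. k entries.
static size_t
cache_entries_for_chain (size_t n)
{
  size_t total = 0;
  for (size_t k = 1; k <= n; ++k)
    total += k;          // block k contributes k on-entry ranges
  return total;          // n * (n + 1) / 2, quadratic in n
}
```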