On 11/6/20 2:54 PM, Andrew MacLeod wrote:
On 11/6/20 11:13 AM, Martin Sebor wrote:On 11/5/20 8:08 PM, Andrew MacLeod wrote:On 11/5/20 7:50 PM, Martin Sebor wrote:That seems like somethings that the range code for builtins should take care of rather than be calculated externally and injectedOn 11/5/20 5:02 PM, Andrew MacLeod wrote:On 11/5/20 4:02 PM, Martin Sebor wrote:On 11/5/20 12:29 PM, Martin Sebor wrote:On 10/1/20 11:25 AM, Martin Sebor wrote:On 10/1/20 9:34 AM, Aldy Hernandez wrote:On 10/1/20 3:22 PM, Andrew MacLeod wrote: > On 10/1/20 5:05 AM, Aldy Hernandez via Gcc-patches wrote:>>> Thanks for doing all this! There isn't anything I don't understand >>> in the sprintf changes so no questions from me (well, almost none).>>> Just some comments: >> Thanks for your comments on the sprintf/strlen API conversion. >>>>> The current call statement is available in all functions that take >>> a directive argument, as dir->info.callstmt. There should be no need >>> to also add it as a new argument to the functions that now need it.>> Fixed. >> >>> The change adds code along these lines in a bunch of places: >>> >>> + value_range vr; >>> + if (!query->range_of_expr (vr, arg, stmt)) >>> + vr.set_varying (TREE_TYPE (arg)); >>> >>> I thought under the new Ranger APIs when a range couldn't be >>> determined it would be automatically set to the maximum for >>> the type. I like that and have been moving in that direction >>> with my code myself (rather than having an API fail, have it >>> set the max range and succeed).>> I went through all the above idioms and noticed all are being used on >> supported types (integers or pointers). So range_of_expr will always>> return true. I've removed the if() and the set_varying. >>>>> Since that isn't so in this case, I think it would still be nice >>> if the added code could be written as if the range were set to >>> varying in this case and (ideally) reduced to just initialization:>>> >>> value_range vr = some-function (query, stmt, arg); >>>>>> some-function could be an inline helper defined just for the sprintf >>> pass (and maybe also strlen which also seems to use the same pattern), >>> or it could be a value_range AKA irange ctor, or it could be a member>>> of range_query, whatever is the most appropriate. >>>>>> (If assigning/copying a value_range is thought to be too expensive, >>> declaring it first and then passing it to that helper to set it>>> would work too). >>>>>> In strlen, is the removed comment no longer relevant? (I.e., does>>> the ranger solve the problem?) >>>>>> - /* The range below may be "inaccurate" if a constant has been >>> - substituted earlier for VAL by this pass that hasn't been >>> - propagated through the CFG. This shoud be fixed by the new >>> - on-demand VRP if/when it becomes available (hopefully in>>> - GCC 11). */ >> It should. >>>>> I'm wondering about the comment added to get_range_strlen_dynamic>>> and other places: >>> >>> + // FIXME: Use range_query instead of global ranges. >>>>>> Is that something you're planning to do in a followup or should>>> I remember to do it at some point?>> I'm not planning on doing it. It's just a reminder that it would be>> beneficial to do so. >> >>> Otherwise I have no concern with the changes.>> It's not cleared whether Andrew approved all 3 parts of the patchset >> or just the valuation part. I'll wait for his nod before committing>> this chunk. >> >> Aldy >> > I have no issue with it, so OK. Pushed all 3 patches. >> Just an observation that should be pointed out, I believe Aldy has all > the code for converting to a ranger, but we have not pursued that any > further yet since there is a regression due to our lack of equivalence > processing I think? That should be resolved in the coming month, but at > the moment is a holdback/concern for converting these passes... iirc.Yes. Martin, the take away here is that the strlen/sprintf pass has been converted to the new API, but ranger is still not up and running on it (even on the branch).With the new API, all you have to do is remove all instances of evrp_range_analyzer and replace them with a ranger. That's it. Below is an untested patch that would convert you to a ranger once it's contributed.IIRC when I enabled the ranger for your pass a while back, there was one or two regressions due to missing equivalences, and the rest were because the tests were expecting an actual specific range, and the ranger returned a slightly different/better one. You'll need to adjust your tests.Ack. I'll be on the lookout for the ranger commit (if you hppen to remember and CC me on it just in case I might miss it that would be great).I have applied the patch and ran some tests. There are quite a few failures (see the list below). I have only looked at a couple. The one in in gcc.dg/tree-ssa/builtin-sprintf-warn-3.c boils down to the following test case. There should be no warning for either sprintf call. The one in h() is a false positive and the reason for at least some of the regressions. Somehow, the conversions between int and char are causing Ranger to lose the range. $ cat t.c && gcc -O2 -S -Wall t.c char a[2]; extern int x; signed char f (int min, int max) { signed char i = x; return i < min || max < i ? min : i; } void ff (signed char i) { __builtin_sprintf (a, "%i", f (0, 9)); // okay } signed char g (signed char min, signed char max) { signed char i = x; return i < min || max < i ? min : i; } void gg (void) { __builtin_sprintf (a, "%i", g (0, 9)); // bogus warning } t.c: In function ‘gg’:t.c:24:26: warning: ‘%i’ directive writing between 1 and 4 bytes into a region of size 2 [-Wformat-overflow=]24 | __builtin_sprintf (a, "%i", g (0, 9)); // bogus warning | ^~ t.c:24:25: note: directive argument in the range [-128, 127] 24 | __builtin_sprintf (a, "%i", g (0, 9)); // bogus warning | ^~~~t.c:24:3: note: ‘__builtin_sprintf’ output between 2 and 5 bytes into a destination of size 224 | __builtin_sprintf (a, "%i", g (0, 9)); // bogus warning | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Another failure (the one in builtin-sprintf-warn-22.c) is this where the false positive is due to the strlen pass somehow having lost the upper bound on the sum of the two string lengths.I'm guessing this might have something to do with the strlen pass calling set_range_info() on the nonconstant results of strlen calls it can determine the range for. How would I go about doing it with ranger? I see ranger_cache::set_global_range() and the class ctor takes a gimple_ranger as argument. Would calling the function be the right way to do it? (I couldn't find anything on the Wiki.)I haven't had time to write a new modern wiki yet... probably won't until well after stage 1 closes either.And no, thats the internal cache of the ranger, which you dont have access to.At the moment, there isnt a way to inject a "new" global range out of the blue. It hasnt been a requirement before, but wouldn't be particularly difficult.Short answer, there is no current equivalent of set_range_info() because we dont have a persistent global cache in the same way, and the ranger doesnt set the RANGE_INFO field at this point.. we havent gotten to the point of reworking that yet.So the question is, what exactly are you trying to do? I presume you are analyzing a strlen statement, and are making range conclusions about either the result or some of the operands are that are not otherwise exposed in the code?And you are looking to have those values injected into the rangers calculations as refinements?Yes, exactly. This is used both for optimization and for warnings (both to enable them and to suppress false positives). Besides folding strlen and sprintf calls into constants, the strlen pass sets the range of the results of the calls when the lengths of the arguments/output are known to be at least some number of bytes. Besides exposing optimization opportunities downstream, the pass uses this information to either issue or suppress warnings like in the sprintf case above. -Wformat-overflow issues warnings either based on string lengths (or ranges) if they're known, or based on the sizes of the arrays the strings are stored in when nothing is known about the lengths. The array size is used as the worst case estimate, which can cause false positives. Here's an example of a downstream optimization made possible by this. Because i's range is known, the range of the snprintf result is also known (the range is computed for all snprintf arguments). The strlen pass sets the range and subsequent passes eliminate the unreachable code. void f (int i) { if (i < 100 || 9999 < i) i = 100; int n = __builtin_snprintf (0, 0, "%i", i); if (n < 3 || 4 < n) __builtin_abort (); } Martinthey are just like any other builtins that if the range of certain arguments are known, you can produce a result?The strlen optimization is more involved in that it keeps track of the state of the strings as they're being created by calls to string functions like strcpy (or single and multi-byte stores). So the only way for Ranger to figure out the range of the result of a strlen call is to query the strlen pass. That might work via a callback but it would of course only work while the strlen pass is running. The same goes for the sprintf results (which are based on the strlen data). It might be good enough as long as the ranges can also be exported to downstream passes. It might be possible to compute this on demand too (and introduce something like a strlen_range_query) but it would need reworking the strlen pass. I like the on-demand design but in the strlen case I'm not sure the benefits would justify the costs.I gather the range of i being known to be [100,9999] means we know something about n. The the ranger would be able to do that anywhere and everywhere, not just within this pass. One of the ranger goals is to centralize all range trackingand the warning pass can continue to check the ranges and issue warnings when the ranges arent what it likes.In this simple case, yes. But consider a more involved example: char a[8]; void f (int i) { if (i < 100 || 9999 < i) i = 100; memcpy (a, "123", 3); // strlen(a) is in [3, 7] int n = snprintf (0, 0, "%s %i", a, i); // n is in [7, 12] if (n < 7 || 12 < n) // folded to false abort (); } To compute n's range you need to know not only that of i but also the length of a (or its range in this case). That's what the strlen optimization enables.Thats why we need precise examples :-)Ive attached a thouroughly untested patch which you can apply to the latest trunk stuff I've checked in and try. It gives you an entry point into the ranger:bool import_global_range (tree name, irange &r); NAME is the ssa_name you are updating the global value for, andR is the new range you wish to add. Note it is not a replacement, but rather an intergration. ie, and extra source of information the ranger can't see.you can call it with your newly calculated value,and it will enhance the value the ranger already has with the extra information.. so above, you would call it withranger->import_global_range (n_6, [7,12])this will set the range for n_6 to the INTERSECTION of [7,12] and whatever the ranger already knows. If its varying, then it'll set it to [7,12]. If the ranger had already figured it was [9,20] for some reason, then the result will be [9,12] aftewr this call, and it will reurn FALSE indicating the new range is different than the range you passed in. if you care.It should also push the range of n_6 forward into pother blocks, so on entry to the block with the abort(), it should have a range of [], indicating it can't eb reached.There are a couple of caveats.1) It may or may not affect other dependent value at this point depending on the order you visit things since the propagation of dependency chains is not complete yet. ie if the abort() block hash_4 = n_6 + 2g_2 = h_4 + 6 , g_2 may not be reevaluated with the new range... It might.. but I cant guarantee it just now.2) If you wish this value to be exported into SSA_NAME_RANGE_INFO, you will have to continue doing that manually. The ranger cache is not currently ever exported the RANGE_INFO fields. We've run into numerous issues with "differences": that occur when a multi-range is converted to a value_range and although the range is correct, it may be different that the orignal and produce different results. We are waiting until we replace the SSA_NAME_RANGE_INFO mechanism.Anyway, let me known if this works. I havent had a chance to test it yet. If it works we can add it to the full API
Thanks a lot! I'll try to get to it sometime this week if I can. Martin