On Tue, Dec 8, 2020 at 8:54 PM Qing Zhao <qing.z...@oracle.com> wrote:
>
> On Dec 8, 2020, at 1:40 AM, Richard Biener <richard.guent...@gmail.com> wrote:
>
> On Mon, Dec 7, 2020 at 5:20 PM Qing Zhao <qing.z...@oracle.com> wrote:
>
> On Dec 7, 2020, at 1:12 AM, Richard Biener <richard.guent...@gmail.com> wrote:
>
> On Fri, Dec 4, 2020 at 5:19 PM Qing Zhao <qing.z...@oracle.com> wrote:
>
> On Dec 4, 2020, at 2:50 AM, Richard Biener <richard.guent...@gmail.com> wrote:
>
> On Thu, Dec 3, 2020 at 6:33 PM Richard Sandiford <richard.sandif...@arm.com> wrote:
>
> Richard Biener via Gcc-patches <gcc-patches@gcc.gnu.org> writes:
>
> On Tue, Nov 24, 2020 at 4:47 PM Qing Zhao <qing.z...@oracle.com> wrote:
>
> Another issue is: in order to check whether an auto variable has an
> initializer, I plan to add a new bit in "decl_common" as:
>
>   /* In a VAR_DECL, this is DECL_IS_INITIALIZED.  */
>   unsigned decl_is_initialized : 1;
>
>   /* In a VAR_DECL, set when the decl is initialized at the declaration.  */
>   #define DECL_IS_INITIALIZED(NODE) \
>     (DECL_COMMON_CHECK (NODE)->decl_common.decl_is_initialized)
>
> Set this bit when setting DECL_INITIAL for the variables in the FE, then
> keep it even though DECL_INITIAL might later be NULLed.
>
> For locals it would be more reliable to set this flag during gimplification.
>
> Do you have any comments or suggestions?
>
> As said above - do you want to cover registers as well as locals?  I'd do
> the actual zeroing during RTL expansion instead, since otherwise you have
> to figure out yourself whether a local is actually used (see
> expand_stack_vars).
>
> Note that optimization will already have made use of the "uninitialized"
> state of locals, so depending on what the actual goal is here, "late" may
> be too late.
> Haven't thought about this much, so it might be a daft idea, but would a
> compromise be to use a const internal function:
>
>   X1 = .DEFERRED_INIT (X0, INIT)
>
> where the X0 argument is an uninitialised value and the INIT argument
> describes the initialisation pattern?  So for a decl we'd have:
>
>   X = .DEFERRED_INIT (X, INIT)
>
> and for an SSA name we'd have:
>
>   X_2 = .DEFERRED_INIT (X_1(D), INIT)
>
> with all other uses of X_1(D) being replaced by X_2.  The idea is that:
>
> * Having the X0 argument would keep the uninitialised use of the
>   variable around for the later warning passes.
>
> * Using a const function should still allow the UB to be deleted as dead
>   if X1 isn't needed.
>
> * Having a function in the way should stop passes from taking advantage
>   of direct uninitialised uses for optimisation.
>
> This means we won't be able to optimise based on the actual init value
> at the gimple level, but that seems like a fair trade-off.  AIUI this is
> really a security feature or anti-UB hardening feature (in the sense
> that users are more likely to see predictable behaviour "in the field"
> even if the program has UB).
>
> The question is whether it's in line with people's expectations that
> explicitly zero-initialized code behaves differently from implicitly
> zero-initialized code with respect to optimization and secondary
> side-effects (late diagnostics, latent bugs, etc.).
>
> Introducing a new concept like .DEFERRED_INIT is much more heavy-weight
> than an explicit zero initializer.
>
> What exactly do you mean by "heavy-weight"?  More difficult to
> implement, much more run-time overhead, or both?  Or something else?
>
> The major benefit of the ".DEFERRED_INIT" approach is that it enables us
> to keep the current -Wuninitialized analysis untouched and also pass the
> "uninitialized" info from the source code level down to "pass_expand".
>
> Well, "untouched" is a bit oversimplified.
> You do need to handle .DEFERRED_INIT as not being an initialization,
> which will definitely get interesting.
>
> Yes, during the uninitialized variable analysis pass we should specially
> handle the defs with ".DEFERRED_INIT" and treat them as uninitialized.
>
> If we want to keep the current -Wuninitialized analysis untouched, this
> is a quite reasonable approach.
>
> However, if it's not required to keep the current -Wuninitialized
> analysis untouched, adding the zero-initializer directly during
> gimplification should be much easier and simpler, with a smaller
> run-time overhead as well.
>
> As for optimization, I fear you'll get a load of redundant zero-inits
> actually emitted if you just rely on RTL DSE/DCE to remove them.
>
> Runtime overhead for -fauto-init=zero is one important consideration for
> the whole feature; we should minimize the runtime overhead for zero
> initialization since it will be used in production builds.  We can do
> some run-time performance evaluation when we have an implementation
> ready.
>
> Note there will be other passes "confused" by .DEFERRED_INIT.  Note that
> there are going to be other considerations - namely where to emit the
> .DEFERRED_INIT.  When emitting it during gimplification you can emit it
> at the start of the block for block-scope variables.  When emitting it
> after gimplification you have to emit it at function start, which will
> probably make stack slot sharing inefficient because the deferred init
> will cause overlapping lifetimes.  With emitting at the block boundary,
> the .DEFERRED_INIT will act as a code-motion barrier (and it itself
> likely cannot be moved), so for example invariant motion will no longer
> happen.  Likewise, optimizations like SRA will be confused by
> .DEFERRED_INIT, which again will lead to bigger stack usage (and less
> optimization).
>
> Yes, it looks like the inserted ".DEFERRED_INIT" calls will negatively
> impact tree optimizations.
> But sure, you can try to implement a few variants, but .DEFERRED_INIT
> will definitely be the most work.
>
> How about implementing the following two approaches and comparing the
> run-time cost:
>
> A. Insert the real initialization during the gimplification phase.
> B. Insert the .DEFERRED_INIT during the gimplification phase, and then
>    expand this call to the real initialization during the expand phase.
>
> Approach A will have less run-time overhead, but will mess up the
> current uninitialized variable analysis in GCC.
> Approach B will have more run-time overhead, but will keep the current
> uninitialized variable analysis in GCC.
>
> And then decide which approach we will go with?  What's your opinion on
> this?
>
> Well, in the end you have to try.  Note that for the purpose of stack
> slot sharing you do want the instrumentation to happen during
> gimplification.
>
> Another possibility is to materialize .DEFERRED_INIT earlier than
> expand, for example shortly after the IPA optimizations, to avoid
> pessimizing loop transforms and to allow SRA.  At the point you
> materialize the inits you could run the late uninit warning pass (which
> would then be earlier than the regular one but would still see the
> .DEFERRED_INIT).
>
> If we put the "materializing .DEFERRED_INIT" phase earlier as you
> suggested above, the late uninitialized warning pass has to be moved
> earlier in order to utilize the ".DEFERRED_INIT".  Then we might miss
> some opportunities for the late uninitialized warning.  I think that
> this is not what we really want.
>
> While users may be happy to pay some performance, stack usage is
> probably more critical
>
> So, which pass is for computing the stack usage?
There is no pass doing that; stack slot assignment and sharing (when
lifetimes do not overlap) is done by RTL expansion.

> (just thinking of the kernel), so not regressing there should be as
> important as preserving uninit warnings (which I, for practical
> purposes, see as not important at all - people can do "debug" builds
> without -fzero-init).
>
> It looks like the major issue with the ".DEFERRED_INIT" approach is
> that the newly inserted calls to the internal const function might
> inhibit some important tree optimizations.
>
> So, I am thinking again about the following approach I raised at the
> very beginning:
>
> During the gimplification phase, mark the DECL for an auto variable
> without initialization as "no_explicit_init", then maintain this
> "no_explicit_init" bit till after pass_late_warn_uninitialized (or till
> pass_expand), and add zero-initialization for all DECLs that are marked
> with "no_explicit_init".
>
> This approach will not have the issue of interrupting tree
> optimizations; however, I guess that maintaining this
> "no_explicit_init" bit might be very difficult?
>
> Do you have any comments on this approach?

As said earlier, you'll still get optimistic propagation bypassing the
still-missing implicit zero init.  Maybe that's OK - you don't get
"garbage", but you'll get some other defined value.

As said, you have to implement a few options and compare.

Richard.

> thanks.
>
> Qing
>
> Richard.
>
> Btw, I don't think there's any reason to cling onto clang's semantics
> for a particular switch.  We'll never be able to emulate 1:1 behavior,
> and our -Wuninit behavior is probably vastly different already.
>
> From my study so far, yes, the current behavior of -Wuninit for Clang
> and GCC is not exactly the same.
> For example, for the following small testing case:
>
>   void blah(int);
>
>   int foo_2 (int n, int l, int m, int r)
>   {
>     int v;
>
>     if ( (n > 10) && (m != 100) && (r < 20) )
>       v = r;
>
>     if (l > 100)
>       if ( (n <= 8) && (m < 102) && (r < 19) )
>         blah(v); /* { dg-warning "uninitialized" "real warning" } */
>
>     return 0;
>   }
>
> GCC is able to report a maybe-uninitialized warning here, but Clang
> cannot.  It looks like GCC's uninitialized analysis relies on more
> analysis and optimization information than Clang's.
>
> Really curious how clang implements its uninitialized analysis.
>
> Actually, I studied a little bit how clang implements its uninitialized
> analysis last Friday, and noticed that Clang has a data flow analysis
> phase based on Clang's AST:
> http://clang-developers.42468.n3.nabble.com/A-survey-of-dataflow-analyses-in-Clang-td4069644.html
>
> And clang's uninitialized analysis is based on this data flow analysis.
> Therefore, adding initialization AFTER clang's uninitialized analysis
> phase is straightforward.
>
> However, GCC doesn't have data flow analysis in the FE.  The
> uninitialized variable analysis is done in the tree optimization phase.
> Therefore, it's much more difficult to implement this feature in GCC
> than in Clang.
>
> Qing
>
> Qing
>
> Richard.
>
> Thanks,
> Richard