Re: New no-undefined-overflow branch
On Thu, 5 Mar 2009, Geert Bosch wrote: > Hi Richard, > > Great to see that you're addressing this issue. If I understand correctly, > for RTL all operations are always wrapping, right? That's true (we don't have signedness information there) and I don't plan to change that. > I have been considering adding "V" variants for operations that trap on > overflow. The main reason I have not (yet) pursued this, is the daunting > task of teaching the folders about all these new codes. initially I'd > like to lower the "V" operations to explicit checks (calling abort() or > raising an exception, depending on the language) during gimplification, > but with the idea of eventually delaying expansion as more code learns > how to handle the new expressions. I already have most of the code necessary > for the expansions, as I now do them during translation of Ada trees to > GENERIC trees. Actually, the new and saner wrapping semantics for > {PLUS,MINUS,MULT}_EXPR simplify these a bit, by avoiding the need to use > unsigned types. Correct. > As you obviously doing something very similar now by introducing "NV" > variants, > do you think this would fit in with your scheme? If so, I'd be happy to try > and expand on your work to have, wrapping, no-overflow and overflow-checking > variants for basic arithmetic. I didn't spend too much time thinking about the trapping variants (well, I believe it isn't that important ;)). But in general we would have to expand the non-NV variants via the trapping expanders if flag_trapv was true (so yeah, combining TUs with different flag_trapv settings would be difficult again and it would ask for explicitly encoding this variant in the IL ...). There is of course the problem that we have to be careful not to introduce new traps via folding, a problem that doesn't exist with the no-overflow variants (I can simply drop to the wrapping variants). With for example (a -/v 10) +/v 10 would you want to preserve the possibly trapping a -/v 10? For (a -/v 10) +/v (b -/v 10) do we want to be careful to not introduce extra traps when simplifying to (a +/v b) -/v 20? So while trapping variants can certainly be introduced it looks like this task may be more difficult. So lowering them early during gimplification looks like a more reasonable plan IMHO. Richard.
lrint lrintf problems
Hi, I see in compiler source there is in builtin.c lrint lrintf.so the gcc need no extern linker lib when i understand right. But get linker error when use them.I test compiler 3.4.0 4.3.2 and 4.4.0 I also link with libgcc.but does not help also my includes-fixed/math.h contain a declaration but it is long int. extern long int lrint _PARAMS((double)); This mean 64 bit int ? But i use 32 bit 68k compiler and as far i see lrint return a long in docu. only solution i find short is add in library. long lrintf(float x) { return (long)x; } But when do this code, then there should be used current rounding.68k for example support a simple fmove.l fp0,d0 and use current rounding to convert from float. But gcc (test with 4.3.2)output this code in -m68020 -m68881 -03. FMOVE.S #+1.2344999E+4,FP0 FADD.L D4,FP0 FINTRZ.X FP0 FMOVE.L FP0,-(A7) But i think there is no fintrz need and this code do the same job and use the correct rounding setting. FMOVE.S #+1.2344999E+4,FP0 FADD.L D4,FP0 FMOVE.L FP0,-(A7) Or i am wrong ?.Or is "return (long)x;" specified to always round to zero ? How can i then implement lrint func without asm Code ? please help
Re: lrint lrintf problems
On Fri, Mar 6, 2009 at 12:27 PM, Bernd Roesch wrote: > Hi, > > I see in compiler source there is in builtin.c lrint lrintf.so the gcc need > no extern linker lib when i understand right. > > But get linker error when use them.I test compiler 3.4.0 4.3.2 and 4.4.0 > > I also link with libgcc.but does not help > > also my includes-fixed/math.h > > contain a declaration but it is long int. > extern long int lrint _PARAMS((double)); > > This mean 64 bit int ? > > But i use 32 bit 68k compiler and as far i see lrint return a long in docu. > > only solution i find short is add in library. > > long lrintf(float x) > { > return (long)x; > } > > But when do this code, then there should be used current rounding.68k for > example support a simple fmove.l fp0,d0 and use current rounding to convert > from float. > > But gcc (test with 4.3.2)output this code in -m68020 -m68881 -03. > > FMOVE.S #+1.2344999E+4,FP0 > FADD.L D4,FP0 > FINTRZ.X FP0 > FMOVE.L FP0,-(A7) > > But i think there is no fintrz need and this code do the same job and use > the correct rounding setting. > > FMOVE.S #+1.2344999E+4,FP0 > FADD.L D4,FP0 > FMOVE.L FP0,-(A7) > > Or i am wrong ?.Or is "return (long)x;" specified to always round to zero ? > > How can i then implement lrint func without asm Code ? If the backend does not support inlining lrint then you need to link against a C99 math library (-lm). You can add lrint patterns to m68k.md to provide inline expansions. Richard. > please help > >
GCC 4.4 is not able to build itself under Cygwin
This build problem has been occuring for at least a month, both on the most recent snapshots and on trunk. The compiler is configured as follows: ../configure --prefix=/opt/gcc-4.4.0-20090227 -v --enable-bootstrap --enable-version-specific-runtime-libs --enable-static --enable-shared --enable-shared-libgcc --with-gnu-ld --with-gnu-as --enable-sjlj-exceptions --enable-languages=c,c++ --disable-symvers --enable-libjava --disable-nls --with-cpu-32=core2 --with-cpu-64=core2 --enable-threads=win32 There are actually two problems in libstdc++. The first one is is caused by the existence of the min/max macros, injected by windows.h. make CCXFLAGS=-DNOMINMAX solves the issue. The next one is more serious. The file gthr-default.h, line 620 contains the macro #define CONST_CAST2(TOTYPE,FROMTYPE,X) ((__extension__(union {FROMTYPE _q; TOTYPE _nq;})(X))._nq) but the compiler claims that it is not allowed to declare a type within a cast. I've temporarily replaced it with #define CONST_CAST2(TOTYPE,FROMTYPE,X) ((TOTYPE)(X)) and that solved the issue, however, a cleaner patch should be applied. Best regards, Piotr Wyderski
Re: GCC 4.4 is not able to build itself under Cygwin
On Fri, Mar 6, 2009 at 1:48 PM, Piotr Wyderski wrote: > This build problem has been occuring for at least a month, > both on the most recent snapshots and on trunk. > > The compiler is configured as follows: > > ../configure --prefix=/opt/gcc-4.4.0-20090227 -v --enable-bootstrap > --enable-version-specific-runtime-libs --enable-static --enable-shared > --enable-shared-libgcc --with-gnu-ld --with-gnu-as --enable-sjlj-exceptions > --enable-languages=c,c++ --disable-symvers --enable-libjava --disable-nls > --with-cpu-32=core2 --with-cpu-64=core2 --enable-threads=win32 > > There are actually two problems in libstdc++. The first one is is caused > by the existence of the min/max macros, injected by windows.h. > > make CCXFLAGS=-DNOMINMAX Where is windows.h included? I guess the libstdc++ headers should be fixed to #define NOMINMAX in the algorithms headers or cygwin should define this macro unconditionally. > solves the issue. The next one is more serious. The file gthr-default.h, > line 620 contains the macro > > #define CONST_CAST2(TOTYPE,FROMTYPE,X) ((__extension__(union {FROMTYPE > _q; TOTYPE _nq;})(X))._nq) > > but the compiler claims that it is not allowed to declare a type within a > cast. > I've temporarily replaced it with > > #define CONST_CAST2(TOTYPE,FROMTYPE,X) ((TOTYPE)(X)) > > and that solved the issue, however, a cleaner patch should be applied. During what part of compilation is CONST_CAST2 a problem? Richard. > Best regards, > Piotr Wyderski >
Re: GCC 4.4 is not able to build itself under Cygwin
Richard Guenther wrote: > Where is windows.h included? I am not 100% sure yet -- the build process is in progress right now, so I don't want to interfere. I suspect it is the one included by gthr-win32.h. It contains the following lines: #include /* Now undef the windows BOOL. */ #undef BOOL so I think that adding #undef MIN #undef MAX will solve the situation, but, as I said, I am not completely sure it's the troublemaker. Firstly I would like to check whether those are the only two bugs to be fixed. Additionally, I have no write access to trunk (or anything else in the tree), so somebody else should include the fix. > During what part of compilation is CONST_CAST2 a problem? Stage 3 in libstdc++ > > Richard. > >> Best regards, >> Piotr Wyderski >> >
Re: GCC 4.4 is not able to build itself under Cygwin
On Fri, Mar 6, 2009 at 2:05 PM, Piotr Wyderski wrote: > Richard Guenther wrote: > >> Where is windows.h included? > > I am not 100% sure yet -- the build process is in progress right > now, so I don't want to interfere. I suspect it is the one included > by gthr-win32.h. It contains the following lines: > > #include > /* Now undef the windows BOOL. */ > #undef BOOL > > so I think that adding > > #undef MIN > #undef MAX more likely #undef min #undef max > will solve the situation, but, as I said, I am not completely sure > it's the troublemaker. Firstly I would like to check whether those > are the only two bugs to be fixed. Additionally, I have no write > access to trunk (or anything else in the tree), so somebody else > should include the fix. > >> During what part of compilation is CONST_CAST2 a problem? > > Stage 3 in libstdc++ Ok, then it should probably be #if !defined(__cplusplus) #define CONST_CAST2(TOTYPE,FROMTYPE,X) ((__extension__(union {FROMTYPE _q; TOTYPE _nq;})(X))._nq) #else #define CONST_CAST2(TOTYPE,FROMTYPE,X) const_cast(x) #endif Richard. > - Show quoted text - > >> >> Richard. >> >>> Best regards, >>> Piotr Wyderski >>> >> >
Re: New no-undefined-overflow branch
On Fri, 6 Mar 2009, Richard Guenther wrote: > There is of course the problem that we have to be careful not to > introduce new traps via folding, a problem that doesn't exist with > the no-overflow variants (I can simply drop to the wrapping variants). > With for example (a -/v 10) +/v 10 would you want to preserve > the possibly trapping a -/v 10? For (a -/v 10) +/v (b -/v 10) do > we want to be careful to not introduce extra traps when simplifying > to (a +/v b) -/v 20? I think we should preserve the property of whether the code evaluated between sequence points causes no traps, or at least one trap. (But need not preserve the difference between one trap and two, for example. This is the same as C99 Annex F requirements for floating-point exceptions: the set of exceptions produced between calls to fenv.h functions checking or modifying exception state needs to be preserved, but not the number (> 0) of times each exception is produced.) This probably means that most folding should not apply to trapping variants. (In principle I believe folding should be done on GIMPLE rather than folding functions being called from the front ends on trees. If you do always lower to explicit overflow checks and only run folding after that lowering stage, the problem of changing whether a trap occurs does not arise. But with trapping variants in GIMPLE, the folding code would still need to allow for them even with folding all happening on GIMPLE.) > So while trapping variants can certainly be introduced it looks like > this task may be more difficult. So lowering them early during > gimplification looks like a more reasonable plan IMHO. If you lower to whatever forms of checks for overflow can be conveniently expressed in GIMPLE then you have the interesting problem of reconstituting trapping operations for those targets that have trapping instructions or instructions setting overflow flags that can be checked directly. It's possible that you want to lower to explicit overflow checks for targets without the operations, but to trapping operation codes for targets with the operations. (Which would suggest starting with trapping variants, and then lowering to explicit checks part way through the GIMPLE optimizations, only on targets without suitable instructions for the relevant type precision and depending on -Os to determine whether inline code or libgcc function calls might be better.) -- Joseph S. Myers jos...@codesourcery.com
Re: GCC 4.4 is not able to build itself under Cygwin
Piotr Wyderski wrote: > Richard Guenther wrote: > >> Where is windows.h included? > > I am not 100% sure yet -- the build process is in progress right > now, so I don't want to interfere. I suspect it is the one included > by gthr-win32.h. It contains the following lines: WAG: Do not pass --enable-threads=win32 to configure on Cygwin. Use --enable-threads=posix instead. Amirite? cheers, DaveK
Re: New no-undefined-overflow branch
On Fri, 6 Mar 2009, Joseph S. Myers wrote: > On Fri, 6 Mar 2009, Richard Guenther wrote: > > > There is of course the problem that we have to be careful not to > > introduce new traps via folding, a problem that doesn't exist with > > the no-overflow variants (I can simply drop to the wrapping variants). > > With for example (a -/v 10) +/v 10 would you want to preserve > > the possibly trapping a -/v 10? For (a -/v 10) +/v (b -/v 10) do > > we want to be careful to not introduce extra traps when simplifying > > to (a +/v b) -/v 20? > > I think we should preserve the property of whether the code evaluated > between sequence points causes no traps, or at least one trap. (But need > not preserve the difference between one trap and two, for example. This > is the same as C99 Annex F requirements for floating-point exceptions: the > set of exceptions produced between calls to fenv.h functions checking or > modifying exception state needs to be preserved, but not the number (> 0) > of times each exception is produced.) That sounds sensible. Of course we don't know about sequence points during folding (as that may get trees combined from several GIMPLE statements). > This probably means that most folding should not apply to trapping > variants. Indeed. > (In principle I believe folding should be done on GIMPLE rather than > folding functions being called from the front ends on trees. If you do > always lower to explicit overflow checks and only run folding after that > lowering stage, the problem of changing whether a trap occurs does not > arise. But with trapping variants in GIMPLE, the folding code would still > need to allow for them even with folding all happening on GIMPLE.) Yes. > > So while trapping variants can certainly be introduced it looks like > > this task may be more difficult. So lowering them early during > > gimplification looks like a more reasonable plan IMHO. > > If you lower to whatever forms of checks for overflow can be conveniently > expressed in GIMPLE then you have the interesting problem of > reconstituting trapping operations for those targets that have trapping > instructions or instructions setting overflow flags that can be checked > directly. It's possible that you want to lower to explicit overflow > checks for targets without the operations, but to trapping operation codes > for targets with the operations. (Which would suggest starting with > trapping variants, and then lowering to explicit checks part way through > the GIMPLE optimizations, only on targets without suitable instructions > for the relevant type precision and depending on -Os to determine whether > inline code or libgcc function calls might be better.) Ok, I didn't think of generating optimal code for the overflow checks, but it should be reasonably possible to if-convert the explicit overflow checks back to builtins that can be specially expanded (that would also help explicit overflow checks in source code). So my opinion still holds that explicit overflow checks would be prefered. Richard.
Re: GCC 4.4 is not able to build itself under Cygwin
Dave Korn wrote: > Piotr Wyderski wrote: >> Richard Guenther wrote: >> >>> Where is windows.h included? >> I am not 100% sure yet -- the build process is in progress right >> now, so I don't want to interfere. I suspect it is the one included >> by gthr-win32.h. It contains the following lines: > > WAG: Do not pass --enable-threads=win32 to configure on Cygwin. Use > --enable-threads=posix instead. Amirite? Gah, yes of course, you showed the config in your first post. Well, that's the problem. cheers, DaveK
Re: New no-undefined-overflow branch
On Fri, 6 Mar 2009, Richard Guenther wrote: > Ok, I didn't think of generating optimal code for the overflow checks, > but it should be reasonably possible to if-convert the explicit > overflow checks back to builtins that can be specially expanded > (that would also help explicit overflow checks in source code). It looks like only alpha and pa presently have insn patterns such as addvsi3 that would be used by the present -ftrapv code, but I expect several other processors also have instructions that would help in overflow-checking code. (For example, Power Architecture has versions of many arithmetic instructions that set overflow flags, so such instructions could be used followed by conditional traps.) -- Joseph S. Myers jos...@codesourcery.com
Re: New no-undefined-overflow branch
> So while trapping variants can certainly be introduced it looks like > this task may be more difficult. I don't think you need to introduce trapping tree codes. You can introduce them directly in the front-end as s = x +nv y (((s ^ x) & (s ^ y)) < 0) ? trap () : s d = x -nv y (((d ^ x) & (x ^ y)) < 0) ? trap () : d (b == INT_MIN ? trap () : -nv b) (int)((long long) a * (long long) b) == a *nv b ? trap () : a *nv b Making sure they are compiled efficiently is another story, but especially for the sake of LTO I think this is the way to go. Paolo
Re: New no-undefined-overflow branch
On Fri, Mar 6, 2009 at 3:29 PM, Paolo Bonzini wrote: > >> So while trapping variants can certainly be introduced it looks like >> this task may be more difficult. > > I don't think you need to introduce trapping tree codes. You can > introduce them directly in the front-end as > > s = x +nv y I think this should be s = x + y otherwise the compiler can assume that for the following check the addition did not overflow. > (((s ^ x) & (s ^ y)) < 0) ? trap () : s > > d = x -nv y > (((d ^ x) & (x ^ y)) < 0) ? trap () : d > > (b == INT_MIN ? trap () : -nv b) > > (int)((long long) a * (long long) b) == a *nv b ? trap () : a *nv b > > Making sure they are compiled efficiently is another story, but > especially for the sake of LTO I think this is the way to go. I agree. Btw, for the addition case we generate leal(%rsi,%rdi), %eax xorl%eax, %esi xorl%eax, %edi testl %edi, %esi jns .L2 .value 0x0b0f .L2: rep ret which isn't too bad. Richard.
Re: New no-undefined-overflow branch
Richard Guenther wrote: > On Fri, Mar 6, 2009 at 3:29 PM, Paolo Bonzini wrote: >>> So while trapping variants can certainly be introduced it looks like >>> this task may be more difficult. >> I don't think you need to introduce trapping tree codes. You can >> introduce them directly in the front-end as >> >> s = x +nv y > > I think this should be > > s = x + y > (((s ^ x) & (s ^ y)) < 0) ? trap () : s > > otherwise the compiler can assume that for the following check > the addition did not overflow. Ah yeah I've not yet looked at the patches and I did not know which one was which. I actually wrote x + y first and then went back to carefully check them. :-P >> Making sure they are compiled efficiently is another story, but >> especially for the sake of LTO I think this is the way to go. > > I agree. Btw, for the addition case we generate > > leal(%rsi,%rdi), %eax > xorl%eax, %esi > xorl%eax, %edi > testl %edi, %esi > jns .L2 > .value 0x0b0f > .L2: > rep > ret > > which isn't too bad. Well, for x86 it requires the addends to die. This is unfortunately four insns, and combine has a limit of three. but maybe you could make combine recognize the check and turn it to an addv pattern (with the add result unused!); and then CSE or maybe combine as well would, well, eliminate the duplicate ADD... If this does not work, on ARM you can also hope for something like this: ADDR0, R1, R2 XORS R0, R2, R3 XORSMI R1, R2, R3 SWIMI #trap But hey, whatever you get, it's anyway faster than a libcall. :-) Of course there are better choices for x+CONSTANT; using (b == INT_MIN ? trap () : -b) for negation is one example. Paolo
Re: New no-undefined-overflow branch
Richard Guenther writes: > I agree. Btw, for the addition case we generate > > leal(%rsi,%rdi), %eax > xorl%eax, %esi > xorl%eax, %edi > testl %edi, %esi > jns .L2 > .value 0x0b0f > .L2: > rep > ret > > which isn't too bad. But I think it would normally be faster to do something like movl%esi,%eax addl%edi,%eax jo .Ltrap ret .Ltrap: diediedie Ian
Re: New no-undefined-overflow branch
On Fri, Mar 6, 2009 at 4:09 PM, Paolo Bonzini wrote: > Richard Guenther wrote: >> On Fri, Mar 6, 2009 at 3:29 PM, Paolo Bonzini wrote: So while trapping variants can certainly be introduced it looks like this task may be more difficult. >>> I don't think you need to introduce trapping tree codes. You can >>> introduce them directly in the front-end as >>> >>> s = x +nv y >> >> I think this should be >> >> s = x + y >> (((s ^ x) & (s ^ y)) < 0) ? trap () : s >> >> otherwise the compiler can assume that for the following check >> the addition did not overflow. > > Ah yeah I've not yet looked at the patches and I did not know which one > was which. I actually wrote x + y first and then went back to carefully > check them. :-P > >>> Making sure they are compiled efficiently is another story, but >>> especially for the sake of LTO I think this is the way to go. >> >> I agree. Btw, for the addition case we generate >> >> leal (%rsi,%rdi), %eax >> xorl %eax, %esi >> xorl %eax, %edi >> testl %edi, %esi >> jns .L2 >> .value 0x0b0f >> .L2: >> rep >> ret >> >> which isn't too bad. > > Well, for x86 it requires the addends to die. > > This is unfortunately four insns, and combine has a limit of three. but > maybe you could make combine recognize the check and turn it to an addv > pattern (with the add result unused!); and then CSE or maybe combine as > well would, well, eliminate the duplicate ADD... Well, I was thinking about detecting the pattern on the tree level instead. s_6 = x.0_2 + y.1_4; D.1597_7 = s_6 ^ x_1(D); D.1598_8 = s_6 ^ y_3(D); D.1599_9 = D.1597_7 & D.1598_8; if (D.1599_9 < 0) goto ; else goto ; : __builtin_trap (); : This should be recognizable in the ifcombine pass for example, which recognizes CFG patterns. Transforming it to just s_6 = __builtin_addv (x.0_2, y.1_4); : Only ifcombine runs a little too early for that maybe. Richard.
Re: New no-undefined-overflow branch
On Mar 6, 2009, at 09:15, Joseph S. Myers wrote: It looks like only alpha and pa presently have insn patterns such as addvsi3 that would be used by the present -ftrapv code, but I expect several other processors also have instructions that would help in overflow-checking code. (For example, Power Architecture has versions of many arithmetic instructions that set overflow flags, so such instructions could be used followed by conditional traps.) Most architectures have similar flags or conditions, but during my work on more efficient overflow checking for Ada I have become less convinced of the need for them. For Ada code, the overflow checking is now less expensive than many other checks. In most cases, one of the operands will be either constant or have a known sign. Then the overflow check can be expanded as a simple comparison. The benefit of this is that later optimization passes can use these tests to derive range information, combine checks, or fold them. When using some opaque construct, removing it will be hard. Also, for many languages calling abort() for overflow would not be desirable, and an exception should be raised. Doing this directly from expanded code rather than using a trap handler will avoid the need to write target-specific trap handlers and has the advantage that a message with location information can easily be included. In any case, while I'd really like to move the checked signed integer overflow from Gigi (GNAT-to-GNU tree translator) to the language independent part of GCC, I want to have the absolute minimum amount of changes that is necessary to achieve this goal. Since only Ada uses integer overflow checking at this point, any non-standard GIMPLE will be a maintenance burden and is likely to be mishandled by later optimizers. When we lower all checks during gimplification, we can re-implement -ftrapv while avoiding all of the pitfalls the old implementation had. In particular, there has to be a clear distinction between which signed integer operations must be checked and which not. During compilation many new operations are created and these operations need to have clear semantics. To me that is the great improvement of the no-undefined-overflow branch. -Geert
Re: New no-undefined-overflow branch
On Mar 6, 2009, at 04:11, Richard Guenther wrote: I didn't spend too much time thinking about the trapping variants (well, I believe it isn't that important ;)). But in general we would have to expand the non-NV variants via the trapping expanders if flag_trapv was true (so yeah, combining TUs with different flag_trapv settings would be difficult again and it would ask for explicitly encoding this variant in the IL ...). The non-NV variants have wrap-around semantics on the no-undefined- overflow branch, right? I'm not about to change that based on some global flag! :) I'm proposing something like: {PLUS,MINUS,MULT,NEGATE}_EXPR: - signed integer operation with wrap-around {PLUS,MINUS,MULT,NEGATE)NV_EXPR - signed integer operations known to not overflow {PLUS,MINUS,MULT,NEGATE)V_EXPR - signed integer operation with overflow check that traps, aborts or raises an exception on overflow There is of course the problem that we have to be careful not to introduce new traps via folding, a problem that doesn't exist with the no-overflow variants (I can simply drop to the wrapping variants). With for example (a -/v 10) +/v 10 would you want to preserve the possibly trapping a -/v 10? For (a -/v 10) +/v (b -/v 10) do we want to be careful to not introduce extra traps when simplifying to (a +/v b) -/v 20? Indeed, there has to be a very clear boundary as we'd be changing semantics. The original -ftrapv implementation muddled that issue, something I absolutely want to avoid So while trapping variants can certainly be introduced it looks like this task may be more difficult. So lowering them early during gimplification looks like a more reasonable plan IMHO. Right, that was my intention. Still, I'll need to add code to handle the new tree codes in fold(), right? -Geert
Re: New no-undefined-overflow branch
On Fri, 6 Mar 2009, Richard Guenther wrote: > Well, I was thinking about detecting the pattern on the tree level instead. > > s_6 = x.0_2 + y.1_4; > D.1597_7 = s_6 ^ x_1(D); > D.1598_8 = s_6 ^ y_3(D); > D.1599_9 = D.1597_7 & D.1598_8; > if (D.1599_9 < 0) > goto ; > else > goto ; > > : > __builtin_trap (); > > : > > This should be recognizable in the ifcombine pass for example, which > recognizes CFG patterns. Transforming it to just > > s_6 = __builtin_addv (x.0_2, y.1_4); > > : > > Only ifcombine runs a little too early for that maybe. In addition to detecting to transform into something like the above (for addv insn patterns or libgcc function), you may also want to detect when a constant has been propagated into the above and make sure it can get optimized into a range check on the non-constant operand. (I don't know if existing optimizers will be able to handle the above with a constant for one of the operands and convert it to a range check, or whether special code would be needed.) -- Joseph S. Myers jos...@codesourcery.com
Re: New no-undefined-overflow branch
On Fri, 6 Mar 2009, Paolo Bonzini wrote: > I don't think you need to introduce trapping tree codes. You can > introduce them directly in the front-end as Multiple front ends want the same thing. This is why it would be better to introduce the codes in GENERIC and have the language-independent gimplifier contain the code to lower them, even if they don't become part of GIMPLE. >(int)((long long) a * (long long) b) == a *nv b ? trap () : a *nv b This is not a solution for trapping multiplication in the widest supported type. I'm sure you can represent that with e.g. some checks for whether values are positive or negative and then dividing the largest / smallest values of the type by one of the arguments, but you might not want to lose symmetry like that until after constant propagation has had a chance to make one of the arguments constant (since if that happens you get a range check on the other argument). -- Joseph S. Myers jos...@codesourcery.com
Re: New no-undefined-overflow branch
On Fri, 6 Mar 2009, Geert Bosch wrote: > > this task may be more difficult. So lowering them early during > > gimplification looks like a more reasonable plan IMHO. > > Right, that was my intention. Still, I'll need to add code to > handle the new tree codes in fold(), right? If you add new trapping codes to GENERIC I'd recommend *not* making fold() handle them. I don't think much folding is safe for the trapping codes when you want to avoid either removing or introducing traps. Either lower the codes in gimplification, or handle them explicitly in a few GIMPLE optimizations e.g. when constants are propagated in, but avoid general folding for them. Front ends should set TREE_SIDE_EFFECTS on trapping expressions so that fold knows it can't discard a subexpression (whose value doesn't matter to the value of the final expression) containing a trapping expression, e.g. 0 * (a +trap b) needs to evaluate (a +trap b) for its side effects. With this done, front ends generating trapping codes for -ftrapv and fold not trying to optimize the trapping codes, I'd hope fold and the rest of the language-independent compiler could stop caring about flag_trapv. -- Joseph S. Myers jos...@codesourcery.com
Re: New no-undefined-overflow branch
On Fri, 6 Mar 2009, Geert Bosch wrote: > In any case, while I'd really like to move the checked signed > integer overflow from Gigi (GNAT-to-GNU tree translator) to > the language independent part of GCC, I want to have the absolute > minimum amount of changes that is necessary to achieve this goal. I think the absolute minimum is that new codes for trapping operations are added to GENERIC but not GIMPLE, that Gigi generates those codes rather than explicit checks, and the lowering to explicit checks is done in the gimplifier (possibly based on code presently in Gigi). -- Joseph S. Myers jos...@codesourcery.com
Re: New no-undefined-overflow branch
Joseph S. Myers wrote: > On Fri, 6 Mar 2009, Paolo Bonzini wrote: > >> I don't think you need to introduce trapping tree codes. You can >> introduce them directly in the front-end as > > Multiple front ends want the same thing. This is why it would be better > to introduce the codes in GENERIC and have the language-independent > gimplifier contain the code to lower them, even if they don't become part > of GIMPLE. I see your point. What I'm worried of, is that this codes would be tested more lightly and, until folding is a middle-end thing only, the risk of unwanted optimization on -ftrapv code would be high. You can have common code shared by front-ends. They could apply it at GENERICization time (Fortran, Ada) or directly while parsing (C, C++). >>(int)((long long) a * (long long) b) == a *nv b ? trap () : a *nv b > > This is not a solution for trapping multiplication in the widest supported > type. There's always range checking, I was pointing out optimization possibilities; the above one can be optimized like (h,l) = a*b if (h != l >> 31) trap ();// signed shift Paolo
[plugins] [patch] allows to see AST in C
Dear GCC developers, I tried to use plugin frameworks svn://gcc.gnu.org/svn/gcc/branches/plugin (old?) and svn://gcc.gnu.org/svn/gcc/branches/plugins (current). 'plugin' has transform_ctrees() API. - callback on every fndecl 'plugins' has PLUGIN_CXX_CP_PRE_GENERICIZE. - only supported in C++ This patch (for the plugins branch) allows to see AST in C. Thanks, Kenji Koyanagi Index: c-decl.c === --- c-decl.c(revision 144675) +++ c-decl.c(working copy) @@ -63,6 +63,7 @@ #include "langhooks-def.h" #include "pointer-set.h" #include "gimple.h" +#include "plugin.h" /* In grokdeclarator, distinguish syntactic contexts of declarators. */ enum decl_context @@ -6791,6 +6792,8 @@ { if (!decl_function_context (fndecl)) { + /* Invoke registered plugin callbacks if any. */ + invoke_plugin_callbacks (PLUGIN_C_PRE_GENERICIZE, fndecl); c_genericize (fndecl); c_gimple_diagnostics_recursively (fndecl); Index: gcc-plugin.h === --- gcc-plugin.h(revision 144675) +++ gcc-plugin.h(working copy) @@ -25,6 +25,7 @@ PLUGIN_PASS_MANAGER_SETUP,/* To hook into pass manager. */ PLUGIN_FINISH_STRUCT, /* After finishing parsing a struct/ class. */ PLUGIN_FINISH_UNIT, /* Useful for summary processing. */ + PLUGIN_C_PRE_GENERICIZE, /* Allows to see low level AST in C FE. */ PLUGIN_CXX_CP_PRE_GENERICIZE, /* Allows to see low level AST in C+ + FE. */ PLUGIN_FINISH,/* Called before GCC exits. */ PLUGIN_EVENT_LAST /* Dummy event used for indexing callback Index: plugin.c === --- plugin.c(revision 144675) +++ plugin.c(working copy) @@ -435,6 +435,7 @@ break; case PLUGIN_FINISH_STRUCT: case PLUGIN_FINISH_UNIT: + case PLUGIN_C_PRE_GENERICIZE: case PLUGIN_CXX_CP_PRE_GENERICIZE: case PLUGIN_FINISH: { @@ -473,6 +474,7 @@ { case PLUGIN_FINISH_STRUCT: case PLUGIN_FINISH_UNIT: + case PLUGIN_C_PRE_GENERICIZE: case PLUGIN_CXX_CP_PRE_GENERICIZE: case PLUGIN_FINISH: {
Re: New no-undefined-overflow branch
On Fri, 6 Mar 2009, Paolo Bonzini wrote: > Joseph S. Myers wrote: > > On Fri, 6 Mar 2009, Paolo Bonzini wrote: > > > >> I don't think you need to introduce trapping tree codes. You can > >> introduce them directly in the front-end as > > > > Multiple front ends want the same thing. This is why it would be better > > to introduce the codes in GENERIC and have the language-independent > > gimplifier contain the code to lower them, even if they don't become part > > of GIMPLE. > > I see your point. What I'm worried of, is that this codes would be > tested more lightly and, until folding is a middle-end thing only, the > risk of unwanted optimization on -ftrapv code would be high. The risk (and fact) of unwanted optimization is there at present. Simply using new codes and not teaching fold about them will reduce the risk over the present state where many bits of code have to check flag_trapv. -- Joseph S. Myers jos...@codesourcery.com
Re: New no-undefined-overflow branch
Joseph S. Myers wrote: > On Fri, 6 Mar 2009, Geert Bosch wrote: > >>> this task may be more difficult. So lowering them early during >>> gimplification looks like a more reasonable plan IMHO. >> Right, that was my intention. Still, I'll need to add code to >> handle the new tree codes in fold(), right? > > If you add new trapping codes to GENERIC I'd recommend *not* making fold() > handle them. Constant folding should be done for them, though. > Either lower > the codes in gimplification, or handle them explicitly in a few GIMPLE > optimizations e.g. when constants are propagated in, but avoid general > folding for them. Definitely the former. Paolo
Re: New no-undefined-overflow branch
On Fri, 2009-03-06 at 16:09 +0100, Paolo Bonzini wrote: > If this does not work, on ARM you can also hope for something like this: > > ADDR0, R1, R2 > XORS R0, R2, R3 > XORSMI R1, R2, R3 > SWIMI #trap On ARM you can just check for overflow directly... ADDSR0, R1, R2 SWIVS #trap :=)
Re: Minimum required GNAT version for bootstrap of GNAT on gcc trunk
On Thu, 2009-03-05 at 18:24 +0100, Laurent GUERBY wrote: > On Tue, 2009-02-24 at 20:40 +0100, Laurent GUERBY wrote: > > On Tue, 2009-02-24 at 19:36 +, Dave Korn wrote: > > > Laurent GUERBY wrote: > > > > On Tue, 2009-02-24 at 18:59 +, Dave Korn wrote: > > > >> Laurent GUERBY wrote: > > > >> > > > >>> I'm not sure 3.4 will work for trunk > > > >> I was just entirely unable to get 3.4.4 to bootstrap a 4.3.2 > > > >> compiler. I > > > >> used 4.3.0 and it worked. I forget what I used to build the 4.3.0 > > > >> with in the > > > >> first place. I think the documentation needs updating to say you'll > > > >> need to > > > >> use an early/intermediate 4.x version to bootstrap your way past the > > > >> 3.x/4.x > > > >> boundary, but I couldn't say exactly where the break lies. > > > > > > > > Was this on cygwin or Linux? > > > > > > Cygwin. Would that be likely to make a difference? > > > > Depending on the error it could make a difference. The oldest > > system I have access to is Ubuntu 5.10 which I believe > > came with 4.0 GCC default and optional 3.4 GCC packages (the system is > > offline right now so I rely on memory, I'll confirm > > when it comes back online). > > The old system just came back online so I checked and GCC 3.4.6 (the > oldest version I have, released Mar 06 2006) is able to bootstrap c,ada > for 4.3.2 release (2008-08-27) on Linux i686. > > So at least 3.4.x is able to build 4.3.x with Ada enabled on Linux. > > I will build and test ability of 3.4.0 (released Apr 18 2004) to 5 to > bootstrap 4.3.x in the coming weeks. On i686-linux for c,ada: 3.4.4 bootstraped 4.3.2 fine (your configuration) 3.4.0 bootstraped 4.3.1 fine 3.4.0 bootstraped trunk 143919 fine I'm not going to test more, a priori what the documentation currently says for Ada is correct at least for i686-linux, the minimum required GNAT version for bootstrap is 3.4.x. Sincerely, Laurent
Re: [RFC] Better debug info by substitution tracking for inliner (and other passes eliminating whole user variables)
On Mar 5, 2009, Jan Hubicka wrote: > The patch adds mechanizm for tracking inline substitutions, so we > actually note that on whole scope of its existence ARGUMENT is having > same value as OTHERARGUMENT. Beautiful! > So if we want to have DEBUG_INSN approach, those aproaches should > combine themselves well: inline substitution tracking will handle most > cases chaply and DEBUG_INSN will help to keep the substitutions valid. I concur. I don't see any incompatibility whatsoever. It's a perfect fit, and a great improvement. > I make attempt in tree-inline to look up stable enough replacement in > simplify_replacement (i.e. walking SSA graph and trying to find something > that is not going to disappear, as function parameter, user variable > or address of user variable) and I am not making any attempts to update > replacement removing them when variables they refer to become dead > though I think we can do some job here. VTA could help here, I suppose, if the value of the dead variable remains available somewhere. Gotta think a bit about this, but I don't see that this would be too difficult. > Finally when it comes to dwarf encoding the main problem seems to be that > dwarf insist on describing function value as location in memory. This is > not quite compatible with values passed by reference where we optimize out > the pointer to value and thus we still know address it points to but we no > longer have memory location for the pointer itself. I feel your pain ;-) > There is DW_OP_value in dwarf4 for that but it is not supported by > GDB. Long term, this is certainly what we should use. > Important fact is also that tracking is very cheap. +1 > So I wonder if there are some comments? Three words: way to go! :-) > Since dwarf allows to express memory location consisting > of multiple places, it should be possible to fully track simple SRAed out > struct Yup. > we are however lost when we have pointer to those struct since there > is no means describing "there is no memory location for this pointer, > but it would be pointing to this structure" except for my > aforementioned hack of unpeeling references/pointers. Uhh... I get the impression that, if the struct could be fully SRAed, that's because any pointers to it were determined to be totally dead, in which case not having debug info for it is perfectly legitimate (complete and correct, as per the VTA criteria). Or are you speaking of cases in which the struct is only temporarily SRAed, with the pieces put back in place whenever a pointer that could point to it is dereferenced? This case would be tricky to handle, indeed. -- Alexandre Oliva http://www.lsd.ic.unicamp.br/~oliva/ You must be the change you wish to see in the world. -- Gandhi Be Free! -- http://FSFLA.org/ FSF Latin America board member Free Software Evangelist Red Hat Brazil Compiler Engineer
Re: New no-undefined-overflow branch
>> If this does not work, on ARM you can also hope for something like this: >> >> ADDR0, R1, R2 >> XORS R0, R2, R3 >> XORSMI R1, R2, R3 >> SWIMI #trap > > On ARM you can just check for overflow directly... > > ADDSR0, R1, R2 > SWIVS #trap Of course, I was thinking explicitly of what happens with no MD support. Paolo
Re: New no-undefined-overflow branch
Richard Earnshaw writes: > On Fri, 2009-03-06 at 16:09 +0100, Paolo Bonzini wrote: >> If this does not work, on ARM you can also hope for something like this: >> >> ADDR0, R1, R2 >> XORS R0, R2, R3 >> XORSMI R1, R2, R3 >> SWIMI #trap > > On ARM you can just check for overflow directly... > > ADDSR0, R1, R2 > SWIVS #trap This point should not be missed. Some processors (MIPS) have trapping arithmetic instructions, but many processors have an overflow flag which can be tested. Any useful design for -ftrapv needs to make it possible to use that overflow flag in the generated code. It will always be more efficient than using arithmetic to check for overflow. Ian
Re: New no-undefined-overflow branch
Ian Lance Taylor wrote: Richard Earnshaw writes: On Fri, 2009-03-06 at 16:09 +0100, Paolo Bonzini wrote: If this does not work, on ARM you can also hope for something like this: ADDR0, R1, R2 XORS R0, R2, R3 XORSMI R1, R2, R3 SWIMI #trap On ARM you can just check for overflow directly... ADDSR0, R1, R2 SWIVS #trap This point should not be missed. Some processors (MIPS) have trapping arithmetic instructions, but many processors have an overflow flag which can be tested. Any useful design for -ftrapv needs to make it possible to use that overflow flag in the generated code. It will always be more efficient than using arithmetic to check for overflow. One needs to be careful on efficiency here, thinking about pipelines etc. for example, into looks nice on the x86, but can be expensive to use compared with explicit tests of the overflow flag. Ian
Re: Minimum required GNAT version for bootstrap of GNAT on gcc trunk
Laurent GUERBY wrote: > > On i686-linux for c,ada: > > 3.4.4 bootstraped 4.3.2 fine (your configuration) > 3.4.0 bootstraped 4.3.1 fine > 3.4.0 bootstraped trunk 143919 fine > > I'm not going to test more, a priori what the documentation > currently says for Ada is correct at least for i686-linux, > the minimum required GNAT version for bootstrap is 3.4.x. Thanks Laurent. I guess I'll have to try and reproduce the problem to figure out why cygwin and linux diverge. cheers, DaveK
Re: New no-undefined-overflow branch
On Mar 6, 2009, at 12:22, Joseph S. Myers wrote: If you add new trapping codes to GENERIC I'd recommend *not* making fold() handle them. I don't think much folding is safe for the trapping codes when you want to avoid either removing or introducing traps. Either lower the codes in gimplification, or handle them explicitly in a few GIMPLE optimizations e.g. when constants are propagated in, but avoid general folding for them. The point here is not to think in terms of the old -trapv and trapping instructions, but instead at the slightly higher level of a well-defined model of signed integer arithmetic. That is why signed integer overflow checking and the no-undefined-overflow branch are closely related. There are essentially two models to evaluate a signed integer expression. The one Ada uses is that a check may be omitted if the value of the expression, in absence of the check, would have no effect on the external interactions of the program. An implementation need not always raise an exception when a language- defined check fails. Instead, the operation that failed the check can simply yield an undefined result. The exception need be raised by the implementation only if, in the absence of raising it, the value of this undefined result would have some effect on the external interactions of the program. In determining this, the implementation shall not presume that an undefined result has a value that belongs to its subtype, nor even to the base range of its type, if scalar. Having removed the raise of the exception, the canonical semantics will in general allow the implementation to omit the code for the check, and some or all of the operation itself. The other one is the one you suggest: Front ends should set TREE_SIDE_EFFECTS on trapping expressions so that fold knows it can't discard a subexpression (whose value doesn't matter to the value of the final expression) containing a trapping expression, e.g. 0 * (a +trap b) needs to evaluate (a +trap b) for its side effects. With this done, front ends generating trapping codes for -ftrapv and fold not trying to optimize the trapping codes, I'd hope fold and the rest of the language-independent compiler could stop caring about flag_trapv. Setting the TREE_SIDE_EFFECTS seriously limits optimizations. Also, as quality of implementation issue, while an expression may have an undefined result, if that result is not used, removing the entire computation is generally preferable over raising an exception. Arguments can be made for and against both models, so probably we could make setting of TREE_SIDE_EFFECTS optional. -Geert
Re: New no-undefined-overflow branch
On Fri, 6 Mar 2009, Geert Bosch wrote: > > On Mar 6, 2009, at 12:22, Joseph S. Myers wrote: > > If you add new trapping codes to GENERIC I'd recommend *not* making fold() > > handle them. I don't think much folding is safe for the trapping codes > > when you want to avoid either removing or introducing traps. Either lower > > the codes in gimplification, or handle them explicitly in a few GIMPLE > > optimizations e.g. when constants are propagated in, but avoid general > > folding for them. > > The point here is not to think in terms of the old -trapv and trapping > instructions, but instead at the slightly higher level of a well-defined > model of signed integer arithmetic. That is why signed integer overflow > checking and the no-undefined-overflow branch are closely related. > > There are essentially two models to evaluate a signed integer expression. > The one Ada uses is that a check may be omitted if the value of the > expression, > in absence of the check, would have no effect on the external interactions > of the program. > > > An implementation need not always raise an exception when a language-defined > > check > > fails. Instead, the operation that failed the check can simply yield an > > undefined > > result. The exception need be raised by the implementation only if, in the > > absence > > of raising it, the value of this undefined result would have some effect on > > the > > external interactions of the program. In determining this, the > > implementation shall > > not presume that an undefined result has a value that belongs to its > > subtype, > > nor even to the base range of its type, if scalar. Having removed the raise > > of > > the exception, the canonical semantics will in general allow the > > implementation > > to omit the code for the check, and some or all of the operation itself. > > The other one is the one you suggest: > > Front ends should set TREE_SIDE_EFFECTS on trapping expressions so that > > fold knows it can't discard a subexpression (whose value doesn't matter to > > the value of the final expression) containing a trapping expression, e.g. > > 0 * (a +trap b) needs to evaluate (a +trap b) for its side effects. With > > this done, front ends generating trapping codes for -ftrapv and fold not > > trying to optimize the trapping codes, I'd hope fold and the rest of the > > language-independent compiler could stop caring about flag_trapv. > > Setting the TREE_SIDE_EFFECTS seriously limits optimizations. Also, as quality > of implementation issue, while an expression may have an undefined result, > if that result is not used, removing the entire computation is generally > preferable over raising an exception. Arguments can be made for and against > both > models, so probably we could make setting of TREE_SIDE_EFFECTS optional. Yes. If we introduce new GENERIC tree codes for overflow checking operations that is certainly useful. Though once you lowered GENERIC to explicit checks and trapping (or raising exceptions) then this side-effect is explicit and we probably won't remove them anymore. I played with the idea to just implement a -ftrapv substitute by lowering non-NV operation variants during gimplification, but of course new GENERIC tree codes are more flexible. You can use no-undefined-overflow branch to commit patches once you have them. Thanks, Richard.
gcc-4.4-20090306 is now available
Snapshot gcc-4.4-20090306 is now available on ftp://gcc.gnu.org/pub/gcc/snapshots/4.4-20090306/ and on various mirrors, see http://gcc.gnu.org/mirrors.html for details. This snapshot has been generated from the GCC 4.4 SVN branch with the following options: svn://gcc.gnu.org/svn/gcc/trunk revision 144680 You'll find: gcc-4.4-20090306.tar.bz2 Complete GCC (includes all of below) gcc-core-4.4-20090306.tar.bz2 C front end and core compiler gcc-ada-4.4-20090306.tar.bz2 Ada front end and runtime gcc-fortran-4.4-20090306.tar.bz2 Fortran front end and runtime gcc-g++-4.4-20090306.tar.bz2 C++ front end and runtime gcc-java-4.4-20090306.tar.bz2 Java front end and runtime gcc-objc-4.4-20090306.tar.bz2 Objective-C front end and runtime gcc-testsuite-4.4-20090306.tar.bz2The GCC testsuite Diffs from 4.4-20090227 are available in the diffs/ subdirectory. When a particular snapshot is ready for public consumption the LATEST-4.4 link is updated and a message is sent to the gcc list. Please do not use a snapshot before it has been announced that way.
Re: New no-undefined-overflow branch
On Fri, 6 Mar 2009, Paolo Bonzini wrote: > Joseph S. Myers wrote: > > On Fri, 6 Mar 2009, Geert Bosch wrote: > > > >>> this task may be more difficult. So lowering them early during > >>> gimplification looks like a more reasonable plan IMHO. > >> Right, that was my intention. Still, I'll need to add code to > >> handle the new tree codes in fold(), right? > > > > If you add new trapping codes to GENERIC I'd recommend *not* making fold() > > handle them. > > Constant folding should be done for them, though. Yes, constant folding should be done, so that very tiny subset of fold needs to handle them. The vast bulk of fold that does other more complicated transformations should not, in my view. (There may be isolated transformations that are safe for these codes. For example, you could transform (x *trap 2) *trap 2 to x *trap 4 (this is a case where two traps can become one), but not (x *trap 2) *trap 0 to x *trap 0 or (x *trap -1) *trap -1 to x *trap 1. I think however it would be reasonable to replace -ftrapv without going through all fold transformations to enable the safe ones for the new tree codes; it's better to start by making it reliable and then look at cases where it can be made to generate better code. And I expect that a large proportion of what fold does is not safe for trapping codes.) -- Joseph S. Myers jos...@codesourcery.com
The gcc-in-cxx branch now completes bootstrap
I'm happy to report that the gcc-in-cxx branch can now bootstrap. That is, the code in gcc proper can now be compiled with a C++ compiler. My plan going forward is as follows (when we are back in stage 1): * For each difference between trunk and gcc-in-cxx: + Try to implement a -Wc++-compat warning which detects the change. - If it is possible, implement the warning, and make the changes to let gcc bootstrap with the warning. - If a warning is not possible for some reason, I will simply propose the change by itself. I expect this will be a small subset of the changes, mostly related to the build system and to low-level configuration like ansidecl.h. + This process will eventually eliminate all differences between trunk and gcc-in-cxx, at which point gcc-in-cxx can be retired. * Implement a configure option, --enable-c++-build or something like that, which builds gcc with a C++ compiler. + Start running regular builds with that option, to avoid any regressions in C++ buildability for cases for which there is no -Wc++-compat warning. * Begin the lobbying process for changing the default value of the configure option. I certainly welcome help on any of these steps. Ian
bitwise dataflow
I'm thinking about adding bitwise dataflow analysis support to RTL. Is this a good idea? Bad idea? Already done? Please review if interested. Thank you, Silvius Motivation == int foo(int x) { int i = 100; do { if (x > 0) x = x & 1; /* After this insn, all bits except 1 are 0. */ else x = x & 2; /* After this insn, all bits except 2 are 0. */ i--; } while (i); return x & 4; /* Before this insn, only bits 1 and 2 may be nonzero. */ } "foo" should simply return 0. This optimization is currently missed at -O2 and -O3 on x86_64. (Cases with simpler control are solved by the "combine" pass.) Proposal 1. Dataflow framework to propagate bitwise register properties. (Integrated with the current dataflow framework.) 2. Forward bitwise dataflow analysis: constant bit propagation. 3. Backward bitwise dataflow analysis: dead bit propagation. 4. Target applications: improve dce and see. (Others?) Preliminary Design == 1. I don't have enough understanding of GCC to decide whether it should be done at RTL level or tree level. After some brainstorming with Ian Taylor, Diego Novillo and others, we decided to go with RTL. 2. This problem could be solved using iterative dataflow with bitmaps. However, memory size would increase significantly compared to scalar analysis, as would processing time. For constant bit propagation, we need to keep 2 bits of state for each register bit. For 64 bit registers, that's a factor of 128x over the scalar reaching definition problem. Instead, I propose a sparse bitwise dataflow framework. We would still use the existing RTL dataflow framework to build scalar DU/UD chains. Once they are available, bitwise information is propagated only over these chains, analogous to the sparse constant propagation described by Wegman & Zadeck TOPLAS 1991. 3. This might be too much detail at this point, but just in case, here is a brief description of a bit constant propagation algorithm. For each instruction I in the function body For each register R in instruction I def_constant_bits(I, R) = collect constants from AND/OR/... operations. Iterate until the def_constant_bits don't change: For each instruction I in the function body For each register R used at I use_constant_bits(I, R) = merge (def_constant_bits(D, R)) across all definitions D of R that reach I For each register R defined at I def_constant_bits(I, R) = transfer (use_constant_bits(I, RU)) for all register uses RU, based on opcodes. The data structures and routines "collect", "merge" and "transfer" depend on the problem solved. 4. Complexity considerations. The solver visits every DU edge once for each iteration of the fixed point convergence loop. The maximum number of iterations is given by the height of the state lattice multiplied by the number of bits. Although this can be as high as 128 for constant bit propagation on x86_64, in practice we expect much lower times. Also, lower complexity guarantees can be given if less accurate information is allowed, e.g., byte level rather than bit level. For byte constants, the upper bound constant factor drops from 128 to 16. Some of these ideas came from discussions with Preston Briggs, Sriraman Tallam and others.
Re: bitwise dataflow
I'm thinking about adding bitwise dataflow analysis support to RTL. Before embarking on this, I'd suggest playing with the bitwise domain analysis that one of my students did as part of his cXprop tool: http://www.cs.utah.edu/~coop/research/cxprop/#DOWNLOADS This is a source-level analysis in CIL and so is not quite analogous to what you propose. However, it should give you some ideas about what kind of results you can expect at the RTL level. Our experience was that the bitwise domain is not that powerful. On the other hand, it converges quickly compared to intervals. John Regehr
Re: bitwise dataflow
Sent from my iPhone On Mar 6, 2009, at 7:00 PM, Silvius Rus wrote: I'm thinking about adding bitwise dataflow analysis support to RTL. Is this a good idea? Bad idea? Already done? Please review if interested. There is already some bitwise dataflow implemented in combine. And I think df supports bytewise already but it is turned off because of problems with some targets and modes. I also think Steven B. had an implementation on the tree level or at least ideas on how to implement there. Thank you, Silvius Motivation == int foo(int x) { int i = 100; do { if (x > 0) x = x & 1; /* After this insn, all bits except 1 are 0. */ else x = x & 2; /* After this insn, all bits except 2 are 0. */ i--; } while (i); return x & 4; /* Before this insn, only bits 1 and 2 may be nonzero. */ } "foo" should simply return 0. This optimization is currently missed at -O2 and -O3 on x86_64. (Cases with simpler control are solved by the "combine" pass.) Proposal 1. Dataflow framework to propagate bitwise register properties. (Integrated with the current dataflow framework.) 2. Forward bitwise dataflow analysis: constant bit propagation. 3. Backward bitwise dataflow analysis: dead bit propagation. 4. Target applications: improve dce and see. (Others?) Preliminary Design == 1. I don't have enough understanding of GCC to decide whether it should be done at RTL level or tree level. After some brainstorming with Ian Taylor, Diego Novillo and others, we decided to go with RTL. 2. This problem could be solved using iterative dataflow with bitmaps. However, memory size would increase significantly compared to scalar analysis, as would processing time. For constant bit propagation, we need to keep 2 bits of state for each register bit. For 64 bit registers, that's a factor of 128x over the scalar reaching definition problem. Instead, I propose a sparse bitwise dataflow framework. We would still use the existing RTL dataflow framework to build scalar DU/UD chains. Once they are available, bitwise information is propagated only over these chains, analogous to the sparse constant propagation described by Wegman & Zadeck TOPLAS 1991. 3. This might be too much detail at this point, but just in case, here is a brief description of a bit constant propagation algorithm. For each instruction I in the function body For each register R in instruction I def_constant_bits(I, R) = collect constants from AND/OR/... operations. Iterate until the def_constant_bits don't change: For each instruction I in the function body For each register R used at I use_constant_bits(I, R) = merge (def_constant_bits(D, R)) across all definitions D of R that reach I For each register R defined at I def_constant_bits(I, R) = transfer (use_constant_bits(I, RU)) for all register uses RU, based on opcodes. The data structures and routines "collect", "merge" and "transfer" depend on the problem solved. 4. Complexity considerations. The solver visits every DU edge once for each iteration of the fixed point convergence loop. The maximum number of iterations is given by the height of the state lattice multiplied by the number of bits. Although this can be as high as 128 for constant bit propagation on x86_64, in practice we expect much lower times. Also, lower complexity guarantees can be given if less accurate information is allowed, e.g., byte level rather than bit level. For byte constants, the upper bound constant factor drops from 128 to 16. Some of these ideas came from discussions with Preston Briggs, Sriraman Tallam and others.