Re: RFC: LRA for x86/x86-64 [7/9]
On Tue, Oct 2, 2012 at 3:42 PM, Richard Sandiford wrote: > >> +/* Compress pseudo live ranges by removing program points where >> + nothing happens. Complexity of many algorithms in LRA is linear >> + function of program points number. To speed up the code we try to >> + minimize the number of the program points here. */ >> +static void >> +remove_some_program_points_and_update_live_ranges (void) > > Genuine question, but could we do this on the fly instead, > by not incrementing curr_point if the current point had no value? > > I suppose the main complication would be checking cases where > all births are recorded by extending the previous just-closed live > range rather than starting a new one, in which case it's the previous > point that needs to be reused. Hmm... It does seem to be something worth investigating further. Things like: Compressing live ranges: from 1742579 to 554532 - 31% Compressing live ranges: from 1742569 to 73069 - 4% look extreme, but they're actually the norm. For the same test case (PR54146 still) but looking only at functions with initially 1000- program points, you get this picture: Compressing live ranges: from 1766 to 705 - 39% Compressing live ranges: from 1449 to 370 - 25% Compressing live ranges: from 3939 to 1093 - 27% Compressing live ranges: from 3939 to 1093 - 27% Compressing live ranges: from 3939 to 1093 - 27% Compressing live ranges: from 3939 to 1093 - 27% Compressing live ranges: from 2464 to 676 - 27% Compressing live ranges: from 1433 to 379 - 26% Compressing live ranges: from 1261 to 348 - 27% Compressing live ranges: from 2835 to 755 - 26% Compressing live ranges: from 5426 to 1678 - 30% Compressing live ranges: from 5227 to 1477 - 28% Compressing live ranges: from 1845 to 467 - 25% Compressing live ranges: from 4868 to 1378 - 28% Compressing live ranges: from 4875 to 1388 - 28% Compressing live ranges: from 4882 to 1384 - 28% Compressing live ranges: from 5688 to 1714 - 30% Compressing live ranges: from 4943 to 1310 - 26% Compressing live ranges: from 2976 to 792 - 26% Compressing live ranges: from 5463 to 1526 - 27% Compressing live ranges: from 2854 to 730 - 25% Compressing live ranges: from 1810 to 745 - 41% Compressing live ranges: from 2771 to 904 - 32% Compressing live ranges: from 4916 to 1429 - 29% Compressing live ranges: from 6505 to 2238 - 34% Compressing live ranges: from 6493 to 166 - 2% Compressing live ranges: from 5498 to 1734 - 31% Compressing live ranges: from 1810 to 745 - 41% Compressing live ranges: from 5043 to 1420 - 28% Compressing live ranges: from 3094 to 788 - 25% Compressing live ranges: from 4563 to 1311 - 28% Compressing live ranges: from 4557 to 158 - 3% Compressing live ranges: from 1050 to 274 - 26% Compressing live ranges: from 1602 to 434 - 27% Compressing live ranges: from 2474 to 600 - 24% Compressing live ranges: from 2718 to 866 - 31% Compressing live ranges: from 2097 to 716 - 34% Compressing live ranges: from 4152 to 1099 - 26% Compressing live ranges: from 5065 to 1514 - 29% Compressing live ranges: from 1236 to 359 - 29% Compressing live ranges: from 1722 to 541 - 31% Compressing live ranges: from 5186 to 1401 - 27% Unfortunately the dump doesn't mention how many live ranges could be merged thanks to the compression. It'd also be good to understand why the compression ratios are so small, and consistently around ~30%. Maybe curr_point includes things it should ignore (DEBUG_INSNs, NOTEs, ...)? Ciao! Steven
[Ada] Connect_Socket with timeout does not report failure correctly
This change fixes an oversight in the Connect_Socket variant that includes a timeout check. The connection is initiated asynchronously, and a select(2) call is performed to wait for it to complete, with a timeout. When that call returns however, the connection is not always succsefully established. It may also happen that an error occurred during the connection attempt (typically, connection refused by the remote host), which needs to be reported through the raising of an exception. This change adds the missing error check. The following test case should report: Client is trying to connect to Port: 11552 Expected exception raised with Text_IO;use Text_IO; with Gnat.Sockets; use Gnat.Sockets; with Ada.Command_Line; with Ada.Exceptions; use Ada.Exceptions; procedure Timed_Conn_Error is Port : constant Port_Type := 11552; -- There should be no listening server on this port Socket: Socket_Type := No_Socket; Address : Sock_Addr_Type; Host : constant String := Host_Name; Status: Selector_Status; begin Address.Addr := Addresses (Get_Host_By_Name (Host), 1); Address.Port := Port; Create_Socket (Socket); Set_Socket_Option (Socket, Socket_Level, (Reuse_Address, True)); Put_Line ("Client is trying to connect to Port:" & Port'Img & " "); Connect_Socket (Socket => Socket, Server => Address, Timeout => Forever, Status => Status); begin String'Output (Stream (Socket), "Hello"); exception when E : others => Put_Line ("Unexpected exception, should not reach this point"); Put_Line ("Info: " & Exception_Information (E)); end; exception when E : others => Put_Line ("Expected exception raised"); Put_Line ("Info: " & Exception_Information (E)); end Timed_Conn_Error; Tested on x86_64-pc-linux-gnu, committed on trunk 2012-10-03 Thomas Quinot * g-socket.adb (Connect_Socket, version with timeout): When the newly-connected socket is reported as available for writing, check whether it has a pending asynchronous error prior to returning. Index: g-socket.adb === --- g-socket.adb(revision 192025) +++ g-socket.adb(working copy) @@ -123,7 +123,7 @@ function Resolve_Error (Error_Value : Integer; From_Errno : Boolean := True) return Error_Type; - -- Associate an enumeration value (error_type) to en error value (errno). + -- Associate an enumeration value (error_type) to an error value (errno). -- From_Errno prevents from mixing h_errno with errno. function To_Name (N : String) return Name_Type; @@ -702,6 +702,13 @@ Req : Request_Type; -- Used to set Socket to non-blocking I/O + Conn_Err : aliased Integer; + -- Error status of the socket after completion of select(2) + + Res : C.int; + Conn_Err_Size : aliased C.int := Conn_Err'Size / 8; + -- For getsockopt(2) call + begin if Selector /= null and then not Is_Open (Selector.all) then raise Program_Error with "closed selector"; @@ -735,10 +742,32 @@ Selector => Selector, Status => Status); + -- Check error condition (the asynchronous connect may have terminated + -- with an error, e.g. ECONNREFUSED) if select(2) completed. + + if Status = Completed then + Res := C_Getsockopt + (C.int (Socket), SOSC.SOL_SOCKET, SOSC.SO_ERROR, +Conn_Err'Address, Conn_Err_Size'Access); + + if Res /= 0 then +Conn_Err := Socket_Errno; + end if; + + else + Conn_Err := 0; + end if; + -- Reset the socket to blocking I/O Req := (Name => Non_Blocking_IO, Enabled => False); Control_Socket (Socket, Request => Req); + + -- Report error condition if any + + if Conn_Err /= 0 then + Raise_Socket_Error (Conn_Err); + end if; end Connect_Socket;
[Ada] Additional information on subtype conformance error
This patch adds a clarifying message info when subtype conformance fails, due to a missing null exclusion indicatar in a formal that must match a controlling access formal. This Ada 2005 rule was checked partially in the context of subprogram renamings but not for 'Access attribute references. Compiling alpha.adb in gnat05 mode must be rejected with: alpha.adb:6:19: not subtype conformant with declaration at beta.ads:3 alpha.adb:6:19: controlling formal "Ref" of "Updated" excludes null, declaration must exclude null as well --- with Beta; package Alpha is type Object is tagged limited null record; procedure Start; procedure Updated (Ref : access Object) is null; end Alpha; --- package body Alpha is procedure Start is begin Beta.Set (Updated'Access); end Start; end Alpha; --- limited with Alpha; package Beta is type Callback is access procedure (Ref : access Alpha.Object); procedure Set (Proc : in Callback) is null; end Beta; Tested on x86_64-pc-linux-gnu, committed on trunk 2012-10-03 Ed Schonberg * sem_ch6.adb (Check_Conformance): Additional info when subtype conformance fails, due to a missing null exclusion indicatar in a formal that must match a controlling access formal. Index: sem_ch6.adb === --- sem_ch6.adb (revision 192025) +++ sem_ch6.adb (working copy) @@ -5756,14 +5756,31 @@ declare TSS_Name : constant TSS_Name_Type := Get_TSS_Name (New_Id); + begin if TSS_Name /= TSS_Stream_Read and then TSS_Name /= TSS_Stream_Write and then TSS_Name /= TSS_Stream_Input and then TSS_Name /= TSS_Stream_Output then - Conformance_Error - ("\type of & does not match!", New_Formal); + -- Here we have a definite conformance error. It is worth + -- special casesing the error message for the case of a + -- controlling formal (which excludes null). + + if Is_Controlling_Formal (New_Formal) then +Error_Msg_Node_2 := Scope (New_Formal); +Conformance_Error + ("\controlling formal& of& excludes null, " + & "declaration must exclude null as well", +New_Formal); + + -- Normal case (couldn't we give more detail here???) + + else +Conformance_Error + ("\type of & does not match!", New_Formal); + end if; + return; end if; end;
[Ada] Ada/C++ missing call to constructor with defaults
When the type of an object is a CPP untagged type and the object initialization requires calling its default C++ constructor, the Ada compiler did not generate the call to a C++ constructor which has all parameters with defaults (and hence it covers the default C++ constructor). // c_class.h class Tester { public: Tester(unsigned int a_num = 5, char* a_className = 0); }; // c_class.cc #include "c_class.h" #include Tester::Tester(unsigned int a_num, char* a_className) { std::cout << " ctor Tester called " << a_num << ":"; if (a_className == 0) { std::cout << "null"; } else { std::cout << a_className; } std::cout << std::endl; } -- c_class_h.ads with Interfaces.C; use Interfaces.C; with Interfaces.C.Strings; package c_class_h is package Class_Tester is type Tester is limited record null; end record; pragma Import (CPP, Tester); function New_Tester -- Modified by hand (a_num : unsigned := 5; a_className : Interfaces.C.Strings.chars_ptr := Interfaces.C.Strings.Null_Ptr) return Tester; -- c_class.h:3 pragma CPP_Constructor (New_Tester, "_ZN6TesterC1EjPc"); end; use Class_Tester; end c_class_h; -- main.adb with c_class_h; use c_class_h; procedure Main is use Class_Tester; Ptr : access Tester := new Tester; -- TEST pragma Unreferenced (Ptr); begin null; end main; -- ada2cpp.gpr project Ada2Cppc is for Languages use ("Ada", "C++"); for Main use ("main.adb"); package Naming is for Implementation_Suffix ("C++") use ".cc"; end Naming; for Source_Dirs use ("."); for Object_Dir use "obj"; package Compiler is for Default_Switches ("ada") use ("-g", "-gnato", "-gnatwa", "-gnatQ", "-gnat05", "-gnatD"); end Compiler; package Builder is for Default_Switches ("ada") use ("-g"); end Builder; package Ide is for Compiler_Command ("ada") use "gnatmake"; for Compiler_Command ("c") use "gcc"; end Ide; end Ada2Cppc; Command: gprbuild -q -P ada2cppc.gpr; obj/main Output: ctor Tester called 5:null Tested on x86_64-pc-linux-gnu, committed on trunk 2012-10-03 Javier Miranda * exp_disp.adb (Set_CPP_Constructors_Old): Handle constructor of untagged type that has all its parameters with defaults and hence it covers the default constructor. Index: exp_disp.adb === --- exp_disp.adb(revision 192025) +++ exp_disp.adb(working copy) @@ -8459,6 +8459,8 @@ P : Node_Id; Parms : List_Id; + Covers_Default_Constructor : Entity_Id := Empty; + begin -- Look for the constructor entities @@ -8490,7 +8492,8 @@ Make_Defining_Identifier (Loc, Chars (Defining_Identifier (P))), Parameter_Type => - New_Copy_Tree (Parameter_Type (P; + New_Copy_Tree (Parameter_Type (P)), + Expression => New_Copy_Tree (Expression (P; Next (P); end loop; end if; @@ -8508,6 +8511,17 @@ Set_Convention (Init, Convention_CPP); Set_Is_Public (Init); Set_Has_Completion (Init); + + -- If this constructor has parameters and all its parameters + -- have defaults then it covers the default constructor. The + -- semantic analyzer ensures that only one constructor with + -- defaults covers the default constructor. + + if Present (Parameter_Specifications (Parent (E))) + and then Needs_No_Actuals (E) + then + Covers_Default_Constructor := Init; + end if; end if; Next_Entity (E); @@ -8519,6 +8533,49 @@ if not Found then Set_Is_Abstract_Type (Typ); end if; + + -- Handle constructor that has all its parameters with defaults and + -- hence it covers the default constructor. We generate a wrapper IP + -- which calls the covering constructor. + + if Present (Covers_Default_Constructor) then +declare + Body_Stmts: List_Id; + Wrapper_Id: Entity_Id; + Wrapper_Body_Node : Node_Id; +begin + Loc := Sloc (Covers_Default_Constructor); + + Body_Stmts := New_List ( + Make_Procedure_Call_Statement (Loc, + Name => New_Reference_To (Covers_Default_Constructor, Loc), + Parameter_Associations => New_List ( + Make_Identifier (Loc, Name_uInit; + + Wrapper_Id := Make_Defining_Identifier (Loc, +
[Ada] Ada/C++ missing call to allocation of C++ object with defaults
When the type of an object is a CPP untagged type, the object is allocated in memory through the "new" construct, and the object initialization requires calling its C++ constructor passing defaults, the Ada compiler did not generate the call to a C++ constructor which has all parameters with defaults (and hence it covers the default C++ constructor). The following test now executes well. // c_class.h class Tester { public: Tester(unsigned int a_num = 5, char* a_className = 0); }; // c_class.cc #include "c_class.h" #include Tester::Tester(unsigned int a_num, char* a_className) { std::cout << " ctor Tester called " << a_num << ":"; if (a_className == 0) { std::cout << "null"; } else { std::cout << a_className; } std::cout << std::endl; } -- c_class_h.ads with Interfaces.C; use Interfaces.C; with Interfaces.C.Strings; package c_class_h is package Class_Tester is type Tester is limited record null; end record; pragma Import (CPP, Tester); function New_Tester (a_num : unsigned := 5; a_className : Interfaces.C.Strings.chars_ptr := Interfaces.C.Strings.Null_Ptr) return Tester; -- c_class.h:3 pragma CPP_Constructor (New_Tester, "_ZN6TesterC1EjPc"); end; use Class_Tester; end c_class_h; project Ada2Cppc is for Languages use ("Ada", "C++"); for Main use ("main.adb"); package Naming is for Implementation_Suffix ("C++") use ".cc"; end Naming; for Source_Dirs use ("."); package Compiler is for Default_Switches ("ada") use ("-gnat05"); end Compiler; package Builder is for Default_Switches ("ada") use ("-g"); end Builder; package Ide is for Compiler_Command ("ada") use "gnatmake"; for Compiler_Command ("c") use "gcc"; end Ide; end Ada2Cppc; Command: gprbuild -q -P ada2cppc.gpr; ./main Output: ctor Tester called 5:null Tested on x86_64-pc-linux-gnu, committed on trunk 2012-10-03 Javier Miranda * exp_ch4.adb (Expand_N_Allocator_Expression): Minor code reorganization and cleanup. Done to ensure proper management of the C++ constructor covering tagged and untagged types and also non-default constructors. * exp_ch6.ads, exp_ch6.adb (Make_CPP_Constructor_Call_In_Allocator): New subprogram. Index: exp_ch4.adb === --- exp_ch4.adb (revision 192027) +++ exp_ch4.adb (working copy) @@ -867,6 +867,15 @@ -- Start of processing for Expand_Allocator_Expression begin + -- Handle call to C++ constructor + + if Is_CPP_Constructor_Call (Exp) then + Make_CPP_Constructor_Call_In_Allocator + (Allocator => N, +Function_Call => Exp); + return; + end if; + -- In the case of an Ada 2012 allocator whose initial value comes from a -- function call, pass "the accessibility level determined by the point -- of call" (AI05-0234) to the function. Conceptually, this belongs in @@ -899,59 +908,7 @@ -- Case of tagged type or type requiring finalization if Is_Tagged_Type (T) or else Needs_Finalization (T) then - if Is_CPP_Constructor_Call (Exp) then --- Generate: ---Pnnn : constant ptr_T := new (T); ---Init (Pnnn.all,...); - --- Allocate the object without an expression - -Node := Relocate_Node (N); -Set_Expression (Node, New_Reference_To (Etype (Exp), Loc)); - --- Avoid its expansion to avoid generating a call to the default --- C++ constructor. - -Set_Analyzed (Node); - -Temp := Make_Temporary (Loc, 'P', N); - -Temp_Decl := - Make_Object_Declaration (Loc, -Defining_Identifier => Temp, -Constant_Present=> True, -Object_Definition => New_Reference_To (PtrT, Loc), -Expression => Node); -Insert_Action (N, Temp_Decl); - -Apply_Accessibility_Check (Temp); - --- Locate the enclosing list and insert the C++ constructor call - -declare - P : Node_Id; - -begin - P := Parent (Node); - while not Is_List_Member (P) loop - P := Parent (P); - end loop; - - Insert_List_After_And_Analyze (P, - Build_Initialization_Call (Loc, - Id_Ref => - Make_Explicit_Dereference (Loc, - Prefix => New_Reference_To (Temp, Loc)), - Typ => Etype (Exp), - Constructor_Ref => Exp)); -end; - -Rewrite (N, New_Reference_To (Temp, Loc)); -Analyze_And_Resolve (N, PtrT); -return; - end if; - --
[Ada] Further fixes to MINIMIZED overflow checking mode
This patch has three components. First we set SUPPRESSED mode as the default for -gnatg mode (including the run-time). This has no effect right now, but avoids hidden problems that may appear if in future we make overflow checking the default behavior. Second, in some obscure cases, the overflow checking code was generating bogus warnings for redundant type conversions from generated overflow checking code in MINIMIZED/ELIMINATED mode. No simple test has been found for this, and the conditions are quite obscure (and not well understood!) but the patch decisively eliminates such bogus warnings. Finally, we properly handle the case where the result is converted to a larger type. The following test program: 1. with Text_IO; use Text_IO; 2. procedure convov is 3.function a (x : integer) 4. return Long_Long_Integer 5.is 6.begin 7. return Long_Long_Integer (x * x); 8.end; 9. begin 10.Put_Line (a (Integer'Last)'Img); 11. end; when run in -gnato2 or -gnato3 mode, compiles quietly and outputs a single line: 4611686014132420609 Tested on x86_64-pc-linux-gnu, committed on trunk 2012-10-03 Robert Dewar * checks.adb (Minimize_Eliminate_Overflow_Checks): Properly handle case of top level expression within type conversion * gnat1drv.adb (Adjust_Global_Switches): Set SUPPRESSED as default for overflow checking for -gnatg mode (includes run-time). * sem_res.adb (Resolve_Type_Conversion): Avoid bogus warnings about redundant conversions from MINIMIZED/EXTENDED mode checking Index: checks.adb === --- checks.adb (revision 192027) +++ checks.adb (working copy) @@ -7404,6 +7404,16 @@ elsif Top_Level and then not (Bignum_Operands or Long_Long_Integer_Operands) + +-- One further refinement. If we are at the top level, but our parent +-- is a type conversion, then go into bignum or long long integer node +-- since the result will be converted to that type directly without +-- going through the result type, and we may avoid an overflow. This +-- is the case for example of Long_Long_Integer (A ** 4), where A is +-- of type Integer, and the result A ** 4 fits in Long_Long_Integer +-- but does not fit in Integer. + +and then Nkind (Parent (N)) /= N_Type_Conversion then -- Here we will keep the original types, but we do need an overflow -- check, so we will set Do_Overflow_Check to True (actually it is @@ -7561,12 +7571,6 @@ if Nkind (N) = N_Op_Expon and then Etype (Right_Opnd (N)) = LLIB then Convert_To_And_Rewrite (Standard_Natural, Right_Opnd (N)); - - -- Now Long_Long_Integer_Operands may have to be reset if that was - -- the only long long integer operand, i.e. we now have long long - -- integer operands only if the left operand is long long integer. - - Long_Long_Integer_Operands := Etype (Left_Opnd (N)) = LLIB; end if; -- Here we will do the operation in Long_Long_Integer. We do this even Index: sem_res.adb === --- sem_res.adb (revision 192025) +++ sem_res.adb (working copy) @@ -9624,6 +9624,13 @@ then null; +-- Never warn on conversion to Long_Long_Integer'Base since +-- that is most likely an artifact of the extended overflow +-- checking and comes from complex expanded code. + +elsif Orig_T = Base_Type (Standard_Long_Long_Integer) then + null; + -- Here we give the redundant conversion warning. If it is an -- entity, give the name of the entity in the message. If not, -- just mention the expression. Index: gnat1drv.adb === --- gnat1drv.adb(revision 192025) +++ gnat1drv.adb(working copy) @@ -334,6 +334,12 @@ if Opt.Suppress_Options.Overflow_Checks_General /= Not_Set then null; + -- By default suppress overflow checks in -gnatg mode + + elsif GNAT_Mode then + Suppress_Options.Overflow_Checks_General:= Suppressed; + Suppress_Options.Overflow_Checks_Assertions := Suppressed; + -- If we have backend divide and overflow checks, then by default -- overflow checks are minimized, which is a reasonable setting.
Re: [C++ PATCH] -Wsizeof-pointer-memaccess warning (take 2)
Jakub Jelinek a écrit: > --- gcc/cp/call.c.jj 2012-09-27 12:45:49.0 +0200 > +++ gcc/cp/call.c 2012-10-01 17:53:17.594609236 +0200 > @@ -557,7 +557,10 @@ null_ptr_cst_p (tree t) > { >/* Core issue 903 says only literal 0 is a null pointer constant. */ >if (cxx_dialect < cxx0x) > - t = integral_constant_value (t); > + { > + t = integral_constant_value (t); > + t = maybe_constant_value (t); > + } Just for my education, why couldn't maybe_constant_value just call integral_constant_value, so that we just use maybe_constant_value here? Even after reading the comments of these two functions, /having/ to use them like this seems confusing to the casual reader in me. -- Dodji
Re: PATCH trunk: gengtype honoring mark_hook-s inside struct inide union-s
Hello Basile - > 2012-10-02 Basile Starynkevitch > > * gengtype.c (walk_type): Emit mark_hook when inside a > struct of a union member. Can you send me off-list the gengtype output before and after the fix? + const char *structmarkhook = NULL; struct_mark_hook or just mark_hook please. + oprintf (d->of, "%*s/*structmarkhook %s */ %s (&%s));\n", +d->indent, "", t->u.s.tag, structmarkhook, oldval); No need to output the comment. Thanks, -- Laurynas
Re: [C++ PATCH] -Wsizeof-pointer-memaccess warning (take 2)
On Wed, Oct 03, 2012 at 11:14:37AM +0200, Dodji Seketeli wrote: > Jakub Jelinek a écrit: > > > --- gcc/cp/call.c.jj2012-09-27 12:45:49.0 +0200 > > +++ gcc/cp/call.c 2012-10-01 17:53:17.594609236 +0200 > > @@ -557,7 +557,10 @@ null_ptr_cst_p (tree t) > > { > >/* Core issue 903 says only literal 0 is a null pointer constant. */ > >if (cxx_dialect < cxx0x) > > - t = integral_constant_value (t); > > + { > > + t = integral_constant_value (t); > > + t = maybe_constant_value (t); > > + } > > Just for my education, why couldn't maybe_constant_value just call > integral_constant_value, so that we just use maybe_constant_value here? Perhaps maybe_constant_value already does all that integral_constant_value does, but I'm not sure about it. maybe_constant_value starts with a lot of conditions where it doesn't do anything if they are true, integral_constant_value doesn't have any such conditions, on the other side performs far less - just loops through DECL_INITIAL values looking for a const. As I said earlier, while the initial hacks around early folding of SIZEOF_EXPR were quite ugly, the amount of changes to allow late folding of SIZEOF_EXPR was huge, with user visible changes etc. Jakub
RE: [PATCH, i386]: Fix PR51109, symbol size in scheduler state machine is reduced
Testing was done before posting the patch. It was successful. Regards Ganesh -Original Message- From: Uros Bizjak [mailto:ubiz...@gmail.com] Sent: Thursday, September 27, 2012 5:57 PM To: Gopalasubramanian, Ganesh Cc: gcc-patches@gcc.gnu.org Subject: Re: [PATCH, i386]: Fix PR51109, symbol size in scheduler state machine is reduced On Thu, Sep 27, 2012 at 10:30 AM, Gopalasubramanian, Ganesh wrote: > This is a fix for PR 51109. > > There are three changes > > 1. Microcoded instructions are considered as single issue instructions > and are therefore issued to a separate execution unit. > 2. The multiplier unit is attached to execution unit 1 (ieu1). Since ieu > is handled as a separate automaton in the patch, separate "mult" automaton is > not required. > 3. The integer execution units (2AGUs and 2EXs) are now decoupled. Now, > they are described as separate automatons. > > Is it OK for upstream? > > Regards > Ganesh > > 2012-09-27 Ganesh Gopalasubramanian > > > PR 51109 > * gcc/config/i386/bdver1.md (bdver1_int): Automaton has been > split to reduce state transitions. OK for mainline, if tested according to [1]. [1] http://gcc.gnu.org/contribute.html#testing Thanks, Uros.
Re: [patch][lra] Use XNEWVEC and friends instead of xmalloc/xrealloc, and add some timevars
On Mon, Oct 1, 2012 at 1:05 PM, Steven Bosscher wrote: > Hello, > > This patch uses the libiberty new-like operators instead of using > xmalloc/xrealloc. > It also adds timevars for the main LRA phases, and it fixes a warning > suggesting a space before a ';' in an only-looping for loop. > > Bootstrapped (lra-branch, of course) on x86_64-unknown-linux-gnu. OK? Hi Vlad, Is this patch OK for the lra-branch? Ciao! Steven
Re: PATCH trunk: gengtype honoring mark_hook-s inside struct inside union-s
On Wed, Oct 03, 2012 at 12:21:02PM +0300, Laurynas Biveinis wrote: > Hello Basile - > > > 2012-10-02 Basile Starynkevitch > > > > * gengtype.c (walk_type): Emit mark_hook when inside a > > struct of a union member. > > Can you send me off-list the gengtype output before and after the fix? I messed something, the example I did send was wrong. Let's start all over again. Consider the following file (I named this example on purpose differently than in the previous message) /* file basilemarkh.h */ struct GTY ((mark_hook("mymarker"))) mytest_st { int myflag; tree mytree; gimple mygimple; }; #define MYUTAG 1 union GTY ((desc ("%0.u_int"))) myutest_un { int GTY ((skip)) u_int; struct mytest_st GTY ((tag ("MYUTAG"))) u_mytest; }; static GTY (()) union myutest_un *myutestptr; static inline void mymarker (struct mytest_st *s) { s->myflag = 1; } /* eof basilemarkh.h */ ### With the gengtype from 4.7 in Debian/Sid/AMD64 (a 4.7.2 I believe) /usr/lib/gcc/x86_64-linux-gnu/4.7/gengtype -D -v \ -r /usr/lib/gcc/x86_64-linux-gnu/4.7/gtype.state \ -P _g-4.7-basilemarkh.h basilemarkh.h from generated _g-4.7-basilemarkh.h /* functions code */ void gt_ggc_mx_myutest_un (void *x_p) { union myutest_un * const x = (union myutest_un *)x_p; if (ggc_test_and_set_mark (x)) { switch ((*x).u_int) { case MYUTAG: ## no mark hook here, should have a call to mymarker gt_ggc_m_9tree_node ((*x).u_mytest.mytree); gt_ggc_m_18gimple_statement_d ((*x).u_mytest.mygimple); break; default: break; } } } ### end of excerpt from generated _g-4.7-basilemarkh.h With the gengtype from unpatched trunk svn rev 192031 the emitted gt_ggc_mx_myutest_un is exactly the same. Of course the ### comment above has been added manually by me Basile. So I applied and I am proposing the following patch to gcc trunk 192031 (Laurynas, I did take your remarks into account) # patch to trunk Index: gcc/gengtype.c === --- gcc/gengtype.c (revision 192031) +++ gcc/gengtype.c (working copy) @@ -2810,6 +2810,7 @@ walk_type (type_p t, struct walk_type_data *d) const char *oldval = d->val; const char *oldprevval1 = d->prev_val[1]; const char *oldprevval2 = d->prev_val[2]; + const char *struct_mark_hook = NULL; const int union_p = t->kind == TYPE_UNION; int seen_default_p = 0; options_p o; @@ -2833,7 +2834,14 @@ walk_type (type_p t, struct walk_type_data *d) if (!desc && strcmp (o->name, "desc") == 0 && o->kind == OPTION_STRING) desc = o->info.string; + else if (!struct_mark_hook && strcmp (o->name, "mark_hook") == 0 + && o->kind == OPTION_STRING) + struct_mark_hook = o->info.string; + if (struct_mark_hook) + oprintf (d->of, "%*s%s (&%s));\n", +d->indent, "", struct_mark_hook, oldval); + d->prev_val[2] = oldval; d->prev_val[1] = oldprevval2; if (union_p) # end of patch with the following ChangeLog entry # gcc/ChangeLog entry 2012-10-03 Basile Starynkevitch * gengtype.c (walk_type): Emit mark_hook when inside a struct of a union member. # end of gcc/ChangeLog entry If you apply my proposed patch the emitted function is ### from generated _gtrunk-basilemarkh.h with my patch to gengtype.c /* functions code */ void gt_ggc_mx_myutest_un (void *x_p) { union myutest_un * const x = (union myutest_un *)x_p; if (ggc_test_and_set_mark (x)) { switch ((*x).u_int) { case MYUTAG: mymarker (&(*x).u_mytest)); gt_ggc_m_9tree_node ((*x).u_mytest.mytree); gt_ggc_m_18gimple_statement_d ((*x).u_mytest.mygimple); break; default: break; } } } end of excerpt from generated _gtrunk-basilemarkh.h If you just use the gengtype from trunk 192031 without my patch you get the same code as from gcc-4.7's gengtype: from generated _g192031-basilemarkh.h void gt_ggc_mx_myutest_un (void *x_p) { union myutest_un * const x = (union myutest_un *)x_p; if (ggc_test_and_set_mark (x)) { switch ((*x).u_int) { case MYUTAG: gt_ggc_m_9tree_node ((*x).u_mytest.mytree); gt_ggc_m_18gimple_statement_d ((*x).u_mytest.mygimple); break; default: break; } } } ### end of excerpt from generated _g192031-basilemarkh.h Is my patch here ok for trunk? Regards. -- Basile STARYNKEVITCH http://starynkevitch.net/Basile/ email: basilestarynkevitchnet mobile: +33 6 8501 2359 8, rue de la Faiencerie, 92340 Bourg La Reine, France *** opinions {are only mines, sont seulement l
Re: [PATCH v2, rtl-optimization]: Fix PR54457, [x32] Fail to combine 64bit index + constant
Il 30/09/2012 11:02, Richard Sandiford ha scritto: > Uros Bizjak writes: >> On Thu, Sep 27, 2012 at 8:20 PM, Jakub Jelinek wrote: >>> On Thu, Sep 27, 2012 at 08:04:58PM +0200, Uros Bizjak wrote: After some off-line discussion with Richard, attached is v2 of the patch. 2012-09-27 Uros Bizjak PR rtl-optimization/54457 * simplify-rtx.c (simplify_subreg): Simplify (subreg:SI (op:DI ((x:DI) (y:DI)), 0) to (op:SI (subreg:SI (x:DI) 0) (subreg:SI (x:DI) 0)). >>> >>> Is that a good idea even for WORD_REGISTER_OPERATIONS targets? >> >> I have bootstrapped and regtested [1] the patch on >> alphaev68-pc-linux-gnu, a WORD_REGISTER_OPERATIONS target, and there >> were no additional failures. > > Thanks. Given Jakub's question/concern, I'd really prefer a third > opinion rather than approving it myself, but... OK if no-one objects > within 24hrs. I used to have a patch doing roughly the same thing (for more operations but not MULT). I never submitted because I didn't have the time to audit all targets after changing the canonicalization. Perhaps you can take the best of both worlds. Paolo change canonicalization Index: gcc/Makefile.in === --- gcc/Makefile.in (branch pr39726) +++ gcc/Makefile.in (working copy) @@ -2844,7 +2844,7 @@ jump.o : jump.c $(CONFIG_H) $(SYSTEM_H) simplify-rtx.o : simplify-rtx.c $(CONFIG_H) $(SYSTEM_H) coretypes.h $(TM_H) \ $(RTL_H) $(REGS_H) hard-reg-set.h $(FLAGS_H) $(REAL_H) insn-config.h \ $(RECOG_H) $(EXPR_H) $(TOPLEV_H) output.h $(FUNCTION_H) $(GGC_H) $(TM_P_H) \ - $(TREE_H) $(TARGET_H) + $(TREE_H) $(TARGET_H) $(OPTABS_H) cgraph.o : cgraph.c $(CONFIG_H) $(SYSTEM_H) coretypes.h $(TM_H) $(TREE_H) \ langhooks.h $(TOPLEV_H) $(FLAGS_H) $(GGC_H) $(TARGET_H) $(CGRAPH_H) \ gt-cgraph.h output.h intl.h $(BASIC_BLOCK_H) debug.h $(HASHTAB_H) \ Index: gcc/simplify-rtx.c === --- gcc/simplify-rtx.c (branch pr39726) +++ gcc/simplify-rtx.c (working copy) @@ -39,6 +39,7 @@ along with GCC; see the file COPYING3. #include "output.h" #include "ggc.h" #include "target.h" +#include "optabs.h" /* Simplification and canonicalization of RTL. */ @@ -5191,6 +5192,8 @@ rtx simplify_subreg (enum machine_mode outermode, rtx op, enum machine_mode innermode, unsigned int byte) { + int is_lowpart; + /* Little bit of sanity checking. */ gcc_assert (innermode != VOIDmode); gcc_assert (outermode != VOIDmode); @@ -5300,11 +5303,13 @@ simplify_subreg (enum machine_mode outer return NULL_RTX; } + is_lowpart = subreg_lowpart_offset (outermode, innermode) == byte; + /* Merge implicit and explicit truncations. */ if (GET_CODE (op) == TRUNCATE && GET_MODE_SIZE (outermode) < GET_MODE_SIZE (innermode) - && subreg_lowpart_offset (outermode, innermode) == byte) + && is_lowpart) return simplify_gen_unary (TRUNCATE, outermode, XEXP (op, 0), GET_MODE (XEXP (op, 0))); @@ -5343,7 +5348,7 @@ simplify_subreg (enum machine_mode outer The information is used only by alias analysis that can not grog partial register anyway. */ - if (subreg_lowpart_offset (outermode, innermode) == byte) + if (is_lowpart) ORIGINAL_REGNO (x) = ORIGINAL_REGNO (op); return x; } @@ -5393,6 +5398,51 @@ simplify_subreg (enum machine_mode outer return NULL_RTX; } + /* Try to move a subreg inside an arithmetic operation. */ + if (is_lowpart && ARITHMETIC_P (op) + && GET_MODE_CLASS (outermode) == MODE_INT + && GET_MODE_CLASS (innermode) == MODE_INT) +{ + enum insn_code ic; + enum machine_mode cnt_mode; + switch (GET_CODE (op)) + { + case ABS: + case NEG: + return simplify_gen_unary (GET_CODE (op), outermode, + rtl_hooks.gen_lowpart_no_emit + (outermode, XEXP (op, 0)), + outermode); + + case ASHIFT: + /* i386 always uses QImode for the shift count. Get the + appropriate mode from the optab. */ + ic = ashl_optab->handlers[outermode].insn_code; + if (ic != CODE_FOR_nothing) + cnt_mode = insn_data[ic].operand[2].mode; + else + cnt_mode = outermode; + return simplify_gen_binary (GET_CODE (op), outermode, + rtl_hooks.gen_lowpart_no_emit + (outermode, XEXP (op, 0)), + rtl_hooks.gen_lowpart_no_emit + (cnt_mode, XEXP (op, 1))); + + case PLUS: + case MINUS: + case AND: + case IOR: + case XOR: + return simplify_gen_binary (GET_CODE (op), outermode, + rtl_hooks.gen_lowpart_no_emit + (outermode, XEXP (op, 0)), + rtl_hooks.gen_lowpart_no_emit + (outermode, XEXP (op, 1))); + default: + break; + } +} + /* Optimize SUBREG truncations of zero and sign extended values. */ if ((GET_CODE (op) == ZERO_EXTEND || GET_CODE (op) == SIGN_EXTEND) Index: gcc/combine.c
Re: RFC: LRA for x86/x86-64 [7/9] -- continuation
Hi Vlad, Some comments on lra-spills.c and lra-coalesce.c. > + The pass creates necessary stack slots and assign spilled pseudos > + to the stack slots in following way: s/assign/assigns/ > + (or insn memory constraints) might be not satisfied any more. s/might be not/might not be/ > + For some targets, the pass can spill some pseudos into hard > + registers of different class (usually into vector registers) > + instead of spilling them into memory if it is possible and > + profitable. Spilling GENERAL_REGS pseudo into SSE registers for > + modern Intel x86/x86-64 processors is an example of such > + optimization. And this is actually recommended by Intel > + optimization guide. Maybe mention core i7 specifically? "Modern" is a bit dangerous in code that'll live a long time. > +/* The structure describes a stack slot which can be used for several > + spilled pseudos. */ > +struct slot > +{ Looks like this describes "a register or stack slot" given the hard_regno case. > +/* Array containing info about the stack slots. The array element is > + indexed by the stack slot number in the range [0..slost_num). */ Typo: slots_num > + /* Each pseudo has an inherent size which comes from its own mode, > + and a total size which provides room for paradoxical subregs > + which refer to the pseudo reg in wider modes. > + > + We can use a slot already allocated if it provides both enough > + inherent space and enough total space. Otherwise, we allocate a > + new slot, making sure that it has no less inherent space, and no > + less total space, then the previous slot. */ The second part of the comment seems a bit misplaced, since the following code doesn't reuse stack slots. This is done elsewhere instead. Maybe the first part would be better above the inherent_size assignment. > + /* If we have any adjustment to make, or if the stack slot is the > + wrong mode, make a new stack slot. */ > + x = adjust_address_nv (x, GET_MODE (regno_reg_rtx[i]), adjust); We don't make a new slot here. > +/* Sort pseudos according their slot numbers putting ones with smaller > + numbers first, or last when the frame pointer is not needed. So > + pseudos with the first slot will be finally addressed with smaller > + address displacement. */ > +static int > +pseudo_reg_slot_compare (const void *v1p, const void *v2p) > +{ > + const int regno1 = *(const int *) v1p; > + const int regno2 = *(const int *) v2p; > + int diff, slot_num1, slot_num2; > + int total_size1, total_size2; > + > + slot_num1 = pseudo_slots[regno1].slot_num; > + slot_num2 = pseudo_slots[regno2].slot_num; > + if ((diff = slot_num1 - slot_num2) != 0) > +return (frame_pointer_needed > + || !FRAME_GROWS_DOWNWARD == STACK_GROWS_DOWNWARD ? diff : -diff); The comment doesn't quite describe the condition. Maybe: /* Sort pseudos according to their slots, putting the slots in the order that they should be allocated. Slots with lower numbers have the highest priority and should get the smallest displacement from the stack or frame pointer (whichever is being used). The first allocated slot is always closest to the frame pointer, so prefer lower slot numbers when frame_pointer_needed. If the stack and frame grow in the same direction, then the first allocated slot is always closest to the initial stack pointer and furthest away from the final stack pointer, so allocate higher numbers first when using the stack pointer in that case. The reverse is true if the stack and frame grow in opposite directions. */ > + total_size1 = MAX (PSEUDO_REGNO_BYTES (regno1), > + GET_MODE_SIZE (lra_reg_info[regno1].biggest_mode)); > + total_size2 = MAX (PSEUDO_REGNO_BYTES (regno2), > + GET_MODE_SIZE (lra_reg_info[regno2].biggest_mode)); > + if ((diff = total_size2 - total_size1) != 0) > +return diff; I think this could do with a bit more commentary. When is biggest_mode ever smaller than PSEUDO_REGNO_BYTES? Is that for pseudos that are only ever referenced as lowpart subregs? If so, why does PSEUDO_REGNO_BYTES matter for those registers here but not when calculating biggest_mode? > +/* Assign spill hard registers to N pseudos in PSEUDO_REGNOS. Put the > + pseudos which did not get a spill hard register at the beginning of > + array PSEUDO_REGNOS. Return the number of such pseudos. */ It'd be worth saying that PSEUDO_REGNOS is sorted in order of highest frequency first. > + bitmap set_jump_crosses = regstat_get_setjmp_crosses (); I notice you use "set_jump" here and "setjump" in parts of 7a.patch. Probably better to use setjmp across the board. > + /* Hard registers which can not be used for any purpose at given > + program point because they are unallocatable or already allocated > + for other pseudos. */ > + HARD_REG_SET *reserved_hard_regs; > + > + if
Re: [C++ PATCH] Fix ICE with COMPOUND_EXPR in constexpr (PR c++/54777)
OK. Jason
[PATCH] Improve debug info for partial inlining (PR debug/54519, take 2)
On Tue, Sep 11, 2012 at 03:59:56PM +0200, Jakub Jelinek wrote: > As discussed in the PR, right now we do a very bad job for debug info > of partially inlined functions (both when they are kept only partially > inlined, or when partial inlining is performed, but doesn't seem to be > useful and foo.part.N is inlined back, either into the original function, or > into a function into which foo has been inlined first). > > This patch improves that by doing something similar to what ipa-prop.c does, > in particular for arguments that aren't actually passed to foo.part.N > we add debug args and corresponding debug bind and debug source bind stmts > to provide better debug info (if foo.part.N isn't inlined, then > DW_OP_GNU_parameter_ref is going to be used together with corresponding call > site arguments). > > Bootstrapped/regtested on x86_64-linux and i686-linux, some of the tests > still fail with some option combinations, am going to file a DF VTA PR for > that momentarily. Ok for trunk? Now that Alex fixed the DF VTA issue, most of the non-LTO FAILs are gone, the remaining one on x86_64 looks like a GDB bug, and there is one -Os failure on pr54519-3.c which could be either alias oracle or var-tracking bug/missing feature - basically there is a non-addressable parameter in stack slot, and vt_canon_true_dep -> canon_true_dependence thinks an argument push insn might alias with it, because it doesn't have a MEM_EXPR and ao_ref_from_mem fails. Posting an updated patch, because last night I found that even when the patch should have fixed static inline void foo (int x, int y) { asm volatile ("nop"); } static inline void bar (int z) { foo (z, 0); foo (z, 1); } int main () { bar (0); bar (1); return 0; } it didn't, there was a confusion when DECL_ORIGIN should be used and when not. This version of the patch fixes that, on this testcase (adjusted added as pr54519-6.c) p x, p y and up; p z now work well. Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk? 2012-10-03 Jakub Jelinek PR debug/54519 * ipa-split.c (split_function): Add debug args and debug source and normal stmts for args_to_skip which are gimple regs. * tree-inline.c (copy_debug_stmt): When inlining, adjust source debug bind stmts to debug binds of corresponding DEBUG_EXPR_DECL. * gcc.dg/guality/pr54519-1.c: New test. * gcc.dg/guality/pr54519-2.c: New test. * gcc.dg/guality/pr54519-3.c: New test. * gcc.dg/guality/pr54519-4.c: New test. * gcc.dg/guality/pr54519-5.c: New test. * gcc.dg/guality/pr54519-6.c: New test. --- gcc/ipa-split.c.jj 2012-09-25 14:26:52.612821323 +0200 +++ gcc/ipa-split.c 2012-10-02 17:41:31.030357922 +0200 @@ -1059,6 +1059,7 @@ split_function (struct split_point *spli gimple last_stmt = NULL; unsigned int i; tree arg, ddef; + VEC(tree, gc) **debug_args = NULL; if (dump_file) { @@ -1232,6 +1233,65 @@ split_function (struct split_point *spli gimple_set_block (call, DECL_INITIAL (current_function_decl)); VEC_free (tree, heap, args_to_pass); + if (args_to_skip) +for (parm = DECL_ARGUMENTS (current_function_decl), num = 0; +parm; parm = DECL_CHAIN (parm), num++) + if (bitmap_bit_p (args_to_skip, num) + && is_gimple_reg (parm)) + { + tree ddecl; + gimple def_temp; + + arg = get_or_create_ssa_default_def (cfun, parm); + if (!MAY_HAVE_DEBUG_STMTS) + continue; + if (debug_args == NULL) + debug_args = decl_debug_args_insert (node->symbol.decl); + ddecl = make_node (DEBUG_EXPR_DECL); + DECL_ARTIFICIAL (ddecl) = 1; + TREE_TYPE (ddecl) = TREE_TYPE (parm); + DECL_MODE (ddecl) = DECL_MODE (parm); + VEC_safe_push (tree, gc, *debug_args, DECL_ORIGIN (parm)); + VEC_safe_push (tree, gc, *debug_args, ddecl); + def_temp = gimple_build_debug_bind (ddecl, unshare_expr (arg), + call); + gsi_insert_after (&gsi, def_temp, GSI_NEW_STMT); + } + if (debug_args != NULL) +{ + unsigned int i; + tree var, vexpr; + gimple_stmt_iterator cgsi; + gimple def_temp; + + push_cfun (DECL_STRUCT_FUNCTION (node->symbol.decl)); + var = BLOCK_VARS (DECL_INITIAL (node->symbol.decl)); + i = VEC_length (tree, *debug_args); + cgsi = gsi_after_labels (single_succ (ENTRY_BLOCK_PTR)); + do + { + i -= 2; + while (var != NULL_TREE +&& DECL_ABSTRACT_ORIGIN (var) + != VEC_index (tree, *debug_args, i)) + var = TREE_CHAIN (var); + if (var == NULL_TREE) + break; + vexpr = make_node (DEBUG_EXPR_DECL); + parm = VEC_index (tree, *debug_args, i); + DECL_ARTIFICIAL (vexpr) = 1; + TREE_TYPE (vexpr) = TREE_TYPE (parm); + DECL_MODE (vexpr) = DECL_MODE (parm)
[PATCH] Fix up DW_TAG_formal_parameter placement
Hi! With the PR54519 patch I've just posted, I've noticed, I've noticed on the same testcase from yesterday's IRC: static inline void foo (int x, int y) { asm volatile ("nop"); } static inline void bar (int z) { foo (z, 0); foo (z, 1); } int main () { bar (0); bar (1); return 0; } that while I can print x and y just fine, if I do bt, x, y and z printed in the backtrace are all optimized out. The problem is that first tree versioning for foo.isra.* or bar.isra.* deposits the optimized away parameters as VAR_DECLs into the DECL_INITIAL block (which is fine), but then during inlining they end up in the remapped block of DECL_INITIAL, not the new block added above it into which inliner puts parameters. So in the debug info we have DW_TAG_inlined_subroutine DW_TAG_formal_parameter for non-optimized away parameters DW_TAG_lexical_block DW_TAG_formal_parameter for optimized away parameters and the debugger (expectably) looks only at DW_TAG_inlined_subroutine DIE's immediate children for the formal parameters to print during backtrace. Fixed up by moving the VAR_DECLs for parameters optimized away by versioning to BLOCK_SUPERCONTEXT during inlining, at that point we know both of the blocks have the same scope, and if the original DECL_INITIAL doesn't contain any other vars, we can actually end up with shorter/more correct debug info as well as memory savings due to being able to GC the remapped DECL_INITIAL block. Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk? 2012-10-03 Jakub Jelinek * tree-inline.c (expand_call_inline): Move VAR_DECLs with PARM_DECL origins from remapped DECL_INITIAL's BLOCK_VARS into id->block's BLOCK_VARS. --- gcc/tree-inline.c.jj2012-10-02 17:43:13.0 +0200 +++ gcc/tree-inline.c 2012-10-02 19:43:52.576382413 +0200 @@ -3946,7 +3946,29 @@ expand_call_inline (basic_block bb, gimp initialize_inlined_parameters (id, stmt, fn, bb); if (DECL_INITIAL (fn)) -prepend_lexical_block (id->block, remap_blocks (DECL_INITIAL (fn), id)); +{ + tree *var; + + prepend_lexical_block (id->block, remap_blocks (DECL_INITIAL (fn), id)); + gcc_checking_assert (BLOCK_SUBBLOCKS (id->block) + && (BLOCK_CHAIN (BLOCK_SUBBLOCKS (id->block)) + == NULL_TREE)); + /* Move vars for PARM_DECLs from DECL_INITIAL block to id->block, +otherwise DW_TAG_formal_parameter will not be children of +DW_TAG_inlined_subroutine, but of a DW_TAG_lexical_block +under it. The parameters can be then evaluated in the debugger, +but don't show in backtraces. */ + for (var = &BLOCK_VARS (BLOCK_SUBBLOCKS (id->block)); *var; ) + if (TREE_CODE (DECL_ORIGIN (*var)) == PARM_DECL) + { + tree v = *var; + *var = TREE_CHAIN (v); + TREE_CHAIN (v) = BLOCK_VARS (id->block); + BLOCK_VARS (id->block) = v; + } + else + var = &TREE_CHAIN (*var); +} /* Return statements in the function body will be replaced by jumps to the RET_LABEL. */ Jakub
[PATCH] Fix recently introduced -fcompare-debug failures in the scheduler (PR rtl-optimization/54792)
Hi! While bootstrapping the PR54519 patches last night, I've noticed a comparison failure on i686-linux. The problem was -g vs. -g0 swapping two instructions in tree-inline.o, caused by find_modifiable_mems scanning [head, tail) instead of [head, tail] sequence of instructions. Without -g0 on tree-inline.c we had a MEM load as last insn in a bb (i.e. tail) which wasn't processed by find_modifiable_mems, while with -g there were some DEBUG_INSNs after it and find_mems was called on it and triggerring the optimization. Fixed by scanning also the tail insn, the same way as other spots in sched-deps.c handle [head, tail] walk. Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk? 2012-10-03 Jakub Jelinek PR rtl-optimization/54792 * sched-deps.c (find_modifiable_mems): Scan also TAIL insn. --- gcc/sched-deps.c.jj 2012-10-01 10:08:43.0 +0200 +++ gcc/sched-deps.c2012-10-03 11:35:57.215589982 +0200 @@ -4816,10 +4816,10 @@ find_mem (struct mem_inc_info *mii, rtx void find_modifiable_mems (rtx head, rtx tail) { - rtx insn; + rtx insn, next_tail = NEXT_INSN (tail); int success_in_block = 0; - for (insn = head; insn != tail; insn = NEXT_INSN (insn)) + for (insn = head; insn != next_tail; insn = NEXT_INSN (insn)) { struct mem_inc_info mii; Jakub
RFC: Using DECL_NO_LIMIT_STACK as a backend specific flag
Hi Guys, Is there a target specific way to add a flag bit to a function decl ? For the RX port I want to be able to mark a function decl for which I have issued a warning message. Unfortunately I could find no easy way to do this other than stealing one of the generic bits (no_limit_stack in this case). Is there a proper way to do this ? I have attached my current implementation to demonstrate the problem. Cheers Nick Index: gcc/config/rx/rx.c === --- gcc/config/rx/rx.c (revision 192034) +++ gcc/config/rx/rx.c (working copy) @@ -1288,6 +1288,24 @@ target_reinit (); } + if (current_is_fast_interrupt && rx_warn_multiple_fast_interrupts) +{ + static tree prev_fast_interrupt = NULL_TREE; + + if (prev_fast_interrupt == NULL_TREE) + prev_fast_interrupt = fndecl; + else if (prev_fast_interrupt != fndecl + && ! DECL_NO_LIMIT_STACK (fndecl)) + { + warning (0, "multiple fast interrupt routines seen: %qE and %qE", + fndecl, prev_fast_interrupt); + /* FIXME: We use the no_limit_stack bit in the tree_function_decl +structure to mark functions for which we have issued this warning. +There probably ought to be another way to do this. */ + DECL_NO_LIMIT_STACK (fndecl) = 1; + } +} + rx_previous_fndecl = fndecl; } Index: gcc/config/rx/rx.opt === --- gcc/config/rx/rx.opt(revision 192034) +++ gcc/config/rx/rx.opt(working copy) @@ -118,3 +118,9 @@ mpid Target Mask(PID) Enables Position-Independent-Data (PID) mode. + +;--- + +mwarn-multiple-fast-interrupts +Target Report Var(rx_warn_multiple_fast_interrupts) Init(1) Warning +Warn when multiple, different, fast interrupt handlers are in the compilation unit. Index: gcc/doc/invoke.texi === --- gcc/doc/invoke.texi (revision 192034) +++ gcc/doc/invoke.texi (working copy) @@ -859,6 +859,7 @@ -mmax-constant-size=@gol -mint-register=@gol -mpid@gol +-mno-warn-multiple-fast-interrupts@gol -msave-acc-in-interrupts} @emph{S/390 and zSeries Options} @@ -17875,6 +17876,15 @@ By default this feature is not enabled. The default can be restored via the @option{-mno-pid} command-line option. +@item -mno-warn-multiple-fast-interrupts +@itemx -mwarn-multiple-fast-interrupts +@opindex mno-warn-multiple-fast-interrupts +@opindex mwarn-multiple-fast-interrupts +Prevents GCC from issuing a warning message if it finds more than one +fast interrupt handler when it is compiling a file. The default is to +issue a warning for each extra fast interrupt handler found, as the RX +only supports one such interrupt. + @end table @emph{Note:} The generic GCC command-line option @option{-ffixed-@var{reg}}
Re: RFC: Using DECL_NO_LIMIT_STACK as a backend specific flag
On Wed, Oct 3, 2012 at 6:31 AM, Nick Clifton wrote: > > Is there a target specific way to add a flag bit to a function decl ? > > For the RX port I want to be able to mark a function decl for which I > have issued a warning message. Unfortunately I could find no easy way > to do this other than stealing one of the generic bits (no_limit_stack > in this case). Is there a proper way to do this ? Can't you just keep a list of the decls for which you have issued the warning? How many decls do you expect to be on that list? Ian
Re: [PATCH] Fix recently introduced -fcompare-debug failures in the scheduler (PR rtl-optimization/54792)
On 12-10-03 9:01 AM, Jakub Jelinek wrote: Hi! While bootstrapping the PR54519 patches last night, I've noticed a comparison failure on i686-linux. The problem was -g vs. -g0 swapping two instructions in tree-inline.o, caused by find_modifiable_mems scanning [head, tail) instead of [head, tail] sequence of instructions. Without -g0 on tree-inline.c we had a MEM load as last insn in a bb (i.e. tail) which wasn't processed by find_modifiable_mems, while with -g there were some DEBUG_INSNs after it and find_mems was called on it and triggerring the optimization. Fixed by scanning also the tail insn, the same way as other spots in sched-deps.c handle [head, tail] walk. Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk? 2012-10-03 Jakub Jelinek PR rtl-optimization/54792 * sched-deps.c (find_modifiable_mems): Scan also TAIL insn. Yes. Thanks, Jakub.
Re: [patch][lra] Use XNEWVEC and friends instead of xmalloc/xrealloc, and add some timevars
On 12-10-03 6:53 AM, Steven Bosscher wrote: On Mon, Oct 1, 2012 at 1:05 PM, Steven Bosscher wrote: Hello, This patch uses the libiberty new-like operators instead of using xmalloc/xrealloc. It also adds timevars for the main LRA phases, and it fixes a warning suggesting a space before a ';' in an only-looping for loop. Bootstrapped (lra-branch, of course) on x86_64-unknown-linux-gnu. OK? Hi Vlad, Is this patch OK for the lra-branch? Yes. Thanks, Steven. I found only one typo: +DEFTIMEVAR (TV_LRA_ELIMINATE , "LRA virtuals eliminatenon") ^
Re: RFC: LRA for x86/x86-64 [7/9]
On 12-10-03 3:13 AM, Steven Bosscher wrote: On Tue, Oct 2, 2012 at 3:42 PM, Richard Sandiford wrote: +/* Compress pseudo live ranges by removing program points where + nothing happens. Complexity of many algorithms in LRA is linear + function of program points number. To speed up the code we try to + minimize the number of the program points here. */ +static void +remove_some_program_points_and_update_live_ranges (void) Genuine question, but could we do this on the fly instead, by not incrementing curr_point if the current point had no value? I suppose the main complication would be checking cases where all births are recorded by extending the previous just-closed live range rather than starting a new one, in which case it's the previous point that needs to be reused. Hmm... It does seem to be something worth investigating further. Things like: Compressing live ranges: from 1742579 to 554532 - 31% Compressing live ranges: from 1742569 to 73069 - 4% look extreme, but they're actually the norm. For the same test case (PR54146 still) but looking only at functions with initially 1000- program points, you get this picture: Compressing live ranges: from 1766 to 705 - 39% Compressing live ranges: from 1449 to 370 - 25% Compressing live ranges: from 3939 to 1093 - 27% Compressing live ranges: from 3939 to 1093 - 27% Compressing live ranges: from 3939 to 1093 - 27% Compressing live ranges: from 3939 to 1093 - 27% Compressing live ranges: from 2464 to 676 - 27% Compressing live ranges: from 1433 to 379 - 26% Compressing live ranges: from 1261 to 348 - 27% Compressing live ranges: from 2835 to 755 - 26% Compressing live ranges: from 5426 to 1678 - 30% Compressing live ranges: from 5227 to 1477 - 28% Compressing live ranges: from 1845 to 467 - 25% Compressing live ranges: from 4868 to 1378 - 28% Compressing live ranges: from 4875 to 1388 - 28% Compressing live ranges: from 4882 to 1384 - 28% Compressing live ranges: from 5688 to 1714 - 30% Compressing live ranges: from 4943 to 1310 - 26% Compressing live ranges: from 2976 to 792 - 26% Compressing live ranges: from 5463 to 1526 - 27% Compressing live ranges: from 2854 to 730 - 25% Compressing live ranges: from 1810 to 745 - 41% Compressing live ranges: from 2771 to 904 - 32% Compressing live ranges: from 4916 to 1429 - 29% Compressing live ranges: from 6505 to 2238 - 34% Compressing live ranges: from 6493 to 166 - 2% Compressing live ranges: from 5498 to 1734 - 31% Compressing live ranges: from 1810 to 745 - 41% Compressing live ranges: from 5043 to 1420 - 28% Compressing live ranges: from 3094 to 788 - 25% Compressing live ranges: from 4563 to 1311 - 28% Compressing live ranges: from 4557 to 158 - 3% Compressing live ranges: from 1050 to 274 - 26% Compressing live ranges: from 1602 to 434 - 27% Compressing live ranges: from 2474 to 600 - 24% Compressing live ranges: from 2718 to 866 - 31% Compressing live ranges: from 2097 to 716 - 34% Compressing live ranges: from 4152 to 1099 - 26% Compressing live ranges: from 5065 to 1514 - 29% Compressing live ranges: from 1236 to 359 - 29% Compressing live ranges: from 1722 to 541 - 31% Compressing live ranges: from 5186 to 1401 - 27% Unfortunately the dump doesn't mention how many live ranges could be merged thanks to the compression. It'd also be good to understand why the compression ratios are so small, and consistently around ~30%. This sentence is not clear to me. 30% means 3 times less points. It is pretty good compression. Maybe curr_point includes things it should ignore (DEBUG_INSNs, NOTEs, ...)? After the compression, there are only points important for conflict info, i.e. only points where some reg dies or start living. Even more if on the subsequent points there are only life starts or only deaths, they are represented by one point after the compression.
Re: RFC: Using DECL_NO_LIMIT_STACK as a backend specific flag
Hi Ian, Can't you just keep a list of the decls for which you have issued the warning? Yes - that would work too. In fact I agree that this would be cleaner solution in my particular case. I'll create a new patch... How many decls do you expect to be on that list? Not very many. Maybe two or three at most. But I am interested to know if there is a way for targets to add their own information to function decls (and decls in general). If not for this particular case, then for problems to come in the future. Cheers Nick
[patch][lra] Improve initial program point density in lra-lives.c (was: Re: RFC: LRA for x86/x86-64 [7/9])
On Wed, Oct 3, 2012 at 4:56 PM, Vladimir Makarov wrote: > On 12-10-03 3:13 AM, Steven Bosscher wrote: >> >> On Tue, Oct 2, 2012 at 3:42 PM, Richard Sandiford >> wrote: +/* Compress pseudo live ranges by removing program points where + nothing happens. Complexity of many algorithms in LRA is linear + function of program points number. To speed up the code we try to + minimize the number of the program points here. */ +static void +remove_some_program_points_and_update_live_ranges (void) >>> >>> Genuine question, but could we do this on the fly instead, >>> by not incrementing curr_point if the current point had no value? >>> >>> I suppose the main complication would be checking cases where >>> all births are recorded by extending the previous just-closed live >>> range rather than starting a new one, in which case it's the previous >>> point that needs to be reused. Hmm... >> >> It does seem to be something worth investigating further. Things like: >> >> Compressing live ranges: from 1742579 to 554532 - 31% >> Compressing live ranges: from 1742569 to 73069 - 4% >> >> look extreme, but they're actually the norm. For the same test case >> (PR54146 still) but looking only at functions with initially 1000- >> program points, you get this picture: >> >> Compressing live ranges: from 1766 to 705 - 39% >> Compressing live ranges: from 1449 to 370 - 25% >> Compressing live ranges: from 3939 to 1093 - 27% >> Compressing live ranges: from 3939 to 1093 - 27% >> Compressing live ranges: from 3939 to 1093 - 27% >> Compressing live ranges: from 3939 to 1093 - 27% >> Compressing live ranges: from 2464 to 676 - 27% >> Compressing live ranges: from 1433 to 379 - 26% >> Compressing live ranges: from 1261 to 348 - 27% >> Compressing live ranges: from 2835 to 755 - 26% >> Compressing live ranges: from 5426 to 1678 - 30% >> Compressing live ranges: from 5227 to 1477 - 28% >> Compressing live ranges: from 1845 to 467 - 25% >> Compressing live ranges: from 4868 to 1378 - 28% >> Compressing live ranges: from 4875 to 1388 - 28% >> Compressing live ranges: from 4882 to 1384 - 28% >> Compressing live ranges: from 5688 to 1714 - 30% >> Compressing live ranges: from 4943 to 1310 - 26% >> Compressing live ranges: from 2976 to 792 - 26% >> Compressing live ranges: from 5463 to 1526 - 27% >> Compressing live ranges: from 2854 to 730 - 25% >> Compressing live ranges: from 1810 to 745 - 41% >> Compressing live ranges: from 2771 to 904 - 32% >> Compressing live ranges: from 4916 to 1429 - 29% >> Compressing live ranges: from 6505 to 2238 - 34% >> Compressing live ranges: from 6493 to 166 - 2% >> Compressing live ranges: from 5498 to 1734 - 31% >> Compressing live ranges: from 1810 to 745 - 41% >> Compressing live ranges: from 5043 to 1420 - 28% >> Compressing live ranges: from 3094 to 788 - 25% >> Compressing live ranges: from 4563 to 1311 - 28% >> Compressing live ranges: from 4557 to 158 - 3% >> Compressing live ranges: from 1050 to 274 - 26% >> Compressing live ranges: from 1602 to 434 - 27% >> Compressing live ranges: from 2474 to 600 - 24% >> Compressing live ranges: from 2718 to 866 - 31% >> Compressing live ranges: from 2097 to 716 - 34% >> Compressing live ranges: from 4152 to 1099 - 26% >> Compressing live ranges: from 5065 to 1514 - 29% >> Compressing live ranges: from 1236 to 359 - 29% >> Compressing live ranges: from 1722 to 541 - 31% >> Compressing live ranges: from 5186 to 1401 - 27% >> >> Unfortunately the dump doesn't mention how many live ranges could be >> merged thanks to the compression. >> >> It'd also be good to understand why the compression ratios are so >> small, and consistently around ~30%. > > This sentence is not clear to me. 30% means 3 times less points. It is > pretty good compression. > >> Maybe curr_point includes things >> it should ignore (DEBUG_INSNs, NOTEs, ...)? >> > After the compression, there are only points important for conflict info, > i.e. only points where some reg dies or start living. Even more if on the > subsequent points there are only life starts or only deaths, they are > represented by one point after the compression. With this patch the amount of compression is reduced. Without the patch the compression ratio is typically around 30%, with the patch this goes up to ~65%. The worst compression ratios (where worse is better in this case :-) are: $ grep Compressing log4 | egrep " [78].%" Compressing live ranges: from 31 to 22 - 70%, pre_count 17, post_count 15 Compressing live ranges: from 128 to 94 - 73%, pre_count 87, post_count 77 Compressing live ranges: from 32 to 23 - 71%, pre_count 16, post_count 15 Compressing live ranges: from 38 to 29 - 76%, pre_count 19, post_count 18 Compressing live ranges: from 40 to 28 - 70%, pre_count 26, post_count 24 Compressing live ranges: from 40 to 28 - 70%, pre_count 26, post_count 24 Compressing live ranges: from 40 to 28 - 70%, pre_count 26, post_count 24 Compressing live ranges: from 60 to 4
Re: [PATCH] Use __cxa_atexit on OpenBSD
On Sat, 15 Sep 2012, Ian Lance Taylor wrote: >> 2012-09-02 Mark Kettenis >> >> * config.gcc (*-*-openbsd4.[3-9]|*-*-openbsd[5-9]*): Set >> default_use_cxa_atexit to yes. > This is OK. I committed this to trunk and plan on doing so for the 4.7 branch as well (so that OpenBSD can benefit from a release branch of GCC carrying this) unless there are objections. Gerald
Commit: RX: Warn about multiple fast interrupt routines.
Hi Guys, I am applying the patch below to the RX backend. It adds support for issuing warning messages when multiple fast interrupt routines are defined in the same compilation unit. The RX only supports one fast interrupt so it is probably a mistake to define more than one. If the programmer really does want to define them then the command line -mno-warn-multiple-fast-interrupts can be used. Of course it is possible to define multiple fast interrupt handlers in multiple compilation units and then link them together, but Renesas were not interested in solving this problem. Cheers Nick PS. I have even remembered to include a patch to the html documentation as well! gcc/ChangeLog 2012-10-03 Nick Clifton * config/rx/rx.c (struct decl_chain): New local structure. (warned_decls): New local variable. Contains a stack of decls for which warnings have been issued. (add_warned_decl): Adds a decl to the stack. (already_warned): Returns true if a given decl is on the stack. (rx_set_current_function): Issue a warning if multiple fast interrupt handlers are defined. * config/rx/rx.opt (mwarn-multiple-fast-interrupts): New option. * doc/invoke.texi: Document the option. Index: gcc/config/rx/rx.c === --- gcc/config/rx/rx.c (revision 192034) +++ gcc/config/rx/rx.c (working copy) @@ -1256,6 +1256,41 @@ } } +struct decl_chain +{ + tree fndecl; + struct decl_chain * next; +}; + +/* Stack of decls for which we have issued warnings. */ +static struct decl_chain * warned_decls = NULL; + +static void +add_warned_decl (tree fndecl) +{ + struct decl_chain * warned = (struct decl_chain *) xmalloc (sizeof * warned); + + warned->fndecl = fndecl; + warned->next = warned_decls; + warned_decls = warned; +} + +/* Returns TRUE if FNDECL is on our list of warned about decls. */ + +static bool +already_warned (tree fndecl) +{ + struct decl_chain * warned; + + for (warned = warned_decls; + warned != NULL; + warned = warned->next) +if (warned->fndecl == fndecl) + return true; + + return false; +} + /* Perform any actions necessary before starting to compile FNDECL. For the RX we use this to make sure that we have the correct set of register masks selected. If FNDECL is NULL then we are @@ -1288,6 +1323,24 @@ target_reinit (); } + if (current_is_fast_interrupt && rx_warn_multiple_fast_interrupts) +{ + /* We do not warn about the first fast interrupt routine that +we see. Instead we just push it onto the stack. */ + if (warned_decls == NULL) + add_warned_decl (fndecl); + + /* Otherwise if this fast interrupt is one for which we have +not already issued a warning, generate one and then push +it onto the stack as well. */ + else if (! already_warned (fndecl)) + { + warning (0, "multiple fast interrupt routines seen: %qE and %qE", + fndecl, warned_decls->fndecl); + add_warned_decl (fndecl); + } +} + rx_previous_fndecl = fndecl; } Index: gcc/config/rx/rx.opt === --- gcc/config/rx/rx.opt(revision 192034) +++ gcc/config/rx/rx.opt(working copy) @@ -118,3 +118,9 @@ mpid Target Mask(PID) Enables Position-Independent-Data (PID) mode. + +;--- + +mwarn-multiple-fast-interrupts +Target Report Var(rx_warn_multiple_fast_interrupts) Init(1) Warning +Warn when multiple, different, fast interrupt handlers are in the compilation unit. Index: gcc/doc/invoke.texi === --- gcc/doc/invoke.texi (revision 192034) +++ gcc/doc/invoke.texi (working copy) @@ -859,6 +859,7 @@ -mmax-constant-size=@gol -mint-register=@gol -mpid@gol +-mno-warn-multiple-fast-interrupts@gol -msave-acc-in-interrupts} @emph{S/390 and zSeries Options} @@ -17875,6 +17876,15 @@ By default this feature is not enabled. The default can be restored via the @option{-mno-pid} command-line option. +@item -mno-warn-multiple-fast-interrupts +@itemx -mwarn-multiple-fast-interrupts +@opindex mno-warn-multiple-fast-interrupts +@opindex mwarn-multiple-fast-interrupts +Prevents GCC from issuing a warning message if it finds more than one +fast interrupt handler when it is compiling a file. The default is to +issue a warning for each extra fast interrupt handler found, as the RX +only supports one such interrupt. + @end table @emph{Note:} The generic GCC command-line option @option{-ffixed-@var{reg}} Index: htdocs/gcc-4.8/changes.html === RCS file: /cvs/gcc/wwwdocs/htdocs/gcc-4.8/changes.html,v retrieving revision 1.32 diff -u -3 -p -r1.32 changes.html --- htdocs/gcc-4.8/changes.html 2 Oct 2012 18:
Re: Propagate profile counts during switch expansion
What is the status of switch expansion GIMPLE rewrite? If it is not planned for 4.8, It will be desirable to include this fix into trunk. It also helps set up a good base line to test against regression. thanks, David On Tue, Oct 2, 2012 at 6:09 PM, Easwaran Raman wrote: > Hi, > This patch propagates the profile counts during RTL expansion. In > many cases, there is no way to determine the exact count of an edge > generated during the expansion. So this patch uses some simple > heuristics to estimate the edge counts but ensures that the counts of > the basic blocks corresponding to the cases are (nearly the) same as > at the gimple level. > > Bootstrapped and profile-bootstrapped on an x86_64/linux machine. OK for > trunk? > > - Easwaran > > -- > 2012-10-02 Easwaran Raman > > * cfgbuild.c (gen_probabilities_from_existing_counts): New function. > (compute_outgoing_frequencies): If at least one successor of a > BB has non-zero profile count, use the counts to compute > probabilities. > * expr.c (do_tablejump): Add a REG_BR_PROB note on the > jump to default label. > (try_tablejump): Add a parameter to specify the probability > of jumping to the default label. > * expr.h (try_tablejump): Add a new parameter. > * stmt.c (case_node): Add new fields COUNT and SUBTREE_COUNT. > (do_jump_if_equal): Pass probability for REG_BR_PROB note. > (add_case_node): Pass execution count of the case node and use > it to initialize COUNT field. > (emit_case_decision_tree): Pass default_count to emit_case_nodes. > (get_outgoing_edge_counts): New function. > (add_prob_note_to_last_insn): Likewise. > (case_probability): New macro. > (emit_case_dispatch_table): Compute probability of jumping to default > label and apply note to the jump. > (expand_case): Compute and propagate default edge count to > emit_case_dispatch_table. > (expand_sjlj_dispatch_table): Update calls to add_case_node and > emit_case_dispatch_table. > (balance_case_nodes): Update subtree_counts. > (emit_case_nodes): Compute edge probabilities and add note. > > gcc/testsuite/ChangeLog: > 2012-10-02 Easwaran Raman > * gcc.dg/tree-prof/switch-case-1.c: New test case. > * gcc.dg/tree-prof/switch-case-2.c: New test case. > > Index: gcc/testsuite/gcc.dg/tree-prof/switch-case-1.c > === > --- gcc/testsuite/gcc.dg/tree-prof/switch-case-1.c (revision 0) > +++ gcc/testsuite/gcc.dg/tree-prof/switch-case-1.c (revision 0) > @@ -0,0 +1,40 @@ > +/* { dg-options "-O2 -fdump-rtl-expand-all" } */ > +int g; > + > +__attribute__((noinline)) void foo (int n) > +{ > + switch (n) > +{ > +case 1: > + g++; break; > +case 2: > + g += 2; break; > +case 3: > + g += 1; break; > +case 4: > + g += 3; break; > +case 5: > + g += 4; break; > +case 6: > + g += 5; break; > +case 7: > + g += 6; break; > +case 8: > + g += 7; break; > +case 9: > + g += 8; break; > +default: > + g += 8; break; > + } > +} > + > +int main () > +{ > + int i; > + for (i = 0; i < 1; i++) > + foo ((i * i) % 5); > + return 0; > +} > +/* { dg-final-use { scan-rtl-dump-times ";; basic block\[^\\n\]*count > 4000" 2 "expand"} } */ > +/* { dg-final-use { scan-rtl-dump-times ";; basic block\[^\\n\]*count > 2000" 1 "expand"} } */ > +/* { dg-final-use { cleanup-rtl-dump "expand" } } */ > Index: gcc/testsuite/gcc.dg/tree-prof/switch-case-2.c > === > --- gcc/testsuite/gcc.dg/tree-prof/switch-case-2.c (revision 0) > +++ gcc/testsuite/gcc.dg/tree-prof/switch-case-2.c (revision 0) > @@ -0,0 +1,40 @@ > +/* { dg-options "-O2 -fdump-rtl-expand-all" } */ > +int g; > + > +__attribute__((noinline)) void foo (int n) > +{ > + switch (n) > +{ > +case 99: > + g += 2; break; > +case 1: > + g++; break; > +case 100: > + g += 1; break; > +case 4: > + g += 3; break; > +case 5: > + g += 4; break; > +case 6: > + g += 5; break; > +case 7: > + g += 6; break; > +case 8: > + g += 7; break; > +case 9: > + g += 8; break; > +default: > + g += 8; break; > + } > +} > + > +int main () > +{ > + int i; > + for (i = 0; i < 1; i++) > + foo ((i * i) % 5); > + return 0; > +} > +/* { dg-final-use { scan-rtl-dump-times ";; basic block\[^\\n\]*count > 4000" 2 "expand"} } */ > +/* { dg-final-use { scan-rtl-dump-times ";; basic block\[^\\n\]*count > 2000" 1 "expand"} } */ > +/* { dg-final-use { cleanup-rtl-dump "expand" } } */ > Index: gcc/expr.c > === > --- gcc/expr.c (revision 191879) > +++ gcc/expr.c (working copy) > @@ -154,7 +154,7 @@ static rtx do_store_flag (sepops, rtx, enum machin > #ifdef PUSH_ROUNDING > static void emit_single_push_insn (enum machine_mode, rtx, tree); > #endif > -static void do_tablejump (rtx, enum machine_mode, rtx, rtx, rtx); > +stati
[Patch, Fortran, OOP] PR 54784: [4.7/4.8 Regression] wrong code in polymorphic allocation with SOURCE
Hi all, here is a small patch for a wrong-code regression with polymorphic allocation. The problem is that we falsely detect the allocation variable to be a polymorphic array (although it is a scalar). For further details see the PR, in particular comment 4. Regtested on x86_64-unknown-linux-gnu. Ok for trunk and 4.7? Cheers, Janus 2012-10-03 Janus Weil PR fortran/54784 * trans-stmt.c (gfc_trans_allocate): Correctly determine the reference to the _data component for polymorphic allocation with SOURCE. 2012-10-03 Janus Weil PR fortran/54784 * gfortran.dg/class_allocate_13.f90: New. pr54784.diff Description: Binary data class_allocate_13.f90 Description: Binary data
Re: Convert more non-GTY htab_t to hash_table.
On 10/2/12, Ian Lance Taylor wrote: > On Oct 2, 2012 Lawrence Crowl wrote: > > On 10/2/12, Richard Guenther wrote: > > > You are changing a hashtable used by fold checking, did you > > > test with fold checking enabled? > > > > I didn't know I had to do anything beyond the normal make check. > > What do I do? > > Fold checking is not enabled by default because of high overhead > and general pointlessness. To enable it, when you run configure, > use --enable-checking=yes,fold. So, why have the feature if it is pointless? Just curious. -- Lawrence Crowl
Re: Convert more non-GTY htab_t to hash_table.
On Wed, Oct 03, 2012 at 09:54:45AM -0700, Lawrence Crowl wrote: > On 10/2/12, Ian Lance Taylor wrote: > > On Oct 2, 2012 Lawrence Crowl wrote: > > > On 10/2/12, Richard Guenther wrote: > > > > You are changing a hashtable used by fold checking, did you > > > > test with fold checking enabled? > > > > > > I didn't know I had to do anything beyond the normal make check. > > > What do I do? > > > > Fold checking is not enabled by default because of high overhead > > and general pointlessness. To enable it, when you run configure, > > use --enable-checking=yes,fold. > > So, why have the feature if it is pointless? Just curious. It is not pointless, just seldom used. The reason for the verification code is to (very expensively) verify that fold_binary etc. doesn't modify passed expressions in place, instead if it needs to make changes, it allocates new trees. It is not something for every day use (unlike say tree or rtl checking which can be enabled on faster build boxes for daily bootstraps on the trunk), but something to be checked once a year or so (similarly to --enable-checking=valgrind, which is even more expensive). Jakub
Re: abs(long long)
On Tue, Oct 2, 2012 at 1:53 PM, Marc Glisse wrote: >> See what we did in c/cmath and c_global/cmath. > > > Note that llabs is quite different from asin. Is asin the function you took out of c/cmath? > __builtin_llabs generates an > ABS_EXPR, which will later be expanded either to a special instruction or to > a condition. It never generates a call to llabs (I am not sure exactly if > Paolo's instructions to use llabs meant he wanted an actual library call). distinction without difference. Again see c/cmath. > __builtin_asin on the other hand is never expanded inline (except maybe for > special constant input like 0) and expands to a call to the library function > asin. > > Would the attached patch be better, assuming it passes testing? For lldiv, > there is no builtin (for good reason). > > * include/c_std/cstdlib (abs(long long)): Define with > __builtin_llabs when we have long long. > > (abs(__int128)): Define when we have __int128. This change is OK > (div(long long, long long)): Use lldiv. not this one. > > > -- > Marc Glisse > Index: cstdlib > === > --- cstdlib (revision 191941) > +++ cstdlib (working copy) > @@ -128,21 +128,31 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION >using ::strtod; >using ::strtol; >using ::strtoul; >using ::system; > #ifdef _GLIBCXX_USE_WCHAR_T >using ::wcstombs; >using ::wctomb; > #endif // _GLIBCXX_USE_WCHAR_T > >inline long > - abs(long __i) { return labs(__i); } > + abs(long __i) { return __builtin_labs(__i); } > + > +#ifdef _GLIBCXX_USE_LONG_LONG > + inline long long > + abs(long long __x) { return __builtin_llabs (__x); } > +#endif > + > +#if !defined(__STRICT_ANSI__) && defined(_GLIBCXX_USE_INT128) > + inline __int128 > + abs(__int128 __x) { return __x >= 0 ? __x : -__x; } > +#endif > >inline ldiv_t >div(long __i, long __j) { return ldiv(__i, __j); } > > _GLIBCXX_END_NAMESPACE_VERSION > } // namespace > > #if _GLIBCXX_USE_C99 > > #undef _Exit > @@ -161,29 +171,26 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION > #if !_GLIBCXX_USE_C99_LONG_LONG_DYNAMIC >using ::lldiv_t; > #endif > #if _GLIBCXX_USE_C99_CHECK || _GLIBCXX_USE_C99_DYNAMIC >extern "C" void (_Exit)(int) throw () _GLIBCXX_NORETURN; > #endif > #if !_GLIBCXX_USE_C99_DYNAMIC >using ::_Exit; > #endif > > - inline long long > - abs(long long __x) { return __x >= 0 ? __x : -__x; } > - > #if !_GLIBCXX_USE_C99_LONG_LONG_DYNAMIC >using ::llabs; > >inline lldiv_t >div(long long __n, long long __d) > - { lldiv_t __q; __q.quot = __n / __d; __q.rem = __n % __d; return __q; } > + { return ::lldiv (__n, __d); } > >using ::lldiv; > #endif > > #if _GLIBCXX_USE_C99_LONG_LONG_CHECK || _GLIBCXX_USE_C99_LONG_LONG_DYNAMIC >extern "C" long long int (atoll)(const char *) throw (); >extern "C" long long int > (strtoll)(const char * __restrict, char ** __restrict, int) throw (); >extern "C" unsigned long long int > (strtoull)(const char * __restrict, char ** __restrict, int) throw (); > @@ -198,21 +205,20 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION > > _GLIBCXX_END_NAMESPACE_VERSION > } // namespace __gnu_cxx > > namespace std > { > #if !_GLIBCXX_USE_C99_LONG_LONG_DYNAMIC >using ::__gnu_cxx::lldiv_t; > #endif >using ::__gnu_cxx::_Exit; > - using ::__gnu_cxx::abs; > #if !_GLIBCXX_USE_C99_LONG_LONG_DYNAMIC >using ::__gnu_cxx::llabs; >using ::__gnu_cxx::div; >using ::__gnu_cxx::lldiv; > #endif >using ::__gnu_cxx::atoll; >using ::__gnu_cxx::strtof; >using ::__gnu_cxx::strtoll; >using ::__gnu_cxx::strtoull; >using ::__gnu_cxx::strtold; >
Re: Propagate profile counts during switch expansion
On Wed, Oct 3, 2012 at 6:12 PM, Xinliang David Li wrote: > What is the status of switch expansion GIMPLE rewrite? If it is not > planned for 4.8, It will be desirable to include this fix into trunk. I could work on it for GCC 4.8 (there's not a lot of work left to be done for it now) but we haven't really decided yet where the pass should be scheduled and I also would like to wait a bit to see how the SJLJ changes work out. So I talked about this with Easwaran, I think his patch should be included into the trunk now. I'll adapt it for the move to GIMPLE. > It also helps set up a good base line to test against regression. Agreed. Ciao! Steven
Re: [PATCH] PowerPC VLE port
On Wed, Oct 3, 2012 at 6:59 AM, James Lemke wrote: > Ping.. > @@ -847,7 +1106,6 @@ > && (DEFAULT_ABI != ABI_AIX || SYMBOL_REF_FUNCTION_P > (op))"))) > > ;; Return 1 if op is an operand that can be loaded via the GOT. > -;; or non-special register register field no cr0 > (define_predicate "got_operand" > (match_code "symbol_ref,const,label_ref")) Most likely should be submitted and committed (as obvious) separately. > @@ -6694,7 +6771,7 @@ > > switch (mode) > { >- case QImode: >+case QImode: > case HImode: > if (dest == NULL) > dest = gen_reg_rtx (mode); Likewise. > @@ -14944,11 +15104,10 @@ > return; > > case 'Q': >- if (TARGET_MFCRF) >- fputc (',', file); >-/* FALLTHRU */ >- else >+ if (! TARGET_MFCRF) > return; >+ fputc (',', file); >+ /* FALLTHRU */ > > case 'R': > /* X is a CR register. Print the mask for `mtcrf'. */ Likewise. >@@ -15893,7 +16071,7 @@ > } > > /* Return the string to output a conditional branch to LABEL, which is >- the operand number of the label, or -1 if the branch is really a >+ the operand template of the label, or NULL if the branch is really a >conditional return. > >OP is the conditional expression. XEXP (OP, 0) is assumed to be a This looks like it should be done separately also. > +bool > +valid_vle_sd4_field (rtx mem, enum machine_mode mode) No comment before this function. >- (simple_return "")]) >+ (simple_return "1")]) Submitted separately it looks. > libgcc/longlong.h Gets sync'd with glibc's version sometimes. Also have you thought have just adding a vle.md for all the needed patterns and disabling the patterns in rs6000.md for VLE and not using %^/%+/%- ? I think that would have been a cleaner implementation of vle than adding support for it to the current patterns. Also does not have the maintenance issue of always having to check if a new pattern needs the %^/%+/%-. Thanks, Andrew Pinski > > > > Original Message > Subject: [PATCH] PowerPC VLE port > Date: Mon, 24 Sep 2012 21:44:02 -0400 > From: James Lemke > To: GCC Patches > > The initial patch for this port caused much comment. I have attached > an updated patch trying to address many of those points. > All comments are welcome. I would prefer comments are made on-list. > > I have tried to simplify this patch by: > 1) Removing as many compound tests as possible and now rely on feature > flags. > I have removed TARGET_VLE_ISEL, TARGET_VLE_MULTIPLE, TARGET_VLE_ISEL64 & > TARGET_MFCRF_NOVLE. > -mcpu now implies the following options: > -mcpu=e200z[0-2]: -mvle -misel -mno-mfcrf -mmultiple -msoft-float > -mcpu=e200z[367]: -mvle -misel -mno-mfcrf -mmultiple -mfloat-gprs=single \ >-mspe=yes -mabi=spe > 2) When gcc is configured for a non-VLE target, TARGET_VLE evaluates to "0" > so that most VLE-specific compiler code is optimized away. > 3) Separated all VLE-only items from rs6000.md to a new file, vle.md. > In the cases where there was strong commonality there is still VLE code in > rs6000.md. > > On r191665 I have run the DejaGNU suite. A bootstrap is running. > > Comments? > Jim. > > -- > Jim Lemke > Mentor Graphics / CodeSourcery > Orillia Ontario, +1-613-963-1073 > > >
Re: PATCH: PR target/54741: Check SSE and YMM state support for -march=native
On Tue, Oct 2, 2012 at 12:35 PM, H.J. Lu wrote: > Hi, > > This patch checks SSE and YMM state support for -march=native. Tested > on Linux/x86-64. OK to install? > > Thanks. > > > H.J. > --- > 2012-10-02 H.J. Lu > > PR target/54741 > * config/i386/driver-i386.c (XCR_XFEATURE_ENABLED_MASK): New. > (XSTATE_FP): Likewise. > (XSTATE_SSE): Likewise. > (XSTATE_YMM): Likewise. > (host_detect_local_cpu): Disable AVX, AVX2, FMA, FMA4 and XOP if > SSE and YMM states aren't supported. > > diff --git a/gcc/config/i386/driver-i386.c b/gcc/config/i386/driver-i386.c > index bda4e02..4dffc51 100644 > --- a/gcc/config/i386/driver-i386.c > +++ b/gcc/config/i386/driver-i386.c > @@ -390,6 +390,7 @@ const char *host_detect_local_cpu (int argc, const char > **argv) >unsigned int has_hle = 0, has_rtm = 0; >unsigned int has_rdrnd = 0, has_f16c = 0, has_fsgsbase = 0; >unsigned int has_rdseed = 0, has_prfchw = 0, has_adx = 0; > + unsigned int has_osxsave = 0; > >bool arch; > > @@ -431,6 +432,7 @@ const char *host_detect_local_cpu (int argc, const char > **argv) >has_sse4_1 = ecx & bit_SSE4_1; >has_sse4_2 = ecx & bit_SSE4_2; >has_avx = ecx & bit_AVX; > + has_osxsave = ecx & bit_OSXSAVE; >has_cmpxchg16b = ecx & bit_CMPXCHG16B; >has_movbe = ecx & bit_MOVBE; >has_popcnt = ecx & bit_POPCNT; > @@ -460,6 +462,26 @@ const char *host_detect_local_cpu (int argc, const char > **argv) >has_adx = ebx & bit_ADX; > } > > + /* Get XCR_XFEATURE_ENABLED_MASK register with xgetbv. */ > +#define XCR_XFEATURE_ENABLED_MASK 0x0 > +#define XSTATE_FP 0x1 > +#define XSTATE_SSE 0x2 > +#define XSTATE_YMM 0x4 > + if (has_osxsave) > +asm (".byte 0x0f; .byte 0x01; .byte 0xd0" > +: "=a" (eax), "=d" (edx) > +: "c" (XCR_XFEATURE_ENABLED_MASK)); > + > + /* Check if SSE and YMM states are supported. */ > + if ((eax & (XSTATE_SSE | XSTATE_YMM)) == (XSTATE_SSE | XSTATE_YMM)) > +{ > + has_avx = 0; > + has_avx2 = 0; > + has_fma = 0; > + has_fma4 = 0; > + has_xop = 0; > +} > + >/* Check cpuid level of extended features. */ >__cpuid (0x8000, ext_level, ebx, ecx, edx); > This is very embarrassing. Thanks to Andrew, I checked his fix http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54741#c18 into trunk, 4.6 and 4.7 branches. -- H.J.
Re: Propagate profile counts during switch expansion
thanks for the update! David On Wed, Oct 3, 2012 at 10:37 AM, Steven Bosscher wrote: > On Wed, Oct 3, 2012 at 6:12 PM, Xinliang David Li wrote: >> What is the status of switch expansion GIMPLE rewrite? If it is not >> planned for 4.8, It will be desirable to include this fix into trunk. > > I could work on it for GCC 4.8 (there's not a lot of work left to be > done for it now) but we haven't really decided yet where the pass > should be scheduled and I also would like to wait a bit to see how the > SJLJ changes work out. So I talked about this with Easwaran, I think > his patch should be included into the trunk now. I'll adapt it for the > move to GIMPLE. > >> It also helps set up a good base line to test against regression. > > Agreed. > > Ciao! > Steven
Re: [PATCH] PowerPC VLE port
;; Return 1 if op is an operand that can be loaded via the GOT. -;; or non-special register register field no cr0 (define_predicate "got_operand" (match_code "symbol_ref,const,label_ref")) Most likely should be submitted and committed (as obvious) separately. Yes, will do. switch (mode) { - case QImode: +case QImode: case HImode: if (dest == NULL) dest = gen_reg_rtx (mode); Likewise. Will do. case 'Q': - if (TARGET_MFCRF) - fputc (',', file); -/* FALLTHRU */ - else + if (! TARGET_MFCRF) return; + fputc (',', file); + /* FALLTHRU */ case 'R': /* X is a CR register. Print the mask for `mtcrf'. */ Likewise. Will do. /* Return the string to output a conditional branch to LABEL, which is - the operand number of the label, or -1 if the branch is really a + the operand template of the label, or NULL if the branch is really a conditional return. OP is the conditional expression. XEXP (OP, 0) is assumed to be a This looks like it should be done separately also. OK. +bool +valid_vle_sd4_field (rtx mem, enum machine_mode mode) No comment before this function. I'll fix that. - (simple_return "")]) + (simple_return "1")]) Submitted separately it looks. Sure. libgcc/longlong.h Gets sync'd with glibc's version sometimes. Are you asking me to change something here? Also have you thought have just adding a vle.md for all the needed patterns and disabling the patterns in rs6000.md for VLE and not using %^/%+/%- ? I think that would have been a cleaner implementation of vle than adding support for it to the current patterns. Also does not have the maintenance issue of always having to check if a new pattern needs the %^/%+/%-. The idea was discussed. A disadvantage is the code duplication it would cause, which is also a maintenance headache. So we opted for the current implementation. If the mainstream development causes a problem for VLE it will be up to those interested in VLE to correct %^ etc. -- Jim Lemke Mentor Graphics / CodeSourcery Orillia Ontario, +1-613-963-1073
RFA: add lock_length attribute to break branch-shortening cycles
The ARCompact architecture has some pipelining features that result in the vanilla branch shortening not always converging. Moreover, there are some short, complex branch instructions with very short offsets; replacing them with a multi-insn sequence when the offset doesn't reach makes the code significantly longer. Thus, when starting branch shortening with pessimistic assumptions, the short branches are often not used because of the pessimistic branch length causing the offsets going out of range. This problem can be avoided when starting with a low estimate and working upwards. However, that makes the incidence of infinite branch shortening cycles higher, and also makes it impossible to just break out after some iteration count. To address these issues, I've made the generator programs recognize the optional lock_length attribute. To quote from the documentation added for this feature: If you define the `lock_length' attribute, branch shortening will work the other way round: it starts out assuming minimum instruction lengths and iterates from there. In addition, the value of the `lock_length' attribute does not decrease across iterations, and the value computed for the `length' attribute will be no smaller than that of the `lock_length' attribute. bootstrapped and regression tested on i686-pc-linux-gnu 2012-10-03 Joern Rennecke * final.c (get_attr_length_1): Use direct recursion rather than calling get_attr_length. (get_attr_lock_length): New function. (INSN_VARIABLE_LENGTH_P): Define. (shorten_branches): Take HAVE_ATTR_lock_length into account. Don't overwrite non-delay slot insn lengths with the lengths of delay slot insns with same uid. * genattrtab.c (lock_length_str): New variable. (make_length_attrs): New parameter base. (main): Initialize lock_length_str. Generate lock_lengths attributes. * genattr.c (gen_attr): Emit declarations for lock_length attribute related functions. * doc/md.texi (node Insn Lengths): Document lock_length attribute. Index: doc/md.texi === --- doc/md.texi (revision 192036) +++ doc/md.texi (working copy) @@ -8004,6 +8004,20 @@ (define_insn "jump" (const_int 6)))]) @end smallexample +@cindex lock_length +Usually, branch shortening is done assuming the worst case (i.e. longest) +lengths, and then iterating (if optimizing) to smaller lengths till +no further changed occur. This does not work so well for architectures +that have very small minimum offsets and considerable jumps in instruction +lengths. + +If you define the @code{lock_length} attribute, branch shortening will +work the other way round: it starts out assuming minimum instruction +lengths and iterates from there. In addition, the value of the +@code{lock_length} attribute does not decrease across iterations, and +the value computed for the @code{length} attribute will be no smaller +than that of the @code{lock_length} attribute. + @end ifset @ifset INTERNALS @node Constant Attributes Index: final.c === --- final.c (revision 192036) +++ final.c (working copy) @@ -312,6 +312,7 @@ dbr_sequence_length (void) `insn_current_length'. */ static int *insn_lengths; +static char *uid_lock_length; VEC(int,heap) *insn_addresses_; @@ -447,6 +448,20 @@ get_attr_length (rtx insn) return get_attr_length_1 (insn, insn_default_length); } +#ifdef HAVE_ATTR_lock_length +int +get_attr_lock_length (rtx insn) +{ + if (uid_lock_length && insn_lengths_max_uid > INSN_UID (insn)) +return uid_lock_length[INSN_UID (insn)]; + return get_attr_length_1 (insn, insn_min_lock_length); +} +#define INSN_VARIABLE_LENGTH_P(INSN) \ + (insn_variable_length_p (INSN) || insn_variable_lock_length_p (INSN)) +#else +#define INSN_VARIABLE_LENGTH_P(INSN) (insn_variable_length_p (INSN)) +#endif + /* Obtain the current length of an insn. If branch shortening has been done, get its actual length. Otherwise, get its minimum length. */ int @@ -865,6 +880,7 @@ shorten_branches (rtx first ATTRIBUTE_UN rtx body; int uid; rtx align_tab[MAX_CODE_ALIGN]; + int (*length_fun) (rtx); #endif @@ -875,6 +891,10 @@ shorten_branches (rtx first ATTRIBUTE_UN free (uid_shuid); uid_shuid = XNEWVEC (int, max_uid); +#ifdef HAVE_ATTR_lock_length + uid_lock_length = XNEWVEC (char, max_uid); + memset (uid_lock_length, 0, max_uid); +#endif if (max_labelno != max_label_num ()) { @@ -1067,6 +1087,10 @@ shorten_branches (rtx first ATTRIBUTE_UN #endif /* CASE_VECTOR_SHORTEN_MODE */ /* Compute initial lengths, addresses, and varying flags for each insn. */ + length_fun = insn_default_length; +#ifdef HAVE_ATTR_lock_length + length_fun = insn_min_length; +#endif for (insn_current_address = 0, insn = first; insn != 0;
[PATCH] Fix PR54782
Hi, Attached patch fixes PR54782. phi_arg_location is not correctly updated in move_block_to_fn. This patch fixes the problem. Bootstrapped and passed gcc regression tests on x86. Okay for trunk? Thanks, Dehao gcc/ChangeLog: 2012-10-03 Dehao Chen PR middle-end/54782 * tree-cfg.c (move_block_to_fn): Update lexical block for phi_args. gcc/testsuite/ChangeLog: 2012-10-03 Dehao Chen PR middle-end/54782 * gcc.dg/pr54782.c: New test. Index: gcc/tree-cfg.c === --- gcc/tree-cfg.c (revision 192041) +++ gcc/tree-cfg.c (working copy) @@ -6322,6 +6322,7 @@ move_block_to_fn (struct function *dest_cfun, basi use_operand_p use; tree op = PHI_RESULT (phi); ssa_op_iter oi; + unsigned i; if (virtual_operand_p (op)) { @@ -6340,6 +6341,20 @@ move_block_to_fn (struct function *dest_cfun, basi SET_USE (use, replace_ssa_name (op, d->vars_map, dest_cfun->decl)); } + for (i = 0; i < EDGE_COUNT (bb->preds); i++) + { + location_t locus = gimple_phi_arg_location (phi, i); + if (locus != UNKNOWN_LOCATION) + { + tree block = LOCATION_BLOCK (locus); + if (d->orig_block == NULL_TREE + || block == d->orig_block) + gimple_phi_arg_set_location (phi, i, d->new_block ? + COMBINE_LOCATION_DATA (line_table, locus, d->new_block) : + LOCATION_LOCUS (locus)); + } + } + gsi_next (&si); } Index: gcc/testsuite/gcc.dg/pr54782.c === --- gcc/testsuite/gcc.dg/pr54782.c (revision 0) +++ gcc/testsuite/gcc.dg/pr54782.c (revision 0) @@ -0,0 +1,18 @@ +/* { dg-do compile } */ +/* { dg-options "-O -ffast-math -ftree-parallelize-loops=2 -g" } */ + +struct S +{ + int n; + float *a; +}; + +int +foo (struct S *s) +{ + float sum = 0; + int i; + for (i = 0; i < s->n; i++) +sum += s->a[i]; + return sum; +}
Add myself to MAINTAINERS
Adding myself as write after approval. Index: MAINTAINERS === --- MAINTAINERS (revision 191941) +++ MAINTAINERS (working copy) @@ -342,6 +342,7 @@ R. Kelley Cook kc...@gcc.gnu.org Christian Cornelssen cc...@cs.tu-berlin.de François-Xavier Coudertfxcoud...@gcc.gnu.org Cary Coutant ccout...@google.com +Lawrence Crowl cr...@google.com Ian Dall i...@beware.dropbear.id.au David Daneydavid.da...@caviumnetworks.com Bud Davis jmda...@link.com -- Lawrence Crowl
Re: [PATCH] Fix PR54782
On Wed, Oct 03, 2012 at 11:26:09AM -0700, Dehao Chen wrote: > @@ -6340,6 +6341,20 @@ move_block_to_fn (struct function *dest_cfun, basi > SET_USE (use, replace_ssa_name (op, d->vars_map, dest_cfun->decl)); > } > > + for (i = 0; i < EDGE_COUNT (bb->preds); i++) > + { > + location_t locus = gimple_phi_arg_location (phi, i); > + if (locus != UNKNOWN_LOCATION) > + { > + tree block = LOCATION_BLOCK (locus); > + if (d->orig_block == NULL_TREE || block == d->orig_block) > + gimple_phi_arg_set_location (phi, i, d->new_block ? > + COMBINE_LOCATION_DATA (line_table, locus, d->new_block) : > + LOCATION_LOCUS (locus)); > + } > + } > + The formatting is wrong on this. ? and : would need to go to the next line and even the indentation of part of an argument is weird. IMHO better just do: if (d->orig_block == NULL_TREE || block == d->orig_block) { if (d->new_block) locus = COMBINE_LOCATION_DATA (line_table, locus, d->new_block); else locus = LOCATION_LOCUS (locus); gimple_phi_arg_set_location (phi, i, locus); } It will be far more readable. And/or to decrease indentation level you could do if (locus == UNKNOWN_LOCATION) continue; block = LOCATION_BLOCK (locus); ... Jakub
Re: Propagate profile counts during switch expansion
> thanks for the update! OK, I will review the patch tomorrow then. It is a good incremental step. I would certainly like to see the gimple expansion in 4.8 however. Honza
RFA: darwin PATCH to fix build, internal visibility
This patch fixes a couple of Darwin issues I noticed with a cross-compiler: 1) Adds a couple of consts to avoid const-correctness errors. 2) Treats visibility "internal" like "hidden" rather than like "default". The documentation says that internal is hidden + processor-specific semantics, so treating it as just hidden makes sense to me. OK for trunk? Jason commit 907f8d9588da2e65cf938985639bfaf0dde53c6f Author: Jason Merrill Date: Wed Oct 3 09:55:25 2012 -0400 * config/darwin.c (darwin_assemble_visibility): Treat VISIBILITY_INTERNAL as hidden. * config/darwin-c.c (find_subframework_file): Add missing const. (framework_construct_pathname): Likewise. diff --git a/gcc/config/darwin-c.c b/gcc/config/darwin-c.c index a642f66..8e48c30 100644 --- a/gcc/config/darwin-c.c +++ b/gcc/config/darwin-c.c @@ -267,7 +267,7 @@ static struct framework_header framework_header_dirs[] = { static char * framework_construct_pathname (const char *fname, cpp_dir *dir) { - char *buf; + const char *buf; size_t fname_len, frname_len; cpp_dir *fast_dir; char *frname; @@ -344,7 +344,7 @@ find_subframework_file (const char *fname, const char *pname) { char *sfrname; const char *dot_framework = ".framework/"; - char *bufptr; + const char *bufptr; int sfrname_len, i, fname_len; struct cpp_dir *fast_dir; static struct cpp_dir subframe_dir; diff --git a/gcc/config/darwin.c b/gcc/config/darwin.c index 54c92d1..5a9f50a 100644 --- a/gcc/config/darwin.c +++ b/gcc/config/darwin.c @@ -2623,7 +2623,7 @@ darwin_assemble_visibility (tree decl, int vis) { if (vis == VISIBILITY_DEFAULT) ; - else if (vis == VISIBILITY_HIDDEN) + else if (vis == VISIBILITY_HIDDEN || vis == VISIBILITY_INTERNAL) { fputs ("\t.private_extern ", asm_out_file); assemble_name (asm_out_file, @@ -2631,7 +2631,7 @@ darwin_assemble_visibility (tree decl, int vis) fputs ("\n", asm_out_file); } else -warning (OPT_Wattributes, "internal and protected visibility attributes " +warning (OPT_Wattributes, "protected visibility attribute " "not supported in this configuration; ignored"); }
[libcpp] Free some variables
Build on x86-64-linux with C/C++/Fortran. I will now do an all-language build/regtest. OK when it passes? Tobias 2012-10-03 Tobias Burnus * files.c (read_file_guts, _cpp_save_file_entries): Free memory before returning. * lex.c (warn_about_normalization): Ditto. * mkdeps.c (deps_save): Ditto. * pch.c (cpp_valid_state): Ditto. diff --git a/libcpp/files.c b/libcpp/files.c index 5b3a37b..6fc24e2 100644 --- a/libcpp/files.c +++ b/libcpp/files.c @@ -668,12 +668,13 @@ read_file_guts (cpp_reader *pfile, _cpp_file *file) } } if (count < 0) { cpp_errno (pfile, CPP_DL_ERROR, file->path); + free (buf); return false; } if (regular && total != size && STAT_SIZE_RELIABLE (file->st)) cpp_error (pfile, CPP_DL_WARNING, "%s is shorter than expected", file->path); @@ -1756,12 +1757,13 @@ _cpp_save_file_entries (cpp_reader *pfile, FILE *fp) FILE *ff; int oldfd = f->fd; if (!open_file (f)) { open_file_failed (pfile, f, 0); + free (result); return false; } ff = fdopen (f->fd, "rb"); md5_stream (ff, result->entries[count].sum); fclose (ff); f->fd = oldfd; diff --git a/libcpp/lex.c b/libcpp/lex.c index ab904db..23809bc 100644 --- a/libcpp/lex.c +++ b/libcpp/lex.c @@ -1091,12 +1091,13 @@ warn_about_normalization (cpp_reader *pfile, if (NORMALIZE_STATE_RESULT (s) == normalized_C) cpp_warning_with_line (pfile, CPP_W_NORMALIZE, token->src_loc, 0, "`%.*s' is not in NFKC", (int) sz, buf); else cpp_warning_with_line (pfile, CPP_W_NORMALIZE, token->src_loc, 0, "`%.*s' is not in NFC", (int) sz, buf); + free (buf); } } /* Returns TRUE if the sequence starting at buffer->cur is invalid in an identifier. FIRST is TRUE if this starts an identifier. */ static bool diff --git a/libcpp/mkdeps.c b/libcpp/mkdeps.c index af11ac3..b576813 100644 --- a/libcpp/mkdeps.c +++ b/libcpp/mkdeps.c @@ -396,31 +396,39 @@ deps_save (struct deps *deps, FILE *f) int deps_restore (struct deps *deps, FILE *fd, const char *self) { unsigned int i, count; size_t num_to_read; size_t buf_size = 512; - char *buf = XNEWVEC (char, buf_size); + char *buf; /* Number of dependences. */ if (fread (&count, 1, sizeof (count), fd) != sizeof (count)) return -1; + buf = XNEWVEC (char, buf_size); + /* The length of each dependence string, followed by the string. */ for (i = 0; i < count; i++) { /* Read in # bytes in string. */ if (fread (&num_to_read, 1, sizeof (size_t), fd) != sizeof (size_t)) - return -1; + { + free (buf); + return -1; + } if (buf_size < num_to_read + 1) { buf_size = num_to_read + 1 + 127; buf = XRESIZEVEC (char, buf, buf_size); } if (fread (buf, 1, num_to_read, fd) != num_to_read) - return -1; + { + free (buf); + return -1; + } buf[num_to_read] = '\0'; /* Generate makefile dependencies from .pch if -nopch-deps. */ if (self != NULL && filename_cmp (buf, self) != 0) deps_add_dep (deps, buf); } diff --git a/libcpp/pch.c b/libcpp/pch.c index d278f14..001bf3f 100644 --- a/libcpp/pch.c +++ b/libcpp/pch.c @@ -707,13 +707,12 @@ cpp_valid_state (cpp_reader *r, const char *name, int fd) /* We win! */ return 0; error: cpp_errno (r, CPP_DL_ERROR, "while reading precompiled header"); - return -1; fail: free (namebuf); free (undeftab); free (nl.defs); return 1;
patch for allocation reginfo in advance
The following patch was submitted as approved by Jeff Law and Paolo Bonzini. The patch was successfully boostraped on x86/x86-64. Committed as rev. 192048. 2012-10-03 Vladimir Makarov * reginfo.c (max_regno_since_last_resize): New. (reg_preferred_class, reg_alternate_class): Add assert. (allocate_reg_info): Initialize allocated reg info. (resize_reg_info): Make bigger reg_info and initialize new memory. (reginfo_init): Initialize max_regno_since_last_resize. (setup_reg_classes): Change assert. Index: reginfo.c === --- reginfo.c (revision 192046) +++ reginfo.c (working copy) @@ -839,6 +839,8 @@ static struct reg_pref *reg_pref; /* Current size of reg_info. */ static int reg_info_size; +/* Max_reg_num still last resize_reg_info call. */ +static int max_regno_since_last_resize; /* Return the reg_class in which pseudo reg number REGNO is best allocated. This function is sometimes called before the info has been computed. @@ -849,6 +851,7 @@ reg_preferred_class (int regno) if (reg_pref == 0) return GENERAL_REGS; + gcc_assert (regno < reg_info_size); return (enum reg_class) reg_pref[regno].prefclass; } @@ -858,6 +861,7 @@ reg_alternate_class (int regno) if (reg_pref == 0) return ALL_REGS; + gcc_assert (regno < reg_info_size); return (enum reg_class) reg_pref[regno].altclass; } @@ -868,45 +872,64 @@ reg_allocno_class (int regno) if (reg_pref == 0) return NO_REGS; + gcc_assert (regno < reg_info_size); return (enum reg_class) reg_pref[regno].allocnoclass; } -/* Allocate space for reg info. */ +/* Allocate space for reg info and initilize it. */ static void allocate_reg_info (void) { - reg_info_size = max_reg_num (); + int i; + + max_regno_since_last_resize = max_reg_num (); + reg_info_size = max_regno_since_last_resize * 3 / 2 + 1; gcc_assert (! reg_pref && ! reg_renumber); reg_renumber = XNEWVEC (short, reg_info_size); reg_pref = XCNEWVEC (struct reg_pref, reg_info_size); memset (reg_renumber, -1, reg_info_size * sizeof (short)); + for (i = 0; i < reg_info_size; i++) +{ + reg_pref[i].prefclass = GENERAL_REGS; + reg_pref[i].altclass = ALL_REGS; + reg_pref[i].allocnoclass = GENERAL_REGS; +} } -/* Resize reg info. The new elements will be uninitialized. Return - TRUE if new elements (for new pseudos) were added. */ +/* Resize reg info. The new elements will be initialized. Return TRUE + if new pseudos were added since the last call. */ bool resize_reg_info (void) { - int old; + int old, i; + bool change_p; if (reg_pref == NULL) { allocate_reg_info (); return true; } - if (reg_info_size == max_reg_num ()) -return false; + change_p = max_regno_since_last_resize != max_reg_num (); + max_regno_since_last_resize = max_reg_num (); + if (reg_info_size >= max_reg_num ()) +return change_p; old = reg_info_size; - reg_info_size = max_reg_num (); + reg_info_size = max_reg_num () * 3 / 2 + 1; gcc_assert (reg_pref && reg_renumber); reg_renumber = XRESIZEVEC (short, reg_renumber, reg_info_size); reg_pref = XRESIZEVEC (struct reg_pref, reg_pref, reg_info_size); memset (reg_pref + old, -1, (reg_info_size - old) * sizeof (struct reg_pref)); memset (reg_renumber + old, -1, (reg_info_size - old) * sizeof (short)); + for (i = old; i < reg_info_size; i++) +{ + reg_pref[i].prefclass = GENERAL_REGS; + reg_pref[i].altclass = ALL_REGS; + reg_pref[i].allocnoclass = GENERAL_REGS; +} return true; } @@ -938,6 +961,7 @@ reginfo_init (void) /* This prevents dump_reg_info from losing if called before reginfo is run. */ reg_pref = NULL; + reg_info_size = max_regno_since_last_resize = 0; /* No more global register variables may be declared. */ no_global_reg_vars = 1; return 1; @@ -964,7 +988,7 @@ struct rtl_opt_pass pass_reginfo_init = -/* Set up preferred, alternate, and cover classes for REGNO as +/* Set up preferred, alternate, and allocno classes for REGNO as PREFCLASS, ALTCLASS, and ALLOCNOCLASS. */ void setup_reg_classes (int regno, @@ -973,7 +997,7 @@ setup_reg_classes (int regno, { if (reg_pref == NULL) return; - gcc_assert (reg_info_size == max_reg_num ()); + gcc_assert (reg_info_size >= max_reg_num ()); reg_pref[regno].prefclass = prefclass; reg_pref[regno].altclass = altclass; reg_pref[regno].allocnoclass = allocnoclass;
Re: abs(long long)
On Wed, 3 Oct 2012, Gabriel Dos Reis wrote: On Tue, Oct 2, 2012 at 1:53 PM, Marc Glisse wrote: * include/c_std/cstdlib (abs(long long)): Define with __builtin_llabs when we have long long. (abs(__int128)): Define when we have __int128. This change is OK Thanks, I'll commit that part only. (div(long long, long long)): Use lldiv. not this one. Ok. Note that glibc's implementation does more than just / and %. Possible reasons are: 1) glibc does useless work 2) libstdc++ has a bug 3) there are platforms supported by glibc but not by libstdc++ I choose to believe it is option 3. -- Marc Glisse
Re: [PATCH] Fix PR54782
Thanks for the comments. The patch was updated as attached. Dehao On Wed, Oct 3, 2012 at 11:46 AM, Jakub Jelinek wrote: > On Wed, Oct 03, 2012 at 11:26:09AM -0700, Dehao Chen wrote: >> @@ -6340,6 +6341,20 @@ move_block_to_fn (struct function *dest_cfun, basi >> SET_USE (use, replace_ssa_name (op, d->vars_map, dest_cfun->decl)); >> } >> >> + for (i = 0; i < EDGE_COUNT (bb->preds); i++) >> + { >> + location_t locus = gimple_phi_arg_location (phi, i); >> + if (locus != UNKNOWN_LOCATION) >> + { >> + tree block = LOCATION_BLOCK (locus); >> + if (d->orig_block == NULL_TREE > || block == d->orig_block) >> + gimple_phi_arg_set_location (phi, i, d->new_block ? >> + COMBINE_LOCATION_DATA (line_table, locus, d->new_block) : >> + LOCATION_LOCUS (locus)); >> + } >> + } >> + > > The formatting is wrong on this. ? and : would need to go to the next line > and even the indentation of part of an argument is weird. > IMHO better just do: > if (d->orig_block == NULL_TREE || block == d->orig_block) > { > if (d->new_block) > locus = COMBINE_LOCATION_DATA (line_table, locus, >d->new_block); > else > locus = LOCATION_LOCUS (locus); > gimple_phi_arg_set_location (phi, i, locus); > } > > It will be far more readable. > And/or to decrease indentation level you could do > if (locus == UNKNOWN_LOCATION) > continue; > block = LOCATION_BLOCK (locus); > ... > > Jakub Index: gcc/tree-cfg.c === --- gcc/tree-cfg.c (revision 192041) +++ gcc/tree-cfg.c (working copy) @@ -6322,6 +6322,7 @@ move_block_to_fn (struct function *dest_cfun, basi use_operand_p use; tree op = PHI_RESULT (phi); ssa_op_iter oi; + unsigned i; if (virtual_operand_p (op)) { @@ -6340,6 +6341,23 @@ move_block_to_fn (struct function *dest_cfun, basi SET_USE (use, replace_ssa_name (op, d->vars_map, dest_cfun->decl)); } + for (i = 0; i < EDGE_COUNT (bb->preds); i++) + { + location_t locus = gimple_phi_arg_location (phi, i); + tree block = LOCATION_BLOCK (locus); + + if (locus == UNKNOWN_LOCATION) + continue; + if (d->orig_block == NULL_TREE || block == d->orig_block) + { + if (d->new_block == NULL_TREE) + locus = LOCATION_LOCUS (locus); + else + locus = COMBINE_LOCATION_DATA (line_table, locus, d->new_block); + gimple_phi_arg_set_location (phi, i, locus); + } + } + gsi_next (&si); } Index: gcc/testsuite/gcc.dg/pr54782.c === --- gcc/testsuite/gcc.dg/pr54782.c (revision 0) +++ gcc/testsuite/gcc.dg/pr54782.c (revision 0) @@ -0,0 +1,18 @@ +/* { dg-do compile } */ +/* { dg-options "-O -ffast-math -ftree-parallelize-loops=2 -g" } */ + +struct S +{ + int n; + float *a; +}; + +int +foo (struct S *s) +{ + float sum = 0; + int i; + for (i = 0; i < s->n; i++) +sum += s->a[i]; + return sum; +}
Re: [PATCH] Fix PR54782
On Wed, Oct 03, 2012 at 01:09:27PM -0700, Dehao Chen wrote: > Thanks for the comments. The patch was updated as attached. Ok, thanks. Jakub
Re: [Fortran, (RFC) patch] PR49110/51055 Assignment to alloc. deferred-length character vars
Tobias, Sorry to return to this patch so long after the discussion, but I have been investigating testsuite failures on AIX and this patch introduced a number of Fortran failures. + if (sym->module) + name = gfc_get_string (".__%s_MOD_%s", sym->module, sym->name); + else + name = gfc_get_string (".%s", sym->name); Symbols beginning with "." have a well-known meaning on AIX, so the compiler cannot assume that it is safe to mangle a name by prepending a period. For instance, the test allocatable_function_5.s generates the following global symbol definitions in the assembler: .globl __m_MOD_mbar .globl .__m_MOD_mbar .csect __m_MOD_mbar[DS] __m_MOD_mbar: <--- function descriptor __m_MOD_mbar .long .__m_MOD_mbar, TOC[tc0], 0 .csect .text[PR] .__m_MOD_mbar: <--- function address .__m_MOD_mbar ... .globl .__m_MOD_mbar .csect .data[RW],4 .align 2 .__m_MOD_mbar: <--- mangled string name .__m_MOD_mbar .space 4 Unsurprisingly, two symbols with the same name are a problem and the AIX assembler complains about the redefinition. I am not sure why you chose a period and how best to correct this. Thanks, David
Re: patch to fix
On Wed, 3 Oct 2012, Kenneth Zadeck wrote: The patch defines a new datatype, a 'wide_int' (defined in wide-int.[ch], and this datatype will be used to perform all of the integer constant math in the compiler. Externally, wide-int is very similar to double-int except that it does not have the limitation that math must be done on exactly two HOST_WIDE_INTs. Internally, a wide_int is a structure that contains a fixed sized array of HOST_WIDE_INTs, a length field and a mode. The size of the array is determined at generation time by dividing the number of bits of the largest integer supported on the target by the number of bits in a HOST_WIDE_INT of the host. Thus, with this format, any length of integer can be supported on any host. Hello, did you consider making the size of wide_int a template parameter, now that we are using C++? All with a convenient typedef or macro so it doesn't show. I am asking because in vrp I do some arithmetic that requires 2*N+1 bits where N is the size of double_int. -- Marc Glisse
Re: [Fortran, (RFC) patch] PR49110/51055 Assignment to alloc. deferred-length character vars
David, David Edelsohn wrote: I am not sure why you chose a period and how best to correct this. Well, in principle any name which the user cannot enter would do. (Not enter: At least not as valid Fortran identifier.) The reason for choosing "." is that is used elsewhere in gfortran for such identifier for the string-length variable belonging to , e.g. "._result" in trans-decl.c. I assume the reason that it didn't pop up with those is that those are local variables, but I wouldn't be surprised if it would break elsewhere. I wonder whether "@" would work, otherwise, one could also use "_". The only other problem is that it will break the ABI. On the other hand, it's a rather new feature and if we bump the .mod version number, the chance that one effectively forces the user to re-compile is rather high. So far we always bumped the .mod version number as something changed. There are also some other patches pending which effectively lead to a bump in the .mod version. (The .mod version won't affect code which doesn't use modules such as BLAS/LAPACK or any Fortran 66/77 code, but those won't be affected by the ABI change anyway as there the name doesn't propagate as it does with modules..) Thanks for investigating the test-suite failure. Tobias
opts.c, gcc.c: Plug some memory leaks - and an out-of-bounds memory access
Found using http://scan5.coverity.com/ Build on x86-64-gnu-linux with C/C++/Fortran. I will now do an all-language build/regtest. OK when it passes? (Note to the save_string call: I reduced it by 2: The "+1" in the call makes it long (out of bounds) and the "+1" in temp_filename_length is not needed (but also doesn't harm) as "tmp" is null terminated and save_string adds another '\0' after copying "len" bytes.) Tobias 2012-10-03 Tobias Burnus * gcc.c (record_temp_file, add_sysrooted_prefix, process_command, do_self_spec, compare_debug_dump_opt_spec_function): Plug memleaks. (do_spec_1): Ditto, fix out-of-bound access. * opts.c (common_handle_option): Plug memleak. diff --git a/gcc/gcc.c b/gcc/gcc.c index af3c34a..2d56d40 100644 --- a/gcc/gcc.c +++ b/gcc/gcc.c @@ -1981,21 +1981,24 @@ record_temp_file (const char *filename, int always_delete, int fail_delete) always_delete_queue = temp; already1:; } if (fail_delete) { struct temp_file *temp; for (temp = failure_delete_queue; temp; temp = temp->next) if (! filename_cmp (name, temp->name)) - goto already2; + { + free (name); + goto already2; + } temp = XNEW (struct temp_file); temp->next = failure_delete_queue; temp->name = name; failure_delete_queue = temp; already2:; } } @@ -2455,22 +2458,29 @@ add_sysrooted_prefix (struct path_prefix *pprefix, const char *prefix, if (target_system_root) { char *sysroot_no_trailing_dir_separator = xstrdup (target_system_root); size_t sysroot_len = strlen (target_system_root); if (sysroot_len > 0 && target_system_root[sysroot_len - 1] == DIR_SEPARATOR) sysroot_no_trailing_dir_separator[sysroot_len - 1] = '\0'; if (target_sysroot_suffix) - prefix = concat (target_sysroot_suffix, prefix, NULL); - prefix = concat (sysroot_no_trailing_dir_separator, prefix, NULL); + { + char *tmp; + tmp = concat (target_sysroot_suffix, prefix, NULL); + prefix = concat (sysroot_no_trailing_dir_separator, tmp, NULL); + free (tmp); + } + else + prefix = concat (sysroot_no_trailing_dir_separator, prefix, NULL); + free (sysroot_no_trailing_dir_separator); /* We have to override this because GCC's notion of sysroot moves along with GCC. */ component = "GCC"; } add_prefix (pprefix, prefix, component, priority, require_machine_suffix, os_multilib); } @@ -3564,21 +3574,21 @@ set_option_handlers (struct cl_option_handlers *handlers) /* Create the vector `switches' and its contents. Store its length in `n_switches'. */ static void process_command (unsigned int decoded_options_count, struct cl_decoded_option *decoded_options) { const char *temp; char *temp1; - const char *tooldir_prefix; + char *tooldir_prefix, *tooldir_prefix2; char *(*get_relative_prefix) (const char *, const char *, const char *) = NULL; struct cl_option_handlers handlers; unsigned int j; gcc_exec_prefix = getenv ("GCC_EXEC_PREFIX"); n_switches = 0; n_infiles = 0; added_libraries = 0; @@ -3913,36 +3923,38 @@ process_command (unsigned int decoded_options_count, add_prefix (&exec_prefixes, standard_libexec_prefix, "BINUTILS", PREFIX_PRIORITY_LAST, 2, 0); add_prefix (&exec_prefixes, standard_exec_prefix, "BINUTILS", PREFIX_PRIORITY_LAST, 2, 0); #endif add_prefix (&startfile_prefixes, standard_exec_prefix, "BINUTILS", PREFIX_PRIORITY_LAST, 1, 0); } gcc_assert (!IS_ABSOLUTE_PATH (tooldir_base_prefix)); - tooldir_prefix = concat (tooldir_base_prefix, spec_machine, - dir_separator_str, NULL); + tooldir_prefix2 = concat (tooldir_base_prefix, spec_machine, + dir_separator_str, NULL); /* Look for tools relative to the location from which the driver is running, or, if that is not available, the configured prefix. */ tooldir_prefix = concat (gcc_exec_prefix ? gcc_exec_prefix : standard_exec_prefix, spec_machine, dir_separator_str, - spec_version, dir_separator_str, tooldir_prefix, NULL); + spec_version, dir_separator_str, tooldir_prefix2, NULL); + free (tooldir_prefix2); add_prefix (&exec_prefixes, concat (tooldir_prefix, "bin", dir_separator_str, NULL), "BINUTILS", PREFIX_PRIORITY_LAST, 0, 0); add_prefix (&startfile_prefixes, concat (tooldir_prefix, "lib", dir_separator_str, NULL), "BINUTILS", PREFIX_PRIORITY_LAST, 0, 1); + free (tooldir_prefix); #if defined(TARGET_SYSTEM_ROOT_RELOCATABLE) && !defined(VMS) /* If the normal TARGET_SYSTEM_ROOT is inside of $exec_prefix, then consider it to relocate with the rest of the GCC installation if GCC_EXEC_PREFIX is set. ``make_relative_prefix'' is not compiled for VMS, so don't call it. */ if (target_system_root && !target_system_root_changed && gcc_exec_prefix) { char *tmp_prefix = get_relative_
Re: [wwwdocs] Buildstat update for 4.4
On Tue, 2 Oct 2012, Tom G. Christensen wrote: > Latest results for 4.4.x Thanks, Tom. Applied. Gerald
[lra] patch taking LRA review into account
The following patch takes comments and proposals from Jeff Law's and Richard Sandiford's reviews of LRA. The patch was successfully bootstraped on x86/x86-64. Commited as rev. 192050. 2012-10-03 Vladimir Makarov * sched-vis.c (print_value_slim): Fix the comment. * targhooks.h (default_register_bank): Rename to default_register_priority. * targhooks.c (default_register_bank): Ditto. * target.def (register_bank): Rename to register_priority. (spill_class): Add a new argument (spill_class_mode): Remove. * doc/tm.texi.in (TARGET_REGISTER_BANK): Rename to TARGET_REGISTER_PRIORITY. (TARGET_SPILL_CLASS_MODE): Remove. * doc/tm.texi: Update. * lra.c (setup_reg_spill_flag): Provide a new argument to spill_class. * lra-assigns.c (find_hard_regno_for): Use register_priority instead of register_bank. * lra-spills.c (assign_spill_hard_regs): Don't use hook spill_class_mode. Provide a new argument to spill_class hook. * config/i386/i386.c (ix86_register_bank): Rename to ix86_register_priority. (ix86_spill_class): Add new parameter. (ix86_spill_class_mode): Remove. (TARGET_REGISTER_BANK): Rename to TARGET_REGISTER_PRIORITY. (TARGET_SPILL_CLASS_MODE): Remove. Index: config/i386/i386.c === --- config/i386/i386.c (revision 192048) +++ config/i386/i386.c (working copy) @@ -31705,9 +31705,9 @@ ix86_lra_p (void) return true; } -/* Return a register bank number for hard reg REGNO. */ +/* Return a register priority for hard reg REGNO. */ static int -ix86_register_bank (int hard_regno) +ix86_register_priority (int hard_regno) { /* ebp and r13 as the base always wants a displacement, r12 as the base always wants an index. So discourage their usage in an @@ -40525,37 +40525,20 @@ ix86_autovectorize_vector_sizes (void) -/* Return class of registers which could be used for pseudo of class - RCLASS for spilling instead of memory. Return NO_REGS if it is not - possible or non-profitable. */ -static enum reg_class -ix86_spill_class (enum reg_class rclass) +/* Return class of registers which could be used for pseudo of MODE + and of class RCLASS for spilling instead of memory. Return NO_REGS + if it is not possible or non-profitable. */ +static reg_class_t +ix86_spill_class (reg_class_t rclass, enum machine_mode mode) { if (TARGET_SSE && TARGET_GENERAL_REGS_SSE_SPILL && hard_reg_set_subset_p (reg_class_contents[rclass], -reg_class_contents[GENERAL_REGS])) +reg_class_contents[GENERAL_REGS]) + && (mode == SImode || (TARGET_64BIT && mode == DImode))) return SSE_REGS; return NO_REGS; } -/* Return mode in which pseudo of MODE and RCLASS can be spilled into - a register of class SPILL_CLASS. Return VOIDmode if it is not - possible. */ -static enum machine_mode -ix86_spill_class_mode (enum reg_class rclass, enum reg_class spill_class, - enum machine_mode mode) -{ - if (! TARGET_SSE || ! TARGET_GENERAL_REGS_SSE_SPILL - || ! hard_reg_set_subset_p (reg_class_contents[rclass], - reg_class_contents[GENERAL_REGS]) - || spill_class != SSE_REGS) -return VOIDmode; - if (mode == SImode || (TARGET_64BIT && mode == DImode)) -return mode; - return VOIDmode; -} - - /* Implement targetm.vectorize.init_cost. */ static void * @@ -40961,8 +40944,8 @@ ix86_memmodel_check (unsigned HOST_WIDE_ #undef TARGET_LRA_P #define TARGET_LRA_P ix86_lra_p -#undef TARGET_REGISTER_BANK -#define TARGET_REGISTER_BANK ix86_register_bank +#undef TARGET_REGISTER_PRIORITY +#define TARGET_REGISTER_PRIORITY ix86_register_priority #undef TARGET_LEGITIMATE_CONSTANT_P #define TARGET_LEGITIMATE_CONSTANT_P ix86_legitimate_constant_p @@ -40990,9 +40973,6 @@ ix86_memmodel_check (unsigned HOST_WIDE_ #undef TARGET_SPILL_CLASS #define TARGET_SPILL_CLASS ix86_spill_class -#undef TARGET_SPILL_CLASS_MODE -#define TARGET_SPILL_CLASS_MODE ix86_spill_class_mode - struct gcc_target targetm = TARGET_INITIALIZER; #include "gt-i386.h" Index: doc/tm.texi === --- doc/tm.texi (revision 192048) +++ doc/tm.texi (working copy) @@ -2897,20 +2897,16 @@ as below: A target hook which returns true if we use LRA instead of reload pass. It means that LRA was ported to the target.The default version of this target hook returns always false. @end deftypefn -@deftypefn {Target Hook} int TARGET_REGISTER_BANK (int) -A target hook which returns the register bank number to which the register @var{hard_regno} belongs to. The smaller the number, the more preferable the hard register usage (when all other conditions are the same). This hook can be used to prefer some hard register over others in LRA. For example, some x86-64 register usage needs additional prefi
Re: RFC: LRA for x86/x86-64 [2/9]
On 09/28/2012 11:36 AM, Jeff Law wrote: On 09/28/2012 09:21 AM, Vladimir Makarov wrote: On 12-09-28 4:43 AM, Steven Bosscher wrote: On Fri, Sep 28, 2012 at 12:57 AM, Vladimir Makarov wrote: LRA outputs a lot debug information about insns. I found that using slim insn/rtl presentation helps a lot for LRA debuging. The following patch makes slim presentation printing functions visible to LRA. It also implements one more such function. 2012-09-27 Vladimir Makarov * rtl.h (debug_bb_n_slim, debug_bb_slim, print_value_slim): New prototypes. (debug_rtl_slim, debug_insn_slim): Ditto. * sched-vis.c (print_value_slim): New. I have patches in the works to use the slim RTL dumping format more, too, and to use the pretty-printer code so that printing strings with escaped characters can be made more transparent (e.g. for use in GraphViz dumps). That would be nice. Slim printing is very useful for LRA which prints a lot of changes in RTL code during all its work. Regular printing is unreadable because of its volume. For LRA debugging I usually find a suspicious place in slim dump and if I need more info I use regular dump of the suspicious insn. Perhaps it's time to rename sched-vis.c to print-rtl-slim.c? :-) Yes, the name sched-vis.c is very misleading. It seems to me this change ought to go forward now (rename to print-rtl-slim.c and add print_value_slim. WRT print_value_slim we have this block comment: +/* Prints rtxes, I customarily classified as values. They're + constants, registers, labels, symbols and memory accesses. Print + them to file F. */ That block comment just doesn't make sense when I read it. It seems like. /* Print X, an RTL value node, to file F in slim format. Include additional information if VERBOSE is nonzero. Value nodes are constants, registers, labels, symbols and memory. */ With that change I think you could make the obvious changes necessary to rename to print-rtl-slim and check this patch in. Unfortunately, it is not easy. The right thing to do is to make it independent from scheduler. There are dependecies on insn scheduler in this file. Hooks print_insn and current_sched_info are used in the code. So I just fixed the comment and will work on removing dependencies later if I have time.
[SH] PR 54760 - Add thread pointer built-ins and GBR displacement addressing
Hello, This adds the two common built-in functions __builtin_thread_pointer and __builtin_set_thread_pointer to the SH port. I've done it in a way so that hopefully it can be transitioned to target independent thread pointer built-ins easily, as suggested by Richard a while ago: http://gcc.gnu.org/ml/gcc-patches/2012-07/msg00946.html Originally I wanted to wait until the target independent bits are in, but somehow the thread mentioned above died and I got impatient. I've also added support for SH's GBR based displacement addressing modes. They are not used for general purpose mem loads/stores by the compiler, but rather when code accesses data behind the thread pointer (thread control block or something). The way GBR displacement address opportunities are discovered might not be the best way of doing this sort of thing, but it works. The insn walking could potentially slow down compile times, but it is only enabled for functions where the GBR is referenced, so it shouldn't be so bad. Alternatives and suggestions are highly appreciated. :) Tested on rev 191894 with 'make all' (c,c++) and 'make -k check-gcc RUNTESTFLAGS="sh.exp --target_board=sh-sim \{-m2/-ml,-m2/-mb,-m2a/-mb,-m4/-ml,-m4/-mb,-m4a/-ml,-m4a/-mb}"' For code that doesn't reference GBR there are no functional changes. TLS code that references GBR might trigger the 'sh_find_equiv_gbr_addr' function. Unfortunately TLS tests don't seem to work on sh-sim, so I could not test this part. Cheers, Oleg gcc/ChangeLog: PR target/54760 * config/sh/sh.md (define_constants): Add UNSPECV_GBR. (get_thread_pointer, set_thread_pointer): New expanders. (load_gbr): Rename to store_gbr. Remove GBR_REG use. (store_gbr): New insn. (*mov_gbr_load, *mov_gbr_store): New insns and accompanying unnamed splits. * config/sh/predicates.md (general_movsrc_operand, general_movdst_operand): Reject GBR addresses. * config/sh/sh-protos.h (sh_find_equiv_gbr_addr): New declaration. * config/sh/sh.c (prepare_move_operands): Use gen_store_gbr instead of gen_load_gbr in TLS_MODEL_LOCAL_EXEC case. (sh_address_cost, sh_legitimate_address_p, sh_secondary_reload): Handle GBR addresses. (builtin_description): Add is_enabled member. (shmedia_builtin, sh1_builtin): New functions. (signature_args): Add SH_BLTIN_VP. (bdesc): Use shmedia_builtin for existing built-ins. Add __builtin_thread_pointer and __builtin_set_thread_pointer as sh1_builtin. (sh_media_init_builtins, sh_init_builtins): Merge into single function sh_init_builtins. Add is_enabled checking. (sh_media_builtin_decl, sh_builtin_decl): Merge into single function sh_builtin_decl. Add is_enabled checking. (base_reg_disp): New class. (sh_find_base_reg_disp, sh_find_equiv_gbr_addr): New functions. testsuite/ChangeLog: PR target/54760 * gcc.target/sh/pr54706-1.c: New. * gcc.target/sh/pr54706-2.c: New. * gcc.target/sh/pr54706-3.c: New. Index: gcc/config/sh/sh.md === --- gcc/config/sh/sh.md (revision 191894) +++ gcc/config/sh/sh.md (working copy) @@ -175,6 +175,7 @@ (UNSPECV_WINDOW_END 10) (UNSPECV_CONST_END 11) (UNSPECV_EH_RETURN 12) + (UNSPECV_GBR 13) ]) ;; - @@ -10029,13 +10030,165 @@ DONE; }) -(define_insn "load_gbr" - [(set (match_operand:SI 0 "register_operand" "=r") (reg:SI GBR_REG)) - (use (reg:SI GBR_REG))] +;;-- +;; Thread pointer getter and setter. +;; +;; On SH the thread pointer is kept in the GBR. +;; These patterns are usually expanded from the respective built-in functions. +(define_expand "get_thread_pointer" + [(set (match_operand:SI 0 "register_operand") (reg:SI GBR_REG))] + "TARGET_SH1") + +(define_insn "store_gbr" + [(set (match_operand:SI 0 "register_operand" "=r") (reg:SI GBR_REG))] "" "stc gbr,%0" [(set_attr "type" "tls_load")]) +(define_expand "set_thread_pointer" + [(set (reg:SI GBR_REG) + (unspec_volatile:SI [(match_operand:SI 0 "register_operand")] + UNSPECV_GBR))] + "TARGET_SH1") + +(define_insn "load_gbr" + [(set (reg:SI GBR_REG) + (unspec_volatile:SI [(match_operand:SI 0 "register_operand" "r")] + UNSPECV_GBR))] + "TARGET_SH1" + "ldc %0,gbr" + [(set_attr "type" "move")]) + +;;-- +;; Thread pointer relative memory loads and stores. +;; +;; On SH there are GBR displacement address modes which can be utilized to +;; access memory behind the thread pointer. +;; Since we do not allow using GBR for general purpose memory accesses, these +;; GBR addressing modes are formed by the combine pass. +;; This could b
Re: [SH] PR 54760 - Add thread pointer built-ins and GBR displacement addressing
On Wed, 2012-10-03 at 23:21 +0200, Oleg Endo wrote: > testsuite/ChangeLog: > > PR target/54760 > * gcc.target/sh/pr54706-1.c: New. > * gcc.target/sh/pr54706-2.c: New. > * gcc.target/sh/pr54706-3.c: New. Obviously there is a typo in the file names for the test cases. Attached is the corrected patch + changelog. Cheers, Oleg gcc/ChangeLog: PR target/54760 * config/sh/sh.md (define_constants): Add UNSPECV_GBR. (get_thread_pointer, set_thread_pointer): New expanders. (load_gbr): Rename to store_gbr. Remove GBR_REG use. (store_gbr): New insn. (*mov_gbr_load, *mov_gbr_store): New insns and accompanying unnamed splits. * config/sh/predicates.md (general_movsrc_operand, general_movdst_operand): Reject GBR addresses. * config/sh/sh-protos.h (sh_find_equiv_gbr_addr): New declaration. * config/sh/sh.c (prepare_move_operands): Use gen_store_gbr instead of gen_load_gbr in TLS_MODEL_LOCAL_EXEC case. (sh_address_cost, sh_legitimate_address_p, sh_secondary_reload): Handle GBR addresses. (builtin_description): Add is_enabled member. (shmedia_builtin, sh1_builtin): New functions. (signature_args): Add SH_BLTIN_VP. (bdesc): Use shmedia_builtin for existing built-ins. Add __builtin_thread_pointer and __builtin_set_thread_pointer as sh1_builtin. (sh_media_init_builtins, sh_init_builtins): Merge into single function sh_init_builtins. Add is_enabled checking. (sh_media_builtin_decl, sh_builtin_decl): Merge into single function sh_builtin_decl. Add is_enabled checking. (base_reg_disp): New class. (sh_find_base_reg_disp, sh_find_equiv_gbr_addr): New functions. testsuite/ChangeLog: PR target/54760 * gcc.target/sh/pr54760-1.c: New. * gcc.target/sh/pr54760-2.c: New. * gcc.target/sh/pr54760-3.c: New. Index: gcc/config/sh/sh.md === --- gcc/config/sh/sh.md (revision 191894) +++ gcc/config/sh/sh.md (working copy) @@ -175,6 +175,7 @@ (UNSPECV_WINDOW_END 10) (UNSPECV_CONST_END 11) (UNSPECV_EH_RETURN 12) + (UNSPECV_GBR 13) ]) ;; - @@ -10029,13 +10030,165 @@ DONE; }) -(define_insn "load_gbr" - [(set (match_operand:SI 0 "register_operand" "=r") (reg:SI GBR_REG)) - (use (reg:SI GBR_REG))] +;;-- +;; Thread pointer getter and setter. +;; +;; On SH the thread pointer is kept in the GBR. +;; These patterns are usually expanded from the respective built-in functions. +(define_expand "get_thread_pointer" + [(set (match_operand:SI 0 "register_operand") (reg:SI GBR_REG))] + "TARGET_SH1") + +(define_insn "store_gbr" + [(set (match_operand:SI 0 "register_operand" "=r") (reg:SI GBR_REG))] "" "stc gbr,%0" [(set_attr "type" "tls_load")]) +(define_expand "set_thread_pointer" + [(set (reg:SI GBR_REG) + (unspec_volatile:SI [(match_operand:SI 0 "register_operand")] + UNSPECV_GBR))] + "TARGET_SH1") + +(define_insn "load_gbr" + [(set (reg:SI GBR_REG) + (unspec_volatile:SI [(match_operand:SI 0 "register_operand" "r")] + UNSPECV_GBR))] + "TARGET_SH1" + "ldc %0,gbr" + [(set_attr "type" "move")]) + +;;-- +;; Thread pointer relative memory loads and stores. +;; +;; On SH there are GBR displacement address modes which can be utilized to +;; access memory behind the thread pointer. +;; Since we do not allow using GBR for general purpose memory accesses, these +;; GBR addressing modes are formed by the combine pass. +;; This could be done with fewer patterns than below by using a mem predicate +;; for the GBR mem, but then reload would try to reload addresses with a +;; zero displacement for some strange reason. + +(define_insn "*mov_gbr_load" + [(set (match_operand:QIHISI 0 "register_operand" "=z") + (mem:QIHISI (plus:SI (reg:SI GBR_REG) + (match_operand:QIHISI 1 "gbr_displacement"] + "TARGET_SH1" + "mov. @(%O1,gbr),%0" + [(set_attr "type" "load")]) + +(define_insn "*mov_gbr_load" + [(set (match_operand:QIHISI 0 "register_operand" "=z") + (mem:QIHISI (reg:SI GBR_REG)))] + "TARGET_SH1" + "mov. @(0,gbr),%0" + [(set_attr "type" "load")]) + +(define_insn "*mov_gbr_load" + [(set (match_operand:SI 0 "register_operand" "=z") + (sign_extend:SI + (mem:QIHI (plus:SI (reg:SI GBR_REG) + (match_operand:QIHI 1 "gbr_displacement")] + "TARGET_SH1" + "mov. @(%O1,gbr),%0" + [(set_attr "type" "load")]) + +(define_insn "*mov_gbr_load" + [(set (match_operand:SI 0 "register_operand" "=z") + (sign_extend:SI (mem:QIHI (reg:SI GBR_REG] + "TARGET_SH1" + "mov. @(0,gbr),%0" + [(set_attr "type" "load")]) + +(define_insn "*mov_gbr_store" + [
Re: patch to fix
i have already converted the vrp code, so i have some guess at where you are talking about. (of course correct me if i am wrong). in the code that computes the range when two variables are multiplied together needs to do a multiplication that produces a result that is twice as wide as the inputs. my library is able to do that with one catch (and this is a big catch): the target has to have an integer mode that is twice as big as the mode of the operands. The issue is that wide-ints actually carry around the mode of the value in order to get the bitsize and precision of the operands (it does not have the type, because this code has to both work on the rtl and tree level and i generally do not want the signness anyway). my current code in vrp checks to see if such a mode exists and if it does, it produces the product. if the mode does not exist, it returns bottom. What this means is that for most (many or some) targets that have a TImode, the largest thing that particular vrp discover ranges for is a DImode value. We could get around this by defining the next larger mode than what the target really needs but i wonder how much mileage you are going to get out of that with really large numbers. Of course you could have something else in mind. kenny On 10/03/2012 04:47 PM, Marc Glisse wrote: On Wed, 3 Oct 2012, Kenneth Zadeck wrote: The patch defines a new datatype, a 'wide_int' (defined in wide-int.[ch], and this datatype will be used to perform all of the integer constant math in the compiler. Externally, wide-int is very similar to double-int except that it does not have the limitation that math must be done on exactly two HOST_WIDE_INTs. Internally, a wide_int is a structure that contains a fixed sized array of HOST_WIDE_INTs, a length field and a mode. The size of the array is determined at generation time by dividing the number of bits of the largest integer supported on the target by the number of bits in a HOST_WIDE_INT of the host. Thus, with this format, any length of integer can be supported on any host. Hello, did you consider making the size of wide_int a template parameter, now that we are using C++? All with a convenient typedef or macro so it doesn't show. I am asking because in vrp I do some arithmetic that requires 2*N+1 bits where N is the size of double_int.
[Patch, Fortran, F08] PR 45521: GENERIC resolution with ALLOCATABLE/POINTER and PROCEDURE
Hi all, the attached patch implements an F08 feature, which allows to distinguish two specific procedures in a generic interface, based on the POINTER and ALLOCATABLE attribute of their arguments. In addition to this, the patch fixes a bug in rejecting data actual arguments passed to procedure formal arguments. The patch was regtested on x86_64-unknown-linux-gnu. Ok for trunk? Cheers, Janus 2012-10-03 Janus Weil PR fortran/45521 * interface.c (generic_correspondence): Implement additional distinguishability criteria of F08. (compare_actual_formal): Reject data object as actual argument for procedure formal argument. 2012-10-03 Janus Weil PR fortran/45521 * gfortran.dg/generic_25.f90: New. * gfortran.dg/generic_26.f90: New. * gfortran.dg/generic_27.f90: New. pr45521_v2.diff Description: Binary data generic_25.f90 Description: Binary data generic_26.f90 Description: Binary data generic_27.f90 Description: Binary data
libbacktrace patch committed: Fix leb128 overflow test
This patch to libbacktrace fixes the overflow test for the leb128 reading routines. What matters is not the shift after the loop, but the shift within the loop. This also removes the setting of unit_buf.start in build_address_map, which was simply wrong and was causing the error message to print the wrong offset in the .debug_info section. Bootstrapped and ran libbacktrace testsuite on x86_64-unknown-linux-gnu. Committed to mainline. Ian 2012-10-03 Ian Lance Taylor * dwarf.c (read_uleb128): Fix overflow test. (read_sleb128): Likewise. (build_address_map): Don't change unit_buf.start. Index: dwarf.c === --- dwarf.c (revision 191858) +++ dwarf.c (working copy) @@ -524,10 +524,12 @@ read_uleb128 (struct dwarf_buf *buf) { uint64_t ret; unsigned int shift; + int overflow; unsigned char b; ret = 0; shift = 0; + overflow = 0; do { const unsigned char *p; @@ -536,14 +538,17 @@ read_uleb128 (struct dwarf_buf *buf) if (!advance (buf, 1)) return 0; b = *p; - ret |= ((uint64_t) (b & 0x7f)) << shift; + if (shift < 64) + ret |= ((uint64_t) (b & 0x7f)) << shift; + else if (!overflow) + { + dwarf_buf_error (buf, "LEB128 overflows uint64_t"); + overflow = 1; + } shift += 7; } while ((b & 0x80) != 0); - if (shift > 64) -dwarf_buf_error (buf, "LEB128 overflows uint64_5"); - return ret; } @@ -554,10 +559,12 @@ read_sleb128 (struct dwarf_buf *buf) { uint64_t val; unsigned int shift; + int overflow; unsigned char b; val = 0; shift = 0; + overflow = 0; do { const unsigned char *p; @@ -566,15 +573,18 @@ read_sleb128 (struct dwarf_buf *buf) if (!advance (buf, 1)) return 0; b = *p; - val |= ((uint64_t) (b & 0x7f)) << shift; + if (shift < 64) + val |= ((uint64_t) (b & 0x7f)) << shift; + else if (!overflow) + { + dwarf_buf_error (buf, "signed LEB128 overflows uint64_t"); + overflow = 1; + } shift += 7; } while ((b & 0x80) != 0); - if (shift > 64) -dwarf_buf_error (buf, "signed LEB128 overflows uint64_t"); - - if ((b & 0x40) != 0) + if ((b & 0x40) != 0 && shift < 64) val |= ((uint64_t) -1) << shift; return (int64_t) val; @@ -1262,7 +1272,6 @@ build_address_map (struct backtrace_stat } unit_buf = info; - unit_buf.start = info.buf; unit_buf.left = len; if (!advance (&info, len))
[SH] PR 33135 - Remove mieee option in libgcc
Hello, Since the -mieee behavior has been fixed, is enabled by default on SH and the additional flags in libgcc can be removed. OK? Cheers, Oleg libgcc/ChangeLog: PR target/33135 * config/sh/t-sh (HOST_LIBGCC2_CFLAGS): Delete. * config/sh/t-netbsd (HOST_LIBGCC2_CFLAGS): Delete. * config/sh/t-linux (HOST_LIBGCC2_CFLAGS): Remove mieee option. Index: libgcc/config/sh/t-sh === --- libgcc/config/sh/t-sh (revision 192050) +++ libgcc/config/sh/t-sh (working copy) @@ -59,5 +59,3 @@ libgcc-4-300.a: div_table-4-300.o $(AR_CREATE_FOR_TARGET) $@ div_table-4-300.o -HOST_LIBGCC2_CFLAGS += -mieee - Index: libgcc/config/sh/t-netbsd === --- libgcc/config/sh/t-netbsd (revision 192050) +++ libgcc/config/sh/t-netbsd (working copy) @@ -1,3 +1,2 @@ LIB1ASMFUNCS_CACHE = _ic_invalidate -HOST_LIBGCC2_CFLAGS += -mieee Index: libgcc/config/sh/t-linux === --- libgcc/config/sh/t-linux (revision 192051) +++ libgcc/config/sh/t-linux (working copy) @@ -2,7 +2,7 @@ LIB2ADD = $(srcdir)/config/sh/linux-atomic.c -HOST_LIBGCC2_CFLAGS += -mieee -DNO_FPSCR_VALUES +HOST_LIBGCC2_CFLAGS += -DNO_FPSCR_VALUES # Silence atomic built-in related warnings in linux-atomic.c. # Unfortunately the conflicting types warning can't be disabled selectively.
Re: RFA: darwin PATCH to fix build, internal visibility
On Oct 3, 2012, at 12:04 PM, Jason Merrill wrote: > This patch fixes a couple of Darwin issues I noticed with a cross-compiler: > > 1) Adds a couple of consts to avoid const-correctness errors. > 2) Treats visibility "internal" like "hidden" rather than like "default". > The documentation says that internal is hidden + processor-specific > semantics, so treating it as just hidden makes sense to me. > > OK for trunk? Ok.
[PATCH] rs6000: Remove 'A' output modifier
It was used for old POWER only, which has been removed. Tested etc.; okay to apply? Segher 2012-10-04 Segher Boessenkool gcc/ * config/rs6000/rs6000.c (print_operand) ['A']: Delete. --- gcc/config/rs6000/rs6000.c | 11 --- 1 files changed, 0 insertions(+), 11 deletions(-) diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c index 211087b..846a939 100644 --- a/gcc/config/rs6000/rs6000.c +++ b/gcc/config/rs6000/rs6000.c @@ -14676,17 +14676,6 @@ print_operand (FILE *file, rtx x, int code) { /* %a is output_address. */ -case 'A': - /* If X is a constant integer whose low-order 5 bits are zero, -write 'l'. Otherwise, write 'r'. This is a kludge to fix a bug -in the AIX assembler where "sri" with a zero shift count -writes a trash instruction. */ - if (GET_CODE (x) == CONST_INT && (INTVAL (x) & 31) == 0) - putc ('l', file); - else - putc ('r', file); - return; - case 'b': /* If constant, low-order 16 bits of constant, unsigned. Otherwise, write normally. */ -- 1.7.7.6
Re: Commit: RX: Warn about multiple fast interrupt routines.
On Wed, 3 Oct 2012, Nick Clifton wrote: > PS. I have even remembered to include a patch to the html > documentation as well! Yes, I was going to mention this appreciatively. ;-) A minor change I applied on top is the following, adding a closing and a dash in command-line. Thanks, Gerald Index: changes.html === RCS file: /cvs/gcc/wwwdocs/htdocs/gcc-4.8/changes.html,v retrieving revision 1.33 retrieving revision 1.35 diff -u -3 -p -r1.33 -r1.35 --- changes.html3 Oct 2012 16:12:56 - 1.33 +++ changes.html3 Oct 2012 22:33:46 - 1.35 @@ -244,7 +244,7 @@ by this change. This target will now issue a warning message whenever multiple fast interrupt handlers are found in the same cpmpilation unit. This feature can be turned off by the new -mno-warn-multiple-fast-interrupts -command line option. +command-line option. SH
Re: Convert more non-GTY htab_t to hash_table.
On 10/2/12, Richard Guenther wrote: > On Mon, 1 Oct 2012, Lawrence Crowl wrote: > >> Change more non-GTY hash tables to use the new type-safe template hash >> table. >> Constify member function parameters that can be const. >> Correct a couple of expressions in formerly uninstantiated templates. >> >> The new code is 0.362% faster in bootstrap, with a 99.5% confidence of >> being faster. >> >> Tested on x86-64. >> >> Okay for trunk? > > You are changing a hashtable used by fold checking, did you test > with fold checking enabled? > > +/* Data structures used to maintain mapping between basic blocks and > + copies. */ > +static hash_table bb_original; > +static hash_table bb_copy; > > note that because hash_table has a constructor we now get global > CTORs for all statics :( (and mx-protected local inits ...) > Can you please try to remove the constructor from hash_table to > avoid this overhead? (as a followup - that is, don't initialize htab) > > The cfg.c, dse.c and hash-table.h parts are ok for trunk, I'll leave the > rest to > respective maintainers of the pieces of the compiler. > > Thanks, > Richard. > >> >> Index: gcc/java/ChangeLog >> >> 2012-10-01 Lawrence Crowl >> >> * Make-lang.in (JAVA_OBJS): Add dependence on hash-table.o. >> (JCFDUMP_OBJS): Add dependence on hash-table.o. >> (jcf-io.o): Add dependence on hash-table.h. >> * jcf-io.c (memoized_class_lookups): Change to use type-safe hash table. >> >> Index: gcc/c/ChangeLog >> >> 2012-10-01 Lawrence Crowl >> >> * Make-lang.in (c-decl.o): Add dependence on hash-table.h. >> * c-decl.c (detect_field_duplicates_hash): Change to new type-safe >> hash table. >> >> Index: gcc/objc/ChangeLog >> >> 2012-10-01 Lawrence Crowl >> >> * Make-lang.in (OBJC_OBJS): Add dependence on hash-table.o. >> (objc-act.o): Add dependence on hash-table.h. >> * objc-act.c (objc_detect_field_duplicates): Change to new type-safe >> hash table. >> >> Index: gcc/ChangeLog >> >> 2012-10-01 Lawrence Crowl >> >> * Makefile.in (fold-const.o): Add depencence on hash-table.h. >> (dse.o): Likewise. >> (cfg.o): Likewise. >> * fold-const.c (fold_checksum_tree): Change to new type-safe hash table. >> * (print_fold_checksum): Likewise. >> * cfg.c (var bb_original): Likewise. >> * (var bb_copy): Likewise. >> * (var loop_copy): Likewise. >> * hash-table.h (template hash_table): Constify parameters for find... >> and remove_elt... member functions. >> (hash_table::empty) Correct size expression. >> (hash_table::clear_slot) Correct deleted entry assignment. >> * dse.c (var rtx_group_table): Change to new type-safe hash table. >> >> Index: gcc/cp/ChangeLog >> >> 2012-10-01 Lawrence Crowl >> >> * Make-lang.in (class.o): Add dependence on hash-table.h. >> (tree.o): Likewise. >> (semantics.o): Likewise. >> * class.c (fixed_type_or_null): Change to new type-safe hash table. >> * tree.c (verify_stmt_tree): Likewise. >> (verify_stmt_tree_r): Likewise. >> * semantics.c (struct nrv_data): Likewise. >> >> >> Index: gcc/java/Make-lang.in >> === >> --- gcc/java/Make-lang.in(revision 191941) >> +++ gcc/java/Make-lang.in(working copy) >> @@ -83,10 +83,10 @@ JAVA_OBJS = java/class.o java/decl.o jav >>java/zextract.o java/jcf-io.o java/win32-host.o java/jcf-parse.o >> java/mangle.o \ >>java/mangle_name.o java/builtins.o java/resource.o \ >>java/jcf-depend.o \ >> - java/jcf-path.o java/boehm.o java/java-gimplify.o >> + java/jcf-path.o java/boehm.o java/java-gimplify.o hash-table.o >> >> JCFDUMP_OBJS = java/jcf-dump.o java/jcf-io.o java/jcf-depend.o >> java/jcf-path.o \ >> -java/win32-host.o java/zextract.o ggc-none.o >> +java/win32-host.o java/zextract.o ggc-none.o hash-table.o >> >> JVGENMAIN_OBJS = java/jvgenmain.o java/mangle_name.o >> >> @@ -326,7 +326,7 @@ java/java-gimplify.o: java/java-gimplify >> # jcf-io.o needs $(ZLIBINC) added to cflags. >> CFLAGS-java/jcf-io.o += $(ZLIBINC) >> java/jcf-io.o: java/jcf-io.c $(CONFIG_H) $(SYSTEM_H) coretypes.h \ >> - $(JAVA_TREE_H) java/zipfile.h >> + $(JAVA_TREE_H) java/zipfile.h $(HASH_TABLE_H) >> >> # jcf-path.o needs a -D. >> CFLAGS-java/jcf-path.o += \ >> Index: gcc/java/jcf-io.c >> === >> --- gcc/java/jcf-io.c(revision 191941) >> +++ gcc/java/jcf-io.c(working copy) >> @@ -31,7 +31,7 @@ The Free Software Foundation is independ >> #include "jcf.h" >> #include "tree.h" >> #include "java-tree.h" >> -#include "hashtab.h" >> +#include "hash-table.h" >> #include >> >> #include "zlib.h" >> @@ -271,20 +271,34 @@ find_classfile (char *filename, JCF *jcf >>return open_class (filename, jcf, fd, dep_name); >> } >> >> -/* Returns 1 if the CLASSNAME (really a char *) matches the name >> -
Re: Convert more non-GTY htab_t to hash_table.
Sorry, one more time with the right file contents. On 10/2/12, Richard Guenther wrote: > You are changing a hashtable used by fold checking, did you test > with fold checking enabled? One small thinko fixed. Patch passes tests. > The cfg.c, dse.c and hash-table.h parts are ok for trunk, I'll > leave the rest to respective maintainers of the pieces of the > compiler. +cc java: tro...@redhat.com c: r...@redhat.com objc: mikest...@comcast.net cp: ja...@redhat.com Change more non-GTY hash tables to use the new type-safe template hash table. Constify member function parameters that can be const. Correct a couple of expressions in formerly uninstantiated templates. The new code is 0.362% faster in bootstrap, with a 99.5% confidence of being faster. Tested on x86-64. Okay for trunk? Index: gcc/java/ChangeLog 2012-10-01 Lawrence Crowl * Make-lang.in (JAVA_OBJS): Add dependence on hash-table.o. (JCFDUMP_OBJS): Add dependence on hash-table.o. (jcf-io.o): Add dependence on hash-table.h. * jcf-io.c (memoized_class_lookups): Change to use type-safe hash table. Index: gcc/c/ChangeLog 2012-10-01 Lawrence Crowl * Make-lang.in (c-decl.o): Add dependence on hash-table.h. * c-decl.c (detect_field_duplicates_hash): Change to new type-safe hash table. Index: gcc/objc/ChangeLog 2012-10-01 Lawrence Crowl * Make-lang.in (OBJC_OBJS): Add dependence on hash-table.o. (objc-act.o): Add dependence on hash-table.h. * objc-act.c (objc_detect_field_duplicates): Change to new type-safe hash table. Index: gcc/ChangeLog 2012-10-01 Lawrence Crowl * Makefile.in (fold-const.o): Add depencence on hash-table.h. (dse.o): Likewise. (cfg.o): Likewise. * fold-const.c (fold_checksum_tree): Change to new type-safe hash table. * (print_fold_checksum): Likewise. * cfg.c (var bb_original): Likewise. * (var bb_copy): Likewise. * (var loop_copy): Likewise. * hash-table.h (template hash_table): Constify parameters for find... and remove_elt... member functions. (hash_table::empty) Correct size expression. (hash_table::clear_slot) Correct deleted entry assignment. * dse.c (var rtx_group_table): Change to new type-safe hash table. Index: gcc/cp/ChangeLog 2012-10-01 Lawrence Crowl * Make-lang.in (class.o): Add dependence on hash-table.h. (tree.o): Likewise. (semantics.o): Likewise. * class.c (fixed_type_or_null): Change to new type-safe hash table. * tree.c (verify_stmt_tree): Likewise. (verify_stmt_tree_r): Likewise. * semantics.c (struct nrv_data): Likewise. Index: gcc/java/Make-lang.in === --- gcc/java/Make-lang.in (revision 191941) +++ gcc/java/Make-lang.in (working copy) @@ -83,10 +83,10 @@ JAVA_OBJS = java/class.o java/decl.o jav java/zextract.o java/jcf-io.o java/win32-host.o java/jcf-parse.o java/mangle.o \ java/mangle_name.o java/builtins.o java/resource.o \ java/jcf-depend.o \ - java/jcf-path.o java/boehm.o java/java-gimplify.o + java/jcf-path.o java/boehm.o java/java-gimplify.o hash-table.o JCFDUMP_OBJS = java/jcf-dump.o java/jcf-io.o java/jcf-depend.o java/jcf-path.o \ - java/win32-host.o java/zextract.o ggc-none.o + java/win32-host.o java/zextract.o ggc-none.o hash-table.o JVGENMAIN_OBJS = java/jvgenmain.o java/mangle_name.o @@ -326,7 +326,7 @@ java/java-gimplify.o: java/java-gimplify # jcf-io.o needs $(ZLIBINC) added to cflags. CFLAGS-java/jcf-io.o += $(ZLIBINC) java/jcf-io.o: java/jcf-io.c $(CONFIG_H) $(SYSTEM_H) coretypes.h \ - $(JAVA_TREE_H) java/zipfile.h + $(JAVA_TREE_H) java/zipfile.h $(HASH_TABLE_H) # jcf-path.o needs a -D. CFLAGS-java/jcf-path.o += \ Index: gcc/java/jcf-io.c === --- gcc/java/jcf-io.c (revision 191941) +++ gcc/java/jcf-io.c (working copy) @@ -31,7 +31,7 @@ The Free Software Foundation is independ #include "jcf.h" #include "tree.h" #include "java-tree.h" -#include "hashtab.h" +#include "hash-table.h" #include #include "zlib.h" @@ -271,20 +271,34 @@ find_classfile (char *filename, JCF *jcf return open_class (filename, jcf, fd, dep_name); } -/* Returns 1 if the CLASSNAME (really a char *) matches the name - stored in TABLE_ENTRY (also a char *). */ -static int -memoized_class_lookup_eq (const void *table_entry, const void *classname) +/* Hash table helper. */ + +struct charstar_hash : typed_noop_remove { - return strcmp ((const char *)classname, (const char *)table_entry) == 0; + typedef const char T; + static inline hashval_t hash (const T *candidate); + static inline bool equal (const T *existing, const T *candidate); +}; + +inline hashval_t +charstar_hash::hash (const T *candidate) +{ + return htab_hash_string (cand
Re: [PATCH] rs6000: Remove 'A' output modifier
On Wed, Oct 3, 2012 at 6:28 PM, Segher Boessenkool wrote: > It was used for old POWER only, which has been removed. > > Tested etc.; okay to apply? > > > Segher > > > 2012-10-04 Segher Boessenkool > > gcc/ > * config/rs6000/rs6000.c (print_operand) ['A']: Delete. Okay. Thanks, David
Re: patch to fix
On Oct 3, 2012, at 1:47 PM, Marc Glisse wrote: > did you consider making the size of wide_int a template parameter, now that > we are using C++? All with a convenient typedef or macro so it doesn't show. > I am asking because in vrp I do some arithmetic that requires 2*N+1 bits > where N is the size of double_int. No, not really. I'd maybe answer it this way, we put in a type (singular) to support all integral constants in all languages on a port. Since we only needed 1, there was little need to templatize it. By supporting all integral constants in all languages, there is little need for more. If Ada say, wanted a 2048 bit integer, then, we just have it drop off the size it wants someplace and we would mix that in on a MAX(….) line, net result, the type we use would then directly support the needs of Ada. If vpr wanted 2x of all existing modes, we could simply change the MAX equation and essentially double it; if people need that. This comes as a cost, as the intermediate wide values are fixed size allocated (not variable); so these all would be larger. For the longer lived values, no change, as these are variably sized as one would expect.
Re: [Fortran, (RFC) patch] PR49110/51055 Assignment to alloc. deferred-length character vars
For C and C++, identifiers beginning with underscore and upper case letter or with two underscores are reserve to the implementation. C++ uses _Z for mangling. Maybe Fortran could prepend "_F". Something beginning with an underscore seems like a much better choice, given the rules about reserved identifiers. Thanks, David On Wed, Oct 3, 2012 at 5:00 PM, Tobias Burnus wrote: > David, > > > David Edelsohn wrote: >> >> I am not sure why you chose a period and how best to correct this. > > > Well, in principle any name which the user cannot enter would do. (Not > enter: At least not as valid Fortran identifier.) > > The reason for choosing "." is that is used elsewhere in > gfortran for such identifier for the string-length variable belonging to > , e.g. "._result" in trans-decl.c. I assume the reason that it > didn't pop up with those is that those are local variables, but I wouldn't > be surprised if it would break elsewhere. > > I wonder whether "@" would work, otherwise, one could also use "_". The only > other problem is that it will break the ABI. On the other hand, it's a > rather new feature and if we bump the .mod version number, the chance that > one effectively forces the user to re-compile is rather high. So far we > always bumped the .mod version number as something changed. There are also > some other patches pending which effectively lead to a bump in the .mod > version. > > (The .mod version won't affect code which doesn't use modules such as > BLAS/LAPACK or any Fortran 66/77 code, but those won't be affected by the > ABI change anyway as there the name doesn't propagate as it does with > modules..) > > > Thanks for investigating the test-suite failure. > > Tobias
Re: [wwwdocs] Buildstat update for 4.6
On Tue, 2 Oct 2012, Tom G. Christensen wrote: > Latest results for 4.6.x Thanks, applied. A bit unusual to see submissions of older versions of one branch at this time, and from one and the same submitter... Gerald
Re: [Fortran, (RFC) patch] PR49110/51055 Assignment to alloc. deferred-length character vars
Maybe something like the following: Index: trans-decl.c === --- trans-decl.c(revision 192019) +++ trans-decl.c(working copy) @@ -1097,9 +1097,9 @@ /* Also prefix the mangled name. */ if (sym->module) - name = gfc_get_string (".__%s_MOD_%s", sym->module, sym->name); + name = gfc_get_string ("_F_%s_MOD_%s", sym->module, sym->name); else - name = gfc_get_string (".%s", sym->name); + name = gfc_get_string ("_F_%s", sym->name); length = build_decl (input_location, VAR_DECL, get_identifier (name), Thanks, David On Wed, Oct 3, 2012 at 5:00 PM, Tobias Burnus wrote: > David, > > > David Edelsohn wrote: >> >> I am not sure why you chose a period and how best to correct this. > > > Well, in principle any name which the user cannot enter would do. (Not > enter: At least not as valid Fortran identifier.) > > The reason for choosing "." is that is used elsewhere in > gfortran for such identifier for the string-length variable belonging to > , e.g. "._result" in trans-decl.c. I assume the reason that it > didn't pop up with those is that those are local variables, but I wouldn't > be surprised if it would break elsewhere. > > I wonder whether "@" would work, otherwise, one could also use "_". The only > other problem is that it will break the ABI. On the other hand, it's a > rather new feature and if we bump the .mod version number, the chance that > one effectively forces the user to re-compile is rather high. So far we > always bumped the .mod version number as something changed. There are also > some other patches pending which effectively lead to a bump in the .mod > version. > > (The .mod version won't affect code which doesn't use modules such as > BLAS/LAPACK or any Fortran 66/77 code, but those won't be affected by the > ABI change anyway as there the name doesn't propagate as it does with > modules..) > > > Thanks for investigating the test-suite failure. > > Tobias
Re: [wwwdocs] Buildstat update for 4.5
On Tue, 2 Oct 2012, Tom G. Christensen wrote: > Latest results for 4.5.x Thanks, Tom! Gerald
Re: [Fortran, (RFC) patch] PR49110/51055 Assignment to alloc. deferred-length character vars
Another suggestion from Segher is "_F.", which is both reserve and cannot conflict with C/C++ because identifiers cannot contain ".". - David
Re: [Fortran, (RFC) patch] PR49110/51055 Assignment to alloc. deferred-length character vars
On Oct 3, 2012, at 4:49 PM, David Edelsohn wrote: > Another suggestion from Segher is "_F.", which is both reserve and > cannot conflict with C/C++ because identifiers cannot contain ".". The problem with that are machines that don't like "." in identifiers (aka NO_DOT_IN_LABEL)… so one just has to be a little careful in using it.
Re: [Fortran, (RFC) patch] PR49110/51055 Assignment to alloc. deferred-length character vars
On Wed, Oct 3, 2012 at 8:29 PM, Mike Stump wrote: > On Oct 3, 2012, at 4:49 PM, David Edelsohn wrote: >> Another suggestion from Segher is "_F.", which is both reserve and >> cannot conflict with C/C++ because identifiers cannot contain ".". > > The problem with that are machines that don't like "." in identifiers (aka > NO_DOT_IN_LABEL)… so one just has to be a little careful in using it. That's a good reason to use "_F". And NO_DOT_IN_LABEL systems will not function with the current "." mangling. - David
Ping: RFA: Process '*' in '@'-output-template alternatives
The following patch is still awaiting review: 2011-09-19 J"orn Rennecke * genoutput.c (process_template): Process '*' in '@' alternatives. * doc/md.texi (node Output Statement): Provide example for the above. http://gcc.gnu.org/ml/gcc-patches/2012-09/msg01422.html
Re: [PATCH] Rs6000 infrastructure cleanup (switches), revised patch #2b
@@ -1115,7 +1118,8 @@ static const struct attribute_spec rs600 { NULL,0, 0, false, false, false, NULL, false } }; -#ifndef MASK_STRICT_ALIGN +#ifndef OPTION_MASK_STRICT_ALIGN +#define OPTION_MASK_STRICT_ALIGN 0 #define MASK_STRICT_ALIGN 0 #endif #ifndef TARGET_PROFILE_KERNEL Why does this fragment define OPTION_MASK_STRICT_ALIGN but does not remove definition of MASK_STRICT_ALIGN? Similarly for -#ifndef MASK_64BIT +#ifndef OPTION_MASK_64BIT +#define OPTION_MASK_64BIT 0 #define MASK_64BIT 0 #endif Why define both OPTION_MASK_64BIT and MASK_64BIT? And +/* Map OPTION_ back into TARGET_ options in rs6000_isa_flags. */ Why set up correspondence for all OPTION_xxx flags back to TARGET_xxx flags? Thanks, David
Re: [SH] PR 54760 - Add thread pointer built-ins and GBR displacement addressing
Oleg Endo wrote: > This adds the two common built-in functions __builtin_thread_pointer and > __builtin_set_thread_pointer to the SH port. > I've done it in a way so that hopefully it can be transitioned to target > independent thread pointer built-ins easily, as suggested by Richard a > while ago: > http://gcc.gnu.org/ml/gcc-patches/2012-07/msg00946.html > > Originally I wanted to wait until the target independent bits are in, > but somehow the thread mentioned above died and I got impatient. > > I've also added support for SH's GBR based displacement addressing > modes. They are not used for general purpose mem loads/stores by the > compiler, but rather when code accesses data behind the thread pointer > (thread control block or something). > The way GBR displacement address opportunities are discovered might not > be the best way of doing this sort of thing, but it works. The insn > walking could potentially slow down compile times, but it is only > enabled for functions where the GBR is referenced, so it shouldn't be so > bad. Alternatives and suggestions are highly appreciated. :) > > Tested on rev 191894 with 'make all' (c,c++) and > 'make -k check-gcc RUNTESTFLAGS="sh.exp --target_board=sh-sim > \{-m2/-ml,-m2/-mb,-m2a/-mb,-m4/-ml,-m4/-mb,-m4a/-ml,-m4a/-mb}"' > > For code that doesn't reference GBR there are no functional changes. > TLS code that references GBR might trigger the 'sh_find_equiv_gbr_addr' > function. Unfortunately TLS tests don't seem to work on sh-sim, so I > could not test this part. sh4-unknown-linux-gnu build failed during compiling libmudflap: /exp/ldroot/dodes/xsh-gcc/./gcc/xgcc -B/exp/ldroot/dodes/xsh-gcc/./gcc/ -B/usr/local/sh4-unknown-linux-gnu/bin/ -B/usr/local/sh4-unknown-linux-gnu/lib/ -isystem /usr/local/sh4-unknown-linux-gnu/include -isystem /usr/local/sh4-unknown-linux-gnu/sys-include -DHAVE_CONFIG_H -I. -I../../../LOCAL/trunk/libmudflap -DLIBMUDFLAPTH -g -O2 -MT libmudflapth_la-mf-runtime.lo -MD -MP -MF .deps/libmudflapth_la-mf-runtime.Tpo -c ../../../LOCAL/trunk/libmudflap/mf-runtime.c -o libmudflapth_la-mf-runtime.o ../../../LOCAL/trunk/libmudflap/mf-runtime.c: In function 'begin_recursion_protect1': ../../../LOCAL/trunk/libmudflap/mf-runtime.c:152:1: internal compiler error: Segmentation fault } ^ 0x8529c60 crash_signal ../../LOCAL/trunk/gcc/toplev.c:335 0x8771a87 sh_find_base_reg_disp ../../LOCAL/trunk/gcc/config/sh/sh.c:13344 0x8791554 sh_find_equiv_gbr_addr(rtx_def*, rtx_def*) ../../LOCAL/trunk/gcc/config/sh/sh.c:13395 0x87ce6cf gen_split_1029(rtx_def*, rtx_def**) ../../LOCAL/trunk/gcc/config/sh/sh.md:10184 0x87ea6e0 split_1 ../../LOCAL/trunk/gcc/config/sh/sh.md:10183 0x87ea6e0 split_3 ../../LOCAL/trunk/gcc/config/sh/sh.md:7082 0x82bf8a1 try_split(rtx_def*, rtx_def*, int) ../../LOCAL/trunk/gcc/emit-rtl.c:3503 0x849c642 split_insn ../../LOCAL/trunk/gcc/recog.c:2809 0x84a08b5 split_all_insns() ../../LOCAL/trunk/gcc/recog.c:2899 0x84a09a7 rest_of_handle_split_all_insns ../../LOCAL/trunk/gcc/recog.c:3751 Looks prev_nonnote_insn returns a barrier there: (gdb) fr 0 #0 0x08771a87 in sh_find_base_reg_disp (insn=, x=0xb79ddd40, base_reg=0x0, disp=0) at ../../LOCAL/trunk/gcc/config/sh/sh.c:13344 13344 if (p != NULL && GET_CODE (p) == SET && REG_P (XEXP (p, 0)) (gdb) p p $1 = (rtx) 0xafafafaf (gdb) p i $2 = (rtx_def *) 0xb79ecc3c (gdb) call debug_rtx(i) (barrier 216 215 217) > (builtin_description): Add is_enabled member. > (shmedia_builtin, sh1_builtin): New functions. > (signature_args): Add SH_BLTIN_VP. > (bdesc): Use shmedia_builtin for existing built-ins. Add > __builtin_thread_pointer and __builtin_set_thread_pointer as > sh1_builtin. > (sh_media_init_builtins, sh_init_builtins): Merge into single > function sh_init_builtins. Add is_enabled checking. > (sh_media_builtin_decl, sh_builtin_decl): Merge into single > function sh_builtin_decl. Add is_enabled checking. It would be better to separate this part except new thread pointer builtins into an independent patch which should be tested also with sh64-elf build, though now unified sh64-elf build is failing. I'd like to commit a quick fix for sh64-elf build failure. Regards, kaz
[patch committed SH] Fix sh64-elf build failure
Hi, I've committed the patch below to fix sh64-elf build failure. SHmedia and SHcompact using call cookie require special return insns and will require an extra work to enable simple_return. I simply disable it for these targets ATM. Regards, kaz -- 2012-10-04 Kaz Kojima * config/sh/sh.c (sh_can_use_simple_return_p): Return false for SHmedia and SHcompact using call cookie. * config/sh/sh.md (epilogue): Emit non-inlined return insns for SHmedia and SHcompact using call cookie. Index: gcc/config/sh/sh.md === --- gcc/config/sh/sh.md (revision 192060) +++ gcc/config/sh/sh.md (working copy) @@ -10460,6 +10460,13 @@ "" { sh_expand_epilogue (false); + if (TARGET_SHMEDIA + || (TARGET_SHCOMPACT + && (crtl->args.info.call_cookie & CALL_COOKIE_RET_TRAMP (1 +{ + emit_jump_insn (gen_return ()); + DONE; +} }) (define_expand "eh_return" Index: gcc/config/sh/sh.c === --- gcc/config/sh/sh.c (revision 192060) +++ gcc/config/sh/sh.c (working copy) @@ -13134,6 +13134,12 @@ HARD_REG_SET live_regs_mask; int d; + /* Some targets require special return insns. */ + if (TARGET_SHMEDIA + || (TARGET_SHCOMPACT + && (crtl->args.info.call_cookie & CALL_COOKIE_RET_TRAMP (1 +return false; + if (! reload_completed || frame_pointer_needed) return false;
Re: [patch][lra] Improve initial program point density in lra-lives.c (was: Re: RFC: LRA for x86/x86-64 [7/9])
On 12-10-03 11:35 AM, Steven Bosscher wrote: On Wed, Oct 3, 2012 at 4:56 PM, Vladimir Makarov wrote: On 12-10-03 3:13 AM, Steven Bosscher wrote: On Tue, Oct 2, 2012 at 3:42 PM, Richard Sandiford wrote: +/* Compress pseudo live ranges by removing program points where + nothing happens. Complexity of many algorithms in LRA is linear + function of program points number. To speed up the code we try to + minimize the number of the program points here. */ +static void +remove_some_program_points_and_update_live_ranges (void) Genuine question, but could we do this on the fly instead, by not incrementing curr_point if the current point had no value? I suppose the main complication would be checking cases where all births are recorded by extending the previous just-closed live range rather than starting a new one, in which case it's the previous point that needs to be reused. Hmm... It does seem to be something worth investigating further. Things like: Compressing live ranges: from 1742579 to 554532 - 31% Compressing live ranges: from 1742569 to 73069 - 4% look extreme, but they're actually the norm. For the same test case (PR54146 still) but looking only at functions with initially 1000- program points, you get this picture: Compressing live ranges: from 1766 to 705 - 39% Compressing live ranges: from 1449 to 370 - 25% Compressing live ranges: from 3939 to 1093 - 27% Compressing live ranges: from 3939 to 1093 - 27% Compressing live ranges: from 3939 to 1093 - 27% Compressing live ranges: from 3939 to 1093 - 27% Compressing live ranges: from 2464 to 676 - 27% Compressing live ranges: from 1433 to 379 - 26% Compressing live ranges: from 1261 to 348 - 27% Compressing live ranges: from 2835 to 755 - 26% Compressing live ranges: from 5426 to 1678 - 30% Compressing live ranges: from 5227 to 1477 - 28% Compressing live ranges: from 1845 to 467 - 25% Compressing live ranges: from 4868 to 1378 - 28% Compressing live ranges: from 4875 to 1388 - 28% Compressing live ranges: from 4882 to 1384 - 28% Compressing live ranges: from 5688 to 1714 - 30% Compressing live ranges: from 4943 to 1310 - 26% Compressing live ranges: from 2976 to 792 - 26% Compressing live ranges: from 5463 to 1526 - 27% Compressing live ranges: from 2854 to 730 - 25% Compressing live ranges: from 1810 to 745 - 41% Compressing live ranges: from 2771 to 904 - 32% Compressing live ranges: from 4916 to 1429 - 29% Compressing live ranges: from 6505 to 2238 - 34% Compressing live ranges: from 6493 to 166 - 2% Compressing live ranges: from 5498 to 1734 - 31% Compressing live ranges: from 1810 to 745 - 41% Compressing live ranges: from 5043 to 1420 - 28% Compressing live ranges: from 3094 to 788 - 25% Compressing live ranges: from 4563 to 1311 - 28% Compressing live ranges: from 4557 to 158 - 3% Compressing live ranges: from 1050 to 274 - 26% Compressing live ranges: from 1602 to 434 - 27% Compressing live ranges: from 2474 to 600 - 24% Compressing live ranges: from 2718 to 866 - 31% Compressing live ranges: from 2097 to 716 - 34% Compressing live ranges: from 4152 to 1099 - 26% Compressing live ranges: from 5065 to 1514 - 29% Compressing live ranges: from 1236 to 359 - 29% Compressing live ranges: from 1722 to 541 - 31% Compressing live ranges: from 5186 to 1401 - 27% Unfortunately the dump doesn't mention how many live ranges could be merged thanks to the compression. It'd also be good to understand why the compression ratios are so small, and consistently around ~30%. This sentence is not clear to me. 30% means 3 times less points. It is pretty good compression. Maybe curr_point includes things it should ignore (DEBUG_INSNs, NOTEs, ...)? After the compression, there are only points important for conflict info, i.e. only points where some reg dies or start living. Even more if on the subsequent points there are only life starts or only deaths, they are represented by one point after the compression. With this patch the amount of compression is reduced. Without the patch the compression ratio is typically around 30%, with the patch this goes up to ~65%. The worst compression ratios (where worse is better in this case :-) are: $ grep Compressing log4 | egrep " [78].%" Compressing live ranges: from 31 to 22 - 70%, pre_count 17, post_count 15 Compressing live ranges: from 128 to 94 - 73%, pre_count 87, post_count 77 Compressing live ranges: from 32 to 23 - 71%, pre_count 16, post_count 15 Compressing live ranges: from 38 to 29 - 76%, pre_count 19, post_count 18 Compressing live ranges: from 40 to 28 - 70%, pre_count 26, post_count 24 Compressing live ranges: from 40 to 28 - 70%, pre_count 26, post_count 24 Compressing live ranges: from 40 to 28 - 70%, pre_count 26, post_count 24 Compressing live ranges: from 60 to 43 - 71%, pre_count 37, post_count 33 Compressing live ranges: from 60 to 43 - 71%, pre_count 37, post_count 33 Compressing live ranges: from 125 to 89 - 71%, pre_count 71, post_count 62 Compressing live ranges: fr
Re: [PATCH] Improve debug info for partial inlining (PR debug/54519, take 2)
On Oct 3, 2012, Jakub Jelinek wrote: > basically there is a non-addressable parameter in stack slot, and > vt_canon_true_dep -> canon_true_dependence thinks an argument push > insn might alias with it, because it doesn't have a MEM_EXPR and > ao_ref_from_mem fails. I have a pending (still unreviewed) patch that might address :-), so to speak, this problem. for gcc/ChangeLog from Alexandre Oliva PR debug/53671 PR debug/49888 * var-tracking.c (attrs_list_by_loc_eq): New. (track_stack_pointer): New. (dataflow_set_merge): Use it. (vt_initialize): Record the initial stack pointer in the dataflow set. Index: gcc/var-tracking.c === --- gcc/var-tracking.c.orig 2012-06-26 17:33:12.991323578 -0300 +++ gcc/var-tracking.c 2012-06-26 17:51:55.316453808 -0300 @@ -1462,6 +1462,17 @@ attrs_list_member (attrs list, decl_or_v return NULL; } +/* Return the entry whose LOC field equals LOC. */ + +static attrs +attrs_list_by_loc_eq (attrs list, rtx loc) +{ + for (; list; list = list->next) +if (list->loc == loc) + return list; + return NULL; +} + /* Insert the triplet DECL, OFFSET, LOC to the list *LISTP. */ static void @@ -4028,6 +4039,86 @@ variable_merge_over_src (variable s2var, return 1; } +/* Add to DST any needed binding for the canonicalization of the stack + pointer to yield the same expression as in SRC1 and SRC2, if they + both yield the same expression. + + Return TRUE iff we found an equivalence. + + ??? The return value, that was useful during testing, ended up + unused, but this single-use static function will be inlined, and + then the return value computation will be optimized out, so I'm + leaving it in. + + ??? We use this kludge to avoid accidental aliasing between + incoming arguments and register-saving or outgoing-args pushes. We + shouldn't have to add explicit stack pointer tracking for that: + intersect_loc_chains ought to be able to take information from the + static cselib table and recognize additional equivalences, but we + don't have a sufficiently efficient algorithm to do that yet. */ + +static bool +track_stack_pointer (dataflow_set *dst, dataflow_set *src1, dataflow_set *src2) +{ + attrs dattr, s1attr, s2attr; + rtx daddr, s1addr, s2addr; + decl_or_value dv; + + for (dattr = dst->regs[STACK_POINTER_REGNUM]; + (dattr = attrs_list_by_loc_eq (dattr, stack_pointer_rtx)) + && (dattr->offset || !dv_is_value_p (dattr->dv)); + dattr = dattr->next) +; + + for (s1attr = src1->regs[STACK_POINTER_REGNUM]; + (s1attr = attrs_list_by_loc_eq (s1attr, stack_pointer_rtx)) + && (s1attr->offset || !dv_is_value_p (s1attr->dv)); + s1attr = s1attr->next) +; + if (!s1attr) +return false; + + for (s2attr = src2->regs[STACK_POINTER_REGNUM]; + (s2attr = attrs_list_by_loc_eq (s2attr, stack_pointer_rtx)) + && (s2attr->offset || !dv_is_value_p (s2attr->dv)); + s2attr = s2attr->next) +; + if (!s2attr) +return false; + + if (dattr) +daddr = vt_canonicalize_addr (dst, dv_as_value (dattr->dv)); + else +daddr = NULL; + s1addr = vt_canonicalize_addr (src1, dv_as_value (s1attr->dv)); + s2addr = vt_canonicalize_addr (src2, dv_as_value (s2attr->dv)); + + if (!rtx_equal_p (s1addr, s2addr)) +return false; + + if (daddr && rtx_equal_p (daddr, s1addr)) +return true; + + dst_can_be_shared = false; + if (daddr) +dv = dattr->dv; + else if (vt_get_canonicalize_base (s1addr) + != (cfa_base_rtx ? cfa_base_rtx : arg_pointer_rtx)) +return false; + else +{ + cselib_val *val = cselib_lookup (s1addr, GET_MODE (s1addr), 1, VOIDmode); + cselib_preserve_value (val); + dv = dv_from_value (val->val_rtx); +} + + var_reg_decl_set (dst, stack_pointer_rtx, + VAR_INIT_STATUS_INITIALIZED, + dv, 0, NULL_RTX, INSERT); + + return true; +} + /* Combine dataflow set information from SRC2 into DST, using PDST to carry over information across passes. */ @@ -4066,6 +4157,8 @@ dataflow_set_merge (dataflow_set *dst, d FOR_EACH_HTAB_ELEMENT (shared_hash_htab (dsm.cur->vars), var, variable, hi) variable_merge_over_cur (var, &dsm); + track_stack_pointer (dst, src1, src2); + if (dsm.src_onepart_cnt) dst_can_be_shared = false; @@ -9682,6 +9775,18 @@ vt_initialize (void) expr = plus_constant (GET_MODE (reg), reg, ofst); cselib_add_permanent_equiv (val, expr, get_insns ()); } + + /* VAL below will generally be the one set within the + conditional block, but if OFST happens to be zero, we'll be + happy to use the one corresponding to REG. + + ??? We shouldn't need this any more once dataflow merges + start using equivalences from the cselib table in addition to + those in dataflow sets. */ + var_reg_decl_set (&VTI (ENTRY_BLOCK_PTR)->out, + stack_pointer_rtx, VAR_INIT_STATUS_INITIALIZED, + dv_from_value (val->val_rtx), 0, NULL_RTX, + INSERT);
Re: [patch][lra] Improve initial program point density in lra-lives.c (was: Re: RFC: LRA for x86/x86-64 [7/9])
On Thu, Oct 4, 2012 at 5:30 AM, Vladimir Makarov wrote: > I was going to look at this code too but I was interesting in generation of > less points and live ranges. It is strange that in my profiles, > remove_some_program_points_and_update_live_ranges takes 0.6% of compiler > time on these huge tests. So I was not interesting to speed up the > function and may be therefore you have no visible change in compilation > time. Right. The compression algorithm doesn't care much about the initial number of program points, only about the number of live ranges before and after compression. I had expected a bigger effect on the number of live ranges before compression. 0.6% sounds really very different from my timings. How much time does create_start_finish_chains take for you? > I don't object the idea of the patch. I need some time to look at it (the > different results on a function is a bit scary for me) and check simulator > times on other tests. Understood. FWIW you can find the test results from before and after the patch here: http://gcc.gnu.org/ml/gcc-testresults/2012-10/msg00267.html http://gcc.gnu.org/ml/gcc-testresults/2012-10/msg00303.html Ciao! Steven