[committed] vms/ia64: Define SUPPORTS_ONE_ONLY

2011-12-23 Thread Tristan Gingold
Hi,

the native ia64 VMS linker doesn't fully support COMDAT sections.

Committed on trunk.

Tristan.

2011-12-23  Tristan Gingold  

* config/ia64/vms.h (SUPPORTS_ONE_ONLY): Define.


--- a/gcc/config/ia64/vms.h
+++ b/gcc/config/ia64/vms.h
@@ -157,3 +157,7 @@ STATIC func_ptr __CTOR_LIST__[1]
 
 #undef TARGET_PROMOTE_FUNCTION_MODE
 #define TARGET_PROMOTE_FUNCTION_MODE default_promote_function_mode_always_promo
+
+/* IA64 VMS doesn't fully support COMDAT sections.  */
+
+#define SUPPORTS_ONE_ONLY 0



[committed]: VMS: Fix a typo in vms-crtlmap.map

2011-12-23 Thread Tristan Gingold
Hi,

this patch fixes a typo in the CRTL map file.

Committed.

Tristan.

2011-12-23  Tristan Gingold  

* config/vms/vms-crtlmap.map (log10): Fix typo.

--- a/gcc/config/vms/vms-crtlmap.map
+++ b/gcc/config/vms/vms-crtlmap.map
@@ -112,7 +112,7 @@ isupper
 kill
 localtime
 log   FLOAT
-log1  FLOAT
+log10 FLOAT
 lseek
 malloc64 MALLOC
 mbstowcs  64



Re: [PATCH] Fix PR50396

2011-12-23 Thread Richard Guenther
On Thu, 22 Dec 2011, Richard Henderson wrote:

> On 12/22/2011 07:46 AM, Richard Guenther wrote:
> > Any way to test, in the testcase, whether the vector modes
> > will have NaNs or not?
> 
>  v[0] != v[0] ?

Well, if MODE_HAS_NANS returns false we might fold 0.0/0.0 to 0.0,
or the HW might simply not have NaNs (SPU?) and have 0.0 as the
result.  Thus, I want to query GCC capabilities (-ffinite-math-only)
and HW capabilities (what we have in real_mode_format) from inside
the testcase.

Any idea?  Otherwise I'll add dg-skips for the targets that fail
the test.

Richard.


Re: [PATCH] Fix PR50396

2011-12-23 Thread Richard Guenther
On Fri, 23 Dec 2011, Richard Guenther wrote:

> On Thu, 22 Dec 2011, Richard Henderson wrote:
> 
> > On 12/22/2011 07:46 AM, Richard Guenther wrote:
> > > Any way to test, in the testcase, whether the vector modes
> > > will have NaNs or not?
> > 
> >  v[0] != v[0] ?
> 
> Well, if MODE_HAS_NANS returns false we might fold 0.0/0.0 to 0.0,
> or the HW might simply not have NaNs (SPU?) and have 0.0 as the
> result.  Thus, I want to query GCC capabilities (-ffinite-math-only)
> and HW capabilities (what we have in real_mode_format) from inside
> the testcase.
> 
> Any idea?  Otherwise I'll add dg-skips for the targets that fail
> the test.

It seems we have __FLT_HAS_QUIET_NAN__.  Nice.  Thus I'll use

/* { dg-do run } */

extern void abort (void);
typedef float vf128 __attribute__((vector_size(16)));
typedef float vf64 __attribute__((vector_size(8)));
int main()
{
#if !__FINITE_MATH_ONLY__
#if __FLT_HAS_QUIET_NAN__
  vf128 v = (vf128){ 0.f, 0.f, 0.f, 0.f };
  vf64 u = (vf64){ 0.f, 0.f };
  v = v / (vf128){ 0.f, 0.f, 0.f, 0.f };
  if (v[0] == v[0])
    abort ();
  u = u / (vf64){ 0.f, 0.f };
  if (u[0] == u[0])
    abort ();
#endif
#endif
  return 0;
}
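
For readers who have not seen the idiom, the guard-and-self-compare pattern
used in the testcase above can be sketched on a plain scalar.  This is only
an illustration in C++, not part of the committed test; the helper names
(is_nan_by_self_compare, nan_check_is_meaningful, zero_over_zero) are made
up for the sketch, while __FINITE_MATH_ONLY__ and __FLT_HAS_QUIET_NAN__ are
the predefined macros the testcase itself relies on.

```cpp
#include <cassert>

// A NaN is the only value that compares unequal to itself.
bool is_nan_by_self_compare(float x)
{
  volatile float v = x;  // volatile forces two real loads of the value
  return v != v;
}

// Hypothetical helper: true only when the compiler honours NaNs
// (no -ffinite-math-only) and float has a quiet NaN on this target.
bool nan_check_is_meaningful()
{
#if !__FINITE_MATH_ONLY__ && __FLT_HAS_QUIET_NAN__
  return true;
#else
  return false;
#endif
}

// Computes 0.0f / 0.0f at run time; volatile avoids constant folding.
float zero_over_zero()
{
  volatile float zero = 0.f;
  return zero / zero;
}
```

A caller would only assert on is_nan_by_self_compare (zero_over_zero ())
when nan_check_is_meaningful () returns true, which mirrors the nested
#if guards in the testcase.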



Re: PR middle-end/51212: sorry out on -fgnu-tm + -fnon-call-exceptions

2011-12-23 Thread Richard Guenther
On Thu, Dec 22, 2011 at 8:47 PM, Aldy Hernandez  wrote:
> The problem here is that with -fnon-call-exceptions, a memory dereference
> may trap, but when we instrument the store, we have lost the landing pad
> information.
>
> One solution would be to move the EH information to the TM load/store
> instrumentation builtins, but that doesn't get us around the fact that
> libitm is not exception safe, and we have no mechanism for taking an
> exception (and propagating it) in the middle of a transaction.
>
> Richard has suggested that another alternative could be to support the
> exception case for NULL, but non-null faulting memory references would still
> cause a crash.  In this case we would simply test for NULL at the start of
> the accessors and explicitly throw the exception.
>
> And yet a third alternative, is to disable the -fgnu-tm and
> -fnon-call-exceptions combination.  I have implemented this one, as I'd
> rather have it not work, than work half-way.
>
> Torvald, do you have any thoughts on the matter?
>
> Attached patch for disabling the feature, if you both agree on this
> approach.

I think this should be documented at the place -fgnu-tm is documented.

Richard.


[Ada] Check matching Float_Representation on OpenVMS

2011-12-23 Thread Arnaud Charlet
On OpenVMS targets, a configuration pragma Float_Representation
indicates that the specified representation (IEEE, VAX) should
be the default for predefined floating point types and for
new floating point definitions.

Tested on x86_64-pc-linux-gnu, committed on trunk

2011-12-23  Geert Bosch  

* sem_ch3.adb (Can_Derive_From): Check matching Float_Rep on VMS.

Index: sem_ch3.adb
===
--- sem_ch3.adb (revision 182654)
+++ sem_ch3.adb (working copy)
@@ -15333,10 +15333,23 @@
  Spec : constant Entity_Id := Real_Range_Specification (Def);
 
   begin
+ --  Check specified "digits" constraint
+
  if Digs_Val > Digits_Value (E) then
 return False;
  end if;
 
+ --  Avoid types not matching pragma Float_Representation, if present
+
+ if (Opt.Float_Format = 'I' and then Float_Rep (E) /= IEEE_Binary)
+  or else
+(Opt.Float_Format = 'V' and then Float_Rep (E) /= VAX_Native)
+ then
+return False;
+ end if;
+
+ --  Check for matching range, if specified
+
  if Present (Spec) then
 if Expr_Value_R (Type_Low_Bound (E)) >
Expr_Value_R (Low_Bound (Spec))


[Ada] Straighten implementation of aggregate libraries

2011-12-23 Thread Arnaud Charlet
Handle case where the same library project is imported by multiple
aggregated libraries.

Tested on x86_64-pc-linux-gnu, committed on trunk

2011-12-23  Pascal Obry  

* prj.ads (For_Every_Project_Imported): Add In_Aggregate_Lib
parameter to generic formal procedure.
* prj.adb (For_Every_Project_Imported): Update accordingly.
(Recursive_Check): Likewise. Do not parse imported project for
aggregate library. This is needed as the imported projects are
there just to handle dependencies.
(Look_For_Sources): Likewise.
(Recursive_Add): Likewise.
* prj-env.adb, prj-conf.adb, makeutl.adb, gnatcmd.adb:
Add In_Aggregate_Lib parameter to routines used with
For_Every_Project_Imported generic procedure.
* prj-nmsc.adb (Tree_Processing_Data): Add In_Aggregate_Lib
field.
(Check): Move where it is used. Fix implementation
to not check libraries that are inside aggregate libraries.
(Recursive_Check): Add In_Aggregate_Lib parameter.

Index: gnatcmd.adb
===
--- gnatcmd.adb (revision 182655)
+++ gnatcmd.adb (working copy)
@@ -264,6 +264,7 @@
procedure Set_Library_For
  (Project   : Project_Id;
   Tree  : Project_Tree_Ref;
+  In_Aggregate_Lib  : Boolean;
   Libraries_Present : in out Boolean);
--  If Project is a library project, add the correct -L and -l switches to
--  the linker invocation.
@@ -1264,9 +1265,10 @@
procedure Set_Library_For
  (Project   : Project_Id;
   Tree  : Project_Tree_Ref;
+  In_Aggregate_Lib  : Boolean;
   Libraries_Present : in out Boolean)
is
-  pragma Unreferenced (Tree);
+  pragma Unreferenced (Tree, In_Aggregate_Lib);
 
   Path_Option : constant String_Access :=
   MLib.Linker_Library_Path_Option;
Index: prj.adb
===
--- prj.adb (revision 182655)
+++ prj.adb (working copy)
@@ -528,20 +528,24 @@
   Seen : Project_Boolean_Htable.Instance := Project_Boolean_Htable.Nil;
 
   procedure Recursive_Check
-(Project : Project_Id;
- Tree: Project_Tree_Ref);
-  --  Check if a project has already been seen. If not seen, mark it as
-  --  Seen, Call Action, and check all its imported projects.
+(Project  : Project_Id;
+ Tree : Project_Tree_Ref;
+ In_Aggregate_Lib : Boolean);
+  --  Check if a project has already been seen. If not seen, mark it
+  --  as Seen, Call Action, and check all its imported and aggregated
+  --  projects.
 
   ---------------------
   -- Recursive_Check --
   ---------------------
 
   procedure Recursive_Check
-(Project : Project_Id;
- Tree: Project_Tree_Ref)
+(Project  : Project_Id;
+ Tree : Project_Tree_Ref;
+ In_Aggregate_Lib : Boolean)
   is
  List : Project_List;
+ T: Project_Tree_Ref;
 
   begin
  if not Get (Seen, Project) then
@@ -552,22 +556,28 @@
 Set (Seen, Project, True);
 
 if not Imported_First then
-   Action (Project, Tree, With_State);
+   Action (Project, Tree, In_Aggregate_Lib, With_State);
 end if;
 
 --  Visit all extended projects
 
 if Project.Extends /= No_Project then
-   Recursive_Check (Project.Extends, Tree);
+   Recursive_Check (Project.Extends, Tree, In_Aggregate_Lib);
 end if;
 
---  Visit all imported projects
+--  Visit all imported projects if needed. This is not needed
+--  for an aggregate library as imported libraries are just
+--  there for dependency support.
 
-List := Project.Imported_Projects;
-while List /= null loop
-   Recursive_Check (List.Project, Tree);
-   List := List.Next;
-end loop;
+if Project.Qualifier /= Aggregate_Library
+  or else not Include_Aggregated
+then
+   List := Project.Imported_Projects;
+   while List /= null loop
+  Recursive_Check (List.Project, Tree, In_Aggregate_Lib);
+  List := List.Next;
+   end loop;
+end if;
 
 --  Visit all aggregated projects
 
@@ -580,14 +590,25 @@
   Agg := Project.Aggregated_Projects;
   while Agg /= null loop
  pragma Assert (Agg.Project /= No_Project);
- Recursive_Check (Agg.Project, Agg.Tree);
+
+ --  For aggregated libraries, the tree must be the one
+ --  of the aggregate library.
+
+ if Project.Qualifier = Aggregate_Libra

Re: [PATCH, PR 51600] IPA-CP workaround for negative size cloning estimates

2011-12-23 Thread Jan Hubicka
> Hi,
> 
> On Wed, Dec 21, 2011 at 05:29:51PM +0100, Jan Hubicka wrote:
> > > Hi,
> > > 
> > > given that we already have a workaround for zero size increase
> > > estimates from estimate_ipcp_clone_size_and_time, I see little reason
> > > not to extend it to negative values too, 0 is really just as bad as -2
> > > that we are getting in the testcase.  Hopefully this will allow people
> > > who hit this bug to proceed with their testing.
> > > 
> > > Bootstrapped and tested on x86-64-linux with no regressions.
> > > OK for trunk?
> > 
> > Hmm, so the size value is not negative because 
> > estimate_ipcp_clone_size_and_time
> > would return 0 or negative value but because of
> >   size -= stats.n_calls * removable_params_cost
> > (i.e. the callee function is so small that the program will really
> > shrink because of reduced call overhead)?
> 
> no, it is really estimate_ipcp_clone_size_and_time that returns size
> estimate -2.  In fact, the subtraction you described does not occur on
> that code path at all because I do it only for constants that occur in
> all contexts (from all callers) and this assert is on the path dealing
> with estimates of effects of constants that there are only in some
> contexts.
> 
> The reason why I don't do it for constants that come from only a
> subset of callers is that some of these callers might themselves
> require context specific cloning to provide the value but when actual
> decisions are being made later on, they would not be cloned.  So I
> don't know the set of callers that provide the constant at this time
> and cannot do the subtraction.
> 
> > 
> > In that case I guess the patch is OK, but please update the comment,
> 
> Well, it's not the case, so what do you think?

Hmm, it is an estimate_ipcp_clone_size_and_time bug then.  I will look into that
today.

Honza
> 
> Martin


Re: RFC: An alternative -fsched-pressure implementation

2011-12-23 Thread Richard Guenther
On Fri, Dec 23, 2011 at 12:46 PM, Richard Sandiford
 wrote:
> So it looks like two pieces of work related to scheduling and register
> pressure are being posted close together.  This one is an RFC for a less
> aggressive form of -fsched-pressure.  I think it should complement
> rather than compete with Bernd's IRA patch.  It seems like a good idea
> to take register pressure into account during the first scheduling pass,
> where we can still easily look at things like instruction latencies
> and pipeline utilisation.  Better rematerialisation in the register
> allocator would obviously be a good thing too though.
>
> This patch started when we (Linaro) saw a big drop in performance
> from vectorising an RGB to YIQ filter on ARM.  The first scheduling
> pass was overly aggressive in creating a "wide" schedule, and caused
> the newly-vectorised loop to contain lots of spills.  The loop grew so
> big that it even had a constant pool in the middle of it.
>
> -fsched-pressure did a very good job on this loop, creating far fewer
> spills and consequently avoiding the constant pool.  However, it seemed
> to make several other cases significantly worse.  The idea was therefore
> to try to come up with a form of -fsched-pressure that could be turned
> on for ARM by default.
>
> Current -fsched-pressure works by assigning an "excess (pressure) cost change"
> to each instruction; here I'll write that as ECC(X).  -fsched-pressure also
> changes the way that the main list scheduler handles stalls due to data
> dependencies.  If an instruction would stall for N cycles, the scheduler
> would normally add it to the "now+N" queue, then add it to the ready queue
> after N cycles.  With -fsched-pressure, it instead adds the instruction
> to the ready queue immediately, while still recording that the instruction
> would require N stalls.  I'll write the number of stalls on X as delay(X).
>
> This arrangement allows the scheduler to choose between increasing register
> pressure and introducing a deliberate stall.  Instructions are ranked by:
>
>  (a) lowest ECC(X) + delay(X)
>  (b) lowest delay(X)
>  (c) normal list-scheduler ranking (based on things like INSN_PRIORITY)
>
> Note that since delay(X) is measured in cycles, ECC(X) is effectively
> measured in cycles too.
>
> Several things seemed to be causing the degradations we were seeing
> with -fsched-pressure:
>
>  (1) The -fsched-pressure schedule is based purely on instruction latencies
>      and issue rate; it doesn't take the DFA into account.  This means that
>      we attempt to "dual issue" things like vector operations, loads and
>      stores on Cortex A8 and A9.  In the examples I looked at, these sorts
>      of inaccuracy seemed to accumulate, so that the value of delay(X)
>      became based on slightly unrealistic cycle times.
>
>      Note that this also affects code that doesn't have any pressure
>      problems; it isn't limited to code that does.
>
>      This may simply be historical.  It became much easier to use the
>      DFA here after Bernd's introduction of prune_ready_list, but the
>      original -fsched-pressure predates that.
>
>  (2) We calculate ECC(X) by walking the unscheduled part of the block
>      in its original order, then recording the pressure at each instruction.
>      This seemed to make ECC(X) quite sensitive to that original order.
>      I saw blocks that started out naturally "narrow" (not much ILP,
>      e.g. from unrolled loops) and others that started naturally "wide"
>      (a lot of ILP, such as in the libav h264 code), and walking the
>      block in order meant that the two styles would be handled differently.
>
>  (3) When calculating the pressure of the original block (as described
>      in (2)), we ignore the deaths of registers that are used by more
>      than one unscheduled instruction.  This tended to hurt long(ish)
>      loops in things like filters, where the same value is often used
>      as an input to two calculations.  The effect was that instructions
>      towards the end of the block would appear to have very high pressure.
>      This in turn made the algorithm very conservative; it wouldn't
>      promote instructions from later in the block because those
>      instructions seemed to have a prohibitively large cost.
>
>      I asked Vlad about this, and he confirmed that it was a deliberate
>      decision.  He'd tried honouring REG_DEAD notes instead, but it
>      produced worse results on x86.  I'll return to this at the end.
>
>  (4) ECC(X) is based on the pressure over and above ira_available_class_regs
>      (the number of allocatable registers in a given class).  ARM has 14
>      allocatable GENERAL_REGS: 16 minus the stack pointer and program
>      counter.  So if 14 integer variables are live across a loop but
>      not referenced within it, we end up scheduling that loop in a context
>      of permanent pressure.  Pressure becomes the overriding concern,
>      and we don't get muc

[PATCH] libstdc++: Make it possible to annotate the shared pointer operations in the std::thread implementation

2011-12-23 Thread Bart Van Assche
As documented in the libstdc++ manual, the shared pointer operations in
libstdc++ headers can be instrumented by defining the macros
_GLIBCXX_SYNCHRONIZATION_HAPPENS_BEFORE()/AFTER() and libstdc++ has to be
rebuilt in order to instrument the remaining shared pointer operations.
However, rebuilding libstdc++ is inconvenient. So let's move the thread
wrapper code from thread.cc into <thread>.

See also:
* http://gcc.gnu.org/onlinedocs/libstdc++/manual/debug.html.
* http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51504.

Signed-off-by: Bart Van Assche 

Index: libstdc++-v3/src/thread.cc
===
--- libstdc++-v3/src/thread.cc  (revision 182271)
+++ libstdc++-v3/src/thread.cc  (working copy)
@@ -59,28 +59,6 @@ static inline int get_nprocs()
 
 namespace std _GLIBCXX_VISIBILITY(default)
 {
-  namespace
-  {
-extern "C" void*
-execute_native_thread_routine(void* __p)
-{
-  thread::_Impl_base* __t = static_cast<thread::_Impl_base*>(__p);
-  thread::__shared_base_type __local;
-  __local.swap(__t->_M_this_ptr);
-
-  __try
-   {
- __t->_M_run();
-   }
-  __catch(...)
-   {
- std::terminate();
-   }
-
-  return 0;
-}
-  }
-
 _GLIBCXX_BEGIN_NAMESPACE_VERSION
 
   void
@@ -114,12 +92,17 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   void
   thread::_M_start_thread(__shared_base_type __b)
   {
+  _M_start_thread(__b, &_M_entry);
+  }
+
+  void
+  thread::_M_start_thread(__shared_base_type __b, void* (*__pf)(void*))
+  {
 if (!__gthread_active_p())
   __throw_system_error(int(errc::operation_not_permitted));
 
 __b->_M_this_ptr = __b;
-int __e = __gthread_create(&_M_id._M_thread,
-  &execute_native_thread_routine, __b.get());
+int __e = __gthread_create(&_M_id._M_thread, __pf, __b.get());
 if (__e)
 {
   __b->_M_this_ptr.reset();
Index: libstdc++-v3/include/std/thread
===
--- libstdc++-v3/include/std/thread (revision 182271)
+++ libstdc++-v3/include/std/thread (working copy)
@@ -132,7 +132,8 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   {
 _M_start_thread(_M_make_routine(std::__bind_simple(
 std::forward<_Callable>(__f),
-std::forward<_Args>(__args)...)));
+std::forward<_Args>(__args)...)),
+&thread::_M_entry);
   }
 
 ~thread()
@@ -180,9 +181,30 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 hardware_concurrency() noexcept;
 
   private:
+static void* _M_entry(void* __p)
+{
+  thread::_Impl_base* __t = static_cast<thread::_Impl_base*>(__p);
+  thread::__shared_base_type __local;
+  __local.swap(__t->_M_this_ptr);
+  
+  __try
+{
+  __t->_M_run();
+}
+  __catch(...)
+{
+  std::terminate();
+}
+  
+  return 0;
+}
+
 void
 _M_start_thread(__shared_base_type);
 
+void
+_M_start_thread(__shared_base_type, void* (*)(void*));
+
 template<typename _Callable>
   shared_ptr<_Impl<_Callable>>
   _M_make_routine(_Callable&& __f)
Index: libstdc++-v3/config/abi/post/s390-linux-gnu/baseline_symbols.txt
===
--- libstdc++-v3/config/abi/post/s390-linux-gnu/baseline_symbols.txt  (revision 182271)
+++ libstdc++-v3/config/abi/post/s390-linux-gnu/baseline_symbols.txt  (working copy)
@@ -2145,6 +2145,7 @@ FUNC:_ZNSt6localeD1Ev@@GLIBCXX_3.4
 FUNC:_ZNSt6localeD2Ev@@GLIBCXX_3.4
 FUNC:_ZNSt6localeaSERKS_@@GLIBCXX_3.4
 
FUNC:_ZNSt6thread15_M_start_threadESt10shared_ptrINS_10_Impl_baseEE@@GLIBCXX_3.4.11
+FUNC:_ZNSt6thread15_M_start_threadESt10shared_ptrINS_10_Impl_baseEEPFPvS3_E@@GLIBCXX_3.4.17
 FUNC:_ZNSt6thread4joinEv@@GLIBCXX_3.4.11
 FUNC:_ZNSt6thread6detachEv@@GLIBCXX_3.4.11
 FUNC:_ZNSt7codecvtIcc11__mbstate_tEC1EP15__locale_structm@@GLIBCXX_3.4
Index: libstdc++-v3/config/abi/post/x86_64-linux-gnu/baseline_symbols.txt
===
--- libstdc++-v3/config/abi/post/x86_64-linux-gnu/baseline_symbols.txt  (revision 182271)
+++ libstdc++-v3/config/abi/post/x86_64-linux-gnu/baseline_symbols.txt  (working copy)
@@ -1955,6 +1955,7 @@ FUNC:_ZNSt6localeD1Ev@@GLIBCXX_3.4
 FUNC:_ZNSt6localeD2Ev@@GLIBCXX_3.4
 FUNC:_ZNSt6localeaSERKS_@@GLIBCXX_3.4
 
FUNC:_ZNSt6thread15_M_start_threadESt10shared_ptrINS_10_Impl_baseEE@@GLIBCXX_3.4.11
+FUNC:_ZNSt6thread15_M_start_threadESt10shared_ptrINS_10_Impl_baseEEPFPvS3_E@@GLIBCXX_3.4.17
 FUNC:_ZNSt6thread4joinEv@@GLIBCXX_3.4.11
 FUNC:_ZNSt6thread6detachEv@@GLIBCXX_3.4.11
 FUNC:_ZNSt7codecvtIcc11__mbstate_tEC1EP15__locale_structm@@GLIBCXX_3.4
Index: libstdc++-v3/config/abi/post/ia64-linux-gnu/baseline_symbols.txt
===
--- libstdc++-v3/config/abi/post/ia64-linux-gnu/baseline_symbols.txt
(revision 182271)
+++ libstdc++-v3/config/abi/post/ia64-linux-gnu/baseline_sym

Re: RFC: IRA patch to reduce lifetimes

2011-12-23 Thread Vladimir Makarov

On 12/21/2011 09:09 AM, Bernd Schmidt wrote:

For a customer I've looked into improving code for 456.hmmer on a mips64
target. The benchmark responds to -fsched-pressure, which reduces
lifetimes of a few registers.

This patch was an experiment to see if we can get the same improvement
with modifications to IRA, making it more tolerant to over-aggressive
scheduling. The idea is that if an instruction sets a register A, and
all its inputs are live and unmodified for the lifetime of A, then
moving the instruction downwards towards its first use is going to be
beneficial from a register pressure point of view.

That alone, however, turns out to be too aggressive, performance drops
presumably because we undo too many scheduling decisions. So, the patch
detects such situations, and splits the pseudo; a new pseudo is
introduced in the original setting instruction, and a copy is added
before the first use. If the new pseudo does not get a hard register, it
is removed again and instead the setting instruction is moved to the
point of the copy.

This gets up to 6.5% on 456.hmmer on the mips target I was working on;
an embedded benchmark suite also seems to have a (small) geomean
improvement. On x86_64, I've tested spec2k, where specint is unchanged
and specfp has a tiny performance regression. All these tests were done
with a gcc-4.6 based tree.

Thoughts? Currently the patch feels somewhat bolted on to the side of
IRA, maybe there's a nicer way to achieve this?

I think that is an excellent idea.  I used an analogous approach for
splitting pseudos in IRA on loop bounds, even if they get hard registers
inside and outside the loops.  The copies are removed if the live ranges
were not spilled in reload.


I have no problem with this patch.  It is just a small change in IRA.



Re: RFC: An alternative -fsched-pressure implementation

2011-12-23 Thread Vladimir Makarov

On 12/23/2011 06:46 AM, Richard Sandiford wrote:

So it looks like two pieces of work related to scheduling and register
pressure are being posted close together.  This one is an RFC for a less
aggressive form of -fsched-pressure.  I think it should complement
rather than compete with Bernd's IRA patch.  It seems like a good idea
to take register pressure into account during the first scheduling pass,
where we can still easily look at things like instruction latencies
and pipeline utilisation.  Better rematerialisation in the register
allocator would obviously be a good thing too though.

This patch started when we (Linaro) saw a big drop in performance
from vectorising an RGB to YIQ filter on ARM.  The first scheduling
pass was overly aggressive in creating a "wide" schedule, and caused
the newly-vectorised loop to contain lots of spills.  The loop grew so
big that it even had a constant pool in the middle of it.

-fsched-pressure did a very good job on this loop, creating far fewer
spills and consequently avoiding the constant pool.  However, it seemed
to make several other cases significantly worse.  The idea was therefore
to try to come up with a form of -fsched-pressure that could be turned
on for ARM by default.

Current -fsched-pressure works by assigning an "excess (pressure) cost change"
to each instruction; here I'll write that as ECC(X).  -fsched-pressure also
changes the way that the main list scheduler handles stalls due to data
dependencies.  If an instruction would stall for N cycles, the scheduler
would normally add it to the "now+N" queue, then add it to the ready queue
after N cycles.  With -fsched-pressure, it instead adds the instruction
to the ready queue immediately, while still recording that the instruction
would require N stalls.  I'll write the number of stalls on X as delay(X).

This arrangement allows the scheduler to choose between increasing register
pressure and introducing a deliberate stall.  Instructions are ranked by:

   (a) lowest ECC(X) + delay(X)
   (b) lowest delay(X)
   (c) normal list-scheduler ranking (based on things like INSN_PRIORITY)

Note that since delay(X) is measured in cycles, ECC(X) is effectively
measured in cycles too.
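
The three-level ranking above can be sketched as a lexicographic
comparison.  This is a minimal illustration only, not the scheduler's
actual code: the Candidate struct, its field names and ranks_before are
hypothetical, and "priority" merely stands in for the normal
list-scheduler ranking (assumed here to be "higher is better").

```cpp
#include <cassert>
#include <tuple>

// Hypothetical per-instruction summary; field names are illustrative only.
struct Candidate
{
  int ecc;       // ECC(X): excess pressure cost change, in cycles
  int delay;     // delay(X): stall cycles the instruction would need
  int priority;  // stand-in for the normal ranking (higher is better)
};

// Returns true when a should be picked before b.
bool ranks_before(const Candidate& a, const Candidate& b)
{
  // (a) lowest ECC + delay, then (b) lowest delay,
  // then (c) the normal ranking; priority is negated so that
  // a higher priority sorts first in the ascending comparison.
  return std::make_tuple(a.ecc + a.delay, a.delay, -a.priority)
       < std::make_tuple(b.ecc + b.delay, b.delay, -b.priority);
}
```

Because delay(X) and ECC(X) are summed in step (a), the sketch also shows
why ECC(X) ends up being measured in cycles: one cycle of stall is traded
directly against one unit of excess pressure cost.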

Several things seemed to be causing the degradations we were seeing
with -fsched-pressure:

   (1) The -fsched-pressure schedule is based purely on instruction latencies
   and issue rate; it doesn't take the DFA into account.  This means that
   we attempt to "dual issue" things like vector operations, loads and
   stores on Cortex A8 and A9.  In the examples I looked at, these sorts
   of inaccuracy seemed to accumulate, so that the value of delay(X)
   became based on slightly unrealistic cycle times.

   Note that this also affects code that doesn't have any pressure
   problems; it isn't limited to code that does.

   This may simply be historical.  It became much easier to use the
   DFA here after Bernd's introduction of prune_ready_list, but the
   original -fsched-pressure predates that.

   (2) We calculate ECC(X) by walking the unscheduled part of the block
   in its original order, then recording the pressure at each instruction.
   This seemed to make ECC(X) quite sensitive to that original order.
   I saw blocks that started out naturally "narrow" (not much ILP,
   e.g. from unrolled loops) and others that started naturally "wide"
   (a lot of ILP, such as in the libav h264 code), and walking the
   block in order meant that the two styles would be handled differently.

   (3) When calculating the pressure of the original block (as described
   in (2)), we ignore the deaths of registers that are used by more
   than one unscheduled instruction.  This tended to hurt long(ish)
   loops in things like filters, where the same value is often used
   as an input to two calculations.  The effect was that instructions
   towards the end of the block would appear to have very high pressure.
   This in turn made the algorithm very conservative; it wouldn't
   promote instructions from later in the block because those
   instructions seemed to have a prohibitively large cost.

   I asked Vlad about this, and he confirmed that it was a deliberate
   decision.  He'd tried honouring REG_DEAD notes instead, but it
   produced worse results on x86.  I'll return to this at the end.

   (4) ECC(X) is based on the pressure over and above ira_available_class_regs
   (the number of allocatable registers in a given class).  ARM has 14
   allocatable GENERAL_REGS: 16 minus the stack pointer and program
   counter.  So if 14 integer variables are live across a loop but
   not referenced within it, we end up scheduling that loop in a context
   of permanent pressure.  Pressure becomes the overriding concern,
   and we don't get much ILP.

   I suppose there are at least two ways of viewing this:

   (4a) We'r

Re: [PATCH] libstdc++: Make it possible to annotate the shared pointer operations in the std::thread implementation

2011-12-23 Thread Paolo Carlini
Hi,

> As documented in the libstdc++ manual, the shared pointer operations in
> libstdc++ headers can be instrumented by defining the macros
> _GLIBCXX_SYNCHRONIZATION_HAPPENS_BEFORE()/AFTER() and libstdc++ has to be
> rebuilt in order to instrument the remaining shared pointer operations.
> However, rebuilding libstdc++ is inconvenient. So let's move the thread
> wrapper code from thread.cc into <thread>.

First, do you already have a copyright assignment on file? It's a precondition
for any non-trivial contribution.

That said, please leave the baselines alone. Otherwise, Jon can comment on
whether the reshuffling makes sense and would be safe from the ABI point of
view.

Paolo


[v3] implement LWG 2056

2011-12-23 Thread Jonathan Wakely
PR libstdc++/49204
* include/std/future (future_errc): Implement LWG 2056.

tested x86_64-linux, committed to trunk
Index: include/std/future
===
--- include/std/future  (revision 182657)
+++ include/std/future  (revision 182658)
@@ -60,10 +60,10 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   /// Error code for futures
   enum class future_errc
   {
-broken_promise,
-future_already_retrieved,
+future_already_retrieved = 1,
 promise_already_satisfied,
-no_state
+no_state,
+broken_promise
   };
 
   /// Specialization.
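
As a sketch of what the reordering achieves: LWG 2056 is about the
enumerators starting at value 0, which error_code conventionally treats
as "no error".  The enum below is a local mirror of the order in the
patch, not std::future_errc itself; sketch_future_errc and to_int are
names invented for the illustration.

```cpp
#include <cassert>

// Local mirror of the enumerator order after the patch.
enum class sketch_future_errc
{
  future_already_retrieved = 1,
  promise_already_satisfied,
  no_state,
  broken_promise
};

// Convenience conversion to the underlying numeric value.
constexpr int to_int(sketch_future_errc e)
{
  return static_cast<int>(e);
}

// With the first enumerator pinned to 1, no error condition maps to 0.
static_assert(to_int(sketch_future_errc::future_already_retrieved) == 1,
              "first enumerator starts at 1");
static_assert(to_int(sketch_future_errc::broken_promise) == 4,
              "broken_promise is now last and non-zero");
```

Before the patch broken_promise came first and therefore had the value 0,
so an error_code built from it would have tested as success.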


#undef fopen+freopen prior to #def in system.h, for aix bootstrap

2011-12-23 Thread Olivier Hainque
bootstrap currently fails for mainline on AIX, first because of problems like

 ...trunk/libcpp/system.h:47:0: error: "fopen" redefined [-Werror]
 .../include-fixed/stdio.h:110:0: note: this is the location of the previous definition

Indeed, libcpp/system.h and gcc/system.h have

  /* Use the unlocked open routines from libiberty.  */
  ...
  #define fopen(PATH,MODE) fopen_unlocked(PATH,MODE)
  #define fdopen(FILDES,MODE) fdopen_unlocked(FILDES,MODE)
  #define freopen(PATH,MODE,STREAM) freopen_unlocked(PATH,MODE,STREAM)

while /usr/include/stdio.h on AIX (5.3 at least) has

  #ifdef _LARGE_FILES
  ...
  #define fopen fopen64
  #define freopen freopen64

gcc/system.h already has some provision for this sort of mishap:

  #ifdef fopen /* fopen is a #define on VMS.  */
  #undef fopen
  #endif

The attached patch is a suggestion to simplify and widen this a bit
to catch all the AIX related problems to date.
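
The unconditional #undef pattern can be sketched with stand-in names, so
that the example does not touch the real fopen.  Everything prefixed
sketch_ is hypothetical; the first #define only imitates a system header
that has already remapped the name, as the AIX stdio.h does under
_LARGE_FILES.

```cpp
#include <cassert>

// Stand-in for a system header that already remaps the name.
#define sketch_open sketch_open64

// Unconditional #undef: valid even if the name is not a macro, and it
// avoids the "redefined" diagnostic when it is one.
#undef sketch_open
#define sketch_open(PATH, MODE) sketch_open_unlocked(PATH, MODE)

// Hypothetical routines; the return values only identify which one ran.
inline int sketch_open64(const char*, const char*) { return 64; }
inline int sketch_open_unlocked(const char*, const char*) { return 1; }
```

With the #undef in place a call such as sketch_open ("f", "r") expands to
sketch_open_unlocked; without it the second #define would be diagnosed as
a redefinition, which -Werror turns into the bootstrap failure shown above.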

Tested by checking that bootstrap proceeds (and ends successfully
after another change, to be posted shortly) on powerpc-ibm-aix5.3.0
with languages=all,ada. Also bootstrapped on i686-suse-linux.

OK ?

Thanks in advance,

Regards,

Olivier

--

2011-12-23  Olivier Hainque  

* system.h: #undef fopen and freopen unconditionally.

libcpp/
* system.h: #undef fopen and freopen unconditionally.




aix-redef.dif
Description: video/dv


[v3] update comments

2011-12-23 Thread Jonathan Wakely
The comments in <cinttypes> were copied from the TR1 implementation,
this updates them w.r.t C++11, including removing the "likely a
defect" comment because 27.9.2/4 clarifies that abs and div are only
overloaded for intmax_t if it's an extended integer type.

* include/c_global/cinttypes: Update comments that refer to TR1.

Tested x86_64-linux, committed to trunk.
Index: include/c_global/cinttypes
===
--- include/c_global/cinttypes  (revision 182658)
+++ include/c_global/cinttypes  (revision 182659)
@@ -1,6 +1,6 @@
 //  -*- C++ -*-
 
-// Copyright (C) 2007, 2008, 2009, 2010 Free Software Foundation, Inc.
+// Copyright (C) 2007, 2008, 2009, 2010, 2011 Free Software Foundation, Inc.
 //
 // This file is part of the GNU ISO C++ Library.  This library is free
 // software; you can redistribute it and/or modify it under the
@@ -37,7 +37,7 @@
 
 #include 
 
-// For 8.11.1/1 (see C99, Note 184)
+// For 27.9.2/3 (see C99, Note 184)
 #if _GLIBCXX_HAVE_INTTYPES_H
 # ifndef __STDC_FORMAT_MACROS
 #  define _UNDEF__STDC_FORMAT_MACROS
@@ -59,16 +59,10 @@ namespace std
 
   // functions
   using ::imaxabs;
-
-  // May collide with _Longlong abs(_Longlong), and is not described
-  // anywhere outside the synopsis.  Likely, a defect.
-  //
-  // intmax_t abs(intmax_t)
-
   using ::imaxdiv;
 
-  // Likewise, with lldiv_t div(_Longlong, _Longlong).
-  //
+  // GCC does not support extended integer types
+  // intmax_t abs(intmax_t)
   // imaxdiv_t div(intmax_t, intmax_t)
 
   using ::strtoimax;
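
Since the abs and div overloads for intmax_t are only present when
intmax_t is an extended integer type, code that wants to stay portable
can call the named functions instead. A minimal sketch, assuming a
C++11 library; abs_remainder is a hypothetical helper that is not part
of the patch:

```cpp
#include <cassert>
#include <cinttypes>
#include <cstdint>

// Hypothetical helper: magnitude of the remainder of a / b.
// It relies only on imaxdiv and imaxabs, which <cinttypes> brings into
// namespace std, and not on any abs/div overload for intmax_t.
std::intmax_t abs_remainder(std::intmax_t a, std::intmax_t b)
{
  std::imaxdiv_t qr = std::imaxdiv(a, b);  // quotient and remainder
  return std::imaxabs(qr.rem);
}
```

For example, abs_remainder(-7, 2) divides to quotient -3 and
remainder -1, and returns 1.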


refine cast in collect2 for AIX, fixing bootstrap

2011-12-23 Thread Olivier Hainque
Hello, 

With http://gcc.gnu.org/ml/gcc-patches/2011-12/msg01691.html applied,
bootstrap still fails on AIX, with

  gcc/collect2.c:1484:25: error: to be safe all intermediate pointers in cast from
  'char **' to 'const char **' must be 'const' qualified [-Werror=cast-qual]
  gcc/collect2.c:1488:15: 

This patch fixes this by using CONST_CAST2, as seems appropriate for the
case at hand.

Tested by checking that bootstrap proceeds and terminates after the
change on powerpc-ibm-aix5.3.

OK ?

Thanks in advance,

Regards,

Olivier

--

2011-12-23  Olivier Hainque  

* collect2.c (main): In AIX specific computations for vector insertions,
use CONST_CAST2 to cast from char ** to const char **.
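
For readers unfamiliar with the underlying issue, here is a small
illustration of why a 'char **' to 'const char **' conversion needs an
explicit cast. It is a sketch in C++ that uses const_cast as a
stand-in; it is not GCC's CONST_CAST2 macro, and total_length and
total_length_of are hypothetical names:

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical consumer that only reads the strings it is given.
std::size_t total_length(const char **args, int n)
{
  std::size_t len = 0;
  for (int i = 0; i < n; ++i)
    len += std::strlen(args[i]);
  return len;
}

// char ** does not convert implicitly to const char **: through such a
// pointer one could store the address of a const object into a slot
// that the caller still sees as plain char *.  The explicit cast
// documents that the callee is trusted not to do that.
std::size_t total_length_of(char **args, int n)
{
  return total_length(const_cast<const char **>(args), n);
}
```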



aix-collect-cast.dif


[v3] adjust weak_ptr testcase

2011-12-23 Thread Jonathan Wakely
This modifies the test to PASS when the expected type of exception is
caught, instead of being XFAIL due to uncaught exception.

Tested x86_64-linux, committed to trunk.

* testsuite/tr1/2_general_utilities/shared_ptr/cons/
weak_ptr_expired.cc: Modify to PASS instead of XFAIL.
Index: testsuite/tr1/2_general_utilities/shared_ptr/cons/weak_ptr_expired.cc
===
--- testsuite/tr1/2_general_utilities/shared_ptr/cons/weak_ptr_expired.cc   (revision 182660)
+++ testsuite/tr1/2_general_utilities/shared_ptr/cons/weak_ptr_expired.cc   (revision 182661)
@@ -1,5 +1,5 @@
-// { dg-do run { xfail *-*-* } }
-// Copyright (C) 2005, 2009 Free Software Foundation
+// { dg-do run }
+// Copyright (C) 2005, 2009, 2010, 2011 Free Software Foundation
 //
 // This file is part of the GNU ISO C++ Library.  This library is free
 // software; you can redistribute it and/or modify it under the
@@ -29,7 +29,7 @@ struct A { };
 int
 test01()
 {
-  bool test __attribute__((unused)) = true;
+  bool test = false;
 
   std::tr1::shared_ptr<A> a1(new A);
   std::tr1::weak_ptr<A> wa(a1);
@@ -42,12 +42,9 @@ test01()
   catch (const std::tr1::bad_weak_ptr&)
   {
 // Expected.
-  __throw_exception_again;
-  }
-  catch (...)
-  {
-// Failed.
+test = true;
   }
+  VERIFY( test );
 
   return 0;
 }
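
The pattern the adjusted test follows (record that the expected
exception was caught, then check the flag) can be sketched outside the
testsuite as below. This is only an illustration: it uses the C++11
std:: names rather than std::tr1, and expired_weak_ptr_throws is a
hypothetical function name.

```cpp
#include <cassert>
#include <memory>

struct A { };

// Returns true if constructing a shared_ptr from an expired weak_ptr
// throws std::bad_weak_ptr, which is the behaviour the test expects.
bool expired_weak_ptr_throws()
{
  bool caught = false;

  std::shared_ptr<A> a1(new A);
  std::weak_ptr<A> wa(a1);
  a1.reset();                      // wa is now expired

  try
    {
      std::shared_ptr<A> a2(wa);   // should throw
    }
  catch (const std::bad_weak_ptr&)
    {
      caught = true;               // expected
    }

  return caught;
}
```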


Re: [PATCH v3 00/10] MIPS vectorization improvements

2011-12-23 Thread Richard Henderson
On 12/22/2011 12:44 PM, Richard Sandiford wrote:
> Woah, thanks, that's quite some work.  OK for the patches I didn't
> respond to.

Here's a combined follow-on patch that I believe addresses all of
the comments you had.

Ok?


r~
commit 824b5ca31ea21bb02cedabf79bb98e4348c34366
Author: Richard Henderson 
Date:   Thu Dec 22 12:23:03 2011 -0800

mips: Feedback from rsandiford.

diff --git a/gcc/config/mips/mips-modes.def b/gcc/config/mips/mips-modes.def
index 85861a9..187c651 100644
--- a/gcc/config/mips/mips-modes.def
+++ b/gcc/config/mips/mips-modes.def
@@ -26,15 +26,15 @@ RESET_FLOAT_FORMAT (DF, mips_double_format);
 FLOAT_MODE (TF, 16, mips_quad_format);
 
 /* Vector modes.  */
-VECTOR_MODES (INT, 8);/*   V8QI  V4HI V2SI */
-VECTOR_MODES (FLOAT, 8);  /* V4HF V2SF */
-VECTOR_MODES (INT, 4);/* V4QI V2HI */
+VECTOR_MODES (INT, 4);/* V4QI  V2HI  */
+VECTOR_MODES (INT, 8);/* V8QI  V4HI V2SI */
+VECTOR_MODES (FLOAT, 8);  /*   V4HF V2SF */
 
 /* Double-sized vector modes for vec_concat.  */
-VECTOR_MODE (INT, QI, 16);
-VECTOR_MODE (INT, HI, 8);
-VECTOR_MODE (INT, SI, 4);
-VECTOR_MODE (FLOAT, SF, 4);
+VECTOR_MODE (INT, QI, 16);/* V16QI   */
+VECTOR_MODE (INT, HI, 8); /*   V8HI  */
+VECTOR_MODE (INT, SI, 4); /*V4SI */
+VECTOR_MODE (FLOAT, SF, 4);   /*V4SF */
 
 VECTOR_MODES (FRACT, 4);   /* V4QQ  V2HQ */
 VECTOR_MODES (UFRACT, 4);  /* V4UQQ V2UHQ */
diff --git a/gcc/config/mips/mips.c b/gcc/config/mips/mips.c
index bc76078..94d2c2f 100644
--- a/gcc/config/mips/mips.c
+++ b/gcc/config/mips/mips.c
@@ -4638,7 +4638,7 @@ mips_get_arg_info (struct mips_arg_info *info, const CUMULATIVE_ARGS *cum,
   /* The EABI conventions have traditionally been defined in terms
 of TYPE_MODE, regardless of the actual type.  */
   info->fpr_p = ((GET_MODE_CLASS (mode) == MODE_FLOAT
- || GET_MODE_CLASS (mode) == MODE_VECTOR_FLOAT)
+ || mode == V2SFmode)
 && GET_MODE_SIZE (mode) <= UNITS_PER_FPVALUE);
   break;
 
@@ -4653,7 +4653,7 @@ mips_get_arg_info (struct mips_arg_info *info, const CUMULATIVE_ARGS *cum,
 || SCALAR_FLOAT_TYPE_P (type)
 || VECTOR_FLOAT_TYPE_P (type))
 && (GET_MODE_CLASS (mode) == MODE_FLOAT
-|| GET_MODE_CLASS (mode) == MODE_VECTOR_FLOAT)
+|| mode == V2SFmode)
 && GET_MODE_SIZE (mode) <= UNITS_PER_FPVALUE);
   break;
 
@@ -4666,7 +4666,7 @@ mips_get_arg_info (struct mips_arg_info *info, const CUMULATIVE_ARGS *cum,
 && (type == 0 || FLOAT_TYPE_P (type))
 && (GET_MODE_CLASS (mode) == MODE_FLOAT
 || GET_MODE_CLASS (mode) == MODE_COMPLEX_FLOAT
-|| GET_MODE_CLASS (mode) == MODE_VECTOR_FLOAT)
+|| mode == V2SFmode)
 && GET_MODE_UNIT_SIZE (mode) <= UNITS_PER_FPVALUE);
 
   /* ??? According to the ABI documentation, the real and imaginary
@@ -5103,7 +5103,7 @@ static bool
 mips_return_mode_in_fpr_p (enum machine_mode mode)
 {
   return ((GET_MODE_CLASS (mode) == MODE_FLOAT
-  || GET_MODE_CLASS (mode) == MODE_VECTOR_FLOAT
+  || mode == V2SFmode
   || GET_MODE_CLASS (mode) == MODE_COMPLEX_FLOAT)
  && GET_MODE_UNIT_SIZE (mode) <= UNITS_PER_HWFPVALUE);
 }
@@ -10786,8 +10786,14 @@ mips_cannot_change_mode_class (enum machine_mode from,
   enum machine_mode to,
   enum reg_class rclass)
 {
-  /* There are several problems with changing the modes of values in
- floating-point registers:
+  /* Allow conversions between different Loongson integer vectors,
+ and between those vectors and DImode.  */
+  if (GET_MODE_SIZE (from) == 8 && GET_MODE_SIZE (to) == 8
+  && INTEGRAL_MODE_P (from) && INTEGRAL_MODE_P (to))
+return false;
+
+  /* Otherwise, there are several problems with changing the modes of
+ values in floating-point registers:
 
  - When a multi-word value is stored in paired floating-point
registers, the first register always holds the low word.  We
@@ -10809,12 +10815,6 @@ mips_cannot_change_mode_class (enum machine_mode from,
 
  We therefore disallow all mode changes involving FPRs.  */
 
-  /* Except for Loongson and its integral vectors.  We need to be able
- to change between those modes easily.  */
-  if (GET_MODE_SIZE (from) == 8 && GET_MODE_SIZE (to) == 8
-  && INTEGRAL_MODE_P (from) && INTEGRAL_MODE_P (to))
-return false;
-
   return reg_classes_intersect_p (FP_REGS, rclass);
 }
 
@@ -16352,7 +16352,8 @@ struct expand_vec_perm_d
return true if that's a valid instruction in the active ISA.  */
 
 static bool
-expand_vselect (rtx target, rtx op0, const unsigned char *perm, unsigned nelt)
+mips_

Re: [PATCH v3 00/10] MIPS vectorization improvements

2011-12-23 Thread Richard Sandiford
Richard Henderson  writes:
> On 12/22/2011 12:44 PM, Richard Sandiford wrote:
>> Woah, thanks, that's quite some work.  OK for the patches I didn't
>> respond to.
>
> Here's a combined follow-on patch that I believe addresses all of
> the comments you had.
>
> Ok?

Yeah, looks good, thanks.

Richard


Re: [PATCH v3 00/10] MIPS vectorization improvements

2011-12-23 Thread Richard Henderson
On 12/23/2011 10:00 AM, Richard Sandiford wrote:
> Richard Henderson  writes:
>> On 12/22/2011 12:44 PM, Richard Sandiford wrote:
>>> Woah, thanks, that's quite some work.  OK for the patches I didn't
>>> respond to.
>>
>> Here's a combined follow-on patch that I believe addresses all of
>> the comments you had.
>>
>> Ok?
> 
> Yeah, looks good, thanks.

Thanks for the review.  I've committed the squash of all those patches.


r~


[PATCH][Cilkplus] Array notations for Function Calls

2011-12-23 Thread Iyer, Balaji V
Hello Everyone,
This patch is for the C compiler in the Cilkplus branch. It extends the
following patch: http://gcc.gnu.org/ml/gcc-patches/2011-12/msg01667.html.
It allows users to use array notations inside function call arguments.
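
As a rough picture of what this means for user code, a statement such
as  out[0:n] = square (in[0:n]);  behaves like the loop below. This is
a sketch only, assuming the usual Cilk Plus semantics in which a scalar
function applied to array sections is applied element by element;
square and map_square are hypothetical names, and the loop does not
reflect the trees the branch actually builds.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical scalar function, used only for illustration.
int square(int x) { return x * x; }

// Loop form of the array-notation statement
//   out[0:n] = square (in[0:n]);
void map_square(const int *in, int *out, std::size_t n)
{
  for (std::size_t i = 0; i < n; ++i)
    out[i] = square(in[i]);
}
```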

Thanking you,

Yours Sincerely,

Balaji V. Iyer.

diff --git a/gcc/ChangeLog.cilk b/gcc/ChangeLog.cilk
index 76decf6..b51ddcd 100644
--- a/gcc/ChangeLog.cilk
+++ b/gcc/ChangeLog.cilk
@@ -1,3 +1,14 @@
+2011-12-25  Balaji V. Iyer  
+
+   * c-array-notations.c (find_rank): Added a check for CALL_EXPR and
+   AGGR_INIT_EXPR.
+   (extract_array_notation_exprs): Likewise.
+   (replace_array_notations): Likewise.
+   (build_array_notation_expr): Changed variable from ii to jj.
+   (fix_array_notation_expr): Moved default_function_array_read_conversion
+   into if and else-if to handle function call case.
+   * c-parser.c (c_parser_expr_no_commas): Added a check for CALL_EXPR.
+
 2011-12-24  Balaji V. Iyer  
 
* c-array-notations.c (fix_array_notation_expr): New function.
diff --git a/gcc/c-array-notation.c b/gcc/c-array-notation.c
index ca8bfbb..eba7109 100644
--- a/gcc/c-array-notation.c
+++ b/gcc/c-array-notation.c
@@ -43,8 +43,23 @@ find_rank (tree array, int *rank)
 }
   else
 {
-  for (ii = 0; ii < TREE_CODE_LENGTH (TREE_CODE (array)); ii++)
-   find_rank (TREE_OPERAND (array, ii), rank);
+  if (TREE_CODE (array) == CALL_EXPR
+ || TREE_CODE (array) == AGGR_INIT_EXPR)
+   {
+ if (TREE_CODE (TREE_OPERAND (array, 0)) == INTEGER_CST)
+   {
+ int length = TREE_INT_CST_LOW (TREE_OPERAND (array, 0));
+ for (ii = 0; ii < length; ii++)
+   find_rank (TREE_OPERAND (array, ii), rank);
+   }
+ else
+   gcc_unreachable ();
+   }
+  else
+   {
+ for (ii = 0; ii < TREE_CODE_LENGTH (TREE_CODE (array)); ii++)
+   find_rank (TREE_OPERAND (array, ii), rank);
+   }
 }
   return;
 }
@@ -75,6 +90,20 @@ extract_array_notation_exprs (tree node, tree **array_list, int *list_size)
extract_array_notation_exprs (*tsi_stmt_ptr (ii_tsi), array_list,
  list_size);
 }
+  else if (TREE_CODE (node) == CALL_EXPR || TREE_CODE (node) == AGGR_INIT_EXPR)
+{
+  if (TREE_CODE (TREE_OPERAND (node, 0)) == INTEGER_CST)
+   {
+ int length = TREE_INT_CST_LOW (TREE_OPERAND (node, 0));
+
+ for (ii = 0; ii < length; ii++)
+   extract_array_notation_exprs (TREE_OPERAND (node, ii), array_list,
+ list_size);
+   }
+  else
+   gcc_unreachable  (); /* should not get here */
+ 
+} 
   else
 {
   for (ii = 0; ii < TREE_CODE_LENGTH (TREE_CODE (node)); ii++)
@@ -108,6 +137,19 @@ replace_array_notations (tree *orig, tree *list, tree *array_operand,
replace_array_notations (tsi_stmt_ptr (ii_tsi), list, array_operand,
 array_size);
 }
+  else if (TREE_CODE (*orig) == CALL_EXPR
+  || TREE_CODE (*orig) == AGGR_INIT_EXPR)
+{
+  if (TREE_CODE (TREE_OPERAND (*orig, 0)) == INTEGER_CST)
+   {
+ int length = TREE_INT_CST_LOW (TREE_OPERAND (*orig, 0));
+ for (ii = 0; ii < length; ii++)
+   replace_array_notations (&TREE_OPERAND (*orig, ii), list,
+array_operand, array_size);
+   }
+  else
+   gcc_unreachable (); /* should not get here! */
+}
   else
 {
   for (ii = 0; ii < TREE_CODE_LENGTH (TREE_CODE (*orig)); ii++)
@@ -489,12 +531,12 @@ build_array_notation_expr (location_t location, tree lhs, tree lhs_origtype,
}
   else
{
- if (lhs_count_down[ii])
-   cond_expr[ii] = build2
- (GT_EXPR, boolean_type_node, lhs_var[ii], lhs_length[ii]);
+ if (lhs_count_down[jj])
+   cond_expr[jj] = build2
+ (GT_EXPR, boolean_type_node, lhs_var[jj], lhs_length[jj]);
  else
-   cond_expr[ii] = build2
- (LT_EXPR, boolean_type_node, lhs_var[ii], lhs_length[ii]);
+   cond_expr[jj] = build2
+ (LT_EXPR, boolean_type_node, lhs_var[jj], lhs_length[jj]);
}
 }
   
@@ -1058,11 +1100,17 @@ fix_array_notation_expr (location_t location, enum tree_code code,
   add_stmt (body_label_expr[ii]);
 }
 
-  arg = default_function_array_read_conversion (location, arg);
   if (code == POSTINCREMENT_EXPR || code == POSTDECREMENT_EXPR)
-arg.value = build_unary_op (location, code, arg.value, 0);
+{
+  arg = default_function_array_read_conversion (location, arg);
+  arg.value = build_unary_op (location, code, arg.value, 0);
+}
   else if (code == PREINCREMENT_EXPR || code == PREDECREMENT_EXPR)
-arg = parser_build_unary_op (location, code, arg);
+{
+  arg = default_function_array_read_conversion (location, arg)

[BFIN] Hookize REGISTER_MOVE_COST and MEMORY_MOVE_COST

2011-12-23 Thread Anatoly Sokolov
  Hi.

  This patch removes the obsolete REGISTER_MOVE_COST and MEMORY_MOVE_COST
macros from the Blackfin back end in GCC and introduces the equivalent
TARGET_REGISTER_MOVE_COST and TARGET_MEMORY_MOVE_COST target hooks.

  Untested.

  OK to install?

* config/bfin/bfin.h (REGISTER_MOVE_COST, MEMORY_MOVE_COST): Remove.
* config/bfin/bfin-protos.h (bfin_register_move_cost,
bfin_memory_move_cost): Remove. 
* config/bfin/bfin.c (bfin_register_move_cost,
bfin_memory_move_cost): Make static. Change arguments type from
enum reg_class to reg_class_t and from int to bool.
(TARGET_REGISTER_MOVE_COST, TARGET_MEMORY_MOVE_COST): Define.
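
For readers new to "hookizing", the general idea is that a back end no
longer supplies a macro expanded at every use site; instead it
overrides an entry in a table of function pointers. The following is a
toy model of that pattern, not GCC's real target structure: every name
starting with toy_ or TOY_ is hypothetical, reg_class_t is a stand-in
typedef, and the cost values are made up.

```cpp
#include <cassert>

typedef int reg_class_t;   // toy stand-in for the real type

// Toy table of hooks.
struct toy_target
{
  int (*register_move_cost) (reg_class_t, reg_class_t);
  int (*memory_move_cost) (reg_class_t, bool);
};

// Defaults that apply when a back end overrides nothing.
int toy_default_register_move_cost (reg_class_t, reg_class_t) { return 2; }
int toy_default_memory_move_cost (reg_class_t, bool) { return 4; }

#define TOY_REGISTER_MOVE_COST toy_default_register_move_cost
#define TOY_MEMORY_MOVE_COST toy_default_memory_move_cost

// A back end overrides one hook with the #undef/#define idiom.
int toy_bfin_register_move_cost (reg_class_t from, reg_class_t to)
{
  return from == to ? 2 : 4;
}

#undef TOY_REGISTER_MOVE_COST
#define TOY_REGISTER_MOVE_COST toy_bfin_register_move_cost

// The table picks up whatever the macros name at this point.
const toy_target toy_targetm = { TOY_REGISTER_MOVE_COST, TOY_MEMORY_MOVE_COST };
```

Callers then go through the table, e.g.
toy_targetm.register_move_cost (from, to), and never see the macros.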

Index: gcc/config/bfin/bfin-protos.h
===
--- gcc/config/bfin/bfin-protos.h   (revision 182658)
+++ gcc/config/bfin/bfin-protos.h   (working copy)
@@ -85,9 +85,6 @@ extern bool bfin_longcall_p (rtx, int);
 extern bool bfin_dsp_memref_p (rtx);
 extern bool bfin_expand_movmem (rtx, rtx, rtx, rtx);
 
-extern int bfin_register_move_cost (enum machine_mode, enum reg_class,
-   enum reg_class);
-extern int bfin_memory_move_cost (enum machine_mode, enum reg_class, int in);
 extern enum reg_class secondary_input_reload_class (enum reg_class,
enum machine_mode,
rtx);
Index: gcc/config/bfin/bfin.c
===
--- gcc/config/bfin/bfin.c  (revision 182658)
+++ gcc/config/bfin/bfin.c  (working copy)
@@ -2149,12 +2149,11 @@ bfin_vector_mode_supported_p (enum machi
   return mode == V2HImode;
 }
 
-/* Return the cost of moving data from a register in class CLASS1 to
-   one in class CLASS2.  A cost of 2 is the default.  */
+/* Worker function for TARGET_REGISTER_MOVE_COST.  */
 
-int
+static int
 bfin_register_move_cost (enum machine_mode mode,
-enum reg_class class1, enum reg_class class2)
+reg_class_t class1, reg_class_t class2)
 {
   /* These need secondary reloads, so they're more expensive.  */
   if ((class1 == CCREGS && !reg_class_subset_p (class2, DREGS))
@@ -2177,18 +2176,16 @@ bfin_register_move_cost (enum machine_mo
   return 2;
 }
 
-/* Return the cost of moving data of mode M between a
-   register and memory.  A value of 2 is the default; this cost is
-   relative to those in `REGISTER_MOVE_COST'.
+/* Worker function for TARGET_MEMORY_MOVE_COST.
 
??? In theory L1 memory has single-cycle latency.  We should add a switch
that tells the compiler whether we expect to use only L1 memory for the
program; it'll make the costs more accurate.  */
 
-int
+static int
 bfin_memory_move_cost (enum machine_mode mode ATTRIBUTE_UNUSED,
-  enum reg_class rclass,
-  int in ATTRIBUTE_UNUSED)
+  reg_class_t rclass,
+  bool in ATTRIBUTE_UNUSED)
 {
   /* Make memory accesses slightly more expensive than any register-register
  move.  Also, penalize non-DP registers, since they need secondary
@@ -5703,6 +5700,12 @@ bfin_conditional_register_usage (void)
 #undef  TARGET_ADDRESS_COST
 #define TARGET_ADDRESS_COST bfin_address_cost
 
+#undef TARGET_REGISTER_MOVE_COST
+#define TARGET_REGISTER_MOVE_COST bfin_register_move_cost
+
+#undef TARGET_MEMORY_MOVE_COST
+#define TARGET_MEMORY_MOVE_COST bfin_memory_move_cost
+
 #undef  TARGET_ASM_INTEGER
 #define TARGET_ASM_INTEGER bfin_assemble_integer
 
Index: gcc/config/bfin/bfin.h
===
--- gcc/config/bfin/bfin.h  (revision 182658)
+++ gcc/config/bfin/bfin.h  (working copy)
@@ -975,29 +975,6 @@ typedef struct {
 /* Do not put function addr into constant pool */
 #define NO_FUNCTION_CSE 1
 
-/* A C expression for the cost of moving data from a register in class FROM to
-   one in class TO.  The classes are expressed using the enumeration values
-   such as `GENERAL_REGS'.  A value of 2 is the default; other values are
-   interpreted relative to that.
-
-   It is not required that the cost always equal 2 when FROM is the same as TO;
-   on some machines it is expensive to move between registers if they are not
-   general registers.  */
-
-#define REGISTER_MOVE_COST(MODE, CLASS1, CLASS2) \
-   bfin_register_move_cost ((MODE), (CLASS1), (CLASS2))
-
-/* A C expression for the cost of moving data of mode M between a
-   register and memory.  A value of 2 is the default; this cost is
-   relative to those in `REGISTER_MOVE_COST'.
-
-   If moving between registers and memory is more expensive than
-   between two registers, you should define this macro to express the
-   relative cost.  */
-
-#define MEMORY_MOVE_COST(MODE, CLASS, IN)  \
-  bfin_memory_move_cost ((MODE), (CLASS), (IN))
-
 /* Specify the machine mode that this machine uses
for

[SCORE] Hookize REGISTER_MOVE_COST and MEMORY_MOVE_COST

2011-12-23 Thread Anatoly Sokolov
  Hi.

  This patch removes the obsolete REGISTER_MOVE_COST macro from the SCORE
back end in GCC and introduces the equivalent TARGET_REGISTER_MOVE_COST
target hook. The MEMORY_MOVE_COST macro is removed and the default
implementation of the TARGET_MEMORY_MOVE_COST target hook is used.

  Untested.

  OK to install?

* config/score/score.h (REGISTER_MOVE_COST, MEMORY_MOVE_COST): Remove.
* config/score/score-protos.h (score_register_move_cost): Remove.   
* config/score/score.c (TARGET_REGISTER_MOVE_COST): Define.
(score_register_move_cost): Make static. Change arguments type from
enum reg_class to reg_class_t.

Index: gcc/config/score/score.h
===
--- gcc/config/score/score.h(revision 182660)
+++ gcc/config/score/score.h(working copy)
@@ -601,14 +601,6 @@ typedef struct score_args
 #define REVERSIBLE_CC_MODE(MODE)1
 
 /* Describing Relative Costs of Operations  */
-/* Compute extra cost of moving data between one register class and another.  */
-#define REGISTER_MOVE_COST(MODE, FROM, TO) \
-  score_register_move_cost (MODE, FROM, TO)
-
-/* Moves to and from memory are quite expensive */
-#define MEMORY_MOVE_COST(MODE, CLASS, TO_P) \
-  (4 + memory_move_secondary_cost ((MODE), (CLASS), (TO_P)))
-
 /* Try to generate sequences that don't involve branches.  */
 #define BRANCH_COST(speed_p, predictable_p) 2
 
Index: gcc/config/score/score-protos.h
===
--- gcc/config/score/score-protos.h (revision 182660)
+++ gcc/config/score/score-protos.h (working copy)
@@ -42,8 +42,6 @@ extern bool score_block_move (rtx* ops);
 extern int score_address_cost (rtx addr, bool speed);
 extern int score_address_p (enum machine_mode mode, rtx x, int strict);
 extern int score_reg_class (int regno);
-extern int score_register_move_cost (enum machine_mode mode, enum reg_class to,
- enum reg_class from);
 extern int score_hard_regno_mode_ok (unsigned int, enum machine_mode);
 extern int score_const_ok_for_letter_p (HOST_WIDE_INT value, char c);
 extern int score_extra_constraint (rtx op, char c);
Index: gcc/config/score/score.c
===
--- gcc/config/score/score.c(revision 182660)
+++ gcc/config/score/score.c(working copy)
@@ -187,6 +187,9 @@ struct extern_list *extern_head = 0;
 #undef TARGET_TRAMPOLINE_INIT
 #define TARGET_TRAMPOLINE_INIT score_trampoline_init
 
+#undef TARGET_REGISTER_MOVE_COST
+#define TARGET_REGISTER_MOVE_COST  score_register_move_cost
+
 /* Return true if SYMBOL is a SYMBOL_REF and OFFSET + SYMBOL points
to the same object as SYMBOL.  */
 static int
@@ -998,11 +1001,13 @@ score_legitimate_address_p (enum machine
   return score_classify_address (&addr, mode, x, strict);
 }
 
-/* Return a number assessing the cost of moving a register in class
+/* Implement TARGET_REGISTER_MOVE_COST.
+
+   Return a number assessing the cost of moving a register in class
FROM to class TO. */
-int
+static int
 score_register_move_cost (enum machine_mode mode ATTRIBUTE_UNUSED,
-  enum reg_class from, enum reg_class to)
+  reg_class_t from, reg_class_t to)
 {
   if (GR_REG_CLASS_P (from))
 {


Anatoly.



Re: [SCORE] Hookize REGISTER_MOVE_COST and MEMORY_MOVE_COST

2011-12-23 Thread Richard Henderson
On 12/23/2011 11:08 AM, Anatoly Sokolov wrote:
> * config/score/score.h (REGISTER_MOVE_COST, MEMORY_MOVE_COST): Remove.
> * config/score/score-protos.h (score_register_move_cost): Remove.
> * config/score/score.c (TARGET_REGISTER_MOVE_COST): Define.
> (score_register_move_cost): Make static. Change arguments type from
> enum reg_class to reg_class_t.

Ok.


r~


Re: [BFIN] Hookize REGISTER_MOVE_COST and MEMORY_MOVE_COST

2011-12-23 Thread Richard Henderson
On 12/23/2011 10:55 AM, Anatoly Sokolov wrote:
> * config/bfin/bfin.h (REGISTER_MOVE_COST, MEMORY_MOVE_COST): Remove.
> * config/bfin/bfin-protos.h (bfin_register_move_cost,
> bfin_memory_move_cost): Remove. 
> * config/bfin/bfin.c (bfin_register_move_cost,
> bfin_memory_move_cost): Make static. Change arguments type from
> enum reg_class to reg_class_t and from int to bool.
> (TARGET_REGISTER_MOVE_COST, TARGET_MEMORY_MOVE_COST): Define.

Ok.


r~


Re: #undef fopen+freopen prior to #def in system.h, for aix bootstrap

2011-12-23 Thread Olivier Hainque
A minor update to provide a more precise ChangeLog:

>   * system.h: #undef fopen and freopen unconditionally.


2011-12-23  Olivier Hainque  

* system.h: Prior to #define, #undef fopen and freopen unconditionally.

libcpp/
* system.h: Likewise.
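
The reason for the unconditional #undef can be shown in a few lines. If
a system header has already defined fopen as a macro, defining it again
with a different body is a redefinition; removing any earlier
definition first avoids that. A minimal sketch, assuming nothing about
what system.h actually maps fopen to: counted_fopen and open_calls are
hypothetical names used only here.

```cpp
#include <cassert>
#include <cstdio>

static int open_calls = 0;

// Hypothetical replacement.  It is defined before the macro below so
// that the call inside it still reaches the real library function.
static std::FILE *counted_fopen(const char *path, const char *mode)
{
  ++open_calls;
  return std::fopen(path, mode);
}

// A system header may or may not have defined fopen as a macro already.
// #undef is harmless if it has not, and prevents a macro redefinition
// if it has.
#undef fopen
#define fopen(PATH, MODE) counted_fopen((PATH), (MODE))
```

From this point on, a plain fopen (path, mode) call in the same file
goes through counted_fopen.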




[lra] patch to fix an arm testsuite degradation

2011-12-23 Thread Vladimir Makarov
The following patch fixes a degradation of 20060102-1.c  on ARM.  Not 
updating REG notes resulted in removing an insn after LRA as it was 
wrongly considered dead.


The patch was successfully bootstrapped on x86/x86-64.

Committed as rev. 182664.

2011-12-23  Vladimir Makarov 

* lra.c (update_auto_inc_notes): Rename to update_reg_notes.  Make
it unconditional.  Remove REG_DEAD and REG_UNUSED too.  Make call
of add_auto_inc_notes conditional.

Index: lra.c
===
--- lra.c   (revision 182663)
+++ lra.c   (working copy)
@@ -2032,10 +2032,14 @@ add_auto_inc_notes (rtx insn, rtx x)
 }
 }
 
-/* DF infrastructure does not deal with REG_INC notes -- so update
-   them here.  */
+#endif
+
+/* Remove all REG_DEAD and REG_UNUSED notes and regenerate REG_INC.
+   We change pseudos by hard registers without notification of DF and
+   that can make the notes obsolete.  DF-infrastructure does not deal
+   with REG_INC notes -- so we should regenerate them here.  */
 static void
-update_auto_inc_notes (void)
+update_reg_notes (void)
 {
   rtx *pnote;
   basic_block bb;
@@ -2048,17 +2052,19 @@ update_auto_inc_notes (void)
pnote = &REG_NOTES (insn);
while (*pnote != 0)
  {
-   if (REG_NOTE_KIND (*pnote) == REG_INC)
+   if (REG_NOTE_KIND (*pnote) == REG_DEAD
+   || REG_NOTE_KIND (*pnote) == REG_UNUSED
+   || REG_NOTE_KIND (*pnote) == REG_INC)
  *pnote = XEXP (*pnote, 1);
else
  pnote = &XEXP (*pnote, 1);
  }
+#ifdef AUTO_INC_DEC
add_auto_inc_notes (insn, PATTERN (insn));
+#endif
   }
 }
 
-#endif
-
 /* Set to 1 while in lra.  */
 int lra_in_progress;
 
@@ -2204,9 +2210,7 @@ lra (FILE *f)
   regstat_free_n_sets_and_refs ();
   regstat_free_ri ();
   reload_completed = 1;
-#ifdef AUTO_INC_DEC
-  update_auto_inc_notes ();
-#endif
+  update_reg_notes ();
   finish_subregs_of_mode ();
 
   inserted_p = fixup_abnormal_edges ();
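
The note-removal loop in update_reg_notes uses a pointer-to-pointer
walk over a singly linked list: either the current link is overwritten
to skip an unwanted entry, or the walk advances to the next link. A
stand-alone sketch of the same technique, with a hypothetical note
struct and kinds standing in for the RTL notes:

```cpp
#include <cassert>

// Toy note kinds; only the first three are treated as stale.
enum note_kind { NOTE_DEAD, NOTE_UNUSED, NOTE_INC, NOTE_EQUAL };

struct note
{
  note_kind kind;
  note *next;
};

// Unlink every DEAD, UNUSED and INC note from the list starting at
// *pnote.  pnote always points at the link that leads to the current
// note, so unlinking needs no special case for the list head.
void drop_stale_notes(note **pnote)
{
  while (*pnote != nullptr)
    {
      note_kind k = (*pnote)->kind;
      if (k == NOTE_DEAD || k == NOTE_UNUSED || k == NOTE_INC)
        *pnote = (*pnote)->next;     // skip the current note
      else
        pnote = &(*pnote)->next;     // keep it and move on
    }
}
```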


[PATCH, testsuite]: Fix gcc.dg/vect/fast-math-pr35982.c

2011-12-23 Thread Uros Bizjak
Hello!

2011-12-23  Uros Bizjak  

* gcc.dg/vect/fast-math-pr35982.c: Fix parenthesis in target selectors.

Tested on x86_64-pc-linux-gnu, committed to mainline.

Uros.
Index: gcc.dg/vect/fast-math-pr35982.c
===
--- gcc.dg/vect/fast-math-pr35982.c (revision 182661)
+++ gcc.dg/vect/fast-math-pr35982.c (working copy)
@@ -20,7 +20,6 @@
   return avg;
 }
 
-/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { target vect_extract_even_odd || vect_strided2 } } } */
-/* { dg-final { scan-tree-dump-times "vectorized 0 loops" 1 "vect" { xfail vect_extract_even_odd || vect_strided2 } } } */
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { target { vect_extract_even_odd || vect_strided2 } } } } */
+/* { dg-final { scan-tree-dump-times "vectorized 0 loops" 1 "vect" { xfail { vect_extract_even_odd || vect_strided2 } } } } */
 /* { dg-final { cleanup-tree-dump "vect" } } */
-


Re: [lra] patch to fix an arm testsuite degradation

2011-12-23 Thread Paolo Carlini
Hi Vladimir,

> The following patch fixes a degradation of 20060102-1.c  on ARM.

unless I'm badly mistaken, I see you using quite often the form 'degradation',
which is somewhat unusual in this mailing list. Are you using it like
'regression' or do you actually mean something slightly, subtly, different?

A bit off topic, sorry,
Paolo


Re: [lra] patch to fix an arm testsuite degradation

2011-12-23 Thread Vladimir Makarov

On 12/23/2011 04:17 PM, Paolo Carlini wrote:
> Hi Vladimir,
>
>> The following patch fixes a degradation of 20060102-1.c  on ARM.
>
> unless I'm badly mistaken, I see you using quite often the form 'degradation',
> which is somewhat unusual in this mailing list. Are you using it like
> 'regression' or do you actually mean something slightly, subtly, different?

Paolo, thanks for pointing this out.  You are right.  I frequently
misuse this word.  I should say 'regression'.




Re: [patch, testsuite] One more strict-volatile-bitfields test case

2011-12-23 Thread Richard Henderson
On 12/22/2011 06:28 PM, Ye Joey wrote:
>   * gcc.dg/volatile-bitfields-2.c: New test.

Ok.


r~


Re: [BFIN] Hookize REGISTER_MOVE_COST and MEMORY_MOVE_COST

2011-12-23 Thread Jie Zhang
Hi Anatoly,

I cannot apply your patch to a clean tree. I tried saving your email
as a text file, copying from Thunderbird, copying from Gmail, and copying
from the mailing list archive, but none of them works.

Regards,
Jie

2011/12/23 Anatoly Sokolov :
>  Hi.
>
>  This patch removes obsolete REGISTER_MOVE_COST and MEMORY_MOVE_COST
> macros from the Blackfin back end in the GCC and introduces equivalent
> TARGET_REGISTER_MOVE_COST and TARGET_MEMORY_MOVE_COST target hooks.
>
>  Untested.
>
>  OK to install?
>
>        * config/bfin/bfin.h (REGISTER_MOVE_COST, MEMORY_MOVE_COST): Remove.
>        * config/bfin/bfin-protos.h (bfin_register_move_cost,
>        bfin_memory_move_cost): Remove.
>        * config/bfin/bfin.c (bfin_register_move_cost,
>        bfin_memory_move_cost): Make static. Change arguments type from
>        enum reg_class to reg_class_t and from int to bool.
>        (TARGET_REGISTER_MOVE_COST, TARGET_MEMORY_MOVE_COST): Define.
>
> Index: gcc/config/bfin/bfin-protos.h
> ===
> --- gcc/config/bfin/bfin-protos.h       (revision 182658)
> +++ gcc/config/bfin/bfin-protos.h       (working copy)
> @@ -85,9 +85,6 @@ extern bool bfin_longcall_p (rtx, int);
>  extern bool bfin_dsp_memref_p (rtx);
>  extern bool bfin_expand_movmem (rtx, rtx, rtx, rtx);
>
> -extern int bfin_register_move_cost (enum machine_mode, enum reg_class,
> -                                   enum reg_class);
> -extern int bfin_memory_move_cost (enum machine_mode, enum reg_class, int in);
>  extern enum reg_class secondary_input_reload_class (enum reg_class,
>                                                    enum machine_mode,
>                                                    rtx);
> Index: gcc/config/bfin/bfin.c
> ===
> --- gcc/config/bfin/bfin.c      (revision 182658)
> +++ gcc/config/bfin/bfin.c      (working copy)
> @@ -2149,12 +2149,11 @@ bfin_vector_mode_supported_p (enum machi
>   return mode == V2HImode;
>  }
>
> -/* Return the cost of moving data from a register in class CLASS1 to
> -   one in class CLASS2.  A cost of 2 is the default.  */
> +/* Worker function for TARGET_REGISTER_MOVE_COST.  */
>
> -int
> +static int
>  bfin_register_move_cost (enum machine_mode mode,
> -                        enum reg_class class1, enum reg_class class2)
> +                        reg_class_t class1, reg_class_t class2)
>  {
>   /* These need secondary reloads, so they're more expensive.  */
>   if ((class1 == CCREGS && !reg_class_subset_p (class2, DREGS))
> @@ -2177,18 +2176,16 @@ bfin_register_move_cost (enum machine_mo
>   return 2;
>  }
>
> -/* Return the cost of moving data of mode M between a
> -   register and memory.  A value of 2 is the default; this cost is
> -   relative to those in `REGISTER_MOVE_COST'.
> +/* Worker function for TARGET_MEMORY_MOVE_COST.
>
>    ??? In theory L1 memory has single-cycle latency.  We should add a switch
>    that tells the compiler whether we expect to use only L1 memory for the
>    program; it'll make the costs more accurate.  */
>
> -int
> +static int
>  bfin_memory_move_cost (enum machine_mode mode ATTRIBUTE_UNUSED,
> -                      enum reg_class rclass,
> -                      int in ATTRIBUTE_UNUSED)
> +                      reg_class_t rclass,
> +                      bool in ATTRIBUTE_UNUSED)
>  {
>   /* Make memory accesses slightly more expensive than any register-register
>      move.  Also, penalize non-DP registers, since they need secondary
> @@ -5703,6 +5700,12 @@ bfin_conditional_register_usage (void)
>  #undef  TARGET_ADDRESS_COST
>  #define TARGET_ADDRESS_COST bfin_address_cost
>
> +#undef TARGET_REGISTER_MOVE_COST
> +#define TARGET_REGISTER_MOVE_COST bfin_register_move_cost
> +
> +#undef TARGET_MEMORY_MOVE_COST
> +#define TARGET_MEMORY_MOVE_COST bfin_memory_move_cost
> +
>  #undef  TARGET_ASM_INTEGER
>  #define TARGET_ASM_INTEGER bfin_assemble_integer
>
> Index: gcc/config/bfin/bfin.h
> ===
> --- gcc/config/bfin/bfin.h      (revision 182658)
> +++ gcc/config/bfin/bfin.h      (working copy)
> @@ -975,29 +975,6 @@ typedef struct {
>  /* Do not put function addr into constant pool */
>  #define NO_FUNCTION_CSE 1
>
> -/* A C expression for the cost of moving data from a register in class FROM to
> -   one in class TO.  The classes are expressed using the enumeration values
> -   such as `GENERAL_REGS'.  A value of 2 is the default; other values are
> -   interpreted relative to that.
> -
> -   It is not required that the cost always equal 2 when FROM is the same as TO;
> -   on some machines it is expensive to move between registers if they are not
> -   general registers.  */
> -
> -#define REGISTER_MOVE_COST(MODE, CLASS1, CLASS2) \
> -   bfin_register_move_cost ((MODE), (CLASS1), (CLASS2))
> -
> -/* A C expression for the cost of moving data of mode M between a
> -   register and memor

Re: [patch] libitm: Fix privatization safety during upgrades to serial mode.

2011-12-23 Thread Richard Henderson
On 12/22/2011 11:28 AM, Torvald Riegel wrote:
> libitm: Fix privatization safety during upgrades to serial mode.
> 
>   libitm/
>   * beginend.cc (GTM::gtm_thread::restart): Add and handle
>   finish_serial_upgrade parameter.
>   * libitm.h (GTM::gtm_thread::restart): Adapt declaration.
>   * config/linux/rwlock.cc (GTM::gtm_rwlock::write_lock_generic):
>   Don't unset reader flag.
>   (GTM::gtm_rwlock::write_upgrade_finish): New.
>   * config/posix/rwlock.cc: Same.
>   * config/linux/rwlock.h (GTM::gtm_rwlock::write_upgrade_finish):
>   Declare.
>   * config/posix/rwlock.h: Same.
>   * method-serial.cc (GTM::gtm_thread::serialirr_mode): Unset reader
>   flag after commit or after rollback when restarting.

Ok.



r~


C++ PATCH for c++/51507 (pack expansion in trailing-return-type)

2011-12-23 Thread Jason Merrill
The existing code to handle pack expansions in trailing-return-type 
assumed that such expansions would only occur inside decltype, which is 
not the case.  This patch fixes the test to check for whether or not 
we're doing the substitution in the context of a function body, and 
fixes at_function_scope_p to properly return false when we're 
substituting deduced arguments into a candidate function template.
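
To give a concrete picture of the kind of construct involved, the
sketch below has a trailing-return-type in which a function parameter
pack is expanded in a template argument list, with decltype appearing
only inside the pattern being expanded. It is an illustration written
for this note, not the testcase from the PR; pack_up is a hypothetical
name.

```cpp
#include <cassert>
#include <tuple>
#include <type_traits>

// The expansion decltype(args)... takes place in the template argument
// list of std::tuple; the pack args itself is a function parameter
// pack named in the trailing-return-type.
template<typename... Args>
auto pack_up(Args... args) -> std::tuple<decltype(args)...>
{
  return std::tuple<decltype(args)...>(args...);
}
```

Calling pack_up (1, 2.5) deduces Args as int and double, so the
substituted return type is std::tuple<int, double>.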


Even with the change to at_function_scope_p it was impossible to tell 
that we weren't in function scope when instantiating a function 
declaration as part of overload resolution, so I also changed 
instantiate_template_1 to use push_to_top_level rather than just clear 
processing_template_decl.  In my testing it was enough to just clear 
current_function_decl as well, but since in fact the instantiation 
happens at top level it seems more correct to use push_to_top_level.


The second patch fixes a bug I noticed in dependent_name while working on
this one, though the fix isn't necessary for this PR; a BASELINK should
not be considered a dependent name, or we end up treating calls to
members of different classes as equivalent.


Tested x86_64-pc-linux-gnu, applying to trunk.

commit 4df8b36d378adc678ed4ca9ac91088ad0772b750
Author: Jason Merrill 
Date:   Thu Dec 22 11:03:09 2011 -0500

	PR c++/51507
	* search.c (at_function_scope_p): Also check cfun.
	* pt.c (tsubst_pack_expansion): Check it instead of
	cp_unevaluated_operand.
	(instantiate_template_1): Clear current_function_decl.

diff --git a/gcc/cp/pt.c b/gcc/cp/pt.c
index 820b1ff..20f67aa 100644
--- a/gcc/cp/pt.c
+++ b/gcc/cp/pt.c
@@ -9297,6 +9297,7 @@ tsubst_pack_expansion (tree t, tree args, tsubst_flags_t complain,
   int i, len = -1;
   tree result;
   htab_t saved_local_specializations = NULL;
+  bool need_local_specializations = false;
   int levels;
 
   gcc_assert (PACK_EXPANSION_P (t));
@@ -9330,7 +9331,7 @@ tsubst_pack_expansion (tree t, tree args, tsubst_flags_t complain,
}
   if (TREE_CODE (parm_pack) == PARM_DECL)
 	{
-	  if (!cp_unevaluated_operand)
+	  if (at_function_scope_p ())
 	arg_pack = retrieve_local_specialization (parm_pack);
 	  else
 	{
@@ -9346,6 +9347,7 @@ tsubst_pack_expansion (tree t, tree args, tsubst_flags_t complain,
 		arg_pack = NULL_TREE;
 	  else
 		arg_pack = make_fnparm_pack (arg_pack);
+	  need_local_specializations = true;
 	}
 	}
   else
@@ -9476,7 +9478,7 @@ tsubst_pack_expansion (tree t, tree args, tsubst_flags_t complain,
   if (len < 0)
 return error_mark_node;
 
-  if (cp_unevaluated_operand)
+  if (need_local_specializations)
 {
   /* We're in a late-specified return type, so create our own local
 	 specializations table; the current table is either NULL or (in the
@@ -14524,7 +14526,6 @@ instantiate_template_1 (tree tmpl, tree orig_args, tsubst_flags_t complain)
   tree fndecl;
   tree gen_tmpl;
   tree spec;
-  HOST_WIDE_INT saved_processing_template_decl;
 
   if (tmpl == error_mark_node)
 return error_mark_node;
@@ -14585,18 +14586,22 @@ instantiate_template_1 (tree tmpl, tree orig_args, tsubst_flags_t complain)
  deferring all checks until we have the FUNCTION_DECL.  */
   push_deferring_access_checks (dk_deferred);
 
-  /* Although PROCESSING_TEMPLATE_DECL may be true at this point
- (because, for example, we have encountered a non-dependent
- function call in the body of a template function and must now
- determine which of several overloaded functions will be called),
- within the instantiation itself we are not processing a
- template.  */  
-  saved_processing_template_decl = processing_template_decl;
-  processing_template_decl = 0;
+  /* Instantiation of the function happens in the context of the function
+ template, not the context of the overload resolution we're doing.  */
+  push_to_top_level ();
+  if (DECL_CLASS_SCOPE_P (gen_tmpl))
+{
+  tree ctx = tsubst (DECL_CONTEXT (gen_tmpl), targ_ptr,
+			 complain, gen_tmpl);
+  push_nested_class (ctx);
+}
   /* Substitute template parameters to obtain the specialization.  */
   fndecl = tsubst (DECL_TEMPLATE_RESULT (gen_tmpl),
 		   targ_ptr, complain, gen_tmpl);
-  processing_template_decl = saved_processing_template_decl;
+  if (DECL_CLASS_SCOPE_P (gen_tmpl))
+pop_nested_class ();
+  pop_from_top_level ();
+
   if (fndecl == error_mark_node)
 return error_mark_node;
 
diff --git a/gcc/cp/search.c b/gcc/cp/search.c
index 0ceb5bc..45fdafc 100644
--- a/gcc/cp/search.c
+++ b/gcc/cp/search.c
@@ -539,7 +539,11 @@ int
 at_function_scope_p (void)
 {
   tree cs = current_scope ();
-  return cs && TREE_CODE (cs) == FUNCTION_DECL;
+  /* Also check cfun to make sure that we're really compiling
+ this function (as opposed to having set current_function_decl
+ for access checking or some such).  */
+  return (cs && TREE_CODE (cs) == FUNCTION_DECL
+	  && cfun && cfun->decl == current_function_decl);
 }
 
 /* Returns true if th

[committed] Remove VEC_EXTRACT_EVEN/ODD_EXPR

2011-12-23 Thread Richard Henderson
Having now committed patches to convert all targets to vec_perm_const, 
supporting the interleave and even/odd permutations, we can now remove the 
VEC_INTERLEAVE_HIGH/LOW_EXPR and VEC_EXTRACT_EVEN/ODD_EXPR codes as redundant
with the primary VEC_PERM_EXPR code.
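
To make the redundancy concrete, here is a toy model in plain C++ (a
sketch only; vec_perm, extract_even and extract_odd are illustrative
names, not GCC internals).  It assumes the usual two-input permutation
semantics, where each selector element indexes into the concatenation of
the two input vectors, and expresses even/odd extraction as fixed
selectors:

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Toy two-input permutation: element i of the result is element sel[i]
// of the concatenation {a, b}; selector values wrap modulo 2*N.
template <typename T, std::size_t N>
std::array<T, N> vec_perm(const std::array<T, N>& a,
                          const std::array<T, N>& b,
                          const std::array<std::size_t, N>& sel)
{
  std::array<T, N> r{};
  for (std::size_t i = 0; i < N; ++i) {
    std::size_t k = sel[i] % (2 * N);
    r[i] = k < N ? a[k] : b[k - N];
  }
  return r;
}

// Even extraction is the permutation with selector {0, 2, 4, ...}.
template <typename T, std::size_t N>
std::array<T, N> extract_even(const std::array<T, N>& a,
                              const std::array<T, N>& b)
{
  std::array<std::size_t, N> sel{};
  for (std::size_t i = 0; i < N; ++i)
    sel[i] = 2 * i;
  return vec_perm(a, b, sel);
}

// Odd extraction is the permutation with selector {1, 3, 5, ...}.
template <typename T, std::size_t N>
std::array<T, N> extract_odd(const std::array<T, N>& a,
                             const std::array<T, N>& b)
{
  std::array<std::size_t, N> sel{};
  for (std::size_t i = 0; i < N; ++i)
    sel[i] = 2 * i + 1;
  return vec_perm(a, b, sel);
}

// Usage sketch: the even (odd) elements of b follow those of a.
inline void demo_even_odd()
{
  std::array<int, 4> a{{0, 1, 2, 3}};
  std::array<int, 4> b{{4, 5, 6, 7}};
  assert((extract_even(a, b) == std::array<int, 4>{{0, 2, 4, 6}}));
  assert((extract_odd(a, b) == std::array<int, 4>{{1, 3, 5, 7}}));
}
```

The interleave high/low operations can be modelled the same way with a
different constant selector.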

I have committed the patch previously posted by Jakub (and approved by Richi) 
that removes VEC_INTERLEAVE_HIGH/LOW_EXPR.  I have also committed the following
patch which removes VEC_EXTRACT_EVEN/ODD_EXPR.

All re-tested on x86_64-linux.


r~

* tree.def (VEC_EXTRACT_EVEN_EXPR, VEC_EXTRACT_ODD_EXPR): Remove.
* cfgexpand.c (expand_debug_expr): Don't handle them.
* expr.c (expand_expr_real_2): Likewise.
* fold-const.c (fold_binary_loc): Likewise.
* gimple-pretty-print.c (dump_binary_rhs): Likewise.
* tree-cfg.c (verify_gimple_assign_binary): Likewise.
* tree-inline.c (estimate_operator_cost): Likewise.
* tree-pretty-print.c (dump_generic_node): Likewise.
* tree-vect-generic.c (expand_vector_operations_1): Likewise.
* optabs.c (optab_for_tree_code): Likewise.
(can_vec_perm_for_code_p): Remove.
(expand_binop): Don't try it.
(init_optabs): Don't init vec_extract_even/odd_optab.
* genopinit.c (optabs): Likewise.
* optabs.h (OTI_vec_extract_even, OTI_vec_extract_odd): Remove.
(vec_extract_even_optab, vec_extract_odd_optab): Remove.
* tree-vect-data-refs.c (vect_strided_store_supported): Tidy code.
(vect_permute_store_chain): Use TYPE_VECTOR_SUBPARTS instead of
GET_MODE_NUNITS; check vect_gen_perm_mask return value instead of
asserting vect_strided_store_supported.
(vect_strided_load_supported): Use can_vec_perm_p.
(vect_permute_load_chain): Use VEC_PERM_EXPR.

* doc/generic.texi (VEC_EXTRACT_EVEN_EXPR): Remove.
(VEC_EXTRACT_ODD_EXPR): Remove.
* doc/md.texi (vec_extract_even, vec_extract_odd): Remove.


diff --git a/gcc/cfgexpand.c b/gcc/cfgexpand.c
index dfe5442..2b2e464 100644
--- a/gcc/cfgexpand.c
+++ b/gcc/cfgexpand.c
@@ -3449,8 +3449,6 @@ expand_debug_expr (tree exp)
 case REDUC_MIN_EXPR:
 case REDUC_PLUS_EXPR:
 case VEC_COND_EXPR:
-case VEC_EXTRACT_EVEN_EXPR:
-case VEC_EXTRACT_ODD_EXPR:
 case VEC_LSHIFT_EXPR:
 case VEC_PACK_FIX_TRUNC_EXPR:
 case VEC_PACK_SAT_EXPR:
diff --git a/gcc/doc/generic.texi b/gcc/doc/generic.texi
index 4f26238..31e8855 100644
--- a/gcc/doc/generic.texi
+++ b/gcc/doc/generic.texi
@@ -1695,8 +1695,6 @@ its sole argument yields the representation for @code{ap}.
 @tindex VEC_PACK_TRUNC_EXPR
 @tindex VEC_PACK_SAT_EXPR
 @tindex VEC_PACK_FIX_TRUNC_EXPR
-@tindex VEC_EXTRACT_EVEN_EXPR
-@tindex VEC_EXTRACT_ODD_EXPR
 
 @table @code
 @item VEC_LSHIFT_EXPR
@@ -1765,13 +1763,6 @@ of elements of a floating point type.  The result is a vector that contains
 twice as many elements of an integral type whose size is half as wide.  The
 elements of the two vectors are merged (concatenated) to form the output
 vector.
-
-@item VEC_EXTRACT_EVEN_EXPR
-@itemx VEC_EXTRACT_ODD_EXPR
-These nodes represent extracting of the even/odd elements of the two input
-vectors, respectively. Their operands and result are vectors that contain the
-same number of elements of the same type.
-
 @end table
 
 
diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
index 6dd6a58..93183e6 100644
--- a/gcc/doc/md.texi
+++ b/gcc/doc/md.texi
@@ -4145,20 +4145,6 @@ operand 1 is new value of field and operand 2 specify the field index.
 Extract given field from the vector value.  Operand 1 is the vector, operand 2
 specify field index and operand 0 place to store value into.
 
-@cindex @code{vec_extract_even@var{m}} instruction pattern
-@item @samp{vec_extract_even@var{m}}
-Extract even elements from the input vectors (operand 1 and operand 2).
-The even elements of operand 2 are concatenated to the even elements of operand
-1 in their original order. The result is stored in operand 0.
-The output and input vectors should have the same modes.
-
-@cindex @code{vec_extract_odd@var{m}} instruction pattern
-@item @samp{vec_extract_odd@var{m}}
-Extract odd elements from the input vectors (operand 1 and operand 2).
-The odd elements of operand 2 are concatenated to the odd elements of operand
-1 in their original order. The result is stored in operand 0.
-The output and input vectors should have the same modes.
-
 @cindex @code{vec_init@var{m}} instruction pattern
 @item @samp{vec_init@var{m}}
 Initialize the vector to given values.  Operand 0 is the vector to initialize
diff --git a/gcc/expr.c b/gcc/expr.c
index cb28f48..c10f915 100644
--- a/gcc/expr.c
+++ b/gcc/expr.c
@@ -8647,10 +8647,6 @@ expand_expr_real_2 (sepops ops, rtx target, enum machine_mode tmode,
 return temp;
   }
 
-case VEC_EXTRACT_EVEN_EXPR:
-case VEC_EXTRACT_ODD_EXPR:
-  goto binop;
-
 case VEC_LSHIFT_EXPR:
 case VEC_RSHIFT_EXPR:
   {
diff --git a/gcc/f