Re: TYPE_BINFO and canonical types at LTO
On 02/13/2014 04:48 PM, Jan Hubicka wrote:
> all bases are also fields within the type, so the second loop should notice
> all the types seen by the first loop if I am correct?

Yes, except that empty bases don't get fields because they have no data.
But since they have no data they aren't interesting to aliasing either, so
you should be OK just looking at field types.

Jason
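[To illustrate the point about empty bases - an editorial sketch, not part of
the original mail; the exact field layout is what the C++ front end usually
produces, so treat it as an assumption rather than a guarantee:]

struct Empty {};
struct A { int a; };
struct B : Empty, A { int b; };

/* With an empty base, B's TYPE_FIELDS is expected to contain the
   (artificial) field for the A base and the field 'b', but nothing for
   Empty, since Empty holds no data.  A field-only walk therefore still
   records every base type that actually occupies storage.  */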
Re: TYPE_BINFO and canonical types at LTO
On Fri, 14 Feb 2014, Jan Hubicka wrote:
> Hi,
> I have noticed that record_component_aliases is called during LTO time and it
> examines contents of BINFO:
>
> 0x5cd7a5 record_component_aliases(tree_node*)
>         ../../gcc/alias.c:1005
> 0x5cd4a9 get_alias_set(tree_node*)
>         ../../gcc/alias.c:895
> 0x5cc67a component_uses_parent_alias_set_from(tree_node const*)
>         ../../gcc/alias.c:548
> 0x5ccc42 reference_alias_ptr_type_1
>         ../../gcc/alias.c:660
> 0x5ccf93 get_alias_set(tree_node*)
>         ../../gcc/alias.c:740
> 0xb823d8 indirect_refs_may_alias_p
>         ../../gcc/tree-ssa-alias.c:1125
> 0xb82d8d refs_may_alias_p_1(ao_ref*, ao_ref*, bool)
>         ../../gcc/tree-ssa-alias.c:1279
> 0xb848df stmt_may_clobber_ref_p_1(gimple_statement_base*, ao_ref*)
>         ../../gcc/tree-ssa-alias.c:2013
> 0xb85d27 walk_non_aliased_vuses(ao_ref*, tree_node*, void* (*)(ao_ref*, tree_node*, unsigned int, void*), void* (*)(ao_ref*, tree_node*, void*), void*)
>         ../../gcc/tree-ssa-alias.c:2411
> 0xc509f3 vn_reference_lookup(tree_node*, tree_node*, vn_lookup_kind, vn_reference_s**)
>         ../../gcc/tree-ssa-sccvn.c:2063
> 0xc52ea4 visit_reference_op_store
>         ../../gcc/tree-ssa-sccvn.c:2970
> 0xc55404 extract_and_process_scc_for_name
>         ../../gcc/tree-ssa-sccvn.c:3825
>
> This smells bad, since it is given a canonical type that is after the
> structural equivalency merging that ignores BINFOs, so it may be a completely
> different class with completely different bases than the original.  Bases are
> structurally merged, too, and may be exchanged for normal fields because
> DECL_ARTIFICIAL (that separates bases and fields) does not seem to be part of
> the canonical type definition in LTO.

Can you elaborate on that DECL_ARTIFICIAL thing?  That is, what is broken
by considering all fields during that merging?

Note that the BINFO walk below only adds extra aliasing - it should
be harmless correctness-wise, no?

> I wonder if that code is needed after all:
>     case QUAL_UNION_TYPE:
>       /* Recursively record aliases for the base classes, if there are any.  */
>       if (TYPE_BINFO (type))
>         {
>           int i;
>           tree binfo, base_binfo;
>
>           for (binfo = TYPE_BINFO (type), i = 0;
>                BINFO_BASE_ITERATE (binfo, i, base_binfo); i++)
>             record_alias_subset (superset,
>                                  get_alias_set (BINFO_TYPE (base_binfo)));
>         }
>       for (field = TYPE_FIELDS (type); field != 0; field = DECL_CHAIN (field))
>         if (TREE_CODE (field) == FIELD_DECL && !DECL_NONADDRESSABLE_P (field))
>           record_alias_subset (superset, get_alias_set (TREE_TYPE (field)));
>       break;
> all bases are also fields within the type, so the second loop should notice
> all the types seen by the first loop if I am correct?
> So perhaps the loop can be dropped in the first place.

Yeah, I remember seeing that code and thinking the same multiple times.
Though I also vaguely remember that removing that loop regressed something.
How is virtual inheritance represented in the fields?

But I'd be happy if this BINFO user would go away ;)

(similar in general for the get_alias_set langhook - with LTO we
only preserve extra alias-set zero answers from that)

Richard.
Need help: Is a VAR_DECL type builtin or not?
Given a specific VAR_DECL tree node, I need to find out whether its type is
built in or not.  Up to now I have

  tree tn = TYPE_NAME (TREE_TYPE (var_decl));
  if (tn != NULL_TREE && TREE_CODE (tn) == TYPE_DECL && DECL_NAME (tn))
    {
      ...
    }

This if-condition is true for both

  int x;
  const int x;
  ...

and

  typedef int i_t;
  i_t x;
  const i_t x;
  ...

I need to weed out the class of VAR_DECLs that directly use built-in types.

Ciao

Dominik ^_^  ^_^

--
Dominik Vogt
IBM Germany
Re: [RFC][PATCH 0/5] arch: atomic rework
On Thu, Feb 13, 2014 at 09:07:55PM -0800, Torvald Riegel wrote: > That depends on what your goal is. A compiler that we don't need to fight in order to generate sane code would be nice. But as Linus said; we can continue to ignore you lot and go on as we've done.
MSP430 in gcc4.9 ... enable interrupts?
I have built a cross-compiler for the MSP430, using a gcc 4.9 snapshot
(gcc-4.9-20140112), and the compiler seems OK and builds a simple "blinky"
LED flashing example.

But my slightly larger example, originally built using Peter Bigot's mspgcc
backend, no longer compiles ... mspgcc had a number of intrinsic functions,
such as __nop(), __eint() and __dint(); calling these would execute a nop,
enable interrupts and disable interrupts respectively.  Others such as
__bis_status_register() and __bic_status_register() would manipulate the
system status, low power modes etc.

Now in the MSP430 port for gcc 4.9, these intrinsic functions have gone.
Perusing the config/msp430 source files, e.g. config/msp430/msp430.md, I can
see evidence that the _functionality_ is still there, e.g.

(define_insn "enable_interrupts"
  [(unspec_volatile [(const_int 0)] UNS_EINT)]
  ""
  "EINT"
)
...
(define_insn "bis_SR"
  [(unspec_volatile [(match_operand 0 "nonmemory_operand" "ir")] UNS_BIS_SR)]
  ""
  "BIS.W\t%0, %O0(SP)"
)
...

but how do I access it?  In other words, what C code fragment would cause the
"enable_interrupts" instruction to be emitted, and generate "EINT" in the
assembler or object output?

- Brian
[RFC] Rationale for passing vectors by value in SIMD registers
MIPS is currently evaluating the benefit of using SIMD registers to pass
vector data by value.  It is currently unclear how important it is for vector
data to be passed in SIMD registers, i.e. the need for passing vector data by
value in real-world code is not immediately obvious, and the performance
advantage is therefore also unclear.

Can anyone offer insight into the rationale behind the design decisions made
for other architectures' ABIs?  For example, the x86 and x86_64 calling
convention for vector data types presumes that they will be passed in SSE/AVX
registers and raises warnings if they are passed when SSE/AVX support is not
enabled.  This is what MIPS is currently considering, however there are two
concerns:

1) What about the ability to create architecture/implementation-independent
APIs that may include vector types in the prototypes?  Such APIs may be built
for varying levels of hardware support to make the most of a specific
architecture implementation, but be called from otherwise
implementation-agnostic code.  To support such a scenario we would need to
use a common calling convention usable on all architecture variants.

2) Although vector types are not specifically covered by existing ABI
definitions for MIPS, we have unfortunately got a de facto standard for how
to pass these by value: vector types are simply considered to be small
structures and passed as such following normal ABI rules.  This is still a
concern even though it is generally accepted that there is some room for
change when it comes to vector data types in an existing ABI.

If anyone could offer a brief history of the x86 ABI with respect to vector
data types, that would also be interesting.  One question would be whether
the use of vector registers in the calling convention was only enabled by
default once there was a critical mass of implementations, and therefore the
default ABI was changed to start making assumptions about the availability of
features like SSE and AVX.

Comments from any other architecture that has had to make such changes over
time would also be welcome.

Thanks in advance,
Matthew
Re: [RFC] Rationale for passing vectors by value in SIMD registers
On Fri, Feb 14, 2014 at 2:17 AM, Matthew Fortune wrote: > MIPS is currently evaluating the benefit of using SIMD registers to pass > vector data by value. It is currently unclear how important it is for vector > data to be passed in SIMD registers. I.e. the need for passing vector data by > value in real world code is not immediately obvious. The performance > advantage is therefore also unclear. > > Can anyone offer insight in the rationale behind decision decisions made for > other architectures ABIs? For example, the x86 and x86_64 calling convention > for vector data types presumes that they will passed in SSE/AVX registers and > raises warnings if passed when sse/avx support is not enabled. This is what > MIPS is currently considering however there are two concerns: > > 1) What about the ability to create architecture/implementation independent > APIs that may include vector types in the prototypes. Such APIs may be built > for varying levels of hardware support to make the most of a specific > architecture implementation but be called from otherwise implementation > agnostic code. To support such a scenario we would need to use a common > calling convention usable on all architecture variants. > 2) Although vector types are not specifically covered by existing ABI > definitions for MIPS we have unfortunately got a defacto standard for how to > pass these by value. Vector types are simply considered to be small > structures and passed as such following normal ABI rules. This is still a > concern even though it is generally accepted that there is some room for > change when it comes to vector data types in an existing ABI. > > If anyone could offer a brief history the x86 ABI with respect to vector data > types that may also be interesting. One question would be whether the use of > vector registers in the calling convention was only enabled by default once > there was a critical mass of implementations, and therefore the default ABI > was changed to start making assumptions about the availability of features > like SSE and AVX. > > Comments from any other architecture that has had to make such changes over > time would also be welcome. PPC and arm and AARCH64 are common targets where vectors are passed/return via value. The idea is simple, sometimes you have functions like vector float vsinf(vector float a) where you want to be faster and avoid a round trip to L1 (or even L2). These kind of functions are common for vector programming. That is extending the scalar versions to the vector versions. Thanks, Andrew Pinski > > Thanks in advance, > Matthew >
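[To make the vsinf() example concrete, here is an editorial sketch using
GCC's generic vector extensions; the typedef name and prototypes are
illustrative, not from the original mail.  With a register-based vector ABI
the argument arrives in a SIMD register and the result is returned in one,
so the call avoids a round trip through memory.]

/* Illustrative only: a vector sine with a vector calling convention. */
typedef float v4sf __attribute__ ((vector_size (16)));

v4sf vsinf (v4sf a);             /* hypothetical vector math routine */

v4sf scale_and_sin (v4sf x)
{
  v4sf two = { 2.0f, 2.0f, 2.0f, 2.0f };
  return vsinf (x * two);        /* with a SIMD-register ABI this call needs no stack traffic */
}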
Re: gnattools cannot be built for freestanding/bare metal environment without hacking up the build machinery
Luke A. Guest archeia.com> writes:
>
> Hi,
>
> I've been over this before and have got nowhere with it.
> Say you want to build an Ada compiler for embedded work, ...
> You can build it with "make all-gcc" and install with "make install-gcc"
> ...
> But what about the gnattools? Not buildable. A message in the ml
> archives states to build them with "make -C gcc gnattools," but this
> fails:
> ...
> You can't disable libada using the command line as that also disables
> gnattools - I don't know the reason for requiring this behaviour.

Revisiting the MSP430 as it's now an official gcc target, this is still a
problem here too, so there are at least 3 targets for which it's a problem.

Looking at http://gcc.gnu.org/bugzilla/show_bug.cgi?id=19959
and comparing comments #6 and #14, perhaps this command should be
"make -C gcc gnattools-cross"?  I'll try to do a clean build and test this
today.

But it would be better if the build "just worked", so at

> if test "${ENABLE_LIBADA}" != "yes" ; then
>   noconfigdirs="$noconfigdirs"
> fi

we need a better test here (also checking for a cross-compiler build).

> I think that we need a configure flag to disable libada and not
> gnattools for these bare board targets.

Following comment #15 in bug 19959, perhaps it's time to open a bug against
--disable-libada.

- Brian
Optimizing bit extract
Hello gcc

I have been looking at optimizations of pixel-format conversion recently and
have noticed that gcc does not take advantage of SSE4a extrq, BMI1 bextr, TBM
bextri or BMI2 pext instructions when they could be useful.

As far as I can tell it should not be that hard.  A bextr expression can
typically be recognized as ((x >> s) & mask) or ((x << s1) >> s2).  But I am
unsure where to do such a matching, since the mask needs to have a specific
form to be valid for bextr, so it seems it needs to be done before
instruction selection.

Secondly, the bextr instruction in itself only replaces two already fast
instructions, so it is very minor (unless extracting variable bit-fields,
which is harder to recognize).  The real optimization comes from being able
to use pext (parallel bit extract), which can implement several bextr
expressions in parallel.

So, where would be the right place to implement such instructions?  Would it
make sense to recognize bextr early, before we get to i386 code, or would it
be better to recognize it late?  And where do I put such instruction
selection optimizations?

Motivating example:

unsigned rgb32_to_rgb16(unsigned rgb32) {
    unsigned char red = (rgb32 >> 19) & 0x1f;
    unsigned char green = (rgb32 >> 10) & 0x3f;
    unsigned char blue = rgb32 & 0x1f;
    return (red << 11) | (green << 5) | blue;
}

can be implemented as pext(rgb32, 0x001f3f1f)

Best regards
`Allan Sandfeld
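[For reference, an editorial sketch of the BMI2 form using the _pext_u32
intrinsic, not from the original mail.  Note that for the exact shifts and
masks in rgb32_to_rgb16 above the selection mask works out to 0x00f8fc1f
(bits 19-23, 10-15 and 0-4), so it differs from the constant quoted in the
message; the principle is the same.]

#include <immintrin.h>   /* requires -mbmi2 */

/* One pext gathers the three bit-fields extracted separately above:
   bits 19..23 (red), 10..15 (green) and 0..4 (blue) are packed, lowest
   mask bit first, into the low 16 bits of the result - exactly
   (red << 11) | (green << 5) | blue.  */
unsigned rgb32_to_rgb16_pext (unsigned rgb32)
{
  return _pext_u32 (rgb32, 0x00f8fc1fu);
}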
Re: Need help: Is a VAR_DECL type builtin or not?
On Fri, Feb 14, 2014 at 9:59 AM, Dominik Vogt wrote: > Given a specific VAR_DECL tree node, I need to find out whether > its type is built in or not. Up to now I have > > tree tn = TYPE_NAME (TREE_TYPE (var_decl)); > if (tn != NULL_TREE && TREE_CODE (tn) == TYPE_DECL && DECL_NAME (tn)) > { > ... > } > > This if-condition is true for both, > > int x; > const int x; > ... > > and > > typedef int i_t; > i_t x; > const i_t x; > ... > > I need to weed out the class of VAR_DECLs that directly use built > in types. Try DECL_IS_BUILTIN. But I question how you define "builtin" here? Richard. > Ciao > > Dominik ^_^ ^_^ > > -- > > Dominik Vogt > IBM Germany >
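[An editorial sketch of how Richard's suggestion might be applied to the
original test - untested; whether the front end gives the TYPE_DECLs of
built-in types a BUILTINS_LOCATION, which is what DECL_IS_BUILTIN checks, is
exactly the thing to verify.]

/* Hypothetical refinement of the original test: keep only VAR_DECLs whose
   type's TYPE_DECL was written in the source (e.g. a typedef), skipping
   compiler-generated ones.  DECL_IS_BUILTIN compares the decl's source
   location against BUILTINS_LOCATION.  */
tree tn = TYPE_NAME (TREE_TYPE (var_decl));
if (tn != NULL_TREE
    && TREE_CODE (tn) == TYPE_DECL
    && DECL_NAME (tn)
    && !DECL_IS_BUILTIN (tn))
  {
    /* ... user-declared type ... */
  }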
Re: Optimizing bit extract
On Fri, Feb 14, 2014 at 2:23 PM, Allan Sandfeld Jensen wrote: > Hello gcc > > I have been looking at optimizations of pixel-format conversion recently and > have noticed that gcc does take advantage of SSE4a extrq, BMI1 bextr TBM > bextri or BMI2 pext instructions when it could be useful. > > As far as I can tell it should not be that hard. A bextr expression can > typically be recognized as ((x >> s) & mask) or ((x << s1)) >> s2). But I am > unsure where to do such a matching since the mask needs to have specific form > to be valid for bextr, so it seems it needs to be done before instruction > selection. > > Secondly the bextr instruction in itself only replace two already fast > instructions so is very minor (unless extracting variable bit-fields which is > harder recognize). The real optimization comes from being able to use pext > (parallel bit extract), which can implement several bextr expressions in > parallel. > > So, where would be the right place to implement such instructions. Would it > make sense to recognize bextr early before we get to i386 code, or would it be > better to recognize it late. And where do I put such instruction selection > optimizations? > > Motivating example: > > unsigned rgb32_to_rgb16(unsigned rgb32) { > unsigned char red = (rgb32 >> 19) & 0x1f; > unsigned char green = (rgb32 >> 10) & 0x3f; > unsigned char blue = rgb32 & 0x1f; >return (red << 11) | (green << 5) | blue; > } > > can be implemented as pext(rgb32, 0x001f3f1f) We have a special pass that already deals with similar patterns, the "bswap" pass in tree-ssa-math-opts.c. It does symbolic execution to produce the composition of a value. It currently handles byte-shifts only I think (not shifting by 19 or 10) but this is certainly the way I'd recognize pext() (and other generic shuffles supported by vector ISAs). You'd have to extend the representation it uses to handle these more arbitrary shifts/masks of course. Richard. > Best regards > `Allan Sandfeld
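[For context, an editorial example of the kind of pattern the bswap pass in
tree-ssa-math-opts.c already recognizes - a value composed from byte-sized
pieces of another value.  Extending its symbolic execution to track the
5/6-bit pieces from the rgb32_to_rgb16 example is what is being suggested
above.]

/* A manual 32-bit byte swap; the bswap pass can turn this into a single
   bswap instruction on targets that have one.  */
unsigned int swap32 (unsigned int x)
{
  return (x >> 24)
       | ((x >> 8) & 0x0000ff00u)
       | ((x << 8) & 0x00ff0000u)
       | (x << 24);
}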
Re: gnattools cannot be built for freestanding/bare metal environment without hacking up the build machinery
On Fri, 2014-02-14 at 12:11 +, Brian Drummond wrote:
> Revisiting the MSP430 as it's now an official gcc target, this is still a
> problem here too, so there are at least 3 targets for which it's a problem.

It's a problem for all targets.

> Looking at http://gcc.gnu.org/bugzilla/show_bug.cgi?id=19959
> and comparing comments #6 and #14, perhaps this command should be
> "make -C gcc gnattools-cross" ? I'll try to do a clean build and test this
> today.

It won't work, I've tried all those so-called workarounds.  They rely on
having libada built; it will complain to say that it can't be built because
there is no libada.  See the attachments.

> But it would be better if the build "just worked" so at
>
> > if test "${ENABLE_LIBADA}" != "yes" ; then
> >   noconfigdirs="$noconfigdirs"
> > fi
>
> we need a better test here (also checking for a crosscompiler build).

I have added one.  It does work, sort of.

> > I think that we need a configure flag to disable libada and not
> > gnattools for these bare board targets.
>
> Following comment #15 in bug 19959, perhaps it's time to open a bug against
> --disable-libada.

The attachments allow you to configure the toolchain for bare metal, no RTS.
This flag is --enable-cross-gnattools; it checks if this is a cross compiler
and if it is, disables libada, but not gnattools.

There is an issue and I don't know why it does this, but it won't allow you
to build the gnattools straight away.  You have to use:

make all-gcc
make -C gnattools/ gnattools
rm gcc/stamp-tools
make -C gcc cross-gnattools
make install-gcc

This will then build.  I have not checked to see if you can still build gcc
for a normal hosted configuration with this patch, neither have I checked to
make sure --disable-libada works in hosted either.

I have built the following with these patches:

arm-none-eabi
mips-elf
msp430-elf
x86_64-elf

Not one installs libada and I have gnattools:

$ ls ~/opt/tinyada/lib/gcc/arm-none-eabi/4.9.0/
include  include-fixed  install-tools  plugin
$ ls ~/opt/tinyada/lib/gcc/mips-elf/4.9.0/
include  include-fixed  install-tools  plugin
$ ls ~/opt/tinyada/lib/gcc/msp430-elf/4.9.0/
include  include-fixed  install-tools  plugin
$ ls ~/opt/tinyada/lib/gcc/x86_64-elf/4.9.0/
include  include-fixed  install-tools  plugin

Luke.

>From f8c74c16b9f7ef3be02a9a7d3480baf88d09efd6 Mon Sep 17 00:00:00 2001
From: "Luke A. Guest"
Date: Fri, 14 Feb 2014 13:53:27 +
Subject: [PATCH 1/2] Set the target for a bare metal environment.

---
 gnattools/configure    | 9 +
 gnattools/configure.ac | 9 +
 2 files changed, 18 insertions(+)

diff --git a/gnattools/configure b/gnattools/configure
index 883b705..b5bf03d 100755
--- a/gnattools/configure
+++ b/gnattools/configure
@@ -2090,6 +2090,15 @@ case "${target}" in
 indepsw.adb

>From a2b4516f93f4d99e5ddae4c1eed78f2014f0875b Mon Sep 17 00:00:00 2001
From: "Luke A. Guest"
Date: Fri, 14 Feb 2014 13:54:29 +
Subject: [PATCH 2/2] Added --enable-cross-gnattools flag for bare metal environment.
---
 configure    | 25 ++--
 configure.ac | 63 
 2 files changed, 61 insertions(+), 27 deletions(-)

diff --git a/configure b/configure
index 749a35e..c0d706d 100755
--- a/configure
+++ b/configure
@@ -749,6 +749,7 @@ enable_ld
 enable_libquadmath
 enable_libquadmath_support
 enable_libada
+enable_cross_gnattools
 enable_libssp
 enable_libstdcxx
 enable_static_libjava
@@ -1466,6 +1467,10 @@ Optional Features:
   --disable-libquadmath-support
                           disable libquadmath support for Fortran
   --enable-libada         build libada directory
+  --enable-cross-gnattools
+                          Enable cross gnattools for cross-compiler for
+                          freestanding environment, --disable-libada is set
+                          automatically
   --enable-libssp         build libssp directory
   --disable-libstdcxx     do not build libstdc++-v3 directory
   --enable-static-libjava[=ARG]
@@ -3068,8 +3073,24 @@ else
   ENABLE_LIBADA=yes
 fi

-if test "${ENABLE_LIBADA}" != "yes" ; then
-  noconfigdirs="$noconfigdirs gnattools"
+
+# Check whether --enable-cross-gnattools was given.
+if test "${enable_cross_gnattools+set}" = set; then :
+  enableval=$enable_cross_gnattools; ENABLE_CROSS_GNATTOOLS=$enableval
+else
+  ENABLE_CROSS_GNATTOOLS=yes
+fi
+
+
+if test "${is_cross_compiler}" = "yes" && test "${ENABLE_CROSS_GNATTOOLS}" = "yes" ; then
+  if test "${target_vendor}" = "none" || test "${target_vendor}" = "unknown" ; then
+    enable_libada=no
+    ENABLE_LIBADA=$enable_libada
+  fi
+else
+  if test "${ENABLE_LIBADA}" != "yes" ; then
+    noconfigdirs="$noconfigdirs gnattools"
+  fi
 fi

 # Check whether --enable-libssp was given.
diff --git a/configure.ac b/configure.ac
index b24b33d..4fcac1a 100644
--- a/configure.ac
+++ b/configure.ac
@@ -113,11 +113,11 @@ extra_host_args=
 ### or a host dependent tool. Then put it into the appropriate list
Re: [RFC] Offloading Support in libgomp
2014-01-31 22:03 GMT+04:00 Jakub Jelinek :
> Implicit map(tofrom: a) on #pragma omp target is what the standard
> requires, so I don't see a bug on the compiler side.
> Jakub

There is an exception in the standard (page 177, lines 17-21):

> If a corresponding list item of the original list item is in the enclosing
> device data environment, the new device data environment uses the
> corresponding list item from the enclosing device data environment. No
> additional storage is allocated in the new device data environment and
> neither initialization nor assignment is performed, regardless of the
> map-type that is specified.

So, the pointer 'a' should inherit map-type ALLOC from the enclosing device
data environment.

-- Ilya
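[A small OpenMP sketch of the situation being discussed - editorial
illustration, the variable names are mine, not from the original report.
When the list item is already present in an enclosing device data
environment, the inner construct's implicit map should neither allocate nor
copy.]

#include <stdlib.h>

#define N 1024

int main (void)
{
  float *a = malloc (N * sizeof *a);

  /* The array section enters the device data environment here...  */
  #pragma omp target data map(alloc: a[0:N])
  {
    /* ...so the implicit map(tofrom: a) on this construct refers to a list
       item already present in the enclosing device data environment: per the
       quoted wording, no new storage and no copies for it.  */
    #pragma omp target
    {
      for (int i = 0; i < N; i++)
        a[i] = i;
    }
  }

  free (a);
  return 0;
}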
Re: [RFC] Offloading Support in libgomp
On Fri, Feb 14, 2014 at 07:24:16PM +0400, Ilya Verbin wrote: > 2014-01-31 22:03 GMT+04:00 Jakub Jelinek : > > Implicit map(tofrom: a) on #pragma omp target is what the standard > > requires, so I don't see a bug on the compiler side. > > Jakub > > There is an exception in the standard (page 177, lines 17-21): > > > If a corresponding list item of the original list item is in the enclosing > > device data > > environment, the new device data environment uses the corresponding list > > item from the > > enclosing device data environment. No additional storage is allocated in > > the new device > > data environment and neither initialization nor assignment is performed, > > regardless of > > the map-type that is specified. > > So, the pointer 'a' should inherit map-type ALLOC from the enclosing > device data environment. The standard itself is very unclear. I'll cite my omp-lang mail from September: > Ok, I'm for now implementing this refcounted model. > > One still unclear thing is what is supposed to happen if multiple host > threads > enter a target data construct mapping at least one same object with different > > map kind. > > Say thread A enters #pragma omp target data map(tofrom:p[:64]), then > > thread B enters #pragma omp target data map(alloc:p[:64]) while thread A is > > still running the body of it's target data (so, the mapping just increments > > refcount of the p[:64] array section), then thread A leaves the target data > > construct, decrements p[:64] refcount, but as it is non-zero, doesn't > > deallocate it, and finally thread B enters end of its target data construct > and > unmaps p[:64]. The question is, when (if ever) is the array section supposed > > to be copied back to host? Shall it be done at the end of thread's A target > > data section, or at the end of thread's B target data section (i.e. propagate > > the flag, has at least one of the mapping's requested copy from the device to > > host at the end of it's lifetime), or not copied at all? > > What if thread B doesn't request the whole array section, but only a portion > > thereof map(alloc:p[:32]) ? Would it copy the whole p[:64] array section > > back, or just a portion of it? Though, admittedly, this latter case of a > subset > might be harder to construct valid non-racy testcase for, one needs to make > > sure one of the target data constructs is always entered before the other; > > though perhaps with #pragma omp atomic and spinning it might be doable. > and will just paraphrase the Sep 9th answer I got for that, because not sure I'm allowed to repost it. The answer was that on entry the standard is pretty clear what happens, the first encountering thread/data construct allocates and optionally copies based on the flags, all others when it is already mapped do nothing. On exit, the standard is silent and none of the solutions are right, the committee will discuss it further. So, for now the implementation choice was to or in the copy from device bit. Now, you could argue this case is different, because it is not different threads, but the same thread, just nested construct on the same thread. But how to reliably differentiate that? Even if you stored some thread identification into the tree along with each mapping (what thread mapped this in), what if some other thread also does the same (outer #pragma omp target data, inner
Re: gnattools cannot be built for freestanding/bare metal environment without hacking up the build machinery
On Fri, 2014-02-14 at 15:32 +, Brian Drummond wrote: > OK I'll take a look. > Too many make and install targets; I have no idea how this process > interacts with the process specified here: > https://sourceforge.net/apps/mediawiki/mspgcc/index.php?title=Install:redhat > for the msp430, using (in short) Well, like I've said before that process won't work when you enable Ada, my patches will make it work, but you need to add the extra/new flag to configure. > make all-host > sudo make install-host > (Configure and build Newlib for MSP430) > make all-target > sudo make install-target > > Maybe that split build process actually resolves your issue... > But who can tell? : I can't say I've found adequate documentation. It really doesn't. > With the original (avr/msp430) patch/hack, gnattools were automatically > built and installed by make all-host, make install-host. Yes, but that hack is from avr-ada which, if you look back in this ml, I specifically state that it breaks normal builds. My patches, in theory, should not. Trust me, I've been trying to build these style tools for around 10 years, on and off, and they have never built correctly for these targets and any attempt to put in a bug report about it is a waste of time as someone comes along states it works and closes it, even though it doesn't. Luke.
Issue with CRTP generation under 4.8.1
I created a CRTP (curiously recurring template pattern) class and added
non-static member variables to my base class, and that works without issue.
But when I add non-static member variables to the subclass, the variables in
that class don't get initialized properly and also don't consistently keep
the same values.

I checked the sizeof the object and it does seem to have the correct size,
that is, the base class member variables plus the subclass member variables,
and the this pointer is consistently the same value.  It's just that when
dumping the values to the screen, the ones in the subclass aren't the values
that I'm expecting within my template specialization methods of the subclass.
Since multiple instances of my object are dumping the same values regardless
of the object, I'm guessing that the computation of the offset of these
variables within the object is being calculated incorrectly.

Note that 472 should have been printed and not 0.

Example of bug (the template parameter lists were mangled by the archive and
have been restored here):

#include <cstdio>

struct ParamOne { double val {0.0}; };
struct ParamTwo { int val {0}; };

template <class P, class Data, class Other>
class Baseclass
{
public:
    using subclass_type = P;
    using data_type     = Data;
    using other_type    = Other;

    bool Method( const Data &data );

public:
    int m_BaseClassValue { 304 };
};

template <class P> using pdata_type  = typename P::data_type;
template <class P> using pother_type = typename P::other_type;

template <class P, class Data, class Other>
bool Baseclass<P, Data, Other>::Method( const Data &data )
{
    P& Subclass = static_cast<P&>( *this );
    pother_type<P> other;
    other.val = 11;
    return Subclass.SubclassMethod( data, other );
}

template <class Data, class Other>
class Subclass : public Baseclass<Subclass<Data, Other>, Data, Other>
{
public:
    using data_type  = Data;
    using other_type = Other;

    bool SubclassMethod( const Data &data, Other &other );

public:
    int m_SubClassValue { 472 };
};

template <class Data, class Other>
bool Subclass<Data, Other>::SubclassMethod( const Data &data, Other &other )
{
    return true;
}

template<>
bool Subclass<ParamOne, ParamTwo>::SubclassMethod( const ParamOne &data, ParamTwo &other )
{
    printf( "The this pointer is %lx with a size of %ld and values of %d and %d\n",
            (long)this, sizeof(*this), m_BaseClassValue, m_SubClassValue );
    return true;
}

int main(int argc, char **argv)
{
    ParamOne one;
    one.val = 5.0;

    Baseclass<Subclass<ParamOne, ParamTwo>, ParamOne, ParamTwo> test;
    test.Method(one);

    return 0;
}

Output is:

The this pointer is 7fffc4e87670 with a size of 8 and values of 304 and 0

Compile options are: -g;-Wall;-std=c++11;-O0
Re: Issue with CRTP generation under 4.8.1
On 14 February 2014 07:46, Andrew Stern wrote: > I created a CRTP (Curiously recurring template pattern) > and added non-static member variables to my base class and that works without > issue. This mailing list is for development of GCC, not for bug reports or help using it. I would suggest the gcc-help list instead, but you should consider reformatting your code before you ask anyone to look at it, so it is readable. And maybe not have variables with the same names as class templates.
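[As an aside, an editorial CRTP sketch, not from the thread: in the posted
main() the object constructed is a Baseclass<Subclass<...>, ...>, not a
Subclass, so the static_cast in Method() downcasts an object that is not
actually a Subclass.  That alone could explain the stray value, independent
of any compiler issue.  The usual pattern instantiates the derived type:]

#include <cstdio>

template <class P>
struct Base
{
    bool Method()
    {
        // Valid only because *this really is the Base subobject of a P.
        return static_cast<P&>(*this).SubclassMethod();
    }
    int m_BaseClassValue { 304 };
};

struct Derived : Base<Derived>
{
    bool SubclassMethod()
    {
        std::printf("%d %d\n", m_BaseClassValue, m_SubClassValue);  // prints 304 472
        return true;
    }
    int m_SubClassValue { 472 };
};

int main()
{
    Derived d;        // the derived class is the thing constructed
    return d.Method() ? 0 : 1;
}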
Re: TYPE_BINFO and canonical types at LTO
> > This smells bad, since it is given a canonical type that is after the
> > structural equivalency merging that ignores BINFOs, so it may be a completely
> > different class with completely different bases than the original.  Bases are
> > structurally merged, too, and may be exchanged for normal fields because
> > DECL_ARTIFICIAL (that separates bases and fields) does not seem to be part of
> > the canonical type definition in LTO.
>
> Can you elaborate on that DECL_ARTIFICIAL thing?  That is, what is broken
> by considering all fields during that merging?

To make the code work with LTO, one can not merge

struct B {struct A a;};
struct B: A {};

these IMO differ only by the DECL_ARTIFICIAL flag on the fields.

> Note that the BINFO walk below only adds extra aliasing - it should
> be harmless correctness-wise, no?

If it is needed for the second case, then we will produce wrong code with LTO
when we merge the first and second case together.  If it is not needed, then
we are safe, but we don't really need the loop then.

Having a testcase where the BINFO walk is needed for correctness, I can turn
it into wrong code in LTO by interposing the class by a structure.

I am rebuilding firefox with sanity checking that the second loop never adds
anything useful.  Lets see.

> > I wonder if that code is needed after all:
> >     case QUAL_UNION_TYPE:
> >       /* Recursively record aliases for the base classes, if there are any.  */
> >       if (TYPE_BINFO (type))
> >         {
> >           int i;
> >           tree binfo, base_binfo;
> >
> >           for (binfo = TYPE_BINFO (type), i = 0;
> >                BINFO_BASE_ITERATE (binfo, i, base_binfo); i++)
> >             record_alias_subset (superset,
> >                                  get_alias_set (BINFO_TYPE (base_binfo)));
> >         }
> >       for (field = TYPE_FIELDS (type); field != 0; field = DECL_CHAIN (field))
> >         if (TREE_CODE (field) == FIELD_DECL && !DECL_NONADDRESSABLE_P (field))
> >           record_alias_subset (superset, get_alias_set (TREE_TYPE (field)));
> >       break;
> > all bases are also fields within the type, so the second loop should notice
> > all the types seen by the first loop if I am correct?
> > So perhaps the loop can be dropped in the first place.
>
> Yeah, I remember seeing that code and thinking the same multiple times.
> Though I also vaguely remember that removing that loop regressed something.
> How is virtual inheritance represented in the fields?

struct A {int a;};
struct B: virtual A {};
struct C: virtual A {};
struct D: B, C {};

struct D *d;
struct B *b;

int
t(void)
{
  d->a=1;
  return b->a;
}

It is represented as (the tree dump was mangled by the archive; the node
headers are missing):

constant 192> unit size constant 24> align 64 symtab 0 alias set 6 canonical type 0x76c58000 fields unit size align 64 symtab 0 alias set 3 canonical type 0x76c452a0 fields context full-name "struct B" needs-constructor X() X(constX&) this=(X&) n_parents=1 use_template=0 interface-unknown pointer_to_this chain > ignored decl_6 BLK file t.C line 4 col 8 size unit size align 64 offset_align 128 offset bit offset context chain ignored decl_6 BLK file t.C line 4 col 8 size unit size align 64 offset_align 128 offset bit offset context chain >> context full-name "struct D" needs-constructor X() X(constX&) this=(X&) n_parents=2 use_template=0 interface-unknown pointer_to_this chain >

You may end up with A being a field, too, but that only bypasses the
recursion.

> But I'd be happy if this BINFO user would go away ;)

Me too, indeed.

> (similar in general for the get_alias_set langhook - with LTO we
> only preserve extra alias-set zero answers from that)

I see, I was surprised we construct alias sets at LTO time; I believed we
always stream them in&out.

Honza

> Richard.
Re: [RFC][PATCH 0/5] arch: atomic rework
On Thu, Feb 13, 2014 at 08:43:01PM -0800, Torvald Riegel wrote:
> On Thu, 2014-02-13 at 18:01 -0800, Paul E. McKenney wrote:

[ . . . ]

> > Another option would be to flag the conditional expression, prohibiting
> > the compiler from optimizing out any conditional branches.  Perhaps
> > something like this:
> >
> >     r1 = atomic_load(x, memory_order_control);
> >     if (control_dependency(r1))
> >       atomic_store(y, memory_order_relaxed);
>
> That's the one I had in mind and talked to you about earlier today.  My
> gut feeling is that this is preferable over the other because it "marks"
> the if-statement, so the compiler knows exactly which branches matter.
> I'm not sure one would need the other memory order for that, if indeed
> all you want is relaxed -> branch -> relaxed.  But maybe there are
> corner cases (see the weaker-than-relaxed discussion in SG1 today).

Linus, Peter, any objections to marking places where we are relying on
ordering from control dependencies against later stores?  This approach
seems to me to have significant documentation benefits.

							Thanx, Paul
Re: [RFC] Offloading Support in libgomp
On 02/14/2014 07:43 AM, Jakub Jelinek wrote:
> So, perhaps we should just stop for now oring the copyfrom in and just use
> the copyfrom from the very first mapping only, and wait for what the
> committee actually agrees on.
>
> Richard, your thoughts on this?

I think stopping the or'ing until the issue is resolved is the best plan.

r~
Re: MSP430 in gcc4.9 ... enable interrupts?
The constructs in the *.md files are for the compiler's internal use
(i.e. there are function attributes that trigger those).  You don't need
compiler support for these opcodes at the user level; the right way is to
implement those builtins as inline assembler in a common header file:

static inline __attribute__((always_inline))
void __nop()
{
  asm volatile ("NOP");
}

static inline __attribute__((always_inline))
void __eint()
{
  asm volatile ("EINT");
}

Or more simply:

#define __eint() asm("EINT")
#define __nop() asm("NOP")

For opcodes with parameters, you use a more complex form of inline
assembler:

static inline __attribute__((always_inline))
void BIC_SR(const int x)
{
  asm volatile ("BIC.W %0,R2" :: "i" (x));
}
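[A short usage sketch - editorial, not DJ's; __dint() is defined here by
analogy with the helpers above, and the NOP-after-DINT idiom is the usual
MSP430 recommendation, so check the device manual before relying on it.]

static inline __attribute__((always_inline))
void __dint(void)
{
  asm volatile ("DINT");
}

void update_shared_counter(volatile unsigned int *counter)
{
  __dint();                 /* enter critical section: interrupts off */
  __nop();                  /* DINT is usually said to take effect one
                               instruction later, hence the NOP */
  *counter += 1;
  __eint();                 /* leave critical section: interrupts back on */
}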
Re: [RFC][PATCH 0/5] arch: atomic rework
On Fri, 2014-02-14 at 10:50 +0100, Peter Zijlstra wrote: > On Thu, Feb 13, 2014 at 09:07:55PM -0800, Torvald Riegel wrote: > > That depends on what your goal is. First, I don't know why you quoted that, but without the context, quoting it doesn't make sense. Let me repeat the point. The standard is the rule set for the compiler. Period. The compiler does not just serve the semantics that you might have in your head. It does have to do something meaningful for all of its users. Thus, the goal for the compiler is to properly compile programs in the language as specified. If there is a deficiency in the standard (bug or missing feature) -- and thus the specification, we need to have a patch for the standard that fixes this deficiency. If you think that this is the case, that's where you fix it. If your goal is to do wishful thinking, imagine some kind of semantics in your head, and then assume that magically, implementations will do just that, then that's bound to fail. > A compiler that we don't need to fight in order to generate sane code > would be nice. But as Linus said; we can continue to ignore you lot and > go on as we've done. I don't see why it's so hard to understand that you need to specify semantics, and the place (or at least the base) for that is the standard. Aren't you guys the ones replying "send a patch"? :) This isn't any different. If you're uncomfortable working with the standard, then say so, and reach out to people that aren't. You can surely ignore the specification of the language(s) that you are depending on. But that won't help you. If you want a change, get involved. (Oh, and claiming that the other side doesn't get it doesn't count as getting involved.) There's no fight between people here. It's just a technical problem that we have to solve in the right way.
Re: [RFC][PATCH 0/5] arch: atomic rework
On Fri, 2014-02-14 at 09:29 -0800, Paul E. McKenney wrote: > On Thu, Feb 13, 2014 at 08:43:01PM -0800, Torvald Riegel wrote: > > On Thu, 2014-02-13 at 18:01 -0800, Paul E. McKenney wrote: > > [ . . . ] > > > > Another option would be to flag the conditional expression, prohibiting > > > the compiler from optimizing out any conditional branches. Perhaps > > > something like this: > > > > > > r1 = atomic_load(x, memory_order_control); > > > if (control_dependency(r1)) > > > atomic_store(y, memory_order_relaxed); > > > > That's the one I had in mind and talked to you about earlier today. My > > gut feeling is that this is preferably over the other because it "marks" > > the if-statement, so the compiler knows exactly which branches matter. > > I'm not sure one would need the other memory order for that, if indeed > > all you want is relaxed -> branch -> relaxed. But maybe there are > > corner cases (see the weaker-than-relaxed discussion in SG1 today). > > Linus, Peter, any objections to marking places where we are relying on > ordering from control dependencies against later stores? This approach > seems to me to have significant documentation benefits. Let me note that at least as I'm concerned, that's just a quick idea. At least I haven't looked at (1) how to properly specify the semantics of this, (2) whether it has any bad effects on unrelated code, (3) and whether there are pitfalls for compiler implementations. It looks not too bad at first glance, though.
Re: [RFC][PATCH 0/5] arch: atomic rework
On Fri, Feb 14, 2014 at 9:29 AM, Paul E. McKenney wrote: > > Linus, Peter, any objections to marking places where we are relying on > ordering from control dependencies against later stores? This approach > seems to me to have significant documentation benefits. Quite frankly, I think it's stupid, and the "documentation" is not a benefit, it's just wrong. How would you figure out whether your added "documentation" holds true for particular branches but not others? How could you *ever* trust a compiler that makes the dependency meaningful? Again, let's keep this simple and sane: - if a compiler ever generates code where an atomic store movement is "visible" in any way, then that compiler is broken shit. I don't understand why you even argue this. Seriously, Paul, you seem to *want* to think that "broken shit" is acceptable, and that we should then add magic markers to say "now you need to *not* be broken shit". Here's a magic marker for you: DON'T USE THAT BROKEN COMPILER. And if a compiler can *prove* that whatever code movement it does cannot make a difference, then let it do so. No amount of "documentation" should matter. Seriously, this whole discussion has been completely moronic. I don't understand why you even bring shit like this up: > > r1 = atomic_load(x, memory_order_control); > > if (control_dependency(r1)) > > atomic_store(y, memory_order_relaxed); I mean, really? Anybody who writes code like that, or any compiler where that "control_dependency()" marker makes any difference what-so-ever for code generation should just be retroactively aborted. There is absolutely *zero* reason for that "control_dependency()" crap. If you ever find a reason for it, it is either because the compiler is buggy, or because the standard is so shit that we should never *ever* use the atomics. Seriously. This thread has devolved into some kind of "just what kind of idiotic compiler cesspool crap could we accept". Get away from that f*cking mindset. We don't accept *any* crap. Why are we still discussing this idiocy? It's irrelevant. If the standard really allows random store speculation, the standard doesn't matter, and sane people shouldn't waste their time arguing about it. Linus
Re: [RFC][PATCH 0/5] arch: atomic rework
On Fri, Feb 14, 2014 at 11:50 AM, Linus Torvalds wrote: > > Why are we still discussing this idiocy? It's irrelevant. If the > standard really allows random store speculation, the standard doesn't > matter, and sane people shouldn't waste their time arguing about it. Btw, the other part of this coin is that our manual types (using volatile and various architecture-specific stuff) and our manual barriers and inline asm accesses are generally *fine*. The C11 stuff doesn't buy us anything. The argument that "new architectures might want to use it" is prue and utter bollocks, since unless the standard gets the thing *right*, nobody sane would ever use it for some new architecture, when the sane thing to do is to just fill in the normal barriers and inline asms. So I'm very very serious: either the compiler and the standard gets things right, or we don't use it. There is no middle ground where "we might use it for one or two architectures and add random hints". That's just stupid. The only "middle ground" is about which compiler version we end up trusting _if_ it turns out that the compiler and standard do get things right. From Torvald's explanations (once I don't mis-read them ;), my take-away so far has actually been that the standard *does* get things right, but I do know from over-long personal experience that compiler people sometimes want to be legalistic and twist the documentation to the breaking point, at which point we just go "we'd be crazy do use that". See our use of "-fno-strict-aliasing", for example. The C standard aliasing rules are a mistake, stupid, and wrong, and gcc uses those stupid type-based alias rules even when statically *proving* the aliasing gives the opposite result. End result: we turn the shit off. Exact same deal wrt atomics. We are *not* going to add crazy "this here is a control dependency" crap. There's a test, the compiler *sees* the control dependency for chrissake, and it still generates crap, we turn that broken "optimization" off. It really is that simple. Linus
Re: MSP430 in gcc4.9 ... enable interrupts?
On Fri, 2014-02-14 at 14:17 -0500, DJ Delorie wrote: > The constructs in the *.md files are for the compiler's internal use > (i.e. there are function attributes that trigger those). You don't > need compiler support for these opcodes at the user level; the right > way is to implement those builtins as inline assembler > static inline __attribute__((always_inline)) > void __nop() > { > asm volatile ("NOP"); > } Thanks for the answer. I thought I was missing something, but apparently not. Now it's clear that inline assembler is the official way, I can adapt that approach to my situation. - Brian
Re: TYPE_BINFO and canonical types at LTO
> > > This smells bad, since it is given a canonical type that is after the
> > > structural equivalency merging that ignores BINFOs, so it may be a completely
> > > different class with completely different bases than the original.  Bases are
> > > structurally merged, too, and may be exchanged for normal fields because
> > > DECL_ARTIFICIAL (that separates bases and fields) does not seem to be part of
> > > the canonical type definition in LTO.
> >
> > Can you elaborate on that DECL_ARTIFICIAL thing?  That is, what is broken
> > by considering all fields during that merging?
>
> To make the code work with LTO, one can not merge
>
> struct B {struct A a;};
> struct B: A {};
>
> these IMO differ only by the DECL_ARTIFICIAL flag on the fields.
>
> > Note that the BINFO walk below only adds extra aliasing - it should
> > be harmless correctness-wise, no?
>
> If it is needed for the second case, then we will produce wrong code with LTO
> when we merge the first and second case together.  If it is not needed, then
> we are safe, but we don't really need the loop then.
>
> Having a testcase where the BINFO walk is needed for correctness, I can turn
> it into wrong code in LTO by interposing the class by a structure.
>
> I am rebuilding firefox with sanity checking that the second loop never adds
> anything useful.  Lets see.

The code really seems to be adding only cases of zero-sized classes.  I use
the following hack in my tree.  I do not know how to discover a zero-sized
class, so I test for a unit size of 1, but I think if there were other cases
we would notice anyway.

The reason why I am looking into this is that I am trying to evaluate how
expensive BINFOs are.  I am trying to keep only those that matter for debug
info and devirtualization, and avoid streaming the others.

Honza

Index: alias.c
===================================================================
--- alias.c     (revision 20)
+++ alias.c     (working copy)
@@ -995,20 +996,40 @@ record_component_aliases (tree type)
     case RECORD_TYPE:
     case UNION_TYPE:
     case QUAL_UNION_TYPE:
-      /* Recursively record aliases for the base classes, if there are any.  */
-      if (TYPE_BINFO (type))
-	{
-	  int i;
-	  tree binfo, base_binfo;
-
-	  for (binfo = TYPE_BINFO (type), i = 0;
-	       BINFO_BASE_ITERATE (binfo, i, base_binfo); i++)
-	    record_alias_subset (superset,
-				 get_alias_set (BINFO_TYPE (base_binfo)));
-	}
-      for (field = TYPE_FIELDS (type); field != 0; field = DECL_CHAIN (field))
-	if (TREE_CODE (field) == FIELD_DECL && !DECL_NONADDRESSABLE_P (field))
-	  record_alias_subset (superset, get_alias_set (TREE_TYPE (field)));
+      {
+#ifdef ENABLE_CHECKING
+	bitmap_obstack my_obstack;
+	bitmap_obstack_initialize (&my_obstack);
+	bitmap added = BITMAP_ALLOC (&my_obstack);
+#endif
+	alias_set_type t;
+	for (field = TYPE_FIELDS (type); field != 0; field = DECL_CHAIN (field))
+	  if (TREE_CODE (field) == FIELD_DECL && !DECL_NONADDRESSABLE_P (field))
+	    {
+	      t = get_alias_set (TREE_TYPE (field));
+#ifdef ENABLE_CHECKING
+	      bitmap_set_bit (added, t);
+#endif
+	      record_alias_subset (superset, t);
+	    }
+#ifdef ENABLE_CHECKING
+	/* Recursively record aliases for the base classes, if there are any.  */
+	if (!in_lto_p && TYPE_BINFO (type))
+	  {
+	    int i;
+	    tree binfo, base_binfo;
+
+	    for (binfo = TYPE_BINFO (type), i = 0;
+		 BINFO_BASE_ITERATE (binfo, i, base_binfo); i++)
+	      if (!bitmap_bit_p (added, get_alias_set (BINFO_TYPE (base_binfo))))
+		{
+		  gcc_assert (integer_onep (TYPE_SIZE_UNIT (BINFO_TYPE (base_binfo))));
+		}
+	  }
+	BITMAP_FREE (added);
+	bitmap_obstack_release (&my_obstack);
+#endif
+      }
       break;

     case COMPLEX_TYPE:
Re: [RFC][PATCH 0/5] arch: atomic rework
On Fri, Feb 14, 2014 at 12:02:23PM -0800, Linus Torvalds wrote: > On Fri, Feb 14, 2014 at 11:50 AM, Linus Torvalds > wrote: > > > > Why are we still discussing this idiocy? It's irrelevant. If the > > standard really allows random store speculation, the standard doesn't > > matter, and sane people shouldn't waste their time arguing about it. > > Btw, the other part of this coin is that our manual types (using > volatile and various architecture-specific stuff) and our manual > barriers and inline asm accesses are generally *fine*. > > The C11 stuff doesn't buy us anything. The argument that "new > architectures might want to use it" is prue and utter bollocks, since > unless the standard gets the thing *right*, nobody sane would ever use > it for some new architecture, when the sane thing to do is to just > fill in the normal barriers and inline asms. > > So I'm very very serious: either the compiler and the standard gets > things right, or we don't use it. There is no middle ground where "we > might use it for one or two architectures and add random hints". > That's just stupid. > > The only "middle ground" is about which compiler version we end up > trusting _if_ it turns out that the compiler and standard do get > things right. From Torvald's explanations (once I don't mis-read them > ;), my take-away so far has actually been that the standard *does* get > things right, but I do know from over-long personal experience that > compiler people sometimes want to be legalistic and twist the > documentation to the breaking point, at which point we just go "we'd > be crazy do use that". > > See our use of "-fno-strict-aliasing", for example. The C standard > aliasing rules are a mistake, stupid, and wrong, and gcc uses those > stupid type-based alias rules even when statically *proving* the > aliasing gives the opposite result. End result: we turn the shit off. > > Exact same deal wrt atomics. We are *not* going to add crazy "this > here is a control dependency" crap. There's a test, the compiler > *sees* the control dependency for chrissake, and it still generates > crap, we turn that broken "optimization" off. It really is that > simple. >From what I can see at the moment, the standard -generally- avoids speculative stores, but there are a few corner cases where it might allow them. I will be working with the committee to see exactly what the situation is. Might be that I am confused and that everything really is OK, might be that I am right but the corner cases are things that no sane kernel developer would do anyway, it might be that the standard needs a bit of repair, or it might be that the corner cases are somehow inherent and problematic (but I hope not!). I will let you know what I find, but it will probably be a few months. In the meantime, agreed, we keep doing what we have been doing. And maybe in the long term as well, for that matter. One way of looking at the discussion between Torvald and myself would be as a seller (Torvald) and a buyer (me) haggling over the fine print in a proposed contract (the standard). Whether that makes you feel better or worse about the situation I cannot say. ;-) Thanx, Paul
Re: asking your advice about bug
On 02/12/2014 11:51 AM, Roman Gareev wrote:

Hi Roman,

thanks for the quick feedback!

> I've found out that this bug appeared in revision 189156
> (svn://gcc.gnu.org/svn/gcc/trunk) and a similar error message appeared in
> revision 191757 (svn://gcc.gnu.org/svn/gcc/trunk) (maybe it's because of
> changes in diagnostic.c).  If subtract_commutative_associative_deps, a
> function located in gcc/graphite-dependences.c, is commented out, the error
> will disappear.  I am trying to find a bug in this function now.  Could you
> please answer a few questions about it?
>
> 1) Where can I find the algorithm for finding associative commutative
> reductions, which was used in subtract_commutative_associative_deps?  It
> seems as if there is no such description available.

This is a problem by itself and we should probably add documentation about
what is going on.  Unfortunately, I did not write the code and I also don't
really get what it is doing.  The intuition seems to be that the dependences
between a set of reduction statements are computed and those dependences are
then removed from the overall set of dependences.  This is in general a good
idea, but the code could need some improvements and fixes.  Several things
look shady here.  Here is one example:

1) We only remove dependences, but we should also add new ones

Let 'a->a' be reduction dependences; then dependences between 'a' can only be
removed in case we add new dependences between 'b1' and all 'a', and between
all 'a' and 'b2'.

b1
 |
 a -> a -> a
 |
b2

This should probably be fixed, but I don't think this is the problem of the
current bug report.  In fact, to fix the bug report, I don't even think we
need to understand the full algorithm.  The first question to ask is: Why are
we segfaulting?  Which statement is causing the segfault?

> 2) What is the number returned by isl_union_map_compute_flow?  (I haven't
> found its description in "Integer Set Library: Manual")
>
> 3) I've found the following terms in subtract_commutative_associative_deps:
> "may accesses", "must access".  "Integer Set Library: Manual" gives the
> following definition: "If any of the source accesses are marked as being may
> accesses, then there will be a dependence to the last must access and to any
> may access that follows this last must access".  Could you please describe
> their meaning?  Are they related to transitively-covered dependences?

Thanks Sven for answering those.

Tobias
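[For readers unfamiliar with the terminology, an editorial example of the
kind of commutative/associative reduction this function is concerned with;
how graphite actually models it is exactly what the questions above ask.]

/* A plain sum reduction: each iteration reads and writes 's', creating the
   a -> a -> a chain of dependences mentioned above.  Because integer
   addition is commutative and associative, the iterations may be reordered
   without changing the result, which is why such dependences can be treated
   specially ('s = 0' plays the role of b1, 'return s' the role of b2).  */
long sum_array (const int *a, int n)
{
  long s = 0;
  for (int i = 0; i < n; i++)
    s += a[i];
  return s;
}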
Re: [RFC][PATCH 0/5] arch: atomic rework
On Fri, Feb 14, 2014 at 6:08 PM, Paul E. McKenney wrote: > > One way of looking at the discussion between Torvald and myself would be > as a seller (Torvald) and a buyer (me) haggling over the fine print in > a proposed contract (the standard). Whether that makes you feel better > or worse about the situation I cannot say. ;-) Oh, I'm perfectly fine with that. But we're not desperate to buy, and I actually think the C11 people are - or at least should be - *way* more desperate to sell their model to us than we are to take it. Which means that as a buyer you should say "this is what we want, if you don't give us this, we'll just walk away". Not try to see how much we can pay for it. Because there is very little upside for us, and _unless_ the C11 standard gets things right it's just extra complexity for us, coupled with new compiler fragility and years of code generation bugs. Why would we want that extra complexity and inevitable compiler bugs? If we then have to fight compiler writers that point to the standard and say "..but look, the standard says we can do this", then at that point it went from "extra complexity and compiler bugs" to a whole 'nother level of frustration and pain. So just walk away unless the C11 standard gives us exactly what we want. Not "something kind of like what we'd use". EXACTLY. Because I'm not in the least interested in fighting compiler people that have a crappy standard they can point to. Been there, done that, got the T-shirt and learnt my lesson. And the thing is, I suspect that the Linux kernel is the most complete - and most serious - user of true atomics that the C11 people can sell their solution to. If we don't buy it, they have no serious user. Sure, they'll have lots of random other one-off users for their atomics, where each user wants one particular thing, but I suspect that we'll have the only really unified portable code base that handles pretty much *all* the serious odd cases that the C11 atomics can actually talk about to each other. Oh, they'll push things through with or without us, and it will be a collection of random stuff, where they tried to please everybody, with particularly compiler/architecture people who have no f*cking clue about how their stuff is used pushing to make it easy/efficient for their particular compiler/architecture. But we have real optimized uses of pretty much all relevant cases that people actually care about. We can walk away from them, and not really lose anything but a small convenience (and it's a convenience *only* if the standard gets things right). And conversely, the C11 people can walk away from us too. But if they can't make us happy (and by "make us happy", I really mean no stupid games on our part) I personally think they'll have a stronger standard, and a real use case, and real arguments. I'm assuming they want that. That's why I complain when you talk about things like marking control dependencies explicitly. That's *us* bending over backwards. And as a buyer, we have absolutely zero reason to do that. Tell the C11 people: "no speculative writes". Full stop. End of story. Because we're not buying anything else. Similarly, if we need to mark atomics "volatile", then now the C11 atomics are no longer even a "small convenience", now they are just extra complexity wrt what we already have. So just make it clear that if the C11 standard needs to mark atomics volatile in order to get non-speculative and non-reloading behavior, then the C11 atomics are useless to us, and we're not buying. 
Remember: a compiler can *always* do "as if" optimizations - if a compiler writer can prove that the end result acts 100% the same using an optimized sequence, then they can do whatever the hell they want. That's not the issue. But if we can *ever* see semantic impact of speculative writes, the compiler is buggy, and the compiler writers need to be aware that it is buggy. No ifs, buts, maybes about it. So I'm perfectly fine with you seeing yourself as a buyer. But I want you to be a really *picky* and anal buyer - one that knows he has the upper hand, and can walk away with no downside. Linus
Re: [RFC][PATCH 0/5] arch: atomic rework
On Fri, Feb 14, 2014 at 6:44 PM, Linus Torvalds wrote: > > And conversely, the C11 people can walk away from us too. But if they > can't make us happy (and by "make us happy", I really mean no stupid > games on our part) I personally think they'll have a stronger > standard, and a real use case, and real arguments. I'm assuming they > want that. I should have somebody who proof-reads my emails before I send them out. I obviously meant "if they *can* make us happy" (not "can't"). Linus
Re: [RFC][PATCH 0/5] arch: atomic rework
On Fri, Feb 14, 2014 at 06:48:02PM -0800, Linus Torvalds wrote: > On Fri, Feb 14, 2014 at 6:44 PM, Linus Torvalds > wrote: > > > > And conversely, the C11 people can walk away from us too. But if they > > can't make us happy (and by "make us happy", I really mean no stupid > > games on our part) I personally think they'll have a stronger > > standard, and a real use case, and real arguments. I'm assuming they > > want that. > > I should have somebody who proof-reads my emails before I send them out. > > I obviously meant "if they *can* make us happy" (not "can't"). Understood. My next step is to take a more detailed look at the piece of the standard that should support RCU. Depending on how that turns out, I might look at other parts of the standard vs. Linux's atomics and memory-ordering needs. Should be interesting. ;-) Thanx, Paul
Re: [RFC][PATCH 0/5] arch: atomic rework
On Fri, Feb 14, 2014 at 10:35:44PM -0800, Paul E. McKenney wrote: > On Fri, Feb 14, 2014 at 06:48:02PM -0800, Linus Torvalds wrote: > > On Fri, Feb 14, 2014 at 6:44 PM, Linus Torvalds > > wrote: > > > > > > And conversely, the C11 people can walk away from us too. But if they > > > can't make us happy (and by "make us happy", I really mean no stupid > > > games on our part) I personally think they'll have a stronger > > > standard, and a real use case, and real arguments. I'm assuming they > > > want that. > > > > I should have somebody who proof-reads my emails before I send them out. > > > > I obviously meant "if they *can* make us happy" (not "can't"). > > Understood. My next step is to take a more detailed look at the piece > of the standard that should support RCU. Depending on how that turns > out, I might look at other parts of the standard vs. Linux's atomics > and memory-ordering needs. Should be interesting. ;-) And perhaps a better way to represent the roles is that I am not the buyer, but rather the purchasing agent for the -potential- buyer. -You- are of course the potential buyer. If I were to see myself as the buyer, then I must confess that the concerns you implicitly expressed in your prior email would be all too well-founded! Thanx, Paul