Re: [RFC] Meta-description for tree and gimple folding

2014-02-28 Thread Basile Starynkevitch
On Thu, 2014-02-27 at 15:34 +0100, Richard Biener wrote:
> 
> I've been hacking on a prototype that generates matching and 
> simplification code from a meta-description. 

For what it is worth, MELT has a similar feature. http://gcc-melt.org/

regards

-- 
Basile STARYNKEVITCH http://starynkevitch.net/Basile/
email: basilestarynkevitchnet mobile: +33 6 8501 2359
8, rue de la Faiencerie, 92340 Bourg La Reine, France
*** opinions {are only mine, sont seulement les miennes} ***




Re: Asm volatile causing performance regressions on ARM

2014-02-28 Thread Eric Botcazou
> I think part of the problem is that some parts of GCC (like the one you
> noted) are far more conservative than others.  E.g. take:
> 
>   void foo (int x, int *y)
>   {
> y[0] = x + 1;
> asm volatile ("# asm");
> y[1] = x + 1;
>   }
> 
> The extra-paranoid check you pointed out means that we assume that
> x + 1 is no longer available after the asm for rtx-level CSE, but take
> the opposite view for tree-level CSE, which happily optimises away the
> second +.

Right, I think that it should be possible to (partially) achieve this for RTL 
CSE without totally getting rid of the invalidation.

> One of the big grey areas is what should happen for floating-point ops
> that depend on the current rounding mode.  That isn't really modelled
> properly yet though.  Again, it affects calls as well as volatile asms.

There is an explicit comment about this in the scheduler:

case ASM_OPERANDS:
case ASM_INPUT:
  {
/* Traditional and volatile asm instructions must be considered to use
   and clobber all hard registers, all pseudo-registers and all of
   memory.  So must TRAP_IF and UNSPEC_VOLATILE operations.

   Consider for instance a volatile asm that changes the fpu rounding
   mode.  An insn should not be moved across this even if it only uses
   pseudo-regs because it might give an incorrectly rounded result.  */
if (code != ASM_OPERANDS || MEM_VOLATILE_P (x))
  reg_pending_barrier = TRUE_BARRIER;

-- 
Eric Botcazou


Re: Asm volatile causing performance regressions on ARM

2014-02-28 Thread Eric Botcazou
>  So, the main question is not about triggering condition, but about the
> behavior itself. Is it correct to flush and reload all constants ? They are
> constants after all, they are even not stored in .data section but inlined
> in the code, and thus cannot be modified.

I'd think that everyone more or less agrees that this behavior of RTL CSE is 
overly conservative and ought to be relaxed.  The question is whether this can 
be done in a sensible way without totally getting rid of the barrier.

-- 
Eric Botcazou


Re: Asm volatile causing performance regressions on ARM

2014-02-28 Thread Richard Biener
On Fri, Feb 28, 2014 at 9:51 AM, Eric Botcazou  wrote:
>>  So, the main question is not about triggering condition, but about the
>> behavior itself. Is it correct to flush and reload all constants ? They are
>> constants after all, they are even not stored in .data section but inlined
>> in the code, and thus cannot be modified.
>
> I'd think that everyone more or less agrees that this behavior of RTL CSE is
> overly conservative and ought to be relaxed.  The question is whether this can
> be done in a sensible way without totally getting rid of the barrier.

Of course if the GIMPLE level doesn't care about the barrier then it doesn't
make sense to be overly conservative at the RTL CSE level.  Thus I think
we can just remove this barrier completely.

[GIMPLE of course respects the volatile in that it doesn't remove the asm
and it respects the "memory" clobber as a barrier for memory optimizations
(but it does _not_ consider it a memory barrier for automatic variables that
do not have their address taken!)]

Richard.

> --
> Eric Botcazou


Re: Asm volatile causing performance regressions on ARM

2014-02-28 Thread Georg-Johann Lay

On 02/27/2014 06:03 PM, Richard Sandiford wrote:

Yury Gribov  writes:

Richard Biener wrote:

If this behavior is not intended, what would be the best way to fix
performance? I could teach GCC to not remove constant RTXs in
flush_hash_table() but this is probably very naive and won't cover some
corner-cases.


That could be a good starting point though.


Though with modifying "machine state" you can modify constants as well, no?


Valid point but this would mean relying on the compiler to always load all
constants from memory (instead of, say, generating them via movhi/movlo)
for a piece of code which looks extremely unstable.


Right.  And constant rtx codes have mode-independent semantics.
(const_int 1) is always 1, whatever a volatile asm does.  Same for
const_double, symbol_ref, label_ref, etc.  If a constant load is implemented
using some mode-dependent operation then it would need to be represented
as something like an unspec instead.  But even then, the result would
usually be annotated with a REG_EQUAL note giving the value of the final
register result.  It should be perfectly OK to reuse that register after
a volatile asm if the value in the REG_EQUAL note is needed again.


What is the general attitude towards volatile asm? Are people interested
in making it more defined/performant or should we just leave this can of
worms as is? I can try to improve generated code but my patches will be
doomed if there is no consensus on what volatile asm actually means...


I think part of the problem is that some parts of GCC (like the one you
noted) are far more conservative than others.  E.g. take:

   void foo (int x, int *y)
   {
 y[0] = x + 1;
 asm volatile ("# asm");
 y[1] = x + 1;
   }

The extra-paranoid check you pointed out means that we assume that
x + 1 is no longer available after the asm for rtx-level CSE, but take
the opposite view for tree-level CSE, which happily optimises away the
second +.

Some places were (maybe still are) worried that volatile asms could
clobber any register they like.  But the register allocator assumes that
registers are preserved across volatile asms unless explicitly clobbered.
And AFAIK it always has.  So in the above example we get:

 addl$1, %edi
 movl%edi, (%rsi)
#APP
# 4 "/tmp/foo.c" 1
 # asm
# 0 "" 2
#NO_APP
 movl%edi, 4(%rsi)
 ret

with %edi being live across the asm.

We do nothing this draconian for a normal function call, which could
easily use a volatile asm internally.  IMO anything that isn't flushed
for a call shouldn't be flushed for a volatile asm either.


Disagree here.

Just take the example I gave in my other mail, worked out a bit here:

extern double __attribute__((const)) costly_func (double);
extern void func (void);

extern double X;

void code1_func (double x)
{
 func ();
 X = costly_func (x);
}

void code2_func (double x)
{
 double temp_1 = costly_func (x);
 func ();
 X = temp_1;
}

void code1_asm (double x)
{
 asm volatile ("disable interrupts":::"memory");
 // some atomic operations not involving x or X or costly_func
 asm volatile ("re-enable interrupts":::"memory");

 X = costly_func (x);
}

void code2_asm (double x)
{
 asm volatile ("disable interrupts":::"memory");
 double temp_1 = costly_func (x);
 // some atomic operations not involving x or X or costly_func
 asm volatile ("re-enable interrupts":::"memory");

 X = temp_1;
}

costly_func can be moved across func as usual, i.e. the transformation from 
code1_func to code2_func is all right.


But this is no longer the case if we move costly_func across an asm barrier 
instead of across an ordinary function.


Now observe the impact on interrupt response times.  This is a side effect not 
covered by the standards' notion of "side effects", but it is part of "any 
weird side effect" that an asm barrier can have.


Whereas code1_asm has no big impact on interrupt response times (provided the 
code in the comments is small), code2_asm is disastrous w.r.t. interrupt response times.


Notice that in code1, func might contain such asm-pairs to implement atomic 
operations, but moving costly_func across func does *not* affect the interrupt 
response times in such a disastrous way.


Thus you must be *very* careful w.r.t. optimizing against asm volatile + memory 
clobber.  It's too easy to miss some side effects of *real* code.


And this is just one example showing that moving const functions across an asm 
is not sane.  Setting the floating-point rounding mode might be another, and 
there are surely many other cases that neither you nor I imagine.  Thus always 
keep in mind that GCC is supposed to be a compiler that shall generate code for 
*real* machines with real *weird* hardware or requirements.


Optimizing real code into scrap while pointing to some GCC-internal reasoning 
or some standard's wording does not help.


Johann



Re: Asm volatile causing performance regressions on ARM

2014-02-28 Thread Eric Botcazou
> Of course if the GIMPLE level doesn't care about the barrier then it doesn't
> make sense to be overly conservative at the RTL CSE level.  Thus I think we
> can just remove this barrier completely.

Not clear to me, what happens e.g. for register variables?

-- 
Eric Botcazou


Re: Asm volatile causing performance regressions on ARM

2014-02-28 Thread Richard Biener
On Fri, Feb 28, 2014 at 10:23 AM, Eric Botcazou  wrote:
>> Of course if the GIMPLE level doesn't care about the barrier then it doesn't
>> make sense to be overly conservative at the RTL CSE level.  Thus I think we
>> can just remove this barrier completely.
>
> Not clear to me, what happens e.g. for register variables?

Not sure, on GIMPLE some passes explicitly avoid doing some transforms
for DECL_HARD_REGISTER variables, some don't.  A volatile asm is
certainly not considered a barrier here (DECL_HARD_REGISTER vars are
not written into SSA form, but they do not have their address taken
and thus the volatile asm memory clobber isn't a barrier for them).

Richard.


Handling error conditions in libgomp

2014-02-28 Thread Thomas Schwinge
Hi!

Currently when libgomp hits an error (that is, an internal error,
possibly due to the user doing "stupid" things), after printing a message
libgomp/error.c:gomp_fatal terminates the executing process:

void
gomp_fatal (const char *fmt, ...)
{
  va_list list;

  va_start (list, fmt);
  gomp_verror (fmt, list);
  va_end (list);

  exit (EXIT_FAILURE);
}

The process cannot recover from this and try to continue despite the
error.  (It is of course questionable what exactly to do in this case, as
libgomp's internal state may now be corrupt.)  So far, such errors may
have been rare (aside from real bugs, only/primarily dynamic resource
exhaustion?), but with the advent of libgomp using external modules (plugin
interface, for acceleration devices) I expect them to become more
frequent.

That aside, it is also a hurdle for ourselves if in libgomp's testsuite
we want to test that the library does the right thing (abort) for
non-sensible requests.

Does it make sense to add the option for the user to install a handler
for this?  Three quick ideas, all untested: generally use abort in
libgomp, which can then be caught with a SIGABRT handler that the user
has set up (difficult to communicate information from libgomp to the
handler); adding a weak function GOMP_error that the user can provide and
that libgomp will call in presence of an error; or provide some GOMP_init
function for registering an error handler.  The actual interface might be
something like: an enum to indicate the class (severity?) of the error, a
const char* for an error message that libgomp has generated (possibly
forwarded from a plugin), and a boolean return value to tell libgomp to
either continue or terminate the process.  Then, libgomp's internal
initialization could also be made more explicit, so that it can be
reinitialized after an error occurred.  It makes sense that the default
remains to terminate the process in the presence of an error.

I have not yet researched what other OpenACC or OpenMP implementations
are doing, or other compiler support/runtime libraries.

Thoughts?


Regards,
 Thomas




Re: Asm volatile causing performance regressions on ARM

2014-02-28 Thread Richard Sandiford
Eric Botcazou  writes:
>> One of the big grey areas is what should happen for floating-point ops
>> that depend on the current rounding mode.  That isn't really modelled
>> properly yet though.  Again, it affects calls as well as volatile asms.
>
> There is an explicit comment about this in the scheduler:
>
> case ASM_OPERANDS:
> case ASM_INPUT:
>   {
>   /* Traditional and volatile asm instructions must be considered to use
>  and clobber all hard registers, all pseudo-registers and all of
>  memory.  So must TRAP_IF and UNSPEC_VOLATILE operations.
>
>  Consider for instance a volatile asm that changes the fpu rounding
>  mode.  An insn should not be moved across this even if it only uses
>  pseudo-regs because it might give an incorrectly rounded result.  */
>   if (code != ASM_OPERANDS || MEM_VOLATILE_P (x))
> reg_pending_barrier = TRUE_BARRIER;

But here too the point is that we don't assume the same thing at the
tree level or during register allocation.  It seems a bit silly for
the scheduler to assume that all hard registers are clobbered when the
register allocator itself doesn't assume that.  And most rtl passes
assume that changes to pseudo registers are explicitly modelled via SETs
or CLOBBERs.

E.g. to slightly adjust my previous example:

  void foo (float *dest, float x, float y)
  {
dest[0] = x + y;
asm volatile ("# foo");
dest[1] = x + y;
  }

gives:

addss   %xmm1, %xmm0
movss   %xmm0, (%rdi)
#APP
# 4 "/tmp/foo.c" 1
# foo
# 0 "" 2
#NO_APP
movss   %xmm0, 4(%rdi)
ret

And we certainly don't consider volatile asms to clobber memory in other
places.  E.g.:

  void foo (float *dest, float x, float y)
  {
dest[0] = x + y;
asm volatile ("# foo");
dest[1] = dest[0];
  }

gives the same code as above.  In contrast:

  void foo (float *dest, float x, float y)
  {
dest[0] = x + y;
asm volatile ("# foo" ::: "memory");
dest[1] = dest[0];
  }

_does_ force dest[0] to be reloaded.

As far as it being a scheduling barrier, consider something like:

  void foo (float *dest, float x, float y)
  {
int i;
for (i = 0; i < 100; i++)
  {
asm volatile ("# a");
dest[i] = x + y;
asm volatile ("# b");
  }
  }

At -O2 the addition is hoisted out of the loop.

Obviously the scheduler is free to be extra conservative but I don't
think the comment describes the semantics of volatile asms.

Thanks,
Richard


Re: Asm volatile causing performance regressions on ARM

2014-02-28 Thread Richard Sandiford
Georg-Johann Lay  writes:
> Notice that in code1, func might contain such asm-pairs to implement
> atomic operations, but moving costly_func across func does *not*
> affect the interrupt respond times in such a disastrous way.
>
> Thus you must be *very* careful w.r.t. optimizing against asm volatile
> + memory clobber.  It's too easy to miss some side effects of *real*
> code.

I understand the example, but I don't think volatile asms guarantee
what you want here.

> Optimizing code to scrap and pointing to some GCC internal reasoning or some 
> standard's wording does not help with real code.

But how else can a compiler work?  It doesn't just regurgitate canned code,
so it can't rely on human intuition as to what "makes sense".  We have to
have specific rules and guarantees and say that anything outside those
rules and guarantees is undefined.

It sounds like you want an asm with an extra-strong ordering guarantee.
I think that would need to be an extension, since it would need to consider
cases where the asm is used in a function.  (Shades of carries_dependence
or whatever in the huge atomic thread.)  I think anything where:

  void foo (void) { X; }
  void bar (void) { Y1; foo (); Y2; }

has different semantics from:

  void bar (void) { Y1; X; Y2; }

is very dangerous.  And assuming that any function call could enable
or disable interrupts, and therefore that nothing can be moved across
a non-const function call, would limit things a bit too much.

Thanks,
Richard



Re: Asm volatile causing performance regressions on ARM

2014-02-28 Thread Eric Botcazou
> But here too the point is that we don't assume the same thing at the
> tree level or during register allocation.  It seems a bit silly for
> the scheduler to assume that all hard registers are clobbered when the
> register allocator itself doesn't assume that.  And most rtl passes
> assume that changes to pseudo registers are explicitly modelled via SETs
> or CLOBBERs.

OK, let's declare volatile asms dead as full optimization barriers then, which 
means that they will need to be taken out of volatile_insn_p eventually and 
the comments in the RTL middle-end adjusted.

In the short term, i.e. for 4.8.x/4.9.x, we cannot simply revert the patch 
because it has been written to catch UNSPEC_VOLATILE.  So we could devise 
another, temporary predicate in rtlanal.c that would be volatile_insn_p minus 
volatile asms and invoke it from the places in cse.c, cselib.c and dse.c which 
have been changed by the patch.

-- 
Eric Botcazou


Re: [RFC] Meta-description for tree and gimple folding

2014-02-28 Thread Marc Glisse

On Thu, 27 Feb 2014, Richard Biener wrote:


I've been hacking on a prototype that generates matching and
simplification code from a meta-description.  The goal is
to provide a single source of transforms currently spread
over the compiler, mostly fold-const.c, gimple-fold.c and
tree-ssa-forwprop.c.  Another goal is to make these transforms
(which are most of the form of generating a simpler form of
a pattern-matched IL piece) more readily available to passes
like value-numbering so they can be used on-the-fly, using
information provided by the pass lattice.  The ultimate
goal is to generate (most of) fold-const.c and gimple-fold.c
and tree-ssa-forwprop.c from a single meta description.

[...]

Comments or suggestions?


It is hard to judge from such simple examples. What if I want to do the 
transformation only if CST+CST' doesn't overflow? Can I make it handle 
commutative operations cleverly? How do I restrict some subexpression to 
have a single use? If I want to check some flag_*, do I have to create a 
predicate that tests for that as well as whatever other test the specific 
operand needed? Won't valueize create a lot of not-so-necessary trees when 
used on gimple? If I write a COND_EXPR matcher, could it generate code for 
phiopt as well? What is the point of gimple_match (without 
_and_substitute)? Will it clean up the new dead statements after itself? 
Can I write (PLUS_EXPR@0 @1 @2) so I can still refer to this subexpression 
later? How do you handle a transformation that currently tries to 
recursively fold something else and does the main transformation only if 
that simplified?



I guess most of the answers will become obvious once you have converted a 
few existing transformations and I can look at those...


Since this is going to require a rewrite of many transformations, I wonder 
if there are things that should be improved before starting so we can 
write the right tests directly. For A+B-B, we can do the transformation if 
the type is an integer for which overflow either wraps or is undefined or 
traps-but-it-is-ok-to-remove-those-traps (but not if it traps and we 
promise to preserve traps, or if the type saturates), or the type is a 
float and we have the unsafe math flag, or the type is a vector/complex 
and the transformation is valid for the element type. Ok, maybe not the 
best example, but we have for instance -ftrapping-math for which a split 
was discussed (PR 53805#c4).
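The A+B-B case above can be written down concretely (a hedged sketch; which flags allow the fold is summarized from the paragraph above, not from GCC source):

```c
/* Sketch: whether a + b - b may be folded to a depends on the type's
   overflow semantics.  For signed int, GCC's default (overflow is
   undefined) and -fwrapv (overflow wraps) both make the fold valid;
   trapping-and-traps-must-be-preserved or saturating types forbid it,
   and floats additionally need the unsafe-math flag.  */
int
fold_candidate (int a, int b)
{
  return a + b - b;   /* foldable to 'a' under wrapping or undefined overflow */
}
```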


--
Marc Glisse


Re: [RFC] Meta-description for tree and gimple folding

2014-02-28 Thread Diego Novillo
On Thu, Feb 27, 2014 at 9:34 AM, Richard Biener  wrote:

> Comments or suggestions?

On the surface it looks like a nice idea.  However, I would like to
understand the scope of this.  Are you thinking of a pattern matcher
with peephole like actions?  Or would you like to evolve a DSL capable
of writing compiler passes?  (much like MELT).

I prefer keeping it simple and limit the scope to pattern matching and
replacement. There will be other things to handle, however (overflow,
trapping arithmetic, etc).  The language will grow over time.

In terms of the mini-language, I don't like the lisp syntax but I
appreciate that it is a familiar language for GCC since the backend
uses it extensively.

Please consider testing facilities as well. We should be able to write
unit tests in this language to test its syntax validation, semantic
actions, and transformations.
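A unit test in such a language could be as small as one pattern plus an expected simplification.  The snippet below is purely illustrative; the syntax is invented here and need not match the actual prototype's grammar:

```lisp
;; Hypothetical syntax, invented for illustration.
;; Pattern: (x + y) - y  ==>  x
(match_and_simplify
  (MINUS_EXPR (PLUS_EXPR @0 @1) @1)
  @0)
;; A unit test would feed the IL "(a + b) - b" through the generated
;; matcher and assert that the result is "a".
```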


Diego.


Changing the MIPS ISA for the Loongson 3A from MIPS64 to MIPS64r2

2014-02-28 Thread Andrew Bennett
Hi,

I have noticed that a patch was placed in bugzilla to do this change, but it 
does not appear to have been pushed.  I was wondering if anyone could comment
on why this is the case?

The bugzilla URL is the following:

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=57754


Many thanks,


Andrew

Andrew Bennett
Software Design Engineer, MIPS Processor IP
Imagination Technologies Limited
t: +44 (0)113 2429814
www.imgtec.com


Re: [RFC] Meta-description for tree and gimple folding

2014-02-28 Thread Kai Tietz
Hmm, this all reminds me of the approach Andrew Pinski and I came
up with two years ago.  All in all I think it might be worth
expressing folding patterns in a more abstract way, so the md-like Lisp
syntax for this seems only consistent.  We already make use of such a
script language for the machine description.

Nevertheless I doubt that we really want the same facility for
fold-const and gimple.  Instead I got the impression that we would
prefer to have all folding optimizations in the middle-end
(GIMPLE).  We need folding in the front-end (AST) mostly for detecting
constant expressions.  Otherwise we want to keep as much of the
original AST as possible to get the best results for debugging (and even
to allow thinking about an alternative FE on our middle/back end) and
for code analyzers.

So I doubt that we want to keep fold-const patterns similar to the gimple
(forward-prop) ones.
Wouldn't it make more sense to move the fold-const patterns completely
into gimple, and to have a facility in the FE to ask gimple to *pre*-fold a
given tree to see if a constant expression can be achieved?

Regards,
Kai


Re: Handling error conditions in libgomp

2014-02-28 Thread Richard Henderson
On 02/28/2014 03:37 AM, Thomas Schwinge wrote:
> The process cannot recover from this and try to continue despite the
> error.  (It is of course questionable what exactly to do in this case, as
> libgomp's internal state may now be corrupt.)  So far, such errors may
> have been rare (aside from real bugs, only/primarily dynamic resource
> exhaustion?), but with the advent of libgomp using external modules (plugin
> interface, for acceleration devices) I expect them to become more
> frequent.

I could see that, yes.  However...

> Does it make sense to add the option for the user to install a handler
> for this?  Three quick ideas, all untested: generally use abort in
> libgomp, which can then be caught with a SIGABRT handler that the user
> has set up (difficult to communicate information from libgomp to the
> handler); adding a weak function GOMP_error that the user can provide and
> that libgomp will call in presence of an error; or provide some GOMP_init
> function for registering an error handler.  The actual interface might be
> something like: an enum to indicate the class (severity?) of the error, a
> const char* for an error message that libgomp has generated (possibly
> forwarded from a plugin), and a boolean return value to tell libgomp to
> either continue or terminate the process.  Then, libgomp's internal
> initialization could also be made more explicit, so that it can be
> reinitialized after an error occurred.  It makes sense that the default
> remains to terminate the process in the presence of an error.

I've never been keen on weak symbol interposition.  That works for an
application which is the sole user of libgomp, but it does not work for
libraries using libgomp, and works even worse when multiple such libraries
are involved.

I'd be ok with some kind of registration interface, like

  old = omp_set_error_handler (new);

so that a library can set and restore the handler around its own omp usage.
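The save/restore discipline could be sketched like this (hypothetical: the handler type, names, and severity encoding are all invented here; the real interface is exactly what is being discussed):

```c
/* Hypothetical sketch of the proposed registration interface.  */
typedef int (*omp_error_handler_t) (int severity, const char *msg);

static omp_error_handler_t current_handler;  /* NULL = default: terminate */

omp_error_handler_t
omp_set_error_handler (omp_error_handler_t new_handler)
{
  omp_error_handler_t old = current_handler;
  current_handler = new_handler;
  return old;
}

/* Example handler a library might install around its own omp usage;
   severity 0 is assumed here to mean "transient error".  */
static int
ignore_transient_errors (int severity, const char *msg)
{
  (void) msg;
  return severity == 0;  /* nonzero = continue */
}
```

A library would then wrap its own usage as `old = omp_set_error_handler (ignore_transient_errors); ...; omp_set_error_handler (old);`.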

As for the interface of the handler itself...  I dunno.  I'd hate to make it
too complicated to actually use.  Severity could be restricted to a single bit
meaning either "cannot possibly continue" or "transient error".  Maybe add the
error string too, since who knows what kind of weirdness is going on in the plugin.

A significant question is what thread this error would be reported on, and the
state of the other threads in the team when it happens.  This is the primary
reason that OMP disallows EH to propagate from inside the parallel region to
outside -- the exception would have nowhere to go if it was raised from other
than the primary thread.

In order to be usable at all, one would have to arrange for the error to be
reported on a thread that could do something about it.  Does it do any good to
report the error on thread 3, if thread 3 can't communicate with the user to
actually report the error?

I suppose there are a few types of "continuable" errors that might want some
sort of user interaction.  E.g. if the error is from a plugin, before we've
committed to performing any computation with the plugin, we could well be in a
state where we could fall back to the host for execution.  But we would want to
let the user know that the plugin failed and that this is why the run is going
to be slow this time.

> I have not yet researched what other OpenACC or OpenMP implementations
> are doing, or other compiler support/runtime libraries.

That would be good to know.


r~





Re: Handling error conditions in libgomp

2014-02-28 Thread Nathan Sidwell

On 02/28/14 16:05, Richard Henderson wrote:


I'd be ok with some kind of registration interface, like

   old = omp_set_error_handler (new);

so that a library can set and restore the handler around its own omp usage.


I agree.  An explicit API should be exposed to do this.


As for the interface of the handler itself...  I dunno.  I'd hate to make it
too complicated to actually use.  Severity could be restricted to a single bit
meaning either "cannot possibly continue" or "transient error".  Maybe add the
error string too, since who knows what kind of weirdness is going on in the plugin.


I wonder whether there should be 2 separate handlers -- one for the diagnostic 
and one for fatality?  Something like:


typedef void omp_diag_t (void *usr_data, unsigned flags, const char *, va_list);

void *OMP_diag_data;
omp_diag_t *OMP_diag_fn = OMP_diag_default;
void (*OMP_fatal_fn) (void) = abort;

then in gomp:
void OMP_error (unsigned flags, const char *fmt, ...) {
  va_list args;
  va_start (args, fmt);
  OMP_diag_fn (OMP_diag_data, flags, fmt, args);
  va_end (args);
}

void __attribute__ ((noreturn))
OMP_fatal (const char *fmt, ...) {
  va_list args;
  va_start (args, fmt);
  OMP_diag_fn (OMP_diag_data, SOMETHING, fmt, args);
  va_end (args);
  OMP_fatal_fn ();
}

That way the non-returning nature of OMP_fatal can be exposed to the compiler 
(I'm sure that'll kill a lot of false positive compiler warnings).


nathan

--
Nathan Sidwell


Re: Changing the MIPS ISA for the Loongson 3A from MIPS64 to MIPS64r2

2014-02-28 Thread Richard Sandiford
Andrew Bennett  writes:
> Hi,
>
> I have noticed that a patch was placed in bugzilla to do this change, but it 
> does not appear to have been pushed.  I was wondering if anyone could comment
> why this is the case?
>
> The bugzilla URL is the following:
>
> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=57754

Looks OK if it passes testing.  We'll need a name and email address for
the commit though.

Thanks,
Richard


Re: Handling error conditions in libgomp

2014-02-28 Thread Torvald Riegel
On Fri, 2014-02-28 at 12:37 +0100, Thomas Schwinge wrote:
> Does it make sense to add the option for the user to install a handler
> for this?

The problem with calling a handler is that it's often hard to define
which state the program is in during the handler.  If libgomp had to
terminate because of a programming error, there's often little one can
do anyway to resolve the problem; even proper cleanup could be hard
because the program may still hold locks (or violate other invariants)
that are necessary for meaningful cleanup.

Another problem is that if you specify that a handler is called, then
you typically have to always call the handler in an erroneous situation.

Thus, I think it would be good to differentiate between different
classes of problems.
For example, if it's a plugin initialization that's implicit in some
lambda construct, then a handler might work, but offering a separate
function that triggers such initialization and returns an error might be
even better; with such a function, the programmer can do error handling
up front.
In contrast, for programming errors, simply terminating would be fine
for me.  At least for C/C++ there's plenty of undefined behavior, so
trying to do something better for OpenMP might not be as helpful.
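The "error handling up front" idea could be sketched as follows (all names are invented for illustration; the stub stands in for real device probing):

```c
/* Hypothetical sketch: an explicit initialization entry point that
   returns a status, so a program can handle device failure before
   entering any parallel construct, instead of a handler firing
   mid-construct.  All names are invented.  */
enum gomp_dev_status { GOMP_DEV_OK = 0, GOMP_DEV_UNAVAILABLE = 1 };

enum gomp_dev_status
gomp_try_init_device (int device_num)
{
  /* Stub standing in for real device probing.  */
  return device_num == 0 ? GOMP_DEV_OK : GOMP_DEV_UNAVAILABLE;
}
```

A caller could check the status and fall back to host execution on failure, with no handler machinery involved.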

> I have not yet researched what other OpenACC or OpenMP implementations
> are doing, or other compiler support/runtime libraries.

I suggest to also have a look at what proposed C++ parallelism
abstractions do.  They essentially return exceptions from user-supplied
code as a list of exceptions, which is returned (or thrown) by the
construct that spawned the parallel execution; translating this to
handlers, the thread that spawned the construct should execute any
handler.

However, part of the problem is certainly OpenMP's goal to provide
unified abstractions that work the same in all 3 base languages, which
isn't too helpful considering that those languages have different
mechanisms for error handling (e.g., C++ exceptions vs. error codes in
C).



Re: Handling error conditions in libgomp

2014-02-28 Thread Joseph S. Myers
On Fri, 28 Feb 2014, Thomas Schwinge wrote:

> That aside, it is also a hurdle for ourselves if in libgomp's testsuite
> we want to test that the library does the right thing (abort) for
> non-sensible requests.

If you want to verify that a program exits with an error (so that exit 
with an error means PASS, exit successfully means FAIL), that's what 
dg-shouldfail is for.
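A minimal sketch of such a test (the directive usage follows DejaGnu conventions; the error-message pattern and the failing body are placeholders, since the real test would trigger an actual libgomp fatal error):

```c
/* { dg-do run } */
/* { dg-shouldfail "libgomp error" } */
/* The run is expected to exit with a failure status; that exit is what
   makes the test PASS.  The exit below stands in for a libgomp abort.  */
#include <stdlib.h>

int
main (void)
{
  exit (EXIT_FAILURE);
}
```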

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: [RFC][PATCH 0/5] arch: atomic rework

2014-02-28 Thread Paul E. McKenney
On Thu, Feb 27, 2014 at 12:53:12PM -0800, Paul E. McKenney wrote:
> On Thu, Feb 27, 2014 at 11:47:08AM -0800, Linus Torvalds wrote:
> > On Thu, Feb 27, 2014 at 11:06 AM, Paul E. McKenney
> >  wrote:
> > >
> > > 3.  The comparison was against another RCU-protected pointer,
> > > where that other pointer was properly fetched using one
> > > of the RCU primitives.  Here it doesn't matter which pointer
> > > you use.  At least as long as the rcu_assign_pointer() for
> > > that other pointer happened after the last update to the
> > > pointed-to structure.
> > >
> > > I am a bit nervous about #3.  Any thoughts on it?
> > 
> > I think that it might be worth pointing out as an example, and saying
> > that code like
> > 
> >p = atomic_read(consume);
> >X;
> >q = atomic_read(consume);
> >Y;
> >if (p == q)
> > data = p->val;
> > 
> > then the access of "p->val" is constrained to be data-dependent on
> > *either* p or q, but you can't really tell which, since the compiler
> > can decide that the values are interchangeable.
> > 
> > I cannot for the life of me come up with a situation where this would
> > matter, though. If "X" contains a fence, then that fence will be a
> > stronger ordering than anything the consume through "p" would
> > guarantee anyway. And if "X" does *not* contain a fence, then the
> > atomic reads of p and q are unordered *anyway*, so then whether the
> > ordering to the access through "p" is through p or q is kind of
> > irrelevant. No?
> 
> I can make a contrived litmus test for it, but you are right, the only
> time you can see it happen is when X has no barriers, in which case
> you don't have any ordering anyway -- both the compiler and the CPU can
> reorder the loads into p and q, and the read from p->val can, as you say,
> come from either pointer.
> 
> For whatever it is worth, here is the litmus test:
> 
> T1:   p = kmalloc(...);
>   if (p == NULL)
>   deal_with_it();
>   p->a = 42;  /* Each field in its own cache line. */
>   p->b = 43;
>   p->c = 44;
>   atomic_store_explicit(&gp1, p, memory_order_release);
>   p->b = 143;
>   p->c = 144;
>   atomic_store_explicit(&gp2, p, memory_order_release);
> 
> T2:   p = atomic_load_explicit(&gp2, memory_order_consume);
>   r1 = p->b;  /* Guaranteed to get 143. */
>   q = atomic_load_explicit(&gp1, memory_order_consume);
>   if (p == q) {
>   /* The compiler decides that q->c is same as p->c. */
>   r2 = p->c; /* Could get 44 on weakly order system. */
>   }
> 
> The loads from gp1 and gp2 are, as you say, unordered, so you get what
> you get.
> 
> And publishing a structure via one RCU-protected pointer, updating it,
> then publishing it via another pointer seems to me to be asking for
> trouble anyway.  If you really want to do something like that and still
> see consistency across all the fields in the structure, please put a lock
> in the structure and use it to guard updates and accesses to those fields.

And here is a patch documenting the restrictions for the current Linux
kernel.  The rules change a bit due to rcu_dereference() acting a bit
differently than atomic_load_explicit(&p, memory_order_consume).

Thoughts?

Thanx, Paul



documentation: Record rcu_dereference() value mishandling

Recent LKML discussions (see http://lwn.net/Articles/586838/ and
http://lwn.net/Articles/588300/ for the LWN writeups) brought out
some ways of misusing the return value from rcu_dereference() that
are not necessarily completely intuitive.  This commit therefore
documents what can and cannot safely be done with these values.

Signed-off-by: Paul E. McKenney 

diff --git a/Documentation/RCU/00-INDEX b/Documentation/RCU/00-INDEX
index fa57139f50bf..f773a264ae02 100644
--- a/Documentation/RCU/00-INDEX
+++ b/Documentation/RCU/00-INDEX
@@ -12,6 +12,8 @@ lockdep-splat.txt
- RCU Lockdep splats explained.
 NMI-RCU.txt
- Using RCU to Protect Dynamic NMI Handlers
+rcu_dereference.txt
+   - Proper care and feeding of return values from rcu_dereference()
 rcubarrier.txt
- RCU and Unloadable Modules
 rculist_nulls.txt
diff --git a/Documentation/RCU/checklist.txt b/Documentation/RCU/checklist.txt
index 9d10d1db16a5..877947130ebe 100644
--- a/Documentation/RCU/checklist.txt
+++ b/Documentation/RCU/checklist.txt
@@ -114,12 +114,16 @@ over a rather long period of time, but improvements are 
always welcome!
http://www.openvms.compaq.com/wizard/wiz_2637.html
 
The rcu_dereference() primitive is also an excellent
-   documentation aid, letting the person reading the code
-   know exactly which pointers are protected by RCU.
+   documentation aid, letting the person reading the
+