Re: suspect code in fold-const.c

2013-11-18 Thread Richard Biener
On Fri, 15 Nov 2013, Kenneth Zadeck wrote:

> 
> This patch fixes a number of places where the mode bitsize had been used but
> the mode precision should have been used.  The tree level is somewhat sloppy
> about this - some places use the mode precision and some use the mode bitsize.
> It seems that the mode precision is the proper choice since it does the
> correct thing if the underlying mode is a partial int mode.
> 
> This code has been tested on x86-64 with no regressions.   Ok to commit?

Ok.

Thanks,
Richard.

> 
> 
> 2013-11-15 Kenneth Zadeck 
> * tree.c (int_fits_type_p): Change GET_MODE_BITSIZE to
> GET_MODE_PRECISION.
> * fold-const.c (fold_single_bit_test_into_sign_test,
> fold_binary_loc):  Change GET_MODE_BITSIZE to
> GET_MODE_PRECISION.
> 
> Kenny
> 
> 
> On 11/15/2013 08:32 AM, Kenneth Zadeck wrote:
> > On 11/15/2013 04:07 AM, Eric Botcazou wrote:
> > > > this code from fold-const.c starts on line 13811.
> > > > 
> > > >   else if (TREE_INT_CST_HIGH (arg1) == signed_max_hi
> > > >&& TREE_INT_CST_LOW (arg1) == signed_max_lo
> > > >&& TYPE_UNSIGNED (arg1_type)
> > > >/* We will flip the signedness of the comparison operator
> > > >   associated with the mode of arg1, so the sign bit is
> > > >   specified by this mode.  Check that arg1 is the signed
> > > >   max associated with this sign bit.  */
> > > >&& width == GET_MODE_BITSIZE (TYPE_MODE (arg1_type))
> > > >/* signed_type does not work on pointer types. */
> > > >&& INTEGRAL_TYPE_P (arg1_type))
> > > with width defined as:
> > > 
> > > unsigned int width = TYPE_PRECISION (arg1_type);
> > > 
> > > > it seems that the check on bitsize should really be a check on the
> > > > precision of the variable.   If this seems right, i will correct this on
> > > > the trunk and make the appropriate changes to the wide-int branch.
> > > Do you mean
> > > 
> > >&& width == GET_MODE_PRECISION (TYPE_MODE (arg1_type))
> > > 
> > > instead?  If so, that would probably make sense, but there are a few other
> > > places with the same TYPE_PRECISION/GET_MODE_BITSIZE check, in particular
> > > the
> > > very similar transformation done in fold_single_bit_test_into_sign_test.
> > > 
> > yes.  I understand the need to do this check on the mode rather than the
> > precision of the type itself.
> > The point is that if the mode under this type happens to be a partial int
> > mode, then that sign bit may not even be where the bitsize points to.
> > 
> > However, having just done a few greps, it looks like this case was just the
> > one that i found while doing the wide-int work, there may be several more of
> > these cases.   Just in fold-const, there are a couple in fold_binary_loc.
> > The one in tree.c:int_fits_type_p looks particularly wrong.
> > 
> > I think that there are also several in tree-vect-patterns.c.
> > 
> > Kenny
> 
> 

-- 
Richard Biener 
SUSE / SUSE Labs
SUSE LINUX Products GmbH - Nuernberg - AG Nuernberg - HRB 16746
GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer


Re: memset zero bytes at NULL - isolate-erroneous-paths

2013-11-18 Thread Richard Biener
On Mon, Nov 18, 2013 at 8:11 AM, Florian Weimer  wrote:
> * Jeff Law:
>
>>> Is this new in C11?  Does it apply to functions such as strnlen as well?
>
>> No, it's C99 I think.  There was a clarification which came in after
>> C99 which clarified that even if the length is zero, the pointers must
>> still be valid.
>
> Okay, I found the language in sections 7.1.4 and 7.21.1 (thanks Marc).
>
> This is a bit unfortunate because it interoperates poorly with
> std::vector::data(), which can return a null pointer if the vector
> is empty.

I'd say that turning memset (0, '\0', 0) into a trap is bad from a QOI
perspective.  Jeff, is there an easy way to avoid this?  Testcase:

void fn (void *addr, int a)
{
  if (a == 0)
addr = (void *)0;
  __builtin_memset (addr, '\0', a);
}

I wonder where in isolate-paths you check for builtins at all?  ah,
it's probably from the nonnull attribute on memset.  Which also
means that trying to catch this case reliably isn't going to work
(you cannot prove the program has len == 0 in this case and
conditionally not trapping would somewhat defeat the purpose
of isolating this path)
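
(For reference, the isolated path roughly ends up looking like this -- a
sketch of the resulting function, not actual -fdump-tree output:

fn (void * addr, int a)
{
  if (a == 0)
    goto isolated;
  __builtin_memset (addr, 0, a);
  return;
isolated:
  /* addr is provably NULL here and memset's first argument is declared
     nonnull, so the pass replaces the call with a trap.  */
  __builtin_trap ();
}

so the a == 0 call, which a lot of code assumes is harmless, traps.)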

Richard.


Re: memset zero bytes at NULL - isolate-erroneous-paths

2013-11-18 Thread Jakub Jelinek
On Mon, Nov 18, 2013 at 12:08:27PM +0100, Richard Biener wrote:
> I'd say that turning memset (0, '\0', 0) into a trap is bad from a QOI
> perspective.  Jeff, is there an easy way to avoid this?  Testcase:
> 
> void fn (void *addr, int a)
> {
>   if (a == 0)
> addr = (void *)0;
>   __builtin_memset (addr, '\0', a);
> }
> 
> I wonder where in isolate-paths you check for builtins at all?  ah,
> it's probably from the nonnull attribute on memset.  Which also
> means that trying to catch this case reliably isn't going to work
> (you cannot prove the program has len == 0 in this case and
> conditionally not trapping would somewhat defeat the purpose
> of isolating this path)

Well, if some function has a nonnull attribute on some argument, then that
argument shouldn't have a NULL value even if some length argument is 0.
In the case of memset (and various other functions) C99 clearly says that
memset (NULL, 0, 0); is invalid.  If there are functions that take a
pointer/length argument pair and allow the pointer to be NULL for length 0,
then those functions shouldn't have the nonnull attribute.
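
For concreteness, the declarations in question look roughly like this in a
typical libc header (an illustrative sketch, not a quote of any particular
glibc version):

#include <stddef.h>   /* for size_t */

/* The nonnull on argument 1 is what lets the compiler assume the pointer
   is never NULL, independently of the length argument.  */
extern void *memset (void *__s, int __c, size_t __n)
     __attribute__ ((__nonnull__ (1)));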

Jakub


Re: PLUGIN_HEADER_FILE event for tracing of header inclusions.

2013-11-18 Thread Joseph S. Myers
On Sun, 17 Nov 2013, Basile Starynkevitch wrote:

> What would be the good way to add such a plugin event to GCC 4.9?

See the cpp_callbacks structure, used to make diagnostics go through GCC's 
diagnostics machinery, for example.  I'm not clear why the existing 
callbacks (in particular the file_change one) wouldn't be enough.

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: memset zero bytes at NULL - isolate-erroneous-paths

2013-11-18 Thread Jeff Law

On 11/18/13 04:08, Richard Biener wrote:

I'd say that turning memset (0, '\0', 0) into a trap is bad from a QOI

perspective.  Jeff, is there an easy way to avoid this?  Testcase:

void fn (void *addr, int a)
{
   if (a == 0)
 addr = (void *)0;
   __builtin_memset (addr, '\0', a);
}

I wonder where in isolate-paths you check for builtins at all?  ah,
it's probably from the nonnull attribute on memset.  Which also
means that trying to catch this case reliably isn't going to work
(you cannot prove the program has len == 0 in this case and
conditionally not trapping would somewhat defeat the purpose
of isolating this path)
It's the nonnull attribute on memset.  One thought would be to split the 
optimization into two parts: one which transforms *0 and the other 
which transforms calls/returns.  Have the former enabled by -O2, the 
latter off for now.  For the next release, both enabled by default at -O2.


Add distinct warnings for both cases, possibly enabled by -Wall (depends 
on the noise).


That gets most of the benefit now and gives a way for users to identify 
brokenness in their code.


Sadly, this feels a lot like -fstrict-aliasing did eons ago.  Aggressive 
TBAA exposed all kinds of problems and it took a lot of user (re)education 
to get them fixed.


jeff


Re: memset zero bytes at NULL - isolate-erroneous-paths

2013-11-18 Thread Ondřej Bílka
On Mon, Nov 18, 2013 at 07:24:46AM -0700, Jeff Law wrote:
> On 11/18/13 04:08, Richard Biener wrote:
> >>I'd say that turning memset (0, '\0', 0) into a trap is bad from a QOI
> >perspective.  Jeff, is there an easy way to avoid this?  Testcase:
> >
> >void fn (void *addr, int a)
> >{
> >   if (a == 0)
> > addr = (void *)0;
> >   __builtin_memset (addr, '\0', a);
> >}
> >
> >I wonder where in isolate-paths you check for builtins at all?  ah,
> >it's probably from the nonnull attribute on memset.  Which also
> >means that trying to catch this case reliably isn't going to work
> >(you cannot prove the program has len == 0 in this case and
> >conditionally not trapping would somewhat defeat the purpose
> >of isolating this path)
> It's the nonnull attribute on memset.  One thought would split the
> optimization into two parts.  One which transforms *0 and the other
> which transforms calls/returns.  Have the former enabled by -O2 the
> latter off for now. For the next release, both enabled by default at
> -O2.
> 
> Add distinct warnings for both cases, possibly enabled by -Wall
> (depends on the noise).
> 
> That gets most of the benefit now and gives a way for users to
> identify brokenness in their code.
> 
> Sadly, this feels a lot like -fstrict-aliasing did eons ago.
> Aggressive TBAA exposed all kinds problems and it took a lot of user
> (re)education to get them fixed.
> 
You risk that when a user tries path isolation and only finds spurious
errors like these, he will not use it even in cases where it helps.

One way would be to remove the nonnull attribute from the mem* functions.

Note that the C standard also disallows the following:

char *m = malloc (32);
if (!m)
  return 0;
...
int pos = 32;
return memchr (m, 42, 32 - pos);

On the other hand, if we could break invalid programs with impunity,
one could make memchr/memcmp a cycle faster by dropping
an initial n == 0 check.


OpenACC or OpenMP 4.0 target directives

2013-11-18 Thread guray ozen
Hello,

I'm doing a master's degree at the Polytechnic University of Catalonia,
BarcelonaTech, and I have started my master's thesis.  My topic is code
generation for hardware accelerators in OmpSs.  OmpSs is being developed
by the Barcelona Supercomputing Center, and it has a runtime for GPUs
that can manage kernel invocation, multi-GPU execution, data transfers,
asynchronous kernel invocation and so on.  That's why I'm using OmpSs:
I want to focus only on code generation and optimizations.  But I'm
quite new to this work.  At the moment I support the "target", "teams",
"distribute" and "distribute parallel for" directives, though of course
the kernels I can generate so far are quite naive :(  I'm looking for
optimization techniques.

I came across news that GCC will support the OpenACC/OpenMP target
directives.  How can I download this version?  I would also like to ask
about optimization: which optimization techniques have you applied?  Do
you have any suggestions for my thesis (papers, algorithms and so on)?

Regards,

Güray Özen
~grypp


Re: PLUGIN_HEADER_FILE event for tracing of header inclusions.

2013-11-18 Thread Basile Starynkevitch
On Mon, 2013-11-18 at 13:17 +, Joseph S. Myers wrote:
> On Sun, 17 Nov 2013, Basile Starynkevitch wrote:
> 
> > What would be the good way to add such a plugin event to GCC 4.9?
> 
> See the cpp_callbacks structure, used to make diagnostics go through GCC's 
> diagnostics machinery, for example.  I'm not clear why the existing 
> callbacks (in particular the file_change one) wouldn't be enough.


Thanks for your reply (and your interest in my suggestion).

I am not sure I understand what you are suggesting (because I see several
ways to interpret it).

The first would be to add, inside libcpp/directives.c in its function
_cpp_do_file_change (e.g. after line 1044), the statement

  /* Signal to plugins that a header file is included.  */
  invoke_plugin_callbacks (PLUGIN_HEADER_FILE,
                           ORDINARY_MAP_FILE_NAME (map));

The second would be to add a new way of invoking plugin callbacks, namely
adding the file libcpp/internals.h to the list of plugin-exported headers.
At the very least, this means adding to the PLUGIN_HEADERS variable of
gcc/Makefile.in several files from libcpp/include/ and possibly even
libcpp/internals.h.

I find that the second way introduces a policy change w.r.t. plugins.  Up
to now, we have tried hard to define the way plugins interact with GCC
through the plugin.h and plugin.def files, but it looks like you want yet
another way.

I strongly prefer adding a new plugin event (PLUGIN_HEADER_FILE) and just
using it (and documenting it) over adding a new way of having plugins
modify the behavior of GCC (through our various hooks, in this case the
file_change callback).

What do you suggest in practice?  Don't you feel that adding a new plugin
event (PLUGIN_HEADER_FILE) to plugin.def and adding a single call to
invoke_plugin_callbacks is much lighter and simpler than having the plugin
need several additional headers (in the PLUGIN_HEADERS make variable),
etc.?

Regards.
-- 
Basile STARYNKEVITCH http://starynkevitch.net/Basile/
email: basilestarynkevitchnet mobile: +33 6 8501 2359
8, rue de la Faiencerie, 92340 Bourg La Reine, France
*** opinions {are only mine, sont seulement les miennes} ***




Re: PLUGIN_HEADER_FILE event for tracing of header inclusions.

2013-11-18 Thread Joseph S. Myers
On Mon, 18 Nov 2013, Basile Starynkevitch wrote:

> On Mon, 2013-11-18 at 13:17 +, Joseph S. Myers wrote:
> > On Sun, 17 Nov 2013, Basile Starynkevitch wrote:
> > 
> > > What would be the good way to add such a plugin event to GCC 4.9?
> > 
> > See the cpp_callbacks structure, used to make diagnostics go through GCC's 
> > diagnostics machinery, for example.  I'm not clear why the existing 
> > callbacks (in particular the file_change one) wouldn't be enough.
> 
> 
> Thanks for your reply (and your interest to my suggestion).
> 
> I am not sure to understand what you suggest (because I see several ways
> to understand it).

I'm suggesting:

* You probably don't need to change libcpp at all.  Instead, insert your 
call to invoke_plugin_callbacks inside c-opts.c:cb_file_change.

* But if for some reason cb_file_change isn't called at the right time, 
then create a new function, still inside the c-family code, which calls 
invoke_plugin_callbacks, and a corresponding cpp_callbacks entry for it, 
and make one of the c-opts.c functions that sets callbacks fill in the new 
entry.

The key point is that both of those keep libcpp self-contained - you don't 
need to include plugin headers inside libcpp, because the libcpp client is 
responsible for registering callbacks with libcpp's callback mechanism, 
and it's the responsibility of such a callback to call plugins if the 
libcpp client (GCC in this case) has a plugin mechanism such that a plugin 
should be called from the callback.
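
Concretely, the first option amounts to something like the following sketch
in c-opts.c (the PLUGIN_HEADER_FILE event is the one Basile proposes and
does not exist yet; the guard and the shape of the existing function are
assumptions here, not committed code):

static void
cb_file_change (cpp_reader * ARG_UNUSED (pfile),
                const struct line_map *new_map)
{
  /* ... existing handling of the file change ... */

  /* Hand the included file's name to any plugin registered for the
     proposed PLUGIN_HEADER_FILE event.  */
  if (new_map && ! MAIN_FILE_P (new_map))
    invoke_plugin_callbacks (PLUGIN_HEADER_FILE,
                             (void *) ORDINARY_MAP_FILE_NAME (new_map));
}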

> What do you practically suggest? Don't you feel that adding a new plugin
> event (PLUGIN_HEADER_FILE) to plugins.def and adding a single call to
> invoke_plugin_callbacks much lighter and simpler than having the plugin

The point is that this call needs to be in GCC, the client of the libcpp 
library, not directly in libcpp, the library.

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: Frame pointer, bug or feature? (x86)

2013-11-18 Thread Hendrik Greving
What's the difference in the C vs. the C++ spec that makes it a VLA in GNU-C?

On Fri, Nov 15, 2013 at 10:07 AM, Andrew Pinski  wrote:
> On Fri, Nov 15, 2013 at 9:31 AM, Hendrik Greving
>  wrote:
>> In the below test case, "CASE_A" actually uses a frame pointer, while
>> !CASE_A doesn't. I can't imagine this is a feature, this is a bug,
>> isn't it? Is there any reason the compiler couldn't know that
>> loop_blocks never needs a dynamic stack size?
>
>
> Both a feature and a bug.  In the CASE_A case (with GNU C) it is a VLA
> while in the !CASE_A case (or in either case with C++), it is a normal
> array definition.  The compiler could have converted the VLA to a
> normal array but does not depending on the size of the array.
>
> Thanks,
> Andrew Pinski
>
>>
>> #include 
>> #include 
>>
>> #define MY_DEFINE 100
>> #define CASE_A 1
>>
>> extern init(int (*a)[]);
>>
>> int
>> foo()
>> {
>> #if CASE_A
>> const int max = MY_DEFINE * 2;
>> int loop_blocks[max];
>> #else
>> int loop_blocks[MY_DEFINE * 2];
>> #endif
>> init(&loop_blocks);
>> return loop_blocks[5];
>> }
>>
>> int
>> main()
>> {
>> int i = foo();
>> printf("is is %d\n", i);
>> }
>>
>> Thanks,
>> Hendrik Greving


Re: Frame pointer, bug or feature? (x86)

2013-11-18 Thread Andrew Pinski
On Mon, Nov 18, 2013 at 10:47 AM, Hendrik Greving
 wrote:
> What's the difference in the C vs. the C++ spec that makes it a VLA in GNU-C?


max in C++ is considered an integer constant expression while in C it
is just an expression.
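
A two-line illustration of that difference (a sketch, not from the original
message):

const int max = MY_DEFINE * 2;
int loop_blocks[max];  /* C: 'max' is not an integer constant expression,
                          so this declares a VLA.  C++: 'max' is a constant
                          expression, so this is an ordinary array.  */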

Thanks,
Andrew Pinski

>
> On Fri, Nov 15, 2013 at 10:07 AM, Andrew Pinski  wrote:
>> On Fri, Nov 15, 2013 at 9:31 AM, Hendrik Greving
>>  wrote:
>>> In the below test case, "CASE_A" actually uses a frame pointer, while
>>> !CASE_A doesn't. I can't imagine this is a feature, this is a bug,
>>> isn't it? Is there any reason the compiler couldn't know that
>>> loop_blocks never needs a dynamic stack size?
>>
>>
>> Both a feature and a bug.  In the CASE_A case (with GNU C) it is a VLA
>> while in the !CASE_A case (or in either case with C++), it is a normal
>> array definition.  The compiler could have converted the VLA to a
>> normal array but does not depending on the size of the array.
>>
>> Thanks,
>> Andrew Pinski
>>
>>>
>>> #include 
>>> #include 
>>>
>>> #define MY_DEFINE 100
>>> #define CASE_A 1
>>>
>>> extern init(int (*a)[]);
>>>
>>> int
>>> foo()
>>> {
>>> #if CASE_A
>>> const int max = MY_DEFINE * 2;
>>> int loop_blocks[max];
>>> #else
>>> int loop_blocks[MY_DEFINE * 2];
>>> #endif
>>> init(&loop_blocks);
>>> return loop_blocks[5];
>>> }
>>>
>>> int
>>> main()
>>> {
>>> int i = foo();
>>> printf("is is %d\n", i);
>>> }
>>>
>>> Thanks,
>>> Hendrik Greving


RFC: Use 32-byte PLT to preserve bound registers

2013-11-18 Thread H.J. Lu
Here is a proposal to use 32-byte PLT to preserve bound registers.
Any comments?

BTW, we are working on another proposal to use a second PLT
section with 8 byte or 16 byte memory overhead, instead of
24 byte overhead.

-- 
H.J.
---
Intel MPX:

http://software.intel.com/sites/default/files/319433-015.pdf

introduces 4 bound registers, which will be used for parameter passing
in x86-64.  Bound registers are cleared by branch instructions.  Branch
instructions with a BND prefix keep bound register contents.  This leads
to 2 requirements for the 64-bit MPX run-time:

1. Dynamic linker (ld.so) should save and restore bound registers during
symbol lookup.
2. Change the current 16-byte PLT0:

  ff 35 08 00 00 00       pushq  GOT+8(%rip)
  ff 25 00 10 00          jmpq   *GOT+16(%rip)
  0f 1f 40 00             nopl   0x0(%rax)

and 16-byte PLT1:

  ff 25 00 00 00 00       jmpq   *name@GOTPCREL(%rip)
  68 00 00 00 00          pushq  $index
  e9 00 00 00 00          jmpq   PLT0

which clear bound registers, to preserve bound registers.

We use 2 new relocations:

#define R_X86_64_PC32_BND  39 /* PC relative 32 bit signed with BND prefix */
#define R_X86_64_PLT32_BND 40 /* 32 bit PLT address with BND prefix */

to mark branch instructions with BND prefix.

When linker sees any R_X86_64_PC32_BND or R_X86_64_PLT32_BND relocations,
it switches to a different PLT0:

  ff 35 08 00 00 00       pushq  GOT+8(%rip)
  f2 ff 25 00 10 00       bnd jmpq *GOT+16(%rip)
  0f 1f 00                nopl   (%rax)

to preserve bound registers for symbol lookup.  For a symbol with
R_X86_64_PC32_BND or R_X86_64_PLT32_BND relocations, linker will use
a 32-byte PLT1:

  f2 ff 25 00 00 00 00    bnd jmpq *name@GOTPCREL(%rip)
  68 00 00 00 00          pushq  $index
  f2 e9 00 00 00 00       bnd jmpq PLT0
  0f 1f 80 00 00 00 00    nopl   0(%rax)
  0f 1f 80 00 00 00 00    nopl   0(%rax)

Prelink stores the offset of the pushq of PLT1 (plt_base + 0x16) in GOT[1],
and GOT[1] is stored in GOT[3].  We can undo prelink in the GOT by computing
the corresponding pushq offset with

GOT[1] + (GOT offset - &GOT[3]) * 2

This depends on each pushq being 16 bytes apart and each GOT entry being
8 bytes.  To support prelink, each 16-byte block in the PLT must have an
8-byte entry in the GOT.  The linker allocates two 8-byte entries in the
GOT for each 32-byte PLT1.  Then we can undo prelink by computing the
corresponding pushq offset with

pushq_offset = GOT[1] + (GOT offset - &GOT[3]) * 2
pushq_offset += ((unsigned char *) pushq_offset)[6] == 0xf2 ? 1 : 0

For each symbol with R_X86_64_PC32_BND or R_X86_64_PLT32_BND
relocations, this approach increases PLT size by 16 bytes and
GOT size by 8 bytes.  That is 24 bytes in total.

Pros: No additional sections are needed.
Cons: 24-byte memory overhead for each symbol with BND relocation.


Re: RFC: Use 32-byte PLT to preserve bound registers

2013-11-18 Thread H.J. Lu
There is a typo in the pushq offset computation.  It should be

pushq_offset += ((unsigned char *) pushq_offset)[-6] == 0xf2 ? 1 : 0

instead of

pushq_offset += ((unsigned char *) pushq_offset)[6] == 0xf2 ? 1 : 0
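
Spelled out as C, the corrected computation would look roughly like this
(a sketch only, not ld.so or prelink source; the function and parameter
names are made up for illustration):

#include <stdint.h>

static uintptr_t
pushq_offset_for (const uintptr_t *got, const uintptr_t *got_entry)
{
  /* Two 8-byte GOT slots are allocated per 32-byte PLT1, so the distance
     from &got[3] scales by 2 to give the distance into the PLT.  */
  uintptr_t off = got[1]
                  + (uintptr_t) ((const char *) got_entry
                                 - (const char *) &got[3]) * 2;

  /* A 32-byte BND PLT1 starts with the 0xf2 prefix, and its pushq sits
     one byte further in than in a 16-byte PLT1.  */
  if (((const unsigned char *) off)[-6] == 0xf2)
    off += 1;

  return off;
}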

H.J.

On Mon, Nov 18, 2013 at 11:03 AM, H.J. Lu  wrote:
> Here is a proposal to use 32-byte PLT to preserve bound registers.
> Any comments?
>
> BTW, we are working on another proposal to use a second PLT
> section with 8 byte or 16 byte memory overhead, instead of
> 24 byte overhead.
>
> --
> H.J.
> ---
> Intel MPX:
>
> http://software.intel.com/sites/default/files/319433-015.pdf
>
> introduces 4 bound registers, which will be used for parameter passing
> in x86-64.  Bound registers are cleared by branch instructions.  Branch
> instructions with BND prefix will keep bound register contents. This leads
> to 2 requirements to 64-bit MPX run-time:
>
> 1. Dynamic linker (ld.so) should save and restore bound registers during
> symbol lookup.
> 2. Change the current 16-byte PLT0:
>
>   ff 35 08 00 00 00       pushq  GOT+8(%rip)
>   ff 25 00 10 00          jmpq   *GOT+16(%rip)
>   0f 1f 40 00             nopl   0x0(%rax)
>
> and 16-byte PLT1:
>
>   ff 25 00 00 00 00       jmpq   *name@GOTPCREL(%rip)
>   68 00 00 00 00          pushq  $index
>   e9 00 00 00 00          jmpq   PLT0
>
> which clear bound registers, to preserve bound registers.
>
> We use 2 new relocations:
>
> #define R_X86_64_PC32_BND  39 /* PC relative 32 bit signed with BND prefix */
> #define R_X86_64_PLT32_BND 40 /* 32 bit PLT address with BND prefix */
>
> to mark branch instructions with BND prefix.
>
> When linker sees any R_X86_64_PC32_BND or R_X86_64_PLT32_BND relocations,
> it switches to a different PLT0:
>
>   ff 35 08 00 00 00       pushq  GOT+8(%rip)
>   f2 ff 25 00 10 00       bnd jmpq *GOT+16(%rip)
>   0f 1f 00                nopl   (%rax)
>
> to preserve bound registers for symbol lookup.  For a symbol with
> R_X86_64_PC32_BND or R_X86_64_PLT32_BND relocations, linker will use
> a 32-byte PLT1:
>
>   f2 ff 25 00 00 00 00    bnd jmpq *name@GOTPCREL(%rip)
>   68 00 00 00 00          pushq  $index
>   f2 e9 00 00 00 00       bnd jmpq PLT0
>   0f 1f 80 00 00 00 00    nopl   0(%rax)
>   0f 1f 80 00 00 00 00    nopl   0(%rax)
>
> Prelink stores the offset of pushq of PLT1 (plt_base + 0x16) in GOT[1] and
> GOT[1] is stored in GOT[3].  We can undo prelink in GOT by computing
> the corresponding the pushq offset with
>
> GOT[1] + (GOT offset - &GOT[3]) * 2
>
> It depends on that each pushq is 16-byte apart and GOT entry is 8 byte.
> To support prelink, each 16-byte block in PLT must have an 8-byte entry
> in GOT.  Linker allocates 2 8-byte entries in GOT for each 32-byte PLT1.
> Then we can undo prelink by computing the corresponding the pushq offset
> with
>
> pushq_offset = GOT[1] + (GOT offset - &GOT[3]) * 2
> pushq_offset += ((unsigned char *) pushq_offset)[6] == 0xf2 ? 1 : 0
>
> For each symbol with R_X86_64_PC32_BND or R_X86_64_PLT32_BND
> relocations, this approach increases PLT size by 16 bytes and
> GOT size by 8 bytes.  That is 24 bytes in total.
>
> Pros: No additional sections are needed.
> Cons: 24-byte memory overhead for each symbol with BND relocation.



-- 
H.J.


Re: Frame pointer, bug or feature? (x86)

2013-11-18 Thread Hendrik Greving
Interesting, I just read up on it and I didn't know that. Thanks. Is
it correct to say though that it is a missing optimization and
frame_pointer_needed shouldn't evaluate to true?

On Mon, Nov 18, 2013 at 10:55 AM, Andrew Pinski  wrote:
> On Mon, Nov 18, 2013 at 10:47 AM, Hendrik Greving
>  wrote:
>> What's the difference in the C vs. the C++ spec that makes it a VLA in GNU-C?
>
>
> max in C++ is considered an integer constant expression while in C it
> is just an expression.
>
> Thanks,
> Andrew Pinski
>
>>
>> On Fri, Nov 15, 2013 at 10:07 AM, Andrew Pinski  wrote:
>>> On Fri, Nov 15, 2013 at 9:31 AM, Hendrik Greving
>>>  wrote:
 In the below test case, "CASE_A" actually uses a frame pointer, while
 !CASE_A doesn't. I can't imagine this is a feature, this is a bug,
 isn't it? Is there any reason the compiler couldn't know that
 loop_blocks never needs a dynamic stack size?
>>>
>>>
>>> Both a feature and a bug.  In the CASE_A case (with GNU C) it is a VLA
>>> while in the !CASE_A case (or in either case with C++), it is a normal
>>> array definition.  The compiler could have converted the VLA to a
>>> normal array but does not depending on the size of the array.
>>>
>>> Thanks,
>>> Andrew Pinski
>>>

 #include 
 #include 

 #define MY_DEFINE 100
 #define CASE_A 1

 extern init(int (*a)[]);

 int
 foo()
 {
 #if CASE_A
 const int max = MY_DEFINE * 2;
 int loop_blocks[max];
 #else
 int loop_blocks[MY_DEFINE * 2];
 #endif
 init(&loop_blocks);
 return loop_blocks[5];
 }

 int
 main()
 {
 int i = foo();
 printf("is is %d\n", i);
 }

 Thanks,
 Hendrik Greving


Re: OpenACC or OpenMP 4.0 target directives

2013-11-18 Thread Tobias Burnus

Güray Özen wrote:

I came across a news about gcc will support OpenACC/OpenMP target
directive. How can i download this version?


Well, the support is at an early stage, targeting several different 
backends.  The work is done by several teams and is, hence, not always very 
well coordinated.  I think it will improve over the next months as bits 
get merged into a common branch.


Some first steps to OpenACC support can be found in the GOMP-4_0-branch 
and in the openacc-1_0-branch branch.


The GOMP-4_0-branch bits aren't sufficient for offloading yet.  To my 
knowledge, the only publicly available implementation which allows 
offloading is the openacc-1_0-branch, cf. 
http://gcc.gnu.org/ml/gcc/2013-10/msg9.html


To try it, download either of the two branches and build GCC yourself; 
see bottom of http://gcc.gnu.org/wiki/GFortranBinaries#FromSource




Moreover i'm going to ask question about optimization.


I think the first step is to actually get it working.  Otherwise, the 
normal compiler optimizations are also applied to the target sections.  I 
could imagine that there will also be some specific optimizations, e.g. 
with regard to copy/copyin/copyout, such as avoiding unnecessary data 
transfers; however, I don't know whether such an optimization is already 
done in any of the branches.


Tobias


Re: Frame pointer, bug or feature? (x86)

2013-11-18 Thread Jakub Jelinek
On Mon, Nov 18, 2013 at 11:22:22AM -0800, Hendrik Greving wrote:
> Interesting, I just read up on it and I didn't know that. Thanks. Is
> it correct to say though that it is a missing optimization and
> frame_pointer_needed shouldn't evaluate to true?

Certainly not unconditionally.  It depends on the size and on the scope in
which it is declared.  If the user meant to use a VLA and the compiler
optimizes it into a non-VLA, then it isn't deallocated at the end of its
scope, so if it is e.g. very large or there are many of those, the
optimization can break valid programs (especially if its scope isn't the
function scope but some smaller scope).

Jakub


Re: Frame pointer, bug or feature? (x86)

2013-11-18 Thread Hendrik Greving
Hmm, don't VLAs obey the same lifetime rules as regular automatic
arrays on the stack?

On Mon, Nov 18, 2013 at 11:48 AM, Jakub Jelinek  wrote:
> On Mon, Nov 18, 2013 at 11:22:22AM -0800, Hendrik Greving wrote:
>> Interesting, I just read up on it and I didn't know that. Thanks. Is
>> it correct to say though that it is a missing optimization and
>> frame_pointer_needed shouldn't evaluate to true?
>
> Certainly not unconditionally.  It depends on the size and in which scope
> it is declared.  If user meant to use a VLA and compiler optimizes it into
> non-VLA, then it isn't deallocated at the end of it's scope, so if it e.g.
> is very large or there are many of those, the optimization can break valid
> programs (especially if it's scope isn't the function scope but some smaller
> scope).
>
> Jakub


Re: Frame pointer, bug or feature? (x86)

2013-11-18 Thread Jakub Jelinek
On Mon, Nov 18, 2013 at 12:43:50PM -0800, Hendrik Greving wrote:
> Hmm don't VLA's obey the same lifetime rules as regular automatic
> arrays on the stack?

In the languages yes, in GCC no.  There is code to determine possibilities
of sharing some stack space between variables that can't be used at the same
time, but all the stack space for addressable automatic variables is
typically allocated in function prologue and deallocated in the epilogue.

So, if you have say:
extern void baz (char *);

__attribute__((noinline)) void
bar (void)
{
  char buf[7 * 1024 * 1024];
  baz (buf);
}

void
foo (void)
{
  bar ();
  {
const int length = 5 * 1024 * 1024;
char buf[length];
baz (buf);
  }
  bar ();
}

and, say, a typical Linux stack limit of 8-10MB, then if the baz function
(and anything it calls) doesn't need much stack space, nor do foo's
callers, buf[length] will probably work just fine as a VLA; if GCC decided
to optimize it into char buf[5 * 1024 * 1024]; instead, it would likely
fail.

Jakub


Re: Frame pointer, bug or feature? (x86)

2013-11-18 Thread Hendrik Greving
I see what you're saying.  You mean because the VLA stack space can be
dynamically "freed" right away, as opposed to staying there until the
epilogue.  That is true :(  It still seems odd when just looking at it.
It's hard to imagine somebody would actually write
myarray[const_thousand_var] instead of myarray[1000] with the intention
of controlling stack allocation... thanks though

On Mon, Nov 18, 2013 at 12:54 PM, Jakub Jelinek  wrote:
> On Mon, Nov 18, 2013 at 12:43:50PM -0800, Hendrik Greving wrote:
>> Hmm don't VLA's obey the same lifetime rules as regular automatic
>> arrays on the stack?
>
> In the languages yes, in GCC no.  There is code to determine possibilities
> of sharing some stack space between variables that can't be used at the same
> time, but all the stack space for addressable automatic variables is
> typically allocated in function prologue and deallocated in the epilogue.
>
> So, if you have say:
> extern void baz (char *);
>
> __attribute__((noinline)) void
> bar (void)
> {
>   char buf[7 * 1024 * 1024];
>   baz (buf);
> }
>
> void
> foo (void)
> {
>   bar ();
>   {
> const int length = 5 * 1024 * 1024;
> char buf[length];
> baz (buf);
>   }
>   bar ();
> }
>
> and say typical Linux stack limit of 8-10MB, then if baz function (nor
> anything it calls) doesn't need much stack space, nor foo callers, then
> if buf[length] is a VLA, it will probably work just fine, if GCC decided
> to optimize it into char buf[5 * 1024 * 1024]; instead, it would likely
> fail.
>
> Jakub


Re: suspect code in fold-const.c

2013-11-18 Thread Kenneth Zadeck

committed as revision 204987.

thanks

kenny

On 11/18/2013 05:38 AM, Richard Biener wrote:

On Fri, 15 Nov 2013, Kenneth Zadeck wrote:


This patch fixes a number of places where the mode bitsize had been used but
the mode precision should have been used.  The tree level is somewhat sloppy
about this - some places use the mode precision and some use the mode bitsize.
It seems that the mode precision is the proper choice since it does the
correct thing if the underlying mode is a partial int mode.

This code has been tested on x86-64 with no regressions.   Ok to commit?

Ok.

Thanks,
Richard.



2013-11-15 Kenneth Zadeck 
 * tree.c (int_fits_type_p): Change GET_MODE_BITSIZE to
 GET_MODE_PRECISION.
 * fold-const.c (fold_single_bit_test_into_sign_test,
 fold_binary_loc):  Change GET_MODE_BITSIZE to
 GET_MODE_PRECISION.

Kenny


On 11/15/2013 08:32 AM, Kenneth Zadeck wrote:

On 11/15/2013 04:07 AM, Eric Botcazou wrote:

this code from fold-const.c starts on line 13811.

	  else if (TREE_INT_CST_HIGH (arg1) == signed_max_hi
		   && TREE_INT_CST_LOW (arg1) == signed_max_lo
		   && TYPE_UNSIGNED (arg1_type)
		   /* We will flip the signedness of the comparison operator
		      associated with the mode of arg1, so the sign bit is
		      specified by this mode.  Check that arg1 is the signed
		      max associated with this sign bit.  */
		   && width == GET_MODE_BITSIZE (TYPE_MODE (arg1_type))
		   /* signed_type does not work on pointer types. */
		   && INTEGRAL_TYPE_P (arg1_type))

with width defined as:

 unsigned int width = TYPE_PRECISION (arg1_type);


it seems that the check on bitsize should really be a check on the
precision of the variable.   If this seems right, i will correct this on
the trunk and make the appropriate changes to the wide-int branch.

Do you mean

&& width == GET_MODE_PRECISION (TYPE_MODE (arg1_type))

instead?  If so, that would probably make sense, but there are a few other
places with the same TYPE_PRECISION/GET_MODE_BITSIZE check, in particular
the
very similar transformation done in fold_single_bit_test_into_sign_test.


yes.  I understand the need to do this check on the mode rather than the
precision of the type itself.
The point is that if the mode under this type happens to be a partial int
mode, then that sign bit may not even be where the bitsize points to.

However, having just done a few greps, it looks like this case was just the
one that i found while doing the wide-int work, there may be several more of
these cases.   Just in fold-const, there are a couple in fold_binary_loc.
The one in tree.c:int_fits_type_p looks particularly wrong.

I think that there are also several in tree-vect-patterns.c.

Kenny




Index: gcc/tree.c
===
--- gcc/tree.c	(revision 204986)
+++ gcc/tree.c	(working copy)
@@ -8629,7 +8629,7 @@ retry:
   /* Third, unsigned integers with top bit set never fit signed types.  */
   if (! TYPE_UNSIGNED (type) && unsc)
 {
-  int prec = GET_MODE_BITSIZE (TYPE_MODE (TREE_TYPE (c))) - 1;
+  int prec = GET_MODE_PRECISION (TYPE_MODE (TREE_TYPE (c))) - 1;
   if (prec < HOST_BITS_PER_WIDE_INT)
 	{
 	  if (((((unsigned HOST_WIDE_INT) 1) << prec) & dc.low) != 0)
Index: gcc/ChangeLog
===
--- gcc/ChangeLog	(revision 204986)
+++ gcc/ChangeLog	(working copy)
@@ -1,3 +1,11 @@
+2013-11-18 Kenneth Zadeck 
+
+	* tree.c (int_fits_type_p): Change GET_MODE_BITSIZE to
+	GET_MODE_PRECISION.
+	* fold-const.c (fold_single_bit_test_into_sign_test)
+	(fold_binary_loc):  Change GET_MODE_BITSIZE to
+	GET_MODE_PRECISION.
+
 2013-11-18  Teresa Johnson  
 
 	* gcc/cfgrtl.c (cfg_layout_initialize): Assert if we
Index: gcc/fold-const.c
===
--- gcc/fold-const.c	(revision 204986)
+++ gcc/fold-const.c	(working copy)
@@ -6593,7 +6593,7 @@ fold_single_bit_test_into_sign_test (loc
 	  /* This is only a win if casting to a signed type is cheap,
 	 i.e. when arg00's type is not a partial mode.  */
 	  && TYPE_PRECISION (TREE_TYPE (arg00))
-	 == GET_MODE_BITSIZE (TYPE_MODE (TREE_TYPE (arg00))))
+	 == GET_MODE_PRECISION (TYPE_MODE (TREE_TYPE (arg00))))
 	{
 	  tree stype = signed_type_for (TREE_TYPE (arg00));
 	  return fold_build2_loc (loc, code == EQ_EXPR ? GE_EXPR : LT_EXPR,
@@ -12049,7 +12049,7 @@ fold_binary_loc (location_t loc,
 	    zerobits = ((((unsigned HOST_WIDE_INT) 1) << shiftc) - 1);
 	  else if (TREE_CODE (arg0) == RSHIFT_EXPR
 		   && TYPE_PRECISION (TREE_TYPE (arg0))
-		      == GET_MODE_BITSIZE (TYPE_MODE (TREE_TYPE (arg0))))
+		      == GET_MODE_PRECISION (TYPE_MODE (TREE_TYPE (arg0))))
 	{
 	  prec = TYPE_PRECISION (TREE_TYPE (arg0));
 	  tree arg00 = TREE_OPERAND (arg0, 0);
@@ -12060,7 +12060,7 @@ fold_binary_loc (location_t loc