Re: Apparent deeply-nested missing error bug with gcc 7.3

2018-06-21 Thread Soul Studios

UPDATE: My bad.
The original compiler feature detection in the test suite was broken and
did not match the correct libstdc++ versions, hence the
emplace_back/emplace_front tests were not running.


Told you so :-P



However, it does surprise me that GCC doesn't check this code.


It's a dependent expression so can't be fully checked until
instantiated -- and as you've discovered, it wasn't being
instantiated. There's a trade-off between compilation speed and doing
additional work to check uninstantiated templates with arbitrarily
complex expressions in them.



Yeah, I get it - saves a lot of time with heavily-templated setups and 
large projects.


Re: ICE in a testcase, not sure about solution

2018-06-21 Thread Richard Biener
On Wed, Jun 20, 2018 at 8:26 PM Paul Koning  wrote:
>
> I'm running into an ICE in the GIMPLE phase, for gcc.c-torture/compile/386.c, 
> on pdp11 -mint32.  That's an oddball where int is 32 bits (due to the flag) 
> but Pmode is 16 bits (HImode).
>
> The ICE message is:
>
> ../../gcc/gcc/testsuite/gcc.c-torture/compile/386.c: In function ‘main’:
> ../../gcc/gcc/testsuite/gcc.c-torture/compile/386.c:24:1: error: invalid 
> types in nop conversion
>  }
>  ^
> int
> int *
> b_3 = (int) &i;
> during GIMPLE pass: einline
> ../../gcc/gcc/testsuite/gcc.c-torture/compile/386.c:24:1: internal compiler 
> error: verify_gimple failed
>
> The offending code snippet is (I think):
>
> main ()
> {
>   int i;
>   foobar (i, &i);
> }
>
>
> foobar (a, b)
> {
>   int c;
>
>   c = a % b;
>   a = a / b;
>   return a + b;
> }
>
> where the foobar(i, &i) call passes an int* to a (defaulted) int function 
> parameter.  Is there an assumption that sizeof (int*) >= sizeof(int)?
>
> Any idea where to look?  It only shows up with -mint32; if int is 16 bits all 
> is well.  I'm not used to my target breaking things before I even get to 
> RTL...

Inlining allows some type mismatches, mainly because front ends may have
done promotions at call sites while callees usually see unpromoted
PARM_DECLs.  The inliner then inserts the required conversions.  In this
case we do not allow widening conversions from pointers without an
intermediate conversion to an integer type.  The following ICEs in a
similar way on x86 (with -m32):

main ()
{
  int i;
  foobar (i, &i);
}


foobar (int a, long long b)
{
  int c;

  c = a % b;
  a = a / b;
  return a + b;
}

so the inliner should avoid inlining in this case, or alternatively
simulate what the target does (converting according to
POINTERS_EXTEND_UNSIGNED).

A fix could be as simple as

diff --git a/gcc/fold-const.c b/gcc/fold-const.c
index 4568e1e2b57..8476c223e4f 100644
--- a/gcc/fold-const.c
+++ b/gcc/fold-const.c
@@ -2358,7 +2358,9 @@ fold_convertible_p (const_tree type, const_tree arg)
 case INTEGER_TYPE: case ENUMERAL_TYPE: case BOOLEAN_TYPE:
 case POINTER_TYPE: case REFERENCE_TYPE:
 case OFFSET_TYPE:
-  return (INTEGRAL_TYPE_P (orig) || POINTER_TYPE_P (orig)
+  return (INTEGRAL_TYPE_P (orig)
+ || (POINTER_TYPE_P (orig)
+ && TYPE_PRECISION (type) <= TYPE_PRECISION (orig))
  || TREE_CODE (orig) == OFFSET_TYPE);

 case REAL_TYPE:

which avoids the inlining (if that is the desired solution).

Can you open a PR please?

Thanks,
Richard.

> paul
>


Re: How to get GCC on par with ICC?

2018-06-21 Thread Richard Biener
On Wed, Jun 20, 2018 at 11:12 PM NightStrike  wrote:
>
> On Wed, Jun 6, 2018 at 11:57 AM, Joel Sherrill  wrote:
> >
> > On Wed, Jun 6, 2018 at 10:51 AM, Paul Menzel <
> > pmenzel+gcc.gnu@molgen.mpg.de> wrote:
> >
> > > Dear GCC folks,
> > >
> > >
> > > Some scientists in our organization still want to use the Intel compiler,
> > > as they say, it produces faster code, which is then executed on clusters.
> > > Some resources on the Web [1][2] confirm this. (I am aware, that it’s
> > > heavily dependent on the actual program.)
> > >
> >
> > Do they have specific examples where icc is better for them? Or can point
> > to specific GCC PRs which impact them?
> >
> >
> > GCC versions?
> >
> > Are there specific CPU model variants of concern?
> >
> > What flags are used to compile? Some times a bit of advice can produce
> > improvements.
> >
> > Without specific examples, it is hard to set goals.
>
> If I could perhaps jump in here for a moment...  Just today I hit upon
> a series of small (in lines of code) loops that gcc can't vectorize,
> and intel vectorizes like a madman.  They all involve a lot of heavy
> use of std::vector>.  Comparisons were with gcc

Ick - C++ ;)

> 8.1, intel 2018.u1, an AMD Opteron 6386 SE, with the program running
> as sched_FIFO, mlockall, affinity set to its own core, and all
> interrupts vectored off that core.  So, as close to not-noisy as
> possible.
>
> I was surprised at the results, but using each compiler's method of
> dumping vectorization info, intel wins on two points:
>
> 1) It actually vectorizes
> 2) Its vectorization output is much easier to read
>
> Options were:
>
> gcc -Wall -ggdb3 -std=gnu++17 -flto -Ofast -march=native
>
> vs:
>
> icc -Ofast -std=gnu++14
>
>
> So, not exactly exact, but pretty close.
>
>
> So here's an example of a chunk of code (not very readable, sorry
> about that) that intel can vectorize, and subsequently make about 50%
> faster:
>
> std::size_t nLayers { input.nn.size() };
> //std::size_t ySize = std::max_element(input.nn.cbegin(), input.nn.cend(), [](auto a, auto b){ return a.size() < b.size(); })->size();
> std::size_t ySize = 0;
> for (auto const & nn: input.nn)
>     ySize = std::max(ySize, nn.size());
>
> float yNorm[ySize];
> for (auto & y: yNorm)
>     y = 0.0f;
> for (std::size_t i = 0; i < xSize; ++i)
>     yNorm[i] = xNorm[i];
> for (std::size_t layer = 0; layer < nLayers; ++layer) {
>     auto & nn = input.nn[layer];
>     auto & b = nn.back();
>     float y[ySize];
>     for (std::size_t i = 0; i < nn[0].size(); ++i) {
>         y[i] = b[i];
>         for (std::size_t j = 0; j < nn.size() - 1; ++j)
>             y[i] += nn.at(j).at(i) * yNorm[j];
>     }
>     for (std::size_t i = 0; i < ySize; ++i) {
>         if (layer < nLayers - 1)
>             y[i] = std::max(y[i], 0.0f);
>         yNorm[i] = y[i];
>     }
> }
>
>
> If I was better at godbolt, I could show the asm, but I'm not.  I'm
> willing to learn, though.

A compilable testcase would be more useful - just file a bugzilla.

Richard.


Question regarding preventing optimizing out of register in expansion

2018-06-21 Thread Peryt, Sebastian
Hi,

I'd appreciate it if someone could advise me on a builtin expansion I'm
currently writing.

High level description for what I want to do:

I have 2 operands in my builtin.
First I set a register (reg1) with the value from operand1 (op1);
second I call my instruction (reg1 is used implicitly and updated);
at the end I set operand2 (op2) from reg1.

Simplified implementation in i386.c I have:

reg1 = gen_reg_rtx (mode);
emit_insn (gen_rtx_SET (reg1, op1));
emit_clobber (reg1);

emit_insn (gen_myinstruction ());

emit_insn (gen_rtx_SET (op2,reg1));

Everything works fine at -O0, but when I move to higher optimization levels
the instructions setting the value into reg1 (the lines before emit_clobber)
are optimized out.
I already tried moving emit_clobber to just after the assignment, but it
doesn't help.

Could you please suggest how I can prevent it from happening?

Thanks,
Sebastian


Re: Question regarding preventing optimizing out of register in expansion

2018-06-21 Thread Nathan Sidwell

On 06/21/2018 05:20 AM, Peryt, Sebastian wrote:

Hi,

I'd appreciate if someone could advise me in builtin expansion I'm currently 
writing.

High level description for what I want to do:

I have 2 operands in my builtin.


IIUC you're defining an UNSPEC.


First I set register (reg1) with value from operand1 (op1);
Second I call my instruction (reg1 is called implicitly and updated);


Here is your error -- NEVER have implicit register settings.  The data 
flow analysers need accurate information.




Simplified implementation in i386.c I have:

reg1 = gen_reg_rtx (mode);
emit_insn (gen_rtx_SET (reg1, op1));
emit_clobber (reg1);


At this point reg1 is dead.  That means the previous set of reg1 from 
op1 is unneeded and can be deleted.



emit_insn (gen_myinstruction ());


This instruction has no inputs or outputs, and is not marked volatile(?) 
so can be deleted.



emit_insn (gen_rtx_SET (op2,reg1));


And this is storing a value from a dead register.

You need something like:
  rtx reg1 = force_reg (mode, op1);
  rtx reg2 = gen_reg_rtx (mode);
  emit_insn (gen_my_insn (reg2, reg1));
  emit_insn (gen_rtx_SET (op2, reg2));

Your instruction should be an UNSPEC that lists its inputs and outputs.
That tells the optimizers what depends on what, even though the compiler
has no clue about what the transform itself is.
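In the machine description that could look something like the sketch below (the pattern name and UNSPEC_MYINSN are placeholders, not real i386.md contents):

```
;; Hypothetical pattern: reg2 = some transform of reg1.  The UNSPEC
;; lists the input explicitly, so data-flow analysis sees both operands.
(define_insn "my_insn"
  [(set (match_operand:SI 0 "register_operand" "=r")
        (unspec:SI [(match_operand:SI 1 "register_operand" "r")]
                   UNSPEC_MYINSN))]
  ""
  "myinstruction\t%0, %1")
```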


nathan
--
Nathan Sidwell


RE: Question regarding preventing optimizing out of register in expansion

2018-06-21 Thread Peryt, Sebastian
Thank you very much! Your suggestions helped me figure this out.

Sebastian




Re: [GSOC] LTO dump tool project

2018-06-21 Thread Martin Liška
On 06/20/2018 07:23 PM, Hrishikesh Kulkarni wrote:
> Hi,
> 
> Please find the diff file for dumping tree type stats attached here with.
> 
> example:
> 
> $ ../stage1-build/gcc/lto1 test_hello.o -fdump-lto-tree-type-stats
> Reading object files: test_hello.o
> integer_type     3
> pointer_type     3
> array_type       1
> function_type    4
> 
> I have pushed the changes on Github repo.

Hi.

Good progress here. I would also dump statistics for GIMPLE statements.
If you configure gcc with  --enable-gather-detailed-mem-stats, you should
see:

./xgcc -B. /tmp/main.c -fmem-report -O2
...
GIMPLE statements
Kind                   Stmts      Bytes
---------------------------------------
assignments                6        480
phi nodes                  0          0
conditionals               8        640
everything else           21       1368
---------------------------------------
Total                     35       2488
...

Take a look at dump_gimple_statistics, gimple_alloc_counts and
gimple_alloc_sizes.

We do the same for trees:

static uint64_t tree_code_counts[MAX_TREE_CODES];
uint64_t tree_node_counts[(int) all_kinds];
uint64_t tree_node_sizes[(int) all_kinds];

I believe the infrastructure should be shared.

Martin


> 
> Regards,
> 
> Hrishikesh
> 
> On Mon, Jun 18, 2018 at 2:15 PM, Martin Jambor  wrote:
>> Hi,
>>
>> On Sun, Jun 17 2018, Hrishikesh Kulkarni wrote:
>>> Hi,
>>>
>>> I am trying to isolate the dump tool into a real lto-dump tool. I have
>>> started with a copy & paste of lto.c into lto-dump.c and made the
>>> changes to Make-lang.in and config-lang.in suggested by Martin (patch
>>> attached). However, when I try to build, I get the following error:
>>>
>>> In file included from ../../gcc/gcc/lto/lto-dump.c:24:0:
>>>
>>> ../../gcc/gcc/coretypes.h:397:24: fatal error: insn-modes.h: No such
>>>
>>> file or directory
>>>
>>> compilation terminated.
>>>
>>>
>>> I am unable to find the missing dependencies and would be grateful for
>>> suggestions on how to resolve the issue.
>>
>> insn-modes.h is one of the header files that are generated at build
>> time; you will find it in the gcc subdirectory of your build directory
>> (as opposed to the source directory).
>>
>> Martin



Re: [GSOC] LTO dump tool project

2018-06-21 Thread Martin Liška
Hi.

There were some questions from Hrishikesh about the requested goals
of the project, so I would like to spell out what I'm aware of:

1) symbol table
   - list all symbols
   - print detailed info about a symbol (symtab_node::debug)
   - print GIMPLE body of a function
 - I would like to see support for the dump levels we already have, e.g.:

-fdump-tree-optimized-blocks
-fdump-tree-optimized-stats

It's defined in dumpfile.h:
enum dump_flag

   - I would like to see the constructor of a global variable:
DECL_INITIAL (...), probably print_generic_expr will work

we can consider adding similar options seen in nm:

   --no-demangle
   Do not demangle low-level symbol names.  This is the default.

   -p
   --no-sort
   Do not bother to sort the symbols in any order; print them in the 
order encountered.
   -S
   --print-size
   Print both value and size of defined symbols for the "bsd" output 
style.  This option has no effect for object formats that do not record symbol 
sizes, unless --size-sort is also used in which case a calculated size is 
displayed.
   -r
   --reverse-sort
   Reverse the order of the sort (whether numeric or alphabetic); let 
the last come first.

   --defined-only
   Display only defined symbols for each object file.

   --size-sort
   Sort symbols by size.  For ELF objects symbol sizes are read from 
the ELF, for other object types the symbol sizes are computed as the difference 
between the value of the symbol and the value of the symbol with the next 
higher value.  If the "bsd" output
   format is used the size of the symbol is printed, rather than the 
value, and -S must be used in order both size and value to be printed.

It's just for inspiration, see man nm

2) statistics
   - GIMPLE and TREE statistics, similar to what we do for -fmem-report

3) LTO objects
   - we should list the files and archives read, and print some stats about them

4) tree types
   - list types
   - print one (debug_tree) with different verbosity level, again 'enum 
dump_flag'

5) visualization
   - should already be covered via -fdump-ipa-icf-graph, which generates a
.dot file; should be easy to use

6) separation to lto-dump binary
   - here I can help, I'll cook a patch for it

I believe this can be implemented as a series of small patches. I hope you'll
invent even more options as you play with LTO.

Martin








gcc-7-20180621 is now available

2018-06-21 Thread gccadmin
Snapshot gcc-7-20180621 is now available on
  ftp://gcc.gnu.org/pub/gcc/snapshots/7-20180621/
and on various mirrors, see http://gcc.gnu.org/mirrors.html for details.

This snapshot has been generated from the GCC 7 SVN branch
with the following options: svn://gcc.gnu.org/svn/gcc/branches/gcc-7-branch 
revision 261867

You'll find:

 gcc-7-20180621.tar.xz   Complete GCC

  SHA256=663806e826862f80a6dccf5c111f258fb100d11f5a706a76cf7f9497e6671928
  SHA1=e73136313286a1b65d87b5c5828393bddf78084d

Diffs from 7-20180614 are available in the diffs/ subdirectory.

When a particular snapshot is ready for public consumption the LATEST-7
link is updated and a message is sent to the gcc list.  Please do not use
a snapshot before it has been announced that way.


Re: How to get GCC on par with ICC?

2018-06-21 Thread Steve Ellcey
On Wed, 2018-06-20 at 17:11 -0400, NightStrike wrote:
> 
> If I could perhaps jump in here for a moment...  Just today I hit upon
> a series of small (in lines of code) loops that gcc can't vectorize,
> and intel vectorizes like a madman.  They all involve a lot of heavy
> use of std::vector>.  Comparisons were with gcc
> 8.1, intel 2018.u1, an AMD Opteron 6386 SE, with the program running
> as sched_FIFO, mlockall, affinity set to its own core, and all
> interrupts vectored off that core.  So, as close to not-noisy as
> possible.

There are quite a number of bugzilla reports with examples where GCC
does not vectorize a loop.  I wonder if this example is related to
PR 61247.

Steve Ellcey

