Re: Apparent deeply-nested missing error bug with gcc 7.3
UPDATE: My bad. The original compiler feature detection in the test suite was broken and wasn't matching the correct libstdc++ versions, so the emplace_back/emplace_front tests were not running.

Told you so :-P

However, it does surprise me that GCC doesn't check this code.

It's a dependent expression, so it can't be fully checked until instantiated -- and as you've discovered, it wasn't being instantiated. There's a trade-off between compilation speed and doing the additional work to check uninstantiated templates with arbitrarily complex expressions in them.

Yeah, I get it - that saves a lot of time with heavily-templated setups and large projects.
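A tiny illustration of the point, as a sketch only (the template and the call below are made up, not taken from the original test suite): the ill-formed call is a dependent expression, so GCC only diagnoses it once the template is actually instantiated. With the instantiation commented out, the file compiles silently, which is exactly the "missing error" effect when the tests don't run.

#include <list>

template <typename Container>
void add_one (Container & c)
{
  // Ill-formed for std::list<int>, but dependent on Container, so it is
  // not checked until add_one is instantiated.
  c.emplace_back ("not convertible to int");
}

int main ()
{
  std::list<int> l;
  // add_one (l);   // uncommenting this instantiates add_one and triggers the error
}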
Re: ICE in a testcase, not sure about solution
On Wed, Jun 20, 2018 at 8:26 PM Paul Koning wrote:
>
> I'm running into an ICE in the GIMPLE phase, for gcc.c-torture/compile/386.c,
> on pdp11 -mint32. That's an oddball where int is 32 bits (due to the flag)
> but Pmode is 16 bits (HImode).
>
> The ICE message is:
>
> ../../gcc/gcc/testsuite/gcc.c-torture/compile/386.c: In function ‘main’:
> ../../gcc/gcc/testsuite/gcc.c-torture/compile/386.c:24:1: error: invalid
> types in nop conversion
> }
> ^
> int
> int *
> b_3 = (int) &i;
> during GIMPLE pass: einline
> ../../gcc/gcc/testsuite/gcc.c-torture/compile/386.c:24:1: internal compiler
> error: verify_gimple failed
>
> The offending code snippet is (I think):
>
> main ()
> {
>   int i;
>   foobar (i, &i);
> }
>
> foobar (a, b)
> {
>   int c;
>
>   c = a % b;
>   a = a / b;
>   return a + b;
> }
>
> where the foobar(i, &i) call passes an int* to a (defaulted) int function
> parameter. Is there an assumption that sizeof (int*) >= sizeof(int)?
>
> Any idea where to look? It only shows up with -mint32; if int is 16 bits all
> is well. I'm not used to my target breaking things before I even get to
> RTL...

Inlining allows some type mismatches, mainly because at callers FEs may have
done promotion while callees usually see unpromoted PARM_DECLs. The inliner
then inserts required conversions. In this case we do not allow widening
conversions from pointers without intermediate conversions to integers.
The following ICEs in a similar way on x86 (with -m32):

main ()
{
  int i;
  foobar (i, &i);
}

foobar (int a, long long b)
{
  int c;

  c = a % b;
  a = a / b;
  return a + b;
}

so the inliner should avoid inlining in this case, or alternatively simulate
what the target does (converting according to POINTERS_EXTEND_UNSIGNED).
A fix could be as simple as

diff --git a/gcc/fold-const.c b/gcc/fold-const.c
index 4568e1e2b57..8476c223e4f 100644
--- a/gcc/fold-const.c
+++ b/gcc/fold-const.c
@@ -2358,7 +2358,9 @@ fold_convertible_p (const_tree type, const_tree arg)
     case INTEGER_TYPE: case ENUMERAL_TYPE: case BOOLEAN_TYPE:
     case POINTER_TYPE: case REFERENCE_TYPE:
     case OFFSET_TYPE:
-      return (INTEGRAL_TYPE_P (orig) || POINTER_TYPE_P (orig)
+      return (INTEGRAL_TYPE_P (orig)
+             || (POINTER_TYPE_P (orig)
+                 && TYPE_PRECISION (type) <= TYPE_PRECISION (orig))
              || TREE_CODE (orig) == OFFSET_TYPE);

     case REAL_TYPE:

which avoids the inlining (if that is the desired solution). Can you open a
PR please?

Thanks,
Richard.

> paul
>
Re: How to get GCC on par with ICC?
On Wed, Jun 20, 2018 at 11:12 PM NightStrike wrote:
>
> On Wed, Jun 6, 2018 at 11:57 AM, Joel Sherrill wrote:
> >
> > On Wed, Jun 6, 2018 at 10:51 AM, Paul Menzel
> > <pmenzel+gcc.gnu@molgen.mpg.de> wrote:
> >
> > > Dear GCC folks,
> > >
> > > Some scientists in our organization still want to use the Intel compiler,
> > > as they say, it produces faster code, which is then executed on clusters.
> > > Some resources on the Web [1][2] confirm this. (I am aware, that it’s
> > > heavily dependent on the actual program.)
> >
> > Do they have specific examples where icc is better for them? Or can they
> > point to specific GCC PRs which impact them?
> >
> > GCC versions?
> >
> > Are there specific CPU model variants of concern?
> >
> > What flags are used to compile? Sometimes a bit of advice can produce
> > improvements.
> >
> > Without specific examples, it is hard to set goals.
>
> If I could perhaps jump in here for a moment... Just today I hit upon
> a series of small (in lines of code) loops that gcc can't vectorize,
> and intel vectorizes like a madman. They all involve a lot of heavy
> use of std::vector<std::vector<float>>. Comparisons were with gcc

Ick - C++ ;)

> 8.1, intel 2018.u1, an AMD Opteron 6386 SE, with the program running
> as sched_FIFO, mlockall, affinity set to its own core, and all
> interrupts vectored off that core. So, as close to not-noisy as
> possible.
>
> I was surprised at the results, but using each compiler's methods of
> dumping vectorization info, intel wins on two points:
>
> 1) It actually vectorizes
> 2) Its vectorization output is much more easily readable
>
> Options were:
>
> gcc -Wall -ggdb3 -std=gnu++17 -flto -Ofast -march=native
>
> vs:
>
> icc -Ofast -std=gnu++14
>
> So, not exactly exact, but pretty close.
>
> So here's an example of a chunk of code (not very readable, sorry
> about that) that intel can vectorize, and subsequently make about 50%
> faster:
>
>         std::size_t nLayers { input.nn.size() };
>         //std::size_t ySize = std::max_element(input.nn.cbegin(),
> input.nn.cend(), [](auto a, auto b){ return a.size() < b.size();
> })->size();
>         std::size_t ySize = 0;
>         for (auto const & nn: input.nn)
>                 ySize = std::max(ySize, nn.size());
>
>         float yNorm[ySize];
>         for (auto & y: yNorm)
>                 y = 0.0f;
>         for (std::size_t i = 0; i < xSize; ++i)
>                 yNorm[i] = xNorm[i];
>         for (std::size_t layer = 0; layer < nLayers; ++layer) {
>                 auto & nn = input.nn[layer];
>                 auto & b = nn.back();
>                 float y[ySize];
>                 for (std::size_t i = 0; i < nn[0].size(); ++i) {
>                         y[i] = b[i];
>                         for (std::size_t j = 0; j < nn.size() - 1; ++j)
>                                 y[i] += nn.at(j).at(i) * yNorm[j];
>                 }
>                 for (std::size_t i = 0; i < ySize; ++i) {
>                         if (layer < nLayers - 1)
>                                 y[i] = std::max(y[i], 0.0f);
>                         yNorm[i] = y[i];
>                 }
>         }
>
> If I was better at godbolt, I could show the asm, but I'm not. I'm
> willing to learn, though.

A compilable testcase would be more useful - just file a bugzilla.

Richard.
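A rough guess at what such a self-contained testcase could look like, as a sketch only: the Input type, the layer layout, the parameter passing and the sizes in main() are assumptions, not taken from the original program, so the real preprocessed source is still what a bug report needs. It is only meant to compile with g++ -std=gnu++17 (the VLAs are a GNU extension, kept from the original snippet).

#include <algorithm>
#include <cstddef>
#include <vector>

// Assumed layout: input.nn[layer][j][i], with the bias row stored last in
// each layer -- this is a guess at the original data structure.
struct Input {
    std::vector<std::vector<std::vector<float>>> nn;
};

float forward (Input const & input, std::vector<float> const & xNorm)
{
    std::size_t const nLayers = input.nn.size ();
    std::size_t const xSize = xNorm.size ();

    std::size_t ySize = 0;
    for (auto const & nn : input.nn)
        ySize = std::max (ySize, nn.size ());

    float yNorm[ySize];                 // VLA, GNU extension as in the original
    for (auto & y : yNorm)
        y = 0.0f;
    for (std::size_t i = 0; i < xSize; ++i)
        yNorm[i] = xNorm[i];

    for (std::size_t layer = 0; layer < nLayers; ++layer) {
        auto & nn = input.nn[layer];
        auto & b = nn.back ();
        float y[ySize];
        for (std::size_t i = 0; i < nn[0].size (); ++i) {
            y[i] = b[i];
            for (std::size_t j = 0; j < nn.size () - 1; ++j)
                y[i] += nn.at (j).at (i) * yNorm[j];
        }
        for (std::size_t i = 0; i < ySize; ++i) {
            if (layer < nLayers - 1)
                y[i] = std::max (y[i], 0.0f);
            yNorm[i] = y[i];
        }
    }
    return yNorm[0];
}

int main ()
{
    // Arbitrary sizes, chosen only so that every y[i] read above is
    // initialised: each layer has R rows (R-1 weight rows plus a bias row)
    // of length R.
    std::size_t const R = 8;
    Input input;
    for (int layer = 0; layer < 3; ++layer)
        input.nn.push_back (
            std::vector<std::vector<float>> (R, std::vector<float> (R, 0.5f)));
    std::vector<float> xNorm (R - 1, 1.0f);
    return forward (input, xNorm) > 0.0f ? 0 : 1;
}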
Question regarding preventing optimizing out of register in expansion
Hi,

I'd appreciate it if someone could advise me on a builtin expansion I'm currently writing.

High-level description of what I want to do: I have 2 operands in my builtin.
First I set a register (reg1) with the value from operand1 (op1);
second I call my instruction (reg1 is used implicitly and updated);
at the end I set operand2 (op2) with the value from reg1.

Simplified implementation in i386.c I have:

reg1 = gen_reg_rtx (mode);
emit_insn (gen_rtx_SET (reg1, op1));
emit_clobber (reg1);
emit_insn (gen_myinstruction ());
emit_insn (gen_rtx_SET (op2, reg1));

Everything works fine at -O0, but when I move to higher optimization levels the setting of reg1 (the lines before emit_clobber) is optimized out. I have already tried moving emit_clobber to just after the assignment, but it doesn't help.

Could you please suggest how I can prevent this from happening?

Thanks,
Sebastian
Re: Question regarding preventing optimizing out of register in expansion
On 06/21/2018 05:20 AM, Peryt, Sebastian wrote:
> Hi,
>
> I'd appreciate if someone could advise me in builtin expansion I'm
> currently writing.
>
> High level description for what I want to do:
>
> I have 2 operands in my builtin.

IIUC you're defining an UNSPEC.

> First I set register (reg1) with value from operand1 (op1); Second I
> call my instruction (reg1 is called implicitly and updated);

Here is your error -- NEVER have implicit register settings. The data flow
analysers need accurate information.

> Simplified implementation in i386.c I have:
>
> reg1 = gen_reg_rtx (mode);
> emit_insn (gen_rtx_SET (reg1, op1));
> emit_clobber (reg1);

At this point reg1 is dead. That means the previous set of reg1 from op1 is
unneeded and can be deleted.

> emit_insn (gen_myinstruction ());

This instruction has no inputs or outputs, and is not marked volatile(?) so
can be deleted.

> emit_insn (gen_rtx_SET (op2, reg1));

And this is storing a value from a dead register.

You need something like:

rtx reg1 = force_reg (mode, op1);
rtx reg2 = gen_reg_rtx (mode);
emit_insn (gen_my_insn (reg2, reg1));
emit_insn (gen_rtx_SET (op2, reg2));

Your instruction should be an UNSPEC showing what the inputs and outputs
are. That tells the optimizers what depends on what, but the compiler has
no clue about what the transform is.

nathan

--
Nathan Sidwell
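For completeness, a rough sketch of what the expander could look like if the instruction is modelled as an UNSPEC with explicit operands. UNSPEC_MYINSN and the matching define_insn are placeholders, not real i386.md names; this is an illustration of the shape, not the actual implementation.

/* Sketch only: assumes a define_insn in the .md file matching a set of a
   register from (unspec [(reg)] UNSPEC_MYINSN).  Because the input reg1 and
   the output reg2 appear explicitly in the RTL, the data-flow passes can no
   longer delete the set of reg1.  */
rtx reg1 = force_reg (mode, op1);
rtx reg2 = gen_reg_rtx (mode);
emit_insn (gen_rtx_SET (reg2,
                        gen_rtx_UNSPEC (mode, gen_rtvec (1, reg1),
                                        UNSPEC_MYINSN)));
emit_move_insn (op2, reg2);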
RE: Question regarding preventing optimizing out of register in expansion
Thank you very much! Your suggestions helped me figure this out.

Sebastian

-----Original Message-----
From: Nathan Sidwell [mailto:nathanmsidw...@gmail.com] On Behalf Of Nathan Sidwell
Sent: Thursday, June 21, 2018 1:43 PM
To: Peryt, Sebastian; gcc@gcc.gnu.org
Subject: Re: Question regarding preventing optimizing out of register in expansion

On 06/21/2018 05:20 AM, Peryt, Sebastian wrote:
> Hi,
>
> I'd appreciate if someone could advise me in builtin expansion I'm currently
> writing.
>
> High level description for what I want to do:
>
> I have 2 operands in my builtin.

IIUC you're defining an UNSPEC.

> First I set register (reg1) with value from operand1 (op1); Second I
> call my instruction (reg1 is called implicitly and updated);

Here is your error -- NEVER have implicit register settings. The data flow
analysers need accurate information.

> Simplified implementation in i386.c I have:
>
> reg1 = gen_reg_rtx (mode);
> emit_insn (gen_rtx_SET (reg1, op1));
> emit_clobber (reg1);

At this point reg1 is dead. That means the previous set of reg1 from op1 is
unneeded and can be deleted.

> emit_insn (gen_myinstruction ());

This instruction has no inputs or outputs, and is not marked volatile(?) so
can be deleted.

> emit_insn (gen_rtx_SET (op2, reg1));

And this is storing a value from a dead register.

You need something like:

rtx reg1 = force_reg (mode, op1);
rtx reg2 = gen_reg_rtx (mode);
emit_insn (gen_my_insn (reg2, reg1));
emit_insn (gen_rtx_SET (op2, reg2));

Your instruction should be an UNSPEC showing what the inputs and outputs
are. That tells the optimizers what depends on what, but the compiler has
no clue about what the transform is.

nathan

--
Nathan Sidwell
Re: [GSOC] LTO dump tool project
On 06/20/2018 07:23 PM, Hrishikesh Kulkarni wrote:
> Hi,
>
> Please find the diff file for dumping tree type stats attached here with.
>
> example:
>
> $ ../stage1-build/gcc/lto1 test_hello.o -fdump-lto-tree-type-stats
> Reading object files: test_hello.o
> integer_type    3
> pointer_type    3
> array_type      1
> function_type   4
>
> I have pushed the changes on Github repo.

Hi.

Good progress here. I would also dump statistics for GIMPLE statements.
If you configure gcc with --enable-gather-detailed-mem-stats, you should see:

./xgcc -B. /tmp/main.c -fmem-report -O2
...
GIMPLE statements
Kind                   Stmts      Bytes
---------------------------------------
assignments                6        480
phi nodes                  0          0
conditionals               8        640
everything else           21       1368
---------------------------------------
Total                     35       2488
...

Take a look at dump_gimple_statistics, gimple_alloc_counts, gimple_alloc_sizes.
We do the same for trees:

static uint64_t tree_code_counts[MAX_TREE_CODES];
uint64_t tree_node_counts[(int) all_kinds];
uint64_t tree_node_sizes[(int) all_kinds];

I believe the infrastructure should be shared.

Martin

> Regards,
>
> Hrishikesh
>
> On Mon, Jun 18, 2018 at 2:15 PM, Martin Jambor wrote:
>> Hi,
>>
>> On Sun, Jun 17 2018, Hrishikesh Kulkarni wrote:
>>> Hi,
>>>
>>> I am trying to isolate the dump tool into real lto-dump tool. I have
>>> started with the copy&paste of lto.c into lto-dump.c and done the
>>> changes to Make-lang.in and config-lang.in suggested by Martin (patch
>>> attached). However when I try to build, I get the following error:
>>>
>>> In file included from ../../gcc/gcc/lto/lto-dump.c:24:0:
>>>
>>> ../../gcc/gcc/coretypes.h:397:24: fatal error: insn-modes.h: No such
>>> file or directory
>>>
>>> compilation terminated.
>>>
>>> I am unable to find the missing dependencies and would be grateful for
>>> suggestions on how to resolve the issue.
>>
>> insn-modes.h is one of header files which are generated at build time,
>> you will find it in the gcc subdirectory of your build directory (as
>> opposed to the source directory).
>>
>> Martin
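As a rough sketch of that sharing (the flag name below is made up; only dump_tree_statistics and dump_gimple_statistics are existing functions, and both only report real numbers when GCC is configured with --enable-gather-detailed-mem-stats), the LTO dump path could simply call the existing printers rather than growing its own counters:

/* Hypothetical wiring in the LTO front end: reuse the -fmem-report
   printers instead of duplicating the counting infrastructure.  */
if (flag_dump_lto_stats)        /* made-up command-line flag */
  {
    dump_tree_statistics ();    /* TREE node counts and sizes    */
    dump_gimple_statistics ();  /* GIMPLE statement counts/sizes */
  }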
Re: [GSOC] LTO dump tool project
Hi.

There were some questions from Hrishikesh about the requested goals of the project, so I would like to spell out what I'm aware of:

1) symbol table
   - list all symbols
   - print detailed info about a symbol (symtab_node::debug)
   - print the GIMPLE body of a function; I would like to see support for
     the levels we already have, e.g. -fdump-tree-optimized-blocks and
     -fdump-tree-optimized-stats, defined in dumpfile.h: enum dump_flag
   - I would like to see the constructor of a global variable:
     DECL_INITIAL (...), probably print_generic_expr will work
   - we can consider adding options similar to those seen in nm:

     --no-demangle
         Do not demangle low-level symbol names. This is the default.
     -p
     --no-sort
         Do not bother to sort the symbols in any order; print them in the
         order encountered.
     -S
     --print-size
         Print both value and size of defined symbols for the "bsd" output
         style. This option has no effect for object formats that do not
         record symbol sizes, unless --size-sort is also used in which case
         a calculated size is displayed.
     -r
     --reverse-sort
         Reverse the order of the sort (whether numeric or alphabetic); let
         the last come first.
     --defined-only
         Display only defined symbols for each object file.
     --size-sort
         Sort symbols by size. For ELF objects symbol sizes are read from
         the ELF, for other object types the symbol sizes are computed as
         the difference between the value of the symbol and the value of
         the symbol with the next higher value. If the "bsd" output format
         is used the size of the symbol is printed, rather than the value,
         and -S must be used in order both size and value to be printed.

     It's just for inspiration, see man nm.

2) statistics
   - GIMPLE and TREE statistics, similar to what we do for -fmem-report

3) LTO objects
   - we should list the files and archives that were read, and print some
     stats about them

4) tree types
   - list types
   - print one (debug_tree) with different verbosity levels, again
     'enum dump_flag'

5) visualization
   - should be done via -fdump-ipa-icf-graph, which generates a .dot file;
     should be easy to use

6) separation into an lto-dump binary
   - here I can help, I'll cook a patch for it

I believe it's a series of small patches that can implement all of that. I hope you'll invent even more options as you play with LTO.

Martin
CeMAT - 2018 Attendees List
Hi,

I believe that you are one of the exhibitors of the upcoming event "CeMAT - 2018", held July 24th to 26th in Melbourne, Australia.

If you are interested in acquiring the attendees list of "CeMAT - 2018", please reply to this email and I shall get back to you with pricing, counts and other deliverables.

Thank you, and I look forward to hearing from you soon.

Best Regards,
Leslie Boyd | Inside Sales, USA & Europe |
Email: les...@expolist.us

"If you don't wish to receive email from us please reply back with LEAVE OUT"
gcc-7-20180621 is now available
Snapshot gcc-7-20180621 is now available on
  ftp://gcc.gnu.org/pub/gcc/snapshots/7-20180621/
and on various mirrors, see http://gcc.gnu.org/mirrors.html for details.

This snapshot has been generated from the GCC 7 SVN branch
with the following options: svn://gcc.gnu.org/svn/gcc/branches/gcc-7-branch revision 261867

You'll find:

 gcc-7-20180621.tar.xz               Complete GCC

  SHA256=663806e826862f80a6dccf5c111f258fb100d11f5a706a76cf7f9497e6671928
  SHA1=e73136313286a1b65d87b5c5828393bddf78084d

Diffs from 7-20180614 are available in the diffs/ subdirectory.

When a particular snapshot is ready for public consumption the LATEST-7
link is updated and a message is sent to the gcc list. Please do not use
a snapshot before it has been announced that way.
Re: How to get GCC on par with ICC?
On Wed, 2018-06-20 at 17:11 -0400, NightStrike wrote:
>
> If I could perhaps jump in here for a moment... Just today I hit upon
> a series of small (in lines of code) loops that gcc can't vectorize,
> and intel vectorizes like a madman. They all involve a lot of heavy
> use of std::vector<std::vector<float>>. Comparisons were with gcc
> 8.1, intel 2018.u1, an AMD Opteron 6386 SE, with the program running
> as sched_FIFO, mlockall, affinity set to its own core, and all
> interrupts vectored off that core. So, as close to not-noisy as
> possible.

There are quite a number of bugzilla reports with examples where GCC does
not vectorize a loop. I wonder if this example is related to PR 61247.

Steve Ellcey
Re: IND: LIQ: Re: VAC
Hello,

I sent you an e-mail last week but did not receive any feedback from you, so I am sending this reminder hoping to get your response asap.

We are interested in the purchase of a product which we are hoping you can assist us in negotiating and procuring. This raw material is used by our Company in the USA/UK and we are in urgent need of it as we have almost run out of stock.

I have also sent you earlier the list of the products but am yet to hear from you. I can still resend the full information to you just in case you no longer have it. Here is my personal email: acardwe...@gmail.com

Thanks,
Dr. Andrew Cardwell.