Re: GIMPLE problem
On Wed, Jun 24, 2020 at 9:05 PM Gary Oblock via Gcc wrote:
>
> Richard,
>
> First off I did suspect INDIRECT_REF wasn't supported, thanks for
> confirming that.
>
> I tried what you said in the original code before I posted
> but I suspect how I went at it is the problem. I'm probably
> doing something(s) in a glaringly stupid way.
>
> Can you spot it, because everything I'm doing makes total sense
> to me?

Well, read what I wrote ...

> Thanks Gary
>
> --
>
> Snippet from the code with MEM_REF:
>
> tree lhs_ref = build1 ( MEM_REF, field_type, field_addr);

MEM_REF has two operands, the second is a byte offset plus encodes TBAA information.

> final_set = gimple_build_assign( lhs_ref, field_val_temp);
>
> field_type is a double *
>
> field_addr is an address within an malloced array of doubles.
>
> --
>
> Snippet from the code with ARRAY_REF:
>
> tree rhs_ref = build4 ( ARRAY_REF, field_type, field_arry_addr, index,
> NULL_TREE, NULL_TREE);

you need to dereference field_arry_addr to produce an array you can reference with the ARRAY_REF.

  tree arr = build2 (MEM_REF, array_type, field_arry_addr,
                     build_int_cst (ptr_type_node, 0));
  rhs_ref = build4 (ARRAY_REF, field_type, arr, index, NULL, NULL);

> temp_set = gimple_build_assign( field_val_temp, rhs_ref);
>
> field type is double
>
> field_arry_addr is the starting address of an array of malloced doubles.
>
> index is a pointer_rep (an integer)
> details:
> tree pointer_rep = make_node ( INTEGER_TYPE);
> TYPE_PRECISION (pointer_rep) = TYPE_PRECISION (pointer_sized_int_node);
>
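As a rough C-level analogy for the two tree shapes Richard describes: a MEM_REF behaves like dereferencing a pointer at a byte offset, while an ARRAY_REF indexes an array object, so a pointer has to be dereferenced as an array first. This is only an illustration in plain C, not GCC plugin code, and the function names are hypothetical. It also suggests that the type given to the MEM_REF should be the accessed type (double) rather than `double *`, though that is my reading and not something stated in the thread.

```c
#include <stddef.h>

/* MEM_REF analogy: two operands, a base pointer and a byte offset.
   The result has the accessed type (double), not a pointer type.
   byte_offset is assumed to be a multiple of sizeof(double). */
double load_like_mem_ref(const double *base, size_t byte_offset)
{
    return *(const double *)((const char *)base + byte_offset);
}

/* ARRAY_REF analogy: first view the pointed-to storage as an array
   (the MEM_REF with an array type at offset 0), then index it. */
double load_like_array_ref(const double *base, size_t index)
{
    const double (*arr)[] = (const double (*)[])base;
    return (*arr)[index];
}
```

Both functions read the same storage; the difference is only whether the element is reached through a byte offset or through an index into an array object.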
Customized coverage instrumentation for multiple C files
Hello,

I am working on a basic block coverage counter that mimics -fsanitize-coverage=trace-pc but has more features. My problem is that when instrumenting multiple C files (e.g., test1.c, test2.c, test3.c), I want to generate one coverage log per file (test1.log, test2.log, test3.log), and so on.

My questions are: 1) how can I find out the name of the source file being instrumented from a GIMPLE plugin (my plugin runs after the "optimized" pass), and 2) I want to keep all covered basic block info in memory and write the log file *only once*, right before profiling finishes (i.e., when the instrumented program has finished executing and is about to exit). Can I somehow set a callback at that point and then flush the coverage record to files? I don't know how or where to "set a callback" like that, if it is possible at all.

Thank you very much.

Best,
Shuai
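One possible shape for the runtime side of question 2 is sketched below in plain C: counters stay in memory, and the standard `atexit` function registers a flush that runs once at normal program exit. Everything named `cov_*` and the log line format are made up for illustration; the sketch assumes the plugin passes each translation unit's source name as a string and emits a call to `cov_hit` per basic block. For question 1, GCC's `main_input_filename` global is, as far as I know, where a plugin can read the name of the file being compiled, but treat that as an assumption to verify against your GCC version.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define COV_MAX_BLOCKS 256              /* hypothetical per-file capacity */

/* One record per instrumented source file (hypothetical layout). */
struct cov_unit {
    const char *src;                    /* e.g. "test1.c" */
    unsigned long hits[COV_MAX_BLOCKS]; /* per-basic-block counters */
    int registered;                     /* already linked into cov_units? */
    struct cov_unit *next;
};

static struct cov_unit *cov_units;      /* all units seen so far */

/* Derive "test1.log" from "test1.c" (assumes a plain "name.ext" string). */
void cov_log_name(const char *src, char *out, size_t n)
{
    const char *dot = strrchr(src, '.');
    int stem = dot ? (int)(dot - src) : (int)strlen(src);
    snprintf(out, n, "%.*s.log", stem, src);
}

/* Write every unit's counters to its own log file. */
void cov_flush(void)
{
    for (struct cov_unit *u = cov_units; u; u = u->next) {
        char name[256];
        cov_log_name(u->src, name, sizeof name);
        FILE *f = fopen(name, "w");
        if (!f)
            continue;
        for (unsigned i = 0; i < COV_MAX_BLOCKS; i++)
            if (u->hits[i])
                fprintf(f, "bb %u: %lu\n", i, u->hits[i]);
        fclose(f);
    }
}

/* Called by instrumented code at the start of each basic block. */
void cov_hit(struct cov_unit *u, unsigned bb)
{
    if (!u->registered) {
        if (!cov_units)
            atexit(cov_flush);          /* registered once, runs at exit */
        u->registered = 1;
        u->next = cov_units;
        cov_units = u;
    }
    if (bb < COV_MAX_BLOCKS)
        u->hits[bb]++;
}
```

Note that `atexit` handlers run only on normal termination (return from `main` or a call to `exit`), not after `abort`, `_exit`, or a fatal signal, so a crash would lose the in-memory record. The log files are written to the current working directory.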
Re: Modula-2 into the GCC tree on master?
David Edelsohn writes: > Hi, Gaius > > Thanks for your diligent effort to complete this port of Modula-2 and > prepare it for inclusion in GCC. I have forwarded the proposal to the > GCC Steering Committee. > > Thanks, David Hi David, many thanks for forwarding the proposal - always great fun to work with GCC regards, Gaius
TLS Implementation Across Architectures
Hi RTEMS supports over 15 processor architectures and we would like to ensure that TLS is supported on all rather than just a handful of popular ones (arm, x86, powerpc, sparc, etc). I know of Ulrich Drepper's document ( https://www.akkadia.org/drepper/tls.pdf) but it is a few years old and covers only a subset of architectures. Is TLS supported on all architectures in GCC? Is there some documentation on how it is implemented on architectures not in Ulrich's paper? Or some guidance on how to extract this information from the GCC source? Thanks. --joel
Hoisting DFmode loads out of loops..
I’m working on a GCC 8.3 port to a load/store architecture with a 32-bit data-path between registers and memory;

looking at the gcc.dg/loop-9.c test, I fail to pass because I have split the move of a double constant to memory into multiple moves (4 in fact, because I only have a 16-bit immediate mode.)

The (define_insn_and_split “movdf” …) is conditioned on “reload_completed”.

Is there some other trick I need to get the constant hoisted? I have already set the rtx cost of the CONST_DOUBLE ridiculously high (like 10 insns).

Alan Lehotsky
https://codegentllc.com
Re: TLS Implementation Across Architectures
On 6/25/20 2:34 PM, Joel Sherrill wrote:
> Hi
>
> RTEMS supports over 15 processor architectures and we would like to ensure
> that TLS is supported on all rather than just a handful of popular ones
> (arm, x86, powerpc, sparc, etc). I know of Ulrich Drepper's document (
> https://www.akkadia.org/drepper/tls.pdf) but it is a few years old and
> covers only a subset of architectures.
>
> Is TLS supported on all architectures in GCC?
>
> Is there some documentation on how it is implemented on architectures not
> in Ulrich's paper? Or some guidance on how to extract this information from
> the GCC source?

The ARM (32) abi has some extensions to that, which originally came from Alex Oliva and then I implemented (The GNU2 TLS stuff). I think the smarts is in the linker for that though.

IMHO bfd might be a better source of information than gcc.

natan
--
Nathan Sidwell
Re: TLS Implementation Across Architectures
On Thu, Jun 25, 2020 at 2:54 PM Nathan Sidwell wrote:
> On 6/25/20 2:34 PM, Joel Sherrill wrote:
> > Hi
> >
> > RTEMS supports over 15 processor architectures and we would like to ensure
> > that TLS is supported on all rather than just a handful of popular ones
> > (arm, x86, powerpc, sparc, etc). I know of Ulrich Drepper's document (
> > https://www.akkadia.org/drepper/tls.pdf) but it is a few years old and
> > covers only a subset of architectures.
> >
> > Is TLS supported on all architectures in GCC?
> >
> > Is there some documentation on how it is implemented on architectures not
> > in Ulrich's paper? Or some guidance on how to extract this information from
> > the GCC source?
>
> The ARM (32) abi has some extensions to that, which originally came from
> Alex Oliva and then I implemented (The GNU2 TLS stuff). I think the
> smarts is in the linker for that though.
>
> IMHO bfd might be a better source of information than gcc.

BFD would know the section and attribute part but isn't gcc responsible for generating the code to dereference into it? It could be a specific base register or an invalid instruction fault (MIPS) or something else. I'm wondering how one knows what that magic to look up the base is for a specific architecture.

Or if there is an easy way for a target to change say the MIPS bad instruction to a subroutine call? It would seem that GCC would have an architecture independent base lookup alternative.

--joel

> natan
> --
> Nathan Sidwell
Re: TLS Implementation Across Architectures
On Thu, Jun 25, 2020 at 1:34 PM Joel Sherrill wrote:
>
> On Thu, Jun 25, 2020 at 2:54 PM Nathan Sidwell wrote:
>
> > On 6/25/20 2:34 PM, Joel Sherrill wrote:
> > > Hi
> > >
> > > RTEMS supports over 15 processor architectures and we would like to ensure
> > > that TLS is supported on all rather than just a handful of popular ones
> > > (arm, x86, powerpc, sparc, etc). I know of Ulrich Drepper's document (
> > > https://www.akkadia.org/drepper/tls.pdf) but it is a few years old and
> > > covers only a subset of architectures.
> > >
> > > Is TLS supported on all architectures in GCC?
> > >
> > > Is there some documentation on how it is implemented on architectures not
> > > in Ulrich's paper? Or some guidance on how to extract this information from
> > > the GCC source?
> >
> > The ARM (32) abi has some extensions to that, which originally came from
> > Alex Oliva and then I implemented (The GNU2 TLS stuff). I think the
> > smarts is in the linker for that though.
> >
> > IMHO bfd might be a better source of information than gcc.
>
> BFD would know the section and attribute part but isn't gcc responsible for
> generating the code to dereference into it? It could be a specific base register
> or an invalid instruction fault (MIPS) or something else. I'm wondering how
> one knows what that magic to look up the base is for a specific architecture.
>
> Or if there is an easy way for a target to change say the MIPS bad instruction
> to a subroutine call? It would seem that GCC would have an architecture
> independent base lookup alternative.

NOTE MIPS32/64r3 says that system register is implemented. I know of a few implementations that implement that register as a register (Octeon 2 and Octeon3 for an example).

Thanks,
Andrew

> --joel
>
> > natan
> > --
> > Nathan Sidwell
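One practical way to see how GCC forms the thread-pointer base on a given architecture, as asked earlier in this thread, is to compile a tiny thread-local access for each target and read the generated assembly. The probe below is a minimal sketch; the file name tls-probe.c and the identifiers are made up for illustration, and it assumes a GCC (cross-)compiler for the target is at hand.

```c
/* tls-probe.c: smallest useful TLS access pattern.
   Compile with something like `<target>-gcc -O2 -S tls-probe.c` and look at
   how the address of tls_counter is formed (base register, helper call,
   special instruction, ...). */

/* __thread is the GNU spelling; C11 spells it _Thread_local. */
__thread int tls_counter;

/* Read-modify-write of a thread-local object. */
int tls_bump(void)
{
    return ++tls_counter;
}

/* Taking the address forces the full address computation to be emitted. */
int *tls_addr(void)
{
    return &tls_counter;
}
```

Repeating the compile with `-fPIC` and with the different `-ftls-model=` values (global-dynamic, local-dynamic, initial-exec, local-exec) should show the sequence for each access model. On a target configured without native TLS I would expect to see calls into libgcc's emulated-TLS support (`__emutls_get_address`) instead, but that is an expectation to check per target rather than a guarantee.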
gcc-8-20200625 is now available
Snapshot gcc-8-20200625 is now available on
  https://gcc.gnu.org/pub/gcc/snapshots/8-20200625/
and on various mirrors, see http://gcc.gnu.org/mirrors.html for details.

This snapshot has been generated from the GCC 8 git branch
with the following options: git://gcc.gnu.org/git/gcc.git branch releases/gcc-8 revision e39a8763c4dead0f448981d9488d7b264db9da55

You'll find:

 gcc-8-20200625.tar.xz                Complete GCC

  SHA256=42902a890c439f669d634072dda9f81526a6fa6ff8402133385b754df82db951
  SHA1=d32fce3dc58568e79edda0925c03b66a28d821c8

Diffs from 8-20200618 are available in the diffs/ subdirectory.

When a particular snapshot is ready for public consumption the LATEST-8
link is updated and a message is sent to the gcc list. Please do not use
a snapshot before it has been announced that way.
Re: Hoisting DFmode loads out of loops..
On Thu, 2020-06-25 at 15:46 -0400, Alan Lehotsky wrote:
> I’m working on a GCC 8.3 port to a load/store architecture with a 32-bit
> data-path between registers and memory;
>
> looking at the gcc.dg/loop-9.c test, I fail to pass because I have split the
> move of a double constant to memory into multiple moves (4 in fact, because I
> only have a 16-bit immediate mode.)
>
> The (define_insn_and_split “movdf” …) is conditioned on “reload_completed”.
>
> Is there some other trick I need to get the constant hoisted? I have already
> set the rtx cost of the CONST_DOUBLE ridiculously high (like 10 insns)

Hi Alan, it's been a long time...

We'd probably need to see the RTL. A variety of things can get in the way of LICM. For example, I'd expect subregs to be problematical because they can look like RMW operations.

jeff
Re: Hoisting DFmode loads out of loops..
On Jun 25, 2020, at 6:37 PM, Jeff Law <l...@redhat.com> wrote:
> On Thu, 2020-06-25 at 15:46 -0400, Alan Lehotsky wrote:
> > I’m working on a GCC 8.3 port to a load/store architecture with a 32-bit
> > data-path between registers and memory;
> >
> > looking at the gcc.dg/loop-9.c test, I fail to pass because I have split the
> > move of a double constant to memory into multiple moves (4 in fact, because I
> > only have a 16-bit immediate mode.)
> >
> > The (define_insn_and_split “movdf” …) is conditioned on “reload_completed”.
> >
> > Is there some other trick I need to get the constant hoisted? I have already
> > set the rtx cost of the CONST_DOUBLE ridiculously high (like 10 insns)
>
> Hi Alan, it's been a long time...
>
> We'd probably need to see the RTL. A variety of things can get in the way of
> LICM. For example, I'd expect subregs to be problematical because they can look
> like RMW operations.
>
> jeff

Hello to you too, Jeff…. I’ve been lurking for the last decade or so; the last port I actually did was GCC 4 based, so there is lots of new stuff to try and wrap my head around. I certainly am grateful for anybody with suggestions as to how to track down this problem (I’m not terribly eager to step through an x86 gcc in parallel with my port to see where they diverge in the loop-invariant recognition.)

Although in crafting this expanded email, I see that the x86 has already decided to store the constant 18.4242 in the .rodata section by the start of loop-invariance so there’s a

(set (reg:DF…. ) (mem:DF (symbol_ref ….)))

and I bet that’s far easier to move out of the loop than it would be to split the original

(set (mem:DF…) (const_double:DF ….))

-- Al

==

Source code is

void f (double *a)
{
  int i;
  for (i = 0; i < 100; i++)
    a[i] = 18.4242;
}
==

Here’s the dump from loop-9.c.252r.loop2-invariant (compiled -O1)

;; Function f (f, funcdef_no=0, decl_uid=1458, cgraph_uid=0, symbol_order=0)

*starting processing of loop 1 **
starting the processing of deferred insns
ending the processing of deferred insns
setting blocks to analyze 3, 5
starting the processing of deferred insns
ending the processing of deferred insns
df_analyze called
df_worklist_dataflow_doublequeue: n_basic_blocks 6 n_edges 6 count 2 ( 0.33)
df_worklist_dataflow_doublequeue: n_basic_blocks 6 n_edges 6 count 2 ( 0.33)
df_worklist_dataflow_doublequeue: n_basic_blocks 6 n_edges 6 count 3 ( 0.5)

starting region dump

f

Dataflow summary:
def_info->table_size = 3, use_info->table_size = 23
;; invalidated by call 0 [d0] 1 [d1] 2 [d2] 3 [d3] 4 [d4] 5 [d5] 6 [d6] 7 [d7] 8 [d8] 9 [d9] 14 [d14] 15 [d15] 16 [a0] 19 [a3] 20 [a4] 24 [acc0_hi] 25 [acc0_lo] 26 [acc1_hi] 27 [acc1_lo] 28 [source3] 30 [cc] 31 [int_set0] 32 [int_set1] 33 [int_clr0] 34 [int_clr1] 35 [scratchpad0] 36 [scratchpad1] 37 [scratchpad2] 38 [scratchpad3]
;; hardware regs used 23 [sp] 29 [arg] 39 [sfp]
;; regular block artificial uses 22 [a6] 23 [sp] 29 [arg] 39 [sfp]
;; eh block artificial uses 22 [a6] 23 [sp] 29 [arg] 39 [sfp]
;; entry block defs 0 [d0] 1 [d1] 2 [d2] 3 [d3] 4 [d4] 5 [d5] 6 [d6] 7 [d7] 8 [d8] 9 [d9] 21 [a5] 22 [a6] 23 [sp] 29 [arg] 39 [sfp]
;; exit block uses 22 [a6] 23 [sp] 39 [sfp]
;; regs ever live 0 [d0] 30 [cc]
;; ref usage r0={1d,1u} r1={1d} r2={1d} r3={1d} r4={1d} r5={1d} r6={1d} r7={1d} r8={1d} r9={1d} r21={1d} r22={1d,5u} r23={1d,5u} r29={1d,4u} r30={3d,1u} r39={1d,5u} r46={2d,4u} r48={1d,1u}
;;total ref usage 47{21d,26u,0e} in 6{6 regular + 0 call} insns.
;; Reaching defs:
;; sparse invalidated
;; dense invalidated 0, 1
;; reg->defs[] map: 30[0,1] 46[2,2]
;; bb 3 artificial_defs: { }
;; bb 3 artificial_uses: { u7(22){ }u8(23){ }u9(29){ }u10(39){ }}
;; lr in 22 [a6] 23 [sp] 29 [arg] 39 [sfp] 46 48
;; lr use 22 [a6] 23 [sp] 29 [arg] 39 [sfp] 46 48
;; lr def 30 [cc] 46
;; live in 46
;; live gen 30 [cc] 46
;; live kill 30 [cc]
;; rd in (1) 46[2]
;; rd gen (2) 30[1],46[2]
;; rd kill (3) 30[0,1],46[2]
;; UD chains for artificial uses at top

(code_label 11 7 8 3 2 (nil) [0 uses])
(note 8 11 9 3 [bb 3] NOTE_INSN_BASIC_BLOCK)
;; UD chains for insn luid 0 uid 9
;; reg 46 { d2(bb 3 insn 10) }
(insn 9 8 10 3 (set (mem:DF (reg:SI 46 [ ivtmp___6 ]) [0 MEM[base: _15, offset: 0B]+0 S8 A32])
        (const_double:DF 1.842419990222931955941021442413330078125e+1 [0x0.9364c2f837b4ap+5])) "loop-9.c":9 19 {movdf}
     (nil))
;; UD chains for insn luid 1 uid 10
;; reg 46 { d2(bb 3 insn 10) }
(insn 10 9 12 3 (parallel [
            (set (reg:SI 46 [ ivtmp___6 ])
                (plus:SI (reg:SI 46 [ ivtmp___6 ])
                    (const_int 8 [0x8])))
            (clobber (reg:CC 30 cc))
        ]) 81 {addsi3_1v5}
     (expr_list:REG_UNUSED (reg:CC 30 cc)
        (nil)))
;; UD chains for insn luid 2 uid 12
;; reg 46 { d2(bb 3 insn 10) }
;; reg 48 { }
(insn 12 10 13 3 (set (reg:CCWZ 30 cc)
        (compare:CCWZ (reg:SI 46 [ ivtmp___6 ])
            (reg:SI 48 [ _17 ]))) "loop-9.c":8
Re: Hoisting DFmode loads out of loops..
On June 26, 2020 3:24:24 AM GMT+02:00, Alan Lehotsky wrote:
>On Jun 25, 2020, at 6:37 PM, Jeff Law <l...@redhat.com> wrote:
>
>On Thu, 2020-06-25 at 15:46 -0400, Alan Lehotsky wrote:
>I’m working on a GCC 8.3 port to a load/store architecture with a
>32-bit data-path between registers and memory;
>
>looking at the gcc.dg/loop-9.c test, I fail to pass because I have
>split the move of a double constant to memory into multiple moves (4 in
>fact, because I only have a 16-bit immediate mode.)
>
>The (define_insn_and_split “movdf” …) is conditioned on
>“reload_completed”.
>
>Is there some other trick I need to get the constant hoisted? I have
>already set the rtx cost of the CONST_DOUBLE ridiculously high (like 10
>insns)
>Hi Alan, it's been a long time...
>
>We'd probably need to see the RTL. A variety of things can get in the
>way of
>LICM. For example, I'd expect subregs to be problematical because they
>can look
>like RMW operations.
>
>jeff
>
>
>
>Hello to you too, Jeff…. I’ve been lurking for the last decade or so,
>last port I actually did was GCC 4 based, so lots of new stuff to
>try and wrap my head around. I certainly am grateful for anybody with
>suggestions as to how to track down this problem (I’m not terribly
>eager to do a
>parallel stepping thru a x86 gcc in parallel with my port to see where
>they diverge in the loop-invariant recognition.)
>
>Although in crafting this expanded email, I see that the x86 has
>already decided to store the constant 18.4242 in the .rodata section by
>the start of loop-invariance so there’s a
>
>(set (reg:DF…. ) (mem:DF (symbol_ref ….)))
>
>and I bet that’s far easier to move out of the loop than it would be to
>split the original
>
>(set (mem:DF…) (const_double:DF ….))

Immediate operands are never moved or CSEd by either RTL or GIMPLE passes, so if you do not have const_double immediates the best thing to do is to not make them legitimate.

Richard.

>-- Al
>
>==
>
>Source code is
>
>void f (double *a)
>{
>int i;
>for (i = 0; i < 100; i++)
>a[i] = 18.4242;
>}
>==
>
>Here’s the dump from loop-9.c.252r.loop2-invariant (compiled -O1)
>
>
>;; Function f (f, funcdef_no=0, decl_uid=1458, cgraph_uid=0,
>symbol_order=0)
>
>*starting processing of loop 1 **
>starting the processing of deferred insns
>ending the processing of deferred insns
>setting blocks to analyze 3, 5
>starting the processing of deferred insns
>ending the processing of deferred insns
>df_analyze called
>df_worklist_dataflow_doublequeue: n_basic_blocks 6 n_edges 6 count 2 (
>0.33)
>df_worklist_dataflow_doublequeue: n_basic_blocks 6 n_edges 6 count 2 (
>0.33)
>df_worklist_dataflow_doublequeue: n_basic_blocks 6 n_edges 6 count 3 (
>0.5)
>
>
>starting region dump
>
>
>f
>
>Dataflow summary:
>def_info->table_size = 3, use_info->table_size = 23
>;; invalidated by call 0 [d0] 1 [d1] 2 [d2] 3 [d3] 4 [d4] 5 [d5] 6
>[d6] 7 [d7] 8 [d8] 9 [d9] 14 [d14] 15 [d15] 16 [a0] 19 [a3] 20 [a4] 24
>[acc0_hi] 25 [acc0_lo] 26 [acc1_hi] 27 [acc1_lo] 28 [source3] 30 [cc]
>31 [int_set0] 32 [int_set1] 33 [int_clr0] 34 [int_clr1] 35
>[scratchpad0] 36 [scratchpad1] 37 [scratchpad2] 38 [scratchpad3]
>;; hardware regs used 23 [sp] 29 [arg] 39 [sfp]
>;; regular block artificial uses 22 [a6] 23 [sp] 29 [arg] 39 [sfp]
>;; eh block artificial uses 22 [a6] 23 [sp] 29 [arg] 39 [sfp]
>;; entry block defs 0 [d0] 1 [d1] 2 [d2] 3 [d3] 4 [d4] 5 [d5] 6 [d6] 7
>[d7] 8 [d8] 9 [d9] 21 [a5] 22 [a6] 23 [sp] 29 [arg] 39 [sfp]
>;; exit block uses 22 [a6] 23 [sp] 39 [sfp]
>;; regs ever live 0 [d0] 30 [cc]
>;; ref usage r0={1d,1u} r1={1d} r2={1d} r3={1d} r4={1d} r5={1d}
>r6={1d} r7={1d} r8={1d} r9={1d} r21={1d} r22={1d,5u} r23={1d,5u}
>r29={1d,4u} r30={3d,1u} r39={1d,5u} r46={2d,4u} r48={1d,1u}
>;;total ref usage 47{21d,26u,0e} in 6{6 regular + 0 call} insns.
>;; Reaching defs:
>;; sparse invalidated
>;; dense invalidated 0, 1
>;; reg->defs[] map: 30[0,1] 46[2,2]
>;; bb 3 artificial_defs: { }
>;; bb 3 artificial_uses: { u7(22){ }u8(23){ }u9(29){ }u10(39){ }}
>;; lr in 22 [a6] 23 [sp] 29 [arg] 39 [sfp] 46 48
>;; lr use 22 [a6] 23 [sp] 29 [arg] 39 [sfp] 46 48
>;; lr def 30 [cc] 46
>;; live in 46
>;; live gen 30 [cc] 46
>;; live kill 30 [cc]
>;; rd in (1) 46[2]
>;; rd gen (2) 30[1],46[2]
>;; rd kill (3) 30[0,1],46[2]
>;; UD chains for artificial uses at top
>
>(code_label 11 7 8 3 2 (nil) [0 uses])
>(note 8 11 9 3 [bb 3] NOTE_INSN_BASIC_BLOCK)
>;; UD chains for insn luid 0 uid 9
>;; reg 46 { d2(bb 3 insn 10) }
>(insn 9 8 10 3 (set (mem:DF (reg:SI 46 [ ivtmp___6 ]) [0 MEM[base: _15,
>offset: 0B]+0 S8 A32])
>(const_double:DF 1.842419990222931955941021442413330078125e+1
>[0x0.9364c2f837b4ap+5])) "loop-9.c":9 19 {movdf}
> (nil))
>;; UD chains for insn luid 1 uid 10
>;; reg 46 { d2(bb 3 insn 10) }
>(insn 10 9 12 3 (parallel [
>(set (reg:SI 46 [ ivtmp___6 ])
>(plus:SI (reg:SI 46 [ ivtmp___6 ])
>(const_int 8 [0x8])))
>
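To make the transformation discussed in this thread concrete, here is a source-level sketch of what loop-9.c hopes the compiler will do: materialize the constant once, before the loop, and only store inside it. `f_hoisted` is a hand-written illustration with a made-up name, not GCC output. Following Richard's note, the RTL pass only has something to move if the constant first shows up as a separate load into a register, which is what happens when the port does not accept const_double as a legitimate immediate.

```c
#define N 100

/* As in gcc.dg/loop-9.c: the constant is an operand of every store. */
void f(double *a)
{
    int i;
    for (i = 0; i < N; i++)
        a[i] = 18.4242;
}

/* Hand-hoisted equivalent: the constant is set up once in a local
   (think: one load from .rodata into a register), and the loop body
   is reduced to a register-to-memory store. */
void f_hoisted(double *a)
{
    const double c = 18.4242;
    int i;
    for (i = 0; i < N; i++)
        a[i] = c;
}
```

In a GCC port, the usual place to reject such constants is the `TARGET_LEGITIMATE_CONSTANT_P` target hook; whether that alone is enough for this particular port is an assumption that would need checking against the loop2_invariant dump.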