Re: GIMPLE problem

2020-06-25 Thread Richard Biener via Gcc
On Wed, Jun 24, 2020 at 9:05 PM Gary Oblock via Gcc wrote:
>
> Richard,
>
> First off I did suspect INDIRECT_REF wasn't supported, thanks for
> confirming that.
>
> I tried what you said in the original code before I posted
> but I suspect how I went at it is the problem. I'm probably
> doing something(s) in a glaringly stupid way.
>
> Can you spot it, because everything I'm doing makes total sense
> to me?

Well, read what I wrote ...

> Thanks Gary
>
> --
>
> Snippet from the code with MEM_REF:
>
>   tree lhs_ref = build1 ( MEM_REF, field_type, field_addr);

MEM_REF has two operands, the second is a byte offset
plus encodes TBAA information.

>   final_set = gimple_build_assign( lhs_ref, field_val_temp);
>
> field_type is a double *
>
> field_addr is an address within a malloc'ed array of doubles.
>
> --
>
> Snippet from the code with ARRAY_REF:
>
>   tree rhs_ref = build4 ( ARRAY_REF, field_type, field_arry_addr, index,
>   NULL_TREE, NULL_TREE);

You need to dereference field_arry_addr to produce an array you
can reference with the ARRAY_REF.

  tree arr = build2 (MEM_REF, array_type, field_arry_addr,
                     build_int_cst (ptr_type_node, 0));
  rhs_ref = build4 (ARRAY_REF, field_type, arr, index, NULL, NULL);

>   temp_set = gimple_build_assign( field_val_temp, rhs_ref);
>
> field type is double
>
> field_arry_addr is the starting address of an array of malloced doubles.
>
> index is a pointer_rep (an integer)
>   details:
> tree pointer_rep = make_node ( INTEGER_TYPE);
> TYPE_PRECISION (pointer_rep) = TYPE_PRECISION (pointer_sized_int_node);
>
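
In C source terms, the two-step construction described above (a MEM_REF
that dereferences the pointer to yield an array, then an ARRAY_REF that
indexes it) corresponds roughly to casting the malloc'ed pointer to a
pointer-to-array, dereferencing it, and indexing the result. The sketch
below is only a source-level analogy, not GCC internals code; the helper
name load_elem is hypothetical.

```c
#include <stddef.h>

/* Source-level analogy only; load_elem is a hypothetical helper name.
   base plays the role of field_arry_addr, index the role of index.  */
double
load_elem (double *base, size_t index)
{
  /* Step 1: view the pointer as a pointer to an array and dereference
     it.  This mirrors the MEM_REF with a zero byte offset.  */
  double (*arr)[] = (double (*)[]) base;

  /* Step 2: index the array object.  This mirrors the ARRAY_REF.  */
  return (*arr)[index];
}
```

Indexing the raw pointer directly, without step 1, is the source-level
counterpart of handing an address (rather than an array) to ARRAY_REF.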


Customized coverage instrumentation for multiple C files

2020-06-25 Thread Shuai Wang via Gcc
Hello,

I am working on a basic block coverage counter that
mimics -fsanitize-coverage=trace-pc but has more features. My problem is
that when instrumenting multiple C files (e.g., test1.c, test2.c, test3.c), I
want to generate a corresponding coverage log for each (test1.log, test2.log,
test3.log), and so on.

Therefore, my questions are: 1) how can I find out the name of the source
file being instrumented from a GIMPLE plugin (my plugin runs after the
"optimized" pass), and 2) I want to keep all covered basic block info in
memory and write the log file *only once*, right before profiling finishes
(i.e., when the instrumented program has finished executing and is about to
exit). Can I somehow set a callback at that point and then flush the coverage
record to the files? I don't know how or where to "set a callback" like that,
if it is possible at all.

Thank you very much.

Best,
Shuai
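
For the second question, one common approach in plain C is to register a
function with atexit() from the runtime that the instrumented code calls
into. The following is a minimal sketch under stated assumptions: every
cov_* name and the COV_MAX_BLOCKS bound are hypothetical, and it keeps a
single table, so producing one log per source file would need one table
and one path per translation unit. On the plugin side, the name of the
file being compiled may be available as main_input_filename, but that
should be checked against the GCC version in use.

```c
#include <stdio.h>
#include <stdlib.h>

/* All cov_* names are hypothetical; they are not part of GCC.  */
#define COV_MAX_BLOCKS 1024   /* assumed upper bound on basic-block ids */

static unsigned long cov_hits[COV_MAX_BLOCKS];
static const char *cov_log_path;   /* e.g. "test1.log" */
static int cov_registered;

/* Called from instrumented code: only touches memory, no I/O.  */
void
cov_hit (unsigned id)
{
  if (id < COV_MAX_BLOCKS)
    cov_hits[id]++;
}

/* Read back one counter (0 for out-of-range ids).  */
unsigned long
cov_count (unsigned id)
{
  return id < COV_MAX_BLOCKS ? cov_hits[id] : 0;
}

/* Write every non-zero counter to OUT; returns the number of lines.  */
int
cov_dump (FILE *out)
{
  int lines = 0;
  for (unsigned id = 0; id < COV_MAX_BLOCKS; id++)
    if (cov_hits[id] != 0)
      {
        fprintf (out, "bb %u: %lu\n", id, cov_hits[id]);
        lines++;
      }
  return lines;
}

/* Runs once when the program exits normally.  */
static void
cov_flush_at_exit (void)
{
  FILE *out = cov_log_path ? fopen (cov_log_path, "w") : NULL;
  if (out != NULL)
    {
      cov_dump (out);
      fclose (out);
    }
}

/* Choose the log file and register the exit callback exactly once.  */
void
cov_init (const char *log_path)
{
  cov_log_path = log_path;
  if (!cov_registered && atexit (cov_flush_at_exit) == 0)
    cov_registered = 1;
}
```

Note that atexit() handlers run on a normal return from main or a call to
exit(), but not after abort(), _exit(), or a fatal signal, so a crashing
program would lose its in-memory counters with this scheme.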


Re: Modula-2 into the GCC tree on master?

2020-06-25 Thread Gaius Mulley via Gcc
David Edelsohn writes:

> Hi, Gaius
>
> Thanks for your diligent effort to complete this port of Modula-2 and
> prepare it for inclusion in GCC.  I have forwarded the proposal to the
> GCC Steering Committee.
>
> Thanks, David

Hi David,

many thanks for forwarding the proposal - always great fun to work with
GCC

regards,
Gaius


TLS Implementation Across Architectures

2020-06-25 Thread Joel Sherrill
Hi

RTEMS supports over 15 processor architectures and we would like to ensure
that TLS is supported on all rather than just a handful of popular ones
(arm, x86, powerpc, sparc, etc). I know of Ulrich Drepper's document (
https://www.akkadia.org/drepper/tls.pdf) but it is a few years old and
covers only a subset of architectures.

Is TLS supported on all architectures in GCC?

Is there some documentation on how it is implemented on architectures not
in Ulrich's paper? Or some guidance on how to extract this information from
the GCC source?

Thanks.

--joel


Hoisting DFmode loads out of loops..

2020-06-25 Thread Alan Lehotsky
I’m working on a GCC 8.3 port to a load/store architecture with a 32-bit 
data-path between registers and memory;  

looking at the gcc.dg/loop-9.c test, I fail to pass because I have split the 
move of a double constant to memory into multiple moves (4 in fact, because I 
only have a 16-bit immediate mode.)

The (define_insn_and_split “movdf” …) is conditioned on “reload_completed”.

Is there some other trick I need to get the constant hoisted?  I have already set
the rtx cost of the CONST_DOUBLE ridiculously high (like 10 insns).


Alan Lehotsky
https://codegentllc.com





Re: TLS Implementation Across Architectures

2020-06-25 Thread Nathan Sidwell

On 6/25/20 2:34 PM, Joel Sherrill wrote:

> Hi
>
> RTEMS supports over 15 processor architectures and we would like to ensure
> that TLS is supported on all rather than just a handful of popular ones
> (arm, x86, powerpc, sparc, etc). I know of Ulrich Drepper's document (
> https://www.akkadia.org/drepper/tls.pdf) but it is a few years old and
> covers only a subset of architectures.
>
> Is TLS supported on all architectures in GCC?
>
> Is there some documentation on how it is implemented on architectures not
> in Ulrich's paper? Or some guidance on how to extract this information from
> the GCC source?


The ARM (32-bit) ABI has some extensions to that, which originally came from
Alex Oliva and which I then implemented (the GNU2 TLS stuff).  I think the
smarts are in the linker for that, though.


IMHO bfd might be a better source of information than gcc.

nathan
--
Nathan Sidwell


Re: TLS Implementation Across Architectures

2020-06-25 Thread Joel Sherrill
On Thu, Jun 25, 2020 at 2:54 PM Nathan Sidwell wrote:

> On 6/25/20 2:34 PM, Joel Sherrill wrote:
> > Hi
> >
> > RTEMS supports over 15 processor architectures and we would like to
> ensure
> > that TLS is supported on all  rather than just a handful of popular ones
> > (arm, x86, powerpc, sparc, etc). I know of Ulrich Drepper's document (
> > https://www.akkadia.org/drepper/tls.pdf) but it is a few years old and
> > covers only a subset of architectures.
> >
> > Is TLS supported on all architectures in GCC?
> >
> > Is there some documentation on how it is implemented on architectures not
> > in Ulrich's paper? Or some guidance on how to extract this information
> from
> > the GCC source?
>
> The ARM (32) abi has some extensions to that, which originally came from
> Alex Oliva and then I implemented (The GNU2 TLS stuff).  I think the
> smarts is in the linker for that though.
>
> IMHO bfd might be a better source of information than gcc.
>

BFD would know the section and attribute part, but isn't GCC responsible for
generating the code to dereference into it?  It could be a specific base
register, an invalid instruction fault (MIPS), or something else. I'm
wondering how one finds out what the magic to look up the base is for a
specific architecture.

Or is there an easy way for a target to change, say, the MIPS bad
instruction to a subroutine call? It would seem that GCC would have an
architecture-independent base lookup alternative.

--joel
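
One practical way to see which base-lookup sequence GCC emits for a given
target is to compile a tiny TLS accessor with -S and read the assembly,
optionally varying -ftls-model. The sketch below is only such a probe;
the names tls_counter, tls_bump and tls_addr are hypothetical. On targets
without native TLS support, GCC can be configured to fall back to emulated
TLS, where accesses go through a libgcc call (__emutls_get_address); whether
a particular RTEMS target uses that path is an assumption to verify.

```c
/* Hypothetical probe file; compile with "gcc -O1 -S" for the target of
   interest and inspect how the address of tls_counter is formed.  */

/* __thread is the GNU spelling of thread-local storage.  */
static __thread unsigned tls_counter;

/* Increment this thread's copy and return the new value.  */
unsigned
tls_bump (void)
{
  return ++tls_counter;
}

/* Return the address of this thread's copy; the generated code for
   this function is essentially the target's TLS base lookup.  */
unsigned *
tls_addr (void)
{
  return &tls_counter;
}
```

The body of tls_addr in the assembly output shows whether the target reads
a dedicated register, executes a special instruction, or calls a helper.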



Re: TLS Implementation Across Architectures

2020-06-25 Thread Andrew Pinski via Gcc
On Thu, Jun 25, 2020 at 1:34 PM Joel Sherrill wrote:
>
> On Thu, Jun 25, 2020 at 2:54 PM Nathan Sidwell  wrote:
>
> > On 6/25/20 2:34 PM, Joel Sherrill wrote:
> > > Hi
> > >
> > > RTEMS supports over 15 processor architectures and we would like to
> > ensure
> > > that TLS is supported on all  rather than just a handful of popular ones
> > > (arm, x86, powerpc, sparc, etc). I know of Ulrich Drepper's document (
> > > https://www.akkadia.org/drepper/tls.pdf) but it is a few years old and
> > > covers only a subset of architectures.
> > >
> > > Is TLS supported on all architectures in GCC?
> > >
> > > Is there some documentation on how it is implemented on architectures not
> > > in Ulrich's paper? Or some guidance on how to extract this information
> > from
> > > the GCC source?
> >
> > The ARM (32) abi has some extensions to that, which originally came from
> > Alex Oliva and then I implemented (The GNU2 TLS stuff).  I think the
> > smarts is in the linker for that though.
> >
> > IMHO bfd might be a better source of information than gcc.
> >
>
> BFD would know the section and attribute part but isn't gcc responsible for
> generating the code to dereference into it?  It could be a specific base
> register
> or an invalid instruction fault (MIPS) or something else. I'm wondering how
> one knows what that magic to look up the base is for a specific
> architecture.
>
> Or if there is an easy way for a target to change say the MIPS bad
> instruction
> to a subroutine call? It would seem that GCC would have an architecture
> independent base lookup alternative.

Note that MIPS32/64r3 says that system register is implemented.  I know of
a few implementations that implement it as a real register
(Octeon 2 and Octeon 3, for example).

Thanks,
Andrew




gcc-8-20200625 is now available

2020-06-25 Thread GCC Administrator via Gcc
Snapshot gcc-8-20200625 is now available on
  https://gcc.gnu.org/pub/gcc/snapshots/8-20200625/
and on various mirrors, see http://gcc.gnu.org/mirrors.html for details.

This snapshot has been generated from the GCC 8 git branch
with the following options: git://gcc.gnu.org/git/gcc.git branch releases/gcc-8 
revision e39a8763c4dead0f448981d9488d7b264db9da55

You'll find:

 gcc-8-20200625.tar.xz    Complete GCC

  SHA256=42902a890c439f669d634072dda9f81526a6fa6ff8402133385b754df82db951
  SHA1=d32fce3dc58568e79edda0925c03b66a28d821c8

Diffs from 8-20200618 are available in the diffs/ subdirectory.

When a particular snapshot is ready for public consumption the LATEST-8
link is updated and a message is sent to the gcc list.  Please do not use
a snapshot before it has been announced that way.


Re: Hoisting DFmode loads out of loops..

2020-06-25 Thread Jeff Law via Gcc
On Thu, 2020-06-25 at 15:46 -0400, Alan Lehotsky wrote:
> I’m working on a GCC 8.3 port to a load/store architecture with a 32-bit 
> data-path between registers and memory;  
> 
> looking at the gcc.dg/loop-9.c test, I fail to pass because I have split the 
> move of a double constant to memory into multiple moves (4 in fact, because I 
> only have a 16-bit immediate mode.)
> 
> The (define_insn_and_split “movdf” …) is conditioned on “reload_completed”.
> 
> Is there some other trick I need to get the constant hoisted?  I have already
> set the rtx cost of the CONST_DOUBLE ridiculously high (like 10 insns)

Hi Alan, it's been a long time...

We'd probably need to see the RTL.  A variety of things can get in the way of
LICM.  For example, I'd expect subregs to be problematical because they can look
like RMW operations.

jeff



Re: Hoisting DFmode loads out of loops..

2020-06-25 Thread Alan Lehotsky
On Jun 25, 2020, at 6:37 PM, Jeff Law wrote:

> On Thu, 2020-06-25 at 15:46 -0400, Alan Lehotsky wrote:
> > I’m working on a GCC 8.3 port to a load/store architecture with a 32-bit
> > data-path between registers and memory;
> >
> > looking at the gcc.dg/loop-9.c test, I fail to pass because I have split the
> > move of a double constant to memory into multiple moves (4 in fact, because I
> > only have a 16-bit immediate mode.)
> >
> > The (define_insn_and_split “movdf” …) is conditioned on “reload_completed”.
> >
> > Is there some other trick I need to get the constant hoisted?  I have already
> > set the rtx cost of the CONST_DOUBLE ridiculously high (like 10 insns)
>
> Hi Alan, it's been a long time...
>
> We'd probably need to see the RTL.  A variety of things can get in the way of
> LICM.  For example, I'd expect subregs to be problematical because they can look
> like RMW operations.
>
> jeff



Hello to you too, Jeff.  I’ve been lurking for the last decade or so; the last
port I actually did was GCC 4 based, so there is lots of new stuff to try and
wrap my head around.  I would certainly be grateful to anybody with suggestions
as to how to track down this problem (I’m not terribly eager to step through an
x86 gcc in parallel with my port to see where they diverge in the
loop-invariant recognition).

Although, in crafting this expanded email, I see that x86 has already
decided to store the constant 18.4242 in the .rodata section by the start of
the loop-invariant pass, so there’s a

(set (reg:DF…. ) (mem:DF  (symbol_ref ….)))

and I bet that’s far easier to move out of the loop than it would be to split 
the original

(set (mem:DF…) (const_double:DF ….))

— Al

==

Source code is

void f (double *a)
{
  int i;
  for (i = 0; i < 100; i++)
    a[i] = 18.4242;
}
==

Here’s the dump from loop-9.c.252r.loop2-invariant  (compiled -O1)


;; Function f (f, funcdef_no=0, decl_uid=1458, cgraph_uid=0, symbol_order=0)

*starting processing of loop 1 **
starting the processing of deferred insns
ending the processing of deferred insns
setting blocks to analyze 3, 5
starting the processing of deferred insns
ending the processing of deferred insns
df_analyze called
df_worklist_dataflow_doublequeue: n_basic_blocks 6 n_edges 6 count 2 ( 0.33)
df_worklist_dataflow_doublequeue: n_basic_blocks 6 n_edges 6 count 2 ( 0.33)
df_worklist_dataflow_doublequeue: n_basic_blocks 6 n_edges 6 count 3 (  0.5)


starting region dump


f

Dataflow summary:
def_info->table_size = 3, use_info->table_size = 23
;;  invalidated by call 0 [d0] 1 [d1] 2 [d2] 3 [d3] 4 [d4] 5 [d5] 6 [d6] 7 [d7] 8 [d8] 9 [d9] 14 [d14] 15 [d15] 16 [a0] 19 [a3] 20 [a4] 24 [acc0_hi] 25 [acc0_lo] 26 [acc1_hi] 27 [acc1_lo] 28 [source3] 30 [cc] 31 [int_set0] 32 [int_set1] 33 [int_clr0] 34 [int_clr1] 35 [scratchpad0] 36 [scratchpad1] 37 [scratchpad2] 38 [scratchpad3]
;;  hardware regs used 23 [sp] 29 [arg] 39 [sfp]
;;  regular block artificial uses 22 [a6] 23 [sp] 29 [arg] 39 [sfp]
;;  eh block artificial uses 22 [a6] 23 [sp] 29 [arg] 39 [sfp]
;;  entry block defs 0 [d0] 1 [d1] 2 [d2] 3 [d3] 4 [d4] 5 [d5] 6 [d6] 7 [d7] 8 [d8] 9 [d9] 21 [a5] 22 [a6] 23 [sp] 29 [arg] 39 [sfp]
;;  exit block uses 22 [a6] 23 [sp] 39 [sfp]
;;  regs ever live 0 [d0] 30 [cc]
;;  ref usage r0={1d,1u} r1={1d} r2={1d} r3={1d} r4={1d} r5={1d} r6={1d} r7={1d} r8={1d} r9={1d} r21={1d} r22={1d,5u} r23={1d,5u} r29={1d,4u} r30={3d,1u} r39={1d,5u} r46={2d,4u} r48={1d,1u}
;;total ref usage 47{21d,26u,0e} in 6{6 regular + 0 call} insns.
;; Reaching defs:
;;  sparse invalidated
;;  dense invalidated 0, 1
;;  reg->defs[] map: 30[0,1] 46[2,2]
;; bb 3 artificial_defs: { }
;; bb 3 artificial_uses: { u7(22){ }u8(23){ }u9(29){ }u10(39){ }}
;; lr  in   22 [a6] 23 [sp] 29 [arg] 39 [sfp] 46 48
;; lr  use 22 [a6] 23 [sp] 29 [arg] 39 [sfp] 46 48
;; lr  def 30 [cc] 46
;; live  in   46
;; live  gen 30 [cc] 46
;; live  kill 30 [cc]
;; rd  in   (1) 46[2]
;; rd  gen (2) 30[1],46[2]
;; rd  kill (3) 30[0,1],46[2]
;;  UD chains for artificial uses at top

(code_label 11 7 8 3 2 (nil) [0 uses])
(note 8 11 9 3 [bb 3] NOTE_INSN_BASIC_BLOCK)
;;   UD chains for insn luid 0 uid 9
;;  reg 46 { d2(bb 3 insn 10) }
(insn 9 8 10 3 (set (mem:DF (reg:SI 46 [ ivtmp___6 ]) [0 MEM[base: _15, offset: 0B]+0 S8 A32])
        (const_double:DF 1.842419990222931955941021442413330078125e+1 [0x0.9364c2f837b4ap+5])) "loop-9.c":9 19 {movdf}
     (nil))
;;   UD chains for insn luid 1 uid 10
;;  reg 46 { d2(bb 3 insn 10) }
(insn 10 9 12 3 (parallel [
            (set (reg:SI 46 [ ivtmp___6 ])
                (plus:SI (reg:SI 46 [ ivtmp___6 ])
                    (const_int 8 [0x8])))
            (clobber (reg:CC 30 cc))
        ]) 81 {addsi3_1v5}
     (expr_list:REG_UNUSED (reg:CC 30 cc)
        (nil)))
;;   UD chains for insn luid 2 uid 12
;;  reg 46 { d2(bb 3 insn 10) }
;;  reg 48 { }
(insn 12 10 13 3 (set (reg:CCWZ 30 cc)
        (compare:CCWZ (reg:SI 46 [ ivtmp___6 ])
            (reg:SI 48 [ _17 ]))) "loop-9.c":8 

Re: Hoisting DFmode loads out of loops..

2020-06-25 Thread Richard Biener via Gcc
On June 26, 2020 3:24:24 AM GMT+02:00, Alan Lehotsky wrote:
>On Jun 25, 2020, at 6:37 PM, Jeff Law wrote:
>
>> On Thu, 2020-06-25 at 15:46 -0400, Alan Lehotsky wrote:
>>> I’m working on a GCC 8.3 port to a load/store architecture with a
>>> 32-bit data-path between registers and memory;
>>>
>>> looking at the gcc.dg/loop-9.c test, I fail to pass because I have
>>> split the move of a double constant to memory into multiple moves (4 in
>>> fact, because I only have a 16-bit immediate mode.)
>>>
>>> The (define_insn_and_split “movdf” …) is conditioned on
>>> “reload_completed”.
>>>
>>> Is there some other trick I need to get the constant hoisted?  I have
>>> already set the rtx cost of the CONST_DOUBLE ridiculously high (like 10
>>> insns)
>>
>> Hi Alan, it's been a long time...
>>
>> We'd probably need to see the RTL.  A variety of things can get in the
>> way of LICM.  For example, I'd expect subregs to be problematical because
>> they can look like RMW operations.
>>
>> jeff
>
>Hello to you too, Jeff.  I’ve been lurking for the last decade or so;
>the last port I actually did was GCC 4 based, so there is lots of new
>stuff to try and wrap my head around.  I would certainly be grateful to
>anybody with suggestions as to how to track down this problem (I’m not
>terribly eager to step through an x86 gcc in parallel with my port to see
>where they diverge in the loop-invariant recognition).
>
>Although, in crafting this expanded email, I see that x86 has
>already decided to store the constant 18.4242 in the .rodata section by
>the start of the loop-invariant pass, so there’s a
>
>(set (reg:DF…. ) (mem:DF  (symbol_ref ….)))
>
>and I bet that’s far easier to move out of the loop than it would be to
>split the original
>
>(set (mem:DF…) (const_double:DF ….))

Immediate operands are never moved or CSEd by either the RTL or the GIMPLE
passes, so if you do not have const_double immediates, the best thing to do is
not to make them legitimate.

Richard. 
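
At the source level, the shape that results when the constant is not a
legitimate immediate looks roughly like the sketch below: the value sits in
read-only data, is loaded once into a register before the loop, and the
loop body stores from that register. This is only an illustration of the
resulting form, not a workaround (the compiler is free to fold the constant
back), and the names fill_value and f_hoisted are hypothetical. In a port,
the change itself would be made in the backend, presumably through
something like the TARGET_LEGITIMATE_CONSTANT_P hook or the movdf operand
predicates, which is not shown here.

```c
/* Hypothetical names; a source-level picture of the hoisted form.  */

/* The constant lives in read-only data, like the x86 .rodata entry.  */
static const double fill_value = 18.4242;

void
f_hoisted (double *a)
{
  /* One load of the constant, outside the loop: the analogue of
     (set (reg:DF ...) (mem:DF (symbol_ref ...))).  */
  double c = fill_value;
  int i;

  /* The loop body is now a plain register-to-memory store.  */
  for (i = 0; i < 100; i++)
    a[i] = c;
}
```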
