How about providing an interface to fusing instructions via scheduling

2021-09-03 Thread gengqi via Gcc
When I was adding a pipeline description to my backend, some instructions
needed to be fused, and I found that there was no suitable interface to
implement my requirements.

 

My requirements are:

1. The scheduler should bring any two fusible instructions together;
sometimes the two instructions can then be treated as one when they are issued.

2. The two instructions only work better when they are immediately adjacent
to each other.

3. An instruction can only be fused once, i.e. if the current instruction
has been fused with the previous one, the next one cannot be fused with the
current one.
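
To make requirements 1 and 3 concrete, here is a minimal sketch in C. It is
only a toy model and not GCC code: the opcode classes, can_fuse() and
count_issue_slots() are hypothetical names made up for illustration. It walks
an already-ordered instruction sequence, treats each fused pair as one issue
slot, and never fuses an instruction twice.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical opcode classes, for illustration only. */
enum op { OP_ADD, OP_CMP, OP_BRANCH, OP_LOAD };

/* Hypothetical pair predicate: compare+branch and add+add may fuse. */
bool can_fuse(enum op prev, enum op curr)
{
    return (prev == OP_CMP && curr == OP_BRANCH)
        || (prev == OP_ADD && curr == OP_ADD);
}

/* Walk an ordered sequence and mark fused pairs.
   fused[i] is set when insn i is the second half of a pair.
   Once a pair is formed, the walk skips past both insns, so no
   insn is fused twice (requirement 3).  The return value counts
   issue slots, with each fused pair counted as one (requirement 1). */
size_t count_issue_slots(const enum op *insns, size_t n, bool *fused)
{
    size_t slots = 0;

    for (size_t i = 0; i < n; i++)
        fused[i] = false;

    for (size_t i = 0; i < n; slots++) {
        if (i + 1 < n && can_fuse(insns[i], insns[i + 1])) {
            fused[i + 1] = true;
            i += 2;   /* both halves are consumed */
        } else {
            i += 1;
        }
    }
    return slots;
}
```

With the sequence add, add, add, cmp, branch, the first two adds form a pair,
the third add stays alone even though add+add is fusible, and cmp+branch form
a second pair. Requirement 2 (adjacency) is what an actual scheduler
interface would have to arrange before a walk like this could find the pairs.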

 

I have looked at numerous interfaces in the “GCC Internals” manual. Each
implements some of my requirements, but none of them covers my needs
completely.

 

These interfaces are:

-  bool TARGET_SCHED_MACRO_FUSION_PAIR_P (rtx_insn *prev, rtx_insn
*curr)

The name of this interface looks a lot like what I need. In practice,
however, I found that it only fuses instructions that are already adjacent
to each other and does not do any scheduling (does not satisfy 1). It may
also fuse 3 or more instructions (does not satisfy 3).

 

-  void TARGET_SCHED_FUSION_PRIORITY (rtx_insn *insn, int max_pri, int
*fusion_pri, int *pri)

This interface is very powerful, but with only one insn being processed at a
time, this interface does not seem to be suitable for context sensitive
situations.

 

-  Use (define_bypass number out_insn_names in_insn_names [guard])

The “bypass” does not guarantee that the instruction being dispatched is
immediately adjacent to its partner (does not satisfy 2). Moreover, a bypass
only handles instructions with a true dependence.

 

-  int TARGET_SCHED_REORDER (FILE *file, int verbose, rtx_insn **ready,
int *n_readyp, int clock) and TARGET_SCHED_REORDER2()

This interface allows free adjustment of the ready instructions, but it is
not easy to get the last scheduled instruction, which needs to be taken into
account for fusion.

 

-  Use define_peephole2

Since the fused instructions behave much like a single instruction, I
thought a peephole might be a good choice. But “define_peephole2” does not
schedule instructions either.

 

In summary, I have not found an interface that does both scheduling and
fusion. Maybe we should enhance one of the above interfaces, or maybe we
should provide a new one. I think it is necessary and beneficial to have an
interface that does both scheduling and fusion.



RE: How about providing an interface to fusing instructions via scheduling

2021-09-03 Thread Kyrylo Tkachov via Gcc
Hi,

> -Original Message-
> From: Gcc  On Behalf
> Of gengqi via Gcc
> Sent: 03 September 2021 11:56
> To: gcc@gcc.gnu.org
> Subject: How about providing an interface to fusing instructions via
> scheduling
> 
> When I was adding pipeline to my backend, some instructions needed to be
> fused and I found that there was no suitable interface to implement my
> requirements.
> 
> 
> 
> My hope is that
> 
> 1. Do instruction scheduling and combine any two instructions, and
> sometimes
> the two instructions can be treated as 1 when they are issued
> 
> 2. The two instructions only work better when they are immediately adjacent
> to each other
> 
> 3. An instruction can only be fused once, i.e. if the current instruction
> has been fused with the previous one, the next one cannot be fused with the
> current one.
> 
> 
> 
> I have referred to numerous interfaces in the “GCC INTERNALS” which
> implement some of my requirements, but all of which just happen not to
> cover
> my needs completely.

Indeed, there are a few places in GCC that help, but not a clean catch-all 
solution.

> 
> 
> 
> These interfaces are:
> 
> -  bool TARGET_SCHED_MACRO_FUSION_PAIR_P (rtx_insn *prev, rtx_insn
> *curr)
> 
> The name of the interface looks a lot like what I need. But in reality I
> found that this interface only fuses instructions that are already adjacent
> to each other and does not do scheduling (not satisfy 1). And this interface
> may fuse 3 or more instructions (not satisfy 3).

Indeed, this interface ensures that instructions that are already adjacent are 
kept together, but doesn't bring them together from far away.

> 
> 
> 
> -  void TARGET_SCHED_FUSION_PRIORITY (rtx_insn *insn, int max_pri, int
> *fusion_pri, int *pri)
> 
> This interface is very powerful, but with only one insn being processed at a
> time, this interface does not seem to be suitable for context sensitive
> situations.
> 

This is likely more appropriate for your needs. You may want to look in the 
implementation of this (and related) hook in the aarch64 backend.
We use it there to bring certain loads and stores together with the intent to 
form special load/store-pair instructions.
The scheduler brings the insns together, but we rely on post-scheduling 
peepholes to actually combine the two into a single instruction.
Although there are a few cases where it misses opportunities, it works pretty 
well.
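
The two-step idea described above (the scheduler groups candidates by
priority, a later peephole merges adjacent ones) can be sketched as a toy
model in C. This is not the aarch64 implementation: struct mem_insn, the
priority formulas and the 8-byte pairing rule are assumptions chosen only to
illustrate the mechanism, and data dependencies are ignored entirely.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical model of a load/store: base register plus offset. */
struct mem_insn {
    int id;
    int base_reg;
    int offset;
    int fusion_pri;   /* same value => should be scheduled together */
    int pri;          /* order inside one fusion group */
};

/* Assumed policy: insns with the same base share a fusion priority,
   and lower offsets come first within the group. */
void assign_fusion_priority(struct mem_insn *insn, int max_pri)
{
    insn->fusion_pri = max_pri - insn->base_reg;
    insn->pri = max_pri - insn->offset;
}

/* Sort by descending fusion_pri, then by descending pri. */
static int by_priority(const void *a, const void *b)
{
    const struct mem_insn *x = a, *y = b;
    if (x->fusion_pri != y->fusion_pri)
        return (y->fusion_pri > x->fusion_pri) - (y->fusion_pri < x->fusion_pri);
    return (y->pri > x->pri) - (y->pri < x->pri);
}

/* Step 1: "schedule" by priority.  Step 2: a peephole-like pass that
   merges adjacent accesses to consecutive 8-byte slots off the same base.
   Returns the number of pairs formed; each insn joins at most one pair. */
size_t schedule_and_pair(struct mem_insn *insns, size_t n, int max_pri)
{
    size_t pairs = 0;

    for (size_t i = 0; i < n; i++)
        assign_fusion_priority(&insns[i], max_pri);
    qsort(insns, n, sizeof insns[0], by_priority);

    for (size_t i = 0; i + 1 < n; ) {
        if (insns[i].base_reg == insns[i + 1].base_reg
            && insns[i + 1].offset - insns[i].offset == 8) {
            pairs++;
            i += 2;
        } else {
            i += 1;
        }
    }
    return pairs;
}
```

Note that the priority function still sees one insn at a time, which is the
limitation raised in the original mail; the grouping only works because both
halves of a pair independently compute the same fusion priority.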

Thanks,
Kyrill




Re: s390 port

2021-09-03 Thread Ulrich Weigand via Gcc


"Paul Edwards"  wrote on 02.09.2021 22:05:39:
> > Is this about supporting a 4GB address space instead
> > of a 2GB space?
>
> Yes, correct.

OK, that makes things clearer.  This implies in particular:

- 4GB address space means you need to run in AMODE64

- AMODE64 means the native address size is 64 bits.  This
  implies that Pmode has to be DImode, since Pmode tells
  the compiler what the native address size is.

  Specifically, if you try to run AMODE64 with Pmode equal to
  SImode, the compiler will not be aware that the hardware
  uses the high 32 bits of base and index registers, and
  will not necessarily keep them zero.  Also, the compiler
  will assume the base + index (+ displacement) arithmetic
  will operate in 32 bits -- I'm pretty sure this is
  actually the root cause of your "negative index" problem.
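
The mismatch can be shown with a small worked example in C. This is a sketch
of the arithmetic only, not of any compiler or hardware code; the function
names ea_assumed_32() and ea_amode64() are made up for illustration.

```c
#include <stdint.h>

/* What a compiler with Pmode == SImode assumes: the effective
   address is computed modulo 2^32. */
uint32_t ea_assumed_32(uint32_t base, uint32_t index, uint32_t disp)
{
    return base + index + disp;
}

/* What 64-bit addressing does with the full register contents:
   the effective address is computed modulo 2^64. */
uint64_t ea_amode64(uint64_t base, uint64_t index, uint64_t disp)
{
    return base + index + disp;
}
```

Take base = 0x1000 and a 32-bit index of -1, which is 0xFFFFFFFF in a
register whose high half is zero. The 32-bit model wraps around and yields
0xFFF, which is what the compiler intended. The 64-bit add yields 0x100000FFF
instead, an address just above the 4GB line.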

> > Is it about supporting a 32-bit pointer type in an
> > otherwise AM64 environment?  (This is already used
> > by the TPF target, but the 32-bit pointer will still
> > refer to a 2GB address space.)
> Yes, all pointers will be 32-bit – a normal 32-bit system.

Note that even if Pmode == DImode, you can still use 32-bit
*pointer* sizes.  This is exactly what e.g. the Intel x32
mode does (as was mentioned by Andreas).

> I’d like to approach the problem from the other
> direction – what modifications are required to
> be made to “-m31” so that it does “-m32” instead?
> I’m happy to simply retire “-m31”, but I don’t care
> if both exist.

If you want to go for an "x32" like mode, I think this
is the wrong approach.  The right approach would be to
start from "-m64", and simply modify the pointer size
to be 32 bits.

This would work by setting POINTER_SIZE to 32, while
leaving everything else like for -m64.  I'm sure there
will be a few other places that need adaptation, but
it should be pretty straightforward.  You can also
check the Intel back-end where they're using the
TARGET_X32 macro.


We've thought about implementing this mode for Linux,
but decided not to do it, since it would only provide
marginal performance improvements, and has the drawback
of being another new ABI that would be incompatible with
the whole existing software ecosystem.

(The latter point may not be an issue for you if you're
looking into a completely new OS anyway.)

Bye,
Ulrich


Re: s390 port

2021-09-03 Thread Paul Edwards via Gcc
> - AMODE64 means the native address size is 64 bits.  This
>  implies that Pmode has to be DImode, since Pmode tells
>  the compiler what the native address size is.

>  Specifically, if you try to run AMODE64 with Pmode equals
>  SImode, the compiler will not be aware that the hardware
>  uses the high 32 bits of base and index registers, and
>  will not necessarily keep them zero.

The compiler naturally keeps them zero. The
instructions that are used to load registers
do not pollute the high-order 32 bits.



>  Also, the compiler
>  will assume the base + index (+ displacement) arithmetic
>  will operate in 32 bits -- I'm pretty sure this is
>  actually the root cause of your "negative index" problem.


Where is this logic please? Can I do a #if 0 or similar
to disable it?


> Note that even if Pmode == DImode, you can still use 32-bit
> *pointer* sizes.  This is exactly what e.g. the Intel x32
> mode does (as was mentioned by Andreas).


I’m happy to try the approach from BOTH directions
and see which one hits “-m32” first.


>> I’d like to approach the problem from the other
>> direction – what modifications are required to
>> be made to “-m31” so that it does “-m32” instead?
>> I’m happy to simply retire “-m31”, but I don’t care
>> if both exist.

> If you want to go for an "x32" like mode, I think this
> is wrong approach.  The right approach would be to
> start from "-m64", and simply modify the pointer size
> to be 32 bits.


> This would work by setting POINTER_SIZE to 32, while
> leaving everything else like for -m64.



That will generate 64-bit z/Arch instructions.
I wish to generate ESA/390 instructions.



> I'm sure there
> will be a few other places that need adaptation, but
> it should be pretty straightforward.

No, modifying GCC is beyond my ability. I
need 20 lines of code from someone who is
familiar with the system.



>  You can also
> check the Intel back-end where they're using the
> TARGET_X32 macro.


See above about beyond my ability.

> We've thought about implementing this mode for Linux,
> but decided not to do it, since it would only provide
> marginal performance improvements, and has the drawback
> of being another new ABI that would be incompatible to
> the whole existing software ecosystem.


Shouldn’t the end user be able to decide this
for themselves? No-one at all is interested in
32-bit mainframes?


> (The latter point may not be an issue for you if you're
> looking into a completely new OS anyway.)


Correct.

Thanks. Paul.


Re: s390 port

2021-09-03 Thread Ulrich Weigand via Gcc


"Paul Edwards"  wrote on 03.09.2021 13:35:10:
> >  Specifically, if you try to run AMODE64 with Pmode equals
> >  SImode, the compiler will not be aware that the hardware
> >  uses the high 32 bits of base and index registers, and
> >  will not necessarily keep them zero.
> The compiler naturally keeps them zero. The
> instructions that are used to load registers
> do not pollute the high-order 32 bits.

While this is true for most instructions, the compiler will not
restrict itself to using only those.  (As just one obvious
example, the compiler may use "lay" with a negative displacement,
which will set the high bits of a GPR in AMODE64.)
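
As a rough illustration of the "lay" case, the following C sketch models a
load-address operation in 64-bit addressing mode as a plain 64-bit add of a
sign-extended displacement. The function name lay_amode64() is hypothetical,
and the model deliberately ignores index registers and displacement range
checks.

```c
#include <stdint.h>

/* Toy model: in 64-bit addressing mode the full 64-bit sum of base
   and sign-extended displacement is written to the target register. */
uint64_t lay_amode64(uint64_t base, int32_t disp)
{
    return base + (uint64_t)(int64_t)disp;
}
```

With base = 4 and disp = -8 the result is 0xFFFFFFFFFFFFFFFC. The low 32 bits
(0xFFFFFFFC) are the value an SImode computation expects, but the high 32
bits are now all ones, so the register can no longer be used as a base or
index in AMODE64 without being cleaned up first.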

It is of course possible to change the back-end to ensure that
SImode operations always leave the high part unmodified; for
example LLVM does that, because it wants to allocate the high
parts separately for use with the high-word facility instructions.
But GCC currently does not do so.

> >  Also, the compiler
> >  will assume the base + index (+ displacement) arithmetic
> >  will operate in 32 bits -- I'm pretty sure this is
> >  actually the root cause of your "negative index" problem.
> Where is this logic please? Can I do a #if 0 or similar
> to disable it?

This is not in one single place, but spread throughout the
compiler, both common code and back-end.  I do not think it will
be possible to get the compiler to generate correct code if
you do not specify the address size correctly.  AMODE64 will
require Pmode == DImode.

(And, b.t.w. not the -m31 DImode, which is a pair of 32-bit
GPRs, but rather the -m64 DImode, which is a single 64-bit GPR.)

> > If you want to go for an "x32" like mode, I think this
> > is wrong approach.  The right approach would be to
> > start from "-m64", and simply modify the pointer size
> > to be 32 bits.
> > This would work by setting POINTER_SIZE to 32, while
> > leaving everything else like for -m64.
>
> That will generate 64-bit z/Arch instructions.
> I wish to generate ESA/390 instructions.

Why? AMODE64 exists only in z/Arch, so of course there
will be z/Arch instructions available ...

> > We've thought about implementing this mode for Linux,
> > but decided not to do it, since it would only provide
> > marginal performance improvements, and has the drawback
> > of being another new ABI that would be incompatible to
> > the whole existing software ecosystem.
> Shouldn’t the end user be able to decide this
> for themselves?

It's open source, of course everybody can decide what they
want to work on themselves.  But we decide what we spend
our own time on based on what we think is useful ...

> No-one at all is interested in 32-bit mainframes?

Not any more, at least not in Linux.  Linux is pretty much
64-bit only at this point.


Bye,
Ulrich


Re: s390 port

2021-09-03 Thread Paul Edwards via Gcc
>> >  Also, the compiler
>> >  will assume the base + index (+ displacement) arithmetic
>> >  will operate in 32 bits -- I'm pretty sure this is
>> >  actually the root cause of your "negative index" problem.

>> Where is this logic please? Can I do a #if 0 or similar
>> to disable it?

> This is not in one single place, but spread throughout the
> compiler, both common code and back-end.  I do not think it will
> be possible to get the compiler to generate correct code if
> you do not specify the address size correctly.

1. Is there any way to put a constraint on index
registers, to say that a particular machine can
only index in the range of –512 to +512 or some
other arbitrary set? If so, I can do 0 to 2 GiB.

2. Is there a way of saying a machine doesn’t
support indexing at all?

>> > If you want to go for an "x32" like mode, I think this
>> > is wrong approach.  The right approach would be to
>> > start from "-m64", and simply modify the pointer size
>> > to be 32 bits.
>> > This would work by setting POINTER_SIZE to 32, while
>> > leaving everything else like for -m64.
>  
>> That will generate 64-bit z/Arch instructions.
>> I wish to generate ESA/390 instructions.

> Why? AMODE64 exists only in z/Arch, so of course there
> will be z/Arch instructions available ...

For the same reason people constructed Babbage’s
invention, I wish to demonstrate the minor changes
that would have been required to the S/360 so that
we would never have arrived at a 31-bit black hole,
and we could have in fact had the perfect 32-bit
machine. Almost identical to the 31-bit machine.
A S/360+, a S/370+ and a S/390+. 

>> > We've thought about implementing this mode for Linux,
>> > but decided not to do it, since it would only provide
>> > marginal performance improvements, and has the drawback
>> > of being another new ABI that would be incompatible to
>> > the whole existing software ecosystem.
>> Shouldn’t the end user be able to decide this
>> for themselves?

> It's open source, of course everybody can decide what they
> want to work on themselves.  But we decide what we spend
> our own time on based on what we think is useful ...

Sure.

>> No-one at all is interested in 32-bit mainframes?

> Not any more, at least not in Linux.  Linux is pretty much
> 64-bit only at this point.

I think z/OS is pretty much still 31-bit only,
as far as apps are concerned, right? I’d like to
bump that up to 32-bit.

BFN. Paul.


Re: s390 port

2021-09-03 Thread Jakub Jelinek via Gcc
On Fri, Sep 03, 2021 at 10:38:36PM +1000, Paul Edwards via Gcc wrote:
> > This is not in one single place, but spread throughout the
> > compiler, both common code and back-end.  I do not think it will
> > be possible to get the compiler to generate correct code if
> > you do not specify the address size correctly.
> 1. Is there any way to put a constraint on index
> registers, to say that a particular machine can
> only index in the range of –512 to +512 or some
> other arbitrary set? If so, I can do 0 to 2 GiB.
> 2. Is there a way of saying a machine doesn’t
> support indexing at all?

There is a way to do that, but it isn't about changing a single or a couple
of spots, one needs to change a lot of *.md patterns, a lot of macros,
target hooks and as Ulrich said, most important is to use the right Pmode
which can differ from ptr_mode provided one e.g. defines ptr_extend pattern
etc.
Just look at the amount of work needed for the x32 or aarch64 ilp32 support,
and not just work spent one time on adding that support, but the continuous
amount of work on maintaining it.  The initial work is certainly a few
weeks if not months of work, then there needs to be somebody who regularly
tests gcc trunk and branches in such configuration so that it doesn't
bitrot, and not just that but somebody who actually fixes bugs in it.
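
To illustrate what a Pmode that differs from ptr_mode means in practice, here
is a small C sketch of the x32-style arrangement: pointers are stored as 32
bits and widened to the 64-bit address size before any address arithmetic.
The names ptr_extend_model() and element_address() are hypothetical, and
zero-extension is an assumption made for this sketch; it is not a statement
about what any particular back-end does.

```c
#include <stdint.h>

/* Widen a stored 32-bit pointer to a 64-bit address (zero-extension
   assumed here). */
uint64_t ptr_extend_model(uint32_t p)
{
    return (uint64_t)p;
}

/* Address arithmetic is then done entirely in 64 bits, so a negative
   index behaves as expected even for pointers above 2GB. */
uint64_t element_address(uint32_t p, int64_t index, uint64_t elem_size)
{
    return ptr_extend_model(p) + (uint64_t)index * elem_size;
}
```

For example, a pointer of 0x80000000 with index -1 and 4-byte elements gives
0x7FFFFFFC, because the subtraction happens after the pointer has been
widened.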

If something doesn't fit into 2GB of address space, isn't it likely it won't
fit into 4GB of address space in a year or two?

Jakub



Re: s390 port

2021-09-03 Thread Paul Edwards via Gcc

> This is not in one single place, but spread throughout the
> compiler, both common code and back-end.  I do not think it will
> be possible to get the compiler to generate correct code if
> you do not specify the address size correctly.



> 1. Is there any way to put a constraint on index
> registers, to say that a particular machine can
> only index in the range of –512 to +512 or some
> other arbitrary set? If so, I can do 0 to 2 GiB.
>
> 2. Is there a way of saying a machine doesn’t
> support indexing at all?

> There is a way to do that, but it isn't about changing a single or a couple
> of spots, one needs to change a lot of *.md patterns, a lot of macros,
> target hooks and as Ulrich said, most important is to use the right Pmode
> which can differ from ptr_mode provided one e.g. defines ptr_extend pattern
> etc.


Pardon? All that is required just to put a constraint
on an index register? If a range of a machine is
limited to -512 to +512, it shouldn't be necessary
to change md patterns etc etc.

> Just look at the amount of work needed for the x32 or aarch64 ilp32 support,


That's different. That's because Intel stuffed up.
IBM didn't. IBM came within an ace of a perfect
architecture. It's as if Intel had created an x32
instead of an 80386 in 1986.

IBM got it almost right in the 1960s.

> and not just work spent one time on adding that support, but the continuous
> amount of work on maintaining it.  The initial work is certainly a few
> weeks if not months of work,


I've been trying to figure out how to lift the 31-bit
restriction on mainframes since around 1987.

If I have to pay someone for 2 months of work, at
this stage, I'm willing to do that, but:

1. I would like it done on GCC 3.2.3 plus maybe
GCC 3.4.6.

2. How much will it cost in US$?


> then there needs to be somebody who regularly
> tests gcc trunk and branches in such configuration so that it doesn't
> bitrot, and not just that but somebody who actually fixes bugs in it.


I'll take responsibility for giving the GCC 3.X.X
releases the TLC they deserve. And I'll encourage
my daughter to maintain them after I've kicked
the bucket.


> If something doesn't fit into 2GB of address space, isn't it likely it won't
> fit into 4GB of address space in a year or two?


Nope. 2 GiB is already a shitload of memory. It only
takes something like 23 MB for GCC 3.2.3 to recompile
itself, and I think 60 MB for GCC 3.4.6 to recompile
itself. That's the heaviest real workload I do. A 4 GiB
limitation instead of 2 GiB makes it just that much
less likely I'll ever hit a real limit.

Someone told me that the only non-scientific application
they knew of that came close to hitting the 2 GiB limit
was IBM's C compiler. I doubt that IBM's C compiler
technology is evolving at such a rate that it only takes
1-2 years for them to subsequently hit 4 GiB. Quite
apart from the fact that I don't really trust that even
IBM C is hitting a 2 GiB limit for what GCC can do in
23 MiB. But it could be true - I'm not familiar with
compiler internals.

BFN. Paul.



gcc-10-20210903 is now available

2021-09-03 Thread GCC Administrator via Gcc
Snapshot gcc-10-20210903 is now available on
  https://gcc.gnu.org/pub/gcc/snapshots/10-20210903/
and on various mirrors, see http://gcc.gnu.org/mirrors.html for details.

This snapshot has been generated from the GCC 10 git branch
with the following options: git://gcc.gnu.org/git/gcc.git branch 
releases/gcc-10 revision de2114d2f1792beae55dccb512c4c521b934e72b

You'll find:

 gcc-10-20210903.tar.xz   Complete GCC

  SHA256=bc19b8711cb7759f87a4ceae2d0a19037a7e29ed61392a528a9e13a6984f71aa
  SHA1=85580d14f5bb35881e12d1f9ceb55c01f337c026

Diffs from 10-20210827 are available in the diffs/ subdirectory.

When a particular snapshot is ready for public consumption the LATEST-10
link is updated and a message is sent to the gcc list.  Please do not use
a snapshot before it has been announced that way.

