Re: Clobber REG_CC only for some constraint alternatives?

2020-08-17 Thread Senthil Kumar Selvaraj via Gcc


Pip Cet writes:

> On Sun, Aug 16, 2020 at 12:50 AM Segher Boessenkool
>  wrote:
>> On Sat, Aug 15, 2020 at 10:18:27AM +, Pip Cet wrote:
>> > > > What I'm currently doing is this:
>> > > >
>> > > > (define_split
>> > > >   [(set (match_operand 0 "nonimmediate_operand")
>> > > > (match_operand 1 "nox_general_operand"))]
>> > > >   "reload_completed && mov_clobbers_cc (insn, operands)"
>> > > >   [(parallel [(set (match_dup 0) (match_dup 1))
>> > > >   (clobber (reg:CC REG_CC))])])
>> > > >
>> > > > which, if I understand correctly, introduces the parallel-clobber
>> > > > expression only when necessary, at the same time as post-reload
>> > > > splitters introduce REG_CC references. Is that correct?
>> > >
>> > > Yes.  And this will work correctly if somehow you ensure that REG_CC
>> > > isn't live anywhere you introduce such a clobber.
>> >
>> > IIUC, the idea is that references to REG_CC, except for clobbers, are
>> > only introduced in the post-reload split pass, so it cannot be live
>> > before our define_split runs. Does that make sense?
>>
>> Yes, it does.  It has some huge restrictions (using the reg in inline
>> assembler can never work reliably, for example, so you'll have to
>> disallow that).
>
> Is there any approach that doesn't suffer from that problem? My
> understanding was that we need to allow reload to insert CC-clobbering
> insns on this (and many other) architectures, and that there are so
> many places where reload might choose to do so (including before and
> after inline asm) that using the register prior to reload just isn't
> possible. I'd be glad to be wrong, though :-)
>
> Is it true that reload chooses which constraint alternative is used
> for each insn? Is that information available somehow to post-reload
> splits? The problem is that the "X" constraint matches whatever's
> there already, and doesn't replace it with the (scratch:CC) expression
> that would work, so I can't rewrite those insns not to clobber the CC
> reg.
>
> For example, here's what I currently have:
>
> (define_expand "mov"
>   [(parallel [(set (match_operand:MOVMODE 0 "nonimmediate_operand" "")
>(match_operand:MOVMODE 1 "general_operand" ""))
>   (clobber (reg:CC REG_CC))])]
>   ...)
>
> (define_insn "mov_insn"
>[(set (match_operand:ALL1 0 "nonimmediate_operand" "=r,r  ,d,Qm
>   ,r ,q,r,*r")
> (match_operand:ALL1 1 "nox_general_operand"   "r,Y00,n Ynn,r
> Y00,Qm,r,q,i"))
>(clobber (match_scratch:CC 2 "=X,c,X,c,c,X,X,c"))]
>   ...)
>
> That works, but it results in an incorrect CC clobber for, say,
> register-to-register movqi. For that, I'd need something like
>
> (define_split
>   [(parallel [(set (match_operand:ALL1 0 "nonimmediate_operand")
>  (match_operand:ALL1 1 "nox_general_operand"))
>   (clobber (reg:CC REG_CC))])]
>   "reload_completed && REG_P (operands[0]) && REG_P (operands[1])"
>   [(parallel [(set (match_dup 0)
>(match_dup 1))
>   (clobber (scratch:CC))])])
>
> and so on, for all four constraint alternatives that don't actually
> clobber REG_CC (and also for a fifth which only rarely clobbers
> REG_CC). That's duplicated code, of course.

The (setattr "cc" ...) that is currently present for all
patterns accounts for the constraint alternatives,so  using
get_attr_cc to split to a (clobber) of either cc_reg_rtx or a
gen_rtx_SCRATCH (CCmode) appears to work.

(define_insn_and_split "mov_insn"
  [(set (match_operand:ALL1 0 "nonimmediate_operand" "=r,d,Qm   ,r 
,q,r,*r")
(match_operand:ALL1 1 "nox_general_operand"   "r Y00,n Ynn,r 
Y00,Qm,r,q,i"))]
  "register_operand (operands[0], mode)
|| reg_or_0_operand (operands[1], mode)"
  "#"
  "&& reload_completed"
  [(parallel [(set (match_dup 0)
   (match_dup 1))
  (clobber (match_dup 2))])]
  {
operands[2] = get_cc_reg_clobber_rtx (curr_insn);
  }
  [(set_attr "cc" "ldi,none,clobber,clobber,none,none,clobber")])

(define_insn "*mov_insn"
  [(set (match_operand:ALL1 0 "nonimmediate_operand" "=r,d,Qm   ,r 
,q,r,*r")
(match_operand:ALL1 1 "nox_general_operand"   "r Y00,n Ynn,r 
Y00,Qm,r,q,i"))
   (clobber (match_scratch:CC 2  "=X,X ,c   ,c 
,X,X,c"))]
  "(register_operand (operands[0], mode)
|| reg_or_0_operand (operands[1], mode))
   && reload_completed"
  {
return output_movqi (insn, operands, NULL);
  }
  [(set_attr "length" "1,1,5,5,1,1,4")
   (set_attr "adjust_len" "mov8")])

As you mentioned elsewhere, keeping the clobber with "X" helps keep a
single pattern for both CC reg clobbering and non-clobbering insns.


>
> All this is at least somewhat academic: the code produced isn't
> drastically better after my cc conversion, but it's not usually any
> worse, and I'm still looking at assembler examples that are pessimized
> a little to find out how to fix them...
>
>> And earlier passes (like combine) cannot optimise this,
>
> I'm not sure what "this" refe

Re: Clobber REG_CC only for some constraint alternatives?

2020-08-17 Thread Pip Cet via Gcc
On Mon, Aug 17, 2020 at 7:31 AM Senthil Kumar Selvaraj
 wrote:
> > (define_split
> >   [(parallel [(set (match_operand:ALL1 0 "nonimmediate_operand")
> >  (match_operand:ALL1 1 "nox_general_operand"))
> >   (clobber (reg:CC REG_CC))])]
> >   "reload_completed && REG_P (operands[0]) && REG_P (operands[1])"
> >   [(parallel [(set (match_dup 0)
> >(match_dup 1))
> >   (clobber (scratch:CC))])])
> >
> > and so on, for all four constraint alternatives that don't actually
> > clobber REG_CC (and also for a fifth which only rarely clobbers
> > REG_CC). That's duplicated code, of course.
>
> The (setattr "cc" ...) that is currently present for all
> patterns accounts for the constraint alternatives,so  using
> get_attr_cc to split to a (clobber) of either cc_reg_rtx or a
> gen_rtx_SCRATCH (CCmode) appears to work.

Thanks! Using an insn attribute is actually what I ended up doing
(https://github.com/pipcet/gcc/commit/d4509afae9238e0ade4d3e1e97dd8577dae96115)
:-)

It's still confusing, IMHO, that insn attributes (but not the
get_attr_alternative attribute which is mentioned in the
documentation) are available when which_alternative is not.

> (define_insn "*mov_insn"
>   [(set (match_operand:ALL1 0 "nonimmediate_operand" "=r,d,Qm   ,r 
> ,q,r,*r")
> (match_operand:ALL1 1 "nox_general_operand"   "r Y00,n Ynn,r 
> Y00,Qm,r,q,i"))
>(clobber (match_scratch:CC 2  "=X,X ,c   ,c 
> ,X,X,c"))]

Hmm. Technically, of course, clearing register 0 (a special case of
the first alternative) would clobber the flags, but as it happens, the
rewrite above produces the right clobber expression which is simply
accepted by the "X"... I'm not sure whether anything else is trying to
recognize such insns, but as it stands that define_insn would
recognize the incorrect insn:

[(set (reg:QI 0) (const_int 0))
 (clobber (scratch:CC))]


Not

2020-08-17 Thread Henry Mandigma via Gcc
Henrymandigmaagmail.com


Without

2020-08-17 Thread Henry Mandigma via Gcc
aHenrymandigmaagmail.com


Question about IPA-PTA and build_alias

2020-08-17 Thread Erick Ochoa

Hello,

I'm looking to understand better the points-to analysis (IPA-PTA) and 
the alias analysis (build_alias).


How is the information produced by IPA-PTA consumed?

Are alias sets in build_alias computed by the intersections of the 
points_to_set(s) (computed by IPA-PTA)?


My intuition tells me that it could be relatively simple to move 
build_alias to be an SIMPLE_IPA_PASS performed just after IPA-PTA, but I 
do not have enough experience in GCC to tell if this is correct. What 
could be some difficulties which I am not seeing? (Either move, or 
create a new IPA-ALIAS SIMPLE_IPA_PASS.) This pass would have the same 
sensitivity as IPA-PTA { flow-insensitive, context-insensitive, 
field-sensitive } because the alias sets could be computed by the 
intersection of points-to-sets.


Thanks!


Re: Peephole optimisation: isWhitespace()

2020-08-17 Thread Stefan Kanthak
"Nathan Sidwell" 

> On 8/16/20 9:54 AM, Stefan Kanthak wrote:
>> "Nathan Sidwell"  wrote:

[...]

>>> Have you benchmarked it?
>> 
>> Of course! Did you?

[...]

> you seem very angry about being asked for data.

As much as you hallucinated about CMOV or processors not speculating beyond
SETNZ?

>  As I said, I couldn't benchmark your code, because of the incorrect assembly.

As I wrote, that's your fault: every x86 and its evolution x64 know SHR EAX, CL

[...]

> anyway, you've made it clear you do not wish to engage in constructive 
> discussion.

Use a mirror!

> BTW, I have come up with a sequence as short as GCC's but without the 
> conditional branch.  Sadly the margin is too small to write it.

Reality surely bites!

Stefan


[RFC] LTO Dead Field Elimination and LTO Field Reordering

2020-08-17 Thread Erick Ochoa

Hello again,

I have committed a couple of changes to the branch

refs/vendors/ARM/heads/arm-struct-reorg-wip

where my transformations are currently residing. This week I'll be 
working on fixing several warnings and fuzzying these passes.


2020-08-17  Erick Ochoa

* Can be bootstrapped
* Fixes bug that prevented CPU 2017 gcc benchmark from compiling

2020-08-11  Erick Ochoa

* Initial contribution of field reordering and dead field elimination.

Thanks again for your time. If you have any question or comments, let me 
know.


Re: Peephole optimisation: isWhitespace()

2020-08-17 Thread Allan Sandfeld Jensen
On Freitag, 14. August 2020 18:43:12 CEST Stefan Kanthak wrote:
> Hi @ll,
> 
> in his ACM queue article ,
> Matt Godbolt used the function
> 
> | bool isWhitespace(char c)
> | {
> | 
> | return c == ' '
> | 
> |   || c == '\r'
> |   || c == '\n'
> |   || c == '\t';
> | 
> | }
> 
> as an example, for which GCC 9.1 emits the following assembly for AMD64
> 
> processors (see ):
> |xoreax, eax  ; result = false
> |cmpdil, 32   ; is c > 32
> |ja .L4   ; if so, exit with false
> |movabs rax, 4294977024   ; rax = 0x12600
> |shrx   rax, rax, rdi ; rax >>= c
> |andeax, 1; result = rax & 1
> |
> |.L4:
> |ret
> 
No it doesn't. As your example shows if you took the time to read it, it is 
what gcc emit when generating code to run on a _haswell_ architecture. If you 
remove -march=haswell from the command line you get:

xor eax, eax
cmp dil, 32
ja  .L1
movabs  rax, 4294977024
mov ecx, edi
shr rax, cl
and eax, 1

It uses one mov more, but no shrx. 

'Allan




TTG Travel Experience 2020

2020-08-17 Thread mary smith
Greetings,

Wonderful day to you

Just wanted to probe if you would be interested in acquiring TTG Travel 
Experience 2020 (International Trade Show for Travel Industry, Meet 
Professionals and Experts) attendee companies to increase prospect flow at your 
booth - Product launch- Brand awareness Etc.

Date: 14 - 16 Oct 2020
Venue: Rimini Fiera Spa, Rimini, Italy

Information Provided: - Company name, URL, Contact name, Job title, number, fax 
number, physical address, Industry, Company size, Email address.

We also provide customized lists as per your target audience

Please advise, so I can provide you available cost and counts and more 
information for your approval.

Look forward to your reply.

Best Regards,
Mary Smith
B2B Business & Tradeshow Specialist

If you don't wish to hear again about the offer, kindly click here for 
"Opt-out"



Re: Clobber REG_CC only for some constraint alternatives?

2020-08-17 Thread Jeff Law via Gcc
On Fri, 2020-08-14 at 16:46 +0530, Senthil Kumar Selvaraj wrote:
> Hi,
> 
>   I'm working on porting AVR to MODE_CC, and there are quite a few
>   patterns that clobber the condition code reg only for certain
>   constraint alternatives. For e.g.,
Yea.  H8 has the same issue.  It's worth noting that showing a clobber for the 
CC
register even when the bits aren't clobbered is safe.

So I started with that on the H8, even though it did result in some code
efficiency regressions.  Then I started pulling apart those patterns when 
testing
showed it was worth the effort.

Jeff



Re: Peephole optimisation: isWhitespace()

2020-08-17 Thread Stefan Kanthak
"Allan Sandfeld Jensen"  wrote:

> On Freitag, 14. August 2020 18:43:12 CEST Stefan Kanthak wrote:
>> Hi @ll,
>> 
>> in his ACM queue article ,
>> Matt Godbolt used the function
>> 
>> | bool isWhitespace(char c)
>> | {
>> | 
>> | return c == ' '
>> | 
>> |   || c == '\r'
>> |   || c == '\n'
>> |   || c == '\t';
>> | 
>> | }
>> 
>> as an example, for which GCC 9.1 emits the following assembly for AMD64
>> 
>> processors (see ):
>> |xoreax, eax  ; result = false
>> |cmpdil, 32   ; is c > 32
>> |ja .L4   ; if so, exit with false
>> |movabs rax, 4294977024   ; rax = 0x12600
>> |shrx   rax, rax, rdi ; rax >>= c
>> |andeax, 1; result = rax & 1
>> |
>> |.L4:
>> |ret
>> 
> No it doesn't. As your example shows if you took the time to read it, it is 
> what gcc emit when generating code to run on a _haswell_ architecture.

Matt's article does NOT specify the architecture for THIS example.
He specified it for another example he named "(q)":

| When targeting the Haswell microarchitecture, GCC 8.2 compiles this code
| to the assembly in (q) (https://godbolt.org/z/acm19_bits):

WHat about CAREFUL reading?

> If you remove -march=haswell from the command line you get:
> 
>xor eax, eax
>cmp dil, 32
>ja  .L1
>movabs  rax, 4294977024
>mov ecx, edi
>shr rax, cl
>and eax, 1
> 
> It uses one mov more, but no shrx. 

The SHRX is NOT the point here; its the avoidable conditional branch that
matters!

 mov ecx, edi
 movabs  rax, 4294977024
 shr rax, cl
 xor edi, edi
 cmp ecx, 33
 setbdil
 and eax, edi

Stefan


Re: Clobber REG_CC only for some constraint alternatives?

2020-08-17 Thread Segher Boessenkool
Hi!

On Sun, Aug 16, 2020 at 06:28:44PM +, Pip Cet wrote:
> > > IIUC, the idea is that references to REG_CC, except for clobbers, are
> > > only introduced in the post-reload split pass, so it cannot be live
> > > before our define_split runs. Does that make sense?
> >
> > Yes, it does.  It has some huge restrictions (using the reg in inline
> > assembler can never work reliably, for example, so you'll have to
> > disallow that).
> 
> Is there any approach that doesn't suffer from that problem? My
> understanding was that we need to allow reload to insert CC-clobbering
> insns on this (and many other) architectures, and that there are so
> many places where reload might choose to do so (including before and
> after inline asm) that using the register prior to reload just isn't
> possible. I'd be glad to be wrong, though :-)

You can insert the clobber at expand time already.  That probably works
worse if most of your insns clobber the flags, but better if many do
not.

> Is it true that reload chooses which constraint alternative is used
> for each insn?

Register allocation does, yes (reload (or LRA) is the last step of that,
it has the last say).

> Is that information available somehow to post-reload
> splits?

In the "which_alternative" variable (not an attribute, an actual
variable; so use it from a C block or C expression).  There also is the
"alternative" attribute (which uses "which_alternative" under the hood),
which is handy for defining *other* attributes.

> The problem is that the "X" constraint matches whatever's
> there already, and doesn't replace it with the (scratch:CC) expression
> that would work, so I can't rewrite those insns not to clobber the CC
> reg.
> 
> For example, here's what I currently have:
> 
> (define_expand "mov"
>   [(parallel [(set (match_operand:MOVMODE 0 "nonimmediate_operand" "")
>(match_operand:MOVMODE 1 "general_operand" ""))
>   (clobber (reg:CC REG_CC))])]
>   ...)

Yeah, if you use a hard register at expand time already, it won't be
changed by register allocation.

> (define_insn "mov_insn"
>[(set (match_operand:ALL1 0 "nonimmediate_operand" "=r,r  ,d,Qm
>   ,r ,q,r,*r")
> (match_operand:ALL1 1 "nox_general_operand"   "r,Y00,n Ynn,r
> Y00,Qm,r,q,i"))
>(clobber (match_scratch:CC 2 "=X,c,X,c,c,X,X,c"))]
>   ...)
> 
> That works, but it results in an incorrect CC clobber for, say,
> register-to-register movqi. For that, I'd need something like
> 
> (define_split
>   [(parallel [(set (match_operand:ALL1 0 "nonimmediate_operand")
>  (match_operand:ALL1 1 "nox_general_operand"))
>   (clobber (reg:CC REG_CC))])]
>   "reload_completed && REG_P (operands[0]) && REG_P (operands[1])"
>   [(parallel [(set (match_dup 0)
>(match_dup 1))
>   (clobber (scratch:CC))])])
> 
> and so on, for all four constraint alternatives that don't actually
> clobber REG_CC (and also for a fifth which only rarely clobbers
> REG_CC). That's duplicated code, of course.

You can make it a define_insn_and_split where the condition can look
at the actual operands, or can use which_alternative.

> All this is at least somewhat academic: the code produced isn't
> drastically better after my cc conversion, but it's not usually any
> worse, and I'm still looking at assembler examples that are pessimized
> a little to find out how to fix them...

Great :-)

> > And earlier passes (like combine) cannot optimise this,
> 
> I'm not sure what "this" refers to: my understanding is that the idea
> is to let combine see CC-free code, or code with CC clobbers only,
> which it can then optimize, and only add "real" CC references after
> reload, so many optimizations should just work.

"This" was meant to refer to "anything to do with the CC reg".  If it
doesn't yet exist until after reload, of course no pass before it can do
anything with it :-)

> > But it is a pretty straightforward way to move from CC0 to the
> > modern world!
> 
> With the limitations of the reload pass being as they are, I don't
> really see a dramatically better alternative. I suppose we could
> explicitly save and restore CC flags around insns emitted when
> reload_in_progress is true, to simulate non-CC-clobbering add/mov insn
> patterns? That sounds like it would reduce code quality a lot, unless
> great care were taken to make sure all of the save/restore CC flags
> insns were optimized away in later passes.

Do you have move insns that do not clobber the CC reg?  If not, good
luck :-)

> In any case, thanks for the answers so far!

My pleasure.


Segher


Re: Silly question about pass numbers

2020-08-17 Thread Martin Jambor
Hi,

On Tue, Aug 11 2020, Gary Oblock via Gcc wrote:
> For these two dump files:
>
> exe.ltrans0.ltrans.074i.cp
>
> and
>
> exe.ltrans0.ltrans.087i.structure-reorg
>
> doesn't the ".074i." mean that this dump was created
> before the ".087i." dump?
>
> If so then why does the ".074i." show GIMPLE that was
> created in the structure-reorg pass?

Honza will correct me if I am wrong but this is basically an unfortunate
effect of allowing "simple" "late" IPA passes while simultaneously
trying to achieve memory locality - i.e. after IPA we'd like to process
one function at a time and only access that function in memory, as much
as possible.  Therefore, transformation phases of proper IPA passes are
run just before the late normal intra-procedural on the same function.

But late_ipa_passes are run before that, so that they have a chance to
do their thing.  Because they cannot do WPA, they are really considered
second class citizens (that is at least my impression) and because they
wreck havoc to memory locality, we'd like to have as few of them as
possible - at least turned on by default.

Consider this another incentive to eventually upgrade your passes to
full IPA ones.

Martin



Re: Why am I seeing free.2 instead of free in exe.ltrans0.ltrans.s??

2020-08-17 Thread Martin Jambor
Hi,

On Tue, Aug 11 2020, Gary Oblock via Gcc wrote:
> Note, I'm getting close to getting my part of the structure reorganization
> optimization minimally functional (my question about value range propagation 
> remains open since I re-enabled a couple of optimizations to bypass it.) 
> Therefore this is actually important for me to resolve.
>
> I obviously generated calls to the standard library function "free."
> Also, similar calls to malloc didn't have that ".2" appended on them.
> I'd normally handle a problem like this with a Google search but
> "free.2" gets turned into "free to" and I get an insane number of
> junk search results.
>
> This is obviously an easy question to answer for those that
> have seen something similar in the past.

This looks like the name has been privatized for LTRANS compilation (see
promote_symbol and privatize_symbol_name in lto/lto-partition.c).  Does
the decl you call have DECL_EXTERN and DECL_PUBLIC set?  Are all of the
relevant flags of the associated call graph node set as you'd expect?

Martin


Re: Clobber REG_CC only for some constraint alternatives?

2020-08-17 Thread Senthil Kumar Selvaraj via Gcc


Pip Cet writes:

> On Mon, Aug 17, 2020 at 7:31 AM Senthil Kumar Selvaraj
>  wrote:
>> > (define_split
>> >   [(parallel [(set (match_operand:ALL1 0 "nonimmediate_operand")
>> >  (match_operand:ALL1 1 "nox_general_operand"))
>> >   (clobber (reg:CC REG_CC))])]
>> >   "reload_completed && REG_P (operands[0]) && REG_P (operands[1])"
>> >   [(parallel [(set (match_dup 0)
>> >(match_dup 1))
>> >   (clobber (scratch:CC))])])
>> >
>> > and so on, for all four constraint alternatives that don't actually
>> > clobber REG_CC (and also for a fifth which only rarely clobbers
>> > REG_CC). That's duplicated code, of course.
>>
>> The (setattr "cc" ...) that is currently present for all
>> patterns accounts for the constraint alternatives,so  using
>> get_attr_cc to split to a (clobber) of either cc_reg_rtx or a
>> gen_rtx_SCRATCH (CCmode) appears to work.
>
> Thanks! Using an insn attribute is actually what I ended up doing
> (https://github.com/pipcet/gcc/commit/d4509afae9238e0ade4d3e1e97dd8577dae96115)
> :-)
>
> It's still confusing, IMHO, that insn attributes (but not the
> get_attr_alternative attribute which is mentioned in the
> documentation) are available when which_alternative is not.
>
>> (define_insn "*mov_insn"
>>   [(set (match_operand:ALL1 0 "nonimmediate_operand" "=r,d,Qm   ,r 
>> ,q,r,*r")
>> (match_operand:ALL1 1 "nox_general_operand"   "r Y00,n Ynn,r 
>> Y00,Qm,r,q,i"))
>>(clobber (match_scratch:CC 2  "=X,X ,c   ,c 
>> ,X,X,c"))]
>
> Hmm. Technically, of course, clearing register 0 (a special case of
> the first alternative) would clobber the flags, but as it happens, the
> rewrite above produces the right clobber expression which is simply
> accepted by the "X"... I'm not sure whether anything else is trying to
> recognize such insns, but as it stands that define_insn would
> recognize the incorrect insn:
>
> [(set (reg:QI 0) (const_int 0))
>  (clobber (scratch:CC))]

get_cc_reg_clobber_rtx also looks at the insn itself (i.e. whatever
the erstwhile NOTICE_UPDATE_CC hook did), so if the cc attribute is LDI,
and operands[1] is const0_rtx, it would return cc_reg_rtx as the clobber reg.

AFAICT, some passes after reload verify if operands match constraints,
and that would work in this case - I'm not sure if the pattern you
mentioned could show up, outside of wrong splitters/peepholes in the md file.

Another approach would be to conservatively use a 'c' constraint and
clobber REG_CC for all reg-reg moves.

Regards
Senthil