Problem with peephole to peephole2 conversion

2005-08-24 Thread Ashwin
I am converting peepholes to peephole2s. However, after compiling the 
kernel with the patched build, I get size regressions. Although the 
objdump output shows that the peephole should have matched, it is 
actually not getting matched. This is because the pattern to be matched 
is not present at the time of the peephole2 pass: it is a 3-insn 
pattern, of which 2 insns are adjacent, but the third one is only moved 
next to them later, by the scheduler. The pattern matches in the original 
peephole pass because that pass runs just before assembly generation, 
when the 3 insns are adjacent.  Has anybody encountered a similar 
problem? Does this mean that peephole2 and peephole should both be kept 
on?


Secondly, after looking at other ports, I realised that they all use 
peep2_reg_dead_p instead of dead_or_set_p to check whether a register 
is dead. The latter is smarter than the former in the sense that it also 
checks whether the current insn "sets" the register being verified as 
dead. So why do other ports use peep2_reg_dead_p instead of 
dead_or_set_p? Please help me understand the advantages of 
peep2_reg_dead_p over its counterpart.





Re: Problem with peephole to peephole2 conversion

2005-08-25 Thread Ashwin

The problem I am facing is something like this:

op0 = 1
op1 = op0 leftshift op2

The above is part of the template to be matched, with the condition 
that op0 must be dead after the 2nd insn.

This will be optimized to
op1 = 1 leftshift op2
by the peephole2 pass (that is, into the corresponding RTL pattern). 
Since the replacement no longer sets op0, op0 must be dead after insn 2.
Under normal conditions (op0 dead after the pattern), both 
peep2_reg_dead_p and dead_or_set_p give the same result.



However, if the pattern looks like this:

op0 = 1
op0 = op0 leftshift op2
...
use of op0 here
...

the pattern still matches.
The condition should also be satisfied, because although op0 is not dead 
after the pattern, the 2nd insn of the pattern itself sets op0. Hence, it 
can be optimized. dead_or_set_p handles this case.


However, peep2_reg_dead_p checks only the liveness information: it 
reports that op0 is not dead after the pattern, so the condition is not 
satisfied. Although the pattern matches, the condition does not, and 
hence the peephole2 does not fire. In this sense, I thought dead_or_set_p 
was smarter than peep2_reg_dead_p, since it also checks whether the insn 
sets the operand.



Regards,
Ashwin



Richard Henderson wrote:

> On Wed, Aug 24, 2005 at 06:50:25PM +0530, Ashwin wrote:
>
> > The pattern matches in the original peephole pass because that 
> > pass runs just before assembly generation, when the 3 insns 
> > are adjacent.  Has anybody encountered a similar problem?
>
> No, or at least I haven't looked.
>
> > Does this mean that peephole2 and peephole should both be kept on?
>
> No.  It would be possible to run peep2 more than once.
> No one has shown a need for it yet.
>
> > Secondly, after looking at other ports, I realised that they all use 
> > peep2_reg_dead_p instead of dead_or_set_p to check whether a register 
> > is dead. The latter is smarter than the former in the sense that it 
> > also checks whether the current insn "sets" the register being 
> > verified as dead. So why do other ports use peep2_reg_dead_p instead 
> > of dead_or_set_p? Please help me understand the advantages of 
> > peep2_reg_dead_p over its counterpart.
>
> My guess is that your misunderstanding is that you're not
> realizing that you can ask peep2_reg_dead_p about the 
> state of a register at the beginning of the N+1 insn in
> the sequence.  That is, after the entire sequence is over.
>
> peep2_reg_dead_p *is* smarter than dead_or_set_p.
>
> r~

 





Problem regarding canonicalization

2006-04-04 Thread Ashwin
I have a combiner pattern that converts a sub-cmp pattern into a cmp insn, 
something like this:

"if (a-1 < 0)"
is converted to
"if (a<1)"

Now consider the following test case -


#include <stdlib.h>

int f(long a) { return (--a > 0); }
int main(void) { if (f(0x8000L) == 0) abort(); exit(0); }


The compiler generates the following code for f()
  cmp r0, 1   ;;canonicalized
  mov r0,1
  sub.le r0,r0,r0

This works fine under normal circumstances. However, the testcase 
passes the most negative number, i.e. 0x8000 (hereafter referred to as 
MIN). When --a subtracts 1 from MIN, the result wraps around and becomes 
positive, so the canonicalized comparison gives the opposite answer and 
the test fails at run time.


A similar problem seems to arise when MAX is passed to a function that 
does "return (++a < 0);".
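To make the failure concrete, here is a small C sketch, assuming a 16-bit register in which addition and subtraction wrap modulo 2^16. The helper `wrap16` and the four predicate names are hypothetical and not part of GCC. It compares the test as written, `(a - 1) > 0`, with the canonicalized `a > 1`, and likewise `(a + 1) < 0` with `a < -1`: each pair agrees for ordinary values but disagrees at MIN (respectively MAX).

```c
#include <stdint.h>

/* Hypothetical 16-bit register model: reduce v modulo 2^16 and
   reinterpret the bit pattern as a two's-complement value. */
int32_t wrap16(int32_t v)
{
    uint16_t bits = (uint16_t)v;  /* conversion to unsigned is modulo 2^16 */
    return (bits & 0x8000u) ? (int32_t)bits - 0x10000 : (int32_t)bits;
}

/* Test as written: decrement in the register, then compare with 0. */
int dec_then_test(int32_t a) { return wrap16(a - 1) > 0; }

/* Canonicalized form: compare a with 1 directly, no subtraction. */
int dec_canonical(int32_t a) { return a > 1; }

/* The "++a < 0" case: increment in the register, then compare with 0. */
int inc_then_test(int32_t a) { return wrap16(a + 1) < 0; }

/* Its canonicalized form. */
int inc_canonical(int32_t a) { return a < -1; }
```

With a = -32768 (MIN), `dec_then_test` yields 1 because MIN - 1 wraps to 32767, while `dec_canonical` yields 0. The arithmetic is deliberately routed through `uint16_t` because signed overflow is undefined in ISO C; the sketch only models the hardware's wrapping behaviour to show why the two comparisons are not interchangeable when the subtraction can wrap.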


How do I tackle this problem? Is there anything I may be missing?

Thanks in advance.
Ashwin


Re: Problem with peephole to peephole2 conversion

2005-08-25 Thread Ashwin Kolhe
> You'll have something like this in your test
> 
> operands[0] == operands[1] || peep2_reg_dead_p (2, operands[0])
> 
> i.e. you only need to test for op0's death if it is different from op1.
> 
> Paolo
> 

Exactly. This is the same thing as calling dead_or_set_p(insn,
operands[0]). If it can be done by dead_or_set_p, why use
peep2_reg_dead_p? Other ports also use peep2_reg_dead_p instead of
dead_or_set_p. What is the basic difference between the two? I mean,
there must be something the former can do that the latter can't.

Please correct me if I am wrong.

Thanks in advance,
Ashwin.
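As an illustration of Paolo's condition, here is a toy C model; it is not GCC's code, and every name in it (including the ring size of 8 and the 32-register bitmask) is a hypothetical assumption. It keeps one live-register bitmask per insn position in a small ring buffer, in the spirit of the circular buffer Paolo says peep2_regno_dead_p uses, and answers "is REGNO dead at the start of insn OFS?", where OFS equal to the number of matched insns means "after the whole sequence". The death test is skipped when op0 and op1 are the same register, which covers the case where the second insn itself overwrites op0.

```c
#include <stdbool.h>
#include <stdint.h>

#define TOY_RING_SIZE 8  /* hypothetical capacity, not GCC's value */

/* live[k]: bitmask of registers live just before the insn in ring slot k.
   first:   ring slot holding insn 0 of the current match. */
typedef struct {
    uint32_t live[TOY_RING_SIZE];
    int first;
} toy_peep2_window;

/* Toy analogue of peep2_regno_dead_p (ofs, regno): true if REGNO is not
   live at the start of insn OFS of the match.  It is a single array
   lookup; no insns or notes are scanned. */
bool toy_regno_dead_p(const toy_peep2_window *w, int ofs, int regno)
{
    int slot = (w->first + ofs) % TOY_RING_SIZE;
    return (w->live[slot] & (UINT32_C(1) << regno)) == 0;
}

/* Condition for rewriting
     op0 = 1; op1 = op0 << op2   into   op1 = 1 << op2.
   Offset 2 is the point after both insns of the match. */
bool toy_shift_peephole_ok(const toy_peep2_window *w, int op0, int op1)
{
    return op0 == op1 || toy_regno_dead_p(w, 2, op0);
}
```

With `r0 = 1; r0 = r0 << r2` followed by a use of r0, `toy_regno_dead_p (w, 2, 0)` is false, yet the condition still holds because op0 and op1 are the same register; with `r0 = 1; r1 = r0 << r2` and a later use of r0, the condition correctly fails.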


Re: Problem with peephole to peephole2 conversion

2005-08-25 Thread Ashwin Kolhe
On 8/25/05, Ashwin Kolhe <[EMAIL PROTECTED]> wrote:
> > You'll have something like this in your test
> >
> > operands[0] == operands[1] || peep2_reg_dead_p (2, operands[0])
> >
> > i.e. you only need to test for op0's death if it is different from op1.
> >
> > Paolo
> >
> 
> Exactly.. this is the same thing as calling dead_or_set_p(insn,
> operands[0]). 


I am sorry: since we are using peephole2, the variable "insn" points
to the first insn in the template, not the last, so the call should
be
dead_or_set_p(next_nonnote_insn(insn), operands[0])




Re: Problem with peephole to peephole2 conversion

2005-08-25 Thread Ashwin Kolhe
On 8/25/05, Paolo Bonzini <[EMAIL PROTECTED]> wrote:
> I consider this to be less readable than the peep2 way to do it,
> especially if your peephole2 had three or four instructions.  And
> peep2_regno_dead_p uses an array (a circular buffer) so it's more
> efficient.  Indeed dead_or_set_p even has a loop inside it.
> 
> Paolo


Thanks a lot for your help. Can you give me a pointer to the origin of
peep2_reg_dead_p?

Regards,
Ashwin


Re: Problem with peephole to peephole2 conversion

2005-08-25 Thread Ashwin Kolhe
> Do you mean the source code?  A hint: grep ^func_name *.c will get to it
> for every function in gcc.
> 
> In this case it is in recog.c, look at peep2_reg_dead_p but also
> peep2_regno_dead_p.  There are other peep2_* functions you may use.
> 
> Paolo

I am sorry, I think you misunderstood me. I have already looked at the
source code. What I am actually trying to find out is WHY and WHEN
peep2_reg_dead_p was introduced. I checked the mailing list but didn't
find anything relevant. Are code cleanliness and compile-time efficiency
the only reasons it was introduced?

Ashwin


Separating c++ parser

2005-09-12 Thread Ashwin Bharambe
Hi all,

I intend to use gcc's C++ parser and the intermediate representation
it creates for use in source browsing, etc. I have a few questions
regarding this: firstly, is it possible to plug out the parser and
intermediate representation code (presumably only in the front-end?)
relatively easily? If so, can somebody offer hints on where I could
start? I am currently looking into the gcc/cp front-end subdirectory,
but clearly, there are a number of dependencies inside the main gcc
code as well.

The other question is: the build process for gcc is quite hairy -
stage1, etc. etc. Since I am not concerned with code generation or
optimization at all, I don't think I would need this. How would I
begin simplifying the auto* and Makefile.in's to allow building the
parser as a stand-alone entity?

Thanks in advance for any help or pointers!
Ashwin

-- 
Ashwin Bharambe,  Ph.D. Candidate, Carnegie Mellon University.
Office: 412-268-7555  Web: http://www.cs.cmu.edu/~ashu


Re: Separating c++ parser

2005-09-12 Thread Ashwin Bharambe
Hmm. OK, fine, I can live with having to keep all the extraneous code
lying around. But it seems like there must be a way to:

 - stop gcc once the cp front-end parses the code and generates the
parse tree structure;
 - disable the stage1, stage2, etc. compilation during the build process.

Or, is there something I am still missing? :)

Thanks,
Ashwin

On 9/12/05, Diego Novillo <[EMAIL PROTECTED]> wrote:
> On 09/12/05 15:30, Ashwin Bharambe wrote:
> 
> >is it possible to plug out the parser and intermediate representation code 
> >(presumably only in the front-end?) relatively easily?
> >
> Not really.  Though we have been re-designing the internal architecture
> to be more modular, all the components are meant to be used together.
> 
> At most, you could plug your own transformation/analysis inside the
> compiler.
> 




[Aarch64] Vector Function Application Binary Interface Specification for OpenMP

2017-03-15 Thread Sekhar, Ashwin
Hi GCC Team, Aarch64 Maintainers,


The rules in the Vector Function Application Binary Interface Specification for
OpenMP
(https://sourceware.org/glibc/wiki/libmvec?action=AttachFile&do=view&target=VectorABI.txt)
are used on x86 to generate the simd clones of a function.


Is there a similar one defined for Aarch64?


If not, we would like to start a discussion on one for Aarch64. To kick it
off, a draft proposal for Aarch64 (along the same lines as the x86 ABI) is
included below. The only change from the x86 ABI is in the function name
mangling: here the letter 'b' is used to indicate the ASIMD ISA.
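For reference, a minimal C sketch of how such a name might be assembled, assuming the x86 Vector Function ABI layout `_ZGV<isa><mask><vlen><parameters>_<name>` (as in glibc's x86_64 `_ZGVbN2v_sin`), with 'N' for the unmasked and 'M' for the masked variant and one code per parameter. The helper `vector_variant_name` is hypothetical; under this proposal only the ISA letter passed in differs from x86.

```c
#include <stdio.h>
#include <string.h>  /* strcmp, for checking generated names */

/* Hypothetical helper: writes "_ZGV" <isa> <mask> <vlen> <params> "_" <name>
   into buf, following the x86 Vector Function ABI layout.
   isa:    ISA letter ('b' in this draft for ASIMD)
   masked: nonzero for the masked ('M') variant, zero for unmasked ('N')
   params: one code per parameter, e.g. "v" (vector), "u" (uniform),
           "l" (linear)
   Returns the length of the full name, as snprintf does. */
int vector_variant_name(char *buf, size_t bufsize, char isa, int masked,
                        unsigned vlen, const char *params, const char *name)
{
    return snprintf(buf, bufsize, "_ZGV%c%c%u%s_%s",
                    isa, masked ? 'M' : 'N', vlen, params, name);
}
```

For example, an unmasked 4-lane ASIMD clone of `foo (int x)` would come out as `_ZGVbN4v_foo` under this draft.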


Please review and comment.


Thanks and Regards,

Ashwin Sekhar T K



 CUT HERE --




 Aarch64 Vector Function Application Binary Interface Specification for OpenMP


1. Vector Function ABI Overview

The Aarch64 Vector Function ABI defines the ABI for vector functions generated
by compilers supporting the SIMD constructs of OpenMP 4.0 [1] on Aarch64. It is
based on the x86 Vector Function Application Binary Interface Specification for
OpenMP [2].



2. Vector Function ABI

Vector Function ABI defines a set of rules that the caller and the callee
functions must obey.

These rules consist of:
  * Calling convention
  * Vector length (the number of concurrent scalar invocations to be processed
    per invocation of the vector function)
  * Mapping from element data types to vector data types
  * Ordering of vector arguments
  * Vector function masking
  * Vector function name mangling
  * Compiler-generated variants of vector functions



2.1. Calling Convention

The vector functions should use the calling convention described in the
Procedure Call Standard for the ARM 64-bit Architecture (AArch64) [3].



2.2. Vector Length

Every vector variant of a SIMD-enabled function has a vector length (VLEN). If
the OpenMP clause "simdlen" is used, VLEN is the value of that clause's
argument, and it must be a power of 2. Otherwise, the function's
"characteristic data type" (CDT) is used to compute the vector length.

CDT is defined in the following order:
  a) For non-void function, the CDT is the return type.
  b) If the function has any non-uniform, non-linear parameters, then the CDT
     is the type of the first such parameter.
  c) If the CDT determined by a) or b) above is struct, union, or class type
     which is pass-by-value (except for the type that maps to the built-in
     complex data type), the characteristic data type is int.
  d) If none of the above three cases is applicable, the CDT is int.

VLEN = sizeof(vector_register) / sizeof(CDT)

For example, if the ISA is ASIMD, sizeof(vector_register) = 16, as the vector
registers are 128 bits wide. If the CDT of the function is "int",
sizeof(CDT) = 4, so VLEN = 16 / 4 = 4.
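The CDT rules a) to d) and the VLEN formula can be sketched in C as follows. This is an illustrative model under stated assumptions: types are reduced to a byte size plus a flag for pass-by-value aggregates, the 16-byte register is the ASIMD case, int is taken to be 4 bytes, and all type and function names are hypothetical.

```c
#include <stddef.h>

#define ASIMD_REG_BYTES  16u  /* 128-bit ASIMD vector register */
#define TARGET_INT_BYTES 4u   /* int is 32 bits on Aarch64 */

/* Hypothetical, simplified description of a C type. */
typedef struct {
    size_t size;          /* sizeof the type in bytes; 0 means void */
    int    is_aggregate;  /* struct/union/class passed by value
                             (built-in complex types excluded) */
} toy_type;

typedef enum { PARAM_VECTOR, PARAM_UNIFORM, PARAM_LINEAR } param_kind;

typedef struct {
    toy_type   type;
    param_kind kind;
} toy_param;

/* Rules a) to d): returns sizeof(CDT) in bytes. */
size_t cdt_size(toy_type ret, const toy_param *params, size_t nparams)
{
    const toy_type *cdt = NULL;

    if (ret.size != 0) {
        cdt = &ret;                               /* a) return type */
    } else {
        for (size_t i = 0; i < nparams; i++) {
            if (params[i].kind == PARAM_VECTOR) { /* b) first vector param */
                cdt = &params[i].type;
                break;
            }
        }
    }
    if (cdt == NULL || cdt->is_aggregate)         /* d) and c): int */
        return TARGET_INT_BYTES;
    return cdt->size;
}

/* VLEN is simdlen when given (nonzero, assumed to be a power of 2),
   otherwise sizeof(vector_register) / sizeof(CDT). */
unsigned compute_vlen(unsigned simdlen, size_t cdt_bytes)
{
    if (simdlen != 0)
        return simdlen;
    return (unsigned)(ASIMD_REG_BYTES / cdt_bytes);
}
```

For a void function whose first vector parameter is a double, this gives sizeof(CDT) = 8 and VLEN = 16 / 8 = 2.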



2.3. Element Data Type to Vector Data Type Mapping

The vector data types for parameters are selected depending on ISA, vector
length, data type of original parameter, and parameter specification.

For uniform and linear parameters (a detailed description can be found in
[1]), the original data type is preserved.

For vector parameters, vector data types are selected by the compiler. The
mapping from element data type to vector data type is described below.

  * The bit size of the vector data type of a parameter is computed as:

    size_of_vector_data_type = VLEN * sizeof(original_parameter_data_type) * 8

    For instance, for the ASIMD version of a vector function with parameter
    data type "int": if VLEN = 4, size_of_vector_data_type = 4 * 4 * 8 = 128
    (bits), which means one argument of a 128-bit vector type (e.g.
    int32x4_t) is passed.

  * If size_of_vector_data_type is greater than the width of the vector
    register, multiple vector registers are selected and the parameter is
    passed in multiple vector registers.

    For instance, for the ASIMD version of a vector function with parameter
    data type "int":

    If VLEN = 8, size_of_vector_data_type = 8 * 4 * 8 = 256 (bits). ASIMD has
    no 256-bit vector type, so 2 arguments of a 128-bit vector type (e.g.
    int32x4_t) are passed.
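The size computation and the number of 128-bit registers it implies can be checked with a short C sketch. The function names are hypothetical, the 128-bit width is the ASIMD case, and rounding up for sizes below one register is an assumption of this sketch.

```c
#include <stddef.h>

#define ASIMD_REG_BITS 128u  /* width of one ASIMD vector register */

/* size_of_vector_data_type = VLEN * sizeof(original_parameter_data_type) * 8 */
size_t vector_type_bits(unsigned vlen, size_t elem_bytes)
{
    return (size_t)vlen * elem_bytes * 8u;
}

/* Number of 128-bit vector registers (and hence arguments) used to pass
   one vector parameter, rounding up to whole registers. */
size_t asimd_regs_for_param(unsigned vlen, size_t elem_bytes)
{
    return (vector_type_bits(vlen, elem_bytes) + ASIMD_REG_BITS - 1u)
           / ASIMD_REG_BITS;
}
```

For "int" (4 bytes), VLEN = 4 gives 128 bits in one register, and VLEN = 8 gives 256 bits split across two registers, matching the two examples above.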



2.4. Ordering of Vector Arguments

  * When a parameter in the original data type results in one argument in the
    vector function, the ordering rule is a simple one to one match with the
    original argument order.
    

Re: [Aarch64] Vector Function Application Binary Interface Specification for OpenMP

2017-03-19 Thread Sekhar, Ashwin
On Friday 17 March 2017 07:31 PM, James Greenhalgh wrote:
> On Wed, Mar 15, 2017 at 09:50:18AM +, Sekhar, Ashwin wrote:
>> Hi GCC Team, Aarch64 Maintainers,
>>
>>
>> The rules in Vector Function Application Binary Interface Specification  for
>> OpenMP
>> (https://sourceware.org/glibc/wiki/libmvec?action=AttachFile&do=view&target=VectorABI.txt)
>> is used in x86 for generating the simd clones of a function.
>>
>> Is there a similar one defined for Aarch64?
>>
>> If not, would like to start a discussion on the same for Aarch64. To  kick
>> start the same, a draft proposal for Aarch64 (on the same lines as  x86 ABI)
>> is included below. The only change from x86 ABI is in the  function name
>> mangling. Here the letter 'b' is used for indicating the  ASIMD isa.
>
> Hi Ashwin,
>
> Thanks for the question. ARM has defined a vector function ABI, based
> on the Vector Function ABI Specification you linked below, which
> is designed to be suitable for both the Advanced SIMD and Scalable
> Vector Extensions. There has not yet been a release of this document
> which I can point you at, nor can I give you an estimate of when the
> document will be published.
>
> However, Francesco Petrogalli has recently made a proposal to the
> LLVM mailing list ( https://reviews.llvm.org/D30739 ) which I would
> note conflicts with your proposal in one way. You choose 'b' for name
> mangling for a vector function using Advanced SIMD, while Francesco
> uses 'n', which is the agreed character in the Vector Function ABI
> Specification we have been working on.
>
> I'd encourage you to wait for formal publication of the ARM Vector
> Function ABI to prevent any unexpected divergence between
> implementations.
Thanks for the information. We at Cavium are also working on libraries 
that require this ABI specification, so we would like to see it 
published as early as possible.

>
> Thanks,
> James
>
>
Thanks
Ashwin