Re: [RFC][AArch64] function prologue analyzer in linux kernel

2016-01-18 Thread AKASHI Takahiro

On 01/16/2016 01:56 AM, Will Deacon wrote:

On Wed, Jan 13, 2016 at 05:13:29PM +0900, AKASHI Takahiro wrote:

On 01/13/2016 03:04 AM, Will Deacon wrote:

On Tue, Jan 12, 2016 at 03:11:29PM +0900, AKASHI Takahiro wrote:

On 01/09/2016 12:53 AM, Will Deacon wrote:

I still don't understand why you can't use fstack-usage. Can you please
tell me why that doesn't work? Am I missing something?


I don't know how gcc calculates the usage here, but I guess it would be more
robust than my analyzer.

The issues, that come up to my mind, are
- -fstack-usage generates a separate output file, *.su and so we have to
   manage them to be incorporated in the kernel binary.


That doesn't sound too bad to me. How much data are we talking about here?


   This implies that (common) kernel makefiles might have to be a bit changed.
- more worse, what if kernel module case? We will have no way to let the kernel
   know the stack usage without adding an extra step at loading.


We can easily add a new __init section to modules, which is a table
representing the module functions and their stack sizes (like we do
for other things like alternatives). We'd just then need to slurp this
information at load time and throw it into an rbtree or something.


I found another issue.
Let's think about 'dynamic storage' case like:
$ cat stack.c
extern long fooX(long a);
extern long fooY(long b[]);

long foo1(long a) {

if (a > 1) {
long b[a];  <== Here

return a + fooY(b);
} else {
return a + fooX(a);
}
}

Then, -fstack-usage returns 48 for foo1():
$ aarch64-linux-gnu-gcc -fno-omit-frame-pointer -fstack-usage main.c stack.c \
   -pg -O2 -fasynchronous-unwind-tables
$ cat stack.su
stack.c:4:6:foo148  dynamic

This indicates that foo1() may use 48 bytes or more depending on a condition.
But in my case (ftrace-based stack tracer), I always expect 32 whether we're
backtracing from fooY() or from fooX() because my stack tracer estimates:
(stack pointer) = (callee's frame pointer) + (callee's stack usage)
(in my previous e-mail, '-(minus)' was wrong.)

where (callee's stack usage) is, as I described in my previous e-mail, a size of
memory which is initially allocated on a stack in a function prologue, and 
should not
contain a size of dynamically allocate area.


According to who? What's the use in reporting only the prologue size?


Me :)
(I'm afraid that my wording, "stack usage", might confuse you.)

My arm64-specifc check_patch() expects this in order to estimate caller's 
correct stack
pointer address from a callee's frame pointer, which does not contain any of 
callee's
dynamically (so probably after a prologue) allocated variables.
Please take a close look at my patch #5[1].

[1] 
http://lists.infradead.org/pipermail/linux-arm-kernel/2015-December/393721.html

-Takahiro AKASHI


Will



Re: help with PR69133

2016-01-18 Thread Prathamesh Kulkarni
On 17 January 2016 at 14:56, Prathamesh Kulkarni
 wrote:
> Hi,
> I was having a look at PR69133.
> It appears that with -flto-partition=none,
> cgraph_node::get_untransformed_body ()
> is called twice for node with asm_name _ZThn4_N11xercesc_3_11C5m_fn6ERKi.
> c++filt says it is: non-virtual thunk to xercesc_3_1::C::m_fn6(int const&)
>
> get_untransformed_body() calls
> lto_free_function_in_decl_state_for_node() which sets
> node->lto_file_data to NULL.
> Now 2nd time get_untransformed_body is called for the same node,
> node->lto_file_data is NULL and lto_get_decl_name_mapping() dereferences
> lto_file_data which results in segfault.
>
> I was wondering if gating on lto_file_data could be a reasonable solution ?
> if (!lto_file_data)
>   return false;
> I am not sure what value to return for this case (I chose false
> because of early exit).
> It prevents the ICE, and the patch passes bootstrap+test on
> x86_64-unknown-linux-gnu.
> With partitoning enabled, for the above test-case,
> get_untransformed_body () is called only once per node. However I
> don't understand why it gets called twice with -flto-partition=none.
Kugan pointed out to me the test-case doesn't ICE by passing -fno-ipa-icf.

Thanks,
Prathamesh
>
> Thank you,
> Prathamesh


Re: help with PR69133

2016-01-18 Thread Richard Biener
On Mon, Jan 18, 2016 at 10:28 AM, Prathamesh Kulkarni
 wrote:
> On 17 January 2016 at 14:56, Prathamesh Kulkarni
>  wrote:
>> Hi,
>> I was having a look at PR69133.
>> It appears that with -flto-partition=none,
>> cgraph_node::get_untransformed_body ()
>> is called twice for node with asm_name _ZThn4_N11xercesc_3_11C5m_fn6ERKi.
>> c++filt says it is: non-virtual thunk to xercesc_3_1::C::m_fn6(int const&)
>>
>> get_untransformed_body() calls
>> lto_free_function_in_decl_state_for_node() which sets
>> node->lto_file_data to NULL.
>> Now 2nd time get_untransformed_body is called for the same node,
>> node->lto_file_data is NULL and lto_get_decl_name_mapping() dereferences
>> lto_file_data which results in segfault.
>>
>> I was wondering if gating on lto_file_data could be a reasonable solution ?
>> if (!lto_file_data)
>>   return false;
>> I am not sure what value to return for this case (I chose false
>> because of early exit).
>> It prevents the ICE, and the patch passes bootstrap+test on
>> x86_64-unknown-linux-gnu.
>> With partitoning enabled, for the above test-case,
>> get_untransformed_body () is called only once per node. However I
>> don't understand why it gets called twice with -flto-partition=none.
> Kugan pointed out to me the test-case doesn't ICE by passing -fno-ipa-icf.

It looks to me we are not supposed to call this twice on the same node
but nothing guards against that (and we don't release the untransformed
body either).  -flto-partition=none is special here becaue it combines WPA
and late stage.  IPA ICF is the only IPA pass needing untransformed bodies
during WPA.

Richard.

> Thanks,
> Prathamesh
>>
>> Thank you,
>> Prathamesh



Re: ipa vrp implementation in gcc

2016-01-18 Thread Richard Biener
On Mon, Jan 18, 2016 at 12:00 AM, Kugan
 wrote:
> Hi,
>
>> Another potential use of value ranges is the profile estimation.
>> http://www.lighterra.com/papers/valuerangeprop/Patterson1995-ValueRangeProp.pdf
>> It seems to me that we may want to have something that can feed sane loop
>> bounds for profile estimation as well and we can easily store the known
>> value ranges to SSA name annotations.
>> So I think separate local pass to compute value ranges (perhaps with less
>> accuracy than full blown VRP) is desirable.
>
> Thanks for the reference. I am looking at implementing a local pass for
> VRP. The value range computation in tree-vrp is based on the above
> reference and uses ASSERT_EXPR insertion (I understand that you posted
> the reference above for profile estimation). As Richard mentioned in his
> reply, the local pass should not rely on ASSERT_EXPR insertion.
> Therefore, do you have any specific algorithm in mind (i.e. Any
> published paper or reference from book)?. Of course we can tweak the
> algorithm from the reference above but would like to understand what
> your intension are.

I have (very incomplete) prototype patches to do a dominator-based
approach instead (what is refered to downthread as non-iterating approach).
That's cheaper and is what I'd like to provide as an "utility style" interface
to things liker niter analysis which need range-info based on a specific
dominator (the loop header for example).

Richard.

>> I think the ipa-prop.c probably won't need any siginificant changes.  The
>> code basically analyze what values are passed thorugh the function and
>> this works for constants as well as for intervals. In fact ipa-cp already
>> uses the same ipa-prop analysis for
>>  1) constant propagation
>>  2) alignment propagation
>>  3) propagation of known polymorphic call contextes.
>>
>> So replacing 1) by value range propagation should be easily doable.
>> I would also like to replace alignment propagation by bitwise constant
>> propagation (i.e. propagating what bits are known to be zero and what
>> bits are known to be one). We already do have bitwise CCP, so we could
>> deal with this basically in the same way as we deal with value ranges.
>>
>> ipa-prop could use bit of clenaing up and modularizing that I hope will
>> be done next stage1 :)
>
> We (Myself and Prathamesh) are interested in working on LTO
> improvements. Let us have a look at this.
>
>>>
 - Once we have the value ranges for parameter/return values, we could
 rely on tree-vrp to use this and do the optimizations
>>>
>>> Yep.  IPA transform phase should annotate parameter default defs with
>>> computed ranges.
>>
>> Yep, in addition we will end up with known value ranges stored in aggregates
>> for that we need better separate representaiton
>>
>> See https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68930
>>>
>
> Thanks,
> Kugan


Re: help with PR69133

2016-01-18 Thread Jan Hubicka
> On Mon, Jan 18, 2016 at 10:28 AM, Prathamesh Kulkarni
>  wrote:
> > On 17 January 2016 at 14:56, Prathamesh Kulkarni
> >  wrote:
> >> Hi,
> >> I was having a look at PR69133.
> >> It appears that with -flto-partition=none,
> >> cgraph_node::get_untransformed_body ()
> >> is called twice for node with asm_name _ZThn4_N11xercesc_3_11C5m_fn6ERKi.
> >> c++filt says it is: non-virtual thunk to xercesc_3_1::C::m_fn6(int const&)
> >>
> >> get_untransformed_body() calls
> >> lto_free_function_in_decl_state_for_node() which sets
> >> node->lto_file_data to NULL.
> >> Now 2nd time get_untransformed_body is called for the same node,
> >> node->lto_file_data is NULL and lto_get_decl_name_mapping() dereferences
> >> lto_file_data which results in segfault.
> >>
> >> I was wondering if gating on lto_file_data could be a reasonable solution ?
> >> if (!lto_file_data)
> >>   return false;
> >> I am not sure what value to return for this case (I chose false
> >> because of early exit).
> >> It prevents the ICE, and the patch passes bootstrap+test on
> >> x86_64-unknown-linux-gnu.
> >> With partitoning enabled, for the above test-case,
> >> get_untransformed_body () is called only once per node. However I
> >> don't understand why it gets called twice with -flto-partition=none.
> > Kugan pointed out to me the test-case doesn't ICE by passing -fno-ipa-icf.
> 
> It looks to me we are not supposed to call this twice on the same node
> but nothing guards against that (and we don't release the untransformed
> body either).  -flto-partition=none is special here becaue it combines WPA

You are suppose to read every function body just once from LTO stream. Doing
it multiple times is a waste.  You can call get_body and get_untransformed_body
as many times as you want as tHere is guard:
bool
cgraph_node::get_untransformed_body (void)
{
  lto_file_decl_data *file_data;
  const char *data, *name;
  size_t len;
  tree decl = this->decl;

  if (DECL_RESULT (decl))
return false;

once you read the body DECL_RESULT is set.  Only way you can get past this
to ICE should be the case where you read the body and then trhow it away.

> and late stage.  IPA ICF is the only IPA pass needing untransformed bodies
> during WPA.

That is not exactly true.  We also read untransformed bodies when merging
profiles.

I will check the PR. ipa-icf is not supposed to get body of thunks.

Honza


Re: [RFC] DW_OP_piece vs. DW_OP_bit_piece on a Register

2016-01-18 Thread Andreas Arnez
On Sat, Jan 16 2016, Joel Brobecker wrote:

>> After analyzing some test case failures in GCC and GDB I realized that
>> there are various problems with the handling of DWARF pieces
>> (particularly from registers) in the current implementations of GCC and
>> GDB.  I'm working on a fix for the GDB part, but first I'd like to check
>> whether I'm heading into the right direction -- or what the right
>> direction is supposed to be.  The article below outlines these issues
>> and the suggested solution options.
>
> This is a very nice and detailed analysis of the current situation.
> Thank You!
>
> I admit that I read through the document fairly rapidly; it does
> seem to me, at this point, that the first step might be to get
> clarification from the DWARF committee? Or is input from the GCC/GDB
> community going to help the discussion with the DWARF committee?

I think it would be helpful to form an opinion within the GCC/GDB
community first.  Then we can open a DWARF issue with a specific change
request, if necessary.

FWIW, here's my (current) opinion:

I don't like option 4.2 ("loose interpretation"), because it doesn't
seem to have any significant advantages over 4.3 and is more complex.
I'm less sure about 4.3 versus 4.1.  Option 4.3 seems more intuitive and
may require a bit less code than 4.1, but is not compliant with the
current standards' wording and does not support the SPU "preferred
slots".

And regarding the padding support, I prefer 5.3.1 ("no padding
support").

--
Andreas



Re: distro test rebuild using GCC 6

2016-01-18 Thread James Greenhalgh
On Fri, Jan 15, 2016 at 05:52:49PM +, James Greenhalgh wrote:
>
>   libbpp-qt_2.1.0-1ubuntu2
> 
> [ ICE: Looks like: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68068
>   but reproduces on current trunk. Testcase reducer is in progress. ]

This turned out to be https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68148 ,
now fixed for trunk with r232410.

Author: hubicka 
Date:   Fri Jan 15 11:00:24 2016 +

PR ipa/68148
* ipa-icf.c (sem_function::merge): Virtual functions may become
reachable even if they address is not taken and there are no
idrect calls.
* g++.dg/ipa/devirt-49.C: New testcase.

Thanks,
James



Re: ipa vrp implementation in gcc

2016-01-18 Thread Jan Hubicka
> On Mon, Jan 18, 2016 at 12:00 AM, Kugan
>  wrote:
> > Hi,
> >
> >> Another potential use of value ranges is the profile estimation.
> >> http://www.lighterra.com/papers/valuerangeprop/Patterson1995-ValueRangeProp.pdf
> >> It seems to me that we may want to have something that can feed sane loop
> >> bounds for profile estimation as well and we can easily store the known
> >> value ranges to SSA name annotations.
> >> So I think separate local pass to compute value ranges (perhaps with less
> >> accuracy than full blown VRP) is desirable.
> >
> > Thanks for the reference. I am looking at implementing a local pass for
> > VRP. The value range computation in tree-vrp is based on the above
> > reference and uses ASSERT_EXPR insertion (I understand that you posted
> > the reference above for profile estimation). As Richard mentioned in his
> > reply, the local pass should not rely on ASSERT_EXPR insertion.
> > Therefore, do you have any specific algorithm in mind (i.e. Any
> > published paper or reference from book)?. Of course we can tweak the
> > algorithm from the reference above but would like to understand what
> > your intension are.
> 
> I have (very incomplete) prototype patches to do a dominator-based
> approach instead (what is refered to downthread as non-iterating approach).
> That's cheaper and is what I'd like to provide as an "utility style" interface
> to things liker niter analysis which need range-info based on a specific
> dominator (the loop header for example).

In general, given that we have existing VRP implementation I would suggest
first implementing the IPA propagation and profile estimation bits using
existing VRP pass and then try to compare the simple dominator based approach
with the VRP we have and see what are the compile time/code quality effects
of both. Based on that we can decide how complex VRP we really want.

It will be probably also more fun to implement it this way :)
I plan to collect some data on early VRP and firefox today or tomorrow.

Honza
> 
> Richard.
> 
> >> I think the ipa-prop.c probably won't need any siginificant changes.  The
> >> code basically analyze what values are passed thorugh the function and
> >> this works for constants as well as for intervals. In fact ipa-cp already
> >> uses the same ipa-prop analysis for
> >>  1) constant propagation
> >>  2) alignment propagation
> >>  3) propagation of known polymorphic call contextes.
> >>
> >> So replacing 1) by value range propagation should be easily doable.
> >> I would also like to replace alignment propagation by bitwise constant
> >> propagation (i.e. propagating what bits are known to be zero and what
> >> bits are known to be one). We already do have bitwise CCP, so we could
> >> deal with this basically in the same way as we deal with value ranges.
> >>
> >> ipa-prop could use bit of clenaing up and modularizing that I hope will
> >> be done next stage1 :)
> >
> > We (Myself and Prathamesh) are interested in working on LTO
> > improvements. Let us have a look at this.
> >
> >>>
>  - Once we have the value ranges for parameter/return values, we could
>  rely on tree-vrp to use this and do the optimizations
> >>>
> >>> Yep.  IPA transform phase should annotate parameter default defs with
> >>> computed ranges.
> >>
> >> Yep, in addition we will end up with known value ranges stored in 
> >> aggregates
> >> for that we need better separate representaiton
> >>
> >> See https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68930
> >>>
> >
> > Thanks,
> > Kugan


x86 interrupt attribute

2016-01-18 Thread Wink Saville
What is the status of the x86 interrupt attribute patch?

One of the last references I see is here and an attempt to update the
middle-end here.

-- Wink


Re: x86 interrupt attribute

2016-01-18 Thread H.J. Lu
On Mon, Jan 18, 2016 at 9:39 AM, Wink Saville  wrote:
> What is the status of the x86 interrupt attribute patch?
>
> One of the last references I see is here and an attempt to update the
> middle-end here.
>
> -- Wink

You can try hjl/interrupt/stage1 branch in git repo, which is queued for
GCC 7.

-- 
H.J.


Re: x86 interrupt attribute

2016-01-18 Thread Wink Saville
It looks like it will be awhile before its included, what's your guess
on when stage1 will commence for GCC 7?

About how often will you be rebasing hjl/interrupt/stage1 onto master?

On Mon, Jan 18, 2016 at 9:47 AM H.J. Lu  wrote:
>
> On Mon, Jan 18, 2016 at 9:39 AM, Wink Saville  wrote:
> > What is the status of the x86 interrupt attribute patch?
> >
> > One of the last references I see is here and an attempt to update the
> > middle-end here.
> >
> > -- Wink
>
> You can try hjl/interrupt/stage1 branch in git repo, which is queued for
> GCC 7.
>
> --
> H.J.


Re: x86 interrupt attribute

2016-01-18 Thread H.J. Lu
On Mon, Jan 18, 2016 at 10:27 AM, Wink Saville  wrote:
> It looks like it will be awhile before its included, what's your guess
> on when stage1 will commence for GCC 7?
>
> About how often will you be rebasing hjl/interrupt/stage1 onto master?

I update hjl/interrupt/stage1 branch about once a week.

> On Mon, Jan 18, 2016 at 9:47 AM H.J. Lu  wrote:
>>
>> On Mon, Jan 18, 2016 at 9:39 AM, Wink Saville  wrote:
>> > What is the status of the x86 interrupt attribute patch?
>> >
>> > One of the last references I see is here and an attempt to update the
>> > middle-end here.
>> >
>> > -- Wink
>>
>> You can try hjl/interrupt/stage1 branch in git repo, which is queued for
>> GCC 7.
>>
>> --
>> H.J.



-- 
H.J.