Re: [RFC][AArch64] function prologue analyzer in linux kernel

AKASHI Takahiro Mon, 18 Jan 2016 01:28:16 -0800

On 01/16/2016 01:56 AM, Will Deacon wrote:

On Wed, Jan 13, 2016 at 05:13:29PM +0900, AKASHI Takahiro wrote:

On 01/13/2016 03:04 AM, Will Deacon wrote:

On Tue, Jan 12, 2016 at 03:11:29PM +0900, AKASHI Takahiro wrote:

On 01/09/2016 12:53 AM, Will Deacon wrote:

I still don't understand why you can't use fstack-usage. Can you please
tell me why that doesn't work? Am I missing something?


I don't know how gcc calculates the usage here, but I guess it would be more
robust than my analyzer.

The issues that come to mind are:
- -fstack-usage generates a separate output file, *.su, so we have to
   manage those files and incorporate them into the kernel binary.


That doesn't sound too bad to me. How much data are we talking about here?

   This implies that the (common) kernel makefiles might have to be changed a bit.
- Worse, what about the kernel module case? We would have no way to let the kernel
   know a module's stack usage without adding an extra step at load time.


We can easily add a new __init section to modules, which is a table
representing the module functions and their stack sizes (like we do
for other things like alternatives). We'd just then need to slurp this
information at load time and throw it into an rbtree or something.
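
As a rough illustration of the table idea above, here is a minimal sketch in
plain C. It is not taken from any patch: the names stack_usage_entry and
stack_usage_lookup are hypothetical, and it uses a sorted array with a binary
search instead of an rbtree purely to stay short.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical table entry: one per function, emitted at build time. */
struct stack_usage_entry {
	unsigned long addr;	/* start address of the function */
	unsigned int size;	/* stack size in bytes reported for it */
};

/*
 * Find the entry for the function containing pc.
 * Assumes tbl is sorted by ascending addr and that each function
 * extends up to the start of the next entry.
 * Returns NULL if the table is empty or pc lies below the first entry.
 */
static const struct stack_usage_entry *
stack_usage_lookup(const struct stack_usage_entry *tbl, size_t n,
		   unsigned long pc)
{
	size_t lo = 0, hi = n;

	assert(tbl != NULL || n == 0);

	/* Binary search for the first entry whose addr is greater than pc. */
	while (lo < hi) {
		size_t mid = lo + (hi - lo) / 2;

		if (tbl[mid].addr <= pc)
			lo = mid + 1;
		else
			hi = mid;
	}

	/* The entry just before that one, if any, covers pc. */
	return lo ? &tbl[lo - 1] : NULL;
}
```

A module loader could fill such a table from the module's section and then
call stack_usage_lookup(tbl, n, pc) for each return address seen while walking
the stack. A real implementation would also need to record each function's end
address, which this sketch leaves out.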


I found another issue.
Consider a 'dynamic storage' case like this:
$ cat stack.c
extern long fooX(long a);
extern long fooY(long b[]);

long foo1(long a) {

        if (a > 1) {
                long b[a];  <== Here

                return a + fooY(b);
        } else {
                return a + fooX(a);
        }
}

Then, -fstack-usage returns 48 for foo1():
$ aarch64-linux-gnu-gcc -fno-omit-frame-pointer -fstack-usage main.c stack.c \
       -pg -O2 -fasynchronous-unwind-tables
$ cat stack.su
stack.c:4:6:foo1        48      dynamic

This indicates that foo1() may use 48 bytes or more depending on a condition.
But in my case (ftrace-based stack tracer), I always expect 32 whether we're
backtracing from fooY() or from fooX() because my stack tracer estimates:
        (stack pointer) = (callee's frame pointer) + (callee's stack usage)
(in my previous e-mail, '-(minus)' was wrong.)

where (callee's stack usage) is, as I described in my previous e-mail, the size of the
memory initially allocated on the stack in the function prologue, and should not
include the size of any dynamically allocated area.
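
The estimate above can be sketched in plain C as follows. This is only an
illustration, not code from the patch: the function name estimate_caller_sp is
hypothetical, and it assumes the callee's frame pointer sits at the bottom of
the area allocated by its prologue (e.g. "stp x29, x30, [sp, #-32]!" followed
by "mov x29, sp").

```c
#include <assert.h>
#include <stdint.h>

/*
 * Estimate the caller's stack pointer at the time of the call:
 *   (stack pointer) = (callee's frame pointer) + (callee's stack usage)
 * prologue_size must be the size allocated by the callee's prologue only;
 * anything allocated dynamically later lies below the frame pointer and
 * must not be counted.
 */
static uint64_t estimate_caller_sp(uint64_t callee_fp, uint64_t prologue_size)
{
	/* Guard against wrap-around on a bogus frame pointer. */
	assert(callee_fp <= UINT64_MAX - prologue_size);

	return callee_fp + prologue_size;
}
```

With the numbers from foo1() above, passing 32 yields the caller's stack
pointer, while passing the 48 reported by -fstack-usage would put the estimate
16 bytes too high.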


According to who? What's the use in reporting only the prologue size?


Me :)
(I'm afraid that my wording, "stack usage", might confuse you.)

My arm64-specific check_patch() expects this in order to estimate the caller's
correct stack pointer address from the callee's frame pointer, which does not
cover any of the callee's dynamically allocated (so probably after the prologue)
variables.
Please take a close look at my patch #5[1].

[1] http://lists.infradead.org/pipermail/linux-arm-kernel/2015-December/393721.html

-Takahiro AKASHI

Will
