On Tue, 13 Aug 2013 07:46:46 -0700
"H. Peter Anvin" wrote:
> > On Mon, Aug 12, 2013 at 10:47:37AM -0700, H. Peter Anvin wrote:
> >> Since we really doesn't want to...
>
> Ow. Can't believe I wrote that.
>
All your base are belong to us!
-- Steve
> On Mon, Aug 12, 2013 at 10:47:37AM -0700, H. Peter Anvin wrote:
>> Since we really doesn't want to...
Ow. Can't believe I wrote that.
-hpa
On Mon, Aug 12, 2013 at 10:47:37AM -0700, H. Peter Anvin wrote:
> On 08/12/2013 09:09 AM, Peter Zijlstra wrote:
> >>
> >> On the majority of architectures, including x86, you cannot simply copy
> >> a piece of code elsewhere and have it still work.
> >
> > I thought we used -fPIC which would allow
On 08/12/2013 09:09 AM, Peter Zijlstra wrote:
>>
>> On the majority of architectures, including x86, you cannot simply copy
>> a piece of code elsewhere and have it still work.
>
> I thought we used -fPIC which would allow just that.
>
Doubly wrong. The kernel is not compiled with -fPIC, nor do
On Mon, Aug 12, 2013 at 09:02:02AM -0700, Andi Kleen wrote:
> "H. Peter Anvin" writes:
>
> > However, I would really like to
> > understand what the value is.
>
> Probably very little. When I last looked at it, the main overhead in
> perf currently seems to be backtraces and the ring buffer, not
On Mon, Aug 12, 2013 at 07:56:10AM -0700, H. Peter Anvin wrote:
> On 08/12/2013 02:17 AM, Peter Zijlstra wrote:
> >
> > I've been wanting to 'abuse' static_key/asm-goto to sort-of JIT
> > if-forest functions like perf_prepare_sample() and perf_output_sample().
> >
> > They are of the form:
> >
>
"H. Peter Anvin" writes:
> However, I would really like to
> understand what the value is.
Probably very little. When I last looked at it, the main overhead in
perf currently seems to be backtraces and the ring buffer, not this
code.
-Andi
--
a...@linux.intel.com -- Speaking for myself only
On 08/12/2013 02:17 AM, Peter Zijlstra wrote:
>
> I've been wanting to 'abuse' static_key/asm-goto to sort-of JIT
> if-forest functions like perf_prepare_sample() and perf_output_sample().
>
> They are of the form:
>
> void func(obj, args..)
> {
> unsigned long f = ...;
>
> if (f &
On Mon, Aug 05, 2013 at 12:55:15PM -0400, Steven Rostedt wrote:
> [ sent to both Linux kernel mailing list and to gcc list ]
>
Let me hijack this thread for something related...
I've been wanting to 'abuse' static_key/asm-goto to sort-of JIT
if-forest functions like perf_prepare_sample() and per
On Wed, 2013-08-07 at 12:03 -0400, Mathieu Desnoyers wrote:
> You might want to try creating a global array of counters (accessible
> both from C for printout and assembly for update).
>
> Index the array from assembly using: (2f - 1f)
>
> 1:
> jmp ...;
> 2:
>
> And put an atomic incr
* Steven Rostedt (rost...@goodmis.org) wrote:
> On Wed, 2013-08-07 at 07:06 +0200, Ondřej Bílka wrote:
>
> > Add short_counter, long_counter and increment the matching counter before each
> > jump. That way we will know how many short/long jumps were taken.
>
> That's not trivial at all. The jump is a
On Wed, 2013-08-07 at 07:06 +0200, Ondřej Bílka wrote:
> Add short_counter, long_counter and increment the matching counter before each
> jump. That way we will know how many short/long jumps were taken.
That's not trivial at all. The jump is a single location (in an asm
goto() statement) that happens
On Tue, Aug 06, 2013 at 08:56:00PM -0400, Steven Rostedt wrote:
> On Tue, 2013-08-06 at 20:45 -0400, Steven Rostedt wrote:
>
> > [3.387362] short jumps: 106
> > [3.390277] long jumps: 330
> >
> > Thus, approximately 25%. Not bad.
>
> Also, where these happen to be is probably even more
On Tue, 2013-08-06 at 20:45 -0400, Steven Rostedt wrote:
> [3.387362] short jumps: 106
> [3.390277] long jumps: 330
>
> Thus, approximately 25%. Not bad.
Also, where these happen to be is probably even more important than how
many. If all the short jumps happen in slow paths, it's rathe
On Tue, 2013-08-06 at 16:43 -0400, Steven Rostedt wrote:
> On Tue, 2013-08-06 at 16:33 -0400, Mathieu Desnoyers wrote:
>
> > Steve, perhaps you could add a mode to your binary rewriting program
> > that counts the number of 2-byte vs 5-byte jumps found, and if possible
> > get a breakdown of those
On Tue, 2013-08-06 at 16:33 -0400, Mathieu Desnoyers wrote:
> Steve, perhaps you could add a mode to your binary rewriting program
> that counts the number of 2-byte vs 5-byte jumps found, and if possible
> get a breakdown of those per subsystem ?
I actually started doing that, as I was curious t
* Steven Rostedt (rost...@goodmis.org) wrote:
> On Tue, 2013-08-06 at 10:48 -0700, Linus Torvalds wrote:
>
> > So I wonder if this is a "ok, let's not bother, it's not worth the
> > pain" issue. 128 bytes of offset is very small, so there probably
> > aren't all that many cases that would use it.
On Tue, 2013-08-06 at 10:48 -0700, Linus Torvalds wrote:
> So I wonder if this is a "ok, let's not bother, it's not worth the
> pain" issue. 128 bytes of offset is very small, so there probably
> aren't all that many cases that would use it.
OK, I'll forward port the original patches for the hell
On Tue, Aug 6, 2013 at 7:19 AM, Steven Rostedt wrote:
>
> After playing with the patches again, I now understand why I did that.
> It wasn't just for optimization.
[explanation snipped]
> Anyway, if you feel that update_jump_label is too complex, I can go the
> "update at early boot" route and s
On 08/06/2013 09:26 AM, Steven Rostedt wrote:
>>
>> No, but if we ever end up doing MPX in the kernel, for example, we would
>> have to put an MPX prefix on the jmp.
>
> Well then we just have to update the rest of the jump label code :-)
>
For MPX in the kernel, this would be a small part of th
On Tue, 2013-08-06 at 09:19 -0700, H. Peter Anvin wrote:
> On 08/06/2013 09:15 AM, Steven Rostedt wrote:
> > On Mon, 2013-08-05 at 14:43 -0700, H. Peter Anvin wrote:
> >
> >> For unconditional jmp that should be pretty safe barring any fundamental
> >> changes to the instruction set, in which case
On 08/06/2013 09:15 AM, Steven Rostedt wrote:
> On Mon, 2013-08-05 at 14:43 -0700, H. Peter Anvin wrote:
>
>> For unconditional jmp that should be pretty safe barring any fundamental
>> changes to the instruction set, in which case we can enable it as
>> needed, but for extra robustness it probabl
On Mon, 2013-08-05 at 14:43 -0700, H. Peter Anvin wrote:
> For unconditional jmp that should be pretty safe barring any fundamental
> changes to the instruction set, in which case we can enable it as
> needed, but for extra robustness it probably should skip prefix bytes.
Would the assembler add
On Mon, 2013-08-05 at 11:49 -0700, Linus Torvalds wrote:
> Ugh. Why the crazy update_jump_label script stuff?
After playing with the patches again, I now understand why I did that.
It wasn't just for optimization.
Currently the way jump labels work is that we use asm goto() and place a
5 byte no
On 08/05/2013 09:14 PM, Mathieu Desnoyers wrote:
>>
>> For unconditional jmp that should be pretty safe barring any fundamental
>> changes to the instruction set, in which case we can enable it as
>> needed, but for extra robustness it probably should skip prefix bytes.
>
> On x86-32, some prefixe
* H. Peter Anvin (h...@linux.intel.com) wrote:
> On 08/05/2013 02:28 PM, Mathieu Desnoyers wrote:
> > * Linus Torvalds (torva...@linux-foundation.org) wrote:
> >> On Mon, Aug 5, 2013 at 12:54 PM, Mathieu Desnoyers
> >> wrote:
> >>>
> >>> I remember that choosing between 2 and 5 bytes nop in the as
On Mon, 2013-08-05 at 22:26 -0400, Jason Baron wrote:
> I think if the 'cold' attribute on the default disabled static_key
> branch moved the text completely out-of-line, it would satisfy your
> requirement here?
>
> If you like this approach, perhaps we can make something like this work
> wit
On 08/05/2013 04:35 PM, Richard Henderson wrote:
> On 08/05/2013 09:57 AM, Jason Baron wrote:
>> On 08/05/2013 03:40 PM, Marek Polacek wrote:
>>> On Mon, Aug 05, 2013 at 11:34:55AM -0700, Linus Torvalds wrote:
>>>> On Mon, Aug 5, 2013 at 11:24 AM, Linus Torvalds
>>>> wrote:
>>>>> Ugh. I can see the attraction of you
* Steven Rostedt (rost...@goodmis.org) wrote:
> On Mon, 2013-08-05 at 17:28 -0400, Mathieu Desnoyers wrote:
>
[...]
> > My thought is that the code above does not cover all jump encodings that
> > can be generated by past, current and future x86 assemblers.
> >
> > Another way around this issue mi
On Mon, 2013-08-05 at 17:28 -0400, Mathieu Desnoyers wrote:
> Another thing that bothers me with Steven's approach is that decoding
> jumps generated by the compiler seems fragile IMHO.
The encodings won't change. If they do, then old kernels will not run on
new hardware.
Now if it adds a third o
On 08/05/2013 02:28 PM, Mathieu Desnoyers wrote:
> * Linus Torvalds (torva...@linux-foundation.org) wrote:
>> On Mon, Aug 5, 2013 at 12:54 PM, Mathieu Desnoyers
>> wrote:
>>>
>>> I remember that choosing between 2 and 5 bytes nop in the asm goto was
>>> tricky: it had something to do with the fact
* Linus Torvalds (torva...@linux-foundation.org) wrote:
> On Mon, Aug 5, 2013 at 12:54 PM, Mathieu Desnoyers
> wrote:
> >
> > I remember that choosing between 2 and 5 bytes nop in the asm goto was
> > tricky: it had something to do with the fact that gcc doesn't know the
> > exact size of each ins
On 08/05/2013 09:57 AM, Jason Baron wrote:
> On 08/05/2013 03:40 PM, Marek Polacek wrote:
>> On Mon, Aug 05, 2013 at 11:34:55AM -0700, Linus Torvalds wrote:
>>> On Mon, Aug 5, 2013 at 11:24 AM, Linus Torvalds
>>> wrote:
>>>> Ugh. I can see the attraction of your section thing for that case, I
On 08/05/2013 02:39 PM, Steven Rostedt wrote:
> On Mon, 2013-08-05 at 11:20 -0700, Linus Torvalds wrote:
>> Of course, it would be good to optimize static_key_false() itself -
>> right now those static key jumps are always five bytes, and while they
>> get nopped out, it would still be nice if there was s
On Mon, 2013-08-05 at 12:57 -0700, Linus Torvalds wrote:
> On Mon, Aug 5, 2013 at 12:54 PM, Mathieu Desnoyers
> wrote:
> >
> > I remember that choosing between 2 and 5 bytes nop in the asm goto was
> > tricky: it had something to do with the fact that gcc doesn't know the
> > exact size of each in
On Mon, Aug 5, 2013 at 12:54 PM, Mathieu Desnoyers
wrote:
>
> I remember that choosing between 2 and 5 bytes nop in the asm goto was
> tricky: it had something to do with the fact that gcc doesn't know the
> exact size of each instruction until further down within compilation
Oh, you can't do it
On 08/05/2013 03:40 PM, Marek Polacek wrote:
> On Mon, Aug 05, 2013 at 11:34:55AM -0700, Linus Torvalds wrote:
>> On Mon, Aug 5, 2013 at 11:24 AM, Linus Torvalds
>> wrote:
>>> Ugh. I can see the attraction of your section thing for that case, I
>>> just get the feeling that we should be able to do better some
On Mon, Aug 5, 2013 at 12:40 PM, Marek Polacek wrote:
>
> FWIW, we also support hot/cold attributes for labels, thus e.g.
>
> if (bar ())
> goto A;
> /* ... */
> A: __attribute__((cold))
> /* ... */
>
> I don't know whether that might be useful for what you want or not though...
Steve?
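For reference, a minimal sketch of the label attribute Marek describes,
assuming GCC 4.8 or later. The function sum_or_fail() and its error path
are hypothetical names made up for illustration; this is not kernel code.

```c
#include <stddef.h>

/*
 * Hypothetical example: sum a buffer, jumping to a label marked cold
 * when the (assumed rare) NULL-buffer case is hit.
 */
long sum_or_fail(const int *buf, size_t len)
{
    long total = 0;
    size_t i;

    if (buf == NULL)
        goto fail;

    for (i = 0; i < len; i++)
        total += buf[i];
    return total;

    /* Tell the compiler that paths reaching this label are unlikely. */
fail: __attribute__((cold));
    return -1;
}
```

The attribute is only a hint to the optimizer. Whether the cold block is
actually moved away from the hot path depends on the optimization level
and flags in use.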
* Linus Torvalds (torva...@linux-foundation.org) wrote:
[...]
> With two-byte jumps, you'd still get the I$ fragmentation (the
> argument generation and the call and the branch back would all be in
> the same code segment as the hot code), but that would be offset by
> the fact that at least the ho
On Mon, Aug 05, 2013 at 11:34:55AM -0700, Linus Torvalds wrote:
> On Mon, Aug 5, 2013 at 11:24 AM, Linus Torvalds
> wrote:
> >
> > Ugh. I can see the attraction of your section thing for that case, I
> > just get the feeling that we should be able to do better somehow.
>
> Hmm.. Quite frankly, St
On Mon, 2013-08-05 at 11:49 -0700, Linus Torvalds wrote:
> On Mon, Aug 5, 2013 at 11:39 AM, Steven Rostedt wrote:
> >
> > I had patches that did exactly this:
> >
> > https://lkml.org/lkml/2012/3/8/461
> >
> > But it got dropped for some reason. I don't remember why. Maybe because
> > of the comp
On Mon, Aug 5, 2013 at 12:16 PM, Steven Rostedt wrote:
> On Mon, 2013-08-05 at 12:04 -0700, Andi Kleen wrote:
>> Steven Rostedt writes:
>>
>> Can't you just use -freorder-blocks-and-partition?
>
> Yeah, I'm familiar with this option.
>
This option works best with FDO. FDOed linux kernel rocks
On Mon, Aug 5, 2013 at 12:04 PM, Andi Kleen wrote:
> Steven Rostedt writes:
>
> Can't you just use -freorder-blocks-and-partition?
>
> This should already partition unlikely blocks into a
> different section. Just a single one of course.
That's horrible. Not because of dwarf problems, but exactl
On Mon, 2013-08-05 at 12:04 -0700, Andi Kleen wrote:
> Steven Rostedt writes:
>
> Can't you just use -freorder-blocks-and-partition?
Yeah, I'm familiar with this option.
>
> This should already partition unlikely blocks into a
> different section. Just a single one of course.
>
> FWIW the dis
On Mon, 2013-08-05 at 11:51 -0700, H. Peter Anvin wrote:
> On 08/05/2013 11:49 AM, Steven Rostedt wrote:
> > On Mon, 2013-08-05 at 11:29 -0700, H. Peter Anvin wrote:
> >
> >> Traps nest, that's why there is a stack. (OK, so you don't want to take
> >> the same trap inside the trap handler, but th
Steven Rostedt writes:
Can't you just use -freorder-blocks-and-partition?
This should already partition unlikely blocks into a
different section. Just a single one of course.
FWIW the disadvantage is that multiple code sections tend
to break various older dwarf unwinders, as it needs
dwarf3 la
On Mon, 2013-08-05 at 11:34 -0700, Linus Torvalds wrote:
> On Mon, Aug 5, 2013 at 11:24 AM, Linus Torvalds
> wrote:
> >
> > Ugh. I can see the attraction of your section thing for that case, I
> > just get the feeling that we should be able to do better somehow.
>
> Hmm.. Quite frankly, Steven, f
On Mon, Aug 5, 2013 at 11:51 AM, H. Peter Anvin wrote:
>>
>> Also, how would you pass the parameters? Every tracepoint has its own
>> parameters to pass to it. How would a trap know where to get "prev"
>> and "next"?
>
> How do you do that now?
>
> You have to do an IP lookup to find out what
On 08/05/2013 11:49 AM, Steven Rostedt wrote:
> On Mon, 2013-08-05 at 11:29 -0700, H. Peter Anvin wrote:
>
>> Traps nest, that's why there is a stack. (OK, so you don't want to take
>> the same trap inside the trap handler, but that code should be very
>> limited.) The trap instruction just beco
On Mon, Aug 5, 2013 at 11:39 AM, Steven Rostedt wrote:
>
> I had patches that did exactly this:
>
> https://lkml.org/lkml/2012/3/8/461
>
> But it got dropped for some reason. I don't remember why. Maybe because
> of the complexity?
Ugh. Why the crazy update_jump_label script stuff? I'd go "Eww"
On Mon, 2013-08-05 at 11:29 -0700, H. Peter Anvin wrote:
> Traps nest, that's why there is a stack. (OK, so you don't want to take
> the same trap inside the trap handler, but that code should be very
> limited.) The trap instruction just becomes very short, but rather
> slow, call-return.
>
>
On Mon, 2013-08-05 at 11:20 -0700, Linus Torvalds wrote:
> Of course, it would be good to optimize static_key_false() itself -
> right now those static key jumps are always five bytes, and while they
> get nopped out, it would still be nice if there was some way to have
> just a two-byte nop (turn
On 08/05/2013 11:34 AM, Linus Torvalds wrote:
> On Mon, Aug 5, 2013 at 11:24 AM, Linus Torvalds
> wrote:
>>
>> Ugh. I can see the attraction of your section thing for that case, I
>> just get the feeling that we should be able to do better somehow.
>
> Hmm.. Quite frankly, Steven, for your use ca
On Mon, Aug 5, 2013 at 11:24 AM, Linus Torvalds
wrote:
>
> Ugh. I can see the attraction of your section thing for that case, I
> just get the feeling that we should be able to do better somehow.
Hmm.. Quite frankly, Steven, for your use case I think you actually
want the C goto *labels* associat
On 08/05/2013 11:20 AM, Linus Torvalds wrote:
>
> Of course, it would be good to optimize static_key_false() itself -
> right now those static key jumps are always five bytes, and while they
> get nopped out, it would still be nice if there was some way to have
> just a two-byte nop (turning into
On 08/05/2013 11:23 AM, Steven Rostedt wrote:
> On Mon, 2013-08-05 at 11:17 -0700, H. Peter Anvin wrote:
>> On 08/05/2013 10:55 AM, Steven Rostedt wrote:
>>>
>>> Well, as tracepoints are being added quite a bit in Linux, my concern is
>>> with the inlined functions that they bring. With jump labels
On Mon, Aug 5, 2013 at 11:20 AM, Linus Torvalds
wrote:
>
> The static_key_false() approach with minimal inlining sounds like a
> much better approach overall.
Sorry, I misunderstood your thing. That's actually what you want that
section thing for, because right now you cannot generate the argumen
On Mon, 2013-08-05 at 11:17 -0700, H. Peter Anvin wrote:
> On 08/05/2013 10:55 AM, Steven Rostedt wrote:
> >
> > Well, as tracepoints are being added quite a bit in Linux, my concern is
> > with the inlined functions that they bring. With jump labels they are
> > disabled in a very unlikely way (t
On Mon, Aug 5, 2013 at 10:55 AM, Steven Rostedt wrote:
>
> My main concern is with tracepoints. Which on 90% (or more) of systems
> running Linux, is completely off, and basically just dead code, until
> someone wants to see what's happening and enables them.
The static_key_false() approach with
On 08/05/2013 10:55 AM, Steven Rostedt wrote:
>
> Well, as tracepoints are being added quite a bit in Linux, my concern is
> with the inlined functions that they bring. With jump labels they are
> disabled in a very unlikely way (the static_key_false() is a nop to skip
> the code, and is dynamical
On Mon, 2013-08-05 at 13:55 -0400, Steven Rostedt wrote:
> The difference between this and the
> "section" hack I suggested, is that this would use a "call"/"ret" when
> enabled instead of a "jmp"/"jmp".
I wonder if this is what Kris Kross meant in their song?
/me goes back to work...
-- Steve
On Mon, 2013-08-05 at 10:12 -0700, Linus Torvalds wrote:
> On Mon, Aug 5, 2013 at 9:55 AM, Steven Rostedt wrote:
> First off, we have very few things that are *so* unlikely that they
> never get executed. Putting things in a separate section would
> actually be really bad.
My main concern is wit
On Mon, 2013-08-05 at 10:02 -0700, H. Peter Anvin wrote:
> > if (x) __attribute__((section(".foo"))) {
> > /* do something */
> > }
> >
>
> One concern I have is how this kind of code would work when embedded
> inside a function which already has a section attribute. This could
> easily caus
On Mon, Aug 5, 2013 at 10:12 AM, Linus Torvalds
wrote:
>
> Secondly, you don't want a separate section anyway for any normal
> kernel code, since you want short jumps if possible
Just to clarify: the short jump is important regardless of how
unlikely the code you're jumping is, since even if you'
On Mon, Aug 5, 2013 at 9:55 AM, Steven Rostedt wrote:
>
> Almost a full year ago, Mathieu suggested something like:
>
> if (unlikely(x)) __attribute__((section(".unlikely"))) {
> ...
> } else __attribute__((section(".likely"))) {
> ...
> }
It's almost certainly a horrible idea.
F
On 08/05/2013 09:55 AM, Steven Rostedt wrote:
>
> Almost a full year ago, Mathieu suggested something like:
>
> if (unlikely(x)) __attribute__((section(".unlikely"))) {
> ...
> } else __attribute__((section(".likely"))) {
> ...
> }
>
> https://lkml.org/lkml/2012/8/9/658
>
> Whic