Re: [PATCH] x86: Optimize variable_test_bit()

2015-05-04 Thread H. Peter Anvin
On 05/04/2015 11:07 AM, Vladimir Makarov wrote: >>> >>> So I could implement the output reloads in LRA, probably for the >>> next GCC release. How to enable and mostly use it for multi-target >>> code like the kernel is another question. >> Pretty much all inline asm is in per arch code; so one

Re: [PATCH] x86: Optimize variable_test_bit()

2015-05-04 Thread Vladimir Makarov
On 02/05/15 08:43 AM, Peter Zijlstra wrote: On Fri, May 01, 2015 at 03:02:24PM -0400, Vladimir Makarov wrote: Currently LRA is used by x86/x86-64, ARM, AARCH64, s390, and MIPS. PPC, SH, and ARC are moving to LRA. All other targets are still reload based. So I could implement the output

Re: [PATCH] x86: Optimize variable_test_bit()

2015-05-04 Thread Richard Henderson
On 05/02/2015 05:39 AM, Peter Zijlstra wrote: > On Fri, May 01, 2015 at 01:49:52PM -0700, Linus Torvalds wrote: >> On Fri, May 1, 2015 at 12:02 PM, Vladimir Makarov >> wrote: >>> >>> GCC RA is a major reason to prohibit output operands for asm goto. >> >> Hmm.. Thinking some more about it, I th

Re: [PATCH] x86: Optimize variable_test_bit()

2015-05-04 Thread Peter Zijlstra
On Fri, May 01, 2015 at 05:16:30PM +0200, Peter Zijlstra wrote: > diff --git a/arch/x86/include/asm/bitops.h b/arch/x86/include/asm/bitops.h > index cfe3b954d5e4..bcf4fa77c04f 100644 > --- a/arch/x86/include/asm/bitops.h > +++ b/arch/x86/include/asm/bitops.h > @@ -313,6 +313,15 @@ static __always_

Re: [PATCH] x86: Optimize variable_test_bit()

2015-05-02 Thread Peter Zijlstra
On Fri, May 01, 2015 at 03:02:24PM -0400, Vladimir Makarov wrote: > Currently LRA is used by x86/x86-64, ARM, AARCH64, s390, and MIPS. > PPC, SH, and ARC are moving to LRA. All other targets are still > reload based. > > So I could implement the output reloads in LRA, probably for the > next

Re: [PATCH] x86: Optimize variable_test_bit()

2015-05-02 Thread Peter Zijlstra
On Fri, May 01, 2015 at 01:49:52PM -0700, Linus Torvalds wrote: > On Fri, May 1, 2015 at 12:02 PM, Vladimir Makarov wrote: > > > > GCC RA is a major reason to prohibit output operands for asm goto. > > Hmm.. Thinking some more about it, I think that what would actually > work really well at lea

Re: [PATCH] x86: Optimize variable_test_bit()

2015-05-01 Thread Vladimir Makarov
On 01/05/15 04:49 PM, Linus Torvalds wrote: On Fri, May 1, 2015 at 12:02 PM, Vladimir Makarov wrote: GCC RA is a major reason to prohibit output operands for asm goto. Hmm.. Thinking some more about it, I think that what would actually work really well at least for the kernel is: (a) all

Re: [PATCH] x86: Optimize variable_test_bit()

2015-05-01 Thread Linus Torvalds
On Fri, May 1, 2015 at 12:02 PM, Vladimir Makarov wrote: > > GCC RA is a major reason to prohibit output operands for asm goto. Hmm.. Thinking some more about it, I think that what would actually work really well at least for the kernel is: (a) allow *memory* operands (ie "=m") as outputs and

Re: [PATCH] x86: Optimize variable_test_bit()

2015-05-01 Thread Vladimir Makarov
On 01/05/15 12:33 PM, Jakub Jelinek wrote: On Fri, May 01, 2015 at 09:03:32AM -0700, Linus Torvalds wrote: PPS. Jakub, I see gcc5.1 still hasn't got output operands for asm goto; is this something we can get 'fixed' ? CCing Richard as author of asm goto and Vlad as register allocator ma

Re: [PATCH] x86: Optimize variable_test_bit()

2015-05-01 Thread Ingo Molnar
* Peter Zijlstra wrote: > On Fri, May 01, 2015 at 06:33:29PM +0200, Jakub Jelinek wrote: > > On Fri, May 01, 2015 at 09:03:32AM -0700, Linus Torvalds wrote: > > > > PPS. Jakub, I see gcc5.1 still hasn't got output operands for asm goto; > > > > is this something we can get 'fixed' ? > > >

Re: [PATCH] x86: Optimize variable_test_bit()

2015-05-01 Thread Peter Zijlstra
On Fri, May 01, 2015 at 06:33:29PM +0200, Jakub Jelinek wrote: > On Fri, May 01, 2015 at 09:03:32AM -0700, Linus Torvalds wrote: > > > PPS. Jakub, I see gcc5.1 still hasn't got output operands for asm goto; > > > is this something we can get 'fixed' ? > > CCing Richard as author of asm goto a

Re: [PATCH] x86: Optimize variable_test_bit()

2015-05-01 Thread Linus Torvalds
On Fri, May 1, 2015 at 9:33 AM, Jakub Jelinek wrote: > > CCing Richard as author of asm goto and Vlad as register allocator > maintainer. There are a few enhancement requests to support this, like > http://gcc.gnu.org/PR59615 and http://gcc.gnu.org/PR52381 , but indeed the > reason why no outputs

Re: [PATCH] x86: Optimize variable_test_bit()

2015-05-01 Thread Jakub Jelinek
On Fri, May 01, 2015 at 09:03:32AM -0700, Linus Torvalds wrote: > > PPS. Jakub, I see gcc5.1 still hasn't got output operands for asm goto; > > is this something we can get 'fixed' ? CCing Richard as author of asm goto and Vlad as register allocator maintainer. There are a few enhancement re

Re: [PATCH] x86: Optimize variable_test_bit()

2015-05-01 Thread Peter Zijlstra
On Fri, May 01, 2015 at 06:16:54PM +0200, Peter Zijlstra wrote: > On Fri, May 01, 2015 at 09:03:32AM -0700, Linus Torvalds wrote: > > On Fri, May 1, 2015 at 8:16 AM, Peter Zijlstra wrote: > > > > > > Since test_bit() doesn't actually have any output variables, we can use > > > asm goto without hav

Re: [PATCH] x86: Optimize variable_test_bit()

2015-05-01 Thread Peter Zijlstra
On Fri, May 01, 2015 at 09:03:32AM -0700, Linus Torvalds wrote: > On Fri, May 1, 2015 at 8:16 AM, Peter Zijlstra wrote: > > PPS. Jakub, I see gcc5.1 still hasn't got output operands for asm goto; > > is this something we can get 'fixed' ? > > I suspect the problem is that now the particular

Re: [PATCH] x86: Optimize variable_test_bit()

2015-05-01 Thread Peter Zijlstra
On Fri, May 01, 2015 at 09:03:32AM -0700, Linus Torvalds wrote: > On Fri, May 1, 2015 at 8:16 AM, Peter Zijlstra wrote: > > > > Since test_bit() doesn't actually have any output variables, we can use > > asm goto without having to add a memory clobber. This reduces the code > > to something sensib

Re: [PATCH] x86: Optimize variable_test_bit()

2015-05-01 Thread Linus Torvalds
On Fri, May 1, 2015 at 8:16 AM, Peter Zijlstra wrote: > > Since test_bit() doesn't actually have any output variables, we can use > asm goto without having to add a memory clobber. This reduces the code > to something sensible: Yes, looks good, except if we have anything that actually wants to us
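
For readers following the thread, the idea in the patch description (a bit test needs no output operand, so it can be written with asm goto and without a memory clobber) can be sketched roughly as below. This is not the code from the patch, whose diff is truncated above. The names sketch_test_bit, sketch_self_check and BITS_PER_WORD are hypothetical, the word index is computed in C rather than left to the instruction, and a plain C fallback is assumed for non-x86-64 builds.

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical helper macro, not taken from the kernel sources. */
#define BITS_PER_WORD (sizeof(unsigned long) * CHAR_BIT)

/*
 * Illustrative sketch only: return 1 if bit nr is set in the bitmap at
 * addr, else 0. The asm has inputs only; the result is carried by the
 * branch target instead of an output operand.
 */
static inline int sketch_test_bit(unsigned long nr, const unsigned long *addr)
{
	/* Pick the word in C so the "m" operand names exactly what is read. */
	const unsigned long *word = addr + nr / BITS_PER_WORD;
	unsigned long bit = nr % BITS_PER_WORD;

#if defined(__x86_64__) && defined(__GNUC__)
	/* bt copies the selected bit into CF; jc jumps when it is set. */
	__asm__ goto("bt %1, %0\n\t"
		     "jc %l[bit_set]"
		     : /* no outputs */
		     : "m" (*word), "r" (bit)
		     : "cc"
		     : bit_set);
	return 0;
bit_set:
	return 1;
#else
	/* Assumed portable fallback for other targets. */
	return (int)((*word >> bit) & 1UL);
#endif
}

/* Small self-check on a two-word bitmap. */
static void sketch_self_check(void)
{
	unsigned long map[2] = { 0x5UL, 0x1UL };

	assert(sketch_test_bit(0, map) == 1);
	assert(sketch_test_bit(1, map) == 0);
	assert(sketch_test_bit(2, map) == 1);
	assert(sketch_test_bit(BITS_PER_WORD, map) == 1);
	assert(sketch_test_bit(BITS_PER_WORD + 1, map) == 0);
}
```

Because the only operands are inputs, this form stays within the restriction discussed in the thread, that asm goto accepted no output operands in the GCC releases of the time.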