Re: [regression] Re: brk randomization breaks columns

2008-02-05 Thread Jakub Jelinek
On Tue, Feb 05, 2008 at 01:54:26PM +0100, Ingo Molnar wrote:
> * Jiri Kosina <[EMAIL PROTECTED]> wrote:
> 
> > On Tue, 5 Feb 2008, Pavel Machek wrote:
> > 
> > > > Actually, this clearly shows that either prehistoric libc.so.5 or the 
> > > > program itself are broken.
> > > I believe it shows clear regression in latest 2.6.25 kernel.
> > 
> > I am still not completely sure. It might be a regression, but it also 
> > might just trigger the bug in ancient version in libc.so.5 which might 
> > be fixed in some later version [...]
> 
> which too is a regression ...
> 
> really, lets add a sysctl for this, and a .config option that either 
> disables or enables it. Then we will default to disabled. (but users can 
> enable it - and distros can build their kernels with this .config option 
> enabled)

I don't think kernel should care about programs which are buggy and make invalid
assumptions, and that's the case here.  I remember we have been through this
5 years ago when brk randomization has been added to Red Hat kernels.  There was
one or two broken programs which made assumptions on what brk(0) is supposed
to return at program startup, everything else was ok.
For the buggy apps there is always setarch i386 -R ./the_buggy_program
so I don't think we need to add another sysctl for this.
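
A minimal sketch of heap growth that does not depend on where the break
sits at program startup, assuming a Linux/glibc userland where sbrk() is
available. grow_heap is a hypothetical name used only for illustration;
it is not taken from the program discussed in this thread.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <unistd.h>

/* Hypothetical helper, not from the thread: extend the data segment by
 * n bytes and return the start of the new region, or NULL on failure.
 * It never assumes a particular initial break address; it only uses
 * what sbrk() reports at the time of the call. */
void *grow_heap(size_t n)
{
    assert(n > 0 && n <= (size_t)INTPTR_MAX);

    /* sbrk(increment) returns the previous break, wherever the kernel
     * placed it (randomized or not). */
    void *old_break = sbrk((intptr_t)n);
    if (old_break == (void *)-1)
        return NULL;
    return old_break;
}
```

A caller would use the returned pointer as the base of the new region
instead of computing addresses from the end of .bss.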

Jakub
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


asm-x86/sigcontext.h changes break userland

2008-02-12 Thread Jakub Jelinek
Hi!

The

x86: use generic register names in struct sigcontext
http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=742fa54a62be6a263df14a553bf832724471dfbe

changeset breaks userland, e.g. it is not possible to compile gcc anymore
(both 32-bit and 64-bit libgcc), and I expect any other program which pokes
into struct sigcontext.  The register names with e resp. r have been in use
for years, what's the point breaking it now?

Jakub


Re: asm-x86/sigcontext.h changes break userland

2008-02-13 Thread Jakub Jelinek
On Wed, Feb 13, 2008 at 08:26:50AM +0100, Ingo Molnar wrote:
> 
> * Jakub Jelinek <[EMAIL PROTECTED]> wrote:
> 
> > x86: use generic register names in struct sigcontext 
> > http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=742fa54a62be6a263df14a553bf832724471dfbe
> > 
> > changeset breaks userland, e.g. it is not possible to compile gcc 
> > anymore (both 32-bit and 64-bit libgcc), and I expect any other 
> > program which pokes into struct sigcontext.  The register names with e 
> > resp. r have been in use for years, what's the point breaking it now?
> 
> ok - does the patch below solve the problem for you?

Yes, this fixes it.  Thanks.

FYI, gcc uses glibc headers to get at struct sigcontext, but
on i386 (and many other arches) glibc's  just includes
.  On x86_64, ia64 and sparc* glibc doesn't include
asm/sigcontext.h, but provides its own definitions, so for gcc itself
only changing 32-bit parts would be enough.  That said, there are certainly
programs which include asm/sigcontext.h directly (plus there are other C
libraries, some of which may use asm/sigcontext.h).

Jakub


Re: [x86.git#mm] stack protector fixes, vmsplice exploit

2008-02-14 Thread Jakub Jelinek
On Thu, Feb 14, 2008 at 09:25:35PM +0100, Ingo Molnar wrote:
> The per function call overhead from stackprotector is already pretty 
> serious IMO, but at least that's something that GCC _could_ be doing 
> (much) smarter (why doesnt it jne forward out to __check_stk_failure, 
> instead of generating 4 instructions, one of them a default-mispredicted 
> branch instruction??), so that overhead could in theory be something 
> like 4 fall-through instructions per function, instead of the current 6.

Where do you see a mispredicted branch?
int foo (void)
{
  char buf[64];
  bar (buf);
  return 6;
}

-O2 -fstack-protector -m64:
        subq    $88, %rsp
        movq    %fs:40, %rax
        movq    %rax, 72(%rsp)
        xorl    %eax, %eax
        movq    %rsp, %rdi
        call    bar
        movq    72(%rsp), %rdx
        xorq    %fs:40, %rdx
        movl    $6, %eax
        jne     .L5
        addq    $88, %rsp
        ret
.L5:
        .p2align 4,,6
        .p2align 3
        call    __stack_chk_fail
-O2 -fstack-protector -m32:
        pushl   %ebp
        movl    %esp, %ebp
        subl    $88, %esp
        movl    %gs:20, %eax
        movl    %eax, -4(%ebp)
        xorl    %eax, %eax
        leal    -68(%ebp), %eax
        movl    %eax, (%esp)
        call    bar
        movl    $6, %eax
        movl    -4(%ebp), %edx
        xorl    %gs:20, %edx
        jne     .L5
        leave
        ret
.L5:
        .p2align 4,,7
        .p2align 3
        call    __stack_chk_fail
-O2 -fstack-protector -m64 -mcmodel=kernel:
        subq    $88, %rsp
        movq    %gs:40, %rax
        movq    %rax, 72(%rsp)
        xorl    %eax, %eax
        movq    %rsp, %rdi
        call    bar
        movq    72(%rsp), %rdx
        xorq    %gs:40, %rdx
        movl    $6, %eax
        jne     .L5
        addq    $88, %rsp
        ret
.L5:
        .p2align 4,,6
        .p2align 3
        call    __stack_chk_fail

both with gcc 4.1.x and 4.3.0.
BTW, you can use -fstack-protector --param=ssp-buffer-size=4
etc. to tweak the size of buffers to trigger stack protection, the
default is 8, but e.g. whole Fedora is compiled with 4.

Jakub


Re: [x86] BUG: unable to handle kernel paging request at 00740060

2013-10-08 Thread Jakub Jelinek
On Tue, Oct 08, 2013 at 08:51:54PM +0200, Oleg Nesterov wrote:
> On 10/08, Linus Torvalds wrote:
> >
> > (not yet merged), see:
> >
> > 
> > http://git.kernel.org/cgit/linux/kernel/git/tip/tip.git/commit/?id=0c44c2d0f459cd7e275242b72f500137c4fa834d
> 
> I do not really understand inline assembly constraints, but I'll ask
> anyway.
> 
>   +#define __GEN_RMWcc(fullop, var, cc, ...) \
>   +do { \
>   + asm volatile goto (fullop "; j" cc " %l[cc_label]" \
>   + : : "m" (var), ## __VA_ARGS__ \
> ^
> 
> don't we need
> 
>   "+m" (var)
> 
> here?

You actually can't have output operands with asm goto, only inputs
and clobbers.  But the "memory" clobber should be enough here.
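
A minimal sketch of that pattern, assuming an x86 target and a
GCC-compatible compiler that supports asm goto. test_and_set_bit32 is a
hypothetical name for illustration, not the kernel's macro. The modified
word is passed as an "m" input and the "memory" clobber tells the compiler
that memory changed behind its back.

```c
#include <assert.h>

/* Hypothetical helper: set bit nr (0..31) in *word and return the
 * previous value of that bit (0 or 1). */
int test_and_set_bit32(unsigned int *word, unsigned int nr)
{
    assert(word != 0 && nr < 32);
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
    /* No output operands here: *word is listed as an input even though
     * btsl modifies it, and the "memory" clobber covers the write. */
    __asm__ goto ("btsl %1, %0; jc %l[was_set]"
                  : /* no outputs */
                  : "m" (*word), "Ir" (nr)
                  : "memory", "cc"
                  : was_set);
    return 0;
was_set:
    return 1;
#else
    /* Plain C fallback for other targets (not atomic, same result). */
    unsigned int mask = 1u << nr;
    int old = (*word & mask) != 0;
    *word |= mask;
    return old;
#endif
}
```

The two return paths deliberately differ, so the label is not in the
fallthrough block of the asm goto.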

If you suspect a compiler bug, can somebody please narrow it down to
a single object file (if I've skimmed the patch right, it is just an
optimization, where object files compiled without and with the patch
should actually coexist fine in the same kernel), ideally to a single
routine if possible and post a preprocessed source + gcc command line
+ version of gcc?

Jakub


Re: [x86] BUG: unable to handle kernel paging request at 00740060

2013-10-09 Thread Jakub Jelinek
On Wed, Oct 09, 2013 at 04:46:56PM +0200, Peter Zijlstra wrote:
> On Wed, Oct 09, 2013 at 04:33:59PM +0200, Peter Zijlstra wrote:
> > On Wed, Oct 09, 2013 at 04:07:34PM +0200, Peter Zijlstra wrote:
> > > Once I force a x86_64 build using the 'same' config it goes away and
> > > generates 'sensible' code again (although I don't see why L9 isn't
> > > merged with L2):
> > 
> > i386-SMP also generates correct code afaict; a tad stupid but not wrong.
> > 
> > If I remove ftrace from the .config its still broken..
> > If I also remove the likely/unlikely tracer its still broken and lots
> > smaller:
> 
> OK, its -march=winchip2 that's buggered.

Confirmed as gcc bug, filed http://gcc.gnu.org/PR58670
Seems all of 4.[6-9] miscompile it.  Will have a look tomorrow
unless somebody beats me to it.  Historically, the case where
asm goto labels jump to the fallthru basic block has had numerous
problems.

Jakub


Re: [x86] BUG: unable to handle kernel paging request at 00740060

2013-10-09 Thread Jakub Jelinek
On Wed, Oct 09, 2013 at 09:02:31PM +0200, Peter Zijlstra wrote:
> On Wed, Oct 09, 2013 at 08:16:13PM +0200, Jakub Jelinek wrote:
> > Confirmed as gcc bug, filed http://gcc.gnu.org/PR58670
> > Seems all of 4.[6-9] miscompile it.  Will have a look tomorrow
> > unless somebody beats me to it.  But historically, the case where
> > asm goto labels jump to fallthru basic block had numerous problems in the
> > past.
> 
> That bug lists the component as middle end; this suggests x86_64 would
> be vulnerable too, can you confirm? So far we've only observed the wrong
> code on i386 targets, x86_64 targets appeared correct.

Any target, the testcase in the bugzilla aborts on x86_64 with -O2, and
even say on ppc64 (sure, one would have to rewrite the asm to have it fail
at runtime).

Jakub


Re: [x86] BUG: unable to handle kernel paging request at 00740060

2013-10-09 Thread Jakub Jelinek
On Thu, Oct 10, 2013 at 08:22:38AM +0200, Ingo Molnar wrote:
> > On Wed, Oct 09, 2013 at 09:02:31PM +0200, Peter Zijlstra wrote:
> > > On Wed, Oct 09, 2013 at 08:16:13PM +0200, Jakub Jelinek wrote:
> > >
> > > > Confirmed as gcc bug, filed http://gcc.gnu.org/PR58670 Seems all of 
> > > > 4.[6-9] miscompile it.  Will have a look tomorrow unless somebody 
> > > > beats me to it.  But historically, the case where asm goto labels 
> > > > jump to fallthru basic block had numerous problems in the past.
> > > 
> > > That bug lists the component as middle end; this suggests x86_64 would 
> > > be vulnerable too, can you confirm? So far we've only observed the 
> > > wrong code on i386 targets, x86_64 targets appeared correct.
> > 
> > Any target, the testcase in the bugzilla aborts on x86_64 with -O2, and 
> > even say on ppc64 (sure, one would have to rewrite the asm to have it 
> > fail at runtime).
> 
> Please let us know once you know enough about the bug to suggest 
> workarounds. Because it's a nice optimization even extra instruction(s) 
> would be acceptable I suspect: we could perhaps put a NOP into a slowpath, 
> with an (unused) goto to it, or something like that?

IMHO you don't need to put there a nop, I guess asm (""); would be enough,
that will still make sure the label is never in the fallthru basic block
and the whole class of issues with asm goto with labels in the fallthru
bb can't hit.  The disadvantage is that it will generate worse code.

@@ -8,6 +8,7 @@ foo (int a, int b)
   asm volatile goto ("bts $1, %0; jc %l[lab]" : : "m" (b) : "memory" : lab);
   return 0;
 lab:
+  asm ("");
   return 0;
 }

on the testcase from the PR results in something like:
#APP
# 8 "pr58670-1.c" 1
        bts $1, -4(%rsp); jc .L3
# 0 "" 2
#NO_APP
.L5:
        xorl    %eax, %eax
        ret
        .p2align 4,,10
        .p2align 3
.L3:
        xorl    %eax, %eax
        ret
        .p2align 4,,10
        .p2align 3
.L4:
        movl    $-3, %eax
        ret
while code without the extra asm (""); and with a fixed compiler:
#APP
# 6 "pr58670.c" 1
        bts $1, -4(%rsp); jc .L3
# 0 "" 2
#NO_APP
.L3:
        xorl    %eax, %eax
        ret
        .p2align 4,,10
        .p2align 3
.L4:
.L2:
        movl    $-3, %eax
        ret

FYI, the list of past compiler issues with asm goto includes:
PR54127, PR46226, PR44071, PR52650, PR54455, PR51767.

I hope we get this fixed for 4.8.2, so you could then avoid
these hacks for GCC 4.8.2 and later.

Jakub


Re: [x86] BUG: unable to handle kernel paging request at 00740060

2013-10-10 Thread Jakub Jelinek
On Thu, Oct 10, 2013 at 08:51:04AM +0200, Jakub Jelinek wrote:
> @@ -8,6 +8,7 @@ foo (int a, int b)
>    asm volatile goto ("bts $1, %0; jc %l[lab]" : : "m" (b) : "memory" : lab);
>    return 0;
>  lab:
> +  asm ("");
>    return 0;
>  }

Or alternatively put the asm (""); right after asm goto,
  asm volatile goto ("bts $1, %0; jc %l[lab]" : : "m" (b) : "memory" : lab);
  asm ("");
  return ...;
lab:
  return ...;
What generates better code remains to be tested.  In any case, please
conditionalize the hacks on non-fixed compilers once the fix is released.

Jakub


Re: [PATCH] gcc4: Add 'asm goto' miscompilation quirk

2013-10-10 Thread Jakub Jelinek
On Thu, Oct 10, 2013 at 10:24:30AM +0200, Ingo Molnar wrote:
> Something like the patch below? (Totally untested and all that.)
> 
> Notes:
> 
> - If the bug is fixed in 4.8.3 then the version check can be sharpened
>   from 9 to 40803.

The bug is likely going to be fixed already for 4.8.2 (to be released
next week or so).

> - I'd really prefer this quirk versus having to add the extra barrier to 
>   the label, as it makes the actual usage sites a lot less painful.

Please check how much it bloats the generated code.
Also, for the bitops patch, you probably want an asm_volatile_goto variant.

> --- a/include/linux/compiler-gcc4.h
> +++ b/include/linux/compiler-gcc4.h
> @@ -65,6 +65,19 @@
>  #define __visible __attribute__((externally_visible))
>  #endif
>  
> +/*
> + * GCC 'asm goto' miscompiles certain code sequences:
> + *
> + *   http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58670
> + *
> + * Work it around via quirk suggested by Jakub Jelinek.
> + * Not yet fixed, so use the quirk on all compiler versions:
> + */
> +#if GCC_VERSION <= 9
> +# define asm_goto(x...) do { asm goto(x); asm (""); } while (0)
> +#else
> +# define asm_goto(x...) do { asm goto(x); } while (0)
> +#endif
>  
>  #ifdef CONFIG_ARCH_USE_BUILTIN_BSWAP
>  #if GCC_VERSION >= 40400

Jakub


Re: [PATCH, -v2] compiler/gcc4: Add quirk for 'asm goto' miscompilation bug

2013-10-10 Thread Jakub Jelinek
On Thu, Oct 10, 2013 at 01:56:17PM +0200, Peter Zijlstra wrote:
> On Thu, Oct 10, 2013 at 10:55:06AM +0200, Ingo Molnar wrote:
> > +/*
> > + * GCC 'asm goto' miscompiles certain code sequences:
> > + *
> > + *   http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58670
> > + *
> > + * Work it around via quirk suggested by Jakub Jelinek.
> > + * Fixed in GCC 4.8.2 and later versions.
> > + */
> > +#if GCC_VERSION <= 40801
> 
> We didn't do version checks for CC_HAVE_ASM_GOTO because of vendor
> backports; can't we detect this in the same way?

The problem is that it will be harder to check for this as compile time only
check, and for runtime check you'd need to have the assembly string for
every architecture and you couldn't do it for cross-compiling anyway.
For compile time only check, it wouldn't be 100% reliable, you could e.g.
check for that using -S -O2 -xc - -o - on:
int
foo (int a, int b)
{
  if (a)
    return -3;
  asm volatile goto ("asm volatile goto to %l[lab]" : : "m" (b) : "memory" : lab);
  return 0;
lab:
  return 0;
}
and use awk on the resulting assembly to find the
asm volatile goto to (.*)$
string, then skip lines starting in column 0 with an
assembly comment character(s) (#, %, //, not sure if those 3 are all you can
see) and check that the first non-skipped line starts with the string matching
(.*) earlier followed by : (or perhaps skip other labels too?).
That said, the check could fail even in fixed gccs, so perhaps you want to
combine that with both version check and test, if version is >= 4.8.3
(note, while I hope it will be fixed in 4.8.2 release, people using
prerelease compilers would still have __GNUC_PATCHLEVEL__ == 2, at least
in upstream gcc (e.g. in Fedora/RHEL we patch down the patchlevel version,
so that __GNUC_PATCHLEVEL__ is 2 only for GCC release x.y.2 and following
snapshots, while upstream bumps patchlevel immediately after a release is
made), even with gcc containing that bug).  So for >= 4.8.3 just assume no
workaround is needed, otherwise scan assembly.
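
The scan described above could be sketched roughly as follows, written in
C rather than awk so it can be tried standalone. This is only an
illustration under stated assumptions: label_follows_asm_goto is a
hypothetical name, the marker text is the one used in the test function
above, and the comment prefixes are just the three guessed at in this
mail. As I read the description, a result of 0 would mean the workaround
should stay enabled.

```c
#include <assert.h>
#include <string.h>

#define MARKER "asm volatile goto to "

/* Hypothetical helper: scan compiler-generated assembly text.  Return 1
 * if the first non-comment line after the marker line starts with the
 * label named on the marker line followed by ':', 0 otherwise (also 0
 * when the marker is not found). */
int label_follows_asm_goto(const char *text)
{
    assert(text != NULL);

    const char *m = strstr(text, MARKER);
    if (m == NULL)
        return 0;

    /* The label is the rest of the marker line. */
    const char *label = m + strlen(MARKER);
    size_t label_len = strcspn(label, "\r\n");
    if (label_len == 0)
        return 0;

    const char *line = label + label_len;
    for (;;) {
        line += strspn(line, "\r\n");      /* move to the next non-empty line */
        if (*line == '\0')
            return 0;
        size_t len = strcspn(line, "\r\n");

        /* Skip lines that start in column 0 with a comment prefix. */
        int is_comment = line[0] == '#' || line[0] == '%' ||
                         (len >= 2 && line[0] == '/' && line[1] == '/');
        if (!is_comment)
            return len > label_len &&
                   strncmp(line, label, label_len) == 0 &&
                   line[label_len] == ':';
        line += len;
    }
}
```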

> 
> > +# define __asm_goto(vol, x...) do { asm vol goto(x); asm (""); } while (0)
> > +#else
> > +# define __asm_goto(vol, x...) do { asm vol goto(x); } while (0)
> > +#endif
> 
> This places the asm("") in the fallthrough case; but Jakub wrote:
> 
> > @@ -8,6 +8,7 @@ foo (int a, int b)
> >    asm volatile goto ("bts $1, %0; jc %l[lab]" : : "m" (b) : "memory" : lab);
> >    return 0;
> >  lab:
> > +  asm ("");
> >    return 0;
> >  }
> 
> Which places the asm ("") after the label, these two are not the same.

See the follow-up mails, I think placing it immediately after asm goto might
be better.

Jakub


Re: [PATCH] gcc4: Add 'asm goto' miscompilation quirk

2013-10-10 Thread Jakub Jelinek
On Thu, Oct 10, 2013 at 07:04:18AM -0700, Richard Henderson wrote:
> On 10/10/2013 01:31 AM, Jakub Jelinek wrote:
> > Also, for the bitops patch, you probably want an asm_volatile_goto variant.
> 
> Why?  Asm without output (which asm goto must be) are automatically volatile.

You're right.

Jakub


Re: Friendlier EPERM - Request for input

2013-01-09 Thread Jakub Jelinek
On Wed, Jan 09, 2013 at 12:53:40PM -0800, Casey Schaufler wrote:
> I'm suggesting that the string returned by get_extended_error_info()
> ought to be the audit record the system call would generate, regardless
> of whether the audit system would emit it or not.

What system call would that info be for and would it be reset on next
syscall that succeeded, or also failed?

The thing is, various functions e.g. perform some syscall, save errno, do
some other syscall, and if they decide that the first syscall should be what
determines the whole function's errno, just restore errno from the saved
value and return.  Similarly, various functions just set errno upon
detecting some error condition in userspace.
There is no 1:1 mapping between many libc library calls and syscalls.
So, when would it be safe to call this new get_extended_error_info function
and how to determine to which syscall it was relevant?
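
A minimal sketch of the save/restore pattern described above, assuming a
POSIX userland. open_existing is a hypothetical function, not a real libc
routine, and the deliberately failing close(-1) merely stands in for
"some other syscall" made between the failure and the return.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical library-style function: open path read-only.  On failure
 * it performs a second syscall that also fails, then restores errno so
 * the caller sees the error of the first syscall only. */
int open_existing(const char *path)
{
    assert(path != NULL);

    int fd = open(path, O_RDONLY);
    if (fd < 0) {
        int saved_errno = errno;   /* the error the caller should see */
        (void)close(-1);           /* fails with EBADF, overwriting errno */
        errno = saved_errno;       /* report the first failure instead */
        return -1;
    }
    return fd;
}
```

From the caller's point of view the last failing syscall was close(), yet
errno describes open(); that is the mismatch the question is about.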

Jakub


Re: [CHECKER] amusing copy_from_user bug

2001-04-10 Thread Jakub Jelinek

On Tue, Apr 10, 2001 at 03:11:05AM -0700, Dawson Engler wrote:
> As a side question: is it still true that verify_area's must be done before
> any use of __put_user/__get_user/__copy_from_user/etc?

I believe so, at least in generic code.
In architecture specific code (non-i386) it is usually sufficient just to do
one put_user/get_user/copy_from_user and then do the rest of
__put_user/__get_user etc. from nearby area (<4K is safe e.g. on sparc) and
some architectures don't care at all, because verify_area is a noop
(sparc64).

Jakub



Re: shm_open doesn't work (fix maybe).

2001-04-24 Thread Jakub Jelinek

On Tue, Apr 24, 2001 at 11:46:20AM -0500, Tom Brusehaver (N-Sysdyne Corporation) wrote:
> 
> I have been chasing all around trying to find out why
> shm_open always returns ENOSYS. It is implemented
> in glibc-2.2.2, and seems the 2.4.3 kernel knows about
> shmfs.
> 
> It seems the file linux/mm/shmem.c has:
> #define SHMEM_MAGIC 0x01021994
> 
> And the glibc-2.2.2/sysdeps/unix/sysv/linux/linux_fsinfo.h has:
> #define SHMFS_SUPER_MAGIC 0x02011994
> 
> Well, which is correct?

Update your glibc, 2.2.3pre* matches 2.4.x kernel:

2001-03-03  Ulrich Drepper  <[EMAIL PROTECTED]>

* sysdeps/unix/sysv/linux/linux_fsinfo.h (SHMFS_SUPER_MAGIC):
Update for real 2.4 kernels.

Jakub



Re: sendfile64?

2001-02-20 Thread Jakub Jelinek

On Tue, Feb 20, 2001 at 02:51:24PM +1300, Chris Wedgwood wrote:
> Why isn't there a sendfile64?
> 
> because nobody has implemented one -- arguably it's not needed; the
> difference between:
> 
>   sendfile64(...)
> 
> and
> 
>   while(blah){
>   sendfile( ... 1G or so ...)
>   }
> 
> probably won't be detectable anyhow. I see no reason why sendfile64
> should be purely user-space (then again, I see no reason why not to
> extend the kernel API as is, but last time I tested it is was busted
> WRT signals so I would rather that be fixed before further
> proliferation there).

Wrong. sendfile takes a pointer to off_t, not loff_t, so you cannot replace
sendfile64 with multiple sendfile's if offset is non-NULL from userland.
It simply won't work properly on big files (no matter what size you transfer
at a time).
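
To make the arithmetic concrete, here is a small sketch assuming a 32-bit
signed off_t, as on i386 without large file support.
offset_fits_32bit_off_t is a hypothetical helper for illustration only.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: can a file offset be represented in a 32-bit
 * signed off_t?  The chunk size passed to sendfile does not matter; only
 * the starting offset stored through the pointer does. */
int offset_fits_32bit_off_t(int64_t offset)
{
    assert(offset >= 0);
    return offset <= INT32_MAX;
}
```

An offset of 3 GiB fails this check, so a loop of 1G sendfile calls cannot
even name the position where its fourth chunk would start.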

Jakub



Re: Posible bug in gcc

2001-02-26 Thread Jakub Jelinek

On Mon, Feb 26, 2001 at 05:15:28PM +, Alan Cox wrote:
> > I think I have found a bug in gcc. I have tried both egcs 1.1.2 (gcc
> > 2.91.66) and gcc 2.95.2 versions.
> > 
> > I am attaching you a simplified test program ('bug.c', a really simple
> > program).
> 
> Well gcc-bugs would be the better place to send it but this is a known problem
> fixed in CVS gcc 2.95.3, CVS gcc 3.0 branch and gcc 2.96 (unofficial, Red Hat)

I'm not sure if it is known, at least not known to me, but definitely not
fixed in any of gcc 2.95.2, CVS gcc 3.0 branch, CVS gcc 3.1 head, gcc 2.96-RH.

Jakub



Re: Is sendfile all that sexy?

2001-01-16 Thread Jakub Jelinek

On Tue, Jan 16, 2001 at 10:05:06AM -0500, David L. Parsley wrote:
> Felix von Leitner wrote:
> > >   close (0);
> > >   close (1);
> > >   close (2);
> > >   open ("/dev/console", O_RDWR);
> > >   dup ();
> > >   dup ();
> > 
> > So it's not actually part of POSIX, it's just to get around fixing
> > legacy code? ;-)
> 
> This makes me wonder...
> 
> If the kernel only kept a queue of the three smallest unused fd's, and
> when the queue emptied handed out whatever it liked, how many things
> would break?  I suspect this would cover a lot of bases...

First it would break Unix98 and other standards:

The Single UNIX (R) Specification, Version 2
Copyright (c) 1997 The Open Group
...
 int open(const char *path, int oflag, ... );
...
The open() function will return a file descriptor for the named file that is the 
lowest file descriptor not currently
open for that process. The open file description is new, and therefore the file 
descriptor does not share it with any
other process in the system. The FD_CLOEXEC file descriptor flag associated with the 
new file descriptor will be
cleared.
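
A small sketch of what that guarantee means in practice, assuming a
single-threaded POSIX process with a readable /dev/null.
lowest_fd_is_reused is a hypothetical name for illustration.

```c
#include <assert.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical demo: return 1 if open() hands back the lowest free
 * descriptor after it has been closed, 0 if not, -1 if setup failed.
 * Assumes no other thread opens or closes descriptors meanwhile. */
int lowest_fd_is_reused(void)
{
    int a = open("/dev/null", O_RDONLY);
    int b = open("/dev/null", O_RDONLY);
    if (a < 0 || b < 0) {
        if (a >= 0)
            close(a);
        if (b >= 0)
            close(b);
        return -1;
    }
    assert(a < b);      /* b was allocated while a was still in use */

    close(a);           /* a is now the lowest unused descriptor */
    int c = open("/dev/null", O_RDONLY);
    int reused = (c == a);

    if (c >= 0)
        close(c);
    close(b);
    return reused;
}
```

Code such as the close(0)/open("/dev/console") sequence quoted above
relies on exactly this behaviour.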

Jakub



Re: Modprobe local root exploit

2000-11-14 Thread Jakub Jelinek

On Tue, Nov 14, 2000 at 10:42:41AM +, Malcolm Beattie wrote:
> Keith Owens writes:
> > All these patches against request_module are attacking the problem at
> > the wrong point.  The kernel can request any module name it likes,
> > using any string it likes, as long as the kernel generates the name.
> > The real problem is when the kernel blindly accepts some user input and
> > passes it straight to modprobe, then the kernel is acting like a setuid
> > wrapper for a program that was never designed to run setuid.
> 
> Rather than add sanity checking to modprobe, it would be a lot easier
> and safer from a security audit point of view to have the kernel call
> /sbin/kmodprobe instead of /sbin/modprobe. Then kmodprobe can sanitise
> all the data and exec the real modprobe. That way the only thing that
> needs auditing is a string munging/sanitising program.

Well, no matter what kernel needs auditing as well, the fact that dev_load
will without any check load any module the user wants is already problematic
and no munging helps with it at all, especially loading old ISA drivers
might not be a good idea.

Jakub



POSIX message queue passing (was Re: State of Posix compliance in v2.2/v2.4 kernel?)

2000-11-19 Thread Jakub Jelinek

On Sun, Nov 19, 2000 at 07:24:16PM +0900, GOTO Masanori wrote:
> At Mon, 13 Nov 2000 11:13:19 -0500,
> Jakub Jelinek <[EMAIL PROTECTED]> wrote:
> > ago were done in the kernel, POSIX message queue passing is not doable in
> > userland without kernel help either (I have a message queue filesystem
> > kernel patch for this, but it is a 2.5 thing).
> 
> Interesting. Is yours ready for?
> (I'm also working with it. I agree it's for 2.5)

Below is my preliminary version from Sep, 16th if you're interested.
I haven't had time for it since then, so it most probably will not apply
cleanly to current kernel.
Things still to do:
- clean it up
- implement poll on message queues
- handle __SI_RT in architectural copy_siginfo_to_user routines
- test much more than I have done so far
- fix mq_notify - see below
- avoid doing linear searches - see below

Message queues are presented as a new filesystem, mounted usually on
/dev/msg. The objects in that filesystem are fifos with special MQ
semantics.
One can use normal open/read/write on fifos in /dev/msg, which
means mq_open with mq_attr NULL, mq_receive which does not tell the priority
and mq_send with default priority.
Then there are a few ioctls which allow to open with special queue
attributes, send with priority and receive so that you get priority back,
etc.
Things I'm not sure about is mq_notify, because it states the signal should
be sent to the process (ie. I'd think it is tgid, not pid in 2.4.0-test8,
but then I don't know which close/exit should cause the notification
registration to be freed).
Also, I wonder how many pending messages typical message queues have;
if not too many, then the current linear search is fine, otherwise
I should put the messages into some heap which would allow O(1) mq_receive.
If you find any races/problems, please let me know.

I've coded mqueue.h public glibc userland header and mqueue.c which has
hacks on top and then basically what could end up in glibc's mq_*.c (after
shm_open.c code for locating mount points is copied in).

Jakub


--- linux/Documentation/ioctl-number.txt.jj Thu Jun 22 13:42:24 2000
+++ linux/Documentation/ioctl-number.txt Fri Sep  8 13:16:42 2000
@@ -183,5 +183,6 @@ CodeSeq#Include FileComments
 0xB0   all RATIO devices   in development:
<mailto:[EMAIL PROTECTED]>
 0xB1   00-1F   PPPoX   <mailto:[EMAIL PROTECTED]>
+0xB2   00-1F   linux/mqueue.h
 0xCB   00-1F   CBM serial IEC bus  in development:
<mailto:[EMAIL PROTECTED]>
--- linux/include/asm-alpha/siginfo.h.jj Sat May 27 02:49:37 2000
+++ linux/include/asm-alpha/siginfo.h   Mon Sep 11 13:30:50 2000
@@ -104,7 +104,7 @@ typedef struct siginfo {
 #define SI_KERNEL  0x80 /* sent by the kernel from somewhere */
 #define SI_QUEUE   -1  /* sent by sigqueue */
 #define SI_TIMER __SI_CODE(__SI_TIMER,-2) /* sent by timer expiration */
-#define SI_MESGQ   -3  /* sent by real time mesq state change */
+#define SI_MESGQ __SI_CODE(__SI_RT,-3) /* sent by real time mesq state change */
 #define SI_ASYNCIO -4  /* sent by AIO completion */
 #define SI_SIGIO   -5  /* sent by queued SIGIO */
 
--- linux/include/asm-arm/siginfo.h.jj  Sat May 27 02:49:37 2000
+++ linux/include/asm-arm/siginfo.h Mon Sep 11 13:31:02 2000
@@ -104,7 +104,7 @@ typedef struct siginfo {
 #define SI_KERNEL  0x80 /* sent by the kernel from somewhere */
 #define SI_QUEUE   -1  /* sent by sigqueue */
 #define SI_TIMER __SI_CODE(__SI_TIMER,-2) /* sent by timer expiration */
-#define SI_MESGQ   -3  /* sent by real time mesq state change */
+#define SI_MESGQ __SI_CODE(__SI_RT,-3) /* sent by real time mesq state change */
 #define SI_ASYNCIO -4  /* sent by AIO completion */
 #define SI_SIGIO   -5  /* sent by queued SIGIO */
 
--- linux/include/asm-i386/siginfo.h.jj Thu Sep  7 10:38:08 2000
+++ linux/include/asm-i386/siginfo.h Mon Sep 11 13:31:15 2000
@@ -104,7 +104,7 @@ typedef struct siginfo {
 #define SI_KERNEL  0x80 /* sent by the kernel from somewhere */
 #define SI_QUEUE   -1  /* sent by sigqueue */
 #define SI_TIMER __SI_CODE(__SI_TIMER,-2) /* sent by timer expiration */
-#define SI_MESGQ   -3  /* sent by real time mesq state change */
+#define SI_MESGQ __SI_CODE(__SI_RT,-3) /* sent by real time mesq state change */
 #define SI_ASYNCIO -4  /* sent by AIO completion */
 #define SI_SIGIO   -5  /* sent by queued SIGIO */
 
--- linux/include/asm-ia64/siginfo.h.jj Tue Aug 15 10:09:41 2000
+++ linux/include/asm-ia64/siginfo.h Mon Sep 11 13:31:23 2000
@@ -113,7 +113,7 @@ typedef struct siginfo {
 #de

Re: Where did kgcc go in 2.4.0-test10 ?

2000-11-01 Thread Jakub Jelinek

On Wed, Nov 01, 2000 at 04:54:18PM -0700, Cort Dougan wrote:
> Since you're setting yourself up as a proponent of this can you explain why
> RedHat includes a compiler that doesn't work with the kernel?  Don't get

Actually, the only kernels it fails to compile are 2.2 kernels that have
not been patched (the patches that make them work with the gcc we ship are
available from H.J.'s site).
With 2.4, the gcc we shipped just prints some wrong cpp warnings (which have
been fixed long time ago) but compiles a workable kernel.
The thing then is really about what is the recommended compiler for
compiling kernel, and it is egcs 1.1.2 at the moment, not 2.95.2, nor our
2.96, nor CVS head (the last one is known to miscompile some things in the
kernel on x86).

> grumpy about who did it first or what the old one is named but be clear
> what I'm asking.  I want to know if the 'gcc' on RedHat 7.0 fixes some
> problems that the older compilers suffered from?  If there's a good reason

Yes, it fixes several problems the older compilers suffered from, see Richard
Henderson's posting about this on lkml from end of September.

Jakub



Re: beware of dead string constants

2000-11-21 Thread Jakub Jelinek

On Tue, Nov 21, 2000 at 06:02:35AM -0600, Peter Samuelson wrote:
> 
> While trying to clean up some code recently (CONFIG_MCA, hi Jeff), I
> discovered that gcc 2.95.2 (i386) does not remove dead string
> constants:
> 
>   void foo (void)
>   {
> if (0)
>   printk(KERN_INFO "bar");
>   }
> 
> Annoyingly, gcc forgets to drop the "<6>bar\0".  It shows up in the
> object file, needlessly clogging your cachelines.

gcc has never dropped such strings; I committed a patch to fix this
into CVS a week ago.

Jakub



Re: gcc-2.95.2-51 is buggy

2000-11-24 Thread Jakub Jelinek

On Fri, Nov 24, 2000 at 06:20:33AM +0100, [EMAIL PROTECTED] wrote:
> >> ... RedHat's GCC snapshot "2.96" handles this case just fine.
> 
> > Now, if you can isolate the relevant part of the diff between
> > 2.95.2 and RH 2.96...
> 
> Maybe I have to be more precise in the statement "gcc 2.95.2 is buggy".
> 
> I just installed gcc 2.95.2 freshly ftp'ed from ftp.gnu.org, and
> 
> % /usr/bin/gcc -v
> Reading specs from /usr/lib/gcc-lib/i486-suse-linux/2.95.2/specs
> gcc version 2.95.2 19991024 (release)
> % /usr/bin/gcc -Wall -O2 -o bug bug.c; ./bug
> 0x8480
> % /usr/gcc/aeb/bin/gcc -v
> Reading specs from /usr/gcc/aeb/lib/gcc-lib/i686-pc-linux-gnu/2.95.2/specs
> gcc version 2.95.2 19991024 (release)
> % /usr/gcc/aeb/bin/gcc -Wall -O2 -o nobug bug.c; ./nobug
> 0x0
> 
> So, not all versions of gcc 2.95.2 are equal.

I believe all 2.95.2's are equal in this; I think the fact that it gives 0
in the nobug case has some other reason:

$ for i in gcc kgcc '/usr/src/gcc-trunk/obj/gcc/xgcc -B /usr/src/gcc-trunk/obj/gcc/' 
'/usr/src/gcc-2.95.2/obj/gcc/xgcc -B /usr/src/gcc-2.95.2/obj/gcc/'; do $i -v; for j in 
-mcpu=i386 -mcpu=i586 -mcpu=i686; do $i $j -O2 -o aeb aeb.c; echo -n "$i $j "; ./aeb; 
done; done
Reading specs from /usr/lib/gcc-lib/i386-redhat-linux/2.96/specs
gcc version 2.96 20000731 (Red Hat Linux 7.0)
gcc -mcpu=i386 0x0
gcc -mcpu=i586 0x0
gcc -mcpu=i686 0x0
Reading specs from /usr/lib/gcc-lib/i386-glibc21-linux/egcs-2.91.66/specs
gcc version egcs-2.91.66 19990314/Linux (egcs-1.1.2 release)
kgcc -mcpu=i386 0x0
kgcc -mcpu=i586 0x0
kgcc -mcpu=i686 0x0
Reading specs from /usr/src/gcc-trunk/obj/gcc/specs
Configured with:
gcc version 2.97 20001120 (experimental)
/usr/src/gcc-trunk/obj/gcc/xgcc -B /usr/src/gcc-trunk/obj/gcc/ -mcpu=i386 0x0
/usr/src/gcc-trunk/obj/gcc/xgcc -B /usr/src/gcc-trunk/obj/gcc/ -mcpu=i586 0x0
/usr/src/gcc-trunk/obj/gcc/xgcc -B /usr/src/gcc-trunk/obj/gcc/ -mcpu=i686 0x0
Reading specs from /usr/src/gcc-2.95.2/obj/gcc/specs
gcc version 2.95.2 19991024 (release)
/usr/src/gcc-2.95.2/obj/gcc/xgcc -B /usr/src/gcc-2.95.2/obj/gcc/ -mcpu=i386 0x8480
/usr/src/gcc-2.95.2/obj/gcc/xgcc -B /usr/src/gcc-2.95.2/obj/gcc/ -mcpu=i586 0x8480
/usr/src/gcc-2.95.2/obj/gcc/xgcc -B /usr/src/gcc-2.95.2/obj/gcc/ -mcpu=i686 0x0

so the reason why it did not show up in the gcc you picked up from
ftp.gnu.org is that you have compiled it so that it defaults to -mcpu=i686
where the bug does not show up.

Jakub



Re: initdata for modules?

2000-11-26 Thread Jakub Jelinek

On Mon, Nov 27, 2000 at 09:54:57AM +1100, Keith Owens wrote:
> On Sun, 26 Nov 2000 07:30:44 -0800, 
> "Adam J. Richter" <[EMAIL PROTECTED]> wrote:
> > In reading include/linux/init.h, I was surprised to discover
> >that __init{,data} expands to nothing when compiling a module.
> >I was wondering if anyone is contemplating adding support for
> >__init{,data} in module loading, to reduce the memory footprints
> >of modules after they have been loaded.
> 
> It has been discussed a few times but nothing was ever done about it.

Well, I actually implemented it a few years ago, and even the current modutils
you maintain already support it (see the runsize member of struct module and
how it is assigned). __init stuff was not stored in a separate page and was
initially vmalloced together with the whole module, the only vm addition was
a shrink for a vmalloc area where it would free some pages from the end of
the area.
It lived in sparclinux-cvs for quite some time, but Linus has not accepted
it (I've posted it several times).
I can dig the patch out of sparclinux CVS if anyone is interested.

Jakub



Re: [PATCH] modutils 2.3.20 and beyond

2000-11-27 Thread Jakub Jelinek

On Mon, Nov 27, 2000 at 05:48:28PM +0100, Jes Sorensen wrote:
> > "Keith" == Keith Owens <[EMAIL PROTECTED]> writes:
> 
> Keith> On Sun, 26 Nov 2000 16:36:55 -0700, "Jeff V. Merkey"
> Keith> <[EMAIL PROTECTED]> wrote:
> >> Keith,
> >> 
> >> Please consider the attached patch for inclusion in all future
> >> versions of the modutils depmod program for compatiblity with
> >> RedHat and RedHat derived Linux distributions.
> 
> Keith> I have a big problem with Redhat.  They make incompatible
> Keith> changes to utilities, do not feed patches back to maintainers
> Keith> then expect the rest of the world to follow their lead.  The -i
> Keith> and -m flags to modutils are not the only example, I recently
> Keith> found IA64 and Sparc patches they had added to modutils code
> Keith> and not bothered to tell me.  Other distributors are much
> Keith> better about sending me patches, Debian and SuSe in particular
> Keith> do the right thing.
> 
> I don't remember where the ia64 modutils patches come from, there were
> some floating around between the ia64 developers for a while. The
> sparc patches I don't have a clue about where come from.

The sparc patches were not sent just because of lack of time on my part,
Jeff Johnson wrote it so that modules compiled with sparc64 gcc 2.96
(basically anything which generates OLO10 relocations) can be inserted and I
wanted to review/test it first myself (and did not get to it early enough).

Jakub



Re: Compiler warnings

2000-09-06 Thread Jakub Jelinek

On Wed, Sep 06, 2000 at 10:05:46PM +0200, [EMAIL PROTECTED] wrote:
> 
> I'm trying to compile 2.2.17 with gcc 2.96, and it shows a lot of
> warnings like this in several files.

First of all, you should not use gcc 2.96 for 2.2.x kernel compiles, only
2.4 should work.

> warning: pasting would not give a valid preprocessing token

I've fixed this recently. Some of these warnings were actually valid, I'll
post a kernel patch for these soon, but most of them were bogus warnings
when kernel was using GNU , ## restargs extension.

> 
> And fails to compile with the error:
> checksum.S:231: badly punctuated parameter list in #define

One cannot preprocess with -traditional and use macros with variable
arguments in gcc 2.96. 2.4 does not use -traditional for this file.

> 
> It's the update to gcc2.96 causing this problems?? How can i get to
> compile the kernel?

If you're using recent Red Hat distributions, use kgcc compiler instead of
gcc to compile the kernel.

Jakub



Re: Compiler warnings

2000-09-06 Thread Jakub Jelinek

On Thu, Sep 07, 2000 at 08:53:29AM +1100, Keith Owens wrote:
> On Wed, 6 Sep 2000 21:49:44 +0100 (BST), 
> Alan Cox <[EMAIL PROTECTED]> wrote:
> >Use a different gcc. There are reasons people shipping 2.96 for intel x86 also
> >include egcs. The kernel isnt ready for 2.96
> 
> Out of curiousity, which compiler would you recommend for IA64 kernels?
> The latest unwind code is in the bleeding edge version of gcc, which
> just happens to have the problems with '##' as well.

Obviously 2.96. I'm using it for 2.4 x86 and sparc64 kernels as well.
We were talking about 2.2.17 though, and I don't think 2.2 kernels work on
ia64...
If you want the '##' fix, grab
http://gcc.gnu.org/ml/gcc-patches/2000-09/msg6.html

Jakub



Re: files bigger than 2 GB

2000-09-12 Thread Jakub Jelinek

On Tue, Sep 12, 2000 at 03:12:34PM +0100, Alan Cox wrote:
> > I need support for files larger than 2GB.  What's the status for that ?  
> 
> 2.2 + patches or 2.4 test and glibc 2.1.9x

And make sure the utilities you want to work with those 2GB+ files were
compiled with -D_FILE_OFFSET_BITS=64 (check e.g. with
nm -uD /your/binary | grep 64\$).
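
A minimal sketch of what that macro changes, assuming a glibc-style system
where off_t widens to 64 bits once _FILE_OFFSET_BITS is 64. The helper name
off_t_bits is made up for illustration and is not part of any library:

```c
/* Must be defined before any system header is included, which is what
   -D_FILE_OFFSET_BITS=64 on the command line guarantees. */
#define _FILE_OFFSET_BITS 64
#include <sys/types.h>

/* Illustrative helper (hypothetical name): width of off_t in bits. */
static int off_t_bits(void)
{
    return (int)(sizeof(off_t) * 8);
}
```

A utility built without the macro on a 32-bit system would typically see a
32-bit off_t here, which is why it cannot address offsets past 2GB.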

Jakub



Re: Kernel 2.2.17 with RedHat 7 Problem !

2000-10-23 Thread Jakub Jelinek

On Mon, Oct 23, 2000 at 12:06:31PM +0000, David Wragg wrote:
> Gregory Maxwell <[EMAIL PROTECTED]> writes:
> > If 2.96 is broken, I'd appreciate it if you would describe the breakage. 
> 
> As in the RedHat 2.96?  Try compiling the following on RedHat 7.0 x86
> with "gcc -O2" and take a look at the generated code.  Nice, isn't it?
> 
> 
> #include 
> 
> void foo(void)
> {
> struct itimerval iv;
> 
> iv.it_interval.tv_sec = 0;
> iv.it_interval.tv_usec = 25;
> iv.it_value = iv.it_interval;
> 
> setitimer(ITIMER_REAL, &iv, NULL);
> }

Yes, this is a bug in the compiler (which I hope to fix today; CVS gcc is
broken as well). The place that triggers the miscompilation is in the system
headers, where a restrict keyword is used on a pointer to a struct timeval
that is only forward-declared (incomplete) at that point, and due to the bug
it gets set in the type structure itself. At least that's my guess; I need to
run it under a debugger today, but if the select prototype is moved after the
full struct timeval definition, everything works correctly. Note that gcc
2.95.2 has some restrict keyword related bugs as well (which glibc had to work
around in the headers; that bug was in 2.95.x only), so it is not just 2.96.

Jakub



Re: 2.4.0-test10-pre6: Use of abs()

2000-10-30 Thread Jakub Jelinek

On Mon, Oct 30, 2000 at 03:01:16PM +0100, Martin Dalecki wrote:
> Horst von Brand wrote:
> > 
> > Red Hat 7.0, i686, gcc-20001027 (Yes, I know. Just to flush out bugs on
> > both sides).
> > 
> > abs() is used at least in:
> > 
> > arch/i386/kernel/time.c
> > drivers/md/raid1.c
> > drivers/sound/sb_ess.c
> > 
> > gcc warns about use of a non-declared function each time.
> > 
> > No definition for the function is to be found (grep over all include/ comes
> > up clean, except for extern definitions in asm-{mips,ppc}; ditto for lib/).
> > Presumably gcc is using a builtin (it doesn't show up in System.map). Is
> > this the desired state of affairs? Should a include/linux/stdlib.h be
> 
> Yes abs will be transformed into an internal function, which will be
> fully
> unrolled due to -O2.

No matter what it should be prototyped in some header. And all uses should
be checked, because abs is 
int abs (int) __attribute__ ((__const__));
and sometimes people use it on `long' instead (such a bug has been fixed in
the kernel some months ago).
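
A small user-space sketch of that pitfall, assuming a hosted C library that
provides labs() (the kernel itself does not; both helper names below are
made up for illustration):

```c
#include <stdlib.h>

/* abs() is declared as int abs(int): a long argument is silently
   converted to int first, which can drop the upper bits on targets
   where long is wider than int. */
static long magnitude_via_abs(long v)
{
    return abs((int)v);   /* the cast spells out the implicit conversion */
}

/* labs() takes and returns long, so nothing is truncated. */
static long magnitude_via_labs(long v)
{
    return labs(v);
}
```

Both agree for values that fit in an int; they can differ only once the
argument is outside the int range, which is exactly the case a prototype in
a header lets the compiler warn about.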

Jakub



Re: Recommended compiler? - Re: [patch] kernel/module.c (plus gratuitous rant)

2000-10-30 Thread Jakub Jelinek

On Mon, Oct 30, 2000 at 05:50:07PM -0300, Horst von Brand wrote:
> Martin Dalecki <[EMAIL PROTECTED]> said:
> > Peter Samuelson wrote:
> 
> [...]
> 
> > > * Red Hat "2.96" or CVS 2.97 will probably break any known kernel.
> 
> > Works fine for me and 2.4.0-test10-pre5... however there are tons of
> > preprocessor warnings in some drivers.
> 
> CVS (from 20001028 or so) gave a 2.4.0.10.6/i686 that crashed on boot, no
> time to dig deeper yet.

CVS 2.97 is known to miscompile e.g. buffer.c.

Jakub



Re: non-gcc linux?

2000-11-05 Thread Jakub Jelinek

On Sun, Nov 05, 2000 at 01:52:24PM -0700, Tim Riker wrote:
> Alan,
> 
> Perhaps I did not explain myself, or perhaps I misunderstand your
> comments. I was responding to a comment that we could just copy some of
> the optimizations from Pro64 over into gcc.

That's hard to do, because the whole gcc has copyright assigned to FSF,
which means that either gcc steering committee would have to make an
exception from this for SGI, or SGI would have to be willing to assign some
code to FSF.

Jakub



Re: State of Posix compliance in v2.2/v2.4 kernel?

2000-11-13 Thread Jakub Jelinek

On Mon, Nov 13, 2000 at 11:00:09AM -0500, Jeff Garzik wrote:
> [EMAIL PROTECTED] wrote:
> > Sorry if this is a FAQ, but I've searched the archives for this list
> > (http://www.uwsg.iu.edu/hypermail/linux/kernel/) and only come with references
> > from 1996!
> > 
> > What is the state of Posix-compliant services (threads, semaphores, timers,
> > etc.) in the current (v2.2/v2.4) Linux kernels?
> 
> IMHO this is a question better asked of glibc people, not kernel people.
> 
> The kernel does its best to facilitate POSIX compliances,

Well, it does not do its best. There are several areas where kernel should
help, things like POSIX semaphores would be much faster with kernel support,
likewise threads if some things Ulrich stated here a couple of months
ago were done in the kernel, POSIX message queue passing is not doable in
userland without kernel help either (I have a message queue filesystem
kernel patch for this, but it is a 2.5 thing).

Jakub



Re: Signal 11

2000-12-14 Thread Jakub Jelinek

On Thu, Dec 14, 2000 at 04:42:03AM -0800, Clayton Weaver wrote:
> There has a been a thread on the teTeX mailing list the last few days
> about a (RedHat, but probably more general than just their rpms)
> gcc-2.9.6 w/glibc-2.2.x bug. At -O2, it can miscompile 
> 
> unsigned varname; /* "unsigned int varname;" is ok */
> 
> (no problem at -O or no optimization at all, and doesn't happen if teTeX
> is compiled with kgcc).

That one has been fixed for some time already; it was a bug in loop unrolling
(the patch is still pending review for the mainline CVS though).

Jakub



Re: Signal 11

2000-12-14 Thread Jakub Jelinek

On Thu, Dec 14, 2000 at 11:11:28AM -0800, Linus Torvalds wrote:
> user applications and (b) gcc-2.96 is so broken that it requires special
> libraries for C++ vtable chunks handling that is different, so the
> _working_ gcc can only be used with programs that do not need such
> library support.

Every major g++ release had incompatible libstdc++, even g++ 2.95.2 if
bootstrapped under glibc 2.1.x is binary incompatible with g++ 2.95.2
bootstrapped under glibc 2.2.x (libstdc++ uses different soname then;
even if we used g++ 2.95.2 we would not have C++ binary compatible with
other distributions).
This will change once 3.0 is out, but it will still take some time.

> compiler to something that works better RSN.  It apparently has problems
> compiling stuff like the CVS snapshots of X etc too (and obviously,
> anything you compile under gcc-2.96 is not likely to work anywhere else
> except with the broken libraries). 

Can you point to things in X which were actually miscompiled because of bugs
in gcc 2.96? So far I was aware about X bugs (already fixed in X CVS) which
were triggered with -fstrict-aliasing which is now the default while
gcc 2.95.2 had -fstrict-aliasing disabled by default.
That is not to say there were not bugs in the gcc we shipped, but the bugs
which were reported against it have been fixed already.
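
For readers who have not met the aliasing rule, here is a minimal sketch of
the kind of code -fstrict-aliasing affects. It is not taken from X; the
function name is made up, and it assumes float and uint32_t have the same
size and that floats are IEEE 754:

```c
#include <stdint.h>
#include <string.h>

/* The pattern that breaks under -fstrict-aliasing is reading a float
   object through an unrelated pointer type:
       return *(uint32_t *)&f;
   The compiler may assume a uint32_t lvalue never refers to a float.
   Copying the bytes is well defined and typically compiles to the
   same single move. */
static uint32_t float_bits(float f)
{
    uint32_t u;
    memcpy(&u, &f, sizeof u);
    return u;
}
```

Code relying on the cast form only worked by accident with compilers that
left the option off by default.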

Jakub



Re: i386: gcc & asm(): wrong constraint for "mull"

2000-12-29 Thread Jakub Jelinek

On Fri, Dec 29, 2000 at 10:54:38AM +0100, Ulrich Windl wrote:
> Hello,
> 
> I noticed (with some inspiration from Andy Kleen) that some asm() 
> instructions for the ia32 use the "g" constraint for "mull", where my 
> Intel 386 Assembly Language Manual suggests the "MUL" instruction needs 
> an r/m operand. So I guess the correct constraint is "rm" in gcc, and 
> not "g". That change identical assembly output for gcc-2.95.2, but some 
> gcc-2.96.x will try a multiplication with an immediate (constant) 
> operand for the "g" constarint, and the as will choke on that.
> (Redhat 7.0 ships such a version of gcc).

gcc 2.95.2 md.texi says:
@cindex @samp{g} in constraint
@item @samp{g}
Any register, memory or immediate integer operand is allowed, except for
registers that are not general registers.

(2.95.2 was chosen to make it clear it is not something new in gcc).
That means gcc is really free to choose which of register, memory or
immediate it puts in and the fact that some gcc version choose one and
others choose other is perfectly correct.
Fix the constraints and be happy (at least during the upcoming millennium) :)
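
A sketch of what a fixed constraint could look like, written as a
stand-alone helper rather than the actual kernel code (the function name is
made up, and the plain C fallback is only there so it builds on non-x86
targets):

```c
#include <stdint.h>

/* 32x32 -> 64 bit unsigned multiply. */
static uint64_t mul32x32(uint32_t a, uint32_t b)
{
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
    uint32_t lo, hi;
    /* "rm" allows a register or a memory operand only. "g" would also
       allow an immediate, which mull cannot encode. */
    __asm__("mull %3"
            : "=a"(lo), "=d"(hi)
            : "0"(a), "rm"(b)
            : "cc");
    return ((uint64_t)hi << 32) | lo;
#else
    return (uint64_t)a * b;
#endif
}
```

With "rm", a constant argument is simply loaded into a register first, so
the assembler never sees an immediate operand.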

Jakub



Re: ["Michael N. Lipp" ] Can't compile linus 2.2.17 with latest gcc due to checksum.S

2000-09-26 Thread Jakub Jelinek

On Tue, Sep 26, 2000 at 08:20:49AM +0200, Michael N. Lipp wrote:
> Hi,
> 
> I can't compile the latest linux kernel with the latest gcc due to a
> strange define in checksum.S. The gcc preprocessor complains about
> the usage of elipses in the macros
> 
> #define SRC(y...) \
>   9999: y;\
>   .section __ex_table, "a";   \
>   .long 9999b, 6001f  ;   \
>   .previous
> 
> #define DST(y...) \
>   9999: y;\
>   .section __ex_table, "a";   \
>   .long 9999b, 6002f  ;   \
>   .previous
> 
> And I do agree, they look very strange. I tried adding comma
> (#define SRC(y,...)) as this is what it should look like, but then
> I get errors for the usage lines (SRC(1:movw (%esi), %bx)) and
> again I understand the preprocessor very well.
> 
> As egcs and gcc have re-merged and thus the latest gcc is really
> the next egcs, I consider this a real problem.

You should not compile 2.2.x kernels with latest gcc, use egcs-1.1.2 for it.
Nobody has actually tested if 2.2.x kernels work with gcc 2.96, so even if
you get over this (hint - remove -traditional from checksum.S's gcc
options), you might be surprised by other things.
The -traditional preprocessor in current gcc is really K&R, so stuff like
GNU restargs extensions are not present there.
If you're on Red Hat Linux 7, use kgcc compiler instead of gcc to build the
kernel, otherwise check out your distribution to see where egcs (or gcc
2.95) lives.

Jakub



Re: Can't compile linus 2.2.17 with latest gcc due to checksum.S

2000-09-26 Thread Jakub Jelinek

On Tue, Sep 26, 2000 at 05:42:59PM +0200, Mads Martin Joergensen wrote:
> * Timur Tabi <[EMAIL PROTECTED]> [Sep 26. 2000 17:36]:
> > > Maybe this can be fixed for 2.96, but it breaks badly elsewhere (doesn't
> > > compile; kernel builds but hangs/crashes at boot; kernel appears to work
> > > fine while it is busy eating your disk; ...)
> > 
> > Why is 2.96 so screwed up?  I mean, the version numbers imply that 2.96 is a
> > minor bugfix over 2.95, but your comments make it sound like it's a major
> > change.
> 
> Maybe because gcc 2.96 have not been released yet, and therefore not is
> bugfree yet?

Have you actually seen a bugfree compiler? I don't expect to ever see one.
Anyway, this is more about 2.2 kernels relying on certain things which 2.96
might no longer guarantee because of some optimizations. E.g. 2.96 compiled
2.4.0-testx kernels work pretty well on ia32, sparc64, alpha and ia64 AFAIK.

Jakub



Re: warning message posted from apic.h

2000-10-03 Thread Jakub Jelinek

On Fri, Sep 29, 2000 at 11:09:33AM +0000, Stephen Torri wrote:
> I get the following message compiling 2.4.0-test6 or test8 on a RedHat 7
> system. "/usr/src/linux/include/asm/apic.h:13:29: warning: nothing can be
> posted after this token". Is this an issue with apic?

Yes, this one is apic.h bug which RHL 7 cpp warns about:

--- linux/include/asm-i386/apic.h.jj   Mon Oct  2 20:01:18 2000
+++ linux/include/asm-i386/apic.h  Tue Oct  3 23:50:33 2000
@@ -10,7 +10,7 @@
 #ifdef CONFIG_X86_LOCAL_APIC

 #if APIC_DEBUG
-#define Dprintk(x...) printk(##x)
+#define Dprintk(x...) printk(x)
 #else
 #define Dprintk(x...)
 #endif
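
For comparison, a user-space sketch of a variable-argument debug macro
without any token pasting. It uses the C99 __VA_ARGS__ spelling rather than
the GNU named form in the patch, and DBG_FORMAT and format_irq are made-up
names for illustration:

```c
#include <stdio.h>

/* C99 spelling of a variable-argument macro. The kernel of that era
   used the GNU named form instead, e.g. Dprintk(x...) printk(x).
   Neither needs ## when the arguments are passed through unchanged. */
#define DBG_FORMAT(buf, size, ...) snprintf((buf), (size), __VA_ARGS__)

/* Example caller: formats a message into buf, returns its length. */
static int format_irq(char *buf, size_t size, int irq, int cpu)
{
    return DBG_FORMAT(buf, size, "irq %d on cpu %d", irq, cpu);
}
```
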

Jakub



Re: 2.4.0-test9-pre8 on SPARC build failure

2000-10-03 Thread Jakub Jelinek

On Tue, Oct 03, 2000 at 10:41:57PM -0700, Dr. Kelsey Hudson wrote:
> > Question is, is this still broken on -test9-final or did
> > the fix Linus merged earlier today get rid of your problems?
> 
> Let me try this and find out...
...
> making dep...
> 
> ::curses his SS20 for being so SLOW!::
> I need better than a 50MHz processor in this damn thing. :) Better yet, I
> need a better machine! :) Got any donations? Just kidding.
> 
> ...Ok...Making boot...
> 
> Damn. A good 2 hours later and it looks as though the compile exited
> cleanly :) yaaay! 
> 
> The answer to your question is yes, the fix Linus put in today fixed the
> problem :)

This tells us nothing about whether the pcibios thing is fixed, because you
most probably did not configure PCI on your sparc32 (why would you do that
when you don't have a JavaStation?).
So you have to either look at the code or configure PCI in...

Jakub



Re: Why does everyone hate gcc 2.95?

2000-10-04 Thread Jakub Jelinek

On Tue, Oct 03, 2000 at 11:12:24PM -0700, [EMAIL PROTECTED] wrote:
> No, better yet,
> what is a good version to use when porting to a new processor (actually
> an old processor)?  I've pulled the source to gcc (2.95.2) and binutils
> (2.10) in prep for a port to a new/old machine.  If these versions aren't
> good to start from, what versions are and where can I find them?

Those versions surely are not good to start from for doing new ports.
Almost 2 years of development have gone by since 2.95 was frozen and many
things have changed, so if you start with 2.95.2, you'll have a hard time
forward porting it to gcc 3.
With binutils it probably does not matter much, but it could be easier to
use CVS as well.

Jakub



Re: Updated 2.4 TODO List -- new addition WAS(test9 PCI resourcecollisions (fwd)

2000-10-11 Thread Jakub Jelinek

On Tue, Oct 10, 2000 at 11:32:43PM -0500, Gnea wrote:
> 
> On Tue, 10 Oct 2000 19:56:46 -0400 (EDT), jamal blurted forth:
> 
> > 
> >  Ted,
> >  
> >  Please add this to your list. Linux is unusable in these machines.
> >  I have cc'ed Martin and Linus because they play in that PCI area.
> 
> erm, looking at your list it says that you're using Redhat 7.0, which
> is known to ship with a buggy gcc, which is KNOWN to do nasty things
> with kernels.

Can you tell me (since it is KNOWN) what nasty things that gcc does to
kernels? That it does not compile vanilla 2.2.x kernels is not its
fault, and if you choose to either use K&R preprocessing in assembly (but
then no GNU extensions) or ANSI preprocessing plus you export memset/memcpy,
it will actually build and work, see H.J.'s patchlets:
http://www.lucon.org/linux/linux-2.2.14-gcc.patch
http://www.lucon.org/linux/linux-2.2.17-library.patch

The fact that we recommend using kgcc (especially for 2.2 kernels) does not
mean that the default gcc is broken, but simply that using it for kernels
has not been tested yet too much and there can be e.g. bugs in the way
kernel uses inline assembly and the likes.

> 
> Linux version 2.4.0-test9-JHS1 ([EMAIL PROTECTED]) (gcc
> version 2.96 20000731 (Red Hat Linux 7.0)) #2 Thu Oct 5 11:59:31 EDT 2000
> 
> yeah, that pretty much sums it up right there.. you may want to try
> something else.

See above, it does not sum up anything. The only thing is that if somebody
is reporting a bug on lkml, he had better first make sure it is
reproducible with kgcc as well (bug reports for kernels compiled with
gcc 2.95 have been handled this way for a long time on lkml).

Jakub



Re: TODO: drivers/pcmcia/ds.c: ds_read & ds_write. SMP locks are missing fix

2000-10-12 Thread Jakub Jelinek

On Thu, Oct 12, 2000 at 11:38:11AM -0400, Yong Chi wrote:
> Hopefully this will do for SMP locks.  =)

Holding a spinlock for this long (especially when you might sleep there in two
places: interruptible_sleep_on and put_user) is basically a bad idea.
Spinlocks are designed to be held only for a short time.
Either protect just a small critical section with a spinlock, or use
semaphores.
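
A rough user-space analogy of the first option, using a pthread mutex in
place of a kernel spinlock. This is not a fix for ds.c; the struct and
function names are made up, and queue overflow handling is left out to keep
it short:

```c
#include <pthread.h>

#define MAX_EVENTS 32

struct event_queue {
    pthread_mutex_t lock;
    int head, tail;
    int event[MAX_EVENTS];
};

/* Add one event. The lock covers only the index and array update. */
static void enqueue_event(struct event_queue *q, int ev)
{
    pthread_mutex_lock(&q->lock);
    q->head = (q->head + 1) % MAX_EVENTS;
    q->event[q->head] = ev;
    pthread_mutex_unlock(&q->lock);
}

/* Take one event off the queue into *out. Returns 0 on success and
   -1 if the queue is empty. Anything slow (waiting for an event,
   copying the value to the caller) belongs outside the locked part. */
static int dequeue_event(struct event_queue *q, int *out)
{
    int ret = -1;

    pthread_mutex_lock(&q->lock);
    if (q->head != q->tail) {
        q->tail = (q->tail + 1) % MAX_EVENTS;
        *out = q->event[q->tail];
        ret = 0;
    }
    pthread_mutex_unlock(&q->lock);
    return ret;
}
```

The caller copies the dequeued value out after the lock is dropped, so
nothing that can block ever runs inside the critical section.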

> --- ds.c.bak  Wed Oct 11 13:05:16 2000
> +++ ds.c  Thu Oct 12 11:25:20 2000
> @@ -95,6 +95,7 @@
>  u_intuser_magic;
>  int  event_head, event_tail;
>  event_t  event[MAX_EVENTS];
> +spinlock_t  lock;
>  struct user_info_t   *next;
>  } user_info_t;
>  
> @@ -567,6 +568,7 @@
>  user->event_tail = user->event_head = 0;
>  user->next = s->user;
>  user->user_magic = USER_MAGIC;
> +spin_lock_init(&user->lock);
>  s->user = user;
>  file->private_data = user;
>  
> @@ -616,6 +618,7 @@
>  socket_t i = MINOR(file->f_dentry->d_inode->i_rdev);
>  socket_info_t *s;
>  user_info_t *user;
> +ssize_t retval=4;
>  
>  DEBUG(2, "ds_read(socket %d)\n", i);
>  
> @@ -625,16 +628,23 @@
>   return -EINVAL;
>  s = &socket_table[i];
>  user = file->private_data;
> -if (CHECK_USER(user))
> - return -EIO;
> -
> +spin_lock(&user->lock);
> +if (CHECK_USER(user)) {
> + retval= -EIO;
> +goto read_out;
> +}
> +
>  if (queue_empty(user)) {
>   interruptible_sleep_on(&s->queue);
>   if (signal_pending(current))
> - return -EINTR;
> + retval= -EINTR;
> +goto read_out;
>  }
>  put_user(get_queued_event(user), (int *)buf);
> -return 4;
> +
> +read_out:
> +spin_unlock(&user->lock);
> +return retval;
>  } /* ds_read */
>  
>  /**/
> @@ -645,6 +655,7 @@
>  socket_t i = MINOR(file->f_dentry->d_inode->i_rdev);
>  socket_info_t *s;
>  user_info_t *user;
> +ssize_t retval=4;
>  
>  DEBUG(2, "ds_write(socket %d)\n", i);
>  
> @@ -656,18 +667,25 @@
>   return -EBADF;
>  s = &socket_table[i];
>  user = file->private_data;
> -if (CHECK_USER(user))
> - return -EIO;
> +spin_lock(&user->lock);
> +if (CHECK_USER(user)) {
> + retval= -EIO;
> + goto write_out;
> +}
>  
>  if (s->req_pending) {
>   s->req_pending--;
>   get_user(s->req_result, (int *)buf);
>   if ((s->req_result != 0) || (s->req_pending == 0))
>   wake_up_interruptible(&s->request);
> -} else
> - return -EIO;
> +} else {
> + retval= -EIO;
> + goto write_out;
> +}
>  
> -return 4;
> +write_out:
> +spin_unlock(&user->lock);
> +return retval;
>  } /* ds_write */
>  
>  /**/


Jakub



Re: Updated Linux 2.4 Status/TODO List (from the ALS show)

2000-10-13 Thread Jakub Jelinek

On Fri, Oct 13, 2000 at 02:17:23PM -0700, Richard Henderson wrote:
> On Fri, Oct 13, 2000 at 12:45:47PM +0100, Alan Cox wrote:
> > Can we always be sure the rss will fit in an atomic_t - is it > 32bits on the
> > ultrsparc/alpha ?
> 
> It is not.

It is not even 32bit on sparc32 (24bit only).

Jakub



Re: pthreads & fork & execve

2001-04-02 Thread Jakub Jelinek

On Mon, Apr 02, 2001 at 09:54:25AM -0300, Gustavo Niemeyer wrote:
> Hi Richard! Hi Dennis!
> 
> > I tracked this down to a corrupt jumptable somewhere in the pthreads
> > part of the libc (didnt have the source handy at that time, though). So
> > I think this is a libc bug (version does not matter) - I even did a
> > followup to a similar bug in the libc gnats database (I think I should
> > have opened a new one, though...). But I failed to construct a "simple"
> > testcase showing the bug (We use rather large amount of threads and
> > in one or two doing popen() calls - or handcrafted fork() && execv(),
> > the SIGSEGV is during fork()).
> 
> We're going trough two similar problems here. One is KDE, and the other
> is Linuxconf. Linuxconf is core dumping on a module when it is linked
> with pthread and dlopen()'ed with RTLD_GLOBAL. We must reduce one of
> them to a testcase.

By any chance, are you dlopening a DSO linked against -lpthread from a
program not linked against -lpthread?

Jakub
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: [reiserfs-list] Re: ReiserFS Oops (2.4.1, deterministic, symlink

2001-02-02 Thread Jakub Jelinek

On Sat, Feb 03, 2001 at 12:40:03AM +0100, J . A . Magallon wrote:
> Please, do not do so. That depends on the PACKAGE name and version, and there
> is no standard way of versioning a patched gcc.
> The -54 is a RH'ism, for example Mandrake Cooker includes patches from
> different sources, and gcc is versioned like

You can do:
if [ "$CC" = gcc ]; then
  echo 'inline void f(unsigned int n){int 
i,j=-1;for(i=0;i<10&&j<0;i++)if((1UL< test.c
  gcc -O2 -o test test.c
  if ./test; then echo "*** Please don't use this compiler to compile kernel"; fi
  rm -f test.c test
fi

(The $CC = gcc test is there so that the test is not done when
cross-compiling, or when there is a separate kernel compiler and userland
compiler, e.g. on sparc64. This test will barf on gcc-2.96 up to -67 and
on 2.97 until end of November or so.)
Similarly, a testcase for the reload bug which in 2.95.2 caused
miscompilation of some long long stuff in the kernel could be added as well
if you want to go that way.

Jakub



Re: [reiserfs-list] Re: ReiserFS Oops (2.4.1, deterministic, symlink

2001-02-03 Thread Jakub Jelinek

On Sat, Feb 03, 2001 at 04:25:20AM +, Paul Jakma wrote:
> On Fri, 2 Feb 2001, Jakub Jelinek wrote:
> 
> > You can do:
> > if [ "$CC" = gcc ]; then
> >   echo 'inline void f(unsigned int n){int 
>i,j=-1;for(i=0;i<10&&j<0;i++)if((1UL< > test.c
> >   gcc -O2 -o test test.c
> >   if ./test; then echo "*** Please don't use this compiler to compile kernel"; fi
> >   rm -f test.c test
> > fi
> >
> > (the $CC = gcc test is there e.g. so that the test is not done when
> > cross-compiling or when there is a separate kernel compiler and userland
> > compiler (e.g. on sparc64). This test will barf on gcc-2.96 up to -67 and
> >
> > Jakub
> 
> ehhmm..
> 
> [root@fogarty /tmp]# rpm -q gcc
> gcc-2.96-70
> [root@fogarty /tmp]# cat test.c
> inline void f(unsigned int n){int
> i,j=-1;for(i=0;i<10&&j<0;i++)if((1UL< exit(1);}
> [root@fogarty /tmp]# gcc -o test test.c
> [root@fogarty /tmp]# ./test
> 
> didn't barf here with 2.96-70.

I used a wrong word (the test originally had abort() instead of exit(0) and
exit(0) instead of exit(1)). The test will exit with 0 if it was
miscompiled, 1 if it was not. And on 2.96-70 it should exit with 1 as it
should not be miscompiled.

Jakub



Re: [OT] Re: PCI-SCI Drivers v1.1-7 released

2001-02-07 Thread Jakub Jelinek

On Wed, Feb 07, 2001 at 11:08:52AM -0700, Jeff V. Merkey wrote:
> Not supporting #ident for CVS managed code bases would seem to 
> me, at first glance, to be a show stopper to shipping a release 
> of anything, since many folks need CVS support.

Could you please explain what you mean by not supporting #ident?
It works just fine for me in all our gcc packages I've checked.

Jakub



Re: Thread_Id

2005-07-14 Thread Jakub Jelinek
On Thu, Jul 14, 2005 at 02:25:43PM +0200, Arjan van de Ven wrote:
> pure luck. NPTL threading uses it to store a pointer to per thread info
> structure; other threading (linuxthreads) may have stored a pid there to
> identify the internal thread. nptl is 2.6 only so you might have
> switched implementation of threading when you switched kernels.

Actually, in linuxthreads the value pthread_self () returned had, in its
low order bits, the index of the first slot in its internal threads array
(up to max number of supported threads) that was unused at thread creation
time and, in its high order bits, the sequence number of thread creation.
So unless you are using yet another threading library (I thought NGPT
is dead for years...), the claim that you get the same numbers from
gettid() syscall under NPTL as pthread_self () gives you under LinuxThreads
is simply not true.  And you certainly shouldn't be using gettid ()
syscall in NPTL, as it is just an implementation detail that there is
a 1:1 mapping between NPTL threads and kernel threads.  It can change
at any time.
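
A minimal sketch of the portable alternative, assuming all the program
needs is to tell threads apart: compare the opaque pthread_t handles with
pthread_equal () instead of looking at kernel thread ids.  The function
names below are made up for illustration.

```c
#include <pthread.h>
#include <stdint.h>

/* Thread body: compare our own handle with the parent's handle.
   Returns 1 (as a pointer-sized integer) if they differ. */
static void *differs_from_parent(void *arg)
{
    pthread_t parent = *(pthread_t *)arg;

    return (void *)(intptr_t)!pthread_equal(pthread_self(), parent);
}

/* Returns 1 if the child saw a handle different from the parent's,
   0 if the handles compared equal, -1 on thread creation/join failure. */
int child_differs_from_parent(void)
{
    pthread_t parent = pthread_self();
    pthread_t child;
    void *result;

    if (pthread_create(&child, NULL, differs_from_parent, &parent) != 0)
        return -1;
    if (pthread_join(child, &result) != 0)
        return -1;
    return (int)(intptr_t)result;
}
```

Nothing here depends on what the bits inside pthread_t mean, so it keeps
working whichever threading library is underneath.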

Jakub


Re: ipc

2005-07-07 Thread Jakub Jelinek
On Thu, Jul 07, 2005 at 02:13:02PM +0200, Paolo Ornati wrote:
> You need to tell GCC to use "libmqueue"... something like this:
> 
>   gcc -Wall -O2 -o prog prog.c -lmqueue

If you have glibc 2.3.4 or later, you should use -lrt instead.

Jakub


Re: Realtime Preemption, 2.6.12, Beginners Guide?

2005-07-08 Thread Jakub Jelinek
On Fri, Jul 08, 2005 at 06:42:53PM +0100, Alistair John Strachan wrote:
> > btw., which gcc version are you using?
> 
> Not the GCC version known to bloat stacks ;-)
> 
> 3.4.4, on both my machines. I'm not touching 4.x until 4.0.1 is released with 
> the miscompiled-code fixes.

GCC 4.0.x bloats stacks less than 3.4.4.
And, if you are looking for 4.0.1, it was released yesterday.

Jakub


Re: Linux AIO status & todo

2005-08-23 Thread Jakub Jelinek
On Tue, Aug 23, 2005 at 01:14:38PM +0530, Suparna Bhattacharya wrote:

>   2. No support for propagating IO completion events to user space
>  threads using RT signals. User threads need to poll the completion
>  queue using io_getevents. POSIX specifies that when an AIO
>  request completes, a signal can be delivered to the application
>  to indicate the completion of the IO.

POSIX AIO needs to handle SIGEV_NONE, SIGEV_SIGNAL and SIGEV_THREAD
notification.  Obviously kernel shouldn't create threads for SIGEV_THREAD
itself, as kernel shouldn't hardcode all the implementation details how a
thread can be created.  But it would be good if AIO signalling e.g. handled
both SIGEV_SIGNAL and SIGEV_SIGNAL | SIGEV_THREAD_ID, with the same usage as
e.g. timer_* syscalls.  If kernel makes sure SI_ASYNCIO si_code is set in
the notification signal siginfos, glibc could even use just one helper
thread for timer_*/[al]io_* and maybe in the future other SIGEV_THREAD 
notification.
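
For reference, a small sketch of how userland describes the three POSIX
notification kinds in a struct sigevent.  The helper names are invented
for this example; the Linux-specific SIGEV_THREAD_ID variant is left out
because it needs a non-portable field.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <string.h>

/* No notification at all: the caller polls for completion. */
struct sigevent notify_none(void)
{
    struct sigevent sev;

    memset(&sev, 0, sizeof sev);
    sev.sigev_notify = SIGEV_NONE;
    return sev;
}

/* Deliver signal `signo`, carrying `cookie` in si_value. */
struct sigevent notify_by_signal(int signo, void *cookie)
{
    struct sigevent sev;

    memset(&sev, 0, sizeof sev);
    sev.sigev_notify = SIGEV_SIGNAL;
    sev.sigev_signo = signo;
    sev.sigev_value.sival_ptr = cookie;
    return sev;
}

/* Ask the C library to run `fn(cookie)` in a helper thread. */
struct sigevent notify_by_thread(void (*fn)(union sigval), void *cookie)
{
    struct sigevent sev;

    memset(&sev, 0, sizeof sev);
    sev.sigev_notify = SIGEV_THREAD;
    sev.sigev_notify_function = fn;
    sev.sigev_notify_attributes = NULL;  /* default thread attributes */
    sev.sigev_value.sival_ptr = cookie;
    return sev;
}
```

The same structure is what timer_create () and the aio_sigevent member of
struct aiocb take, which is why sharing one helper thread looks feasible.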

Jakub


[PATCH] FUTEX_WAKE_OP (pthread_cond_signal speedup)

2005-08-23 Thread Jakub Jelinek
Hi!

ATM pthread_cond_signal is unnecessarily slow, because it wakes one
waiter (which at least on UP usually means an immediate context switch
to one of the waiter threads).  This waiter wakes up and after a few
instructions it attempts to acquire the cv internal lock, but that lock
is still held by the thread calling pthread_cond_signal.  So it goes
to sleep and eventually the signalling thread is scheduled in, unlocks
the internal lock and wakes the waiter again.

Now, before 2003-09-21 NPTL was using FUTEX_REQUEUE in pthread_cond_signal
to avoid this performance issue, but it was removed when locks were
redesigned to the 3 state scheme (unlocked, locked uncontended, locked
contended).

Following scenario shows why simply using FUTEX_REQUEUE in
pthread_cond_signal together with using lll_mutex_unlock_force
in place of lll_mutex_unlock is not enough and probably why it
has been disabled at that time:

The number is value in cv->__data.__lock.
        thr1            thr2            thr3
0       pthread_cond_wait
1       lll_mutex_lock (cv->__data.__lock)
0       lll_mutex_unlock (cv->__data.__lock)
0       lll_futex_wait (&cv->__data.__futex, futexval)
0                       pthread_cond_signal
1                       lll_mutex_lock (cv->__data.__lock)
1                                       pthread_cond_signal
2                                       lll_mutex_lock (cv->__data.__lock)
2                                         lll_futex_wait (&cv->__data.__lock, 2)
2                       lll_futex_requeue (&cv->__data.__futex, 0, 1, &cv->__data.__lock)
                          # FUTEX_REQUEUE, not FUTEX_CMP_REQUEUE
2                       lll_mutex_unlock_force (cv->__data.__lock)
0                         cv->__data.__lock = 0
0                         lll_futex_wake (&cv->__data.__lock, 1)
1       lll_mutex_lock (cv->__data.__lock)
0       lll_mutex_unlock (cv->__data.__lock)
          # Here, lll_mutex_unlock doesn't know there are threads waiting
          # on the internal cv's lock
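
The problem with the final lll_mutex_lock/lll_mutex_unlock pair can be
seen in a tiny userspace model of the 3 state lock word.  This is only a
sketch: the names are invented, no futex syscalls are made, and each
function just reports what the real code would do next.

```c
#include <stdatomic.h>
#include <stdbool.h>

enum { UNLOCKED = 0, LOCKED = 1, LOCKED_CONTENDED = 2 };

/* Uncontended fast path: 0 -> 1.  Returns true if the lock was taken. */
bool model_trylock(atomic_int *lock)
{
    int expected = UNLOCKED;

    return atomic_compare_exchange_strong(lock, &expected, LOCKED);
}

/* Slow path step: mark the lock contended before sleeping on it.
   Returns true if the caller would now have to FUTEX_WAIT. */
bool model_mark_contended(atomic_int *lock)
{
    return atomic_exchange(lock, LOCKED_CONTENDED) != UNLOCKED;
}

/* Unlock.  Returns true only if the old value said "contended",
   i.e. only then would a FUTEX_WAKE be issued. */
bool model_unlock(atomic_int *lock)
{
    return atomic_exchange(lock, UNLOCKED) == LOCKED_CONTENDED;
}
```

A thread that takes the lock through the fast path leaves the word at 1,
so model_unlock () reports that no wake is needed, even if another thread
was requeued onto the lock word behind its back.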

Now, I believe it is possible to use FUTEX_REQUEUE in pthread_cond_signal,
but it will cost us not one, but 2 extra syscalls and, what's worse, one
of these extra syscalls will be done for every single waiting loop in
pthread_cond_*wait.
We would need to use lll_mutex_unlock_force in pthread_cond_signal
after requeue and lll_mutex_cond_lock in pthread_cond_*wait after
lll_futex_wait.

Another alternative is to do the unlocking pthread_cond_signal needs
to do (the lock can't be unlocked before lll_futex_wake, as that is racy)
in the kernel.

I have implemented both variants, futex-requeue-glibc.patch is the
first one and futex-wake_op{,-glibc}.patch is the unlocking
inside of the kernel.  The kernel interface allows userland to specify
how exactly an unlocking operation should look like (some atomic
arithmetic operation with optional constant argument and comparison
of the previous futex value with another constant).
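
To make "arithmetic operation plus comparison" concrete, here is a
non-atomic userspace model of such an operation.  The bit layout mirrors
the FUTEX_OP() encoding as I understand it from present-day
<linux/futex.h>; the patch text in this archive is truncated, so treat the
layout as an assumption.  All names are local to the example and argument
sign extension is ignored.

```c
#include <stdbool.h>

enum wake_op  { OP_SET = 0, OP_ADD = 1, OP_OR = 2, OP_ANDN = 3, OP_XOR = 4 };
enum wake_cmp { CMP_EQ = 0, CMP_NE = 1, CMP_LT = 2, CMP_LE = 3,
                CMP_GT = 4, CMP_GE = 5 };

/* Pack operation, its argument, comparison and its argument in one word. */
unsigned int encode_wake_op(enum wake_op op, int oparg,
                            enum wake_cmp cmp, int cmparg)
{
    return (((unsigned int)op & 0xf) << 28) |
           (((unsigned int)cmp & 0xf) << 24) |
           (((unsigned int)oparg & 0xfff) << 12) |
           ((unsigned int)cmparg & 0xfff);
}

/* Apply the operation to *futex_word (NOT atomically, this is a model)
   and return whether the OLD value satisfies the comparison, i.e.
   whether waiters on the second futex would be woken. */
bool apply_wake_op(int *futex_word, unsigned int encoded)
{
    int op = (encoded >> 28) & 0xf;
    int cmp = (encoded >> 24) & 0xf;
    int oparg = (encoded >> 12) & 0xfff;
    int cmparg = encoded & 0xfff;
    int old = *futex_word;

    switch (op) {
    case OP_SET:  *futex_word = oparg;        break;
    case OP_ADD:  *futex_word = old + oparg;  break;
    case OP_OR:   *futex_word = old | oparg;  break;
    case OP_ANDN: *futex_word = old & ~oparg; break;
    case OP_XOR:  *futex_word = old ^ oparg;  break;
    default:      return false;
    }

    switch (cmp) {
    case CMP_EQ: return old == cmparg;
    case CMP_NE: return old != cmparg;
    case CMP_LT: return old <  cmparg;
    case CMP_LE: return old <= cmparg;
    case CMP_GT: return old >  cmparg;
    case CMP_GE: return old >= cmparg;
    default:     return false;
    }
}
```

For the cv internal lock the interesting combination is "set to 0, wake if
the old value was greater than 1": the lock gets released and a waiter on
it is woken only if it was marked contended.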

It has been implemented just for ppc*, x86_64 and i?86, for other
architectures I'm including just a stub header which can be used as
a starting point by maintainers to write support for their arches
and ATM will just return -ENOSYS for FUTEX_WAKE_OP.  The requeue
patch has been (lightly) tested just on x86_64, the wake_op patch
on ppc64 kernel running 32-bit and 64-bit NPTL and x86_64 kernel running
32-bit and 64-bit NPTL.

With the following benchmark on UP x86-64 I get:

for i in nptl-orig nptl-requeue nptl-wake_op; do echo time elf/ld.so --library-path .:$i /tmp/bench; \
for j in 1 2; do echo ( time elf/ld.so --library-path .:$i /tmp/bench ) 2>&1; done; done
time elf/ld.so --library-path .:nptl-orig /tmp/bench
real 0m0.655s user 0m0.253s sys 0m0.403s
real 0m0.657s user 0m0.269s sys 0m0.388s
time elf/ld.so --library-path .:nptl-requeue /tmp/bench
real 0m0.496s user 0m0.225s sys 0m0.271s
real 0m0.531s user 0m0.242s sys 0m0.288s
time elf/ld.so --library-path .:nptl-wake_op /tmp/bench
real 0m0.380s user 0m0.176s sys 0m0.204s
real 0m0.382s user 0m0.175s sys 0m0.207s

The benchmark is at:
http://sourceware.org/ml/libc-alpha/2005-03/txt1.txt
Older futex-requeue-glibc.patch version is at:
http://sourceware.org/ml/libc-alpha/2005-03/txt2.txt
Older futex-wake_op-glibc.patch version is at:
http://sourceware.org/ml/libc-alpha/2005-03/txt3.txt
Will post a new version (just x86-64 fixes so that the patch
applies against pthread_cond_signal.S) to libc-hacker ml soon.

Attached is the kernel FUTEX_WAKE_OP patch as well as a simple-minded
testcase that will not test the atomicity of the operation, but at least
check if the threads that should have been woken up are woken up and
whether the arithmetic operation in the kernel gave the expected results.

Jakub
--- linux-2.6.12/include/linux/futex.h.jj   2005-06-17 21:48:29.0 +0200
+++ linux-2.6.12/include/linux/futex.h  2005-08-23 11:11:41.0 +0200
@@ -4,14 +4,40 @@
 /* Second argument to futex syscall */
 
 
-#define FUTEX_WAIT

Re: [PATCH] FUTEX_WAKE_OP (pthread_cond_signal speedup)

2005-08-23 Thread Jakub Jelinek
On Tue, Aug 23, 2005 at 10:36:08AM -0400, Ingo Molnar wrote:
> a detail: many of the futex_atomic_op_inuser() seem to be duplicated
> across architectures. Might be worth putting into asm-generic, to avoid
> the duplication?

Those are stub files waiting for arch maintainers to actually implement
them, so they will be eventually different, but for the time being they
just return -ENOSYS, so that things compile.

Jakub


Re: MAX_ARG_PAGES has no effect?

2005-08-31 Thread Jakub Jelinek
On Wed, Aug 31, 2005 at 02:11:44PM +0200, Ingo Molnar wrote:
> > I recompiled and installed the kernel, but there's no change (getconf 
> > ARG_MAX still gives 131072.)  What am I missing?
> 
> MAX_ARG_PAGES should work just fine. I think the 'getconf ARG_MAX' 
> output is hardcoded. (because the kernel does not provide the 
> information dynamically)

Yeah, you get the value of ARG_MAX from  that was compiled
in when you compiled glibc.
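
A program that cares can at least ask at run time instead of trusting a
compile-time constant.  A minimal sketch (the function name is invented);
note that what sysconf () reports is still up to the C library, and as far
as I know newer glibc versions derive it from the stack rlimit rather than
from a header constant.

```c
#include <unistd.h>

/* Ask the C library for the argument-size limit at run time.
   Returns -1 if the limit is indeterminate or on error. */
long runtime_arg_max(void)
{
    return sysconf(_SC_ARG_MAX);
}
```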

Jakub


Re: [discuss] [2.6 patch] include/asm-x86_64 "extern inline" -> "static inline"

2005-09-05 Thread Jakub Jelinek
On Mon, Sep 05, 2005 at 08:00:05PM +0200, Adrian Bunk wrote:
> It isn't the same, but "static inline" is the correct variant.
> 
> "extern inline __attribute__((always_inline))" (which is what
> "extern inline" is expanded to) doesn't make sense.

It does make sense and is different from
static inline __attribute__((always_inline)).
Try:
static inline __attribute__((always_inline)) void foo (void) {}
void (*fn)(void) = foo;
vs.
extern inline __attribute__((always_inline)) void foo (void) {}
void (*fn)(void) = foo;
In the former case, GCC will emit the out of line static copy of foo
if you take its address, in the latter case either you provide foo
function by other means, or you get linker error.
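
The static inline half is easy to try in a complete file.  This is a
sketch with invented names; the extern inline half is not shown as
runnable code because, under the GNU inline semantics discussed here, it
only links if an out-of-line definition is supplied by another object.

```c
/* Address taken below, so GCC emits an out-of-line static copy. */
static inline __attribute__((always_inline)) int twice(int x)
{
    return 2 * x;
}

/* Taking the address forces the out-of-line copy to exist. */
int (*twice_ptr)(int) = twice;

int call_through_pointer(int x)
{
    /* An indirect call cannot be inlined, so this uses the static copy. */
    return twice_ptr(x);
}
```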

Jakub


Re: Possible 2.6.24-rc7 issue w/respect to pthreads

2008-01-09 Thread Jakub Jelinek
On Wed, Jan 09, 2008 at 02:35:32AM -0800, [EMAIL PROTECTED] wrote:
> After I patched my 2.6.23 kernel to 2.6.24-rc7 this morning, I noticed
> some odd behavior with respect to POSIX threads in a test program I had
> written (originally to test epoll.)
> 
> The behavior is as follows:
> 
> 1.  main() creates a new thread of execution with pthread_create
> 2.  thread_func() immediately calls pthread_detach(), which is supposed to
> ensure that thread resources are cleaned up when the thread terminates.
> 3.  The spawned thread sleeps and then prints a message "got here"
> 4.  The main thread calls pthread_join().  According to the POSIX
> documentation, this should suspend execution until the spawned thread has
> terminated.

Your testcase is buggy.  Detached threads aren't joinable, you can't call
pthread_join on them.
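
A sketch of the two valid patterns, with invented function names: either
keep the thread joinable and join it, or detach it and synchronize by
some other means (a semaphore here), but never both on the same thread.

```c
#include <pthread.h>
#include <semaphore.h>
#include <stdint.h>

static void *add_one(void *arg)
{
    return (void *)((intptr_t)arg + 1);
}

/* Pattern 1: joinable thread, result collected with pthread_join. */
int run_joinable(int input)
{
    pthread_t t;
    void *result;

    if (pthread_create(&t, NULL, add_one, (void *)(intptr_t)input) != 0)
        return -1;
    if (pthread_join(t, &result) != 0)
        return -1;
    return (int)(intptr_t)result;
}

struct detached_job {
    sem_t done;
    int value;
};

static void *detached_worker(void *arg)
{
    struct detached_job *job = arg;

    job->value = 42;
    sem_post(&job->done);   /* last access to *job */
    return NULL;
}

/* Pattern 2: detached thread, completion signalled via a semaphore. */
int run_detached(void)
{
    struct detached_job job;
    pthread_attr_t attr;
    pthread_t t;
    int rc, value;

    job.value = 0;
    sem_init(&job.done, 0, 0);
    pthread_attr_init(&attr);
    pthread_attr_setdetachstate(&attr, PTHREAD_CREATE_DETACHED);
    rc = pthread_create(&t, &attr, detached_worker, &job);
    pthread_attr_destroy(&attr);
    if (rc != 0) {
        sem_destroy(&job.done);
        return -1;
    }
    while (sem_wait(&job.done) != 0)
        ;                   /* retry if interrupted by a signal */
    value = job.value;
    sem_destroy(&job.done);
    return value;
}
```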

Jakub


Re: Fedora's latest gcc produces unbootable kernels

2007-12-03 Thread Jakub Jelinek
On Mon, Dec 03, 2007 at 09:17:22AM +0100, Thomas Gleixner wrote:
> I looked at the disassembly but I can not spot the problem.
> 
> I think the real problem is somewhere else. Likely candidates are
> hrtimer_forward() or hrtimer_start() - in that order.

Should be hopefully fixed in latest Fedora gcc.  The problem was in code like
typedef union { long long int s; } U;
typedef struct { U u; } S;

void foo (S *s, long long int x, unsigned long int y)
{
  s->u = ({ (U) { .s = s->u.s + x * y }; });
}

where a backport of a recent optimization of mine (without which gcc handles
initializers from compound literals terribly, and that is something hrtimer
uses just about everywhere; why can't ktime.h for
#if BITS_PER_LONG == 64 || defined(CONFIG_KTIME_SCALAR)
just use a scalar rather than a union with a scalar in it??) sets the LHS
object to the compound literal's initializer rather than forcing creation of
a temporary object (the compound literal).  Unfortunately the gimplifier
had some bugs in case the initializer references (or at least might
reference) parts of the LHS object.  Fixed by backporting 2 Ada bugfixes for the
gimplifier from GCC trunk (Ada was hitting those bugs even without this
compound literal optimization).
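
For anybody who wants to check a given compiler, the snippet above can be
wrapped in a self-checking function.  This is just a sketch: the checker
name and the input values are made up, and passing it does not prove the
compiler is free of related bugs.

```c
typedef union { long long int s; } U;
typedef struct { U u; } S;

/* The initializer reads s->u.s while s->u is the object being assigned. */
void foo(S *s, long long int x, unsigned long int y)
{
    s->u = ({ (U) { .s = s->u.s + x * y }; });
}

/* Returns 1 if foo produced old value + x * y, 0 if it was miscompiled. */
int foo_is_compiled_correctly(void)
{
    S s = { .u = { .s = 5 } };

    foo(&s, 3, 4);
    return s.u.s == 17;
}
```

Build it with optimization (e.g. -O2), since the problem described was in
an optimization.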

Jakub


Re: Fedora's latest gcc produces unbootable kernels

2007-12-03 Thread Jakub Jelinek
On Mon, Dec 03, 2007 at 12:34:17PM +0100, Thomas Gleixner wrote:
> Of course just to annoy you :)

It doesn't matter whether I'm annoyed about this or not, but whether gcc is
able to generate decent code with it or not.  And especially with union it
is not, at least through all the tree ssa passes.  You already have a lot of
the details hidden in ktime.h accessor inlines, so I don't think it would be
hard to add further one or two.

Anyway, even just using typedef struct ktime { s64 tv64; } ktime_t; could
make things better in case you have just one field.  Unlike unions, structs
can be (and in this case most likely will be) scalarized by SRA, so
half of tree SSA passes will see it as integral var and will be able to
perform optimizations on it.
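
Something along these lines is what I mean by a single-field struct hidden
behind accessor inlines.  It is a sketch only: the type and function names
are hypothetical, not the ones from ktime.h.

```c
#include <stdint.h>

/* One-field struct: keeps type safety, but can be scalarized. */
typedef struct { int64_t tv64; } ktime_sketch_t;

static inline ktime_sketch_t ktime_sketch_set(int64_t ns)
{
    ktime_sketch_t k;

    k.tv64 = ns;
    return k;
}

static inline ktime_sketch_t ktime_sketch_add(ktime_sketch_t a,
                                              ktime_sketch_t b)
{
    ktime_sketch_t sum;

    sum.tv64 = a.tv64 + b.tv64;
    return sum;
}

static inline int64_t ktime_sketch_to_ns(ktime_sketch_t k)
{
    return k.tv64;
}

/* Callers never touch the field directly, only the accessors. */
int64_t ktime_sketch_example(void)
{
    return ktime_sketch_to_ns(ktime_sketch_add(ktime_sketch_set(1000),
                                               ktime_sketch_set(234)));
}
```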

Jakub


Re: Futex queue_me/get_user ordering

2005-03-17 Thread Jakub Jelinek
On Thu, Nov 18, 2004 at 02:47:26PM -0500, Jakub Jelinek wrote:
> The scenario described in futex_wait-fix.patch IMHO can happen even
> if all calls to pthread_cond_signal are done with mutex held around it, i.e.
> A       B       X       Y
> pthread_mutex_lock (&mtx);
> pthread_cond_wait (&cv, &mtx);
>   - mtx release *)
>   total++ [1/0/0] (0) {}
>                 pthread_mutex_lock (&mtx);
>                 pthread_cond_signal (&cv);
>                   - wake++ [1/1/0] (1) {}
>                   FUTEX_WAKE, 1 (returns, nothing is queued)
>                 pthread_mutex_unlock (&mtx);
>         pthread_mutex_lock (&mtx);
>         pthread_cond_wait (&cv, &mtx);
>           - mtx release *)
>           total++ [2/1/0] (1) {}
>   FUTEX_WAIT, 0
>   queue_me [2/1/0] (1) {A}
>   0 != 1
>           FUTEX_WAIT, 1
>           queue_me [2/1/0] (1) {A,B}
>           1 == 1
>                         pthread_mutex_lock (&mtx);
>                         pthread_cond_signal (&cv);
>                           - wake++ [2/2/0] (2) {A,B}
>                           FUTEX_WAKE, 1 (unqueues incorrectly A)
>                           [2/2/0] (2) {B}
>                         pthread_mutex_unlock (&mtx);
>   try to dequeue but already dequeued
>   would normally return EWOULDBLOCK here
>   but as unqueue_me failed, returns 0
>   woken++ [2/2/1] (2) {B}
>           schedule_timeout (forever)
>   - mtx reacquire
>   pthread_cond_wait returns
> pthread_mutex_unlock (&mtx);
> 
>           ---
>           the code would like to say pthread_mutex_unlock (&mtx);
>           and pthread_exit here, but never reaches there.
...

http://www.ussg.iu.edu/hypermail/linux/kernel/0411.2/0953.html

Your argument in November was that you don't want to slow down the
kernel and that userland must be able to cope with the
non-atomicity of futex syscall.

But with the recent changes to futex.c I think kernel can ensure
atomicity for free.

With get_futex_value_locked doing the user access in_atomic () and
repeating if that failed, I think it would be just a matter of
something as in the patch below (totally untested though).
It would simplify requeue implementation (getting rid of the nqueued
field), as well as never enqueue a futex in futex_wait until
the *uaddr == val uaccess check has shown it should be enqueued.
And I don't think the kernel will be any slower because of that,
in the common case where get_futex_value_locked does not cause
a mm fault (userland typically accessed that memory a few cycles before
the syscall), the futex_wait change is just about doing first half of
queue_me before the user access and second half after it.
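
The intended ordering, as a userspace model rather than kernel code (a
pthread mutex stands in for the hash bucket spinlock, a counter for the
wait queue, and all names are invented):

```c
#include <errno.h>
#include <pthread.h>

struct bucket_model {
    pthread_mutex_t lock;   /* stands in for the hash bucket spinlock */
    int queued;             /* stands in for the wait queue length */
};

/* Check *uaddr == val and enqueue under one lock hold, so that a waker
   taking the same lock can never see a waiter with a stale value.
   Returns 0 if the waiter was queued, EWOULDBLOCK otherwise. */
int model_futex_wait(struct bucket_model *b, const int *uaddr, int val)
{
    int ret = 0;

    pthread_mutex_lock(&b->lock);   /* "first half of queue_me" */
    if (*uaddr != val)
        ret = EWOULDBLOCK;          /* value changed: never enqueued */
    else
        b->queued++;                /* "second half": actually enqueue */
    pthread_mutex_unlock(&b->lock);
    return ret;
}
```

The real code additionally has to cope with the user access faulting while
the lock is held, which is what the in_atomic () retry path is for.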

--- linux-2.6.11/kernel/futex.c.jj  2005-03-17 04:42:29.0 -0500
+++ linux-2.6.11/kernel/futex.c 2005-03-17 05:13:45.0 -0500
@@ -97,7 +97,6 @@ struct futex_q {
  */
 struct futex_hash_bucket {
spinlock_t  lock;
-   unsigned intnqueued;
struct list_head   chain;
 };
 
@@ -265,7 +264,6 @@ static inline int get_futex_value_locked
inc_preempt_count();
ret = __copy_from_user_inatomic(dest, from, sizeof(int));
dec_preempt_count();
-   preempt_check_resched();
 
return ret ? -EFAULT : 0;
 }
@@ -339,7 +337,6 @@ static int futex_requeue(unsigned long u
struct list_head *head1;
struct futex_q *this, *next;
int ret, drop_count = 0;
-   unsigned int nqueued;
 
  retry:
down_read(&current->mm->mmap_sem);
@@ -354,23 +351,24 @@ static int futex_requeue(unsigned long u
bh1 = hash_futex(&key1);
bh2 = hash_futex(&key2);
 
-   nqueued = bh1->nqueued;
+   if (bh1 < bh2)
+   spin_lock(&bh1->lock);
+   spin_lock(&bh2->lock);
+   if (bh1 > bh2)
+   spin_lock(&bh1->lock);
+
if (likely(valp != NULL)) {
int curval;
 
-   /* In order to avoid doing get_user while
-  holding bh1->lock and bh2->lock, nqueued
-  (monotonically increasing field) must be first
-  read, then *uaddr1 fetched from userland and
-  after acquiring lock nqueued field compared with
-  the stored value.  The smp_mb () below
-  makes sure that bh1->nqueued is read from memory
-  before *uaddr1.  */
-   smp_mb();
-
ret = get_futex_value_locked(&curval, (int __us

Re: Futex queue_me/get_user ordering

2005-03-17 Thread Jakub Jelinek
On Thu, Mar 17, 2005 at 03:20:31PM +, Jamie Lokier wrote:
> If you change futex_wait to be "atomic", and then have userspace locks
> which _depend_ on that atomicity, it becomes impossible to wait on
> multiple of those locks, or make poll-driven state machines which can
> wait on those locks.

The futex man pages that have been around for years (at least since mid 2002)
don't document FUTEX_WAIT as a token passing operation, but as an atomic
operation:

Say http://www.icewalkers.com/Linux/ManPages/futex-2.html

FUTEX_WAIT
This operation atomically verifies that the futex
address still contains the value given, and sleeps
awaiting FUTEX_WAKE on this futex address.  If the
timeout argument is non-NULL, its contents describe
the maximum duration of the wait, which is infinite
otherwise.  For futex(4), this call is executed if
decrementing the count gave a negative value
(indicating contention), and will sleep until another
process releases the futex and executes the
FUTEX_WAKE operation.

RETURN VALUE
FUTEX_WAIT
Returns 0 if the process was woken by a FUTEX_WAKE
call. In case of timeout, ETIMEDOUT is returned. If
the futex was not equal to the expected value, the
operation returns EWOULDBLOCK.  Signals (or other
spurious wakeups) cause FUTEX_WAIT to return EINTR.

so there very well might be programs other than glibc that
depend on this behaviour.  Given that in most cases the race
is not hit every day (after all, we have been living with it for
several years), they probably wouldn't know there is a problem
like that.
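
The documented EWOULDBLOCK case is easy to observe from userland.  A
minimal sketch, assuming Linux with kernel headers installed (the function
name is invented); it only exercises the value-mismatch path and says
nothing about the race itself.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <linux/futex.h>
#include <stddef.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Call FUTEX_WAIT expecting 0 while the futex word holds 1.
   Returns the errno of the failed call, or 0 if the call returned
   successfully (which should not happen here). */
int futex_wait_with_stale_value(void)
{
    int word = 1;
    long rc = syscall(SYS_futex, &word, FUTEX_WAIT, 0, NULL, NULL, 0);

    return rc == -1 ? errno : 0;
}
```

On Linux EWOULDBLOCK and EAGAIN are the same number, so either name can be
used when checking the result.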

> You can do userspace threading and simulate most blocking system calls
> by making them non-blocking and using poll).

Sure, but then you need to write your own locking as well and
can just use the token passing property of futexes there.

> It's not a _huge_ loss, but considering it's only Glibc which is
> demanding this and futexes have another property, token-passing, which
> Glibc could be using instead - why not use it?

Because that requires requeue being done with the cv lock held, which
means an extra context switch.

> > @@ -265,7 +264,6 @@ static inline int get_futex_value_locked
> > inc_preempt_count();
> > ret = __copy_from_user_inatomic(dest, from, sizeof(int));
> > dec_preempt_count();
> > -   preempt_check_resched();
> >  
> > return ret ? -EFAULT : 0;
> >  }
> 
> inc_preempt_count() and dec_preempt_count() aren't needed, as
> preemption is disabled by the queue spinlocks.  So
> get_futex_value_locked isn't needed any more: with the spinlocks held,
> __get_user will do.

They aren't needed if CONFIG_PREEMPT.  But with !CONFIG_PREEMPT, they
are IMHO still needed, as spin_lock/spin_unlock call preempt_{disable,enable},
which is a nop if !CONFIG_PREEMPT.
__get_user can't be used though, it should be __get_user_inatomic
(or __copy_from_user_inatomic if the former doesn't exist).

> > [numerous instances of...]
> > +   preempt_check_resched();
> 
> Not required.  The spin unlocks will do this.

True, preempt_check_resched() is a nop if !CONFIG_PREEMPT and for
CONFIG_PREEMPT spin_unlock will handle it.  Will remove them from the
patch.

> > But with the recent changes to futex.c I think kernel can ensure
> > atomicity for free.
> 
> I agree it would probably not slow the kernel, but I would _strongly_
> prefer that Glibc were fixed to use the token-passing property, if
> Glibc is the driving intention behind this patch - instead of this
> becoming a semantic that application-level users of futex (like
> database and IPC libraries) come to depend on and which can't be
> decomposed into a multiple-waiting form.
> 
> (I admit that the kernel code does look nicer with
> get_futex_value_locked gone, though).
> 
> By the way, do you know of Scott Snyder's recent work on fixing Glibc
> in this way?  He bumped into one of Glibc's currently broken corner
> cases, fixed it (according to the algorithm I gave in November), and
> reported that it works fine with the fix.

I certainly haven't seen his patch.

Jakub


Re: Futex queue_me/get_user ordering

2005-03-18 Thread Jakub Jelinek
On Thu, Mar 17, 2005 at 03:20:31PM +, Jamie Lokier wrote:
> > [numerous instances of...]
> > +   preempt_check_resched();
> 
> Not required.  The spin unlocks will do this.

Here is an updated patch with those removed (all of them are preceded
by spin_unlock) and with the out_unqueue label and the following unused
code removed too.

--- linux-2.6.11/kernel/futex.c.jj  2005-03-17 04:42:29.0 -0500
+++ linux-2.6.11/kernel/futex.c 2005-03-18 05:45:29.0 -0500
@@ -97,7 +97,6 @@ struct futex_q {
  */
 struct futex_hash_bucket {
spinlock_t  lock;
-   unsigned intnqueued;
struct list_head   chain;
 };
 
@@ -265,7 +264,6 @@ static inline int get_futex_value_locked
inc_preempt_count();
ret = __copy_from_user_inatomic(dest, from, sizeof(int));
dec_preempt_count();
-   preempt_check_resched();
 
return ret ? -EFAULT : 0;
 }
@@ -339,7 +337,6 @@ static int futex_requeue(unsigned long u
struct list_head *head1;
struct futex_q *this, *next;
int ret, drop_count = 0;
-   unsigned int nqueued;
 
  retry:
down_read(&current->mm->mmap_sem);
@@ -354,23 +351,22 @@ static int futex_requeue(unsigned long u
bh1 = hash_futex(&key1);
bh2 = hash_futex(&key2);
 
-   nqueued = bh1->nqueued;
+   if (bh1 < bh2)
+   spin_lock(&bh1->lock);
+   spin_lock(&bh2->lock);
+   if (bh1 > bh2)
+   spin_lock(&bh1->lock);
+
if (likely(valp != NULL)) {
int curval;
 
-   /* In order to avoid doing get_user while
-  holding bh1->lock and bh2->lock, nqueued
-  (monotonically increasing field) must be first
-  read, then *uaddr1 fetched from userland and
-  after acquiring lock nqueued field compared with
-  the stored value.  The smp_mb () below
-  makes sure that bh1->nqueued is read from memory
-  before *uaddr1.  */
-   smp_mb();
-
ret = get_futex_value_locked(&curval, (int __user *)uaddr1);
 
if (unlikely(ret)) {
+   spin_unlock(&bh1->lock);
+   if (bh1 != bh2)
+   spin_unlock(&bh2->lock);
+
/* If we would have faulted, release mmap_sem, fault
 * it in and start all over again.
 */
@@ -385,21 +381,10 @@ static int futex_requeue(unsigned long u
}
if (curval != *valp) {
ret = -EAGAIN;
-   goto out;
+   goto out_unlock;
}
}
 
-   if (bh1 < bh2)
-   spin_lock(&bh1->lock);
-   spin_lock(&bh2->lock);
-   if (bh1 > bh2)
-   spin_lock(&bh1->lock);
-
-   if (unlikely(nqueued != bh1->nqueued && valp != NULL)) {
-   ret = -EAGAIN;
-   goto out_unlock;
-   }
-
head1 = &bh1->chain;
list_for_each_entry_safe(this, next, head1, list) {
if (!match_futex (&this->key, &key1))
@@ -435,13 +420,9 @@ out:
return ret;
 }
 
-/*
- * queue_me and unqueue_me must be called as a pair, each
- * exactly once.  They are called with the hashed spinlock held.
- */
-
 /* The key must be already stored in q->key. */
-static void queue_me(struct futex_q *q, int fd, struct file *filp)
+static inline struct futex_hash_bucket *
+queue_lock(struct futex_q *q, int fd, struct file *filp)
 {
struct futex_hash_bucket *bh;
 
@@ -455,11 +436,35 @@ static void queue_me(struct futex_q *q, 
q->lock_ptr = &bh->lock;
 
spin_lock(&bh->lock);
-   bh->nqueued++;
+   return bh;
+}
+
+static inline void __queue_me(struct futex_q *q, struct futex_hash_bucket *bh)
+{
list_add_tail(&q->list, &bh->chain);
spin_unlock(&bh->lock);
 }
 
+static inline void
+queue_unlock(struct futex_q *q, struct futex_hash_bucket *bh)
+{
+   spin_unlock(&bh->lock);
+   drop_key_refs(&q->key);
+}
+
+/*
+ * queue_me and unqueue_me must be called as a pair, each
+ * exactly once.  They are called with the hashed spinlock held.
+ */
+
+/* The key must be already stored in q->key. */
+static void queue_me(struct futex_q *q, int fd, struct file *filp)
+{
+   struct futex_hash_bucket *bh;
+   bh = queue_lock(q, fd, filp);
+   __queue_me(q, bh);
+}
+
 /* Return 1 if we were still queued (ie. 0 means we were woken) */
 static int unqueue_me(struct futex_q *q)
 {
@@ -503,6 +508,7 @@ static int futex_wait(unsigned long uadd
DECLARE_WAITQUEUE(wait, current);
int ret, curval;
struct futex_q q;
+   struct futex_hash_bucket *bh;
 
  retry:
down_read(&current->mm->mmap_sem);
@@ -511,7 +517,7 @@ static int futex_wait(unsigned long uadd
if (unlikely(ret != 0))
goto out_release_sem;
 
-   queue_me(&q, -1, 

Re: kernel bug: futex_wait hang

2005-03-21 Thread Jakub Jelinek
On Tue, Mar 22, 2005 at 12:30:53AM -0500, Lee Revell wrote:
> On Mon, 2005-03-21 at 21:08 -0800, Andrew Morton wrote:
> > Jamie Lokier <[EMAIL PROTECTED]> wrote:
> > > 
> > > The most recent messages under "Futex queue_me/get_user ordering",
> > > with a patch from Jakub Jelinek will fix this problem by changing the
> > > kernel.  Yes, you should apply Jakub's most recent patch, message-ID
> > > "<[EMAIL PROTECTED]>".
> > > 
> > > I have not tested the patch, but it looks convincing.
> > 
> > OK, thanks.  Lee && Paul, that's at
> > ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.12-rc1/2.6.12-rc1-mm1/broken-out/futex-queue_me-get_user-ordering-fix.patch
> > 
> 
> Does not fix the problem.

Have you analyzed the use of mutexes/condvars in the program?
The primary suspect is a deadlock, race of some kind or other bug
in the program.  All these will show up as a hang in FUTEX_WAIT.
The argument that it works with LinuxThreads doesn't count,
the timing and internals of both threading libraries are so different
that a program bug can easily show up with only one of the threading
libraries and not the other.
Only once you distill a minimal self-contained testcase that proves
the program is correct and it gets analyzed, it is time to talk about
NPTL or kernel bugs.

Jakub


Re: kernel bug: futex_wait hang

2005-03-23 Thread Jakub Jelinek
On Wed, Mar 23, 2005 at 05:12:59AM -0800, [EMAIL PROTECTED] wrote:
> the hang occurs during an attempted thread cancel+join. we know from
> strace that one thread calls tgkill() on the other. the other thread is
> blocked in a poll call on a FIFO. after tgkill, the first thread enters a
> futex wait, apparently waiting for the thread ID of the cancelled thread
> to appear at some location (just a guess based on the info from strace).
> the wait never returns, and so the first thread ends up hung in
> pthread_join(). there are no user-defined mutexes or condvars involved.

If the thread that is to be cancelled is in async cancel state (it should
be when waiting in a poll and if cancellation is not disabled in that thread),
then pthread_cancel sends a SIGCANCEL signal to it via tgkill.
If tgkill succeeds (and thus pthread_cancel succeeds too) and you call
pthread_join on it, in the likely case the thread is still alive
pthread_join will FUTEX_WAIT on pd->tid, waiting until the thread dies.
NPTL threads are created with CLONE_CHILD_CLEARTID &self->tid, so this
futex will be FUTEX_WAKEd by mm_release in kernel whenever the thread is
exiting (or dying in some other way).

So, if pthread_join waits for the thread forever, the thread must be
around (otherwise pthread_join would not block on it; well, there could
be memory corruption in the program and anything would be possible then).
This would mean either that the poll has not been woken by the SIGCANCEL
signal, or e.g. that one of the registered cleanup handlers (or C++
destructors) in the thread that is being cancelled gets stuck for whatever
reason (deadlock, etc.).
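
To make the sequence above concrete, here is a minimal sketch, assuming a
POSIX threads environment.  One thread blocks in poll() on a pipe that is
never written to; the caller cancels it and joins it.  The names poller,
on_cancel and cancel_and_join_demo are made up for illustration, and this is
not the reporter's program.

```c
#include <poll.h>
#include <pthread.h>
#include <unistd.h>

static int cleanup_ran;  /* set by the cleanup handler, read after the join */

static void on_cancel(void *arg)
{
    (void)arg;
    cleanup_ran = 1;     /* a handler that blocked here would hang the joiner */
}

static void *poller(void *arg)
{
    struct pollfd pfd = { .fd = *(int *)arg, .events = POLLIN };

    pthread_cleanup_push(on_cancel, NULL);
    /* poll() is a cancellation point; nothing is ever written to the pipe */
    for (;;)
        poll(&pfd, 1, -1);
    pthread_cleanup_pop(0);
    return NULL;
}

/* Returns 1 if the thread was cancelled and its cleanup handler ran,
   0 if not, -1 on a setup error. */
int cancel_and_join_demo(void)
{
    int fds[2];
    pthread_t t;
    void *res = NULL;

    cleanup_ran = 0;
    if (pipe(fds) != 0)
        return -1;
    if (pthread_create(&t, NULL, poller, &fds[0]) != 0)
        return -1;
    if (pthread_cancel(t) != 0)
        return -1;
    if (pthread_join(t, &res) != 0)  /* blocks until the thread is gone */
        return -1;
    close(fds[0]);
    close(fds[1]);
    return res == PTHREAD_CANCELED && cleanup_ran;
}
```

If on_cancel never returned (say, it tried to take a lock held elsewhere),
the pthread_join call would wait forever, which from the outside looks
exactly like a hang in FUTEX_WAIT.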

Jakub


Re: Patch 4/6 randomize the stack pointer

2005-01-29 Thread Jakub Jelinek
On Sat, Jan 29, 2005 at 01:31:46AM -0500, John Richard Moser wrote:
> Finally, although an NX stack is nice, you should probably take into
> account IBM's stack smash protector, ProPolice.  Any attack that can
> evade SSP reliably can evade an NX stack; but ProPolice protects from
> other overflows.  Now I'm sure RH is over there inventing something that
> detects buffer overflows at compile time and misses or warns about the
> ones it can't identify:
> 
> if (strlen(a) > 4)
>   a[5] = '\0';
> foo(a);
> 
> void foo(char *a) {
>char b[5];
>strcpy(b,a);
> }
> 
> This code is safe, but you can't tell from looking at foo().  You don't
> get a look at every other object being compiled against this one that
> may call foo() either.  So compile time buffer overflow detection is a
> best-effort at best.

If strlen(a) > 4 above, then a program compiled with -D_FORTIFY_SOURCE={1,2}
will be terminated in the strcpy call.  At compile time the compiler computes
that the strcpy call can fill in at most 5 bytes, and if it copies more at
runtime, the program is terminated.
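
As a rough illustration of the idea (not glibc's implementation, which as
far as I know routes the call through __builtin_object_size and a checked
variant of strcpy), here is a hand-rolled sketch.  The helper
checked_strcpy and the macro CHECKED_STRCPY are hypothetical names, and the
sketch returns -1 where the real thing would terminate the program.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copy src into dst only if it fits, NUL included. */
static int checked_strcpy(char *dst, size_t dst_size, const char *src)
{
    size_t need = strlen(src) + 1;

    if (need > dst_size)
        return -1;      /* the fortified function would abort here */
    memcpy(dst, src, need);
    return 0;
}

/* The destination size is known at compile time from the array type. */
#define CHECKED_STRCPY(dst, src) checked_strcpy((dst), sizeof(dst), (src))

int foo(const char *a)
{
    char b[5];

    return CHECKED_STRCPY(b, a);
}
```

With this sketch foo("1234") succeeds and foo("12345") is rejected, which
matches the "at most 5 bytes" bound computed for b.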

> ProPolice protects local variables with 0 overhead; passed arguments
> with a few instructions; and the return pointer and stack frame pointer
> with a couple instructions.  At runtime.  Want to impress me?  Actually
> deploy ProPolice instead of showing up 3 years from now waving around
> your own patch that you wrote that half-impliments half of it.  If you
> want "something better," it's GPL, so grab it and start hacking.

__builtin_object_size () checking/-D_FORTIFY_SOURCE=n changes are (partly)
orthogonal to ProPolice.  There are exploits prevented by
-D_FORTIFY_SOURCE={1,2} checking and not ProPolice and vice versa.
Things that the former protects and the latter does not are e.g.
some non-automatic buffer overflows or heap overflows, some format string
vulnerabilities and for automatic variables e.g. those that don't
overflow into another function's frame, but just overwrite other local
variables in the same function.  ProPolice on the other hand will detect
stack overflows that overflow into another function's frame, even if they
aren't done through string operations (<string.h>, s*printf, gets, etc.)
or if the compiler can't figure out what certain arguments to these
functions point to (and where) at compile time.

The ideas in IBM's ProPolice changes are good and worth
implementing, but the current implementation is bad.

FYI, you can find some details about -D_FORTIFY_SOURCE=n in
http://gcc.gnu.org/ml/gcc-patches/2004-09/msg02055.html

Jakub


Re: ppc32 weirdness with gcc-4.0 in 2.6.11-rc4

2005-02-24 Thread Jakub Jelinek
On Thu, Feb 24, 2005 at 04:08:47PM +0100, Mikael Pettersson wrote:
> /* gcc4bug.c
>  * Written by Mikael Pettersson <[EMAIL PROTECTED]>, 2005-02-24.
...
Reproduced, thanks for the testcase.  Looking into it...

Jakub


Re: ppc32 weirdness with gcc-4.0 in 2.6.11-rc4

2005-02-24 Thread Jakub Jelinek
On Thu, Feb 24, 2005 at 04:08:47PM +0100, Mikael Pettersson wrote:
> _However_, the 0k data message is due to a gcc-4.0 bug, and below
> you'll find a test program which illustrates it.

http://gcc.gnu.org/PR20196

Jakub


[PATCH] Fix binfmt_elf.c

2001-07-02 Thread Jakub Jelinek

Hi!

There is a bug in binfmt_elf.c if the dynamic linker has non-zero base vaddr
(e.g. if it is prelinked). The issue is that in such case ld-linux.so.2 is
loaded at ELF_ET_DYN_BASE + p_vaddr instead of ELF_ET_DYN_BASE, on some
architectures into non-desirable places in virtual memory.

Best explained on a ld-linux.so.2 prelink(1)ed to 0x4000 on ia32:

$ LD_TRACE_LOADED_OBJECTS=1 ./ld-linux.so.2 ./libc.so.6
/lib/ld-linux.so.2 => ./ld-linux.so.2 (0x6000)

ELF_ET_DYN_BASE is defined to 0x2000 in ia32 (see the patch, it was
meant to be 0x8000), so ld-linux.so.2 should have l_map_start 0x2000
while as you see in reality it has 0x6000.
If this prelinked VMA + ELF_ET_DYN_BASE fits into kernel reserved address
space, ./ld-linux.so.2 running won't work at all.

Also, many platforms such as i386 use
#define ELF_ET_DYN_BASE (2 * TASK_SIZE / 3)
which I guess is not what was originally intended (on i386 this is usually
0x2aaa). As this value gets passed to elf_map which rounds it down to
ELF page boundary anyway, I think (TASK_SIZE / 3 * 2) is far better.
I've changed it on ia32 only, but if someone would test it on other
platforms which set ELF_ET_DYN_BASE this way it would be probably good to
change elsewhere as well.
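
The difference between the two expressions is plain 32-bit wraparound.  A
small sketch of the arithmetic, assuming TASK_SIZE is 0xC0000000 (the usual
3G/1G split on i386) and 32-bit unsigned arithmetic; the function names are
made up for illustration:

```c
#include <stdint.h>

/* Assumption: default i386 split, user space ends at 3 GiB. */
#define TASK_SIZE_I386 UINT32_C(0xC0000000)

/* 2 * TASK_SIZE wraps around in 32 bits before the division happens. */
uint32_t dyn_base_old(void)
{
    uint32_t t = TASK_SIZE_I386;

    return (uint32_t)(2 * t) / 3;   /* 0x80000000 / 3 == 0x2aaaaaaa */
}

/* Dividing first keeps every intermediate value below 2^32. */
uint32_t dyn_base_new(void)
{
    uint32_t t = TASK_SIZE_I386;

    return t / 3 * 2;               /* 0x40000000 * 2 == 0x80000000 */
}
```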

--- linux/fs/binfmt_elf.c.jj   Thu May 24 11:11:36 2001
+++ linux/fs/binfmt_elf.c   Thu May 24 11:32:26 2001
@@ -396,7 +396,7 @@ out:
 static int load_elf_binary(struct linux_binprm * bprm, struct pt_regs * regs)
 {
struct file *interpreter = NULL; /* to shut gcc up */
-   unsigned long load_addr = 0, load_bias;
+   unsigned long load_addr = 0, load_bias = 0;
int load_addr_set = 0;
char * elf_interpreter = NULL;
unsigned int interpreter_type = INTERPRETER_NONE;
@@ -595,12 +595,6 @@ static int load_elf_binary(struct linux_
setup_arg_pages(bprm); /* XXX: check error */
current->mm->start_stack = bprm->p;
 
-   /* Try and get dynamic programs out of the way of the default mmap
-  base, as well as whatever program they might try to exec.  This
-  is because the brk will follow the loader, and is not movable.  */
-
-   load_bias = ELF_PAGESTART(elf_ex.e_type==ET_DYN ? ELF_ET_DYN_BASE : 0);
-
/* Now we do a little grungy work by mmaping the ELF image into
   the correct location in memory.  At this point, we assume that
   the image should be loaded at fixed address, not at a variable
@@ -624,6 +618,11 @@ static int load_elf_binary(struct linux_
vaddr = elf_ppnt->p_vaddr;
if (elf_ex.e_type == ET_EXEC || load_addr_set) {
elf_flags |= MAP_FIXED;
+   } else if (elf_ex.e_type == ET_DYN) {
+   /* Try and get dynamic programs out of the way of the default mmap
+  base, as well as whatever program they might try to exec.  This
+  is because the brk will follow the loader, and is not movable.  */
+   load_bias = ELF_PAGESTART(ELF_ET_DYN_BASE - vaddr);
}
 
error = elf_map(bprm->file, load_bias + vaddr, elf_ppnt, elf_prot, elf_flags);
--- linux/include/asm-i386/elf.h.jj Mon Mar 26 18:48:10 2001
+++ linux/include/asm-i386/elf.h   Thu May 24 11:49:38 2001
@@ -55,7 +55,7 @@ typedef struct user_fxsr_struct elf_fpxr
the loader.  We need to make sure that it is out of the way of the program
that it will "exec", and that there is sufficient room for the brk.  */
 
-#define ELF_ET_DYN_BASE (2 * TASK_SIZE / 3)
+#define ELF_ET_DYN_BASE (TASK_SIZE / 3 * 2)
 
 /* Wow, the "main" arch needs arch dependent functions too.. :) */
 
Jakub



[PATCH] Fix kernel linker scripts

2001-07-02 Thread Jakub Jelinek

Hi!

Apparently all kernel linker scripts only have .rodata and not also .rodata.*
input sections in them.
This has been no problem so far, but since binutils and gcc now support SHF_MERGE
sections (so that duplicates of string constants (and other constants too) can be
removed at link time), the compiler creates sections like .rodata.str1.1
and they really should be merged into the rodata output section
(or whatever else) during linking (the default binutils linker scripts have
been doing this for ages).
On some architectures this creates no problems, just one more section in the
section table (like i386); on others it causes the kernel not to boot at all
(e.g. on ia64).
Please apply.

--- linux/arch/alpha/boot/bootloader.lds.jj Sun Sep  6 13:34:33 1998
+++ linux/arch/alpha/boot/bootloader.lds   Tue Jun 26 11:05:14 2001
@@ -6,7 +6,7 @@ SECTIONS
   .text : { *(.text) }
   _etext = .;
   PROVIDE (etext = .);
-  .rodata : { *(.rodata) }
+  .rodata : { *(.rodata) *(.rodata.*) }
   .data : { *(.data) CONSTRUCTORS }
   .got : { *(.got) }
   .sdata : { *(.sdata) }
--- linux/arch/alpha/vmlinux.lds.in.jj  Mon Jun 26 14:26:56 2000
+++ linux/arch/alpha/vmlinux.lds.in Tue Jun 26 11:05:24 2001
@@ -53,7 +53,7 @@ SECTIONS
   /* Global data */
   _data = .;
   .data.cacheline_aligned : { *(.data.cacheline_aligned) }
-  .rodata : { *(.rodata) }
+  .rodata : { *(.rodata) *(.rodata.*) }
   .data : { *(.data) CONSTRUCTORS }
   .got : { *(.got) }
   .sdata : { *(.sdata) }
--- linux/arch/arm/boot/compressed/vmlinux.lds.in.jj   Thu Feb  8 19:32:44 2001
+++ linux/arch/arm/boot/compressed/vmlinux.lds.in   Tue Jun 26 11:05:35 2001
@@ -24,6 +24,7 @@ SECTIONS
 *(.fixup)
 *(.gnu.warning)
 *(.rodata)
+*(.rodata.*)
 *(.glue_7)
 *(.glue_7t)
 input_data = .;
--- linux/arch/arm/vmlinux-armo.lds.in.jj   Thu Feb  8 19:32:44 2001
+++ linux/arch/arm/vmlinux-armo.lds.in  Tue Jun 26 11:05:49 2001
@@ -47,6 +47,7 @@ SECTIONS
*(.gnu.warning)
*(.text.lock)   /* out-of-line lock text */
*(.rodata)
+   *(.rodata.*)
*(.glue_7)
*(.glue_7t)
*(.kstrtab)
--- linux/arch/arm/vmlinux-armv.lds.in.jj   Wed May 16 18:25:16 2001
+++ linux/arch/arm/vmlinux-armv.lds.in  Tue Jun 26 11:05:57 2001
@@ -42,6 +42,7 @@ SECTIONS
*(.gnu.warning)
*(.text.lock)   /* out-of-line lock text */
*(.rodata)
+   *(.rodata.*)
*(.glue_7)
*(.glue_7t)
*(.got) /* Global offset table  */
--- linux/arch/cris/boot/compressed/decompress.ld.jj   Fri Apr  6 13:42:55 2001
+++ linux/arch/cris/boot/compressed/decompress.ld   Tue Jun 26 11:06:04 2001
@@ -13,6 +13,7 @@ SECTIONS
_stext = . ;
*(.text)
*(.rodata)
+   *(.rodata.*)
_etext = . ;
} > dram
.data :
--- linux/arch/cris/cris.ld.jj  Tue May  1 19:04:56 2001
+++ linux/arch/cris/cris.ld Tue Jun 26 11:06:23 2001
@@ -24,7 +24,7 @@ SECTIONS
*(.fixup)
*(.text.__*)
*(.rodata)
-   *(.rodata.__*)
+   *(.rodata.*)
}
 
. = ALIGN(4);/* Exception table */
--- linux/arch/i386/vmlinux.lds.jj  Wed Jan  3 23:45:26 2001
+++ linux/arch/i386/vmlinux.lds Tue Jun 26 11:06:33 2001
@@ -17,7 +17,7 @@ SECTIONS
 
   _etext = .;  /* End of text section */
 
-  .rodata : { *(.rodata) }
+  .rodata : { *(.rodata) *(.rodata.*) }
   .kstrtab : { *(.kstrtab) }
 
   . = ALIGN(16);   /* Exception table */
--- linux/arch/ia64/boot/bootloader.lds.jj  Sun Feb  6 21:42:40 2000
+++ linux/arch/ia64/boot/bootloader.lds Tue Jun 26 11:06:42 2001
@@ -12,7 +12,7 @@ SECTIONS
 
   /* Global data */
   _data = .;
-  .rodata : { *(.rodata) }
+  .rodata : { *(.rodata) *(.rodata.*) }
   .data: { *(.data) *(.gnu.linkonce.d*) CONSTRUCTORS }
   __gp = ALIGN (8) + 0x20;
   .got   : { *(.got.plt) *(.got) }
--- linux/arch/ia64/sn/fprom/fprom.lds.jj   Thu Jan  4 16:00:15 2001
+++ linux/arch/ia64/sn/fprom/fprom.lds  Tue Jun 26 11:07:02 2001
@@ -24,7 +24,7 @@ SECTIONS
   _data = .;
 
   .rodata : AT(ADDR(.rodata) - 0x )
-   { *(.rodata) }
+   { *(.rodata) *(.rodata.*) }
   .opd : AT(ADDR(.opd) - 0x )
{ *(.opd) }
   .data : AT(ADDR(.data) - 0x )
--- linux/arch/ia64/vmlinux.lds.S.jj   Thu Apr  5 15:51:47 2001
+++ linux/arch/ia64/vmlinux.lds.S   Tue Jun 26 11:07:15 2001
@@ -83,7 +83,7 @@ SECTIONS
   ia64_unw_end = .;
 
   .rodata : AT(ADDR(.rodata) - PAGE_OFFSET)
-   { *(.rodata) }
+   { *(.rodata) *(.rodata.*) }
   .kstrtab : AT(ADDR(.kstrtab) - PAGE_OFFSET)
{ *(.kstrtab) }
   .opd : AT(ADDR(.opd) - PAGE_

Re: memcpy(a,b,CONST) is not inlined by gcc 3.4.1 in Linux kernel

2005-03-29 Thread Jakub Jelinek
On Tue, Mar 29, 2005 at 05:37:06PM +0300, Denis Vlasenko wrote:
> typedef unsigned int size_t;
> 
> static inline void * __memcpy(void * to, const void * from, size_t n)
> {
> int d0, d1, d2;
> __asm__ __volatile__(
> "rep ; movsl\n\t"
> "testb $2,%b4\n\t"
> "je 1f\n\t"
> "movsw\n"
> "1:\ttestb $1,%b4\n\t"
> "je 2f\n\t"
> "movsb\n"
> "2:"
> : "=&c" (d0), "=&D" (d1), "=&S" (d2)
> :"0" (n/4), "q" (n),"1" ((long) to),"2" ((long) from)
> : "memory");
> return (to);
> }
> 
> /*
>  * This looks horribly ugly, but the compiler can optimize it totally,
>  * as the count is constant.
>  */
> static inline void * __constant_memcpy(void * to, const void * from, size_t n)
> {
> if (n <= 128)
> return __builtin_memcpy(to, from, n);
> 
> #define COMMON(x) \
> __asm__ __volatile__( \
> "rep ; movsl" \
> x \
> : "=&c" (d0), "=&D" (d1), "=&S" (d2) \
> : "0" (n/4),"1" ((long) to),"2" ((long) from) \
> : "memory");
> {
> int d0, d1, d2;
> switch (n % 4) {
> case 0: COMMON(""); return to;
> case 1: COMMON("\n\tmovsb"); return to;
> case 2: COMMON("\n\tmovsw"); return to;
> default: COMMON("\n\tmovsw\n\tmovsb"); return to;
> }
> }
> 
> #undef COMMON
> }
> 
> #define memcpy(t, f, n) \
> (__builtin_constant_p(n) ? \
>  __constant_memcpy((t),(f),(n)) : \
>  __memcpy((t),(f),(n)))
> 
> int f3(char *a, char *b) { memcpy(a,b,3); }

The problem is that in GCC < 4.0 there is no constant propagation
pass before expanding builtin functions, so the __builtin_memcpy
call above sees a variable rather than a constant.

Either use GCC 4.0+, where this works just fine, or move the
n <= 128 case into the macro:
#define memcpy(t, f, n) \
(__builtin_constant_p(n) ? \
 ((n) <= 128 ? __builtin_memcpy(t,f,n) : __constant_memcpy(t,f,n)) : \
 __memcpy(t,f,n))
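
A compilable sketch of the same dispatch, for trying it outside the kernel.
slow_copy is a hypothetical byte-wise stand-in for both __memcpy and
__constant_memcpy, and my_memcpy and copy3_ok are made-up names; only the
shape of the macro is the point here.

```c
#include <stddef.h>

/* Hypothetical stand-in for the kernel's out-of-line copy routines. */
static void *slow_copy(void *to, const void *from, size_t n)
{
    unsigned char *d = to;
    const unsigned char *s = from;

    while (n--)
        *d++ = *s++;
    return to;
}

/* The size test sits in the macro, so the builtin sees the literal n. */
#define my_memcpy(t, f, n) \
    (__builtin_constant_p(n) ? \
     ((n) <= 128 ? __builtin_memcpy((t), (f), (n)) \
                 : slow_copy((t), (f), (n))) : \
     slow_copy((t), (f), (n)))

/* Returns 1 if a 3-byte constant-size copy produced the expected bytes. */
int copy3_ok(void)
{
    char src[3] = { 'a', 'b', 'c' };
    char dst[3] = { 0, 0, 0 };

    (void)my_memcpy(dst, src, 3);
    return dst[0] == 'a' && dst[1] == 'b' && dst[2] == 'c';
}
```

Whether the builtin is actually expanded inline still depends on the
compiler version and optimization level, so it is worth checking the
generated assembly.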

Jakub


Re: [RFC: 2.6 patch] add -fno-tree-scev-cprop to KBUILD_CFLAGS

2007-11-12 Thread Jakub Jelinek
On Sun, Nov 11, 2007 at 07:48:29AM +0100, Adrian Bunk wrote:
> The gcc from svn that will become gcc 4.3 generates libgcc calls in 
> cases like the following (on 32bit architectures):
> 
> <--  snip  -->
> 
> static inline void timespec_add_ns(struct timespec *a, u64 ns)
> {
> ...
> while(ns >= NSEC_PER_SEC) {
> ns -= NSEC_PER_SEC;
> a->tv_sec++;
> }
> ...
> 
> <--  snip  -->

Blindly using -fno-tree-scev-cprop just to get rid of one case where
this turns out to be a pessimization, when the kernel knows ns is usually very
small, is IMHO the wrong thing; you'd lose many cases where this optimization
can actually improve performance.  Instead, for this exact case just
add an optimization barrier to stop gcc from doing this.
Adding asm ("" : "=r" (ns) : "0" (ns)); (or hiding it in some macro) into the
loop will do the job just fine.
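
A user-space sketch of where the barrier would go, assuming a GCC-compatible
compiler.  struct ts and timespec_add_ns_sketch are made-up stand-ins for
the kernel's struct timespec and timespec_add_ns, and the elided parts of
the quoted function are filled in with the obvious bookkeeping as an
assumption.

```c
#include <stdint.h>

#define NSEC_PER_SEC 1000000000ULL

struct ts { long tv_sec; long tv_nsec; };  /* stand-in for struct timespec */

void timespec_add_ns_sketch(struct ts *a, uint64_t ns)
{
    ns += (uint64_t)a->tv_nsec;
    while (ns >= NSEC_PER_SEC) {
        /* Empty asm: the compiler must assume ns may have changed here,
           so it cannot replace the loop with a 64-bit division. */
        __asm__ ("" : "=r" (ns) : "0" (ns));
        ns -= NSEC_PER_SEC;
        a->tv_sec++;
    }
    a->tv_nsec = (long)ns;
}
```

The asm emits no instructions; it only hides the value of ns from the
optimizer for one step of each iteration.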

Jakub


Re: vdso.so mislinked by buggy linker was Re: Linus 2.6.23-rc1

2007-07-22 Thread Jakub Jelinek
On Mon, Jul 23, 2007 at 01:56:20AM +0200, Andi Kleen wrote:
> On Monday 23 July 2007 01:38:40 Andre Noll wrote:
> [readded linux-kernel, Linus]
> 
> >   [Nr] Name  Type Address   Offset
> >Size  EntSize  Flags  Link  Info  Align
> >   [ 0]   NULL   
> >     0 0 0
> >   [ 1] .hash HASH ff700120  0120
> >00b4  0004   A   2 0 8
> >   [ 2] .dynsym   DYNSYM   ff7001d8  01d8
> >0270  0018   A   312 8
> >   [ 3] .dynstr   STRTAB   ff700448  0448
> >0059     A   0 0 1
> >   [ 4] .gnu.version  VERSYM   ff7004a2  04a2
> >0034  0002   A   2 0 2
> >   [ 5] .gnu.version_dVERDEF   ff7004d8  04d8
> >0038     A   3 2 8
> >   [ 6] .text PROGBITS ff700c00  00100bab
>   
> >02e4    AX   0 0 64
> 
> It puts .text at 1MB. Your vdso file must be huge? 
> 
> It looks like it ignores the
> -Wl,-z,max-page-size=4096 -Wl,-z,common-page-size=4096
> options passed to it. The AMD64 ABI has a 1MB minimum page size, but
> these options are supposed to disable it.

These options are fairly new; before that they were ignored (like all unknown
-z options).  They were added to CVS binutils on 2006-05-30.

I guess the problem is caused by the gap being too big combined with old binutils.

Jakub


Re: Linus 2.6.23-rc1

2007-07-22 Thread Jakub Jelinek
On Mon, Jul 23, 2007 at 01:31:00AM +0200, Andi Kleen wrote:
> On Monday 23 July 2007 01:23:38 Andre Noll wrote:
> > On 00:22, Andi Kleen wrote:
> > > > /usr/bin/ld: section .text [ff700500 -> ff7007e3] 
> > > > overlaps section .gnu.version_d [ff7004d8 -> ff70050f]
> > > 
> > > Does this patch fix it?
> > 
> > Nope, with 0x600 I still get the same error. But it helped to further
> > increase VDSO_TEXT_OFFSET to 0xc00. I tried 0x700, 0x800,... and 0xc00
> > is the smallest value in this series that makes the error go away, i.e.
> > the patch below works for me.
> 
> Can you send (privately) readelf -a output from your vdso.so ? 
> Your linker must be doing something weird.
> 
> 0xc00 is quite wasteful.

I think Roland's --build-id doesn't create a very big section; the likely
culprit would be a hacked up ld that e.g. defaults to --hash-style=both.
Can you retry with --hash-style=sysv?  The vdso really has to include the
traditional .hash section, otherwise it wouldn't be compatible with
old glibcs, and an additional .gnu.hash might be overkill for it
- doesn't the vdso define only very few symbols?

Jakub


Re: vdso.so mislinked by buggy linker was Re: Linus 2.6.23-rc1

2007-07-23 Thread Jakub Jelinek
On Mon, Jul 23, 2007 at 01:56:20AM +0200, Andi Kleen wrote:
> On Monday 23 July 2007 01:38:40 Andre Noll wrote:
> [readded linux-kernel, Linus]
> 
> >   [Nr] Name  Type Address   Offset
> >Size  EntSize  Flags  Link  Info  Align
> >   [ 0]   NULL   
> >     0 0 0
> >   [ 1] .hash HASH ff700120  0120
> >00b4  0004   A   2 0 8
> >   [ 2] .dynsym   DYNSYM   ff7001d8  01d8
> >0270  0018   A   312 8
> >   [ 3] .dynstr   STRTAB   ff700448  0448
> >0059     A   0 0 1
> >   [ 4] .gnu.version  VERSYM   ff7004a2  04a2
> >0034  0002   A   2 0 2
> >   [ 5] .gnu.version_dVERDEF   ff7004d8  04d8
> >0038     A   3 2 8
> >   [ 6] .text PROGBITS ff700c00  00100bab
>   
> >02e4    AX   0 0 64
> 
> It puts .text at 1MB. Your vdso file must be huge? 
> 
> It looks like it ignores the
> -Wl,-z,max-page-size=4096 -Wl,-z,common-page-size=4096
> options passed to it. The AMD64 ABI has a 1MB minimum page size, but
> these options are supposed to disable it.
> 
> Not sure how to work around this, but having an 1+MB vdso would be incredibly
> wasteful. What version is it? Perhaps we just drop support for this. I can't
> think of a workaround currently.

Looking at vdso.lds.S, if you change just VDSO_TEXT_OFFSET to 0xc00 and
don't tweak the linker script, then you jump backwards with the dot, you
should even get a linker warning about it:

  . = VDSO_PRELINK + VDSO_TEXT_OFFSET;

  .text   : { *(.text) }:text
  .text.ptr   : { *(.text.ptr) }:text
  . = VDSO_PRELINK + 0x900;

Guess that 0x900 should have been VDSO_TEXT_OFFSET + 0x400 or something
similar.  Also note that it is highly desirable to fit the whole vdso into
one page, so increasing VDSO_TEXT_OFFSET etc. offsets too much is just
wasting memory.  From the above dump, VDSO_TEXT_OFFSET 0x500 is too low,
but 0x600 should work, assuming .data section is moved 0x100 higher as well.

Jakub


Re: gcc fixed size char array initialization bug - known?

2007-08-02 Thread Jakub Jelinek
On Thu, Aug 02, 2007 at 09:55:51PM +0200, Guennadi Liakhovetski wrote:
> I've run across the following gcc "feature":
> 
>   char c[4] = "01234";
> 
> gcc emits a nice warning
> 
> warning: initializer-string for array of chars is too long
> 
> But do a
> 
>   char c[4] = "0123";
> 
> and - a wonder - no warning. No warning with gcc 3.3.2, 3.3.5, 3.4.5, 
> 4.1.2. I was told 4.2.x does produce a warning.

Neither 4.2.x nor 4.3 warns, and it is correct not to warn about
perfectly valid code.
ISO C99 is very clear that the terminating '\0' (resp. L'\0') from
the string literal is only added if there is room in the array or if the
array has unknown size.
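
A short sketch of the two cases in C (this is C only; C++ rejects the first
initializer).  The accessor functions are made-up names, there just so the
sizes can be checked.

```c
#include <string.h>

static const char c[4] = "0123";  /* exactly fits: no '\0' is stored */
static const char d[]  = "0123";  /* unknown size: 5 bytes, '\0' included */

int c_size(void) { return (int)sizeof c; }
int d_size(void) { return (int)sizeof d; }

/* c holds the four characters and nothing else, so it must never be
   passed to functions that expect a NUL-terminated string. */
int c_matches(void) { return memcmp(c, "0123", 4) == 0; }
```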

Jakub


Re: Implementation of POSIX mqueues in Linux 2.6

2007-08-03 Thread Jakub Jelinek
On Fri, Aug 03, 2007 at 09:59:32AM +, gregfe wrote:
> I find little documentation on the actual implementation of POSIX message
> queues in Linux, and need some advise. In particular, I am wondering
> whether it supports inter-process *and* inter-thread communication, and if

Not sure what exactly you mean by inter-thread communication, whether
communication between threads within one process or between threads from
different processes.  You can use mq_* for either, except that mq_notify
registered signal notification is sent to the process that called mq_notify,
not thread (and for SIGEV_THREAD a new thread is created).

Though of course for communication between threads within one process
mq_* is a huge overkill.

> On more thing: kernel's "make menuconfig" of
> version 2.6.11 says :
> 
> >> To use this feature you will also need mqueue library, available
> 
> >> from <... a URL ... to M. Wronski's and K. Benedyczak's home page>"
> 
> Is it still up to date ?

No, glibc has supported the mq_* APIs for more than 3 years now.

Jakub


Re: smaller kernel with no real time futexes

2007-08-01 Thread Jakub Jelinek
On Wed, Aug 01, 2007 at 09:24:34PM +0200, Andi Kleen wrote:
> Adrian,
> 
> You said earlier you're looking at smaller allnoconfig kernels.
> One thing I noticed recently that realtime pi futexes are always
> enabled and that pulls in a lot of other code (like the plists) 
> 
> Userland needs to handle them not being available anyways for older
> kernels.
> 
> Might be worth looking into turning that into a CONFIG.

That's a very bad idea.  glibc configured for 2.6.18 and higher kernels
assumes PI futexes are present.

Jakub


Re: [PATCH] [RESEND] PIE executable randomization

2007-08-14 Thread Jakub Jelinek
On Wed, Aug 08, 2007 at 04:03:07PM +0200, Jiri Kosina wrote:
> @@ -870,11 +917,15 @@ static int load_elf_binary(struct linux_binprm *bprm, 
> struct pt_regs *regs)
>* default mmap base, as well as whatever program they
>* might try to exec.  This is because the brk will
>* follow the loader, and is not movable.  */
> +#ifdef CONFIG_X86
> + load_bias = 0;
> +#else
>   load_bias = ELF_PAGESTART(ELF_ET_DYN_BASE - vaddr);
> +#endif
>   }
>  
>   error = elf_map(bprm->file, load_bias + vaddr, elf_ppnt,
> - elf_prot, elf_flags);
> + elf_prot, elf_flags,0);
>   if (BAD_ADDR(error)) {
>   send_sig(SIGKILL, current, 0);
>   retval = IS_ERR((void *)error) ?

If I'm reading the above hunk correctly, this means we will randomize
all PIEs and even all dynamic linkers invoked as executables on i?86 and
x86_64, and on the rest of arches we won't randomize at all, instead
load ET_DYN objects at ELF_ET_DYN_BASE address.

But I don't see anything i?86/x86_64 specific about this.

What would make much more sense to me would be conditionalizing on
whether we are loading a dynamic linker (in which case loading it
at ELF_ET_DYN_BASE is desirable) or not (PIEs, ...; and for PIEs we
want to randomize on all architectures).

So something like
    if (elf_interpreter)
        load_bias = 0;
    else
        /* Probably the dynamic linker invoked directly as
           "ld.so program args" - load at ELF_ET_DYN_BASE.  */
        load_bias = ELF_PAGESTART(ELF_ET_DYN_BASE - vaddr);
instead of
#ifdef CONFIG_X86
    load_bias = 0;
#else
    load_bias = ELF_PAGESTART(ELF_ET_DYN_BASE - vaddr);
#endif
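
The proposed rule can be modelled outside the kernel roughly like this.
choose_load_bias and its parameters are made up for illustration, and the
page rounding is written out by hand on the assumption that ELF_PAGESTART
simply masks off the low bits of the address.

```c
/* has_interpreter: non-zero if the ET_DYN object names an interpreter
   (i.e. it is a PIE), zero if it is the dynamic linker run directly. */
unsigned long choose_load_bias(int has_interpreter,
                               unsigned long dyn_base,
                               unsigned long vaddr,
                               unsigned long page_size)
{
    if (has_interpreter)
        return 0;   /* PIE: no fixed bias, the mapping can be randomized */

    /* Dynamic linker invoked as a program: place it at dyn_base,
       rounded down to a page boundary. */
    return (dyn_base - vaddr) & ~(page_size - 1);
}
```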

Jakub


Re: [PATCH 1/2] Define the EF_AS_NO_RANDOM e_flag bit

2007-01-23 Thread Jakub Jelinek
On Tue, Jan 23, 2007 at 11:28:13PM +0300, Samium Gromoff wrote:
> Author: Samium Gromoff <[EMAIL PROTECTED]>
> Date:   Tue Jan 23 22:31:13 2007 +0300
> 
> Define the ELF binary header flag EF_AS_NO_RANDOM
> 
> EF_AS_NO_RANDOM should mean that the binary requests to not apply
> randomisation to address spaces of its processes.
> 
> diff --git a/include/linux/elf.h b/include/linux/elf.h
> index 60713e6..58ebb47 100644
> --- a/include/linux/elf.h
> +++ b/include/linux/elf.h
> @@ -172,6 +172,8 @@ typedef struct elf64_sym {
>  
>  #define EI_NIDENT  16
>  
> +#define EF_AS_NO_RANDOM 0x1/* do not randomise the address space */
> +

You can't make up EF_* flags this way; they are arch specific, and the LSB
(like many other bits) is already used on many architectures.
E.g.:
elf/mt.h:#define EF_MT_CPU_MRISC  0x0001  /* default */
elf/sparc.h:#define EF_SPARCV9_PSO 0x1 /* partial store ordering */
elf/bfin.h:#define EF_BFIN_PIC 0x0001  /* -fpic */
elf/alpha.h:#define EF_ALPHA_32BIT 0x0001
elf/mips.h:#define EF_MIPS_NOREORDER  0x0001
elf/m68k.h:#define EF_M68K_CF_ISA_A_NODIV 0x01  /* ISA A except for div */
elf/sh.h:#define EF_SH1  1
elf/arm.h:#define EF_ARM_RELEXEC 0x01
elf/cris.h:#define EF_CRIS_UNDERSCORE 0x0001
elf/ia64.h:#define EF_IA_64_TRAPNIL (1 << 0)  /* Trap NIL pointer dereferences.  */
elf/vax.h:#define EF_VAX_NONPIC   0x0001  /* Object contains non-PIC code */
elf/iq2000.h:#define EF_IQ2000_CPU_IQ2000 0x0001  /* default */
elf/frv.h:#define EF_FRV_GPR_32   0x0001  /* -mgpr-32 */
to name just a few.
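
A tiny sketch of what the collision means in practice, using the proposed
value and one existing flag from the list.  wants_no_random is a made-up
name for the check the patch would imply.

```c
#include <stdint.h>

#define EF_AS_NO_RANDOM 0x1u   /* value proposed in the quoted patch */
#define EF_ARM_RELEXEC  0x01u  /* existing ARM flag with the same bit */

/* Naive arch-independent test of the proposed bit in e_flags. */
int wants_no_random(uint32_t e_flags)
{
    return (e_flags & EF_AS_NO_RANDOM) != 0;
}
```

An ARM object that only has EF_ARM_RELEXEC set would pass this test, so it
would be treated as requesting no randomization although nobody asked for it.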

Jakub


Re: [PATCH 1/2] Define the EF_AS_NO_RANDOM e_flag bit

2007-01-23 Thread Jakub Jelinek
On Wed, Jan 24, 2007 at 12:06:45AM +0300, Samium Gromoff wrote:
> Should we introduce per-arch asm/elf.h files to hold the relevant flag 
> definitions then?

On some architectures there are no bits left.  On others you'd need to go
through whoever maintains the relevant psABI to get a bit officially
allocated.  Really, it is a very bad idea to use e_flags for this.

If all you care about is running setuid LISP programs, you'd do much better
to put your energy into fixing the buggy ELF dumper in it.

Jakub


Re: kernel + gcc 4.1 = several problems

2007-01-03 Thread Jakub Jelinek
On Wed, Jan 03, 2007 at 05:32:16AM -0800, Arjan van de Ven wrote:
> On Wed, 2007-01-03 at 12:44 +, Alan wrote:
> > > > fixed. At that point an i686 kernel would contain i686 instructions and
> > > > actually run on all i686 processors ending all the i586 pain for most
> > > > users and distributions.
> > > 
> > > Could you explain why CMOV is pointless now? Are there any benchmarks 
> > > proving that?
> > 
> > Take a look at the recent ffmpeg bits on the mplayer list for one example
> > I have to hand - P4 cmov is pretty slow. The crypto folks find the same
> > things.
> 
> cmov is effectively the same cost as a compare and jump, in both cases
> the cpu needs to do a prediction, and on a mispredict, restart.
> 
> the reason cmov can make sense is because it's smaller code...

BTW, from GCC's POV the availability of CMOV is the only difference between
-march=i586 -mtune=something and -march=i686 -mtune=something.  So this is
just a naming thing; it could be called -march=i686cmov to make it more
obvious, but it is too late (and too unimportant) to change it now.
Perhaps adding a note to info gcc/man gcc ought to be enough?
If you don't want CMOV being emitted, compile with -march=i586 -mtune=generic
(or whatever other tuning you pick); with -march=i686 -mtune=generic
you tell GCC you have CMOV.  Whether CMOV is actually used in generated
code is another matter, which should be decided based on the selected
-mtune.  For -Os CMOV should be used whenever available, as that usually
means smaller code.  Otherwise, if on some particular chip CMOV is actually
slower than compare, jump and assignment, then CMOV should not be selected
for that particular tuning (say, if Pentium4 has slower CMOV than
compare+jump+assignment, -mtune=pentium4 should not emit CMOV, at least not
often); if you have examples of that, please file a bug at
http://gcc.gnu.org/bugzilla/.  -mtune=generic should emit or not emit
CMOV depending on whether it is a win on the currently common CPUs.

Jakub


Re: -freg-struct-return?

2007-02-22 Thread Jakub Jelinek
On Thu, Feb 22, 2007 at 12:09:04AM -0800, Jeremy Fitzhardinge wrote:
> Arjan van de Ven wrote:
> > Do we know how many gcc bugs this has? (regparm used to have many)
> > other than that.. sounds like a win...
> >   
> 
> The documentation suggests that its the preferred mode of operation, and
> that its the default on platforms where gcc is the primary compiler.  So
> the fact that it isn't for Linux suggests either an oversight or that it
> is actually broken...

It is used for Linux on many architectures (x86_64, sparc64, ia64,
ppc{,64}, arm, sh, m68k to name just a few), but it is an ABI decision,
so e.g. on i386 it is not used by default, as the ABI mandates that
structs/unions are returned in memory.
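
A small sketch of what the option affects, using a made-up 8-byte struct
(struct pair and make_pair are illustrative names only).  On i386 the default
ABI returns such a value through a hidden pointer to memory, while
-freg-struct-return lets GCC return it in registers; objects compiled with
different settings must not call each other across such an interface.

```c
/* Hypothetical small aggregate, 8 bytes on i386. */
struct pair {
    int lo;
    int hi;
};

/* Returned by value: in memory with the default i386 ABI
 * (-fpcc-struct-return), in registers with -freg-struct-return. */
struct pair make_pair(int lo, int hi)
{
    struct pair p = { lo, hi };
    return p;
}
```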

Jakub


Re: [ANN] Userspace M-on-N threading model implementation. Alpha release.

2007-02-04 Thread Jakub Jelinek
On Sun, Feb 04, 2007 at 03:12:32PM -0500, Bill Davidsen wrote:
> Arjan van de Ven wrote:
> >>Because user threading can avoid context switches, there will always be 
> >>cases where it will outperform o/s threads for hardware reasons.
> >
> >actually.. switching from one "real" thread to another in Linux is not
> >an actual context switch in the hardware sense... at least this part of
> >your argument seems to be incorrect ;)
> >
> How does that work? Switching between kernel threads requires going into 
> the kernel, user level thread switches are all done in user mode.
> 
> Do you have some way to change o/s threads w/o going into the kernel?

But going into kernel is not very expensive on Linux.

On the other hand, the overhead you need to add for every single syscall
that might block for the M:N threads, and the associated complications
which make it far harder to conform to POSIX, IMHO far outweigh the costs
of going into the kernel for a context switch.

Jakub


Re: dvb shared datastructure bug?

2007-02-13 Thread Jakub Jelinek
On Tue, Feb 13, 2007 at 03:14:23PM +0400, Manu Abraham wrote:
> >thanks for pointing out this issue.
> >
> >attached find a patch that fixes the problem.
> >
> >@mauro - please pull changeset a7ac92d208fe
> >   dvbdev: fix illegal re-usage of fileoperations struct
> >
> >from  http://www.linuxtv.org/hg/~mws/v4l-dvb-fixtree
> >
> 
> Ack'd-by: Manu Abraham <[EMAIL PROTECTED]>

Wouldn't it be better to kmalloc both struct dvb_device and
struct file_operations together instead of doing 2 separate allocations?
struct dvb_device_plus_fops
{
  struct dvb_device dev;
  struct file_operations fops;
} *dev_fops = kmalloc (sizeof (struct dvb_device_plus_fops), GFP_KERNEL);
*pdvbdev = dvbdev = (struct dvb_device *)dev_fops;
if (dev_fops == NULL)
  error handling;
memset (&dev_fops->fops, 0, sizeof (dev_fops->fops));
...
dvbdev->fops = &dev_fops->fops;
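
The same idea as a self-contained userspace sketch, with calloc standing in
for kmalloc plus memset.  The two structs here are tiny stand-ins, not the
real kernel definitions, and alloc_dvb_device is a hypothetical helper name.

```c
#include <stdlib.h>

/* Stand-ins for the kernel structures; the real ones are much larger. */
struct file_operations { int (*open)(void); };
struct dvb_device { const struct file_operations *fops; int id; };

/* dev must stay the first member, so that a pointer to the combined
 * object can also be used (and freed) as a pointer to the device. */
struct dvb_device_plus_fops {
    struct dvb_device dev;
    struct file_operations fops;
};

/* One allocation for both; returns NULL on allocation failure. */
struct dvb_device *alloc_dvb_device(const struct file_operations *template_fops)
{
    struct dvb_device_plus_fops *dev_fops = calloc(1, sizeof(*dev_fops));

    if (dev_fops == NULL)
        return NULL;
    dev_fops->fops = *template_fops;        /* private, per-device copy */
    dev_fops->dev.fops = &dev_fops->fops;
    return &dev_fops->dev;
}
```

A single free() of the returned pointer then releases both the device and its
private file_operations copy.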

Jakub


Re: Intel Core Duo/Duo2 T2300/E6400 - Hyper-Threading (the absence of)

2007-01-08 Thread Jakub Jelinek
On Mon, Jan 08, 2007 at 01:44:32AM -0800, Robin H. Johnson wrote:
> (Please CC me, I am not subscribed to LKML [I have set the
> Mail-Followup-To header accordingly]).
> 
> On two of my new machines, with Intel Core Duo T2300 and Core2 Duo E6400
> chips respectively, I noticed some weirdness in how many CPUs are
> present. 
> 
> If the hyper-threading bit is present in the CPU info, should there
> always be a an extra CPU presented to the system per physical core?

No.  The ht flag just says whether HT reporting via CPUID is supported.
Core2 Duo E6400 is AFAIK not hyper-threaded, you just have 2 real sibling
CPUs (except that they share L2 cache).

Jakub


Re: [PATCH 2.6.20-rc4 1/4] futex priority based wakeup

2007-01-10 Thread Jakub Jelinek
On Wed, Jan 10, 2007 at 12:47:21PM +0100, Pierre Peiffer wrote:
> So, yes it (logically) has a cost, depending on the number of different 
> priorities used, so it's especially measurable with real-time threads.
> With SCHED_OTHER, I suppose that the priorities are not very distributed.
> 
> Maybe, supposing it makes sense to respect the priority order only for 
> real-time pthreads, I can register all SCHED_OTHER threads to the same 
> MAX_RT_PRIO priority ?
> Or do you think this must be set behind a CONFIG* option ?
> (Or finally not interesting enough for mainline ?)

As soon as there is at least one non-SCHED_OTHER thread among the waiters,
there is no question about whether plist should be used or not, that's
a correctness issue and if we want to conform to POSIX, we have to use that.

I guess Ulrich's question was mainly about performance differences
with/without plist wakeup when all threads are SCHED_OTHER.  I'd say for
that a pure pthread_mutex_{lock,unlock} benchmark or even just a program
which uses futex FUTEX_WAIT/FUTEX_WAKE in a bunch of threads would be
better.

In the past we talked with Ingo about the possibilities here.  One is to use
plist always and prove that it doesn't add measurable overhead over the current
FIFO (when only SCHED_OTHER is involved); the other possibility would be
to start using FIFOs as before, but when the first non-SCHED_OTHER thread
decides to wait on the futex, switch it to plist wakeup mode (convert the
FIFO into a plist) and from that point on just use plist wakeups on it.
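
To make the difference concrete, here is a simplified sketch of priority
ordered wakeup.  It is not the kernel's plist implementation; struct waiter,
enqueue_waiter and dequeue_waiter are illustrative names, and a lower prio
value is assumed to mean higher priority.

```c
#include <stddef.h>

/* Hypothetical waiter record; lower prio value means higher priority. */
struct waiter {
    int prio;
    int id;
    struct waiter *next;
};

/* Insert w behind all waiters whose priority is better than or equal to
 * its own, so waiters of equal priority keep FIFO order. */
void enqueue_waiter(struct waiter **head, struct waiter *w)
{
    while (*head != NULL && (*head)->prio <= w->prio)
        head = &(*head)->next;
    w->next = *head;
    *head = w;
}

/* Wake-up takes the first waiter: best priority, longest waiting. */
struct waiter *dequeue_waiter(struct waiter **head)
{
    struct waiter *w = *head;

    if (w != NULL)
        *head = w->next;
    return w;
}
```

When all waiters share one priority this behaves exactly like the FIFO, but
the insertion in this sketch walks the whole list; avoiding that kind of cost
is what a benchmark with only SCHED_OTHER threads would have to demonstrate.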

Jakub


Re: [PATCH 2.6.20-rc4 4/4][RFC] sys_futex64 : allows 64bit futexes

2007-01-11 Thread Jakub Jelinek
On Tue, Jan 09, 2007 at 05:25:26PM +0100, Pierre Peiffer wrote:
> This latest patch is an adaptation of the sys_futex64 syscall provided in
> the -rt patch (originally written by Ingo). It allows the use of 64bit futex.
> 
> I have re-worked most of the code to avoid the duplication of the code.
> 
> It does not provide the functionality for all architectures, and thus, it
> can not be applied "as is".
> But, again, feedback and comments are welcome.

Why do you support all operations for 64-bit futexes?
IMHO PI futexes don't make sense for 64-bit futexes; PI futexes have a
hardcoded bit layout of the 32-bit word.  Similarly, FUTEX_WAKE
is not really necessary for 64-bit futexes; 32-bit futex's FUTEX_WAKE
can wake it equally well (it never reads anything, all it cares
about is the futex's address).  Similarly, I don't see a need for
FUTEX_WAKE_OP (and this could simplify the patch quite a lot, no
need to change asm*/futex.h headers at all).
All that's needed is 64-bit FUTEX_WAIT and perhaps FUTEX_CMP_REQUEUE.

Jakub


Re: [PATCH] work around gcc4 issue with -Os in Dwarf2 stack unwind code

2006-11-28 Thread Jakub Jelinek
On Tue, Nov 28, 2006 at 02:12:24PM +, Jan Beulich wrote:
> This fixes a problem with gcc4 mis-compiling the stack unwind code under
> -Os, which resulted in 'stuck' messages whenever an assembly routine was
> encountered.

"mis-compiling" and "work around" are wrong words, the code had undefined
behavior (there is no sequence point between evaluation of ptr and
get_uleb128(&ptr, end) and ptr is modified twice, so the compiler can
evaluate it e.g. as:
temp = ptr;
temp = temp + get_uleb128(&ptr, end);
ptr = temp;
or
temp = get_uleb128(&ptr, end);
ptr += temp;
While gcc has some warnings for sequence point semantics violations
(-Wsequence-point), this can't be one of the cases at least until IPA moves
much further, because get_uleb128 might very well not modify the variable
and at that point the code would be ok).
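
A self-contained sketch of the well-defined form.  get_uleb128_sketch is a
simplified stand-in for the kernel's get_uleb128() and skip_augmentation is a
made-up wrapper; only the ordering of the two statements at the end is the
point here.

```c
#include <stdint.h>

/* Simplified stand-in for get_uleb128(): decodes an unsigned LEB128
 * value and advances *pcur past it. */
static unsigned long get_uleb128_sketch(const uint8_t **pcur, const uint8_t *end)
{
    const uint8_t *cur = *pcur;
    unsigned long value = 0;
    unsigned shift = 0;

    while (cur < end) {
        uint8_t byte = *cur++;

        if (shift < sizeof(value) * 8)   /* ignore bits that do not fit */
            value |= (unsigned long)(byte & 0x7f) << shift;
        if (!(byte & 0x80))
            break;
        shift += 7;
    }
    *pcur = cur;
    return value;
}

/* Skip a length-prefixed block and return the position after it. */
const uint8_t *skip_augmentation(const uint8_t *ptr, const uint8_t *end)
{
    /* Not: ptr += get_uleb128_sketch(&ptr, end);
     * Reading the length first and adding it in a separate statement
     * leaves the compiler no choice about the order. */
    unsigned long len = get_uleb128_sketch(&ptr, end);

    ptr += len;
    return ptr;
}
```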

> Signed-off-by: Jan Beulich <[EMAIL PROTECTED]>
> 
> --- linux-2.6.19-rc6/kernel/unwind.c  2006-11-22 14:54:10.0 +0100
> +++ 2.6.19-rc6-unwind-stuck/kernel/unwind.c   2006-11-28 15:02:15.0 
> +0100
> @@ -938,8 +938,11 @@ int unwind(struct unwind_frame_info *fra
>   else {
>   retAddrReg = state.version <= 1 ? *ptr++ : 
> get_uleb128(&ptr, end);
>   /* skip augmentation */
> - if (((const char *)(cie + 2))[1] == 'z')
> - ptr += get_uleb128(&ptr, end);
> + if (((const char *)(cie + 2))[1] == 'z') {
> + uleb128_t augSize = get_uleb128(&ptr, end);
> +
> + ptr += augSize;
> + }
>   if (ptr > end
>  || retAddrReg >= ARRAY_SIZE(reg_info)
>  || REG_INVALID(retAddrReg)

Jakub


Re: [PATCH] work around gcc4 issue with -Os in Dwarf2 stack unwind code

2006-11-28 Thread Jakub Jelinek
On Tue, Nov 28, 2006 at 02:48:15PM +, Jan Beulich wrote:
> I disagree - the standard says there's a sequence point at a function
> call after evaluating all function arguments. To me this means that any

That's true, that sequence point makes sure e.g. all side effects such as
pre-{dec,inc}rement on the arguments happen before the call.
But as I said, no sequence point demands any particular ordering of
evaluation of the LHS and RHS of +=.

> (parts of an) expression the function call is contained in must be
> evaluated after the function call. Otherwise it would be illegal to e.g.
> modify a variable in both operands of && or ||.

That's different, there is a sequence point at the end of the first operand
of &&, ||, ?: and , operators (second bullet in ISO C99 Annex C).

Jakub


Re: [patch 2.6.19-rc6] Stop gcc 4.1.0 optimizing wait_hpet_tick away

2006-11-29 Thread Jakub Jelinek
On Wed, Nov 29, 2006 at 02:56:20PM +1100, Keith Owens wrote:
> Nicholas Miell (on Tue, 28 Nov 2006 19:08:25 -0800) wrote:
> >On Wed, 2006-11-29 at 13:22 +1100, Keith Owens wrote:
> >> Compiling 2.6.19-rc6 with gcc version 4.1.0 (SUSE Linux),
> >> wait_hpet_tick is optimized away to a never ending loop and the kernel
> >> hangs on boot in timer setup.
> >> 
> >> 0000001a <wait_hpet_tick>:
> >>   1a:   55                      push   %ebp
> >>   1b:   89 e5                   mov    %esp,%ebp
> >>   1d:   eb fe                   jmp    1d <wait_hpet_tick+0x3>
> >> 
> >> This is not a problem with gcc 3.3.5.  Adding barrier() calls to
> >> wait_hpet_tick does not help, making the variables volatile does.
> >> 
> >> Signed-off-by: Keith Owens 
> >> 
> >> ---
> >>  arch/i386/kernel/time_hpet.c |2 +-
> >>  1 file changed, 1 insertion(+), 1 deletion(-)
> >> 
> >> Index: linux-2.6/arch/i386/kernel/time_hpet.c
> >> ===
> >> --- linux-2.6.orig/arch/i386/kernel/time_hpet.c
> >> +++ linux-2.6/arch/i386/kernel/time_hpet.c
> >> @@ -51,7 +51,7 @@ static void hpet_writel(unsigned long d,
> >>   */
> >>  static void __devinit wait_hpet_tick(void)
> >>  {
> >> -  unsigned int start_cmp_val, end_cmp_val;
> >> +  unsigned volatile int start_cmp_val, end_cmp_val;
> >>  
> >>start_cmp_val = hpet_readl(HPET_T0_CMP);
> >>do {
> >
> >When you examine the inlined functions involved, this looks an awful lot
> >like http://gcc.gnu.org/bugzilla/show_bug.cgi?id=22278
> >
> >Perhaps SUSE should fix their gcc instead of working around compiler
> >problems in the kernel?
> 
> Firstly, the fix for 22278 is included in gcc 4.1.0.

This actually sounds more like http://gcc.gnu.org/PR27236
And that one is broken in 4.1.0, fixed in 4.1.1.

Jakub


Re: [PATCH -mm 4/5][AIO] - AIO completion signal notification

2006-11-29 Thread Jakub Jelinek
On Wed, Nov 29, 2006 at 11:33:01AM +0100, Sébastien Dugué wrote:
>   AIO completion signal notification
> 
>   The current 2.6 kernel does not support notification of user space via
> an RT signal upon an asynchronous IO completion. The POSIX specification
> states that when an AIO request completes, a signal can be delivered to
> the application as notification.
> 
>   This patch adds a struct sigevent *aio_sigeventp to the iocb.
> The relevant fields (pid, signal number and value) are stored in the kiocb
> for use when the request completes.
> 
>   That sigevent structure is filled by the application as part of the AIO
> request preparation. Upon request completion, the kernel notifies the
> application using those sigevent parameters. If SIGEV_NONE has been specified,
> then the old behaviour is retained and the application must rely on polling
> the completion queue using io_getevents().

Well, from what I see applications must rely on polling the completion
queue using io_getevents() in any case; isn't that the only way to free
the kernel resources associated with the AIO request, even if it uses
SIGEV_SIGNAL or thread notification?  aio_error/aio_return/aio_suspend
will still need to call io_getevents.  The only important differences with
this patch are that a) the polling doesn't need to be asynchronous (i.e.
there is no need for a special thread which just loops doing io_getevents)
and b) it doesn't need to care about notification itself.

Jakub


Re: [patch 2.6.19-rc6] Stop gcc 4.1.0 optimizing wait_hpet_tick away

2006-11-30 Thread Jakub Jelinek
On Fri, Dec 01, 2006 at 08:28:16AM +0100, Willy Tarreau wrote:
> Oh, I'm perfectly aware of this. That's in part why I started the hotfix
> branch in the past :-) But sometimes, fixes consist in merging all the
> patches from the maintenance branch (eg: from 4.1.0 to 4.1.1), and if
> this is the case, there would not be much justification not to simply
> update the version. In fact, what's really missing is a "fixlevel" in
> the packages, to inform the user that 4.1.0 as shipped by the distro
> has the same level of fixes as 4.1.1. But this is what the version is
> used for today.

This is even more complicated by the fact that upstream GCC release branches
(and also several Linux distributors) start announcing the upcoming version
already a few days after a release is tagged.
E.g. 14 days old gcc-4_1-branch says:
./xgcc -B ./ --version; ./xgcc -B ./ -dD -E -xc /dev/null | grep GNU
xgcc (GCC) 4.1.2 20061114 (prerelease)
Copyright (C) 2006 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

#define __GNUC__ 4
#define __GNUC_MINOR__ 1
#define __GNUC_PATCHLEVEL__ 2

but GCC 4.1.2 has not been released yet.
In Fedora Core/RHEL and I think a few other distros the version number
is only changed when it is officially released, e.g.:

gcc --version; gcc -dD -E -xc /dev/null | grep GNU
gcc (GCC) 4.1.1 20061011 (Red Hat 4.1.1-30)
Copyright (C) 2006 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

#define __GNUC__ 4
#define __GNUC_MINOR__ 1
#define __GNUC_PATCHLEVEL__ 1
#define __GNUC_RH_RELEASE__ 30

Note, 4.1.1 was released at the end of May this year and 4.1.2 has not been
released.  So, using __GNUC_PATCHLEVEL__ to detect if a bug has been fixed
or not isn't very useful (you'd need to rule out also __GNUC_PATCHLEVEL__ <= 1
because gcc-4_1-branch was announcing that patchlevel already since the
beginning of March; on the other hand there are a lot of GCCs with
__GNUC_PATCHLEVEL__ == 1 that certainly have that bug fixed).

You perhaps could parse the prerelease vs. release vs. vendor strings,
but that could be quite difficult; perhaps it would be easier to just parse
the date in the --version output.  Checking for the bug is best though, because
that will catch even backports of the bugfix without rebasing from the
release branch.
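
For illustration, a sketch of everything the predefined macros can tell a
program.  The helper names are made up; __GNUC_RH_RELEASE__ is only defined by
vendor compilers such as the Red Hat one shown above.  Neither value proves
that a particular fix is present, which is the point made above.

```c
/* Encode the advertised version as one number, e.g. 4.1.2 -> 40102.
 * Returns 0 when not compiled by GCC or a compatible compiler. */
long advertised_gcc_version(void)
{
#if defined(__GNUC__) && defined(__GNUC_MINOR__) && defined(__GNUC_PATCHLEVEL__)
    return __GNUC__ * 10000L + __GNUC_MINOR__ * 100L + __GNUC_PATCHLEVEL__;
#else
    return 0;
#endif
}

/* Vendor release number where the compiler provides one, else -1. */
long vendor_release(void)
{
#ifdef __GNUC_RH_RELEASE__
    return __GNUC_RH_RELEASE__;
#else
    return -1;
#endif
}
```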

Jakub


Re: [PATCH] get_random_long() and AT_ENTROPY for auxv, kernel 2.6.21.5

2007-06-25 Thread Jakub Jelinek
On Sun, Jun 24, 2007 at 09:43:03PM -0700, Arjan van de Ven wrote:
> > - something to do with aux vector headers
> 
> the primary goal is to pass a random value to userspace at process
> start; this to save glibc from having to open /dev/urandom on every
> program start (which it does now for all apps compiled with
> -fstack-protector, which in various distros is "everything").

There are 2 ways to compile -fstack-protector supporting glibc actually,
only one opens /dev/urandom on every program initialization, the other
computes the stack guard from some bits of the stack address (so indirectly
depends on get_random_int() in stack randomization).
Nevertheless, having one random long (32-bit for 32-bit arches, 64-bit
otherwise) in aux vector would be useful.
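
A sketch of how userland could read such a value.  Note the assumption: it
uses the AT_RANDOM entry and getauxval() from <sys/auxv.h> as provided by
later kernels and glibc, not the AT_ENTROPY entry proposed in this patch, and
AT_RANDOM points at 16 random bytes rather than carrying a single long.
auxv_random_long is a made-up helper name.

```c
#include <string.h>
#include <sys/auxv.h>

/* Copies one random long from the aux vector into *out.
 * Returns 0 on success, -1 if there is no AT_RANDOM entry. */
int auxv_random_long(unsigned long *out)
{
    const void *bytes = (const void *)getauxval(AT_RANDOM);

    if (bytes == NULL)
        return -1;
    memcpy(out, bytes, sizeof(*out));
    return 0;
}
```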

Jakub


Re: [PATCH][RESEND] PIE randomization

2007-07-07 Thread Jakub Jelinek
On Sat, Jul 07, 2007 at 02:13:01AM +0200, Jiri Kosina wrote:
> On Thu, 5 Jul 2007, Rik van Riel wrote:
> 
> > So the original patch has:
> > #define BAD_ADDR(x) ((unsigned long)(x) >= TASK_SIZE)
> > For some reason(?) it got changed to the clearly buggy:
> > #define BAD_ADDR(x) ((unsigned long)(x) >= PAGE_MASK)
> > Jiri's patch undoes that second buggy define, which is very
> > different from the original that was sent in by you and Ernie.
> 
> This is a part of execshield patch, the pie-compiled binary executable 
> memory layout randomization was extracted from - see 
> http://people.redhat.com/~mingo/exec-shield/exec-shield-nx-2.6.19.patch
> 
> Note that load_elf_interp() in vanilla kernel differs from the 
> execshield's (and pie-randomization.patch) version.
> 
> The fix makes the BAD_ADDR check whether the address belongs to the 
> ERR_PTR range, which seems valid for all uses of BAD_ADDR in the patched 
> binfmt_elf.c (do_brk(), elf_map(), do_mmap() etc return valid address or 
> err ptr) ... am I missing something obvious here?

I believe the BAD_ADDR macro was changed from ((unsigned long)(x) >= TASK_SIZE)
(which is the right test for invalid user addresses, a stronger check than
>= PAGE_MASK) to >= PAGE_MASK only because of the one check of the return
value of load_elf_interp.  All other uses of the BAD_ADDR macro are either on
userland addresses (what do_mmap, elf_map, do_brk etc. return,
where TASK_SIZE or more is certainly wrong) or in one case still on unbiased
ELF p_vaddr:
if (BAD_ADDR(k) || elf_ppnt->p_filesz > elf_ppnt->p_memsz ||
in load_elf_binary (where >= TASK_SIZE check is ok too).

So perhaps doing this instead of changing BAD_ADDR to IS_ERR_VAL
might be better:

Signed-off-by: Jakub Jelinek <[EMAIL PROTECTED]>

--- linux/fs/binfmt_elf.c   2007-06-08 21:53:45.0 +0200
+++ linux/fs/binfmt_elf.c   2007-07-07 14:19:14.0 +0200
@@ -80,7 +80,7 @@ static struct linux_binfmt elf_format = 
.hasvdso= 1
 };
 
-#define BAD_ADDR(x) ((unsigned long)(x) >= PAGE_MASK)
+#define BAD_ADDR(x) ((unsigned long)(x) >= TASK_SIZE)
 
 static int set_brk(unsigned long start, unsigned long end)
 {
@@ -1015,7 +1015,7 @@ static int load_elf_binary(struct linux_
interpreter,
&interp_map_addr,
load_bias);
-   if (!BAD_ADDR(elf_entry)) {
+   if (!IS_ERR((void *)elf_entry)) {
/* load_elf_interp() returns relocation 
adjustment */
interp_load_addr = elf_entry;
elf_entry += loc->interp_elf_ex.e_entry;


Jakub


Re: [PATCH][RESEND] PIE randomization

2007-07-10 Thread Jakub Jelinek
On Mon, Jul 09, 2007 at 11:58:07PM +0200, Jiri Kosina wrote:
> On Mon, 9 Jul 2007, Jiri Kosina wrote:
> > [ ... ]
> > > - if (!BAD_ADDR(elf_entry)) {
> > > + if (!IS_ERR((void *)elf_entry)) {
> > I agree that this is better solution. Andrew, this Jakub's patch should 
> > replace the pie-randomization-fix-bad_addr-macro.patch if possible. You 
> > can add 
> 
> as this raced :) with Andrew who already folded the 
> pie-randomization-fix-bad_addr-macro.patch into pie-randomization.patch, 
> do you think you could rebase this change against the current state of -mm 
> and resend it? Thanks,

Here it is:

Restore BAD_ADDR check strictness, use IS_ERR in the only place where
the stricter BAD_ADDR can't work, as the value is a load bias rather
than userland address.

Signed-off-by: Jakub Jelinek <[EMAIL PROTECTED]>

--- linux/fs/binfmt_elf.c   2007-07-10 11:39:29.0 +0200
+++ linux/fs/binfmt_elf.c   2007-07-10 11:41:03.0 +0200
@@ -80,7 +80,7 @@ static struct linux_binfmt elf_format = 
.hasvdso= 1
 };
 
-#define BAD_ADDR(x) IS_ERR_VALUE(x)
+#define BAD_ADDR(x) ((unsigned long)(x) >= TASK_SIZE)
 
 static int set_brk(unsigned long start, unsigned long end)
 {
@@ -1005,7 +1005,7 @@ static int load_elf_binary(struct linux_
interpreter,
&interp_map_addr,
load_bias);
-   if (!BAD_ADDR(elf_entry)) {
+   if (!IS_ERR((void *)elf_entry)) {
/*
 * load_elf_interp() returns relocation
 * adjustment


Jakub


Re: [PATCH] Introduce O_CLOEXEC (take >2)

2007-05-31 Thread Jakub Jelinek
On Thu, May 31, 2007 at 11:46:31AM -0700, Davide Libenzi wrote:
> On Thu, 31 May 2007, Ulrich Drepper wrote:
> > Davide Libenzi wrote:
> > > Isn't this better be a global process flag? Default should be, for legacy
> > > reasons,
> > 
> > No.  Policies are always wrong since it means code that cannot change
> > the policy (e.g, all runtime libraries) have no access to the
> > functionality.  I cannot set the policy to default to close-on-exit in
> > glibc all the while the application assumes this is not the case.
> 
> I was talking for a broader usage, not only glibc centric. Most ppl 
> writing MT+exec apps wants all but (eventually) and handfull of files 
> leaking across the exec boundary.

If open (and all other syscalls that create fds) have O_CLOEXEC (and
something similar for other syscalls), then such a policy can easily be
implemented in userland, if desired.

Jakub


Re: O_CLOEXEC: An alternate proposal

2007-06-08 Thread Jakub Jelinek
On Fri, Jun 08, 2007 at 03:47:12AM -0400, Daniel Colascione wrote:
> Hey, this is my first post to linux-kernel, so please be kind. :-)
> 
> Linus Torvalds wrote on May 31:
> > I'm with Uli on this one. "Stateful" stuff is bad. It's essentially
> > impossible to handle with libraries - either the library would have to
> > explicitly always turn the state the way _it_ needs it, or the library
> > will do the wrong thing.
> 
> I agree that stateful stuff is generally not very elegant,
> but I think it's a win here -- we wouldn't have to create any
> new APIs except for the state-setting stuff.
> 
> The state just has to be thread-local.
> 
> If it's thread-local, a library, say, glibc,
> can use code like this:
> 
>   /* Internal library function */
>   old_fd_flags = kernel_default_fd_flags(FD_CLOEXEC | FD_RANDFD);
>   event_fd = super_duper_event_polling_mechanism_fd();
>   kernel_default_fd_flags(old_fd_flags);

It is not a win; what if a signal comes in between the two
kernel_default_fd_flags syscalls?  open and other functions
are async signal safe, and programs will certainly be upset
if suddenly the syscalls in the signal handler start to behave
differently depending on which exact code the async signal
has interrupted.

Jakub


Re: Userspace compiler support of "long long"

2007-06-28 Thread Jakub Jelinek
On Thu, Jun 28, 2007 at 07:53:51AM -0400, Kyle Moffett wrote:
> On Jun 27, 2007, at 23:57:54, Matthew Wilcox wrote:
> >On Wed, Jun 27, 2007 at 06:30:52PM -0400, Kyle Moffett wrote:
> >>Then all 64-bit archs have:
> >>typedef   signed long  __s64;
> >>typedef unsigned long  __u64;
> >>
> >>While all 32-bit archs have:
> >>typedef   signed long long __s64;
> >>typedef unsigned long long __u64;
> >
> >include/asm-parisc/types.h:typedef unsigned long long __u64;
> >
> >For both 32 and 64-bit.
> >
> >include/asm-sh64/types.h:typedef unsigned long long __u64;
> >include/asm-x86_64/types.h:typedef unsigned long long  __u64;
> >
> >So that's three architectures that violate your first assertion.
> 
> Oh, ok, that makes it even easier to say this with certainty:   
> Changing the other 64-bit archs to use "long long" for their 64-bit  
> numbers will not cause additional warnings.  I'm also almost certain  
> there are no architectures which use "long long" for 128-bit  
> integers. (Moreover, I can't find hardly anything which does 128-bit  
> integers at all).

unsigned long and unsigned long long have the same size, precision
and alignment on all LP64 arches, that's true.  But they have
different ranks and more importantly they mangle differently in C++.
So, whether some user exposed type uses unsigned long or unsigned long long
is part of the ABI, whether that's size_t, uintptr_t, uint64_t, u_int64_t
or any other type, you can't change it without breaking the ABI.
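
A small sketch showing that the two are distinct types even where they have
the same size.  It assumes a C11 compiler for _Generic; TYPE_NAME and
distinct_types are made-up names.  In C++ with the Itanium ABI the same
distinction shows up in symbol names, e.g. void f(unsigned long) mangles as
_Z1fm while void f(unsigned long long) mangles as _Z1fy.

```c
#include <string.h>

/* C11 _Generic selects on the type, not on size or representation.
 * Listing both types would be a compile error if they were compatible. */
#define TYPE_NAME(x) _Generic((x),                      \
        unsigned long:      "unsigned long",            \
        unsigned long long: "unsigned long long",       \
        default:            "other")

/* Returns 1 if the two types are told apart. */
int distinct_types(void)
{
    return strcmp(TYPE_NAME((unsigned long)0),
                  TYPE_NAME((unsigned long long)0)) != 0;
}
```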

Jakub


Re: [PATCH][RESEND] PIE randomization

2007-07-04 Thread Jakub Jelinek
On Wed, May 23, 2007 at 10:50:24AM +0200, Jiri Kosina wrote:
> From: Jan Kratochvil <[EMAIL PROTECTED]>
> 
> This patch is using mmap()'s randomization functionality in such a way 
> that it maps the main executable of (specially compiled/linked -pie/-fpie) 
> ET_DYN binaries onto a random address (in cases in which mmap() is allowed 
> to perform a randomization).
> 
> Origin of this patch is in exec-shield
> (http://people.redhat.com/mingo/exec-shield/)
> 
> Signed-off-by: Jan Kratochvil <[EMAIL PROTECTED]>
> Signed-off-by: Jiri Kosina <[EMAIL PROTECTED]>
> Cc: Ingo Molnar <[EMAIL PROTECTED]>
> Cc: Roland McGrath <[EMAIL PROTECTED]>
> Cc: Jakub Jelinek <[EMAIL PROTECTED]>

> -#define BAD_ADDR(x) ((unsigned long)(x) >= TASK_SIZE)
> +#define BAD_ADDR(x) ((unsigned long)(x) >= PAGE_MASK)
...
> @@ -442,8 +491,7 @@ static unsigned long load_elf_interp(str
>   goto out_close;
>   }
>  
> - *interp_load_addr = load_addr;
> - error = ((unsigned long)interp_elf_ex->e_entry) + load_addr;
> + error = load_addr;
...
>   if (elf_interpreter) {
> - if (interpreter_type == INTERPRETER_AOUT)
> + if (interpreter_type == INTERPRETER_AOUT) {
>   elf_entry = load_aout_interp(&loc->interp_ex,
>interpreter);
> - else
> + } else {
> + unsigned long interp_map_addr;  /* unused */
> +
>   elf_entry = load_elf_interp(&loc->interp_elf_ex,
>   interpreter,
> - &interp_load_addr);
> + &interp_map_addr,
> + load_bias);
> + if (!BAD_ADDR(elf_entry)) {
> + /*
> +  * load_elf_interp() returns relocation
> +  * adjustment
> +  */
> + interp_load_addr = elf_entry;
> + elf_entry += loc->interp_elf_ex.e_entry;
> + }
> + }
>   if (BAD_ADDR(elf_entry)) {
>   force_sig(SIGSEGV, current);
>   retval = IS_ERR((void *)elf_entry) ?

The above highlighted changes are the cause of random segfaults of PIE
binaries.  See
https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=246623
The problem is if ld.so is prelinked to some address in the area where
the kernel actually maps it, particularly if elf_map in load_elf_interp
returns an address one page below its first PT_LOAD segment's vaddr.
Then load_addr (it is a load bias actually) returned from load_elf_interp
is 0xfffff000 (on 32-bit kernels) and BAD_ADDR are all
addresses >= 0xfffff000 (on i?86).
The fix should be either changing the definition of BAD_ADDR to
e.g. IS_ERR_VALUE(x), or at least changing the if (!BAD_ADDR(elf_entry)) {
above to if (!IS_ERR_VALUE(elf_entry)) {; the second BAD_ADDR can
stay, because at that place elf_entry is no longer a bias (difference
between actual and preferred load address), but an actual address, where
very high addresses are of course invalid.
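
The difference between the two checks can be modelled in userspace.  This is
a sketch under stated assumptions: 32-bit values, 4 KiB pages as on i386, and
MAX_ERRNO of 4095 as in include/linux/err.h; the function names are made up.

```c
#include <stdint.h>

/* 32-bit model of the kernel constants (i386, 4 KiB pages). */
#define PAGE_MASK_32    UINT32_C(0xfffff000)
#define MAX_ERRNO_32    UINT32_C(4095)

/* The check from the patch: flags everything in the top page,
 * including a load bias of exactly minus one page. */
int bad_addr_page_mask(uint32_t x)
{
    return x >= PAGE_MASK_32;
}

/* Model of IS_ERR_VALUE(): flags only -MAX_ERRNO .. -1. */
int is_err_value(uint32_t x)
{
    return x >= (uint32_t)(0u - MAX_ERRNO_32);
}
```

With these definitions a bias of 0xfffff000 is rejected by the PAGE_MASK
comparison but accepted by the error-value test, while a real error such as
-ENOMEM is rejected by both.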

Jakub


Re: SMP performance degradation with sysbench

2007-03-13 Thread Jakub Jelinek
On Tue, Mar 13, 2007 at 01:02:44PM +0100, Eric Dumazet wrote:
> On Tuesday 13 March 2007 12:42, Andrea Arcangeli wrote:
> 
> > My wild guess is that they're allocating memory after taking
> > futexes. If they do, something like this will happen:
> >
> >  taskA  taskB   taskC
> >  user lock
> > mmap_sem lock
> >  mmap sem -> schedule
> > user lock -> schedule
> >
> > If taskB wouldn't be there triggering more random trashing over the
> > mmap_sem, the lock holder wouldn't wait and task C wouldn't wait too.
> >
> > I suspect the real fix is not to allocate memory or to run other
> > expensive syscalls that can block inside the futex critical sections...
> 
> glibc malloc uses arenas, and trylock() only. It should not block because if 
> an arena is already locked, thread automatically chose another arena, and 
> might create a new one if necessary.

Well, it only uses trylock when allocating; free uses a normal lock.
glibc malloc will by default use the same arena for all threads; only when
it sees contention during allocation does it give different threads different
arenas.  So, e.g. if mysql did all allocations while holding some global
heap lock (thus glibc wouldn't see any contention on allocation), but
freeing would be done outside of the application's critical section, you would
see contention on the main arena's lock in the free path.
Calling malloc_stats (); from e.g. an atexit handler could give interesting
details, especially if you recompile glibc malloc with -DTHREAD_STATS=1.
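
A minimal sketch of that suggestion.  malloc_stats() is the glibc function
from <malloc.h>; the two helper names are made up for illustration.

```c
#include <malloc.h>
#include <stdlib.h>

/* Prints per-arena statistics to stderr; glibc-specific. */
static void dump_malloc_stats(void)
{
    malloc_stats();
}

/* Call once early in main().  Returns 0 on success, as atexit does. */
int install_malloc_stats_dump(void)
{
    return atexit(dump_malloc_stats);
}
```

The number of arenas and the per-arena usage printed at exit give a first hint
whether threads really ended up sharing the main arena.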

Jakub

