Re: [regression] Re: brk randomization breaks columns
On Tue, Feb 05, 2008 at 01:54:26PM +0100, Ingo Molnar wrote:
> * Jiri Kosina <[EMAIL PROTECTED]> wrote:
>
> > On Tue, 5 Feb 2008, Pavel Machek wrote:
> >
> > > > Actually, this clearly shows that either prehistoric libc.so.5 or the
> > > > program itself are broken.
> > > I believe it shows clear regression in latest 2.6.25 kernel.
> >
> > I am still not completely sure. It might be a regression, but it also
> > might just trigger the bug in ancient version in libc.so.5 which might
> > be fixed in some later version [...]
>
> which too is a regression ...
>
> really, lets add a sysctl for this, and a .config option that either
> disables or enables it. Then we will default to disabled. (but users can
> enable it - and distros can build their kernels with this .config option
> enabled)

I don't think the kernel should care about programs which are buggy and make invalid assumptions, and that's the case here. I remember we went through this 5 years ago when brk randomization was added to Red Hat kernels. There were one or two broken programs which made assumptions about what brk(0) is supposed to return at program startup; everything else was ok. For the buggy apps there is always

    setarch i386 -R ./the_buggy_program

so I don't think we need to add another sysctl for this.

    Jakub
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
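As an illustration of the assumption being discussed, here is a minimal sketch in C. It is not taken from any of the affected programs, and the helper names current_break() and grow_heap() are made up. Instead of assuming where the initial break sits, it asks for it with sbrk(0) and grows the heap relative to whatever comes back, which keeps working when the break is randomized.

```c
#define _DEFAULT_SOURCE
#include <stddef.h>
#include <stdint.h>
#include <unistd.h>

/* Hypothetical helper: return the current program break, or NULL on error. */
static void *current_break(void)
{
    void *brk_now = sbrk(0);
    return brk_now == (void *) -1 ? NULL : brk_now;
}

/* Hypothetical helper: grow the heap by nbytes and return the start of
   the new area (the old break), or NULL if the C library refuses. */
static void *grow_heap(intptr_t nbytes)
{
    void *old_break = sbrk(nbytes);
    return old_break == (void *) -1 ? NULL : old_break;
}
```

Nothing here depends on the numeric value of the break, only on what sbrk() reports at run time. Some C libraries refuse a non-zero sbrk() increment, which is why grow_heap() can return NULL.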
asm-x86/sigcontext.h changes break userland
Hi!

The
  x86: use generic register names in struct sigcontext
  http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=742fa54a62be6a263df14a553bf832724471dfbe
changeset breaks userland, e.g. it is not possible to compile gcc anymore (both 32-bit and 64-bit libgcc), and I expect the same for any other program which pokes into struct sigcontext. The register names with the e resp. r prefix have been in use for years, what's the point of breaking them now?

    Jakub
Re: asm-x86/sigcontext.h changes break userland
On Wed, Feb 13, 2008 at 08:26:50AM +0100, Ingo Molnar wrote:
>
> * Jakub Jelinek <[EMAIL PROTECTED]> wrote:
>
> >   x86: use generic register names in struct sigcontext
> >   http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=742fa54a62be6a263df14a553bf832724471dfbe
> >
> > changeset breaks userland, e.g. it is not possible to compile gcc
> > anymore (both 32-bit and 64-bit libgcc), and I expect any other
> > program which pokes into struct sigcontext. The register names with e
> > resp. r have been in use for years, what's the point breaking it now?
>
> ok - does the patch below solve the problem for you?

Yes, this fixes it. Thanks.

FYI, gcc uses glibc headers to get at struct sigcontext, but on i386 (and many other arches) glibc's sigcontext header just includes asm/sigcontext.h. On x86_64, ia64 and sparc* glibc doesn't include asm/sigcontext.h, but provides its own definitions, so for gcc itself changing only the 32-bit parts would be enough. That said, there are certainly programs which include asm/sigcontext.h directly (plus there are other C libraries, some of which may use asm/sigcontext.h).

    Jakub
Re: [x86.git#mm] stack protector fixes, vmsplice exploit
On Thu, Feb 14, 2008 at 09:25:35PM +0100, Ingo Molnar wrote:
> The per function call overhead from stackprotector is already pretty
> serious IMO, but at least that's something that GCC _could_ be doing
> (much) smarter (why doesnt it jne forward out to __check_stk_failure,
> instead of generating 4 instructions, one of them a default-mispredicted
> branch instruction??), so that overhead could in theory be something
> like 4 fall-through instructions per function, instead of the current 6.

Where do you see a mispredicted branch?

int foo (void)
{
  char buf[64];
  bar (buf);
  return 6;
}

-O2 -fstack-protector -m64:
        subq    $88, %rsp
        movq    %fs:40, %rax
        movq    %rax, 72(%rsp)
        xorl    %eax, %eax
        movq    %rsp, %rdi
        call    bar
        movq    72(%rsp), %rdx
        xorq    %fs:40, %rdx
        movl    $6, %eax
        jne     .L5
        addq    $88, %rsp
        ret
.L5:
        .p2align 4,,6
        .p2align 3
        call    __stack_chk_fail

-O2 -fstack-protector -m32:
        pushl   %ebp
        movl    %esp, %ebp
        subl    $88, %esp
        movl    %gs:20, %eax
        movl    %eax, -4(%ebp)
        xorl    %eax, %eax
        leal    -68(%ebp), %eax
        movl    %eax, (%esp)
        call    bar
        movl    $6, %eax
        movl    -4(%ebp), %edx
        xorl    %gs:20, %edx
        jne     .L5
        leave
        ret
.L5:
        .p2align 4,,7
        .p2align 3
        call    __stack_chk_fail

-O2 -fstack-protector -m64 -mcmodel=kernel:
        subq    $88, %rsp
        movq    %gs:40, %rax
        movq    %rax, 72(%rsp)
        xorl    %eax, %eax
        movq    %rsp, %rdi
        call    bar
        movq    72(%rsp), %rdx
        xorq    %gs:40, %rdx
        movl    $6, %eax
        jne     .L5
        addq    $88, %rsp
        ret
.L5:
        .p2align 4,,6
        .p2align 3
        call    __stack_chk_fail

both with gcc 4.1.x and 4.3.0.

BTW, you can use -fstack-protector --param=ssp-buffer-size=4 etc. to tweak the size of buffers to trigger stack protection, the default is 8, but e.g. whole Fedora is compiled with 4.

    Jakub
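For anyone who wants to reproduce listings like the ones in this message, here is a self-contained version of the testcase as a sketch. The body of bar() is an assumption (the message only shows the call); it is kept out of line so that foo() really owns a stack array whose address escapes.

```c
#include <string.h>

/* Assumed body: the original message only declares the call to bar().
   noinline keeps the call (and the escaping buffer) visible in foo(). */
__attribute__((noinline)) void bar(char *buf)
{
    memset(buf, 'x', 63);
    buf[63] = '\0';
}

int foo(void)
{
    /* 64 bytes is above the default ssp-buffer-size of 8 mentioned in
       the message, so -fstack-protector instruments this function. */
    char buf[64];

    bar(buf);
    return 6;
}
```

Compiling it with -O2 -fstack-protector -S (the flags quoted in the message) should show the canary load before the call and the xor/jne check after it; the exact instruction sequence depends on the compiler version.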
Re: [x86] BUG: unable to handle kernel paging request at 00740060
On Tue, Oct 08, 2013 at 08:51:54PM +0200, Oleg Nesterov wrote:
> On 10/08, Linus Torvalds wrote:
> >
> > > (not yet merged), see:
> > >
> > > http://git.kernel.org/cgit/linux/kernel/git/tip/tip.git/commit/?id=0c44c2d0f459cd7e275242b72f500137c4fa834d
>
> I do not really understand inline assembly constraints, but I'll ask
> anyway.
>
> +#define __GEN_RMWcc(fullop, var, cc, ...) \
> +do { \
> +        asm volatile goto (fullop "; j" cc " %l[cc_label]" \
> +                        : : "m" (var), ## __VA_ARGS__ \
>                              ^
>
> don't we need
>
>         "+m" (var)
>
> here?

You actually can't have output operands with asm goto, only inputs and clobbers. But the "memory" clobber should be enough here.

If you suspect a compiler bug, can somebody please narrow it down to a single object file (if I've skimmed the patch right, it is just an optimization, where object files compiled without and with the patch should actually coexist fine in the same kernel), ideally to a single routine if possible and post a preprocessed source + gcc command line + version of gcc?

    Jakub
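The point about operands can be sketched as follows, for the GCC versions discussed in this thread. This is an illustration, not the kernel's __GEN_RMWcc macro: the function name test_and_set_bit1() is made up, the instruction is written as btsl to make the operand size explicit, and a plain C fallback is included only so the sketch builds on non-x86 targets.

```c
/* Set bit 1 of *word and report whether it was already set. */
int test_and_set_bit1(int *word)
{
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
    /* The modified variable is passed as an "m" *input*; the "memory"
       clobber tells the compiler that memory changed behind its back. */
    __asm__ goto ("btsl $1, %0; jc %l[was_set]"
                  : /* no output operands */
                  : "m" (*word)
                  : "memory", "cc"
                  : was_set);
    return 0;
was_set:
    return 1;
#else
    /* Portable fallback, not part of the discussion above. */
    int old = (*word >> 1) & 1;

    *word |= 1 << 1;
    return old;
#endif
}
```

The two return paths yield different values on purpose, so the jump target is not the same block as the fallthrough path.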
Re: [x86] BUG: unable to handle kernel paging request at 00740060
On Wed, Oct 09, 2013 at 04:46:56PM +0200, Peter Zijlstra wrote:
> On Wed, Oct 09, 2013 at 04:33:59PM +0200, Peter Zijlstra wrote:
> > On Wed, Oct 09, 2013 at 04:07:34PM +0200, Peter Zijlstra wrote:
> > > Once I force a x86_64 build using the 'same' config it goes away and
> > > generates 'sensible' code again (although I don't see why L9 isn't
> > > merged with L2):
> >
> > i386-SMP also generates correct code afaict; a tad stupid but not wrong.
> >
> > If I remove ftrace from the .config its still broken..
> > If I also remove the likely/unlikely tracer its still broken and lots
> > smaller:
>
> OK, its -march=winchip2 that's buggered.

Confirmed as a gcc bug, filed as http://gcc.gnu.org/PR58670. It seems all of 4.[6-9] miscompile it. Will have a look tomorrow unless somebody beats me to it. Historically, though, the case where asm goto labels jump to the fallthru basic block has had numerous problems.

    Jakub
Re: [x86] BUG: unable to handle kernel paging request at 00740060
On Wed, Oct 09, 2013 at 09:02:31PM +0200, Peter Zijlstra wrote:
> On Wed, Oct 09, 2013 at 08:16:13PM +0200, Jakub Jelinek wrote:
> > Confirmed as gcc bug, filed http://gcc.gnu.org/PR58670
> > Seems all of 4.[6-9] miscompile it. Will have a look tomorrow
> > unless somebody beats me to it. But historically, the case where
> > asm goto labels jump to fallthru basic block had numerous problems in the
> > past.
>
> That bug lists the component as middle end; this suggests x86_64 would
> be vulnerable too, can you confirm? So far we've only observed the wrong
> code on i386 targets, x86_64 targets appeared correct.

Any target, the testcase in the bugzilla aborts on x86_64 with -O2, and even say on ppc64 (sure, one would have to rewrite the asm to have it fail at runtime).

    Jakub
Re: [x86] BUG: unable to handle kernel paging request at 00740060
On Thu, Oct 10, 2013 at 08:22:38AM +0200, Ingo Molnar wrote:
> > On Wed, Oct 09, 2013 at 09:02:31PM +0200, Peter Zijlstra wrote:
> > > On Wed, Oct 09, 2013 at 08:16:13PM +0200, Jakub Jelinek wrote:
> > >
> > > > Confirmed as gcc bug, filed http://gcc.gnu.org/PR58670 Seems all of
> > > > 4.[6-9] miscompile it. Will have a look tomorrow unless somebody
> > > > beats me to it. But historically, the case where asm goto labels
> > > > jump to fallthru basic block had numerous problems in the past.
> > >
> > > That bug lists the component as middle end; this suggests x86_64 would
> > > be vulnerable too, can you confirm? So far we've only observed the
> > > wrong code on i386 targets, x86_64 targets appeared correct.
> >
> > Any target, the testcase in the bugzilla aborts on x86_64 with -O2, and
> > even say on ppc64 (sure, one would have to rewrite the asm to have it
> > fail at runtime).
>
> Please let us know once you know enough about the bug to suggest
> workarounds. Because it's a nice optimization even extra instruction(s)
> would be acceptable I suspect: we could perhaps put a NOP into a slowpath,
> with an (unused) goto to it, or something like that?

IMHO you don't need to put a nop there; I guess asm (""); would be enough. That will still make sure the label is never in the fallthru basic block, so the whole class of issues with asm goto with labels in the fallthru bb can't hit. The disadvantage is that it will generate worse code.

@@ -8,6 +8,7 @@ foo (int a, int b)
   asm volatile goto ("bts $1, %0; jc %l[lab]" : : "m" (b) : "memory" : lab);
   return 0;
 lab:
+  asm ("");
   return 0;
 }

on the testcase from the PR results in something like:

#APP
# 8 "pr58670-1.c" 1
        bts $1, -4(%rsp); jc .L3
# 0 "" 2
#NO_APP
.L5:
        xorl    %eax, %eax
        ret
        .p2align 4,,10
        .p2align 3
.L3:
        xorl    %eax, %eax
        ret
        .p2align 4,,10
        .p2align 3
.L4:
        movl    $-3, %eax
        ret

while code without the extra asm (""); and with a fixed compiler:

#APP
# 6 "pr58670.c" 1
        bts $1, -4(%rsp); jc .L3
# 0 "" 2
#NO_APP
.L3:
        xorl    %eax, %eax
        ret
        .p2align 4,,10
        .p2align 3
.L4:
.L2:
        movl    $-3, %eax
        ret

FYI, the list of past compiler issues with asm goto includes: PR54127, PR46226, PR44071, PR52650, PR54455, PR51767. I hope we get this fixed for 4.8.2, so you could then avoid these hacks for GCC 4.8.2 and later.

    Jakub
Re: [x86] BUG: unable to handle kernel paging request at 00740060
On Thu, Oct 10, 2013 at 08:51:04AM +0200, Jakub Jelinek wrote:
> @@ -8,6 +8,7 @@ foo (int a, int b)
>    asm volatile goto ("bts $1, %0; jc %l[lab]" : : "m" (b) : "memory" : lab);
>    return 0;
>  lab:
> +  asm ("");
>    return 0;
>  }

Or alternatively put the asm (""); right after asm goto:

  asm volatile goto ("bts $1, %0; jc %l[lab]" : : "m" (b) : "memory" : lab);
  asm ("");
  return ...;
lab:
  return ...;

What generates better code remains to be tested. In any case, please conditionalize the hacks on non-fixed compilers once the fix is released.

    Jakub
Re: [PATCH] gcc4: Add 'asm goto' miscompilation quirk
On Thu, Oct 10, 2013 at 10:24:30AM +0200, Ingo Molnar wrote:
> Something like the patch below? (Totally untested and all that.)
>
> Notes:
>
>  - If the bug is fixed in 4.8.3 then the version check can be sharpened
>    from 9 to 40803.

The bug is likely going to be fixed already for 4.8.2 (to be released next week or so).

>  - I'd really prefer this quirk versus having to add the extra barrier to
>    the label, as it makes the actual usage sites a lot less painful.

Please check how much it bloats the generated code. Also, for the bitops patch, you probably want an asm_volatile_goto variant.

> --- a/include/linux/compiler-gcc4.h
> +++ b/include/linux/compiler-gcc4.h
> @@ -65,6 +65,19 @@
>  #define __visible __attribute__((externally_visible))
>  #endif
>
> +/*
> + * GCC 'asm goto' miscompiles certain code sequences:
> + *
> + *   http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58670
> + *
> + * Work it around via quirk suggested by Jakub Jelinek.
> + * Not yet fixed, so use the quirk on all compiler versions:
> + */
> +#if GCC_VERSION <= 9
> +# define asm_goto(x...) do { asm goto(x); asm (""); } while (0)
> +#else
> +# define asm_goto(x...) do { asm goto(x); } while (0)
> +#endif
>
>  #ifdef CONFIG_ARCH_USE_BUILTIN_BSWAP
>  #if GCC_VERSION >= 40400

    Jakub
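A standalone sketch of how such a quirk macro might be used at a call site follows. The names asm_goto_quirk() and foo_quirk() are hypothetical, the macro uses standard __VA_ARGS__ instead of the GNU named-variadic form in the quoted patch, and no version check is attempted here. The function follows the shape of the testcase discussed in this thread, except that the label path returns 1 so the two outcomes can be told apart.

```c
/* Hypothetical wrapper: asm goto followed by an empty asm, so the label
   can never end up in the fallthrough basic block. */
#define asm_goto_quirk(...) \
    do { __asm__ goto(__VA_ARGS__); __asm__ (""); } while (0)

int foo_quirk(int a, int b)
{
    if (a)
        return -3;
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
    asm_goto_quirk("btsl $1, %0; jc %l[lab]" : : "m" (b) : "memory" : lab);
    return 0;
lab:
    return 1;
#else
    /* Portable fallback so the sketch builds on other targets. */
    return (b >> 1) & 1;
#endif
}
```

Because the empty asm sits inside the macro, usage sites stay as short as a plain asm goto, which is the property the quoted note prefers over adding a barrier at every label.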
Re: [PATCH, -v2] compiler/gcc4: Add quirk for 'asm goto' miscompilation bug
On Thu, Oct 10, 2013 at 01:56:17PM +0200, Peter Zijlstra wrote:
> On Thu, Oct 10, 2013 at 10:55:06AM +0200, Ingo Molnar wrote:
> > +/*
> > + * GCC 'asm goto' miscompiles certain code sequences:
> > + *
> > + *   http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58670
> > + *
> > + * Work it around via quirk suggested by Jakub Jelinek.
> > + * Fixed in GCC 4.8.2 and later versions.
> > + */
> > +#if GCC_VERSION <= 40801
>
> We didn't do version checks for CC_HAVE_ASM_GOTO because of vendor
> backports; can't we detect this in the same way?

The problem is that it will be harder to check for this as a compile time only check, and for a runtime check you'd need to have the assembly string for every architecture and you couldn't do it for cross-compiling anyway.

A compile time only check wouldn't be 100% reliable. You could e.g. check for that using -S -O2 -xc - -o - on:

int
foo (int a, int b)
{
  if (a)
    return -3;
  asm volatile goto ("asm volatile goto to %l[lab]" : : "m" (b) : "memory" : lab);
  return 0;
lab:
  return 0;
}

and use awk on the resulting assembly to find the asm volatile goto to (.*)$ string, then skip lines starting in column 0 with an assembly comment character(s) (#, %, //, not sure if those 3 are all you can see) and check that the first non-skipped line starts with the string matching (.*) earlier followed by : (or perhaps skip other labels too?).

That said, the check could fail even in fixed gccs, so perhaps you want to combine that with both a version check and the test. Use version >= 4.8.3 (note, while I hope it will be fixed in the 4.8.2 release, people using prerelease compilers would still have __GNUC_PATCHLEVEL__ == 2, at least in upstream gcc, even with gcc containing that bug; e.g. in Fedora/RHEL we patch down the patchlevel version, so that __GNUC_PATCHLEVEL__ is 2 only for GCC release x.y.2 and following snapshots, while upstream bumps patchlevel immediately after a release is made). So for >= 4.8.3 just assume no workaround is needed, otherwise scan assembly.

> > +# define __asm_goto(vol, x...) do { asm vol goto(x); asm (""); } while (0)
> > +#else
> > +# define __asm_goto(vol, x...) do { asm vol goto(x); } while (0)
> > +#endif
>
> This places the asm("") in the fallthrough case; but Jakub wrote:
>
> > @@ -8,6 +8,7 @@ foo (int a, int b)
> >    asm volatile goto ("bts $1, %0; jc %l[lab]" : : "m" (b) : "memory" :
> > lab);
> >    return 0;
> >  lab:
> > +  asm ("");
> >    return 0;
> >  }
>
> Which places the asm ("") after the label, these two are not the same.

See the follow-up mails, I think placing it immediately after asm goto might be better.

    Jakub
Re: [PATCH] gcc4: Add 'asm goto' miscompilation quirk
On Thu, Oct 10, 2013 at 07:04:18AM -0700, Richard Henderson wrote:
> On 10/10/2013 01:31 AM, Jakub Jelinek wrote:
> > Also, for the bitops patch, you probably want an asm_volatile_goto variant.
>
> Why? Asm without output (which asm goto must be) are automatically volatile.

You're right.

    Jakub
Re: Friendlier EPERM - Request for input
On Wed, Jan 09, 2013 at 12:53:40PM -0800, Casey Schaufler wrote:
> I'm suggesting that the string returned by get_extended_error_info()
> ought to be the audit record the system call would generate, regardless
> of whether the audit system would emit it or not.

What system call would that info be for, and would it be reset on the next syscall that succeeded, or also on one that failed? The thing is, various functions e.g. perform some syscall, save errno, do some other syscall, and if they decide that the first syscall should be what determines the whole function's errno, just restore errno from the saved value and return. Similarly, various functions just set errno upon detecting some error condition in userspace. There is no 1:1 mapping between many libc library calls and syscalls. So, when would it be safe to call this new get_extended_error_info function, and how would one determine which syscall it was relevant to?

    Jakub
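The save/restore pattern described in this message can be sketched like this. The function open_readonly() is a made-up example, not code from any libc, and close(-1) merely stands in for whatever cleanup syscall a real function would issue after the failure.

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical library-style wrapper: the errno the caller sees comes
   from open(), even though a later syscall also failed. */
int open_readonly(const char *path)
{
    int fd = open(path, O_RDONLY);

    if (fd < 0) {
        int saved_errno = errno;  /* error from the first syscall */

        close(-1);                /* stand-in cleanup: fails with EBADF */
        errno = saved_errno;      /* report the open() failure instead */
        return -1;
    }
    return fd;
}
```

By the time the caller looks at errno, the most recent failing syscall was close(), not open(), so any per-thread "extended info about the last syscall" would describe the wrong call.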
Re: [CHECKER] amusing copy_from_user bug
On Tue, Apr 10, 2001 at 03:11:05AM -0700, Dawson Engler wrote:
> As a side question: is it still true that verify_area's must be done before
> any use of __put_user/__get_user/__copy_from_user/etc?

I believe so, at least in generic code. In architecture specific code (non-i386) it is usually sufficient just to do one put_user/get_user/copy_from_user and then do the rest with __put_user/__get_user etc. on a nearby area (<4K is safe e.g. on sparc), and some architectures don't care at all, because verify_area is a noop (sparc64).

    Jakub
Re: shm_open doesn't work (fix maybe).
On Tue, Apr 24, 2001 at 11:46:20AM -0500, Tom Brusehaver (N-Sysdyne Corporation) wrote:
>
> I have been chasing all around trying to find out why
> shm_open always returns ENOSYS. It is implemented
> in glibc-2.2.2, and seems the 2.4.3 kernel knows about
> shmfs.
>
> It seems the file linux/mm/shmem.c has:
> #define SHMEM_MAGIC 0x01021994
>
> And the glibc-2.2.2/sysdeps/unix/sysv/linux/linux_fsinfo.h has:
> #define SHMFS_SUPER_MAGIC 0x02011994
>
> Well, which is correct?

Update your glibc, 2.2.3pre* matches 2.4.x kernel:

2001-03-03  Ulrich Drepper  <[EMAIL PROTECTED]>

        * sysdeps/unix/sysv/linux/linux_fsinfo.h (SHMFS_SUPER_MAGIC): Update
        for real 2.4 kernels.

    Jakub
Re: sendfile64?
On Tue, Feb 20, 2001 at 02:51:24PM +1300, Chris Wedgwood wrote:
> Why isn't there a sendfile64?
>
> because nobody has implemented on -- arguably it's not needed; the
> different between:
>
>         sendfile64(...)
>
> and
>
>         while(blah){
>                 sendfile( ... 1G or so ...)
>         }
>
> probably won't be detectable anyhow. I see no reason why sendfile64
> should be purely user-space (then again, I see no reason why not to
> extend the kernel API as is, but last time I tested it is was busted
> WRT signals so I would rather that be fixed before further
> proliferation there).

Wrong. sendfile takes a pointer to off_t, not loff_t, so you cannot replace sendfile64 with multiple sendfile's if offset is non-NULL from userland. It simply won't work properly on big files (no matter what size you transfer at a time).

    Jakub
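The limitation is about the range of the offset type rather than the amount transferred per call. A small sketch, assuming a 32-bit signed off_t as on the 32-bit targets under discussion (the helper name offset_fits_off32() is made up):

```c
#include <stdint.h>

/* Return 1 if a 64-bit file offset could be handed to an interface
   that takes a 32-bit signed off_t without losing information. */
int offset_fits_off32(int64_t offset)
{
    return offset >= 0 && offset <= INT32_MAX;
}
```

Looping over 1G chunks does not help once the starting offset itself passes 2^31 - 1: the value that has to go through the off_t pointer no longer fits, whatever count is requested.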
Re: Posible bug in gcc
On Mon, Feb 26, 2001 at 05:15:28PM +, Alan Cox wrote:
> > I think I heve found a bug in gcc. I have tried both egcs 1.1.2 (gcc
> > 2.91.66) and gcc 2.95.2 versions.
> >
> > I am attaching you a simplified test program ('bug.c', a really simple
> > program).
>
> Well gcc-bugs would be the better place to send it but this is a known problem
> fixed in CVS gcc 2.95.3, CVS gcc 3.0 branch and gcc 2.96 (unofficial, Red Hat)

I'm not sure if it is known, at least not known to me, but definitely not fixed in any of gcc 2.95.2, CVS gcc 3.0 branch, CVS gcc 3.1 head, gcc 2.96-RH.

    Jakub
Re: Is sendfile all that sexy?
On Tue, Jan 16, 2001 at 10:05:06AM -0500, David L. Parsley wrote:
> Felix von Leitner wrote:
> > > close (0);
> > > close (1);
> > > close (2);
> > > open ("/dev/console", O_RDWR);
> > > dup ();
> > > dup ();
> >
> > So it's not actually part of POSIX, it's just to get around fixing
> > legacy code? ;-)
>
> This makes me wonder...
>
> If the kernel only kept a queue of the three smallest unused fd's, and
> when the queue emptied handed out whatever it liked, how many things
> would break? I suspect this would cover a lot of bases...

First it would break Unix98 and other standards:

The Single UNIX (R) Specification, Version 2
Copyright (c) 1997 The Open Group
...
  int open(const char *path, int oflag, ... );
...
  The open() function will return a file descriptor for the named file
  that is the lowest file descriptor not currently open for that process.
  The open file description is new, and therefore the file descriptor does
  not share it with any other process in the system. The FD_CLOEXEC file
  descriptor flag associated with the new file descriptor will be cleared.

    Jakub
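The quoted requirement is easy to observe from userland. The sketch below (the function name lowest_fd_is_reused() is made up, and /dev/null is assumed to be openable) opens two descriptors, closes the lower one, and checks that the next open() returns that same number, as long as nothing else in the process opens files concurrently.

```c
#include <fcntl.h>
#include <unistd.h>

/* Return 1 if open() hands back the lowest free descriptor,
   0 if it does not, -1 if the setup itself failed. */
int lowest_fd_is_reused(void)
{
    int first = open("/dev/null", O_RDONLY);
    int second = open("/dev/null", O_RDONLY);
    int reused, ok;

    if (first < 0 || second < 0) {
        if (first >= 0)
            close(first);
        if (second >= 0)
            close(second);
        return -1;
    }

    close(first);                          /* first is now the lowest free fd */
    reused = open("/dev/null", O_RDONLY);  /* must get that number back */
    ok = (reused == first);

    close(second);
    if (reused >= 0)
        close(reused);
    return ok;
}
```

This is the same property the close (0); close (1); close (2); open (...); dup (); dup (); idiom quoted above relies on.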
Re: Modprobe local root exploit
On Tue, Nov 14, 2000 at 10:42:41AM +, Malcolm Beattie wrote:
> Keith Owens writes:
> > All these patches against request_module are attacking the problem at
> > the wrong point. The kernel can request any module name it likes,
> > using any string it likes, as long as the kernel generates the name.
> > The real problem is when the kernel blindly accepts some user input and
> > passes it straight to modprobe, then the kernel is acting like a setuid
> > wrapper for a program that was never designed to run setuid.
>
> Rather than add sanity checking to modprobe, it would be a lot easier
> and safer from a security audit point of view to have the kernel call
> /sbin/kmodprobe instead of /sbin/modprobe. Then kmodprobe can sanitise
> all the data and exec the real modprobe. That way the only thing that
> needs auditing is a string munging/sanitising program.

Well, no matter what, the kernel needs auditing as well: the fact that dev_load will load any module the user wants without any check is already problematic, and no munging helps with that at all. Loading old ISA drivers in particular might not be a good idea.

    Jakub
POSIX message queue passing (was Re: State of Posix compliance in v2.2/v2.4 kernel?)
On Sun, Nov 19, 2000 at 07:24:16PM +0900, GOTO Masanori wrote:
> At Mon, 13 Nov 2000 11:13:19 -0500,
> Jakub Jelinek <[EMAIL PROTECTED]> wrote:
> > ago were done in the kernel, POSIX message queue passing is not doable in
> > userland without kernel help either (I have a message queue filesystem
> > kernel patch for this, but it is a 2.5 thing).
>
> Interesting. Is yours ready for?
> (I'm also working with it. I agree it's for 2.5)

Below is my preliminary version from Sep, 16th if you're interested. I haven't had time for it since then, so it most probably will not apply cleanly to the current kernel.

Things still to do:
- clean it up
- implement poll on message queues
- handle __SI_RT in architectural copy_siginfo_to_user routines
- test much more than I have done so far
- fix mq_notify - see below
- avoid doing linear searches - see below

Message queues are presented as a new filesystem, mounted usually on /dev/msg. The objects in that filesystem are fifos with special MQ semantics. One can use normal open/read/write on fifos in /dev/msg, which means mq_open with mq_attr NULL, mq_receive which does not tell the priority and mq_send with default priority. Then there are a few ioctls which allow opening with special queue attributes, sending with a priority, receiving so that you get the priority back, etc.

One thing I'm not sure about is mq_notify, because it states the signal should be sent to the process (ie. I'd think it is tgid, not pid in 2.4.0-test8, but then I don't know which close/exit should cause the notification registration to be freed). Also, I wonder how many pending messages typical message queues have; if not too many, then the current linear search is fine, otherwise I should put the messages into some heap which would allow O(1) mq_receive.

If you find any races/problems, please let me know.
I've coded mqueue.h public glibc userland header and mqueue.c which has hacks on top and then basically what could end up in glibc's mq_*.c (after shm_open.c code for locating mount points is copied in).

    Jakub

--- linux/Documentation/ioctl-number.txt.jj	Thu Jun 22 13:42:24 2000
+++ linux/Documentation/ioctl-number.txt	Fri Sep  8 13:16:42 2000
@@ -183,5 +183,6 @@ Code	Seq#	Include File		Comments
 0xB0	all	RATIO devices		in development:
 					<mailto:[EMAIL PROTECTED]>
 0xB1	00-1F	PPPoX			<mailto:[EMAIL PROTECTED]>
+0xB2	00-1F	linux/mqueue.h
 0xCB	00-1F	CBM serial IEC bus	in development:
 					<mailto:[EMAIL PROTECTED]>
--- linux/include/asm-alpha/siginfo.h.jj	Sat May 27 02:49:37 2000
+++ linux/include/asm-alpha/siginfo.h	Mon Sep 11 13:30:50 2000
@@ -104,7 +104,7 @@ typedef struct siginfo {
 #define SI_KERNEL	0x80	/* sent by the kernel from somewhere */
 #define SI_QUEUE	-1	/* sent by sigqueue */
 #define SI_TIMER __SI_CODE(__SI_TIMER,-2) /* sent by timer expiration */
-#define SI_MESGQ	-3	/* sent by real time mesq state change */
+#define SI_MESGQ __SI_CODE(__SI_RT,-3) /* sent by real time mesq state change */
 #define SI_ASYNCIO	-4	/* sent by AIO completion */
 #define SI_SIGIO	-5	/* sent by queued SIGIO */

--- linux/include/asm-arm/siginfo.h.jj	Sat May 27 02:49:37 2000
+++ linux/include/asm-arm/siginfo.h	Mon Sep 11 13:31:02 2000
@@ -104,7 +104,7 @@ typedef struct siginfo {
 #define SI_KERNEL	0x80	/* sent by the kernel from somewhere */
 #define SI_QUEUE	-1	/* sent by sigqueue */
 #define SI_TIMER __SI_CODE(__SI_TIMER,-2) /* sent by timer expiration */
-#define SI_MESGQ	-3	/* sent by real time mesq state change */
+#define SI_MESGQ __SI_CODE(__SI_RT,-3) /* sent by real time mesq state change */
 #define SI_ASYNCIO	-4	/* sent by AIO completion */
 #define SI_SIGIO	-5	/* sent by queued SIGIO */

--- linux/include/asm-i386/siginfo.h.jj	Thu Sep  7 10:38:08 2000
+++ linux/include/asm-i386/siginfo.h	Mon Sep 11 13:31:15 2000
@@ -104,7 +104,7 @@ typedef struct siginfo {
 #define SI_KERNEL	0x80	/* sent by the kernel from somewhere */
 #define SI_QUEUE	-1	/* sent by sigqueue */
 #define SI_TIMER __SI_CODE(__SI_TIMER,-2) /* sent by timer expiration */
-#define SI_MESGQ	-3	/* sent by real time mesq state change */
+#define SI_MESGQ __SI_CODE(__SI_RT,-3) /* sent by real time mesq state change */
 #define SI_ASYNCIO	-4	/* sent by AIO completion */
 #define SI_SIGIO	-5	/* sent by queued SIGIO */

--- linux/include/asm-ia64/siginfo.h.jj	Tue Aug 15 10:09:41 2000
+++ linux/include/asm-ia64/siginfo.h	Mon Sep 11 13:31:23 2000
@@ -113,7 +113,7 @@ typedef struct siginfo {
 #de
Re: Where did kgcc go in 2.4.0-test10 ?
On Wed, Nov 01, 2000 at 04:54:18PM -0700, Cort Dougan wrote:
> Since you're setting yourself up as a proponent of this can you explain why
> RedHat includes a compiler that doesn't work with the kernel? Don't get

It actually fails to compile only 2.2 kernels, and only unless they are patched (the patches which make them work with the gcc we ship are available from H.J.'s site). With 2.4, the gcc we shipped just prints some wrong cpp warnings (which were fixed a long time ago) but compiles a workable kernel. The real question is then what the recommended compiler for compiling the kernel is, and at the moment it is egcs 1.1.2, not 2.95.2, nor our 2.96, nor CVS head (the last one is known to miscompile some things in the kernel on x86).

> grumpy about who did it first or what the old one is named but be clear
> what I'm asking. I want to know if the 'gcc' on RedHat 7.0 fixes some
> problems that the older compilers suffered from? If there's a good reason

Yes, it fixes several problems the older compilers suffered from, see Richard Henderson's posting about this on lkml from the end of September.

    Jakub
Re: beware of dead string constants
On Tue, Nov 21, 2000 at 06:02:35AM -0600, Peter Samuelson wrote:
>
> While trying to clean up some code recently (CONFIG_MCA, hi Jeff), I
> discovered that gcc 2.95.2 (i386) does not remove dead string
> constants:
>
>   void foo (void)
>   {
>     if (0)
>       printk(KERN_INFO "bar");
>   }
>
> Annoyingly, gcc forgets to drop the "<6>bar\0". It shows up in the
> object file, needlessly clogging your cachelines.

gcc has never dropped such strings; I committed a patch to fix this into CVS a week ago.

    Jakub
Re: gcc-2.95.2-51 is buggy
On Fri, Nov 24, 2000 at 06:20:33AM +0100, [EMAIL PROTECTED] wrote:
> >> ... RedHat's GCC snapshot "2.96" handles this case just fine.
>
> > Now, if you can isolate the relevant part of the diff between
> > 2.95.2 and RH 2.96...
>
> Maybe I have to be more precise in the statement "gcc 2.95.2 is buggy".
>
> I just installed gcc 2.95.2 freshly ftp'ed from ftp.gnu.org, and
>
> % /usr/bin/gcc -v
> Reading specs from /usr/lib/gcc-lib/i486-suse-linux/2.95.2/specs
> gcc version 2.95.2 19991024 (release)
> % /usr/bin/gcc -Wall -O2 -o bug bug.c; ./bug
> 0x8480
> % /usr/gcc/aeb/bin/gcc -v
> Reading specs from /usr/gcc/aeb/lib/gcc-lib/i686-pc-linux-gnu/2.95.2/specs
> gcc version 2.95.2 19991024 (release)
> % /usr/gcc/aeb/bin/gcc -Wall -O2 -o nobug bug.c; ./nobug
> 0x0
>
> So, not all versions of gcc 2.95.2 are equal.

I believe all 2.95.2's are equal in this; I think it gives 0 in the nobug case for a different reason:

$ for i in gcc kgcc '/usr/src/gcc-trunk/obj/gcc/xgcc -B /usr/src/gcc-trunk/obj/gcc/' '/usr/src/gcc-2.95.2/obj/gcc/xgcc -B /usr/src/gcc-2.95.2/obj/gcc/'; do $i -v; for j in -mcpu=i386 -mcpu=i586 -mcpu=i686; do $i $j -O2 -o aeb aeb.c; echo -n "$i $j "; ./aeb; done; done
Reading specs from /usr/lib/gcc-lib/i386-redhat-linux/2.96/specs
gcc version 2.96 2731 (Red Hat Linux 7.0)
gcc -mcpu=i386 0x0
gcc -mcpu=i586 0x0
gcc -mcpu=i686 0x0
Reading specs from /usr/lib/gcc-lib/i386-glibc21-linux/egcs-2.91.66/specs
gcc version egcs-2.91.66 19990314/Linux (egcs-1.1.2 release)
kgcc -mcpu=i386 0x0
kgcc -mcpu=i586 0x0
kgcc -mcpu=i686 0x0
Reading specs from /usr/src/gcc-trunk/obj/gcc/specs
Configured with:
gcc version 2.97 20001120 (experimental)
/usr/src/gcc-trunk/obj/gcc/xgcc -B /usr/src/gcc-trunk/obj/gcc/ -mcpu=i386 0x0
/usr/src/gcc-trunk/obj/gcc/xgcc -B /usr/src/gcc-trunk/obj/gcc/ -mcpu=i586 0x0
/usr/src/gcc-trunk/obj/gcc/xgcc -B /usr/src/gcc-trunk/obj/gcc/ -mcpu=i686 0x0
Reading specs from /usr/src/gcc-2.95.2/obj/gcc/specs
gcc version 2.95.2 19991024 (release)
/usr/src/gcc-2.95.2/obj/gcc/xgcc -B /usr/src/gcc-2.95.2/obj/gcc/ -mcpu=i386 0x8480
/usr/src/gcc-2.95.2/obj/gcc/xgcc -B /usr/src/gcc-2.95.2/obj/gcc/ -mcpu=i586 0x8480
/usr/src/gcc-2.95.2/obj/gcc/xgcc -B /usr/src/gcc-2.95.2/obj/gcc/ -mcpu=i686 0x0

so the reason why it did not show up in the gcc you picked up from ftp.gnu.org is that you have compiled it so that it defaults to -mcpu=i686, where the bug does not show up.

    Jakub
Re: initdata for modules?
On Mon, Nov 27, 2000 at 09:54:57AM +1100, Keith Owens wrote:
> On Sun, 26 Nov 2000 07:30:44 -0800,
> "Adam J. Richter" <[EMAIL PROTECTED]> wrote:
> > In reading include/linux/init.h, I was surprised to discover
> >that __init{,data} expands to nothing when compiling a module.
> >I was wondering if anyone is contemplating adding support for
> >__init{,data} in module loading, to reduce the memory footprints
> >of modules after they have been loaded.
>
> It has been discussed a few times but nothing was ever done about it.

Well, I actually implemented it a few years ago, and even the current modutils you maintain already supports it (see the runsize member of struct module and how it is assigned). The __init stuff was not stored in a separate page; it was initially vmalloced together with the whole module, and the only vm addition was a shrink operation for a vmalloc area which would free some pages from the end of the area. It lived in sparclinux-cvs for quite some time, but Linus has not accepted it (I've posted it several times). I can dig the patch out of sparclinux CVS if anyone is interested.

Jakub
Re: [PATCH] modutils 2.3.20 and beyond
On Mon, Nov 27, 2000 at 05:48:28PM +0100, Jes Sorensen wrote: > > "Keith" == Keith Owens <[EMAIL PROTECTED]> writes: > > Keith> On Sun, 26 Nov 2000 16:36:55 -0700, "Jeff V. Merkey" > Keith> <[EMAIL PROTECTED]> wrote: > >> Keith, > >> > >> Please consider the attached patch for inclusion in all future > >> versions of the modutils depmod program for compatiblity with > >> RedHat and RedHat derived Linux distributions. > > Keith> I have a big problem with Redhat. They make incompatible > Keith> changes to utilities, do not feed patches back to maintainers > Keith> then expect the rest of the world to follow their lead. The -i > Keith> and -m flags to modutils are not the only example, I recently > Keith> found IA64 and Sparc patches they had added to modutils code > Keith> and not bothered to tell me. Other distributors are much > Keith> better about sending me patches, Debian and SuSe in particular > Keith> do the right thing. > > I don't remember where the ia64 modutils patches come from, there were > some floating around between the ia64 developers for a while. The > sparc patches I don't have a clue about where come from. The sparc patches were not sent just because of lack of time on my part, Jeff Johnson wrote it so that modules compiled with sparc64 gcc 2.96 (basically anything which generates OLO10 relocations) can be inserted and I wanted to review/test it first myself (and did not get to it early enough). Jakub - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] Please read the FAQ at http://www.tux.org/lkml/
Re: Compiler warnings
On Wed, Sep 06, 2000 at 10:05:46PM +0200, [EMAIL PROTECTED] wrote: > > I'm trying to compile 2.2.17 with gcc 2.96, and it shows a lot of > warnings like this in several files. First of all, you should not use gcc 2.96 for 2.2.x kernel compiles, only 2.4 should work. > warning: pasting would not give a valid preprocessing token I've fixed this recently. Some of these warnings were actually valid, I'll post a kernel patch for these soon, but most of them were bogus warnings when kernel was using GNU , ## restargs extension. > > And fails to compile with the error: > checksum.S:231: badly punctuated parameter list in #define One cannot preprocess with -traditional and use macros with variable arguments in gcc 2.96. 2.4 does not use -traditional for this file. > > It's the update to gcc2.96 causing this problems?? How can i get to > compile the kernel? If you're using recent Red Hat distributions, use kgcc compiler instead of gcc to compile the kernel. Jakub - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] Please read the FAQ at http://www.tux.org/lkml/
Re: Compiler warnings
On Thu, Sep 07, 2000 at 08:53:29AM +1100, Keith Owens wrote: > On Wed, 6 Sep 2000 21:49:44 +0100 (BST), > Alan Cox <[EMAIL PROTECTED]> wrote: > >Use a different gcc. There are reasons people shipping 2.96 for intel x86 also > >include egcs. The kernel isnt ready for 2.96 > > Out of curiousity, which compiler would you recommend for IA64 kernels? > The latest unwind code is in the bleeding edge version of gcc, which > just happens to have the problems with '##' as well. Obviously 2.96. I'm using it for 2.4 x86 and sparc64 kernels as well. We were talking about 2.2.17 though, and I don't think 2.2 kernels work on ia64... If you want the '##' fix, grab http://gcc.gnu.org/ml/gcc-patches/2000-09/msg6.html Jakub - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] Please read the FAQ at http://www.tux.org/lkml/
Re: files bigger than 2 GB
On Tue, Sep 12, 2000 at 03:12:34PM +0100, Alan Cox wrote:
> > I need support for files larger than 2GB. What's the status for that ?
>
> 2.2 + patches or 2.4 test and glibc 2.1.9x

And make sure the utilities you want to work with those 2GB+ files were compiled with -D_FILE_OFFSET_BITS=64 (check e.g. with nm -uD /your/binary | grep 64\$ ).

Jakub
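A minimal sketch of what -D_FILE_OFFSET_BITS=64 changes, assuming glibc on Linux. The helper names off_t_bits and offset_fits are hypothetical and only illustrate that off_t becomes a 64-bit type when the macro is set before any header is included.

```c
/* Normally passed on the command line as -D_FILE_OFFSET_BITS=64;
   defined here, before any #include, only to keep the sketch self-contained. */
#define _FILE_OFFSET_BITS 64
#include <assert.h>
#include <sys/types.h>

/* Width of off_t in bits under the current feature-test macros. */
int off_t_bits(void)
{
    return (int)(sizeof(off_t) * 8);
}

/* Hypothetical helper: can a file offset of n bytes be represented in off_t? */
int offset_fits(unsigned long long n)
{
    assert(off_t_bits() >= 32);
    if (off_t_bits() >= 64)
        return n <= 0x7fffffffffffffffULL;
    return n <= 0x7fffffffULL;
}
```

A utility built without the macro on a 32-bit system would typically see a 32-bit off_t, which is why offsets past 2GB fail there.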
Re: Kernel 2.2.17 with RedHat 7 Problem !
On Mon, Oct 23, 2000 at 12:06:31PM +, David Wragg wrote: > Gregory Maxwell <[EMAIL PROTECTED]> writes: > > If 2.96 is broken, I'd appreciate it if you would describe the breakage. > > As in the RedHat 2.96? Try compiling the following on RedHat 7.0 x86 > with "gcc -O2" and take a look at the generated code. Nice, isn't it? > > > #include > > void foo(void) > { > struct itimerval iv; > > iv.it_interval.tv_sec = 0; > iv.it_interval.tv_usec = 25; > iv.it_value = iv.it_interval; > > setitimer(ITIMER_REAL, &iv, NULL); > } Yes, this is a bug in the compiler (which I hope to fix today, CVS gcc is broken as well), though the actual place which causes this to be miscompiled is in the system headers where a restrict keyword is used on an incomplete struct timeval forward definitions pointer and due to bug is set in the type structure itself (at least that's my guess, need to run it under debugger today - but if the select prototype is moved after the full struct timeval definition, everything works correctly). Note that gcc 2.95.2 has some restrict keyword related bugs as well (which glibc had to work around in the headers; the bug was in 2.95.x only), it is not just 2.96. Jakub - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] Please read the FAQ at http://www.tux.org/lkml/
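The header pattern described above (a restrict-qualified pointer to a struct that is only forward-declared at that point, with the full definition arriving later) can be sketched as follows. This is not the glibc header; struct tv, tv_usec_of and copy_and_read are hypothetical names, and a correct compiler is expected to handle this without trouble.

```c
#include <assert.h>
#include <stddef.h>

struct tv;                                     /* forward declaration: incomplete type */
long tv_usec_of(const struct tv *restrict p);  /* restrict pointer to the incomplete type */

struct tv { long sec; long usec; };            /* full definition arrives later */

long tv_usec_of(const struct tv *restrict p)
{
    assert(p != NULL);
    return p->usec;
}

struct pair { struct tv interval; struct tv value; };

/* Mirrors the shape of the reported test case: fill one member,
   copy it by struct assignment, then read the copy back. */
long copy_and_read(long usec)
{
    struct pair iv;

    iv.interval.sec = 0;
    iv.interval.usec = usec;
    iv.value = iv.interval;
    return tv_usec_of(&iv.value);
}
```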
Re: 2.4.0-test10-pre6: Use of abs()
On Mon, Oct 30, 2000 at 03:01:16PM +0100, Martin Dalecki wrote: > Horst von Brand wrote: > > > > Red Hat 7.0, i686, gcc-20001027 (Yes, I know. Just to flush out bugs on > > both sides). > > > > abs() is used at least in: > > > > arch/i386/kernel/time.c > > drivers/md/raid1.c > > drivers/sound/sb_ess.c > > > > gcc warns about use of a non-declared function each time. > > > > No definition for the function is to be found (grep over all include/ comes > > up clean, except for extern definitions in asm-{mips,ppc}; ditto for lib/). > > Presumably gcc is using a builtin (it doesn't show up in System.map). Is > > this the desired state of affairs? Should a include/linux/stdlib.h be > > Yes abs will be transformed into an internal function, which will be > fully > unrolled due to -O2. No matter what it should be prototyped in some header. And all uses should be checked, because abs is int abs (int) __attribute__ ((__const__)); and sometimes people use it on `long' instead (such a bug has been fixed in the kernel some months ago). Jakub - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] Please read the FAQ at http://www.tux.org/lkml/
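To illustrate the int-versus-long point above: abs() takes and returns int, so a long argument is first converted to int and can lose its high bits where long is wider than int. A minimal sketch using the standard labs() instead; the wrapper name magnitude is hypothetical.

```c
#include <assert.h>
#include <limits.h>
#include <stdlib.h>

/* Magnitude of a long. labs() is declared as long labs(long),
   unlike abs(), which is int abs(int). */
long magnitude(long v)
{
    assert(v != LONG_MIN);   /* -LONG_MIN is not representable in long */
    return labs(v);
}
```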
Re: Recommended compiler? - Re: [patch] kernel/module.c (plus gratuitous rant)
On Mon, Oct 30, 2000 at 05:50:07PM -0300, Horst von Brand wrote:
> Martin Dalecki <[EMAIL PROTECTED]> said:
> > Peter Samuelson wrote:
>
> [...]
>
> > > * Red Hat "2.96" or CVS 2.97 will probably break any known kernel.
>
> > Works fine for me and 2.4.0-test10-pre5... however there are tons of
> > preprocessor warnings in some drivers.
>
> CVS (from 20001028 or so) gave a 2.4.0.10.6/i686 that crashed on boot, no
> time to dig deeper yet.

CVS 2.97 is known to miscompile e.g. buffer.c.

Jakub
Re: non-gcc linux?
On Sun, Nov 05, 2000 at 01:52:24PM -0700, Tim Riker wrote:
> Alan,
>
> Perhaps I did not explain myself, or perhaps I misunderstand your
> comments. I was responding to a comment that we could just copy some of
> the optimizations from Pro64 over into gcc.

That's hard to do, because the whole of gcc has its copyright assigned to the FSF, which means that either the gcc steering committee would have to make an exception to this for SGI, or SGI would have to be willing to assign some code to the FSF.

Jakub
Re: State of Posix compliance in v2.2/v2.4 kernel?
On Mon, Nov 13, 2000 at 11:00:09AM -0500, Jeff Garzik wrote: > [EMAIL PROTECTED] wrote: > > Sorry if this is a FAQ, but I've searched the archives for this list > > (http://www.uwsg.iu.edu/hypermail/linux/kernel/) and only come with references > > from 1996! > > > > What is the state of Posix-compliant services (threads, semaphores, timers, > > etc.) in the current (v2.2/v2.4) Linux kernels? > > IMHO this is a question better asked of glibc people, not kernel people. > > The kernel does its best to facilitate POSIX compliances, Well, it does not do its best. There are several areas where kernel should help, things like POSIX semaphores would be much faster with kernel support, likewise threads if some things Ulrich stated here a couple of months ago were done in the kernel, POSIX message queue passing is not doable in userland without kernel help either (I have a message queue filesystem kernel patch for this, but it is a 2.5 thing). Jakub - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] Please read the FAQ at http://www.tux.org/lkml/
Re: Signal 11
On Thu, Dec 14, 2000 at 04:42:03AM -0800, Clayton Weaver wrote: > There has a been a thread on the teTeX mailing list the last few days > about a (RedHat, but probably more general than just their rpms) > gcc-2.9.6 w/glibc-2.2.x bug. At -O2, it can miscompile > > unsigned varname; /* "unsigned int varname;" is ok */ > > (no problem at -O or no optimization at all, and doesn't happen if teTeX > is compiled with kgcc). That one is fixed already for some time, it was a bug in loop unrolling (that patch is still pending review for the mainline CVS though). Jakub - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] Please read the FAQ at http://www.tux.org/lkml/
Re: Signal 11
On Thu, Dec 14, 2000 at 11:11:28AM -0800, Linus Torvalds wrote:
> user applications and (b) gcc-2.96 is so broken that it requires special
> libraries for C++ vtable chunks handling that is different, so the
> _working_ gcc can only be used with programs that do not need such
> library support.

Every major g++ release had an incompatible libstdc++. Even g++ 2.95.2 bootstrapped under glibc 2.1.x is binary incompatible with g++ 2.95.2 bootstrapped under glibc 2.2.x (libstdc++ uses a different soname then; even if we used g++ 2.95.2 we would not have C++ binary compatibility with other distributions). This will change once 3.0 is out, but it will still take some time.

> compiler to something that works better RSN. It apparently has problems
> compiling stuff like the CVS snapshots of X etc too (and obviously,
> anything you compile under gcc-2.96 is not likely to work anywhere else
> except with the broken libraries).

Can you point to things in X which were actually miscompiled because of bugs in gcc 2.96? So far I am only aware of X bugs (already fixed in X CVS) which were triggered by -fstrict-aliasing, which is now the default, while gcc 2.95.2 had -fstrict-aliasing disabled by default. That is not to say there were no bugs in the gcc we shipped, but the bugs which were reported against it have been fixed already.

Jakub
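As an illustration of the kind of code -fstrict-aliasing exposes (not taken from X itself): reading an object through a pointer of an unrelated type is undefined, and the optimizer may reorder or drop such accesses. A minimal sketch of the usual safe alternative, assuming a 32-bit IEEE 754 float; float_bits is a hypothetical name.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Undefined under strict aliasing rules:
 *     uint32_t bad = *(uint32_t *)&f;
 * Copying the bytes with memcpy is well defined and is normally
 * compiled down to a plain register move. */
uint32_t float_bits(float f)
{
    uint32_t bits;

    assert(sizeof bits == sizeof f);
    memcpy(&bits, &f, sizeof bits);
    return bits;
}
```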
Re: i386: gcc & asm(): wrong constraint for "mull"
On Fri, Dec 29, 2000 at 10:54:38AM +0100, Ulrich Windl wrote:
> Hello,
>
> I noticed (with some inspiration from Andy Kleen) that some asm()
> instructions for the ia32 use the "g" constraint for "mull", where my
> Intel 386 Assembly Language Manual suggests the "MUL" instruction needs
> an r/m operand. So I guess the correct constraint is "rm" in gcc, and
> not "g". That change identical assembly output for gcc-2.95.2, but some
> gcc-2.96.x will try a multiplication with an immediate (constant)
> operand for the "g" constarint, and the as will choke on that.
> (Redhat 7.0 ships such a version of gcc).

gcc 2.95.2 md.texi says:

@cindex @samp{g} in constraint
@item @samp{g}
Any register, memory or immediate integer operand is allowed, except for
registers that are not general registers.

(2.95.2 was chosen to make it clear it is not something new in gcc.)
That means gcc is really free to choose which of register, memory or immediate it puts in, and the fact that some gcc versions choose one and others choose another is perfectly correct. Fix the constraints and be happy (at least during the upcoming millennium) :)

Jakub
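A minimal sketch of the constraint fix discussed above, assuming GCC-style inline assembly on x86. The function name mul32 is hypothetical and this is not the kernel's actual code; on other architectures the sketch falls back to plain C.

```c
#include <assert.h>
#include <stdint.h>

/* 32x32 -> 64 bit unsigned multiply. */
uint64_t mul32(uint32_t a, uint32_t b)
{
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
    uint32_t lo, hi;

    /* "rm": register or memory only, which is what MUL can encode.
       "g" would also allow an immediate, which the assembler rejects. */
    __asm__("mull %3"
            : "=a"(lo), "=d"(hi)
            : "0"(a), "rm"(b)
            : "cc");
    return ((uint64_t)hi << 32) | lo;
#else
    return (uint64_t)a * b;
#endif
}

/* Small self-check showing the intended use. */
void mul32_selfcheck(void)
{
    assert(mul32(6, 7) == 42);
}
```

With "g" the same source may still assemble when the compiler happens to pick a register, which is why the problem only shows up with some compiler versions.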
Re: ["Michael N. Lipp" ] Can't compile linus 2.2.17 with latest gcc due to checksum.S
On Tue, Sep 26, 2000 at 08:20:49AM +0200, Michael N. Lipp wrote: > Hi, > > I can't compile the latest linux kernel with the latest gcc due to a > strange define in checksum.S. The gcc preprocessor complains about > the usage of elipses in the macros > > #define SRC(y...) \ > : y;\ > .section __ex_table, "a"; \ > .long b, 6001f ; \ > .previous > > #define DST(y...) \ > : y;\ > .section __ex_table, "a"; \ > .long b, 6002f ; \ > .previous > > And I do agree, they look very strange. I tried adding comma > (#define SRC(y,...)) as this is what it should look like, but then > I get errors for the usage lines (SRC(1:movw (%esi), %bx)) and > again I understand the preprocessor very well. > > As egcs and gcc have re-merged and thus the latest gcc is really > the next egcs, I consider this a real problem. You should not compile 2.2.x kernels with latest gcc, use egcs-1.1.2 for it. Nobody has actually tested if 2.2.x kernels work with gcc 2.96, so even if you get over this (hint - remove -traditional from checksum.S's gcc options), you might be surprised by other things. The -traditional preprocessor in current gcc is really K&R, so stuff like GNU restargs extensions are not present there. If you're on Red Hat Linux 7, use kgcc compiler instead of gcc to build the kernel, otherwise check out your distribution to see where egcs (or gcc 2.95) lives. Jakub - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] Please read the FAQ at http://www.tux.org/lkml/
Re: Can't compile linus 2.2.17 with latest gcc due to checksum.S
On Tue, Sep 26, 2000 at 05:42:59PM +0200, Mads Martin Joergensen wrote:
> * Timur Tabi <[EMAIL PROTECTED]> [Sep 26. 2000 17:36]:
> > > Maybe this can be fixed for 2.96, but it breaks badly elsewhere (doesn't
> > > compile; kernel builds but hangs/crashes at boot; kernel appears to work
> > > fine while it is busy eating your disk; ...)
> >
> > Why is 2.96 so screwed up? I mean, the version numbers imply that 2.96 is a
> > minor bugfix over 2.95, but your comments make it sound like it's a major
> > change.
>
> Maybe because gcc 2.96 have not been released yet, and therefore not is
> bugfree yet?

Have you actually seen a bugfree compiler? I don't expect to ever see one. Anyway, this is more about 2.2 kernels relying on certain things which 2.96 might no longer guarantee because of some optimizations. E.g. 2.4.0-testx kernels compiled with 2.96 work pretty well on ia32, sparc64, alpha and ia64 AFAIK.

Jakub
Re: warning message posted from apic.h
On Fri, Sep 29, 2000 at 11:09:33AM +, Stephen Torri wrote:
> I get the following message compiling 2.4.0-test6 or test8 on a RedHat 7
> system. "/usr/src/linux/include/asm/apic.h:13:29: warning: nothing can be
> posted after this token". Is this an issue with apic?

Yes, this one is an apic.h bug which the RHL 7 cpp warns about:

--- linux/include/asm-i386/apic.h.jj Mon Oct 2 20:01:18 2000
+++ linux/include/asm-i386/apic.h Tue Oct 3 23:50:33 2000
@@ -10,7 +10,7 @@
 #ifdef CONFIG_X86_LOCAL_APIC
 #if APIC_DEBUG
-#define Dprintk(x...) printk(##x)
+#define Dprintk(x...) printk(x)
 #else
 #define Dprintk(x...)
 #endif

Jakub
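The fixed macro form from the patch above can be tried in user space. This is a minimal sketch, assuming a compiler that accepts the GNU named variadic parameter syntax (x...), as GCC and Clang do; FORMAT_INTO and format_selfcheck are hypothetical names standing in for Dprintk and printk.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* GNU named-rest-args form, as in the fixed apic.h line: the arguments
   are passed straight through. Writing snprintf(..., ##x) without a
   preceding comma would ask the preprocessor to paste the previous token
   with the first token of x, which does not form a valid token. */
#define FORMAT_INTO(buf, x...) snprintf(buf, sizeof(buf), x)

void format_selfcheck(void)
{
    char buf[32];

    FORMAT_INTO(buf, "cpu %d: %s", 0, "ok");
    assert(strcmp(buf, "cpu 0: ok") == 0);
}
```

The ISO C99 spelling of the same idea uses `...` in the parameter list and `__VA_ARGS__` in the body.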
Re: 2.4.0-test9-pre8 on SPARC build failure
On Tue, Oct 03, 2000 at 10:41:57PM -0700, Dr. Kelsey Hudson wrote:
> > Question is, is this still broken on -test9-final or did
> > the fix Linus merged earlier today get rid of your problems?
>
> Let me try this and find out... ...
> making dep...
>
> ::curses his SS20 for being so SLOW!::
> I need better than a 50MHz processor in this damn thing. :) Better yet, I
> need a better machine! :) Got any donations? Just kidding.
>
> ...Ok...Making boot...
>
> Damn. A good 2 hours later and it looks as though the compile exited
> cleanly :) yaaay!
>
> The answer to your question is yes, the fix Linus put in today fixed the
> problem :)

This tells us nothing about whether the pcibios thing is fixed, because you most probably did not configure PCI on your sparc32 (why would you, when you don't have a JavaStation?). So you have to either look at the code or configure PCI in...

Jakub
Re: Why does everyone hate gcc 2.95?
On Tue, Oct 03, 2000 at 11:12:24PM -0700, [EMAIL PROTECTED] wrote: > No, better yet, > what is a good version to use when porting to a new processor (actually > an old processor)? I've pulled the source to gcc (2.95.2) and binutils > (2.10) in prep for a port to a new/old machine. If these versions aren't > good to start from, what versions are and where can I find them? Those versions surely are not good to start from for doing new ports. There is almost 2 years of development gone since 2.95 was frozen and many things have changed, so if you start with 2.95.2, you'll have a hard time forward porting it to gcc 3. With binutils it probably does not matter much, but it could be easier to use CVS as well. Jakub - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] Please read the FAQ at http://www.tux.org/lkml/
Re: Updated 2.4 TODO List -- new addition WAS(test9 PCI resourcecollisions (fwd)
On Tue, Oct 10, 2000 at 11:32:43PM -0500, Gnea wrote:
>
> On Tue, 10 Oct 2000 19:56:46 -0400 (EDT), jamal blurted forth:
>
> >
> > Ted,
> >
> > Please add this to your list. Linux is unusable in these machines.
> > I have cc'ed Martin and Linus because they play in that PCI area.
>
> erm, looking at your list it says that you're using Redhat 7.0, which
> is known to ship with a buggy gcc, which is KNOWN to do nasty things
> with kernels.

Can you tell me (when it is KNOWN) what nasty things that gcc does to kernels? The fact that it does not compile vanilla 2.2.x kernels is not its fault, and if you choose to either use K&R preprocessing in assembly (but then no GNU extensions) or ANSI preprocessing plus you export memset/memcpy, it will actually build and work, see H.J.'s patchlets:

http://www.lucon.org/linux/linux-2.2.14-gcc.patch
http://www.lucon.org/linux/linux-2.2.17-library.patch

The fact that we recommend using kgcc (especially for 2.2 kernels) does not mean that the default gcc is broken, but simply that using it for kernels has not been tested much yet and there can be e.g. bugs in the way the kernel uses inline assembly and the like.

> > Linux version 2.4.0-test9-JHS1 ([EMAIL PROTECTED]) (gcc
> version 2.96 2
> 731 (Red Hat Linux 7.0)) #2 Thu Oct 5 11:59:31 EDT 2000
>
> yeah, that pretty much sums it up right there.. you may want to try
> something else.

See above, it does not sum up anything. The only thing is that if somebody is reporting a bug on lkml, they had better first make sure it is reproducible with kgcc as well (bug reports for kernels compiled with gcc 2.95 have been handled this way for a long time on lkml).

Jakub
Re: TODO: drivers/pcmcia/ds.c: ds_read & ds_write. SMP locks are missing fix
On Thu, Oct 12, 2000 at 11:38:11AM -0400, Yong Chi wrote:
> Hopefully this will do for SMP locks. =)

Holding a spinlock for this long (especially when you might sleep there in two places (interruptible_sleep_on, put_user)) is basically a bad idea. Spinlocks are designed to be held only for a short time. Either protect just a small critical section with a spinlock, or use semaphores.

> --- ds.c.bak Wed Oct 11 13:05:16 2000
> +++ ds.c Thu Oct 12 11:25:20 2000
> @@ -95,6 +95,7 @@
> u_int user_magic;
> int event_head, event_tail;
> event_t event[MAX_EVENTS];
> +spinlock_t lock;
> struct user_info_t *next;
> } user_info_t;
>
> @@ -567,6 +568,7 @@
> user->event_tail = user->event_head = 0;
> user->next = s->user;
> user->user_magic = USER_MAGIC;
> +spin_lock_init(&user->lock);
> s->user = user;
> file->private_data = user;
>
> @@ -616,6 +618,7 @@
> socket_t i = MINOR(file->f_dentry->d_inode->i_rdev);
> socket_info_t *s;
> user_info_t *user;
> +ssize_t retval=4;
>
> DEBUG(2, "ds_read(socket %d)\n", i);
>
> @@ -625,16 +628,23 @@
> return -EINVAL;
> s = &socket_table[i];
> user = file->private_data;
> -if (CHECK_USER(user))
> - return -EIO;
> -
> +spin_lock(&user->lock);
> +if (CHECK_USER(user)) {
> + retval= -EIO;
> +goto read_out;
> +}
> +
> if (queue_empty(user)) {
> interruptible_sleep_on(&s->queue);
> if (signal_pending(current))
> - return -EINTR;
> + retval= -EINTR;
> +goto read_out;
> }
> put_user(get_queued_event(user), (int *)buf);
> -return 4;
> +
> +read_out:
> +spin_unlock(&user->lock);
> +return retval;
> } /* ds_read */
>
> /**/
> @@ -645,6 +655,7 @@
> socket_t i = MINOR(file->f_dentry->d_inode->i_rdev);
> socket_info_t *s;
> user_info_t *user;
> +ssize_t retval=4;
>
> DEBUG(2, "ds_write(socket %d)\n", i);
>
> @@ -656,18 +667,25 @@
> return -EBADF;
> s = &socket_table[i];
> user = file->private_data;
> -if (CHECK_USER(user))
> - return -EIO;
> +spin_lock(&user->lock);
> +if (CHECK_USER(user)) {
> + retval= -EIO;
> + goto write_out;
> +}
>
> if (s->req_pending) {
> s->req_pending--;
> get_user(s->req_result, (int *)buf);
> if ((s->req_result != 0) || (s->req_pending == 0))
> wake_up_interruptible(&s->request);
> -} else
> - return -EIO;
> +} else {
> + retval= -EIO;
> + goto write_out;
> +}
>
> -return 4;
> +write_out:
> +spin_unlock(&user->lock);
> +return retval;
> } /* ds_write */
>
> /**/

Jakub
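The advice above (keep the spinlock around a small critical section only, and do anything that may sleep outside it) can be sketched in user space. This is not the pcmcia driver code: struct event_queue and the queue_* functions are hypothetical, a C11 atomic_flag stands in for the kernel's spinlock_t, and the plain store to *out stands in for put_user().

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

#define MAX_EVENTS 32                 /* hypothetical queue size */

struct event_queue {
    atomic_flag lock;                 /* stand-in for a kernel spinlock_t */
    unsigned head, tail;
    int event[MAX_EVENTS];
};

static void q_lock(struct event_queue *q)
{
    while (atomic_flag_test_and_set_explicit(&q->lock, memory_order_acquire))
        ;                             /* spin: only acceptable for short holds */
}

static void q_unlock(struct event_queue *q)
{
    atomic_flag_clear_explicit(&q->lock, memory_order_release);
}

void queue_init(struct event_queue *q)
{
    atomic_flag_clear(&q->lock);
    q->head = q->tail = 0;
}

/* Returns 1 on success, 0 if the queue is full. */
int queue_push(struct event_queue *q, int ev)
{
    int ok = 0;

    q_lock(q);
    if ((q->head + 1) % MAX_EVENTS != q->tail) {
        q->event[q->head] = ev;
        q->head = (q->head + 1) % MAX_EVENTS;
        ok = 1;
    }
    q_unlock(q);
    return ok;
}

/* Returns 1 and stores the event in *out, or 0 if the queue is empty.
   The lock covers only the dequeue itself. */
int queue_pop(struct event_queue *q, int *out)
{
    int ev = 0, have = 0;

    assert(out != NULL);
    q_lock(q);
    if (q->head != q->tail) {
        ev = q->event[q->tail];
        q->tail = (q->tail + 1) % MAX_EVENTS;
        have = 1;
    }
    q_unlock(q);

    if (have)
        *out = ev;                    /* slow or sleeping work goes here, lock released */
    return have;
}
```

Waiting for an event to arrive would likewise happen with the lock released, which is the part the quoted patch gets wrong by sleeping inside the locked region.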
Re: Updated Linux 2.4 Status/TODO List (from the ALS show)
On Fri, Oct 13, 2000 at 02:17:23PM -0700, Richard Henderson wrote:
> On Fri, Oct 13, 2000 at 12:45:47PM +0100, Alan Cox wrote:
> > Can we always be sure the rss will fit in an atomic_t - is it > 32bits on the
> > ultrsparc/alpha ?
>
> It is not.

It is not even 32bit on sparc32 (24bit only).

Jakub
Re: pthreads & fork & execve
On Mon, Apr 02, 2001 at 09:54:25AM -0300, Gustavo Niemeyer wrote: > Hi Richard! Hi Dennis! > > > I tracked this down to a corrupt jumptable somewhere in the pthreads > > part of the libc (didnt have the source handy at that time, though). So > > I think this is a libc bug (version does not matter) - I even did a > > followup to a similar bug in the libc gnats database (I think I should > > have opened a new one, though...). But I failed to construct a "simple" > > testcase showing the bug (We use rather large amount of threads and > > in one or two doing popen() calls - or handcrafted fork() && execv(), > > the SIGSEGV is during fork()). > > We're going trough two similar problems here. One is KDE, and the other > is Linuxconf. Linuxconf is core dumping on a module when it is linked > with pthread and dlopen()'ed with RTLD_GLOBAL. We must reduce one of > them to a testcase. By any chance, are you dlopening a DSO linked against -lpthread from program not linked against -lpthread? Jakub - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [reiserfs-list] Re: ReiserFS Oops (2.4.1, deterministic, symlink
On Sat, Feb 03, 2001 at 12:40:03AM +0100, J . A . Magallon wrote: > Please, do not do so. That depends on the PACKAGE name and version, and there > is no standard way of versioning a patched gcc. > The -54 is a RH'ism, for example Mandrake Cooker includes patches from > different sources, and gcc is versioned like You can do: if [ "$CC" = gcc ]; then echo 'inline void f(unsigned int n){int i,j=-1;for(i=0;i<10&&j<0;i++)if((1UL< test.c gcc -O2 -o test test.c if ./test; then echo "*** Please don't use this compiler to compile kernel"; fi rm -f test.c test fi (the $CC = gcc test is there e.g. so that the test is not done when cross-compiling or when there is a separate kernel compiler and userland compiler (e.g. on sparc64). This test will barf on gcc-2.96 up to -67 and on 2.97 until end of November or so). Similarly a testcase for the reload bug which caused in 2.95.2 miscompilation of some long long stuff in the kernel could be added as well if you want to go that way. Jakub - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] Please read the FAQ at http://www.tux.org/lkml/
Re: [reiserfs-list] Re: ReiserFS Oops (2.4.1, deterministic, symlink
On Sat, Feb 03, 2001 at 04:25:20AM +, Paul Jakma wrote: > On Fri, 2 Feb 2001, Jakub Jelinek wrote: > > > You can do: > > if [ "$CC" = gcc ]; then > > echo 'inline void f(unsigned int n){int >i,j=-1;for(i=0;i<10&&j<0;i++)if((1UL< > test.c > > gcc -O2 -o test test.c > > if ./test; then echo "*** Please don't use this compiler to compile kernel"; fi > > rm -f test.c test > > fi > > > > (the $CC = gcc test is there e.g. so that the test is not done when > > cross-compiling or when there is a separate kernel compiler and userland > > compiler (e.g. on sparc64). This test will barf on gcc-2.96 up to -67 and > > > > Jakub > > ehhmm.. > > [root@fogarty /tmp]# rpm -q gcc > gcc-2.96-70 > [root@fogarty /tmp]# cat test.c > inline void f(unsigned int n){int > i,j=-1;for(i=0;i<10&&j<0;i++)if((1UL< exit(1);} > [root@fogarty /tmp]# gcc -o test test.c > [root@fogarty /tmp]# ./test > > didn't barf here with 2.96-70. I used a wrong word (the test originally had abort() instead of exit(0) and exit(0) instead of exit(1)). The test will exit with 0 if it was miscompiled, 1 if it was not. And on 2.96-70 it should exit with 1 as it should not be miscompiled. Jakub - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] Please read the FAQ at http://www.tux.org/lkml/
Re: [OT] Re: PCI-SCI Drivers v1.1-7 released
On Wed, Feb 07, 2001 at 11:08:52AM -0700, Jeff V. Merkey wrote: > Not supporting #ident for CVS managed code bases would see to > me, at first glance, to be a show stopper to shipping a release > of anything, since many folks need CVS support. Could you please explain what you mean by not supporting #ident? It works just fine for me in all our gcc packages I've checked. Jakub - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] Please read the FAQ at http://www.tux.org/lkml/
Re: Thread_Id
On Thu, Jul 14, 2005 at 02:25:43PM +0200, Arjan van de Ven wrote: > pure luck. NPTL threading uses it to store a pointer to per thread info > structure; other threading (linuxthreads) may have stored a pid there to > identify the internal thread. nptl is 2.6 only so you might have > switched implementation of threading when you switched kernels. Actually, in linuxthreads what pthread_self () returned has the first slot in its internal threads array (up to max number of supported threads) that was unused at thread creation time in the low order bits and sequence number of thread creation in its high order bits. So unless you are using yet another threading library (I thought NGPT is dead for years...), the claim that you get the same numbers from gettid() syscall under NPTL as pthread_self () gives you under LinuxThreads is simply not true. And you certainly shouldn't be using gettid () syscall in NPTL, as it is just an implementation detail that there is a 1:1 mapping between NPTL threads and kernel threads. It can change at any time. Jakub - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
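Since the value behind pthread_self() differs between threading implementations, portable code should treat pthread_t as opaque. A minimal sketch of comparing thread identities without relying on gettid() or on the numeric value; is_current_thread is a hypothetical helper name.

```c
#include <assert.h>
#include <pthread.h>

/* pthread_t is opaque: compare with pthread_equal() rather than with ==
   or against a kernel thread ID obtained from gettid(). */
int is_current_thread(pthread_t t)
{
    return pthread_equal(t, pthread_self()) != 0;
}

/* Small self-check showing the intended use. */
void thread_id_selfcheck(void)
{
    pthread_t me = pthread_self();

    assert(is_current_thread(me));
}
```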
Re: ipc
On Thu, Jul 07, 2005 at 02:13:02PM +0200, Paolo Ornati wrote:
> You need to tell GCC to use "libmqueue"... something like this:
>
> gcc -Wall -O2 -o prog prog.c -lmqueue

If you have glibc 2.3.4 or later, you should use -lrt instead.

Jakub
Re: Realtime Preemption, 2.6.12, Beginners Guide?
On Fri, Jul 08, 2005 at 06:42:53PM +0100, Alistair John Strachan wrote:
> > btw., which gcc version are you using?
>
> Not the GCC version known to bloat stacks ;-)
>
> 3.4.4, on both my machines. I'm not touching 4.x until 4.0.1 is released with
> the miscompiled-code fixes.

GCC 4.0.x bloats stacks less than 3.4.4. And, if you are looking for 4.0.1, it was released yesterday.

Jakub
Re: Linux AIO status & todo
On Tue, Aug 23, 2005 at 01:14:38PM +0530, Suparna Bhattacharya wrote:
> 2. No support for propagating IO completion events to user space
> threads using RT signals. User threads need to poll the completion
> queue using io_getevents. POSIX specifies that when an AIO
> request completes, a signal can be delivered to the application
> to indicate the completion of the IO.

POSIX AIO needs to handle SIGEV_NONE, SIGEV_SIGNAL and SIGEV_THREAD
notification. Obviously the kernel shouldn't create threads for
SIGEV_THREAD itself, as the kernel shouldn't hardcode all the
implementation details of how a thread can be created.
But it would be good if AIO signalling e.g. handled both SIGEV_SIGNAL
and SIGEV_SIGNAL | SIGEV_THREAD_ID, with the same usage as e.g. the
timer_* syscalls. If the kernel makes sure SI_ASYNCIO si_code is set in
the notification signal siginfos, glibc could even use just one helper
thread for timer_*/[al]io_* and maybe in the future other SIGEV_THREAD
notification.

	Jakub
[PATCH] FUTEX_WAKE_OP (pthread_cond_signal speedup)
Hi!

ATM pthread_cond_signal is unnecessarily slow, because it wakes one
waiter (which at least on UP usually means an immediate context switch
to one of the waiter threads). This waiter wakes up and after a few
instructions it attempts to acquire the cv internal lock, but that lock
is still held by the thread calling pthread_cond_signal. So it goes to
sleep and eventually the signalling thread is scheduled in, unlocks the
internal lock and wakes the waiter again.

Now, before 2003-09-21 NPTL was using FUTEX_REQUEUE in
pthread_cond_signal to avoid this performance issue, but it was removed
when locks were redesigned to the 3 state scheme (unlocked, locked
uncontended, locked contended).

Following scenario shows why simply using FUTEX_REQUEUE in
pthread_cond_signal together with using lll_mutex_unlock_force in place
of lll_mutex_unlock is not enough and probably why it has been disabled
at that time:

The number is value in cv->__data.__lock.

thr1 thr2 thr3
0 pthread_cond_wait
1 lll_mutex_lock (cv->__data.__lock)
0 lll_mutex_unlock (cv->__data.__lock)
0 lll_futex_wait (&cv->__data.__futex, futexval)
0 pthread_cond_signal
1 lll_mutex_lock (cv->__data.__lock)
1 pthread_cond_signal
2 lll_mutex_lock (cv->__data.__lock)
2 lll_futex_wait (&cv->__data.__lock, 2)
2 lll_futex_requeue (&cv->__data.__futex, 0, 1, &cv->__data.__lock)
  # FUTEX_REQUEUE, not FUTEX_CMP_REQUEUE
2 lll_mutex_unlock_force (cv->__data.__lock)
0 cv->__data.__lock = 0
0 lll_futex_wake (&cv->__data.__lock, 1)
1 lll_mutex_lock (cv->__data.__lock)
0 lll_mutex_unlock (cv->__data.__lock)
  # Here, lll_mutex_unlock doesn't know there are threads waiting
  # on the internal cv's lock

Now, I believe it is possible to use FUTEX_REQUEUE in
pthread_cond_signal, but it will cost us not one, but 2 extra syscalls
and, what's worse, one of these extra syscalls will be done for every
single waiting loop in pthread_cond_*wait.
We would need to use lll_mutex_unlock_force in pthread_cond_signal
after requeue and lll_mutex_cond_lock in pthread_cond_*wait after
lll_futex_wait.

Another alternative is to do the unlocking pthread_cond_signal needs to
do (the lock can't be unlocked before lll_futex_wake, as that is racy)
in the kernel.

I have implemented both variants, futex-requeue-glibc.patch is the
first one and futex-wake_op{,-glibc}.patch is the unlocking inside of
the kernel.
The kernel interface allows userland to specify what exactly an
unlocking operation should look like (some atomic arithmetic operation
with optional constant argument and comparison of the previous futex
value with another constant).

It has been implemented just for ppc*, x86_64 and i?86, for other
architectures I'm including just a stub header which can be used as a
starting point by maintainers to write support for their arches and ATM
will just return -ENOSYS for FUTEX_WAKE_OP. The requeue patch has been
(lightly) tested just on x86_64, the wake_op patch on ppc64 kernel
running 32-bit and 64-bit NPTL and x86_64 kernel running 32-bit and
64-bit NPTL.
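
[Illustration, not part of the original mail.] The "wake plus unlock in
one syscall" idea can be sketched from userspace as below. This is only a
minimal sketch under stated assumptions: it uses the FUTEX_WAKE_OP
constants and the FUTEX_OP() macro as they appear in current
<linux/futex.h>, which may differ in detail from the patch posted here,
and the helper name wake_and_unlock is hypothetical.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <linux/futex.h>
#include <stdint.h>
#include <sys/syscall.h>
#include <unistd.h>

/*
 * Hypothetical helper: in a single futex syscall
 *   - wake at most one waiter sleeping on *cond_futex,
 *   - store 0 ("unlocked") into *lock_word,
 *   - and, if the previous *lock_word was > 1 ("locked contended"),
 *     also wake at most one waiter sleeping on *lock_word.
 * Returns the number of woken waiters, or -1 with errno set.
 */
long wake_and_unlock(uint32_t *cond_futex, uint32_t *lock_word)
{
    /* op: *lock_word = 0; cmp: old value > 1 */
    uint32_t op = FUTEX_OP(FUTEX_OP_SET, 0, FUTEX_OP_CMP_GT, 1);

    return syscall(SYS_futex, cond_futex, FUTEX_WAKE_OP,
                   1,                 /* max waiters to wake on cond_futex */
                   (unsigned long)1,  /* max waiters to wake on lock_word */
                   lock_word, op);
}
```

With no thread sleeping on either word the call should simply report
zero wakeups while still clearing the lock word, which is what a caller
in the signalling path would rely on.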
With the following benchmark on UP x86-64 I get:

for i in nptl-orig nptl-requeue nptl-wake_op; do echo time elf/ld.so --library-path .:$i /tmp/bench; \
for j in 1 2; do echo ( time elf/ld.so --library-path .:$i /tmp/bench ) 2>&1; done; done

time elf/ld.so --library-path .:nptl-orig /tmp/bench

real 0m0.655s
user 0m0.253s
sys 0m0.403s

real 0m0.657s
user 0m0.269s
sys 0m0.388s

time elf/ld.so --library-path .:nptl-requeue /tmp/bench

real 0m0.496s
user 0m0.225s
sys 0m0.271s

real 0m0.531s
user 0m0.242s
sys 0m0.288s

time elf/ld.so --library-path .:nptl-wake_op /tmp/bench

real 0m0.380s
user 0m0.176s
sys 0m0.204s

real 0m0.382s
user 0m0.175s
sys 0m0.207s

The benchmark is at:
http://sourceware.org/ml/libc-alpha/2005-03/txt1.txt
Older futex-requeue-glibc.patch version is at:
http://sourceware.org/ml/libc-alpha/2005-03/txt2.txt
Older futex-wake_op-glibc.patch version is at:
http://sourceware.org/ml/libc-alpha/2005-03/txt3.txt
Will post a new version (just x86-64 fixes so that the patch applies
against pthread_cond_signal.S) to libc-hacker ml soon.

Attached is the kernel FUTEX_WAKE_OP patch as well as a simple-minded
testcase that will not test the atomicity of the operation, but at
least check if the threads that should have been woken up are woken up
and whether the arithmetic operation in the kernel gave the expected
results.

	Jakub

--- linux-2.6.12/include/linux/futex.h.jj	2005-06-17 21:48:29.0 +0200
+++ linux-2.6.12/include/linux/futex.h	2005-08-23 11:11:41.0 +0200
@@ -4,14 +4,40 @@
 /* Second argument to futex syscall */
-#define FUTEX_WAIT
Re: [PATCH] FUTEX_WAKE_OP (pthread_cond_signal speedup)
On Tue, Aug 23, 2005 at 10:36:08AM -0400, Ingo Molnar wrote:
> a detail: many of the futex_atomic_op_inuser() seem to be duplicated
> across architectures. Might be worth putting into asm-generic, to avoid
> the duplication?

Those are stub files waiting for arch maintainers to actually implement
them, so they will eventually be different, but for the time being they
just return -ENOSYS, so that things compile.

	Jakub
Re: MAX_ARG_PAGES has no effect?
On Wed, Aug 31, 2005 at 02:11:44PM +0200, Ingo Molnar wrote:
> > I recompiled and installed the kernel, but there's no change (getconf
> > ARG_MAX still gives 131072.) What am I missing?
>
> MAX_ARG_PAGES should work just fine. I think the 'getconf ARG_MAX'
> output is hardcoded. (because the kernel does not provide the
> information dynamically)

Yeah, you get the value of ARG_MAX that was compiled in when you
compiled glibc.

	Jakub
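
[Illustration, not part of the original mail.] A program can look at
both numbers itself. The sketch below is hedged: it assumes a POSIX
system, the two function names are made up for the example, and how libc
arrives at the run-time value (a constant baked in at glibc build time,
as described above, or something computed differently in later glibc
versions) is up to the libc in use.

```c
#include <limits.h>
#include <unistd.h>

/* Value libc reports at run time through sysconf(). */
long runtime_arg_max(void)
{
    return sysconf(_SC_ARG_MAX);
}

/* Compile-time constant from the headers, or -1 if they define none. */
long header_arg_max(void)
{
#ifdef ARG_MAX
    return ARG_MAX;
#else
    return -1;
#endif
}
```

Neither value is read from MAX_ARG_PAGES in the running kernel, which is
why rebuilding the kernel alone would not change what such a query
reports on the systems discussed in this thread.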
Re: [discuss] [2.6 patch] include/asm-x86_64 "extern inline" -> "static inline"
On Mon, Sep 05, 2005 at 08:00:05PM +0200, Adrian Bunk wrote:
> It isn't the same, but "static inline" is the correct variant.
>
> "extern inline __attribute__((always_inline))" (which is what
> "extern inline" is expanded to) doesn't make sense.

It does make sense and is different from
static inline __attribute__((always_inline)).
Try:

static inline __attribute__((always_inline)) void foo (void) {}
void (*fn)(void) = foo;

vs.

extern inline __attribute__((always_inline)) void foo (void) {}
void (*fn)(void) = foo;

In the former case, GCC will emit the out of line static copy of foo
if you take its address, in the latter case either you provide foo
function by other means, or you get linker error.

	Jakub
Re: Possible 2.6.24-rc7 issue w/respect to pthreads
On Wed, Jan 09, 2008 at 02:35:32AM -0800, [EMAIL PROTECTED] wrote:
> After I patched my 2.6.23 kernel to 2.6.24-rc7 this morning, I noticed
> some odd behavior with respect to POSIX threads in a test program I had
> written (originally to test epoll.)
>
> The behavior is as follows:
>
> 1. main() creates a new thread of execution with pthread_create
> 2. thread_func() immediately calls pthread_detach(), which is supposed to
> ensure that thread resources are cleaned up when the thread terminates.
> 3. The spawned thread sleeps and then prints a message "got here"
> 4. The main thread calls pthread_join(). According to the POSIX
> documentation, this should suspend execution until the spawned thread has
> terminated.

Your testcase is buggy. Detached threads aren't joinable, you can't call
pthread_join on them.

	Jakub
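
[Illustration, not part of the original mail.] A thread is either
joined or detached, never both. The following is a minimal sketch of
the two valid patterns, assuming a POSIX threads implementation; the
function names and the use of a semaphore for completion are choices
made for this example only.

```c
#include <pthread.h>
#include <semaphore.h>
#include <stdint.h>

static sem_t detached_done;

static void *joinable_worker(void *arg)
{
    return (void *)((intptr_t)arg + 1);
}

static void *detached_worker(void *arg)
{
    (void)arg;
    sem_post(&detached_done);   /* tell the creator we got here */
    return NULL;
}

/* Pattern 1: leave the thread joinable and collect it with pthread_join. */
int run_joinable(void)
{
    pthread_t t;
    void *res;

    if (pthread_create(&t, NULL, joinable_worker, (void *)(intptr_t)41) != 0)
        return -1;
    if (pthread_join(t, &res) != 0)
        return -1;
    return (int)(intptr_t)res;
}

/* Pattern 2: create the thread detached and never join it; wait for
   completion through some other mechanism, here a semaphore. */
int run_detached(void)
{
    pthread_attr_t attr;
    pthread_t t;
    int rc;

    if (sem_init(&detached_done, 0, 0) != 0)
        return -1;
    if (pthread_attr_init(&attr) != 0)
        return -1;
    pthread_attr_setdetachstate(&attr, PTHREAD_CREATE_DETACHED);
    rc = pthread_create(&t, &attr, detached_worker, NULL);
    pthread_attr_destroy(&attr);
    if (rc != 0)
        return -1;
    while (sem_wait(&detached_done) != 0)
        ;                       /* retry if interrupted by a signal */
    return 0;
}
```

The testcase described in the quoted report mixes the two patterns
(pthread_detach in the thread, pthread_join in main), which is the bug
pointed out above.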
Re: Fedora's latest gcc produces unbootable kernels
On Mon, Dec 03, 2007 at 09:17:22AM +0100, Thomas Gleixner wrote:
> I looked at the disassembly but I can not spot the problem.
>
> I think the real problem is somewhere else. Likely candidates are
> hrtimer_forward() or hrtimer_start() - in that order.

Should hopefully be fixed in latest Fedora gcc. The problem was in code
like

typedef union { long long int s; } U;
typedef struct { U u; } S;

void
foo (S *s, long long int x, unsigned long int y)
{
  s->u = ({ (U) { .s = s->u.s + x * y }; });
}

where a backport of a recent optimization of mine, without which gcc
handles terribly initializers from compound literals (which is something
hrtimer uses just everywhere - why can't ktime.h for
#if BITS_PER_LONG == 64 || defined(CONFIG_KTIME_SCALAR)
just use a scalar rather than union with a scalar in it??),
sets the LHS object to the compound literal's initializer rather than
forcing creation of a temporary object (the compound literal).
Unfortunately the gimplifier had some bugs in case the initializer
references (or at least might reference) parts of LHS object.
Fixed by backporting 2 Ada bugfixes for the gimplifier from GCC trunk
(Ada was hitting those bugs even without this compound literal
optimization).

	Jakub
Re: Fedora's latest gcc produces unbootable kernels
On Mon, Dec 03, 2007 at 12:34:17PM +0100, Thomas Gleixner wrote:
> Of course just to annoy you :)

It doesn't matter whether I'm annoyed about this or not, but whether gcc
is able to generate decent code with it or not. And especially with
union it is not, at least through all the tree ssa passes. You already
have a lot of the details hidden in ktime.h accessor inlines, so I don't
think it would be hard to add further one or two.

Anyway, even just using

typedef struct ktime { s64 tv64; } ktime_t;

could make things better in case you have just one field. Unlike unions,
structs can be (and in this case most likely will be) scalarized by SRA,
so half of tree SSA passes will see it as integral var and will be able
to perform optimizations on it.

	Jakub
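
[Illustration, not part of the original mail.] The single-field struct
with accessor inlines suggested above could look roughly like this in
plain userspace C. It is a sketch only: int64_t stands in for the
kernel's s64, and the names kt_t, kt_from_ns, kt_add and kt_to_ns are
hypothetical, not the real ktime.h interface.

```c
#include <stdint.h>

/* One scalar wrapped in a struct: keeps type safety, and a compiler is
   free to treat the single field like a plain integer variable. */
typedef struct kt { int64_t tv64; } kt_t;

static inline kt_t kt_from_ns(int64_t ns)
{
    return (kt_t){ .tv64 = ns };
}

static inline kt_t kt_add(kt_t a, kt_t b)
{
    return (kt_t){ .tv64 = a.tv64 + b.tv64 };
}

static inline int64_t kt_to_ns(kt_t t)
{
    return t.tv64;
}
```

Callers only go through the accessors, so the representation could be
changed later without touching them.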
Re: Futex queue_me/get_user ordering
On Thu, Nov 18, 2004 at 02:47:26PM -0500, Jakub Jelinek wrote:
> The scenario described in futex_wait-fix.patch IMHO can happen even
> if all calls to pthread_cond_signal are done with mutex held around it, i.e.
> A B X Y
> pthread_mutex_lock (&mtx);
> pthread_cond_wait (&cv, &mtx);
> - mtx release *)
> total++ [1/0/0] (0) {}
> pthread_mutex_lock (&mtx);
> pthread_cond_signal (&cv);
> - wake++ [1/1/0] (1) {}
> FUTEX_WAKE, 1 (returns, nothing is queued)
> pthread_mutex_unlock (&mtx);
> pthread_mutex_lock (&mtx);
> pthread_cond_wait (&cv, &mtx);
> - mtx release *)
> total++ [2/1/0] (1) {}
> FUTEX_WAIT, 0
> queue_me [2/1/0] (1) {A}
> 0 != 1
> FUTEX_WAIT, 1
> queue_me [2/1/0] (1) {A,B}
> 1 == 1
> pthread_mutex_lock (&mtx);
> pthread_cond_signal (&cv);
> - wake++ [2/2/0] (2) {A,B}
> FUTEX_WAKE, 1 (unqueues
> incorrectly A)
> [2/2/0] (2) {B}
> pthread_mutex_unlock (&mtx);
> try to dequeue but already dequeued
> would normally return EWOULDBLOCK here
> but as unqueue_me failed, returns 0
> woken++ [2/2/1] (2) {B}
> schedule_timeout (forever)
> - mtx reacquire
> pthread_cond_wait returns
> pthread_mutex_unlock (&mtx);
>
> ---
> the code would like to say pthread_mutex_unlock (&mtx);
> and pthread_exit here, but never reaches there.
...

http://www.ussg.iu.edu/hypermail/linux/kernel/0411.2/0953.html

Your argument in November was that you don't want to slow down the
kernel and that userland must be able to cope with the non-atomicity of
futex syscall.

But with the recent changes to futex.c I think kernel can ensure
atomicity for free.

With get_futex_value_locked doing the user access in_atomic () and
repeating if that failed, I think it would be just a matter of something
as in the patch below (totally untested though).
It would simplify requeue implementation (getting rid of the nqueued
field), as well as never enqueue a futex in futex_wait until the
*uaddr == val uaccess check has shown it should be enqueued.
And I don't think the kernel will be any slower because of that, in the
common case where get_futex_value_locked does not cause a mm fault
(userland typically accessed that memory a few cycles before the
syscall), the futex_wait change is just about doing first half of
queue_me before the user access and second half after it.

--- linux-2.6.11/kernel/futex.c.jj	2005-03-17 04:42:29.0 -0500
+++ linux-2.6.11/kernel/futex.c	2005-03-17 05:13:45.0 -0500
@@ -97,7 +97,6 @@ struct futex_q {
  */
 struct futex_hash_bucket {
 	spinlock_t lock;
-	unsigned int nqueued;
 	struct list_head chain;
 };
 
@@ -265,7 +264,6 @@ static inline int get_futex_value_locked
 	inc_preempt_count();
 	ret = __copy_from_user_inatomic(dest, from, sizeof(int));
 	dec_preempt_count();
-	preempt_check_resched();
 
 	return ret ? -EFAULT : 0;
 }
@@ -339,7 +337,6 @@ static int futex_requeue(unsigned long u
 	struct list_head *head1;
 	struct futex_q *this, *next;
 	int ret, drop_count = 0;
-	unsigned int nqueued;
 
  retry:
 	down_read(&current->mm->mmap_sem);
@@ -354,23 +351,24 @@ static int futex_requeue(unsigned long u
 	bh1 = hash_futex(&key1);
 	bh2 = hash_futex(&key2);
 
-	nqueued = bh1->nqueued;
+	if (bh1 < bh2)
+		spin_lock(&bh1->lock);
+	spin_lock(&bh2->lock);
+	if (bh1 > bh2)
+		spin_lock(&bh1->lock);
+
 	if (likely(valp != NULL)) {
 		int curval;
 
-		/* In order to avoid doing get_user while
-		   holding bh1->lock and bh2->lock, nqueued
-		   (monotonically increasing field) must be first
-		   read, then *uaddr1 fetched from userland and
-		   after acquiring lock nqueued field compared with
-		   the stored value. The smp_mb () below
-		   makes sure that bh1->nqueued is read from memory
-		   before *uaddr1. */
-		smp_mb();
-
 		ret = get_futex_value_locked(&curval, (int __us
Re: Futex queue_me/get_user ordering
On Thu, Mar 17, 2005 at 03:20:31PM +, Jamie Lokier wrote:
> If you change futex_wait to be "atomic", and then have userspace locks
> which _depend_ on that atomicity, it becomes impossible to wait on
> multiple of those locks, or make poll-driven state machines which can
> wait on those locks.

The futex man pages that have been around for years (certainly since
mid 2002) certainly don't document FUTEX_WAIT as token passing
operation, but as atomic operation:

Say http://www.icewalkers.com/Linux/ManPages/futex-2.html

  FUTEX_WAIT
        This operation atomically verifies that the futex address
        still contains the value given, and sleeps awaiting FUTEX_WAKE
        on this futex address. If the timeout argument is non-NULL,
        its contents describe the maximum duration of the wait, which
        is infinite otherwise. For futex(4), this call is executed if
        decrementing the count gave a negative value (indicating
        contention), and will sleep until another process releases the
        futex and executes the FUTEX_WAKE operation.

  RETURN VALUE
        FUTEX_WAIT
              Returns 0 if the process was woken by a FUTEX_WAKE call.
              In case of timeout, ETIMEDOUT is returned. If the futex
              was not equal to the expected value, the operation
              returns EWOULDBLOCK. Signals (or other spurious wakeups)
              cause FUTEX_WAIT to return EINTR.

so there very well might be programs other than glibc that depend on
this behaviour. Given that in most cases the race is not hit every day
(after all, we have been living with it for several years), they
probably wouldn't know there is a problem like that.

> You can do userspace threading and simulate most blocking system calls
> by making them non-blocking and using poll).

Sure, but then you need to write your own locking as well and can just
use the token passing property of futexes there.

> It's not a _huge_ loss, but considering it's only Glibc which is
> demanding this and futexes have another property, token-passing, which
> Glibc could be using instead - why not use it?

Because that requires requeue being done with the cv lock held, which
means an extra context switch.

> > @@ -265,7 +264,6 @@ static inline int get_futex_value_locked
> > inc_preempt_count();
> > ret = __copy_from_user_inatomic(dest, from, sizeof(int));
> > dec_preempt_count();
> > - preempt_check_resched();
> >
> > return ret ? -EFAULT : 0;
> > }
>
> inc_preempt_count() and dec_preempt_count() aren't needed, as
> preemption is disabled by the queue spinlocks. So
> get_futex_value_locked isn't needed any more: with the spinlocks held,
> __get_user will do.

They aren't needed if CONFIG_PREEMPT. But with !CONFIG_PREEMPT, they
are IMHO still needed, as spin_lock/spin_unlock call
preempt_{disable,enable}, which is a nop if !CONFIG_PREEMPT.
__get_user can't be used though, it should be __get_user_inatomic
(or __copy_from_user_inatomic if the former doesn't exist).

> > [numerous instances of...]
> > + preempt_check_resched();
>
> Not required. The spin unlocks will do this.

True, preempt_check_resched() is a nop if !CONFIG_PREEMPT and for
CONFIG_PREEMPT spin_unlock will handle it. Will remove them from the
patch.

> > But with the recent changes to futex.c I think kernel can ensure
> > atomicity for free.
>
> I agree it would probably not slow the kernel, but I would _strongly_
> prefer that Glibc were fixed to use the token-passing property, if
> Glibc is the driving intention behind this patch - instead of this
> becoming a semantic that application-level users of futex (like
> database and IPC libraries) come to depend on and which can't be
> decomposed into a multiple-waiting form.
>
> (I admit that the kernel code does look nicer with
> get_futex_value_locked gone, though).
>
> By the way, do you know of Scott Snyder's recent work on fixing Glibc
> in this way? He bumped into one of Glibc's currently broken corner
> cases, fixed it (according to the algorithm I gave in November), and
> reported that it works fine with the fix.

I certainly haven't seen his patch.

	Jakub
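
[Illustration, not part of the original mail.] The "value check" half of
the FUTEX_WAIT contract quoted from the man page above can be observed
from userspace with a raw syscall. This is a minimal Linux-only sketch;
the function name is made up for the example, and it only exercises the
immediate-mismatch path, not the race discussed in this thread.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <errno.h>
#include <linux/futex.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/syscall.h>
#include <unistd.h>

/*
 * Ask the kernel to sleep on a futex word only if it still contains
 * `expected`.  The word holds 1 but we claim to expect 0, so the kernel
 * must refuse to sleep.  Returns the errno of the failed wait, or 0 if
 * the call unexpectedly succeeded.
 */
int futex_wait_mismatch_errno(void)
{
    uint32_t word = 1;
    uint32_t expected = 0;
    long rc = syscall(SYS_futex, &word, FUTEX_WAIT, expected,
                      NULL /* no timeout */, NULL, 0);

    return rc == -1 ? errno : 0;
}
```

On Linux EWOULDBLOCK and EAGAIN are the same value, so a caller can
compare the result against either name.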
Re: Futex queue_me/get_user ordering
On Thu, Mar 17, 2005 at 03:20:31PM +, Jamie Lokier wrote:
> > [numerous instances of...]
> > + preempt_check_resched();
>
> Not required. The spin unlocks will do this.

Here is updated patch with those removed (all of them are preceded by
spin_unlock) and out_unqueue label and following unused code removed
too.

--- linux-2.6.11/kernel/futex.c.jj	2005-03-17 04:42:29.0 -0500
+++ linux-2.6.11/kernel/futex.c	2005-03-18 05:45:29.0 -0500
@@ -97,7 +97,6 @@ struct futex_q {
  */
 struct futex_hash_bucket {
 	spinlock_t lock;
-	unsigned int nqueued;
 	struct list_head chain;
 };
 
@@ -265,7 +264,6 @@ static inline int get_futex_value_locked
 	inc_preempt_count();
 	ret = __copy_from_user_inatomic(dest, from, sizeof(int));
 	dec_preempt_count();
-	preempt_check_resched();
 
 	return ret ? -EFAULT : 0;
 }
@@ -339,7 +337,6 @@ static int futex_requeue(unsigned long u
 	struct list_head *head1;
 	struct futex_q *this, *next;
 	int ret, drop_count = 0;
-	unsigned int nqueued;
 
  retry:
 	down_read(&current->mm->mmap_sem);
@@ -354,23 +351,22 @@ static int futex_requeue(unsigned long u
 	bh1 = hash_futex(&key1);
 	bh2 = hash_futex(&key2);
 
-	nqueued = bh1->nqueued;
+	if (bh1 < bh2)
+		spin_lock(&bh1->lock);
+	spin_lock(&bh2->lock);
+	if (bh1 > bh2)
+		spin_lock(&bh1->lock);
+
 	if (likely(valp != NULL)) {
 		int curval;
 
-		/* In order to avoid doing get_user while
-		   holding bh1->lock and bh2->lock, nqueued
-		   (monotonically increasing field) must be first
-		   read, then *uaddr1 fetched from userland and
-		   after acquiring lock nqueued field compared with
-		   the stored value. The smp_mb () below
-		   makes sure that bh1->nqueued is read from memory
-		   before *uaddr1. */
-		smp_mb();
-
 		ret = get_futex_value_locked(&curval, (int __user *)uaddr1);
 
 		if (unlikely(ret)) {
+			spin_unlock(&bh1->lock);
+			if (bh1 != bh2)
+				spin_unlock(&bh2->lock);
+
 			/* If we would have faulted, release mmap_sem, fault
 			 * it in and start all over again.
 			 */
@@ -385,21 +381,10 @@ static int futex_requeue(unsigned long u
 		}
 		if (curval != *valp) {
 			ret = -EAGAIN;
-			goto out;
+			goto out_unlock;
 		}
 	}
 
-	if (bh1 < bh2)
-		spin_lock(&bh1->lock);
-	spin_lock(&bh2->lock);
-	if (bh1 > bh2)
-		spin_lock(&bh1->lock);
-
-	if (unlikely(nqueued != bh1->nqueued && valp != NULL)) {
-		ret = -EAGAIN;
-		goto out_unlock;
-	}
-
 	head1 = &bh1->chain;
 	list_for_each_entry_safe(this, next, head1, list) {
 		if (!match_futex (&this->key, &key1))
@@ -435,13 +420,9 @@ out:
 	return ret;
 }
 
-/*
- * queue_me and unqueue_me must be called as a pair, each
- * exactly once. They are called with the hashed spinlock held.
- */
-
 /* The key must be already stored in q->key. */
-static void queue_me(struct futex_q *q, int fd, struct file *filp)
+static inline struct futex_hash_bucket *
+queue_lock(struct futex_q *q, int fd, struct file *filp)
 {
 	struct futex_hash_bucket *bh;
 
@@ -455,11 +436,35 @@ static void queue_me(struct futex_q *q,
 	q->lock_ptr = &bh->lock;
 
 	spin_lock(&bh->lock);
-	bh->nqueued++;
+	return bh;
+}
+
+static inline void __queue_me(struct futex_q *q, struct futex_hash_bucket *bh)
+{
 	list_add_tail(&q->list, &bh->chain);
 	spin_unlock(&bh->lock);
 }
 
+static inline void
+queue_unlock(struct futex_q *q, struct futex_hash_bucket *bh)
+{
+	spin_unlock(&bh->lock);
+	drop_key_refs(&q->key);
+}
+
+/*
+ * queue_me and unqueue_me must be called as a pair, each
+ * exactly once. They are called with the hashed spinlock held.
+ */
+
+/* The key must be already stored in q->key. */
+static void queue_me(struct futex_q *q, int fd, struct file *filp)
+{
+	struct futex_hash_bucket *bh;
+	bh = queue_lock(q, fd, filp);
+	__queue_me(q, bh);
+}
+
 /* Return 1 if we were still queued (ie. 0 means we were woken) */
 static int unqueue_me(struct futex_q *q)
 {
@@ -503,6 +508,7 @@ static int futex_wait(unsigned long uadd
 	DECLARE_WAITQUEUE(wait, current);
 	int ret, curval;
 	struct futex_q q;
+	struct futex_hash_bucket *bh;
 
  retry:
 	down_read(&current->mm->mmap_sem);
@@ -511,7 +517,7 @@ static int futex_wait(unsigned long uadd
 	if (unlikely(ret != 0))
 		goto out_release_sem;
 
-	queue_me(&q, -1,
Re: kernel bug: futex_wait hang
On Tue, Mar 22, 2005 at 12:30:53AM -0500, Lee Revell wrote:
> On Mon, 2005-03-21 at 21:08 -0800, Andrew Morton wrote:
> > Jamie Lokier <[EMAIL PROTECTED]> wrote:
> > >
> > > The most recent messages under "Futex queue_me/get_user ordering",
> > > with a patch from Jakub Jelinek will fix this problem by changing the
> > > kernel. Yes, you should apply Jakub's most recent patch, message-ID
> > > "<[EMAIL PROTECTED]>".
> > >
> > > I have not tested the patch, but it looks convincing.
> >
> > OK, thanks. Lee && Paul, that's at
> > ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.12-rc1/2.6.12-rc1-mm1/broken-out/futex-queue_me-get_user-ordering-fix.patch
> >
>
> Does not fix the problem.

Have you analyzed the use of mutexes/condvars in the program?
The primary suspect is a deadlock, race of some kind or other bug
in the program. All these will show up as a hang in FUTEX_WAIT.
The argument that it works with LinuxThreads doesn't count,
the timing and internals of both threading libraries are so different
that a program bug may well show up with only one of the threading
libraries and not with the other.
Only once you distill a minimal self-contained testcase that proves
the program is correct and it gets analyzed, it is time to talk about
NPTL or kernel bugs.

	Jakub
Re: kernel bug: futex_wait hang
On Wed, Mar 23, 2005 at 05:12:59AM -0800, [EMAIL PROTECTED] wrote:
> the hang occurs during an attempted thread cancel+join. we know from
> strace that one thread calls tgkill() on the other. the other thread is
> blocked in a poll call on a FIFO. after tgkill, the first thread enters a
> futex wait, apparently waiting for the thread ID of the cancelled thread
> to appear at some location (just a guess based on the info from strace).
> the wait never returns, and so the first thread ends up hung in
> pthread_join(). there are no user-defined mutexes or condvars involved.

If the thread that is to be cancelled is in async cancel state (it
should be when waiting in a poll and if cancellation is not disabled in
that thread), then pthread_cancel sends a SIGCANCEL signal to it via
tgkill. If tgkill succeeds (and thus pthread_cancel succeeds too) and
you call pthread_join on it, in the likely case the thread is still
alive pthread_join will FUTEX_WAIT on pd->tid, waiting until the thread
dies.
NPTL threads are created with CLONE_CHILD_CLEARTID &self->tid, so this
futex will be FUTEX_WAKEd by mm_release in kernel whenever the thread
is exiting (or dying in some other way).

So, if pthread_join waits for the thread forever, the thread must be
around (otherwise pthread_join would not block on it; well, there could
be memory corruption in the program and anything would be possible
then).
This would mean either that the poll has not been woken by the
SIGCANCEL signal, or e.g. that one of the registered cleanup handlers
(or C++ destructors) in the thread that is being cancelled gets stuck
for whatever reason (deadlock, etc.).

	Jakub
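
[Illustration, not part of the original mail.] The cancel+join sequence
described above can be reduced to a small testcase along these lines.
It is a sketch under stated assumptions: a pipe stands in for the FIFO
from the report, the function names are invented for the example, and no
cleanup handlers are registered, so nothing can get stuck during
cancellation.

```c
#include <poll.h>
#include <pthread.h>
#include <unistd.h>

static int pipe_fds[2];

/* Blocks forever in poll(): nothing is ever written to the pipe.
   poll() is a cancellation point, so a pending cancel acts here. */
static void *poller(void *arg)
{
    struct pollfd pfd = { .fd = pipe_fds[0], .events = POLLIN };

    (void)arg;
    poll(&pfd, 1, -1);
    return NULL;
}

/* Returns 1 if the thread was cancelled and joined, 0 if it exited
   normally, -1 on setup or library errors. */
int cancel_and_join(void)
{
    pthread_t t;
    void *res = NULL;

    if (pipe(pipe_fds) != 0)
        return -1;
    if (pthread_create(&t, NULL, poller, NULL) != 0)
        return -1;
    if (pthread_cancel(t) != 0)
        return -1;
    if (pthread_join(t, &res) != 0)   /* returns once the thread is gone */
        return -1;
    close(pipe_fds[0]);
    close(pipe_fds[1]);
    return res == PTHREAD_CANCELED;
}
```

If a reduced program like this joins promptly while the real application
hangs, that points at something specific to the application, such as a
cleanup handler or destructor that blocks.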
Re: Patch 4/6 randomize the stack pointer
On Sat, Jan 29, 2005 at 01:31:46AM -0500, John Richard Moser wrote:
> Finally, although an NX stack is nice, you should probably take into
> account IBM's stack smash protector, ProPolice. Any attack that can
> evade SSP reliably can evade an NX stack; but ProPolice protects from
> other overflows. Now I'm sure RH is over there inventing something that
> detects buffer overflows at compile time and misses or warns about the
> ones it can't identify:
>
> if (strlen(a) > 4)
> a[5] = '\0';
> foo(a);
>
> void foo(char *a) {
>    char b[5];
>    strcpy(b,a);
> }
>
> This code is safe, but you can't tell from looking at foo(). You don't
> get a look at every other object being compiled against this one that
> may call foo() either. So compile time buffer overflow detection is a
> best-effort at best.

If strlen(a) > 4 above, then -D_FORTIFY_SOURCE={1,2} compiled program
will be terminated in the strcpy call. At compile time it computes that
the strcpy call can fill in at most 5 bytes and if it copies more, then
it terminates.

> ProPolice protects local variables with 0 overhead; passed arguments
> with a few instructions; and the return pointer and stack frame pointer
> with a couple instructions. At runtime. Want to impress me? Actually
> deploy ProPolice instead of showing up 3 years from now waving around
> your own patch that you wrote that half-impliments half of it. If you
> want "something better," it's GPL, so grab it and start hacking.

__builtin_object_size () checking/-D_FORTIFY_SOURCE=n changes are
(partly) orthogonal to ProPolice. There are exploits prevented by
-D_FORTIFY_SOURCE={1,2} checking and not ProPolice and vice versa.
Things that the former protects and the latter does not are e.g. some
non-automatic buffer overflows or heap overflows, some format string
vulnerabilities and for automatic variables e.g. those that don't
overflow into another function's frame, but just overwrite other local
variables in the same function.
ProPolice on the other hand will detect stack overflows that overflow
into another function's frame, even if they aren't done through string
operations (, s*printf, gets, etc.) or if the compiler can't figure out
what certain arguments to these functions point to (and where) at
compile time.

The ideas in IBM's ProPolice changes are good and worth implementing,
but the current implementation is bad.

FYI, you can find some details about -D_FORTIFY_SOURCE=n in
http://gcc.gnu.org/ml/gcc-patches/2004-09/msg02055.html

	Jakub
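
[Illustration, not part of the original mail.] The kind of check
described above can be imitated by hand with the GCC builtin
__builtin_object_size. This is only a rough sketch: the function name is
made up, the real fortified strcpy terminates the program instead of
returning an error, and the fallback to sizeof for the "size unknown"
case is a simplification chosen for this example.

```c
#include <stddef.h>
#include <string.h>

/*
 * Copy `a` into a 5-byte local buffer only if it fits (including the
 * terminating NUL).  Returns the copied length, or -1 if the copy
 * would overflow the buffer.
 */
int checked_copy(const char *a)
{
    char b[5];
    /* Number of bytes the compiler knows are available at b;
       (size_t)-1 means "could not be determined". */
    size_t room = __builtin_object_size(b, 0);

    if (room == (size_t)-1)
        room = sizeof b;        /* simplification for this sketch */
    if (strlen(a) + 1 > room)
        return -1;              /* the real check would abort here */
    strcpy(b, a);
    return (int)strlen(b);
}
```

The point of the builtin is that the size comes from the compiler's
knowledge of the destination object, so the caller of a checked string
function does not have to pass it explicitly.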
Re: ppc32 weirdness with gcc-4.0 in 2.6.11-rc4
On Thu, Feb 24, 2005 at 04:08:47PM +0100, Mikael Pettersson wrote:
> /* gcc4bug.c
> * Written by Mikael Pettersson <[EMAIL PROTECTED]>, 2005-02-24.
...

Reproduced, thanks for the testcase. Looking into it...

	Jakub
Re: ppc32 weirdness with gcc-4.0 in 2.6.11-rc4
On Thu, Feb 24, 2005 at 04:08:47PM +0100, Mikael Pettersson wrote:
> _However_, the 0k data message is due to a gcc-4.0 bug, and below
> you'll find a test program which illustrates it.

http://gcc.gnu.org/PR20196

	Jakub
[PATCH] Fix binfmt_elf.c
Hi! There is a bug in binfmt_elf.c if the dynamic linker has non-zero base vaddr (e.g. if it is prelinked). The issue is that in such case ld-linux.so.2 is loaded at ELF_ET_DYN_BASE + p_vaddr instead of ELF_ET_DYN_BASE, on some architectures into non-desirable places in virtual memory. Best explained on a ld-linux.so.2 prelink(1)ed to 0x4000 on ia32: $ LD_TRACE_LOADED_OBJECTS=1 ./ld-linux.so.2 ./libc.so.6 /lib/ld-linux.so.2 => ./ld-linux.so.2 (0x6000) ELF_ET_DYN_BASE is defined to 0x2000 in ia32 (see the patch, it was meant to be 0x8000), so ld-linux.so.2 should have l_map_start 0x2000 while as you see in reality it has 0x6000. If this prelinked VMA + ELF_ET_DYN_BASE fits into kernel reserved address space, ./ld-linux.so.2 running won't work at all. Also, many platforms such as i386 use #define ELF_ET_DYN_BASE (2 * TASK_SIZE / 3) which I guess is not what was originally intended (on i386 this is usually 0x2aaa). As this value gets passed to elf_map which rounds it down to ELF page boundary anyway, I think (TASK_SIZE / 3 * 2) is far better. I've changed it on ia32 only, but if someone would test it on other platforms which set ELF_ET_DYN_BASE this way it would be probably good to change elsewhere as well. --- linux/fs/binfmt_elf.c.jjThu May 24 11:11:36 2001 +++ linux/fs/binfmt_elf.c Thu May 24 11:32:26 2001 @@ -396,7 +396,7 @@ out: static int load_elf_binary(struct linux_binprm * bprm, struct pt_regs * regs) { struct file *interpreter = NULL; /* to shut gcc up */ - unsigned long load_addr = 0, load_bias; + unsigned long load_addr = 0, load_bias = 0; int load_addr_set = 0; char * elf_interpreter = NULL; unsigned int interpreter_type = INTERPRETER_NONE; @@ -595,12 +595,6 @@ static int load_elf_binary(struct linux_ setup_arg_pages(bprm); /* XXX: check error */ current->mm->start_stack = bprm->p; - /* Try and get dynamic programs out of the way of the default mmap - base, as well as whatever program they might try to exec. 
This - is because the brk will follow the loader, and is not movable. */ - - load_bias = ELF_PAGESTART(elf_ex.e_type==ET_DYN ? ELF_ET_DYN_BASE : 0); - /* Now we do a little grungy work by mmaping the ELF image into the correct location in memory. At this point, we assume that the image should be loaded at fixed address, not at a variable @@ -624,6 +618,11 @@ static int load_elf_binary(struct linux_ vaddr = elf_ppnt->p_vaddr; if (elf_ex.e_type == ET_EXEC || load_addr_set) { elf_flags |= MAP_FIXED; + } else if (elf_ex.e_type == ET_DYN) { + /* Try and get dynamic programs out of the way of the default +mmap + base, as well as whatever program they might try to exec. +This + is because the brk will follow the loader, and is not +movable. */ + load_bias = ELF_PAGESTART(ELF_ET_DYN_BASE - vaddr); } error = elf_map(bprm->file, load_bias + vaddr, elf_ppnt, elf_prot, elf_flags); --- linux/include/asm-i386/elf.h.jj Mon Mar 26 18:48:10 2001 +++ linux/include/asm-i386/elf.hThu May 24 11:49:38 2001 @@ -55,7 +55,7 @@ typedef struct user_fxsr_struct elf_fpxr the loader. We need to make sure that it is out of the way of the program that it will "exec", and that there is sufficient room for the brk. */ -#define ELF_ET_DYN_BASE (2 * TASK_SIZE / 3) +#define ELF_ET_DYN_BASE (TASK_SIZE / 3 * 2) /* Wow, the "main" arch needs arch dependent functions too.. :) */ Jakub - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
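The arithmetic point about ELF_ET_DYN_BASE can be checked in isolation. The following is only a sketch: it assumes the usual i386 TASK_SIZE of 0xC0000000 (the 3G/1G split, which the mail does not state) and 32-bit unsigned arithmetic, and all sketch_* names are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed i386 value (3G/1G split); not taken from the mail. */
#define SKETCH_TASK_SIZE UINT32_C(0xC0000000)

/* Old form: 2 * TASK_SIZE wraps around in 32 bits before the division. */
static uint32_t sketch_dyn_base_old(void)
{
    return (uint32_t)(2 * SKETCH_TASK_SIZE) / 3;
}

/* New form: divide first, so no intermediate value overflows. */
static uint32_t sketch_dyn_base_new(void)
{
    return SKETCH_TASK_SIZE / 3 * 2;
}

/* Records the values both forms produce under the assumptions above. */
static void sketch_dyn_base_demo(void)
{
    assert(sketch_dyn_base_old() == UINT32_C(0x2AAAAAAA));
    assert(sketch_dyn_base_new() == UINT32_C(0x80000000));
}
```

Under these assumptions the old expression lands at roughly a sixth of the address space instead of two thirds, which is why reordering the operations matters even though elf_map rounds the result down to a page boundary anyway.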
[PATCH] Fix kernel linker scripts
Hi! Apparently all kernel scripts only have .rodata and not also .rodata.* input sections in it. This has been no problem so far, but since binutils and gcc support SHF_MERGE sections (so that string constant (and other constant too) duplicates can be removed at link time) the compiler creates sections like .rodata.str1.1 and they really should be merged into the rodata output section (or whatever else) during linking (the default binutils linker scripts are doing this for ages). On some architectures it creates no problems, just one more section in section table (like i386), on others it causes the kernel not to boot at all (e.g. on ia64). Please apply. --- linux/arch/alpha/boot/bootloader.lds.jj Sun Sep 6 13:34:33 1998 +++ linux/arch/alpha/boot/bootloader.ldsTue Jun 26 11:05:14 2001 @@ -6,7 +6,7 @@ SECTIONS .text : { *(.text) } _etext = .; PROVIDE (etext = .); - .rodata : { *(.rodata) } + .rodata : { *(.rodata) *(.rodata.*) } .data : { *(.data) CONSTRUCTORS } .got : { *(.got) } .sdata : { *(.sdata) } --- linux/arch/alpha/vmlinux.lds.in.jj Mon Jun 26 14:26:56 2000 +++ linux/arch/alpha/vmlinux.lds.in Tue Jun 26 11:05:24 2001 @@ -53,7 +53,7 @@ SECTIONS /* Global data */ _data = .; .data.cacheline_aligned : { *(.data.cacheline_aligned) } - .rodata : { *(.rodata) } + .rodata : { *(.rodata) *(.rodata.*) } .data : { *(.data) CONSTRUCTORS } .got : { *(.got) } .sdata : { *(.sdata) } --- linux/arch/arm/boot/compressed/vmlinux.lds.in.jjThu Feb 8 19:32:44 2001 +++ linux/arch/arm/boot/compressed/vmlinux.lds.in Tue Jun 26 11:05:35 2001 @@ -24,6 +24,7 @@ SECTIONS *(.fixup) *(.gnu.warning) *(.rodata) +*(.rodata.*) *(.glue_7) *(.glue_7t) input_data = .; --- linux/arch/arm/vmlinux-armo.lds.in.jj Thu Feb 8 19:32:44 2001 +++ linux/arch/arm/vmlinux-armo.lds.in Tue Jun 26 11:05:49 2001 @@ -47,6 +47,7 @@ SECTIONS *(.gnu.warning) *(.text.lock) /* out-of-line lock text */ *(.rodata) + *(.rodata.*) *(.glue_7) *(.glue_7t) *(.kstrtab) --- linux/arch/arm/vmlinux-armv.lds.in.jj Wed May 16 
18:25:16 2001 +++ linux/arch/arm/vmlinux-armv.lds.in Tue Jun 26 11:05:57 2001 @@ -42,6 +42,7 @@ SECTIONS *(.gnu.warning) *(.text.lock) /* out-of-line lock text */ *(.rodata) + *(.rodata.*) *(.glue_7) *(.glue_7t) *(.got) /* Global offset table */ --- linux/arch/cris/boot/compressed/decompress.ld.jjFri Apr 6 13:42:55 2001 +++ linux/arch/cris/boot/compressed/decompress.ld Tue Jun 26 11:06:04 2001 @@ -13,6 +13,7 @@ SECTIONS _stext = . ; *(.text) *(.rodata) + *(.rodata.*) _etext = . ; } > dram .data : --- linux/arch/cris/cris.ld.jj Tue May 1 19:04:56 2001 +++ linux/arch/cris/cris.ld Tue Jun 26 11:06:23 2001 @@ -24,7 +24,7 @@ SECTIONS *(.fixup) *(.text.__*) *(.rodata) - *(.rodata.__*) + *(.rodata.*) } . = ALIGN(4);/* Exception table */ --- linux/arch/i386/vmlinux.lds.jj Wed Jan 3 23:45:26 2001 +++ linux/arch/i386/vmlinux.lds Tue Jun 26 11:06:33 2001 @@ -17,7 +17,7 @@ SECTIONS _etext = .; /* End of text section */ - .rodata : { *(.rodata) } + .rodata : { *(.rodata) *(.rodata.*) } .kstrtab : { *(.kstrtab) } . 
= ALIGN(16); /* Exception table */ --- linux/arch/ia64/boot/bootloader.lds.jj Sun Feb 6 21:42:40 2000 +++ linux/arch/ia64/boot/bootloader.lds Tue Jun 26 11:06:42 2001 @@ -12,7 +12,7 @@ SECTIONS /* Global data */ _data = .; - .rodata : { *(.rodata) } + .rodata : { *(.rodata) *(.rodata.*) } .data: { *(.data) *(.gnu.linkonce.d*) CONSTRUCTORS } __gp = ALIGN (8) + 0x20; .got : { *(.got.plt) *(.got) } --- linux/arch/ia64/sn/fprom/fprom.lds.jj Thu Jan 4 16:00:15 2001 +++ linux/arch/ia64/sn/fprom/fprom.lds Tue Jun 26 11:07:02 2001 @@ -24,7 +24,7 @@ SECTIONS _data = .; .rodata : AT(ADDR(.rodata) - 0x ) - { *(.rodata) } + { *(.rodata) *(.rodata.*) } .opd : AT(ADDR(.opd) - 0x ) { *(.opd) } .data : AT(ADDR(.data) - 0x ) --- linux/arch/ia64/vmlinux.lds.S.jjThu Apr 5 15:51:47 2001 +++ linux/arch/ia64/vmlinux.lds.S Tue Jun 26 11:07:15 2001 @@ -83,7 +83,7 @@ SECTIONS ia64_unw_end = .; .rodata : AT(ADDR(.rodata) - PAGE_OFFSET) - { *(.rodata) } + { *(.rodata) *(.rodata.*) } .kstrtab : AT(ADDR(.kstrtab) - PAGE_OFFSET) { *(.kstrtab) } .opd : AT(ADDR(.opd) - PAGE_
Re: memcpy(a,b,CONST) is not inlined by gcc 3.4.1 in Linux kernel
On Tue, Mar 29, 2005 at 05:37:06PM +0300, Denis Vlasenko wrote:
> typedef unsigned int size_t;
>
> static inline void * __memcpy(void * to, const void * from, size_t n)
> {
> int d0, d1, d2;
> __asm__ __volatile__(
>         "rep ; movsl\n\t"
>         "testb $2,%b4\n\t"
>         "je 1f\n\t"
>         "movsw\n"
>         "1:\ttestb $1,%b4\n\t"
>         "je 2f\n\t"
>         "movsb\n"
>         "2:"
>         : "=&c" (d0), "=&D" (d1), "=&S" (d2)
>         :"0" (n/4), "q" (n),"1" ((long) to),"2" ((long) from)
>         : "memory");
> return (to);
> }
>
> /*
>  * This looks horribly ugly, but the compiler can optimize it totally,
>  * as the count is constant.
>  */
> static inline void * __constant_memcpy(void * to, const void * from, size_t n)
> {
>         if (n <= 128)
>                 return __builtin_memcpy(to, from, n);
>
> #define COMMON(x) \
> __asm__ __volatile__( \
>         "rep ; movsl" \
>         x \
>         : "=&c" (d0), "=&D" (d1), "=&S" (d2) \
>         : "0" (n/4),"1" ((long) to),"2" ((long) from) \
>         : "memory");
> {
>         int d0, d1, d2;
>         switch (n % 4) {
>                 case 0: COMMON(""); return to;
>                 case 1: COMMON("\n\tmovsb"); return to;
>                 case 2: COMMON("\n\tmovsw"); return to;
>                 default: COMMON("\n\tmovsw\n\tmovsb"); return to;
>         }
> }
>
> #undef COMMON
> }
>
> #define memcpy(t, f, n) \
> (__builtin_constant_p(n) ? \
>  __constant_memcpy((t),(f),(n)) : \
>  __memcpy((t),(f),(n)))
>
> int f3(char *a, char *b) { memcpy(a,b,3); }

The problem is that in GCC < 4.0 there is no constant propagation pass before expanding builtin functions, so the __builtin_memcpy call above sees a variable rather than a constant. Either use GCC 4.0+, where this works just fine, or move the n <= 128 case into the macro:

#define memcpy(t, f, n) \
  (__builtin_constant_p(n) ? \
   ((n) <= 128 ? __builtin_memcpy(t,f,n) : __constant_memcpy(t,f,n)) : \
   __memcpy(t,f,n))

Jakub
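A userspace sketch of the macro-level dispatch suggested above, for GCC or Clang. The two sketch_* helpers are hypothetical stand-ins for the kernel's inline-asm routines (here they simply call the libc memcpy); only the shape of the macro is the point.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the large-constant-size path. */
static void *sketch_large_constant_memcpy(void *to, const void *from, size_t n)
{
    return memcpy(to, from, n);
}

/* Hypothetical stand-in for the variable-size path. */
static void *sketch_variable_memcpy(void *to, const void *from, size_t n)
{
    return memcpy(to, from, n);
}

/*
 * The size test lives in the macro, so __builtin_memcpy is handed the
 * literal the caller wrote, not a function parameter.
 */
#define sketch_memcpy(t, f, n) \
    (__builtin_constant_p(n) ? \
     ((n) <= 128 ? __builtin_memcpy((t), (f), (n)) \
                 : sketch_large_constant_memcpy((t), (f), (n))) : \
     sketch_variable_memcpy((t), (f), (n)))

/* Usage: a small constant size goes down the __builtin_memcpy arm. */
static void sketch_memcpy_demo(void)
{
    char dst[4] = "xyz";

    sketch_memcpy(dst, "abc", 3);
    assert(memcmp(dst, "abc", 3) == 0);
}
```

Whether the builtin is then expanded inline still depends on the compiler version and optimization level; the sketch only shows where the constant check has to sit.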
Re: [RFC: 2.6 patch] add -fno-tree-scev-cprop to KBUILD_CFLAGS
On Sun, Nov 11, 2007 at 07:48:29AM +0100, Adrian Bunk wrote:
> The gcc from svn that will become gcc 4.3 generates libgcc calls in
> cases like the following (on 32bit architectures):
>
> <-- snip -->
>
> static inline void timespec_add_ns(struct timespec *a, u64 ns)
> {
> ...
>         while(ns >= NSEC_PER_SEC) {
>                 ns -= NSEC_PER_SEC;
>                 a->tv_sec++;
>         }
> ...
>
> <-- snip -->

Blindly using -fno-tree-scev-cprop just to get rid of one case where this turns out to be a pessimization, because the kernel knows ns is usually very small, is IMHO the wrong thing: you'd lose the many cases where this optimization can actually improve performance. Instead, for this exact case just add an optimization barrier to stop gcc from doing this. Adding asm ("" : "=r" (ns) : "0" (ns)); (or hiding it in some macro) inside the loop will do the job just fine.

Jakub
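A sketch of what the suggested barrier could look like in a standalone function, using GNU C inline asm. The struct and function names are hypothetical userspace stand-ins for the kernel's timespec_add_ns; the asm statement is the one from the mail.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_NSEC_PER_SEC UINT64_C(1000000000)

/* Hypothetical stand-in for struct timespec. */
struct sketch_timespec {
    long tv_sec;
    long tv_nsec;
};

static void sketch_timespec_add_ns(struct sketch_timespec *a, uint64_t ns)
{
    ns += (uint64_t)a->tv_nsec;
    while (ns >= SKETCH_NSEC_PER_SEC) {
        /*
         * Empty asm that claims to rewrite ns: the compiler can no
         * longer compute the trip count up front, so it has no reason
         * to turn the loop into a 64-bit division.
         */
        __asm__("" : "=r" (ns) : "0" (ns));
        ns -= SKETCH_NSEC_PER_SEC;
        a->tv_sec++;
    }
    a->tv_nsec = (long)ns;
}

/* Usage: 0.5 s + 1.7 s carries two whole seconds. */
static void sketch_timespec_demo(void)
{
    struct sketch_timespec t = { 1, 500000000 };

    sketch_timespec_add_ns(&t, UINT64_C(1700000000));
    assert(t.tv_sec == 3);
    assert(t.tv_nsec == 200000000);
}
```

The barrier emits no instructions; it only hides the value of ns from the optimizer at that point in the loop.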
Re: vdso.so mislinked by buggy linker was Re: Linus 2.6.23-rc1
On Mon, Jul 23, 2007 at 01:56:20AM +0200, Andi Kleen wrote: > On Monday 23 July 2007 01:38:40 Andre Noll wrote: > [readded linux-kernel, Linus] > > > [Nr] Name Type Address Offset > >Size EntSize Flags Link Info Align > > [ 0] NULL > > 0 0 0 > > [ 1] .hash HASH ff700120 0120 > >00b4 0004 A 2 0 8 > > [ 2] .dynsym DYNSYM ff7001d8 01d8 > >0270 0018 A 312 8 > > [ 3] .dynstr STRTAB ff700448 0448 > >0059 A 0 0 1 > > [ 4] .gnu.version VERSYM ff7004a2 04a2 > >0034 0002 A 2 0 2 > > [ 5] .gnu.version_dVERDEF ff7004d8 04d8 > >0038 A 3 2 8 > > [ 6] .text PROGBITS ff700c00 00100bab > > >02e4 AX 0 0 64 > > It puts .text at 1MB. Your vdso file must be huge? > > It looks like it ignores the > -Wl,-z,max-page-size=4096 -Wl,-z,common-page-size=4096 > options passed to it. The AMD64 ABI has a 1MB minimum page size, but > these options are supposed to disable it. These options are fairly new, before they were ignored (like all unknown -z options). They were added 2006-05-30 to CVS binutils. I guess the problem is caused by the gap being too big and old binutils. Jakub - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Linus 2.6.23-rc1
On Mon, Jul 23, 2007 at 01:31:00AM +0200, Andi Kleen wrote: > On Monday 23 July 2007 01:23:38 Andre Noll wrote: > > On 00:22, Andi Kleen wrote: > > > > /usr/bin/ld: section .text [ff700500 -> ff7007e3] > > > > overlaps section .gnu.version_d [ff7004d8 -> ff70050f] > > > > > > Does this patch fix it? > > > > Nope, with 0x600 I still get the same error. But it helped to further > > increase VDSO_TEXT_OFFSET to 0xc00. I tried 0x700, 0x800,... and 0xc00 > > is the smallest value in this series that makes the error go away, i.e. > > the patch below works for me. > > Can you send (privately) readelf -a output from your vdso.so ? > Your linker must be doing something weird. > > 0xc00 is quite wasteful. I think Roland's --build-id doesn't create very big section, the likely culprit would be a hacked up ld that e.g. defaults to --hash-style=both. Can you retry with --hash-style=sysv? vdso really has to include the traditional .hash section, otherwise it wouldn't be compatible with old glibcs, and an additional .gnu.hash might be an overkill for it - doesn't the vdso define only very few symbols? Jakub - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: vdso.so mislinked by buggy linker was Re: Linus 2.6.23-rc1
On Mon, Jul 23, 2007 at 01:56:20AM +0200, Andi Kleen wrote: > On Monday 23 July 2007 01:38:40 Andre Noll wrote: > [readded linux-kernel, Linus] > > > [Nr] Name Type Address Offset > >Size EntSize Flags Link Info Align > > [ 0] NULL > > 0 0 0 > > [ 1] .hash HASH ff700120 0120 > >00b4 0004 A 2 0 8 > > [ 2] .dynsym DYNSYM ff7001d8 01d8 > >0270 0018 A 312 8 > > [ 3] .dynstr STRTAB ff700448 0448 > >0059 A 0 0 1 > > [ 4] .gnu.version VERSYM ff7004a2 04a2 > >0034 0002 A 2 0 2 > > [ 5] .gnu.version_dVERDEF ff7004d8 04d8 > >0038 A 3 2 8 > > [ 6] .text PROGBITS ff700c00 00100bab > > >02e4 AX 0 0 64 > > It puts .text at 1MB. Your vdso file must be huge? > > It looks like it ignores the > -Wl,-z,max-page-size=4096 -Wl,-z,common-page-size=4096 > options passed to it. The AMD64 ABI has a 1MB minimum page size, but > these options are supposed to disable it. > > Not sure how to work around this, but having an 1+MB vdso would be incredibly > wasteful. What version is it? Perhaps we just drop support for this. I can't > think of a workaround currently. Looking at vdso.lds.S, if you change just VDSO_TEXT_OFFSET to 0xc00 and don't tweak the linker script, then you jump backwards with the dot, you should even get a linker warning about it: . = VDSO_PRELINK + VDSO_TEXT_OFFSET; .text : { *(.text) }:text .text.ptr : { *(.text.ptr) }:text . = VDSO_PRELINK + 0x900; Guess that 0x900 should have been VDSO_TEXT_OFFSET + 0x400 or something similar. Also note that it is highly desirable to fit the whole vdso into one page, so increasing VDSO_TEXT_OFFSET etc. offsets too much is just wasting memory. From the above dump, VDSO_TEXT_OFFSET 0x500 is too low, but 0x600 should work, assuming .data section is moved 0x100 higher as well. Jakub - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: gcc fixed size char array initialization bug - known?
On Thu, Aug 02, 2007 at 09:55:51PM +0200, Guennadi Liakhovetski wrote:
> I've run across the following gcc "feature":
>
>         char c[4] = "01234";
>
> gcc emits a nice warning
>
> warning: initializer-string for array of chars is too long
>
> But do a
>
>         char c[4] = "0123";
>
> and - a wonder - no warning. No warning with gcc 3.3.2, 3.3.5, 3.4.5,
> 4.1.2. I was told 4.2.x does produce a warning.

Neither 4.2.x nor 4.3 warns, and it is correct not to warn about perfectly valid code. ISO C99 is very clear that the terminating '\0' (resp. L'\0') from the string literal is only added if there is room in the array or if the array has unknown size.

Jakub
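The rule can be seen directly in a small C program. This is only an illustration with made-up sketch_* names; note that some newer compilers may warn about the first initializer when extra warnings are enabled, but it remains valid C.

```c
#include <assert.h>
#include <string.h>

/* Exactly fills the array: there is no room, so no '\0' is stored. */
static const char sketch_exact[4] = "0123";

/* Unknown size: the array is sized to hold the '\0' as well. */
static const char sketch_unsized[] = "0123";

/* Returns nonzero if any byte of sketch_exact is a NUL. */
static int sketch_exact_has_nul(void)
{
    return memchr(sketch_exact, '\0', sizeof sketch_exact) != NULL;
}

static void sketch_array_demo(void)
{
    assert(sizeof sketch_exact == 4);
    assert(sizeof sketch_unsized == 5);
    assert(!sketch_exact_has_nul());
    assert(sketch_unsized[4] == '\0');
}
```

Because sketch_exact is not NUL-terminated, it must not be passed to functions such as strlen or printf("%s"); it is a plain 4-byte array.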
Re: Implementation of POSIX mqueues in Linux 2.6
On Fri, Aug 03, 2007 at 09:59:32AM +, gregfe wrote: > I find little documentation on the actual implementation of POSIX message > queues in Linux, and need some advise. In particular, I am wondering > whether it supports inter-process *and* inter-thread communication, and if Not sure what exactly you mean by inter-thread communication, whether communication between threads within one process or between threads from different processes. You can use mq_* for either, except that mq_notify registered signal notification is sent to the process that called mq_notify, not thread (and for SIGEV_THREAD a new thread is created). Though of course for communication between threads within one process mq_* is a huge overkill. > On more thing: kernel's "make menuconfig" of > version 2.6.11 says : > > >> To use this feature you will also need mqueue library, available > > >> from <... a URL ... to M. Wronski's and K. Benedyczak's home page>" > > Is it still up to date ? No, glibc supports mq_* APIs for more than 3 years now. Jakub - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: smaller kernel with no real time futexes
On Wed, Aug 01, 2007 at 09:24:34PM +0200, Andi Kleen wrote: > Adrian, > > You said earlier you're looking at smaller allnoconfig kernels. > One thing I noticed recently that realtime pi futexes are always > enabled and that pulls in a lot of other code (like the plists) > > Userland needs to handle them not being available anyways for older > kernels. > > Might be worth looking into turning that into a CONFIG. That's a very bad idea. glibc configured for 2.6.18 and higher kernels assumes PI futexes are present. Jakub - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] [RESEND] PIE executable randomization
On Wed, Aug 08, 2007 at 04:03:07PM +0200, Jiri Kosina wrote:
> @@ -870,11 +917,15 @@ static int load_elf_binary(struct linux_binprm *bprm, struct pt_regs *regs)
>                        * default mmap base, as well as whatever program they
>                        * might try to exec. This is because the brk will
>                        * follow the loader, and is not movable. */
> +#ifdef CONFIG_X86
> +                     load_bias = 0;
> +#else
>                       load_bias = ELF_PAGESTART(ELF_ET_DYN_BASE - vaddr);
> +#endif
>               }
>
>               error = elf_map(bprm->file, load_bias + vaddr, elf_ppnt,
> -                             elf_prot, elf_flags);
> +                             elf_prot, elf_flags,0);
>               if (BAD_ADDR(error)) {
>                       send_sig(SIGKILL, current, 0);
>                       retval = IS_ERR((void *)error) ?

If I'm reading the above hunk correctly, this means we will randomize all PIEs and even all dynamic linkers invoked as executables on i?86 and x86_64, and on the rest of the arches we won't randomize at all and will instead load ET_DYN objects at the ELF_ET_DYN_BASE address. But I don't see anything i?86/x86_64 specific in this. What would make much more sense to me would be conditionalizing on whether we are loading a dynamic linker (in which case loading it at ELF_ET_DYN_BASE is desirable) or not (PIEs, ...; and for PIEs we want to randomize on all architectures). So something like

        if (elf_interpreter)
                load_bias = 0;
        else
                /* Probably dynamic linker (ld*so* in some /lib* directory)
                   invoked as a program with args - load at
                   ELF_ET_DYN_BASE. */
                load_bias = ELF_PAGESTART(ELF_ET_DYN_BASE - vaddr);

instead of

#ifdef CONFIG_X86
        load_bias = 0;
#else
        load_bias = ELF_PAGESTART(ELF_ET_DYN_BASE - vaddr);
#endif

Jakub
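The proposed decision can be modeled outside the kernel as a pure function. This is a sketch only: the page size (4096) and the ELF_ET_DYN_BASE value (0x80000000) are assumed i386-like numbers, and every sketch_* or SKETCH_* name is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed values for illustration; the real ones are per-architecture. */
#define SKETCH_PAGE_SIZE   UINT32_C(0x1000)
#define SKETCH_ET_DYN_BASE UINT32_C(0x80000000)
#define SKETCH_PAGESTART(a) ((a) & ~(SKETCH_PAGE_SIZE - 1))

/*
 * has_interpreter: the ET_DYN object names a PT_INTERP, i.e. it is a PIE.
 * Otherwise it is most likely the dynamic linker run as a program.
 */
static uint32_t sketch_pick_load_bias(bool has_interpreter, uint32_t vaddr)
{
    if (has_interpreter)
        return 0;   /* PIE: leave placement to the mmap layer */
    return SKETCH_PAGESTART(SKETCH_ET_DYN_BASE - vaddr);
}

static void sketch_load_bias_demo(void)
{
    assert(sketch_pick_load_bias(true, 0x1234) == 0);
    assert(sketch_pick_load_bias(false, 0x1234) == UINT32_C(0x7FFFE000));
}
```

The point of keying on the interpreter rather than on CONFIG_X86 is that the choice then follows what kind of object is being loaded, not which architecture the kernel was built for.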
Re: [PATCH 1/2] Define the EF_AS_NO_RANDOM e_flag bit
On Tue, Jan 23, 2007 at 11:28:13PM +0300, Samium Gromoff wrote: > Author: Samium Gromoff <[EMAIL PROTECTED]> > Date: Tue Jan 23 22:31:13 2007 +0300 > > Define the ELF binary header flag EF_AS_NO_RANDOM > > EF_AS_NO_RANDOM should mean that the binary requests to not apply > randomisation to address spaces of its processes. > > diff --git a/include/linux/elf.h b/include/linux/elf.h > index 60713e6..58ebb47 100644 > --- a/include/linux/elf.h > +++ b/include/linux/elf.h > @@ -172,6 +172,8 @@ typedef struct elf64_sym { > > #define EI_NIDENT 16 > > +#define EF_AS_NO_RANDOM 0x1/* do not randomise the address space */ > + You can't make up EF_* flags this way, they are arch specific, the LSB bit (but many others too) are already used on many architectures. E.g.: elf/mt.h:#define EF_MT_CPU_MRISC 0x0001 /* default */ elf/sparc.h:#define EF_SPARCV9_PSO0x1 /* partial store ordering */ elf/bfin.h:#define EF_BFIN_PIC0x0001 /* -fpic */ elf/alpha.h:#define EF_ALPHA_32BIT0x0001 elf/mips.h:#define EF_MIPS_NOREORDER 0x0001 elf/m68k.h:#define EF_M68K_CF_ISA_A_NODIV 0x01 /* ISA A except for div */ elf/sh.h:#define EF_SH1 1 elf/arm.h:#define EF_ARM_RELEXEC 0x01 elf/cris.h:#define EF_CRIS_UNDERSCORE 0x0001 elf/ia64.h:#define EF_IA_64_TRAPNIL (1 << 0) /* Trap NIL pointer dereferences. */ elf/vax.h:#define EF_VAX_NONPIC 0x0001 /* Object contains non-PIC code */ elf/iq2000.h:#define EF_IQ2000_CPU_IQ2000 0x0001 /* default */ elf/frv.h:#define EF_FRV_GPR_32 0x0001 /* -mgpr-32 */ to name just a few. Jakub - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 1/2] Define the EF_AS_NO_RANDOM e_flag bit
On Wed, Jan 24, 2007 at 12:06:45AM +0300, Samium Gromoff wrote:
> Should we introduce per-arch asm/elf.h files to hold the relevant flag
> definitions then?

On some architectures there are no bits left. On others you'd need to go through whoever maintains the relevant psABI to get a bit officially allocated. Really, it is a very bad idea to use e_flags for this. If all you care about is running setuid LISP programs, you'd be much better off putting your energy into fixing the buggy ELF dumper in it.

Jakub
Re: kernel + gcc 4.1 = several problems
On Wed, Jan 03, 2007 at 05:32:16AM -0800, Arjan van de Ven wrote: > On Wed, 2007-01-03 at 12:44 +, Alan wrote: > > > > fixed. At that point an i686 kernel would contain i686 instructions and > > > > actually run on all i686 processors ending all the i586 pain for most > > > > users and distributions. > > > > > > Could you explain why CMOV is pointless now? Are there any benchmarks > > > proving that? > > > > Take a look at the recent ffmpeg bits on the mplayer list for one example > > I have to hand - P4 cmov is pretty slow. The crypto folks find the same > > things. > > cmov is effectively the same cost as a compare and jump, in both cases > the cpu needs to do a prediction, and on a mispredict, restart. > > the reason cmov can make sense is because it's smaller code... BTW, from GCC POV availability of CMOV is the only difference between -march=i586 -mtune=something and -march=i686 -mtune=something. So this is just a naming thing, it could be called -march=i686cmov to make it more obvious but it is too late (and too unimportant) to change it now. Perhaps adding a note to info gcc/man gcc ought to be enough? If you don't want CMOV being emitted, compile with -march=i586 -mtune=generic (or whatever other tuning you pick up), with -march=i686 -mtune=generic you tell GCC you have CMOV. Whether CMOV is actually used in generated code is another matter, which should be decided based on the selected -mtune. For -Os CMOV should be used whenever available, as that means usually smaller code, otherwise if on some particular chip CMOV is actually slower than compare, jump and assignment, then CMOV should not be selected for that particular tuning (say if Pentium4 has slower CMOV than compare+jump+assignment, -mtune=pentium4 should not emit CMOV, at least not often), if you have examples of that, please file a bug to http://gcc.gnu.org/bugzilla/. -mtune=generic should emit resp. not emit CMOV depending on whether it is a win on the currently common CPUs. 
Jakub
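A tiny function is enough to watch the -march effect described above. This is a sketch under assumptions: on a 32-bit x86 GCC, comparing the assembly from `gcc -O2 -m32 -march=i586 -S` with that from `gcc -O2 -m32 -march=i686 -S` should show whether a conditional move is used, but the outcome depends on the compiler version and on the selected -mtune.

```c
#include <assert.h>

/*
 * A select that needs no side effects on either arm: with CMOV
 * permitted the compiler may emit cmov, otherwise it has to use
 * a compare and a jump (or some other branch-free sequence).
 */
static int sketch_max(int a, int b)
{
    return a > b ? a : b;
}

static void sketch_max_demo(void)
{
    assert(sketch_max(1, 2) == 2);
    assert(sketch_max(-3, -7) == -3);
}
```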
Re: -freg-struct-return?
On Thu, Feb 22, 2007 at 12:09:04AM -0800, Jeremy Fitzhardinge wrote:
> Arjan van de Ven wrote:
> > Do we know how many gcc bugs this has? (regparm used to have many)
> > other than that.. sounds like a win...
> >
>
> The documentation suggests that its the preferred mode of operation, and
> that its the default on platforms where gcc is the primary compiler. So
> the fact that it isn't for Linux suggests either an oversight or that it
> is actually broken...

It is used for Linux on many architectures (x86_64, sparc64, ia64, ppc{,64}, arm, sh, m68k to name just a few), but it is an ABI decision, so e.g. on i386 it is not used by default, as the ABI mandates that structs/unions are returned in memory.

Jakub
Re: [ANN] Userspace M-on-N threading model implementation. Alpha release.
On Sun, Feb 04, 2007 at 03:12:32PM -0500, Bill Davidsen wrote:
> Arjan van de Ven wrote:
> >>Because user threading can avoid context switches, there will always be
> >>cases where it will outperform o/s threads for hardware reasons.
> >
> >actually.. switching from one "real" thread to another in Linux is not
> >an actual context switch in the hardware sense... at least this part of
> >your argument seems to be incorrect ;)
> >
> How does that work? Switching between kernel threads requires going into
> the kernel, user level thread switches are all done in user mode.
>
> Do you have some way to change o/s threads w/o going into the kernel?

But going into the kernel is not very expensive on Linux. On the other hand, the overhead you need to add to every single syscall that might block for the M:N threads, and the associated complications which make it far harder to conform to POSIX, IMHO far outweigh the cost of going into the kernel for a context switch.

Jakub
Re: dvb shared datastructure bug?
On Tue, Feb 13, 2007 at 03:14:23PM +0400, Manu Abraham wrote:
> >thanks for pointing out this issue.
> >
> >attached find a patch that fixes the problem.
> >
> >@mauro - please pull changeset a7ac92d208fe
> > dvbdev: fix illegal re-usage of fileoperations struct
> >
> >from http://www.linuxtv.org/hg/~mws/v4l-dvb-fixtree
> >
>
> Ack'd-by: Manu Abraham <[EMAIL PROTECTED]>

Wouldn't it be better to kmalloc both struct dvb_device and struct file_operations together instead of doing 2 separate allocations?

        struct dvb_device_plus_fops {
                struct dvb_device dev;
                struct file_operations fops;
        } *dev_fops = kmalloc (sizeof (struct dvb_device_plus_fops), GFP_KERNEL);
        *pdvbdev = dvbdev = (struct dvb_device *)dev_fops;
        if (dev_fops == NULL)
                error handling;
        memset (&dev_fops->fops, 0, sizeof (dev_fops->fops));
        ...
        dvbdev->fops = &dev_fops->fops;

Jakub
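The single-allocation idea can be tried in userspace with malloc standing in for kmalloc. This is a sketch only; the sketch_* types are hypothetical and far smaller than the real dvb_device and file_operations structures.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical miniature of struct file_operations. */
struct sketch_fops {
    int (*open)(void);
};

/* Hypothetical miniature of struct dvb_device. */
struct sketch_device {
    const struct sketch_fops *fops;
    int id;
};

/* Device first, so the wrapper and the device share one address. */
struct sketch_device_plus_fops {
    struct sketch_device dev;
    struct sketch_fops fops;
};

/* One allocation holds the device and its private copy of the fops. */
static struct sketch_device *sketch_device_alloc(const struct sketch_fops *tmpl)
{
    struct sketch_device_plus_fops *both;

    assert(tmpl != NULL);
    both = malloc(sizeof *both);
    if (both == NULL)
        return NULL;
    memset(both, 0, sizeof *both);
    both->fops = *tmpl;             /* per-device copy, safe to modify */
    both->dev.fops = &both->fops;
    return &both->dev;
}

/* Freeing the device frees the embedded fops with it. */
static void sketch_device_free(struct sketch_device *dev)
{
    free(dev);
}
```

The design choice is that one free releases both objects, so the fops copy cannot outlive or be leaked separately from its device.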
Re: Intel Core Duo/Duo2 T2300/E6400 - Hyper-Threading (the absence of)
On Mon, Jan 08, 2007 at 01:44:32AM -0800, Robin H. Johnson wrote: > (Please CC me, I am not subscribed to LKML [I have set the > Mail-Followup-To header accordingly]). > > On two of my new machines, with Intel Core Duo T2300 and Core2 Duo E6400 > chips respectively, I noticed some weirdness in how many CPUs are > present. > > If the hyper-threading bit is present in the CPU info, should there > always be a an extra CPU presented to the system per physical core? No. The ht flag just says whether HT reporting via CPUID is supported. Core2 Duo E6400 is AFAIK not hyper-threaded, you just have 2 real sibling CPUs (except that they share L2 cache). Jakub - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 2.6.20-rc4 1/4] futex priority based wakeup
On Wed, Jan 10, 2007 at 12:47:21PM +0100, Pierre Peiffer wrote: > So, yes it (logically) has a cost, depending of the number of different > priorities used, so it's specially measurable with real-time threads. > With SCHED_OTHER, I suppose that the priorities are not be very distributed. > > May be, supposing it makes sense to respect the priority order only for > real-time pthreads, I can register all SCHED_OTHER threads to the same > MAX_RT_PRIO priotity ? > Or do you think this must be set behind a CONFIG* option ? > (Or finally not interesting enough for mainline ?) As soon as there is at least one non-SCHED_OTHER thread among the waiters, there is no question about whether plist should be used or not, that's a correctness issue and if we want to conform to POSIX, we have to use that. I guess Ulrich's question was mainly about performance differences with/without plist wakeup when all threads are SCHED_OTHER. I'd say for that a pure pthread_mutex_{lock,unlock} benchmark or even just a program which uses futex FUTEX_WAIT/FUTEX_WAKE in a bunch of threads would be better. In the past we talked with Ingo about the possibilities here, one is use plist always and prove that it doesn't add measurable overhead over current FIFO (when only SCHED_OTHER is involved), the other possibility would be to start using FIFOs as before, but when the first non-SCHED_OTHER thread decides to wait on the futex, switch it to plist wakeup mode (convert the FIFO into a plist) and from that point on just use plist wakeups on it. Jakub - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 2.6.20-rc4 4/4][RFC] sys_futex64 : allows 64bit futexes
On Tue, Jan 09, 2007 at 05:25:26PM +0100, Pierre Peiffer wrote:
> This latest patch is an adaptation of the sys_futex64 syscall provided in
> -rt patch (originally written by Ingo). It allows the use of 64bit futex.
>
> I have re-worked most of the code to avoid the duplication of the code.
>
> It does not provide the functionality for all architectures, and thus, it
> can not be applied "as is".
> But, again, feedbacks and comments are welcome.

Why do you support all operations for 64-bit futexes? IMHO PI futexes don't make sense for 64-bit futexes; PI futexes have a hardcoded bit layout of the 32-bit word. Similarly, FUTEX_WAKE is not really necessary for 64-bit futexes; the 32-bit futex's FUTEX_WAKE can wake it equally well (it never reads anything, all it cares about is the futex's address). Likewise, I don't see a need for FUTEX_WAKE_OP (and this could simplify the patch quite a lot, no need to change asm*/futex.h headers at all). All that's needed is 64-bit FUTEX_WAIT and perhaps FUTEX_CMP_REQUEUE.

Jakub
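The asymmetry between the two operations (FUTEX_WAIT compares the word, FUTEX_WAKE only uses the address) can be observed from userspace with the ordinary 32-bit futex syscall. A minimal Linux-only sketch, assuming a platform where SYS_futex is defined; the sketch_* wrappers are hypothetical, since glibc exports no futex() function.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <linux/futex.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Thin wrapper around the raw syscall, without timeout or second address. */
static long sketch_futex(uint32_t *uaddr, int op, uint32_t val)
{
    return syscall(SYS_futex, uaddr, op, val, NULL, NULL, 0);
}

/*
 * FUTEX_WAIT reads *uaddr and compares it with val; on a mismatch it
 * returns immediately with EAGAIN instead of sleeping.
 */
static int sketch_wait_mismatch_errno(void)
{
    uint32_t word = 1;
    long ret = sketch_futex(&word, FUTEX_WAIT, 0);

    return ret == -1 ? errno : 0;
}

/*
 * FUTEX_WAKE never looks at the value, only at the address; with no
 * waiters queued on that address it wakes nobody and returns 0.
 */
static long sketch_wake_nobody(void)
{
    uint32_t word = 1;

    return sketch_futex(&word, FUTEX_WAKE, 1);
}

static void sketch_futex_demo(void)
{
    assert(sketch_wait_mismatch_errno() == EAGAIN);
    assert(sketch_wake_nobody() == 0);
}
```

Since the wake side is keyed purely on the address, only the wait side needs to know how wide the compared value is, which is the reasoning behind limiting the 64-bit variant to FUTEX_WAIT and perhaps FUTEX_CMP_REQUEUE.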
Re: [PATCH] work around gcc4 issue with -Os in Dwarf2 stack unwind code
On Tue, Nov 28, 2006 at 02:12:24PM +, Jan Beulich wrote: > This fixes a problem with gcc4 mis-compiling the stack unwind code under > -Os, which resulted in 'stuck' messages whenever an assembly routine was > encountered. "mis-compiling" and "work around" are wrong words, the code had undefined behavior (there is no sequence point between evaluation of ptr and get_uleb128(&ptr, end) and ptr is modified twice, so the compiler can evaluate it e.g. as: temp = ptr; temp = temp + get_uleb128(&ptr, end); ptr = temp; or temp = get_uleb128(&ptr, end); ptr += temp; While gcc has some warnings for sequence point semantics violations (-Wsequence-point), this can't be one of the cases at least until IPA moves much further, because get_uleb128 might very well not modify the variable and at that point the code would be ok). > Signed-off-by: Jan Beulich <[EMAIL PROTECTED]> > > --- linux-2.6.19-rc6/kernel/unwind.c 2006-11-22 14:54:10.0 +0100 > +++ 2.6.19-rc6-unwind-stuck/kernel/unwind.c 2006-11-28 15:02:15.0 > +0100 > @@ -938,8 +938,11 @@ int unwind(struct unwind_frame_info *fra > else { > retAddrReg = state.version <= 1 ? *ptr++ : > get_uleb128(&ptr, end); > /* skip augmentation */ > - if (((const char *)(cie + 2))[1] == 'z') > - ptr += get_uleb128(&ptr, end); > + if (((const char *)(cie + 2))[1] == 'z') { > + uleb128_t augSize = get_uleb128(&ptr, end); > + > + ptr += augSize; > + } > if (ptr > end > || retAddrReg >= ARRAY_SIZE(reg_info) > || REG_INVALID(retAddrReg) Jakub - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
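The difference between the undefined form and the fixed form can be sketched in standalone C. The decoder below is a hypothetical stand-in modeled on the get_uleb128(&ptr, end) call in the patch, not the kernel's implementation; the skip helper follows the pattern of the fix by storing the length in a temporary first.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Decode one unsigned LEB128 value and advance *pcur past it. */
static uint64_t sketch_get_uleb128(const uint8_t **pcur, const uint8_t *end)
{
    const uint8_t *cur = *pcur;
    uint64_t value = 0;
    unsigned shift = 0;

    while (cur < end && shift < 64) {
        uint8_t byte = *cur++;

        value |= (uint64_t)(byte & 0x7f) << shift;
        shift += 7;
        if (!(byte & 0x80))
            break;
    }
    *pcur = cur;
    return value;
}

/*
 * Skip a length-prefixed block. Writing
 *     ptr += sketch_get_uleb128(&ptr, end);
 * would read ptr and modify it through the call with no defined order.
 * Returns NULL if the encoded length runs past end.
 */
static const uint8_t *sketch_skip_block(const uint8_t *ptr, const uint8_t *end)
{
    uint64_t len = sketch_get_uleb128(&ptr, end);  /* ptr is now past the length */

    if (len > (uint64_t)(end - ptr))
        return NULL;
    return ptr + len;
}

static void sketch_uleb128_demo(void)
{
    static const uint8_t block[] = { 0x02, 0xAA, 0xBB, 0xCC };
    static const uint8_t multi[] = { 0xE5, 0x8E, 0x26 };
    const uint8_t *p = multi;

    assert(sketch_skip_block(block, block + 4) == block + 3);
    assert(sketch_get_uleb128(&p, multi + 3) == 624485);
    assert(p == multi + 3);
}
```

With the temporary, the read of ptr for the addition can only happen after the call has updated it, which is the ordering the original one-liner silently relied on.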
Re: [PATCH] work around gcc4 issue with -Os in Dwarf2 stack unwind code
On Tue, Nov 28, 2006 at 02:48:15PM +, Jan Beulich wrote:
> I disagree - the standard says there's a sequence point at a function
> call after evaluating all function arguments. To me this means that any

That's true, that sequence point makes sure e.g. all side effects such as pre-{dec,inc}rement on the arguments happen before the call. But as I said, no sequence point demands any particular ordering of evaluation of the LHS and RHS of +=.

> (parts of an) expression the function call is contained in must be
> evaluated after the function call. Otherwise it would be illegal to e.g.
> modify a variable in both operands of && or ||.

That's different, there is a sequence point at the end of the first operand of &&, ||, ?: and , operators (second bullet in ISO C99 Annex C).

	Jakub
Re: [patch 2.6.19-rc6] Stop gcc 4.1.0 optimizing wait_hpet_tick away
On Wed, Nov 29, 2006 at 02:56:20PM +1100, Keith Owens wrote:
> Nicholas Miell (on Tue, 28 Nov 2006 19:08:25 -0800) wrote:
> >On Wed, 2006-11-29 at 13:22 +1100, Keith Owens wrote:
> >> Compiling 2.6.19-rc6 with gcc version 4.1.0 (SUSE Linux),
> >> wait_hpet_tick is optimized away to a never ending loop and the kernel
> >> hangs on boot in timer setup.
> >>
> >> 001a :
> >> 1a: 55 push %ebp
> >> 1b: 89 e5 mov %esp,%ebp
> >> 1d: eb fe jmp 1d
> >>
> >> This is not a problem with gcc 3.3.5. Adding barrier() calls to
> >> wait_hpet_tick does not help, making the variables volatile does.
> >>
> >> Signed-off-by: Keith Owens
> >>
> >> ---
> >> arch/i386/kernel/time_hpet.c | 2 +-
> >> 1 file changed, 1 insertion(+), 1 deletion(-)
> >>
> >> Index: linux-2.6/arch/i386/kernel/time_hpet.c
> >> ===
> >> --- linux-2.6.orig/arch/i386/kernel/time_hpet.c
> >> +++ linux-2.6/arch/i386/kernel/time_hpet.c
> >> @@ -51,7 +51,7 @@ static void hpet_writel(unsigned long d,
> >> */
> >> static void __devinit wait_hpet_tick(void)
> >> {
> >> - unsigned int start_cmp_val, end_cmp_val;
> >> + unsigned volatile int start_cmp_val, end_cmp_val;
> >>
> >> start_cmp_val = hpet_readl(HPET_T0_CMP);
> >> do {
> >
> >When you examine the inlined functions involved, this looks an awful lot
> >like http://gcc.gnu.org/bugzilla/show_bug.cgi?id=22278
> >
> >Perhaps SUSE should fix their gcc instead of working around compiler
> >problems in the kernel?
>
> Firstly, the fix for 22278 is included in gcc 4.1.0.

This actually sounds more like http://gcc.gnu.org/PR27236
And that one is broken in 4.1.0, fixed in 4.1.1.

	Jakub
Re: [PATCH -mm 4/5][AIO] - AIO completion signal notification
On Wed, Nov 29, 2006 at 11:33:01AM +0100, Sébastien Dugué wrote:
> AIO completion signal notification
>
> The current 2.6 kernel does not support notification of user space via
> an RT signal upon an asynchronous IO completion. The POSIX specification
> states that when an AIO request completes, a signal can be delivered to
> the application as notification.
>
> This patch adds a struct sigevent *aio_sigeventp to the iocb.
> The relevant fields (pid, signal number and value) are stored in the kiocb
> for use when the request completes.
>
> That sigevent structure is filled by the application as part of the AIO
> request preparation. Upon request completion, the kernel notifies the
> application using those sigevent parameters. If SIGEV_NONE has been specified,
> then the old behaviour is retained and the application must rely on polling
> the completion queue using io_getevents().

Well, from what I see applications must rely on polling the completion queue using io_getevents() in any case. Isn't that the only way to free the kernel resources associated with the AIO request, even if it uses SIGEV_SIGNAL or thread notification? aio_error/aio_return/aio_suspend will still need to call io_getevents; the only important differences with this patch are that a) the polling doesn't need to be asynchronous (i.e. done by a special thread which just loops doing io_getevents) and b) it doesn't need to care about notification itself.

	Jakub
Re: [patch 2.6.19-rc6] Stop gcc 4.1.0 optimizing wait_hpet_tick away
On Fri, Dec 01, 2006 at 08:28:16AM +0100, Willy Tarreau wrote: > Oh, I'm perfectly aware of this. That's in part why I started the hotfix > branch in the past :-) But sometimes, fixes consist in merging all the > patches from the maintenance branch (eg: from 4.1.0 to 4.1.1), and if > this is the case, there would not be much justification not to simply > update the version. In fact, what's really missing is a "fixlevel" in > the packages, to inform the user that 4.1.0 as shipped by the distro > has the same level of fixes as 4.1.1. But this is what the version is > used for today. This is even more complicated by the fact that upstream GCC release branches (and also several Linux distributors) start announcing the upcoming version already a few days after a release is tagged. E.g. 14 days old gcc-4_1-branch says: ./xgcc -B ./ --version; ./xgcc -B ./ -dD -E -xc /dev/null | grep GNU xgcc (GCC) 4.1.2 20061114 (prerelease) Copyright (C) 2006 Free Software Foundation, Inc. This is free software; see the source for copying conditions. There is NO warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. #define __GNUC__ 4 #define __GNUC_MINOR__ 1 #define __GNUC_PATCHLEVEL__ 2 but GCC 4.1.2 has not been released yet. In Fedora Core/RHEL and I think a few other distros the version number is only changed when it is officially released, e.g.: gcc --version; gcc -dD -E -xc /dev/null | grep GNU gcc (GCC) 4.1.1 20061011 (Red Hat 4.1.1-30) Copyright (C) 2006 Free Software Foundation, Inc. This is free software; see the source for copying conditions. There is NO warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. #define __GNUC__ 4 #define __GNUC_MINOR__ 1 #define __GNUC_PATCHLEVEL__ 1 #define __GNUC_RH_RELEASE__ 30 Note, 4.1.1 was released end of May this year and 4.1.2 has not been released. 
So, using __GNUC_PATCHLEVEL__ to detect whether a bug has been fixed isn't very useful. You'd also need to rule out __GNUC_PATCHLEVEL__ <= 1, because gcc-4_1-branch has been announcing that patchlevel since the beginning of March; on the other hand, there are a lot of GCCs with __GNUC_PATCHLEVEL__ == 1 that certainly have that bug fixed. You could perhaps parse the prerelease vs. release vs. vendor strings, but that could be quite difficult; it would perhaps be easier to just parse the date in the --version output. Checking for the bug itself is best though, because that will also catch backports of the bugfix made without rebasing from the release branch.

	Jakub
Re: [PATCH] get_random_long() and AT_ENTROPY for auxv, kernel 2.6.21.5
On Sun, Jun 24, 2007 at 09:43:03PM -0700, Arjan van de Ven wrote:
> > - something to do with aux vector headers
>
> the primary goal is to pass a random value to userspace at process
> start; this to save glibc from having to open /dev/urandom on every
> program start (which it does now for all apps compiled with
> -fstack-protector, which in various distros is "everything").

There are actually 2 ways to compile a glibc that supports -fstack-protector. Only one opens /dev/urandom on every program initialization; the other computes the stack guard from some bits of the stack address (so it indirectly depends on get_random_int() in stack randomization). Nevertheless, having one random long (32-bit for 32-bit arches, 64-bit otherwise) in the aux vector would be useful.

	Jakub
Re: [PATCH][RESEND] PIE randomization
On Sat, Jul 07, 2007 at 02:13:01AM +0200, Jiri Kosina wrote:
> On Thu, 5 Jul 2007, Rik van Riel wrote:
>
> > So the original patch has:
> > #define BAD_ADDR(x) ((unsigned long)(x) >= TASK_SIZE)
> > For some reason(?) it got changed to the clearly buggy:
> > #define BAD_ADDR(x) ((unsigned long)(x) >= PAGE_MASK)
> > Jiri's patch undoes that second buggy define, which is very
> > different from the original that was sent in by you and Ernie.
>
> This is a part of execshield patch, the pie-compiled binary executable
> memory layout randomization was extracted from - see
> http://people.redhat.com/~mingo/exec-shield/exec-shield-nx-2.6.19.patch
>
> Note that load_elf_interp() in vanilla kernel differs from the
> execshield's (and pie-randomization.patch) version.
>
> The fix makes the BAD_ADDR check whether the address belongs to the
> ERR_PTR range, which seems valid for all uses of BAD_ADDR in the patched
> binfmt_elf.c (do_brk(), elf_map(), do_mmap() etc return valid address or
> err ptr) ... am I missing something obvious here?

I believe the BAD_ADDR macro was changed from ((unsigned long)(x) >= TASK_SIZE) (which is the right test for invalid user addresses, and a stronger check than >= PAGE_MASK) to >= PAGE_MASK only because of the one check of the return value of load_elf_interp. All other uses of the BAD_ADDR macro are either on userland addresses (what do_mmap, elf_map, do_brk etc. return, where TASK_SIZE or more is certainly wrong) or, in one case, still on an unbiased ELF p_vaddr:

  if (BAD_ADDR(k) || elf_ppnt->p_filesz > elf_ppnt->p_memsz ||

in load_elf_binary (where a >= TASK_SIZE check is ok too).
So perhaps doing this instead of changing BAD_ADDR to IS_ERR_VALUE might be better:

Signed-off-by: Jakub Jelinek <[EMAIL PROTECTED]>

--- linux/fs/binfmt_elf.c 2007-06-08 21:53:45.0 +0200
+++ linux/fs/binfmt_elf.c 2007-07-07 14:19:14.0 +0200
@@ -80,7 +80,7 @@ static struct linux_binfmt elf_format =
  .hasvdso = 1
 };

-#define BAD_ADDR(x) ((unsigned long)(x) >= PAGE_MASK)
+#define BAD_ADDR(x) ((unsigned long)(x) >= TASK_SIZE)

 static int set_brk(unsigned long start, unsigned long end)
 {
@@ -1015,7 +1015,7 @@ static int load_elf_binary(struct linux_
  interpreter,
  &interp_map_addr,
  load_bias);
- if (!BAD_ADDR(elf_entry)) {
+ if (!IS_ERR((void *)elf_entry)) {
  /* load_elf_interp() returns relocation adjustment */
  interp_load_addr = elf_entry;
  elf_entry += loc->interp_elf_ex.e_entry;

	Jakub
Re: [PATCH][RESEND] PIE randomization
On Mon, Jul 09, 2007 at 11:58:07PM +0200, Jiri Kosina wrote:
> On Mon, 9 Jul 2007, Jiri Kosina wrote:
>
> [ ... ]
> > > - if (!BAD_ADDR(elf_entry)) {
> > > + if (!IS_ERR((void *)elf_entry)) {
> > I agree that this is better solution. Andrew, this Jakub's patch should
> > replace the pie-randomization-fix-bad_addr-macro.patch if possible. You
> > can add
>
> as this raced :) with Andrew who already folded the
> pie-randomization-fix-bad_addr-macro.patch into pie-randomization.patch,
> do you think you could rebase this change against the current state of -mm
> and resend it? Thanks,

Here it is:

Restore BAD_ADDR check strictness, use IS_ERR in the only place where the stricter BAD_ADDR can't work, as the value is a load bias rather than userland address.

Signed-off-by: Jakub Jelinek <[EMAIL PROTECTED]>

--- linux/fs/binfmt_elf.c 2007-07-10 11:39:29.0 +0200
+++ linux/fs/binfmt_elf.c 2007-07-10 11:41:03.0 +0200
@@ -80,7 +80,7 @@ static struct linux_binfmt elf_format =
  .hasvdso = 1
 };

-#define BAD_ADDR(x) IS_ERR_VALUE(x)
+#define BAD_ADDR(x) ((unsigned long)(x) >= TASK_SIZE)

 static int set_brk(unsigned long start, unsigned long end)
 {
@@ -1005,7 +1005,7 @@ static int load_elf_binary(struct linux_
  interpreter,
  &interp_map_addr,
  load_bias);
- if (!BAD_ADDR(elf_entry)) {
+ if (!IS_ERR((void *)elf_entry)) {
  /*
   * load_elf_interp() returns relocation
   * adjustment

	Jakub
Re: [PATCH] Introduce O_CLOEXEC (take >2)
On Thu, May 31, 2007 at 11:46:31AM -0700, Davide Libenzi wrote:
> On Thu, 31 May 2007, Ulrich Drepper wrote:
> > Davide Libenzi wrote:
> > > Isn't this better be a global process flag? Default should be, for legacy
> > > reasons,
> >
> > No. Policies are always wrong since it means code that cannot change
> > the policy (e.g, all runtime libraries) have no access to the
> > functionality. I cannot set the policy to default to close-on-exit in
> > glibc all the while the application assumes this is not the case.
>
> I was talking for a broader usage, not only glibc centric. Most ppl
> writing MT+exec apps wants all but (eventually) a handful of files
> leaking across the exec boundary.

If open (and all other syscalls that create fds) have O_CLOEXEC (and something similar for other syscalls), then such a policy can easily be implemented in userland, if desired.

	Jakub
Re: O_CLOEXEC: An alternate proposal
On Fri, Jun 08, 2007 at 03:47:12AM -0400, Daniel Colascione wrote:
> Hey, this is my first post to linux-kernel, so please be kind. :-)
>
> Linus Torvalds wrote on May 31:
> > I'm with Uli on this one. "Stateful" stuff is bad. It's essentially
> > impossible to handle with libraries - either the library would have to
> > explicitly always turn the state the way _it_ needs it, or the library
> > will do the wrong thing.
>
> I agree that stateful stuff is generally not very elegant,
> but I think it's a win here -- we wouldn't have to create any
> new APIs except for the state-setting stuff.
>
> The state just has to be thread-local.
>
> If it's thread-local, a library, say, glibc,
> can use code like this:
>
> /* Internal library function */
> old_fd_flags = kernel_default_fd_flags(FD_CLOEXEC | FD_RANDFD);
> event_fd = super_duper_event_polling_mechanism_fd();
> kernel_default_fd_flags(old_fd_flags);

It is not a win. What if a signal comes in between the two kernel_default_fd_flags syscalls? open and other functions are async-signal-safe, and programs will certainly be upset if the syscalls in a signal handler suddenly start to behave differently depending on which exact code the async signal has interrupted.

	Jakub
Re: Userspace compiler support of "long long"
On Thu, Jun 28, 2007 at 07:53:51AM -0400, Kyle Moffett wrote:
> On Jun 27, 2007, at 23:57:54, Matthew Wilcox wrote:
> >On Wed, Jun 27, 2007 at 06:30:52PM -0400, Kyle Moffett wrote:
> >>Then all 64-bit archs have:
> >>typedef signed long __s64;
> >>typedef unsigned long __u64;
> >>
> >>While all 32-bit archs have:
> >>typedef signed long long __s64;
> >>typedef unsigned long long __u64;
> >
> >include/asm-parisc/types.h:typedef unsigned long long __u64;
> >
> >For both 32 and 64-bit.
> >
> >include/asm-sh64/types.h:typedef unsigned long long __u64;
> >include/asm-x86_64/types.h:typedef unsigned long long __u64;
> >
> >So that's three architectures that violate your first assertion.
>
> Oh, ok, that makes it even easier to say this with certainty:
> Changing the other 64-bit archs to use "long long" for their 64-bit
> numbers will not cause additional warnings. I'm also almost certain
> there are no architectures which use "long long" for 128-bit
> integers. (Moreover, I can't find hardly anything which does 128-bit
> integers at all).

unsigned long and unsigned long long have the same size, precision and alignment on all LP64 arches, that's true. But they have different ranks and more importantly they mangle differently in C++. So, whether some user exposed type uses unsigned long or unsigned long long is part of the ABI, whether that's size_t, uintptr_t, uint64_t, u_int64_t or any other type, you can't change it without breaking the ABI.

	Jakub
Re: [PATCH][RESEND] PIE randomization
On Wed, May 23, 2007 at 10:50:24AM +0200, Jiri Kosina wrote: > From: Jan Kratochvil <[EMAIL PROTECTED]> > > This patch is using mmap()'s randomization functionality in such a way > that it maps the main executable of (specially compiled/linked -pie/-fpie) > ET_DYN binaries onto a random address (in cases in which mmap() is allowed > to perform a randomization). > > Origin of this patch is in exec-shield > (http://people.redhat.com/mingo/exec-shield/) > > Signed-off-by: Jan Kratochvil <[EMAIL PROTECTED]> > Signed-off-by: Jiri Kosina <[EMAIL PROTECTED]> > Cc: Ingo Molnar <[EMAIL PROTECTED]> > Cc: Roland McGrath <[EMAIL PROTECTED]> > Cc: Jakub Jelinek <[EMAIL PROTECTED]> > -#define BAD_ADDR(x) ((unsigned long)(x) >= TASK_SIZE) > +#define BAD_ADDR(x) ((unsigned long)(x) >= PAGE_MASK) ... > @@ -442,8 +491,7 @@ static unsigned long load_elf_interp(str > goto out_close; > } > > - *interp_load_addr = load_addr; > - error = ((unsigned long)interp_elf_ex->e_entry) + load_addr; > + error = load_addr; ... > if (elf_interpreter) { > - if (interpreter_type == INTERPRETER_AOUT) > + if (interpreter_type == INTERPRETER_AOUT) { > elf_entry = load_aout_interp(&loc->interp_ex, >interpreter); > - else > + } else { > + unsigned long interp_map_addr; /* unused */ > + > elf_entry = load_elf_interp(&loc->interp_elf_ex, > interpreter, > - &interp_load_addr); > + &interp_map_addr, > + load_bias); > + if (!BAD_ADDR(elf_entry)) { > + /* > + * load_elf_interp() returns relocation > + * adjustment > + */ > + interp_load_addr = elf_entry; > + elf_entry += loc->interp_elf_ex.e_entry; > + } > + } > if (BAD_ADDR(elf_entry)) { > force_sig(SIGSEGV, current); > retval = IS_ERR((void *)elf_entry) ? The above highlighted changes are the cause of random segfaults of PIE binaries. 
See https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=246623

The problem occurs if ld.so is prelinked to some address in the area where the kernel actually maps it, particularly if elf_map in load_elf_interp returns an address one page below its first PT_LOAD segment's vaddr. Then load_addr (it is actually a load bias) returned from load_elf_interp is 0xfffff000 (on 32-bit kernels), and BAD_ADDR covers all addresses >= 0xfffff000 (on i?86).

The fix should be either changing the definition of BAD_ADDR to e.g. IS_ERR_VALUE(x), or at least changing the if (!BAD_ADDR(elf_entry)) { above to if (!IS_ERR_VALUE(elf_entry)) {. The second BAD_ADDR can stay, because at that place elf_entry is no longer a bias (the difference between actual and preferred load address) but an actual address, where very high addresses are of course invalid.

	Jakub
Re: SMP performance degradation with sysbench
On Tue, Mar 13, 2007 at 01:02:44PM +0100, Eric Dumazet wrote:
> On Tuesday 13 March 2007 12:42, Andrea Arcangeli wrote:
>
> > My wild guess is that they're allocating memory after taking
> > futexes. If they do, something like this will happen:
> >
> > taskA taskB taskC
> > user lock
> > mmap_sem lock
> > mmap sem -> schedule
> > user lock -> schedule
> >
> > If taskB wouldn't be there triggering more random trashing over the
> > mmap_sem, the lock holder wouldn't wait and task C wouldn't wait too.
> >
> > I suspect the real fix is not to allocate memory or to run other
> > expensive syscalls that can block inside the futex critical sections...
>
> glibc malloc uses arenas, and trylock() only. It should not block because if
> an arena is already locked, thread automatically chose another arena, and
> might create a new one if necessary.

Well, it uses trylock only when allocating; free uses a normal lock. glibc malloc will by default use the same arena for all threads; only when it sees contention during allocation does it give different threads different arenas. So, e.g. if mysql did all allocations while holding some global heap lock (thus glibc wouldn't see any contention on allocation), but freeing were done outside of the application's critical section, you would see contention on the main arena's lock in the free path. Calling malloc_stats (); from e.g. an atexit handler could give interesting details, especially if you recompile glibc malloc with -DTHREAD_STATS=1.

	Jakub