[PATCH] perf: remove duplicate block from Makefile

2013-10-05 Thread Ulrich Drepper
This looks like a merge error, the code is duplicated with the first
copy doing something else as well.  Just remove the second block.

Signed-off-by: Ulrich Drepper 

 Makefile |    8 --------
 1 file changed, 8 deletions(-)


Index: perf/config/Makefile
===
--- perf.orig/config/Makefile
+++ perf/config/Makefile
@@ -200,14 +200,6 @@ endif # NO_DWARF
 
 endif # NO_LIBELF
 
-ifndef NO_LIBELF
-CFLAGS += -DLIBELF_SUPPORT
-FLAGS_LIBELF=$(CFLAGS) $(LDFLAGS) $(EXTLIBS)
-ifeq ($(call try-cc,$(SOURCE_ELF_MMAP),$(FLAGS_LIBELF),-DLIBELF_MMAP),y)
-  CFLAGS += -DLIBELF_MMAP
-endif # try-cc
-endif # NO_LIBELF
-
 # There's only x86 (both 32 and 64) support for CFI unwind so far
 ifneq ($(ARCH),x86)
   NO_LIBUNWIND := 1
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC] apparently broken RLIMIT_CORE

2013-10-06 Thread Ulrich Drepper
On Sun, Oct 6, 2013 at 4:42 PM, Linus Torvalds
 wrote:
> I doubt it is intentional, but I also cannot really feel that we care
> deeply. Afaik we don't really honor the size limit exactly anyway, ie
> we tend to check only at page boundaries etc. So do we really care?

I could imagine in the case Al brought up (a pipe as core file filter)
we might want to have some assurance the limits are not breached.  If
it doesn't cost that much I'd say implement it precisely.


Re: sendfile and EAGAIN

2013-03-02 Thread Ulrich Drepper
On Mon, Feb 25, 2013 at 2:22 PM, Eric Dumazet  wrote:
> I don't understand the issue.
>
> sendfile() returns -EAGAIN only if no bytes were copied to the socket.

There is something wrong/unexpected/...

I have a program which can use either sendfile or send.  When using
sendfile to transmit a large block (I've seen it with 900k) the
sendfile call does not transmit everything.  The receiver gets only
about 600k.  This is the situation when I think I've seen EAGAIN
errors from sendfile but I cannot just now reproduce it.  This is with
sockets of AF_UNIX type.

Are there any limits to take into account?


Re: sendfile and EAGAIN

2013-03-02 Thread Ulrich Drepper
On Sat, Mar 2, 2013 at 10:09 PM, Eric Dumazet  wrote:
>
> Using non blocking IO means the sender (and the receiver) must be able
> to perform several operations, as long as the whole transfert is not
> finished.

Certainly, and this is implemented.  But the receiver never gets the
rest of the data while the sender (most of the time) gets notified
that everything is sent.

I don't have a reduced test case yet.  Hopefully I'll get to it
sometime soon.  For now I worked around it by not using sendfile.
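
For reference, the kind of retry loop this exchange is about might look
like the sketch below.  It is not the code from the program discussed
here; the names send_whole_file and sendfile_demo are made up for
illustration, and it assumes Linux, where sendfile() advances *offset by
the number of bytes it actually transferred.

```c
#include <assert.h>
#include <errno.h>
#include <poll.h>
#include <stdlib.h>
#include <string.h>
#include <sys/sendfile.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Send `count` bytes of in_fd, starting at *offset, to out_fd.
   Returns 0 once everything was handed to the kernel, -1 on error.  */
int
send_whole_file (int out_fd, int in_fd, off_t *offset, size_t count)
{
  assert (offset != NULL);

  while (count > 0)
    {
      ssize_t n = sendfile (out_fd, in_fd, offset, count);

      if (n > 0)
        {
          /* Possibly a partial transfer; *offset was already advanced.  */
          count -= (size_t) n;
          continue;
        }
      if (n == 0)
        return -1;              /* Input file ended early.  */
      if (errno == EINTR)
        continue;
      if (errno != EAGAIN && errno != EWOULDBLOCK)
        return -1;

      /* Non-blocking socket with a full buffer: wait until writable.  */
      struct pollfd pfd = { .fd = out_fd, .events = POLLOUT };
      if (poll (&pfd, 1, -1) < 0 && errno != EINTR)
        return -1;
    }
  return 0;
}

/* Usage: push a 1000-byte temporary file through an AF_UNIX socket
   pair and read it back.  Returns 0 if the data arrived intact.  */
int
sendfile_demo (void)
{
  char path[] = "/tmp/sendfileXXXXXX";
  char data[1000], back[1000];
  int sv[2];
  off_t off = 0;
  size_t got = 0;
  int fd = mkstemp (path);

  if (fd < 0)
    return -1;
  unlink (path);
  memset (data, 'x', sizeof data);
  if (write (fd, data, sizeof data) != (ssize_t) sizeof data
      || socketpair (AF_UNIX, SOCK_STREAM, 0, sv) != 0
      || send_whole_file (sv[0], fd, &off, sizeof data) != 0)
    return -1;
  while (got < sizeof back)
    {
      ssize_t r = read (sv[1], back + got, sizeof back - got);
      if (r <= 0)
        return -1;
      got += (size_t) r;
    }
  close (fd);
  close (sv[0]);
  close (sv[1]);
  return memcmp (data, back, sizeof data) == 0 ? 0 : -1;
}
```

A return value of 0 only means the sender handed all bytes to the
kernel; it says nothing about whether the receiver has read them yet.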


Re: [PATCH 2/2] perf: use XSI-complaint version of strerror_r() instead of GNU-specific

2012-07-23 Thread Ulrich Drepper
On Mon, Jul 23, 2012 at 11:00 AM, Kirill A. Shutemov
 wrote:
> The right way to fix it is to switch to XSI-compliant version.

And why exactly would this be "the right way"?  Just fix the use of
strerror_r or use strerror_l.


Re: [PATCH 2/2] perf: use XSI-complaint version of strerror_r() instead of GNU-specific

2012-07-23 Thread Ulrich Drepper
On Mon, Jul 23, 2012 at 4:31 PM, Kirill A. Shutemov
 wrote:
> +   const char *err = strerror_r(errnum, buf, buflen);
> +
> +   if (err != buf && buflen > 0) {
> +   size_t len = strlen(err);
> +   char *c = mempcpy(buf, err, min(buflen - 1, len));
> +   *c = '\0';
> +   }

No need to check for err == NULL.   buflen == 0 is a possibility given
the interface but I'd say this is an error and should be tested for at
the beginning of the function and the call should fail or even abort
the program.
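
A minimal sketch of such a wrapper follows.  It is not the patch under
review: the name errno_message is made up here, and the preprocessor
test assumes that glibc hands out the GNU variant of strerror_r()
exactly when _GNU_SOURCE is defined.  The buflen == 0 case is rejected
up front with assert(), which is one way to "abort the program".

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Store the message for errnum in buf (always NUL-terminated) and
   return buf.  A zero-length buffer is treated as a caller bug.  */
char *
errno_message (int errnum, char *buf, size_t buflen)
{
  assert (buf != NULL && buflen > 0);

#if defined (__GLIBC__) && defined (_GNU_SOURCE)
  /* GNU variant: returns a pointer that may or may not be buf.  */
  const char *msg = strerror_r (errnum, buf, buflen);
  if (msg != buf)
    {
      size_t len = strlen (msg);
      if (len > buflen - 1)
        len = buflen - 1;
      memcpy (buf, msg, len);
      buf[len] = '\0';
    }
#else
  /* XSI variant: returns 0 on success and fills buf itself.  */
  if (strerror_r (errnum, buf, buflen) != 0)
    snprintf (buf, buflen, "Unknown error %d", errnum);
#endif
  return buf;
}
```

Callers can then use the result without caring which variant the C
library provides, e.g. errno_message (ENOENT, buf, sizeof buf).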


Re: [PATCH 2/2] perf: use XSI-complaint version of strerror_r() instead of GNU-specific

2012-07-23 Thread Ulrich Drepper
On Mon, Jul 23, 2012 at 5:06 PM, Kirill A. Shutemov
 wrote:
> They are bugs.
>
> Let's fix strerror_r() usage.
>
> Signed-off-by: Kirill A. Shutemov 

Acked-by: Ulrich Drepper 


Re: [RFC 0/4] perf tool: Adding ratios support

2013-01-16 Thread Ulrich Drepper
On Tue, Jan 15, 2013 at 8:39 AM, Jiri Olsa  wrote:
>   $ perf stat -f formula.conf:cpi kill
>   usage: kill [ -s signal | -p ] [ -a ] pid ...
>  kill -l [ signal ]

I do like this proposal.  The only comment I have is that perhaps the
command line syntax isn't ideal.  What you use above is tied to the
ratios being defined in the config file.  I would imagine that at least
over time (for some ratios probably right away) they become available
by default and don't require a config file.  Also, users might want to
put individualized ratio definitions in a config file which is read by
default.

How about the formulas becoming available whenever the config file is
read?  Maybe this means a few more keywords in the config file (ratio,
ratio-set, ...).  E.g.:

ratio-set branch {
  events = {instructions,branch-instructions,branch-misses}:u

  ratio branch-rate {
    formula = branch-instructions / instructions
    desc = branch rate
  }

  ratio branch-miss-rate {
    formula = branch-misses / instructions
    desc = branch misprediction rate
  }

  ratio branch-miss-ratio {
    formula = branch-misses / branch-instructions
    desc = branch misprediction ratio
  }
}

You get the idea.  Maybe substitute "ratio" with "formula".  Then allow
such a ratio/formula to be used just like a normal event, perhaps with
a special suffix/prefix to designate it.  This should then also mark
the events as part of a group so that the underlying counters are
scheduled in together.
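
To make the three formulas concrete, here is a small worked sketch in C.
It has nothing to do with how perf would evaluate them; the struct and
function names are invented for illustration, and a zero denominator is
arbitrarily mapped to 0.0.

```c
#include <assert.h>
#include <stddef.h>

/* Raw counter values, named after the events in the ratio-set.  */
struct branch_counts
{
  unsigned long long instructions;
  unsigned long long branch_instructions;
  unsigned long long branch_misses;
};

struct branch_ratios
{
  double branch_rate;        /* branch-instructions / instructions */
  double branch_miss_rate;   /* branch-misses / instructions */
  double branch_miss_ratio;  /* branch-misses / branch-instructions */
};

static double
ratio (unsigned long long num, unsigned long long den)
{
  return den == 0 ? 0.0 : (double) num / (double) den;
}

struct branch_ratios
compute_branch_ratios (const struct branch_counts *c)
{
  struct branch_ratios r;

  assert (c != NULL);
  r.branch_rate = ratio (c->branch_instructions, c->instructions);
  r.branch_miss_rate = ratio (c->branch_misses, c->instructions);
  r.branch_miss_ratio = ratio (c->branch_misses, c->branch_instructions);
  return r;
}
```

With 1000 instructions, 200 branches and 10 misses this gives a branch
rate of 0.2, a miss rate of 0.01 and a miss ratio of 0.05.  All three
need the counters to be measured over the same interval, which is why
the events should be scheduled as one group.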


Re: [RFC 0/4] perf tool: Adding ratios support

2013-01-16 Thread Ulrich Drepper
On Wed, Jan 16, 2013 at 9:25 AM, Jiri Olsa  wrote:
> I was thinking having config files (global and arch specific)
> comming with perf having predefined formulas.

All the more reason to not mention the file name or really any source
for the definition of the formula in the name.


> 1)  -e 'ratio/branch-rate/'  # special event class
> 2)  -e 'ratio-branch-rate'   # 'ratio-' prefix
> 3)  -e cpu/branch-rate/  # handled like aliases, ratio name would need to 
> be unique
>   ... ?

I think 3 is the most extensible.  Perhaps use the syntax used in
other places.  We have these :u suffixes etc.  Perhaps have :r or :R
or whatever.

Given the other comments, we might want to avoid "ratio" right away.
If the mechanism is generalized it could be used to express "counter1
- counter2" for events which cannot be expressed with a single counter
but are not really ratios.


Re: struct stat{st_blksize} for /dev entries in 2.4.3

2001-04-08 Thread Ulrich Drepper

Russell Coker <[EMAIL PROTECTED]> writes:

> diff -ru textutils-2.0/src/cat.c textutils-new/src/cat.c
> --- textutils-2.0/src/cat.c Sun Apr  8 22:55:10 2001
> +++ textutils-new/src/cat.c Sun Apr  8 23:23:54 2001
> @@ -790,6 +790,9 @@
>if (options == 0)
> {
>   insize = max (insize, outsize);
> +#ifdef _SC_PHYS_PAGES
> + insize = max (insize, sysconf(_SC_PAGESIZE));
> +#endif
>   inbuf = (unsigned char *) xmalloc (insize);
>  
>   simple_cat (inbuf, insize);

The #ifdef is certainly wrong.  And there is no guarantee that any of
the _SC_* constants are defined as macros.  You'll have to add a
configure test.
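
A sketch of what the corrected code could look like once such a test
exists.  HAVE_SYSCONF_PAGESIZE is a hypothetical macro standing in for
whatever the configure check would define, and choose_buffer_size is an
invented name, not something from textutils.

```c
#include <assert.h>
#include <stddef.h>
#include <unistd.h>

/* Hypothetical result of a configure check; a real build would take
   this from config.h instead of defining it here.  */
#ifndef HAVE_SYSCONF_PAGESIZE
# define HAVE_SYSCONF_PAGESIZE 1
#endif

/* Pick a buffer size that is at least as large as both block sizes
   and, where the page size is known, at least one page.  */
size_t
choose_buffer_size (size_t insize, size_t outsize)
{
  size_t size = insize > outsize ? insize : outsize;

  assert (size > 0);
#if HAVE_SYSCONF_PAGESIZE
  long page = sysconf (_SC_PAGESIZE);
  /* sysconf returns -1 when the value is not available.  */
  if (page > 0 && (size_t) page > size)
    size = (size_t) page;
#endif
  return size;
}
```

The guard tests a macro the build system controls rather than the _SC_*
name itself, and it refers to the same constant that is actually used.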

-- 
---.          ,-.   1325 Chesapeake Terrace
Ulrich Drepper  \,---'   \  Sunnyvale, CA 94089 USA
Red Hat  `--' drepper at redhat.com   `



Re: List of all-zero .data variables in linux-2.4.3 available

2001-04-12 Thread Ulrich Drepper

"Adam J. Richter" <[EMAIL PROTECTED]> writes:

> >Shouldn't a compiler be able to deal with this instead?
> 
>   Yes.

No.  gcc must not do this.  There are situations where you must place
a zero-initialized variable in .data.  It is a programmer problem.




Re: List of all-zero .data variables in linux-2.4.3 available

2001-04-12 Thread Ulrich Drepper

"Adam J. Richter" <[EMAIL PROTECTED]> writes:

>   I am aware of a couple of cases where code relied on static
> variables being allocated contiguously, but, in both cases, those
> variables were either all zeros or all non-zeros, so my proposed
> change would not break such code.

Contiguous placement is not the only property defined by
initialization.  There are many more.  You cannot change this since it
will break quite a few programs and libraries in subtle and hard to
impossible to identify ways.  Simply educate programmers to not
initialize.




Re: PATCH(?): linux-2.4.4-pre2: fork should run child first

2001-04-13 Thread Ulrich Drepper

Linus Torvalds <[EMAIL PROTECTED]> writes:

> spawn() is trivial to implement if you want to. I don't think it's all
> that much more interesting than vfork()+execve(), though.

spawn() (actually posix_spawn) is currently implemented in the libc.
If anybody for whatever reason thinks it is necessary to implement
this in the kernel look at the interface.  It is really only
interesting for systems with limited VMs but it would be trivial to
add another flag which allows different scheduling characteristics
which some people apparently want.
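
For readers who have not used the interface, a minimal sketch of a
posix_spawnp() call follows.  The wrapper name spawn_and_wait is made
up; no file actions or spawn attributes are passed, so none of the
flags mentioned above are exercised.

```c
#include <assert.h>
#include <errno.h>
#include <spawn.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/wait.h>

extern char **environ;

/* Run argv[0] (looked up in PATH) and return its exit status, or -1
   if it could not be started or did not exit normally.  */
int
spawn_and_wait (char *const argv[])
{
  pid_t pid;
  int status;

  assert (argv != NULL && argv[0] != NULL);

  /* posix_spawnp returns an error number directly, not via errno.  */
  if (posix_spawnp (&pid, argv[0], NULL, NULL, argv, environ) != 0)
    return -1;

  while (waitpid (pid, &status, 0) < 0)
    if (errno != EINTR)
      return -1;

  return WIFEXITED (status) ? WEXITSTATUS (status) : -1;
}
```

Scheduling or signal-mask tweaks would go into a posix_spawnattr_t
passed as the fourth argument instead of NULL.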




Re: light weight user level semaphores

2001-04-18 Thread Ulrich Drepper

Linus Torvalds <[EMAIL PROTECTED]> writes:

Sounds good so far.  Some comments.

>  - FS_create is responsible for allocating a shared memory region
>at "FS_create()" time.

This is not so great.  The POSIX shared semaphores require that a
pthread_mutex_t object placed in a shared memory region can be
initialized to work across process boundaries.  I.e., the FS_create
function would actually be FS_init.  There is no problem with the
kernel or the helper code at user level allocating more storage (for
the waitlist or whatever) but it must not be necessary for the user to
know about them and place them in shared memory themselves.

The situation for non-shared (i.e. intra-process) semaphores is
easier.  What I didn't understand is your remark about fork.  The
semaphores should be cloned.  Unless the shared flag is set there
should be no sharing among processes.


The rest seems OK.  Thanks,




Re: light weight user level semaphores

2001-04-19 Thread Ulrich Drepper

Linus Torvalds <[EMAIL PROTECTED]> writes:

> Looks good to me. Anybody want to try this out and test some benchmarks?

I fail to see how this works across processes.  How can you generate a
file descriptor for this pipe in a second process which simply shares
some memory with the first one?  The first process is passive: no file
descriptor passing must be necessary.

The way these things work elsewhere is that a memory address
(probably a physical address) is used as a token.  The semaphore
object is placed in the memory shared by the processes and the virtual
address is passed in the syscall.

Note that semaphores need not always be shared between processes.
This is a property the user has to choose.  So the implementation can
be easier in the normal intra-process case.

In any case all kinds of user-level operations are possible as well
and all the schemes suggested for dealing with the common case without
syscalls can be applied here as well.




Re: light weight user level semaphores

2001-04-19 Thread Ulrich Drepper

Alan Cox <[EMAIL PROTECTED]> writes:

> > can libraries use fast semaphores behind the back of the user? They might
> > well want to use the semaphores exactly for things like memory allocator
> > locking etc. But libc certainly cant use fd's behind peoples backs.
> 
> libc is entitled to, and most definitely does exactly that. Take a look at
> things like gethostent, getpwent etc etc.

You are mixing two completely different things.

Functions like gethostent() and catopen() are explicitly allowed to be
implemented using file descriptors.  If this is allowed the standard
contains appropriate wording.

Other functions like setlocale() do use file descriptors, yes, but
these are not kept.  Before the function returns they are closed.
This can cause disruptions in other threads which find descriptors not
allocated sequentially but this has to be taken into account.  Rules
for multi-threaded applications are different.  A single-threaded
application will not see such a difference.

Now, the standards do not allow POSIX mutexes to be implemented using
file descriptors.  The same is true for unnamed POSIX semaphores.  So
Linus is right, though for a different reason than he thought.

The situation is a bit different for named POSIX semaphores.  These
can be implemented using file descriptors.  But they don't have to and
IMO they shouldn't.  A memory reference based semaphore implementation
would allow a named semaphore to be implemented using

  fd = open (name)
  addr = mmap (..fd..)
  close (fd)
  sem_syscall (addr)

i.e., it can be mapped to a memory reference again.
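
The first three steps of that sequence can be written out in real C as
below.  This is only a sketch: map_shared_object is an invented helper,
and sem_syscall from the outline above is hypothetical and therefore not
shown.

```c
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Map `len` bytes of the file at `path` shared and writable, creating
   the file if needed and setting its size to `len`.  Returns NULL on
   failure.  */
void *
map_shared_object (const char *path, size_t len)
{
  void *addr;
  int fd;

  assert (path != NULL && len > 0);
  fd = open (path, O_RDWR | O_CREAT, 0600);
  if (fd < 0)
    return NULL;
  if (ftruncate (fd, (off_t) len) != 0)
    {
      close (fd);
      return NULL;
    }
  addr = mmap (NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
  /* The mapping stays valid after close; no descriptor is kept.  */
  close (fd);
  return addr == MAP_FAILED ? NULL : addr;
}
```

Two processes calling this with the same path see the same bytes, and
neither holds a file descriptor afterwards.  Only the address remains
as the handle.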




Re: light weight user level semaphores

2001-04-19 Thread Ulrich Drepper

Alan Cox <[EMAIL PROTECTED]> writes:

> mknod foo p. Or use sockets (although AF_UNIX sockets are higher latency)
> Thats why I suggested using flock - its name based. Whether you mkstemp()
> stuff and pass it around isnt something I care about
> 
> Files give you permissions for free too

I don't want nor need file permissions.  A program would look like this:


  process 1:


   fd = open("somefile")
   addr = mmap(fd);
   
   pthread_mutexattr_init(&attr);
   pthread_mutexattr_setpshared(&attr, PTHREAD_PROCESS_SHARED);

   pthread_mutex_init ((pthread_mutex_t *) addr, &attr);

   pthread_mutex_lock ((pthread_mutex_t *) addr);

   pthread_mutex_destroy((pthread_mutex_t *) addr);

  process 2:

   fd = open("somefile")
   addr = mmap(fd);

   pthread_mutex_lock ((pthread_mutex_t *) addr);


The shared mem segment can be retrieved in whatever way.  The mutex in
this case is anonymous.  Everybody who has access to the shared mem
*must* have access to the mutex.


For semaphores it looks similar.  First the anonymous case:

 process 1:


   fd = open("somefile")
   addr = mmap(fd);

   sem_init ((sem_t *) addr, 1, 10);// 10 is arbitrary

   sem_wait ((sem_t *) addr);

   sem_destroy((sem_t *) addr);


  process 2:

   fd = open("somefile")
   addr = mmap(fd);

   sem_wait ((sem_t *) addr);

Note that POSIX semaphores could be implemented with global POSIX
mutexes.
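
One way to picture that remark is the sketch below.  It pairs the mutex
with a condition variable, which goes beyond what is said above, and the
msem_* names are invented.  It is an illustration, not how any libc
implements sem_t.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

struct msem
{
  pthread_mutex_t lock;
  pthread_cond_t nonzero;   /* signalled when count becomes > 0 */
  unsigned int count;
};

/* Like sem_init: pshared != 0 makes the object usable from shared
   memory.  Returns 0 or an error number.  */
int
msem_init (struct msem *s, int pshared, unsigned int value)
{
  pthread_mutexattr_t ma;
  pthread_condattr_t ca;
  int kind = pshared ? PTHREAD_PROCESS_SHARED : PTHREAD_PROCESS_PRIVATE;
  int rc;

  assert (s != NULL);
  pthread_mutexattr_init (&ma);
  pthread_condattr_init (&ca);
  pthread_mutexattr_setpshared (&ma, kind);
  pthread_condattr_setpshared (&ca, kind);
  rc = pthread_mutex_init (&s->lock, &ma);
  if (rc == 0)
    {
      rc = pthread_cond_init (&s->nonzero, &ca);
      if (rc != 0)
        pthread_mutex_destroy (&s->lock);
    }
  pthread_mutexattr_destroy (&ma);
  pthread_condattr_destroy (&ca);
  s->count = value;
  return rc;
}

/* Block until the count is positive, then decrement it.  */
void
msem_wait (struct msem *s)
{
  pthread_mutex_lock (&s->lock);
  while (s->count == 0)
    pthread_cond_wait (&s->nonzero, &s->lock);
  s->count--;
  pthread_mutex_unlock (&s->lock);
}

/* Returns 0 if the count was decremented, -1 if it was zero.  */
int
msem_trywait (struct msem *s)
{
  int rc = -1;

  pthread_mutex_lock (&s->lock);
  if (s->count > 0)
    {
      s->count--;
      rc = 0;
    }
  pthread_mutex_unlock (&s->lock);
  return rc;
}

void
msem_post (struct msem *s)
{
  pthread_mutex_lock (&s->lock);
  s->count++;
  pthread_cond_signal (&s->nonzero);
  pthread_mutex_unlock (&s->lock);
}
```

Placed in a shared mapping and initialized with pshared set to 1, such
an object would behave like the anonymous semaphore in the example
above.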


Finally, named semaphores:

   semp = sem_open("somefile", O_CREAT|O_EXCL, 0600, 10);// 10 is arbitrary

   sem_wait (semp);

   sem_close(semp);
   sem_unlink("somefile");


This is the only semaphore kind which maps nicely to a pipe or socket.
All the others don't.  And even for named semaphores it is best to
have a separate name space like the shmfs.

> So you have unix file permissions on them ?

See above.  Permissions are only allowed for named semaphores.




Re: light weight user level semaphores

2001-04-19 Thread Ulrich Drepper

Alan Cox <[EMAIL PROTECTED]> writes:

> > I don't want nor need file permissions.  A program would look like this:
> 
> Your example opens/mmaps so has file permissions. Which is what I was asking

There are no permissions on the mutex object.  It is the shared memory
which counts.  If you would implement the global mutexes as
independent objects in the filesystem hierarchy you would somehow
magically make the permissions match those of the object containing
the memory representation of the global semaphore.


   fd = open("somefile", O_CREAT|O_TRUNC, 0666)
   addr=mmap(fd)
   // assume attr is for a global mutex
   pthread_mutex_init((pthread_mutex_t*)addr, &attr)
   fchmod(fd, 0600)
   fchown(fd, someuser, somegroup)

If pthread_mutex_init() is allocating some kind of file, how do you
determine the permissions?  How are they changed if the permissions to
the file change?

The kernel representation of the mutex must not be disassociated from
the shared memory region.

Even if you all think very little about Solaris, look at the kernel
interface for semaphores.




Re: light weight user level semaphores

2001-04-19 Thread Ulrich Drepper

Ingo Oeser <[EMAIL PROTECTED]> writes:

> Are you sure, you can implement SMP-safe, atomic operations (which you need
> for all up()/down() in user space) WITHOUT using privileged
> instructions on ALL archs Linux supports?

Which processors have no such instructions but are SMP-capable?

> How do we do this on nccNUMA machines later? How on clusters[1]?

Clusters are not my problem.  They require additional software.  And
NUMA machines may require a certain sequence in which the operations
must be performed, and the hardware should take care of the rest.


I don't really care what the final implementation will be like.  For
UP and SMP machines I definitely want to have as much as possible at
user-level.  If you need a special libpthread for NUMA machines, so be
it.
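
To illustrate what "at user level" can mean for the uncontended case,
here is a sketch using C11 atomics, which did not exist when this was
written.  The fast_lock_* names are invented, and the contended path
(blocking in the kernel) is deliberately left out.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

struct fast_lock
{
  atomic_int state;   /* 0 = free, 1 = held */
};

void
fast_lock_init (struct fast_lock *l)
{
  assert (l != NULL);
  atomic_init (&l->state, 0);
}

/* Try to take the lock with a single compare-and-swap.  No syscall
   and no privileged instruction is involved.  */
bool
fast_lock_try (struct fast_lock *l)
{
  int expected = 0;

  return atomic_compare_exchange_strong_explicit (&l->state, &expected, 1,
                                                  memory_order_acquire,
                                                  memory_order_relaxed);
}

void
fast_lock_release (struct fast_lock *l)
{
  /* Releasing a lock that is not held is a caller bug.  */
  assert (atomic_load_explicit (&l->state, memory_order_relaxed) == 1);
  atomic_store_explicit (&l->state, 0, memory_order_release);
}
```

Only when fast_lock_try() fails would a real implementation have to
enter the kernel to sleep, and that is the part which needs kernel
support.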




Re: light weight user level semaphores

2001-04-19 Thread Ulrich Drepper

Linus Torvalds <[EMAIL PROTECTED]> writes:

> > I fail to see how this works across processes.
> 
> It's up to FS_create() to create whatever shared mapping is needed.

No, the point is that FS_create is *not* the one creating the shared
mapping.  The user is explicitly doing this her/himself.




Re: light weight user level semaphores

2001-04-19 Thread Ulrich Drepper

Linus Torvalds <[EMAIL PROTECTED]> writes:

> I'm not interested in re-creating the idiocies of Sys IPC.

I'm not talking about sysv semaphores (couldn't care less).  And you
haven't read any of the mails with examples I sent.

If the new interface is to be useful for anything it must allow
implementing process-shared POSIX mutexes.  The user-level
representations of these mutexes are simple variables which in the case
of inter-process mutexes are placed in shared memory.  These variables
must be usable with the normal pthread_mutex_lock() functions and
perform whatever is needed.

Whether the pthread_mutex_init() function for shared mutexes is doing
a lot more work and allocates even more memory, I don't care.  The
standard certainly permits this and every pthread_mutex_init() must
have a pthread_mutex_destroy() which allows allocating and freeing
resources (no file descriptor, though).  So, yes, your FS_create
syscall can allocate something.

But the question is what handle to put in the pthread_mutex_t variable
so the different processes can use the mutex.  It cannot be a file
descriptor since it's not shared between processes.  It cannot be a
pointer to some other place in the virtual memory since the place
pointed to might not be (and probably isn't if FS_create is allocating
something in the process setting up the mutex).  You could put some
magic cookie in the pthread_mutex_t object the kernel can then use.


So, instead of repeating over and over again the same old story, fill
in the gaps here:


  int
  pthread_mutex_init (pthread_mutex_t *mutex,
                      const pthread_mutexattr_t *mutex_attr)
  {
    if (mutex_attr != NULL && mutex_attr->__pshared != 0)
      {
        ... FILL IN HERE ...
      }
    else
      ...intra-process mutex, uninteresting here...
  }

  int
  pthread_mutex_lock (pthread_mutex_t *mutex)
  {
    if (mutex_attr != NULL && mutex_attr->__pshared != 0)
      {
        ... FILL IN HERE ...
      }
    else
      ...intra-process mutex, uninteresting here...
  }

  int
  pthread_mutex_destroy (pthread_mutex_t *mutex)
  {
    if (mutex_attr != NULL && mutex_attr->__pshared != 0)
      {
        ... FILL IN HERE ...
      }
    else
      ...intra-process mutex, uninteresting here...
  }


These functions must work with something like this:

~~~~~~~~~~~~~~~~~~~~~~~~~~~ cons.c ~~~~~~~~~~~~~~~~~~~~~~~~~~~
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/mman.h>

int
main (int argc, char *argv[])
{
  char tmpl[] = "/tmp/fooXXXXXX";
  int fd = mkstemp (tmpl);
  pthread_mutexattr_t attr;
  pthread_mutex_t *m1;
  pthread_mutex_t *m2;
  void *addr;
  volatile int *i;

  pthread_mutexattr_init (&attr);
  pthread_mutexattr_setpshared (&attr, PTHREAD_PROCESS_SHARED);

  /* Room for two mutexes followed by the shared counter.  */
  ftruncate (fd, 2 * sizeof (*m1) + sizeof (int));
  addr = mmap (NULL, 2 * sizeof (*m1) + sizeof (int),
               PROT_READ|PROT_WRITE, MAP_SHARED, fd, 0);
  m1 = addr;
  m2 = m1 + 1;
  i = (int *) (m2 + 1);
  *i = 0;

  pthread_mutex_init (m1, &attr);
  pthread_mutex_lock (m1);

  pthread_mutex_init (m2, &attr);
  pthread_mutex_lock (m2);

  if (fork () == 0)
    {
      /* The child inherits fd; pass its number to the producer.  */
      char buf[10];
      snprintf (buf, sizeof buf, "%d", fd);
      execl ("./prod", "prod", buf, NULL);
    }

  while (1)
    {
      pthread_mutex_lock (m1);
      printf ("*i = %d\n", *i);
      pthread_mutex_unlock (m2);
    }

  return 0;
}
~~~~~~~~~~~~~~~~~~~~~~~~~~~ prod.c ~~~~~~~~~~~~~~~~~~~~~~~~~~~
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/mman.h>

int
main (int argc, char *argv[])
{
  int fd = atoi (argv[1]);
  void *addr;
  pthread_mutex_t *m1;
  pthread_mutex_t *m2;
  volatile int *i;

  /* Same layout as in cons.c: two mutexes, then the counter.  */
  addr = mmap (NULL, 2 * sizeof (*m1) + sizeof (int),
               PROT_READ|PROT_WRITE, MAP_SHARED, fd, 0);
  m1 = addr;
  m2 = m1 + 1;
  i = (int *) (m2 + 1);

  while (1)
    {
      ++*i;
      pthread_mutex_unlock (m1);
      pthread_mutex_lock (m2);
    }

  return 0;
}
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~




Re: [linux-lvm] 2.4.3-ac{6,7} LVM hang

2001-04-19 Thread Ulrich Drepper

Jens Axboe <[EMAIL PROTECTED]> writes:

> Does attached patch fix it?

Yes.




Re: light weight user level semaphores

2001-04-19 Thread Ulrich Drepper

Alexander Viro <[EMAIL PROTECTED]> writes:

> > If the new interface can be useful for anything it must allow to
> > implement process-shared POSIX mutexes.
> 
> Pardon me the bluntness, but... Why?

Because otherwise there is no reason to even waste a second with this.
At least for me and everybody else who has interest in portable solutions.

I don't care how it's implemented.  Look at the code example I posted.
If you can provide an implementation which can implement anonymous
inter-process mutexes then ring again.  Until then I'll wait.  If you
implement something else I couldn't care less since it's useless for
me.




Re: BUG: Global FPU corruption in 2.2

2001-04-20 Thread Ulrich Drepper

"Richard B. Johnson" <[EMAIL PROTECTED]> writes:

> If it "fixes" it, there is no problem with the FPU, but with the
> 'C' runtime library which doesn't initialize the FPU to a known
> state before it uses it.

It's the kernel which initializes the FPU.  This was always the case
and necessary to implement the fast lazy FPU saving/restoring.
Processes which never use the FPU never initialize it.




Re: BUG: Global FPU corruption in 2.2

2001-04-20 Thread Ulrich Drepper

"Richard B. Johnson" <[EMAIL PROTECTED]> writes:

> The kernel doesn't know if a process is going to use the FPU when
> a new process is created. Only the user's code, i.e., the 'C' runtime
> library knows.

Maybe you should try to understand the kernel code and the features of
the processor first.  The kernel can detect when the FPU is used for
the first time.




Re: RFC: changing precision control setting in initial FPU context

2001-03-03 Thread Ulrich Drepper

[EMAIL PROTECTED] (Kevin Buhr) writes:

> > You want peoples existing applications to suddenely and magically change
> > their results. Umm problem.
> 
> So, how would you feel about a mechanism whereby the kernel could be
> passed a default FPU control word by the binary (with old binaries, by
> default,

There will be no change whatsoever with me.  The existing ABI is
fixed.  If you want your programs to behave different set the mode
appropriately.  I have not the slightest interest in seeing
applications (including the libc) being broken just because of this
stupid idea.  No kernel and no libc modifications necessary.  This is
the end of the story as far as I'm concerned.




Re: [PATCH] new bug report script

2001-01-06 Thread Ulrich Drepper

Matthias Juchem <[EMAIL PROTECTED]> writes:

> +# c library 5
> +if ( -e "/lib/libc.so.5" ) {
> + ( $v_libc5 = `/lib/libc.so.5`) =~ m/GNU C Library .+ version (\S+),/;
> + $v_libc5 = $1;
> +} else {
> + $v_libc5 = "not found";
> +}

This is wrong.  You cannot execute libc.so.5.  This only works with
glibc.
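
A related but different technique, mentioned here only as a sketch: a C
program linked against glibc can ask the library for its version with
gnu_get_libc_version().  Like executing the library, this is
glibc-only, so it does not help with libc5 either.  The wrapper name
c_library_version is made up.

```c
#include <assert.h>
#include <stdio.h>   /* on glibc this also defines __GLIBC__ */
#ifdef __GLIBC__
# include <gnu/libc-version.h>
#endif

/* Return the version of the C library this program runs against,
   or a placeholder if the library is not glibc.  */
const char *
c_library_version (void)
{
#ifdef __GLIBC__
  const char *v = gnu_get_libc_version ();
  assert (v != NULL);
  return v;
#else
  return "not glibc";
#endif
}
```

This reports the library the calling program is using, which is not
necessarily the only C library installed on the system.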




Re: [PATCH] new bug report script

2001-01-07 Thread Ulrich Drepper

Matthias Juchem <[EMAIL PROTECTED]> writes:

> Or is the file name scheme reliable (/lib/libc.so.5.x.y)?

Yes, since this was how HJ named the releases.  You have to find out
which version is actually used (there might be several .so files
there).

-- 
---.  ,-.   1325 Chesapeake Terrace
Ulrich Drepper  \,---'   \  Sunnyvale, CA 94089 USA
Red Hat  `--' drepper at redhat.com   `



Re: [PATCH] Re: [PATCH] new bug report script

2001-01-07 Thread Ulrich Drepper

Matthias Juchem <[EMAIL PROTECTED]> writes:

>  # c library 5
> -if ( -e "/lib/libc.so.5" ) {
> - ( $v_libc5 = `/lib/libc.so.5`) =~ m/GNU C Library .+ version (\S+),/;
> - $v_libc5 = $1;
> -} else {
> - $v_libc5 = "not found";
> +opendir LIBDIR, "/lib" or die "/lib/ not found, very strange";
> +my @allfiles = readdir LIBDIR;
> +closedir LIBDIR;
> +$v_libc5 = 'not found';
> +foreach (sort @allfiles) {
> + m/libc.so.(5\S+)/ and $v_libc5 = $1;
>  }
> +closedir LIBDIR;

This won't work everywhere either.  Red Hat systems (maybe others)
have libc5 out of the way in a separate subdir.  Your best bet is to
use ldconfig:

  /sbin/ldconfig -p|grep libc.so.5

which produces something like

  libc.so.5 (libc5) => /usr/i486-linux-libc5/lib/libc.so.5

and then look in that directory (/usr/i486-linux-libc5/lib).
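
The last step, getting from an ldconfig line to the directory, can be sketched as below. This is only an illustration of the string handling; the function name and the buffer interface are invented for the example, and it assumes the "name (type) => path" layout shown above.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Given one line of `ldconfig -p` output, copy the directory part of the
   path that follows "=> " into dir.  Returns 0 on success, -1 if the line
   has no "=> ", the path has no '/', or dir is too small. */
int ldconfig_line_dir(const char *line, char *dir, size_t dirlen)
{
    const char *arrow = strstr(line, "=> ");
    if (arrow == NULL)
        return -1;

    const char *path = arrow + 3;
    const char *slash = strrchr(path, '/');
    if (slash == NULL)
        return -1;

    size_t n = (size_t)(slash - path);
    if (n == 0)
        n = 1;                  /* library sits directly under "/" */
    if (n >= dirlen)
        return -1;

    memcpy(dir, path, n);
    dir[n] = '\0';
    return 0;
}
```

Fed the sample line above, this yields /usr/i486-linux-libc5/lib, which is where the script would then look for the versioned file names.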

-- 
---.  ,-.   1325 Chesapeake Terrace
Ulrich Drepper  \,---'   \  Sunnyvale, CA 94089 USA
Red Hat  `--' drepper at redhat.com   `



Re: [PATCH] new bug report script

2001-01-07 Thread Ulrich Drepper

Keith Owens <[EMAIL PROTECTED]> writes:

>   5)
>   # glibc versions.  Take the last symbolic link,
>   # extract the version number from the file it points to.
>   if [ `expr "X$1" : 'Xl'` -eq 2 ]
>   then
>   while [ "X$2" != "X->" ]
>   do
>   shift
>   done
>   version2=`echo "$3" | tr -cd '[.0-9]' | \
> sed -e 's/\.\.*/./g' |
> sed -e 's/^\.//g' |
> sed -e 's/\.$//g'`
>   fi
>   ;;

Why don't you, as the other script suggested, execute libc.so.6?
Symlinks can be missing or can be wrong.

-- 
---.  ,-.   1325 Chesapeake Terrace
Ulrich Drepper  \,---'   \  Sunnyvale, CA 94089 USA
Red Hat  `--' drepper at redhat.com   `



thread group comments

2000-09-01 Thread Ulrich Drepper

I hoped somebody else would write something about Linus' test8-pre1
thread group changes but I haven't seen anything.  Now you have to
bear with me even though I'm incompetent[1].

I took a look at the code and thought about the changes
necessary/possible in the thread library.  Here's what I came up with
so far:


1st Problem: One signal handler process-wide

What is handled correctly now is sending signals to the group.  Also
that every thread has its own mask.  But there must be exactly one signal
handler installed.  I.e., a sigaction() call to set a new handler has
consequences process-wide.  Since this must be atomic I think the
information should be kept in the thread group leader's data
structures and the other threads should simply use this information
all the time.  Yeah, I know, one more indirection.



2nd Problem: Fatal signal handling

kernel/signal.c contains:

 * Send a thread-group-wide signal.
 *
 * Rule: SIGSTOP and SIGKILL get delivered to _everybody_.

That's OK.  Except that if a signal whose default action is to
terminate the process is not caught by the application, this signal is
also handled process-wide.  E.g., if there is no SIGSEGV handler the
whole process is terminated.

This will have to go hand in hand with an extension of the core file
format to include information about all threads but for the time being
it's enough if only the offending thread is dumped and the rest simply
killed.


3rd Problem: one uid/gid process-wide

All the IDs (uid/gid/euid/egid/...) must be process-wide.  The problem
is similar to the signal handler.  I think one should again keep the
information exclusively in the master thread and have all others refer
to this information.



4th Problem: thread termination

In general, thread termination is not of much interest for the rest of
the system.  It is at the moment, but once the fatal signal handling is
done this will change.

If a thread gets a fatal signal, the whole process is killed.  No
cleanup necessary.  Signal handlers can be installed if necessary.

If a thread terminates naturally it can perform the cleanup itself.

In any case, the death signal should be ignored.  Except for the last
thread, of course, which has to notify the process starting the MT
application.

I can see two possible solutions, neither of which I've tried:

- the termination signal given to clone calls is 0 (zero).  So no
  notification is sent out.  Question is: does the kernel allow this?

- the kernel ignores the SIGCHLD signal for all threads except the last
  one

In any case there is the problem of how to handle the termination of
the master thread.  If it is not aware of starting and terminating threads
I could imagine some user-level mechanisms to make this work but they
are not very clean (it involves changing the death signal in the
thread depending on the situation).  If there is something people
think the kernel could do, this would probably be better.


5th Problem: suspended starting

Related to the last problem a good old friend pops up.  Depending on
the solution of the last problem it might be necessary to add
suspended starting of threads.  The problem is that sometimes the
starter has to modify parameters (e.g., scheduler) of the newly
started thread before it can actually start working.  If this fails,
the new thread must be terminated immediately.  But who will get the
termination signal?  The data structures for the new thread must be
removed as well, and this only after the new thread is guaranteed to
have vanished.

Anyway, I still think it's not even worth discussing this much since
the whole change to implement this is only a few lines.  And it's in
no fastpath.



I might have more if I get deeper into implementation details.  But if
the above problems could be fixed we'd be a long way down the road to
a good implementation.


[1] Since Linus says so it must be true.

-- 
---.          ,-.   1325 Chesapeake Terrace
Ulrich Drepper  \,---'   \  Sunnyvale, CA 94089 USA
Red Hat  `--' drepper at redhat.com   `



Re: thread group comments

2000-09-01 Thread Ulrich Drepper

Alan Cox <[EMAIL PROTECTED]> writes:

> You dont want it in kernel space.

I don't see how you can do this.  Also on user level you would have to
do this atomically since otherwise communication between the threads
isn't possible anymore.  We have a PR in the glibc bug data base about
just that.  And I know that there are many other users with this
problem.

-- 
---.  ,-.   1325 Chesapeake Terrace
Ulrich Drepper  \,---'   \  Sunnyvale, CA 94089 USA
Red Hat  `--' drepper at redhat.com   `



Re: thread group comments

2000-09-01 Thread Ulrich Drepper

"Andi Kleen" <[EMAIL PROTECTED]> writes:

> I've been thinking about how to best get rid of the thread manager for
> thread creation in LinuxThreads.  It is currently needed to do the wait.

If you get rid of the manager thread (the +1 thread) then you have
another problem: you cannot send a signal explicitly to this thread
(to implement pthread_kill).  The PID of this initial thread is now
used as the PID of the thread group.

-- 
---.  ,-.   1325 Chesapeake Terrace
Ulrich Drepper  \,---'   \  Sunnyvale, CA 94089 USA
Red Hat  `--' drepper at redhat.com   `



Re: thread group comments

2000-09-01 Thread Ulrich Drepper

"Theodore Y. Ts'o" <[EMAIL PROTECTED]> writes:

> True, but this can be handled by having the master thread process catch
> SIGSEGV and redistributing the signal to all of its child-threads.

No, it cannot.  We have to have a core dump with all threads.

> (The assumption I'm making here is that the master thread doesn't do
> anything except spawn all threads for the process and monitors its child
> processes for death.  This is the n+1 model.)

The master thread will no longer spawn the threads.  That's the
whole purpose of this exercise.


-- 
---.          ,-.   1325 Chesapeake Terrace
Ulrich Drepper  \,---'   \  Sunnyvale, CA 94089 USA
Red Hat  `--' drepper at redhat.com   `



Re: thread group comments

2000-09-01 Thread Ulrich Drepper

Linus Torvalds <[EMAIL PROTECTED]> writes:

> But I'd much rather just have the "n+1" thing. The overhead is basically
> nonexistent, and it simplifies so many things.

I see no big problems with this either.  The only tricky thing is to
get the stack swapped after the first clone() but this is solvable.

-- 
---.  ,-.   1325 Chesapeake Terrace
Ulrich Drepper  \,---'   \  Sunnyvale, CA 94089 USA
Red Hat  `--' drepper at redhat.com   `



Re: thread group comments

2000-09-01 Thread Ulrich Drepper

"Andi Kleen" <[EMAIL PROTECTED]> writes:

> Do you think the SA_NOCLDWAIT/queued exit signal approach makes sense ? 

I'm not sure whether it's worth the effort.  But I'm saying this now
looking at the code for another implementation following the 1:1 model.

In a second stage where we have m kernel threads and n user-level
threads (the ultimate goal) things might be different.  But this is
beyond what is needed in the 2.4 kernel so let's just skip the
SA_NOCLDWAIT stuff for now.

-- 
---.  ,-.   1325 Chesapeake Terrace
Ulrich Drepper  \,---'   \  Sunnyvale, CA 94089 USA
Red Hat  `--' drepper at redhat.com   `



Re: thread group comments

2000-09-01 Thread Ulrich Drepper

Linus Torvalds <[EMAIL PROTECTED]> writes:

> Well, I would just swap it _before_ the clone() - basically in the
> original parent when you create the new stack, you call clone() with the
> new stack and with the old stack as the argument. No?

Yes.  I have it basically working.  You have of course to swap before
the clone since the new thread will use the stack.

-- 
---.  ,-.   1325 Chesapeake Terrace
Ulrich Drepper  \,---'   \  Sunnyvale, CA 94089 USA
Red Hat  `--' drepper at redhat.com   `



Re: thread group comments

2000-09-01 Thread Ulrich Drepper

"Andi Kleen" <[EMAIL PROTECTED]> writes:

> But I guess you don't want the context switch to a thread manager just to
> generate a thread ? (and which is one of the main causes of the bad thread 
> creation latency in Linux currently)

The thread manager, as I see it at the moment, will consist more or
less of this:

extern volatile int nthreads;
int res;
do
  waitpid (0, &res, __WCLONE);
while (nthreads > 0);
exit (WEXITSTATUS (res));

No signal handler, since it cannot receive signals.  Everything else
the threads will do themselves.

There is a problem though: the code we currently use for something
like restarting depends on the manager doing this.  This can be
implemented in two ways:

- send the manager a signal; this would require the threadkill() syscall
  already mentioned.  Note that we can assume RT signals and therefore
  can transport data.  But we get into problems if too many RT signals
  are queued.

- extend the loop above to something similar to what we have today:

do
  n = poll (..,..,.., timeout);
  check_for_dead_threads();  // use WNOHANG
  if (n > 0)
    read request and process it
while (nthreads > 0)

  I really would like to avoid this.  It has the problems we are
  seeing today:

  * high latency of these requests
  * must adjust the priority of the manager (this now gets complicated
    since it's not the manager which starts the threads)
  * problems with changing UID/GID

It will require some investigation to see whether we can implement the
restart semantics correctly without a manager thread.  If yes, we
should be able to live with the simple loop.
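
For readers who want to try the "simple loop" idea outside a thread library, here is a rough, self-contained sketch. It is not the LinuxThreads code: it uses fork() and a plain waitpid(-1, ...) instead of clone() children and __WCLONE, and a local counter instead of the shared nthreads variable. The function name is made up for the example.

```c
#include <assert.h>
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Start n children that exit immediately, then do nothing but reap them
   until none are left.  Returns the number of children reaped, or -1 if
   waitpid fails for a reason other than EINTR. */
int spawn_and_reap(int n)
{
    int started = 0;
    for (int i = 0; i < n; i++) {
        pid_t pid = fork();
        if (pid < 0)
            break;              /* could not start more children */
        if (pid == 0)
            _exit(0);           /* child: terminate right away */
        started++;
    }

    int reaped = 0;
    while (reaped < started) {
        int status;
        pid_t pid = waitpid(-1, &status, 0);
        if (pid < 0) {
            if (errno == EINTR)
                continue;       /* interrupted, just wait again */
            return -1;
        }
        reaped++;
    }
    return reaped;
}
```

The point of the sketch is only that the waiting side needs no request queue and no signal handler; whether restart semantics can be done without more machinery is the open question above.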

-- 
---.      ,-.   1325 Chesapeake Terrace
Ulrich Drepper  \,---'   \  Sunnyvale, CA 94089 USA
Red Hat  `--' drepper at redhat.com   `



Re: thread group comments

2000-09-01 Thread Ulrich Drepper

"Andi Kleen" <[EMAIL PROTECTED]> writes:

> So you have a different way to implement pthread_create without context
> switch to the thread manager in 2.4 ? 

It should be possible to do these things with CLONE_PARENT.  It's a
long weekend coming up, let's see what I have next week.

-- 
---.  ,-.   1325 Chesapeake Terrace
Ulrich Drepper  \,---'   \  Sunnyvale, CA 94089 USA
Red Hat  `--' drepper at redhat.com   `



Re: Question regarding kernel_threads

2000-09-05 Thread Ulrich Drepper

[EMAIL PROTECTED] writes:

> I'm currently thinking of adding a PF_NOZOMBIE flag to the process
> flags which releases the process immeadiately instead of calling
> exit_notify in do_exit in exit.c

I think this should happen if the exit signal is zero.  At least I
would like to use it this way.

-- 
---.  ,-.   1325 Chesapeake Terrace
Ulrich Drepper  \,---'   \  Sunnyvale, CA 94089 USA
Red Hat  `--' drepper at redhat.com   `



Re: [BUG] threaded processes get stuck in rt_sigsuspend/fillonedir/exit_notify

2000-09-11 Thread Ulrich Drepper

David Ford <[EMAIL PROTECTED]> writes:

> On the recent test kernels, processes get stuck.  A kill -9 results in
> zombies.

The thread group changes broke the signal handling in linuxthreads.
The CLONE_SIGHAND is now also used to enable thread groups but since
linuxthreads already used CLONE_SIGHAND and is not prepared for thread
groups all hell breaks loose.

I've told Linus several times about this problem but he puts out one
test release after the other without this fixed.

-- 
---.  ,-.   1325 Chesapeake Terrace
Ulrich Drepper  \,---'   \  Sunnyvale, CA 94089 USA
Red Hat  `--' drepper at redhat.com   `



Re: [BUG] threaded processes get stuck in rt_sigsuspend/fillonedir/exit_notify

2000-09-11 Thread Ulrich Drepper

Ray Bryant <[EMAIL PROTECTED]> writes:

> Is there a succinct description of the the thread group changes
> someplace?  I'd be willing to take a look at fixing linuxthreads,
> but haven't seen any description (other than the kernel source) of
> what the changes are.  Or is someone already working on this?

"Fixing" alone won't cut it.  I've started a rewrite and send Linus
more comments about what is needed but did not even get a reply.  Seems
the short interest span is already over.

-- 
---.      ,-.   1325 Chesapeake Terrace
Ulrich Drepper  \,---'   \  Sunnyvale, CA 94089 USA
Red Hat  `--' drepper at redhat.com   `



Re: [BUG] threaded processes get stuck in rt_sigsuspend/fillonedir/exit_notify

2000-09-11 Thread Ulrich Drepper

David Ford <[EMAIL PROTECTED]> writes:

> Regardless of who does it or whether or not it goes in testX patch, I'd
> surely like to have a patch(es) for my systems.  Depending on what gets run,
> I could easily end up with hundreds+ of hung programs and zombies.

This is completely unrelated.  The fix for your problem is to change
the CLONE_SIGHAND flag back to its original behavior.  Changing
linuxthreads to take advantage of the new kernel functionality is on a
different plate.

-- 
---.  ,-.   1325 Chesapeake Terrace
Ulrich Drepper  \,---'   \  Sunnyvale, CA 94089 USA
Red Hat  `--' drepper at redhat.com   `



Re: [BUG] threaded processes get stuck in rt_sigsuspend/fillonedir/exit_notify

2000-09-12 Thread Ulrich Drepper

[EMAIL PROTECTED] writes:

> I didn't realize things had changed that broke the old threading model.
> Did Linus do more than add support for the new thread groups?  I didn't
> think any that had changed that would break the old LinuxThread
> programs.

First he introduced CLONE_THREAD (or whatever it was called).  This was
fine.  But in pre2 or pre3 he unified CLONE_SIGHAND and CLONE_THREAD
under the new name CLONE_SIGNAL, which would make perfect sense if
CLONE_SIGHAND were not already used.  But it is.  Simply undo this
change, separate the two flags.

-- 
---.  ,-.   1325 Chesapeake Terrace
Ulrich Drepper  \,---'   \  Sunnyvale, CA 94089 USA
Red Hat  `--' drepper at redhat.com   `



Re: syslog() blocks on glibc 2.1.3 with kernel 2.2.x

2000-10-23 Thread Ulrich Drepper

Jesse Pollard <[EMAIL PROTECTED]> writes:

> It's not a bug, but a security feature. NO log to syslogd should be lost,
> since it may be related to an attack. 

That's exactly the argument I'm always using to turn down change
requests like this.  If the syslog() function could drop an entry and
not send it, it would be easy enough for somebody who has something to
hide to overflow syslog() and then do whatever s/he does not want to be
logged.

If anything has to be changed it's (as suggested) the configuration or
even the implementation of syslogd.  Make it robust.

-- 
---.  ,-.   1325 Chesapeake Terrace
Ulrich Drepper  \,---'   \  Sunnyvale, CA 94089 USA
Red Hat  `--' drepper at redhat.com   `



Re: syslog() blocks on glibc 2.1.3 with kernel 2.2.x

2000-10-23 Thread Ulrich Drepper

[EMAIL PROTECTED] (Patrick J. LoPresti) writes:

> OK, but my current syslogd only listens to /dev/log as a SOCK_DGRAM.
> [...]

I don't care what the current syslogd does.  Changing the libc just to
work around the limitations of current implementations is wrong.
Write a new syslogd and do it right.

-- 
---.  ,-.   1325 Chesapeake Terrace
Ulrich Drepper  \,---'   \  Sunnyvale, CA 94089 USA
Red Hat  `--' drepper at redhat.com   `



Re: Dual XEON - >>SLOW<< on SMP

2000-11-02 Thread Ulrich Drepper

"Richard B. Johnson" <[EMAIL PROTECTED]> writes:

> Yes. Look at the NMI count. Looks like every access produces a
> NMI.

I'm seeing this as well, but only with PIII Xeon systems, not PII
Xeon.  Every single timer interrupt on any CPU is accompanied by a NMI
and LOC increment on every CPU.

           CPU0       CPU1
  0:     146727     153389    IO-APIC-edge  timer
[...]
NMI:     300035     300035
LOC:     300028     300028


-- 
---.  ,-.   1325 Chesapeake Terrace
Ulrich Drepper  \,---'   \  Sunnyvale, CA 94089 USA
Red Hat  `--' drepper at redhat.com   `
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/



Re: Can EINTR be handled the way BSD handles it? -- a plea from a user-land programmer...

2000-11-03 Thread Ulrich Drepper

[EMAIL PROTECTED] writes:

> Can we _PLEASE_PLEASE_PLEASE_ not do this anymore and have the kernel do
> what BSD does:  re-start the interrupted call?

This is crap.  Returning EINTR is necessary for many applications.

-- 
---.  ,-.   1325 Chesapeake Terrace
Ulrich Drepper  \,---'   \  Sunnyvale, CA 94089 USA
Red Hat  `--' drepper at redhat.com   `



Re: Can EINTR be handled the way BSD handles it? -- a plea from a user-land programmer...

2000-11-06 Thread Ulrich Drepper

"Theodore Y. Ts'o" <[EMAIL PROTECTED]> writes:

> Arguably though the bug is in glibc, in that if it's using signals
> behinds the scenes, it should have passed SA_RESTART to sigaction.

Why are you talking such nonsense?

> 
> However, from a portability point of view, you should *always* surround
> certain system calls with while loops, since even if your program
> doesn't use signals, if you run that program on a System-V derived Unix
> system, and someone types ^Z at the wrong moment, you can also get an
> EINTR.   Similarly, you should always check the return value from write
> and make sure all of what you asked to be written, was actually
> written.
> 
> What I normally do is have a full_write routine which looks something
> like this:
> 
> static errcode_t full_write(int fd, void *buf, int count)
> {
>     char *cp = buf;
>     int left = count, c;
> 
>     while (left) {
>         c = write(fd, cp, left);
>         if (c < 0) {
>             if (errno == EINTR || errno == EAGAIN)
>                 continue;
>             return errno;
>         }
>         left -= c;
>         cp += c;
>     }
>     return 0;
> }
> 
> It's like checking the return value from malloc().  Not everyone does
> it, but even if it's not needed 99% of the time, it's a darned good idea
> to do that.
> 
>   - Ted

-- 
---.  ,-.   1325 Chesapeake Terrace
Ulrich Drepper  \,---'   \  Sunnyvale, CA 94089 USA
Red Hat  `--' drepper at redhat.com   `



Re: Can EINTR be handled the way BSD handles it? -- a plea from a user-land programmer...

2000-11-06 Thread Ulrich Drepper

Ulrich Drepper <[EMAIL PROTECTED]> writes:

> "Theodore Y. Ts'o" <[EMAIL PROTECTED]> writes:
> 
> > Arguably though the bug is in glibc, in that if it's using signals
> > behinds the scenes, it should have passed SA_RESTART to sigaction.
> 
> Why are you talking such nonsense?

[Note to self: remove kitten from keyboard before writing mail.]

Glibc has to use signals because there *still* is no mechanism in the
kernel to allow synchronization.  After how many years.

I don't blame Linus.  He has no interest in threads and therefore
spends not much time thinking about it.  But everybody who's
complaining about things like this has to be willing to fix the real
problems.

Get your ass up and write a fast semaphore/mutex system.

-- 
-------.  ,-.   1325 Chesapeake Terrace
Ulrich Drepper  \,---'   \  Sunnyvale, CA 94089 USA
Red Hat  `--' drepper at redhat.com   `



Re: 2.2.18 signal.h

2000-12-15 Thread Ulrich Drepper

Andrea Arcangeli <[EMAIL PROTECTED]> writes:

> x()
> {
> 
>   switch (1) {
>   case 0:
>   case 1:
>   case 2:
>   case 3:
>   ;
>   }
> }
> 
> Why am I required to put a `;' only in the last case and not in all
> the previous ones? Or maybe gcc-latest is forgetting to complain about
> the previous ones ;)

Your C language knowledge seems to have holes.  It must be possible to
have more than one label for a statement.  Look through the kernel
sources, there are definitely cases where this is needed.
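
A short illustration of the rule, written for this note rather than taken from the kernel: in C89/C99 a case label has to be attached to a statement, several labels may be attached to the same one, and where there is nothing to do the empty statement `;` serves as the target.  The function names are hypothetical.

```c
#include <assert.h>

/* Five labels share a single statement: all of them reach the same return. */
int is_small_odd(int n)
{
    switch (n) {
    case 1:
    case 3:
    case 5:
    case 7:
    case 9:
        return 1;
    default:
        return 0;
    }
}

/* Nothing to do for 0..3, so the labels are attached to an empty
   statement.  Without the ';' the last label would label nothing. */
void ignore_small(int n)
{
    switch (n) {
    case 0:
    case 1:
    case 2:
    case 3:
        ;
    }
}
```

So the `;` is not required "only in the last case"; it is the one statement that all four labels in the quoted example belong to.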

-- 
---.      ,-.   1325 Chesapeake Terrace
Ulrich Drepper  \,---'   \  Sunnyvale, CA 94089 USA
Red Hat  `--' drepper at redhat.com   `



Re: What is up with Redhat 7.0?

2000-09-30 Thread Ulrich Drepper

I really didn't want to make a comment on this stupid thread but now
you are getting personal:

> > > OTOH, [EMAIL PROTECTED] might get pressed into not doing incompatible
> > > changes,
> > 
> > We're doing no such thing.
> 
> If you say so However, I am not sure that you (we?) can actually
> control it.

You are excused this one and only time since I am fortunate enough to
never have met you but listen carefully now:

  I allow nobody to tell me what to do.

Nobody from Red Hat ever tried to do this.  If this would have been on
the mind of somebody (which I doubt) this illusion would have been
destroyed on the first day when I told them that this never would be
an option.  There are external entities (commercial and
non-commercial) who try this, though, of course without success
either.

> > If we did this sort of thing, he would have been pressed into releasing
> > glibc 2.2 in time.
> 
> Well, I actually do think that this has happened with glibc-2.1.

And this I take as personal insult.  Who the f*ck do you think you are
to claim the right of making such a statement?  This is so completely
insane that I really have not the slightest idea how you can make
something like this up.  Go and find somebody who is working on glibc
to back up this "statement" and not some idiot like you who has no
insight whatsoever.  If you cannot find anybody I demand a public
apology from you.

-- 
-------.  ,-.   1325 Chesapeake Terrace
Ulrich Drepper  \,---'   \  Sunnyvale, CA 94089 USA
Red Hat  `--' drepper at redhat.com   `



Re: [RFC] Additional pad in struct stat(64)

2000-09-26 Thread Ulrich Drepper

Christoph Hellwig <[EMAIL PROTECTED]> writes:

> Hehe, that's why I'd like to introduce some additional pad with my
> patch ;)

There is no reason to introduce unnecessary incompatibilities now.
If you want to look forward and add more padding do this when there is
another change necessary.  Introducing breakage just to possibly
avoid it in the future is stupid.

-- 
---.  ,-.   1325 Chesapeake Terrace
Ulrich Drepper  \,---'   \  Sunnyvale, CA 94089 USA
Red Hat  `--' drepper at redhat.com   `



Re: [RFC] Additional pad in struct stat(64)

2000-09-26 Thread Ulrich Drepper

Christoph Hellwig <[EMAIL PROTECTED]> writes:

> I'd like to have st_flags added to struct stat64, so adding the actual
> feature in Linux 2.5 (if it has a chance to get in - that's why I'm
> interested in a comment by Linus on this) will not need a new version
> of struct stat (and a new  libc to use it),

It will need a new libc version anyway.

-- 
---.      ,-.   1325 Chesapeake Terrace
Ulrich Drepper  \,---'   \  Sunnyvale, CA 94089 USA
Red Hat  `--' drepper at redhat.com   `



Re: [PATCH] pcnet32 compilation fix for 2.4.3pre6

2001-03-29 Thread Ulrich Drepper

[EMAIL PROTECTED] writes:

> with the new ansi standard, this use of __inline__ is no longer
> necessary,

This is not correct.  Since the semantics of inline in C99 and gcc
differ, all code which depends on the gcc semantics should continue to
use __inline__ since this keyword will hopefully forever signal the
gcc semantics.
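
For context, a minimal sketch of the keyword in use.  The difference between the two models concerns functions with external linkage: in traditional GNU C a plain `inline` definition also emits an out-of-line copy and `extern inline` does not, while C99 defines it the other way around.  The `static` form below behaves the same under both, and the `__inline__` spelling is also accepted when the compiler runs in strict ISO C90 mode where `inline` is not a keyword.  The function names are invented for the example.

```c
#include <assert.h>

/* GCC's alternate keyword; `static` avoids the linkage question that
   differs between the GNU and the C99 inline models. */
static __inline__ int square(int x)
{
    return x * x;
}

int sum_of_squares(int a, int b)
{
    return square(a) + square(b);
}
```

Code that relies on the non-static GNU behavior is the code this mail is about; whether a given compiler version keeps that behavior for `__inline__` should be checked against its documentation.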

-- 
---.  ,-.   1325 Chesapeake Terrace
Ulrich Drepper  \,---'   \  Sunnyvale, CA 94089 USA
Red Hat  `--' drepper at redhat.com   `



Re: [PATCH] devfsd, compiling on glibc22x

2001-01-27 Thread Ulrich Drepper

Pierre Rousselet <[EMAIL PROTECTED]> writes:

> for me :
> make CFLAGS='-O2 -I. -D_GNU_SOURCE' 
> compiles without any patch. is it correct ?

Yes.  RTLD_NEXT is not in any standard, it's an extension available
via -D_GNU_SOURCE.

-- 
---.  ,-.   1325 Chesapeake Terrace
Ulrich Drepper  \,---'   \  Sunnyvale, CA 94089 USA
Red Hat  `--' drepper at redhat.com   `
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/



Re: [PATCH] devfsd, compiling on glibc22x

2001-02-04 Thread Ulrich Drepper

Richard Gooch <[EMAIL PROTECTED]> writes:

> So why do old binaries (compiled with glibc 2.1.3) segfault when they
> call dlsym() with RTLD_NEXT?  Even newly compiled binaries (with glibc
> 2.2) still segfault.

Why do you ask me?  You wrote the code.

-- 
---.  ,-.   1325 Chesapeake Terrace
Ulrich Drepper  \,---'   \  Sunnyvale, CA 94089 USA
Red Hat  `--' drepper at redhat.com   `
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/



Re: pselect() modifying timeout

2005-08-05 Thread Ulrich Drepper
Michael Kerrisk wrote:
> Please consider making Linux pselect() conform to POSIX on this 
> point.

There is no question the implementation will conform.  But this is not
dependent on changing the syscall interface.  We deliberately decided to
not require the kernel interface to be different from select.  The
userlevel code will take care of the difference.  The kernel code is
good as proposed.
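
From the caller's side the POSIX requirement looks like the sketch below: the timeout object passed to pselect() is the same after the call.  This only observes the library-level behavior, it says nothing about how the kernel interface is laid out; the function name is made up for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/select.h>
#include <time.h>

/* Sleep for nsec nanoseconds (must be below one second) using pselect()
   with no descriptors.  Returns 1 if the timeout object is unchanged
   afterwards, 0 if it was modified, -1 if pselect() did not time out
   normally. */
int pselect_keeps_timeout(long nsec)
{
    const struct timespec wanted = { 0, nsec };
    struct timespec passed = wanted;

    int rc = pselect(0, NULL, NULL, NULL, &passed, NULL);
    if (rc != 0)
        return -1;

    return passed.tv_sec == wanted.tv_sec &&
           passed.tv_nsec == wanted.tv_nsec;
}
```

A library wrapper can get this result by handing the kernel a private copy of the timeout, which is the kind of "userlevel code will take care of the difference" arrangement described above.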

-- 
➧ Ulrich Drepper ➧ Red Hat, Inc. ➧ 444 Castro St ➧ Mountain View, CA ❖





Re: sigwait() breaks when straced

2005-07-31 Thread Ulrich Drepper
On 7/30/05, Sanjoy Mahajan <[EMAIL PROTECTED]> wrote:
> so the return value should not be 4 (or the docs are not right).

This return value simply indicates EINTR (sigwait does not set errno,
read the docs).

The kernel simply doesn't restart the function in case of a signal. 
It should do this, though.


Re: Fw: sigwait() breaks when straced

2005-08-01 Thread Ulrich Drepper
On 7/31/05, Roland McGrath <[EMAIL PROTECTED]> wrote:
> However, there is in fact no bug here.  The test program is just wrong.
> sigwait returns zero or an error number, as POSIX specifies.

No question, no error is detected incorrectly.

But sigwait is not a function specified with an EINTR error number. 
As I said before, this does not mean that EINTR cannot be returned. 
But it will create havoc among programs and it causes undefined
behavior with respect to SA_RESTART.  I think it is best to not have any
function for which EINTR is not a defined error to fail this way. 
This causes the least amount of surprises and unnecessary loops around
the userlevel call sites.


Re: Fw: sigwait() breaks when straced

2005-08-01 Thread Ulrich Drepper
On 8/1/05, Jesper Juhl <[EMAIL PROTECTED]> wrote:
> I'm not quite sure you are right Ulrich. Given this little bit from
> SUSv3 about SA_RESTART in the page describing sigaction (
> http://www.opengroup.org/onlinepubs/009695399/functions/sigaction.html
> ) :

It's not an official SA_RESTART since the syscall is defined to
support EINTR.  It's clear that sigwait in this sense is not
interruptible.  Returning EINTR from sigwait is only allowed by POSIX
since there is no contrary wording (unlike for the pthread functions).
But if this clause were used each and every syscall could return
EINTR and we would have to surround all syscalls with a loop.  Hence
the syscall should be restarted, not because SA_RESTART is set, but
because EINTR shouldn't be returned.

Now, Roland correctly said sigtimedwait and sigwaitinfo need to return
EINTR and we use one syscall for them all.  I overlooked that part.  So,
I'll add the wrapper in the libc so that sigwait restarts on EINTR.


Re: [PATCH 2.6.13-rc6 1/2] New Syscall: get rlimits of any process (update)

2005-08-22 Thread Ulrich Drepper
On 8/18/05, Alan Cox <[EMAIL PROTECTED]> wrote:
> Perhaps those application authors should provide a management interface
> to do so within the soft limit range at least. Its not clear to me that
> growing the fd array on a process is even safe. Some programs do size
> arrays at startup after querying the rlimit data.

That's very true.  Using such a remote-rlimit syscall would break all
kinds of code.  It's a basic assumption from Unix/POSIX that the
limits remain constant.  And as Alan hinted at: this is why there are
soft and hard limits.  If they are set to the same value you obviously
don't get anything.  But this is the application programmer's fault. 
An application which is aware of resources and tries to limit them
should set the soft limits to a reasonable low value and the hard
limit to the absolute maximum (probably the system's maximum).  Then
you can have remote procedure calls into the application to adjust the
soft limits.  Having to change the hard limit means the capacity
planning for the app is completely wrong.  A restart is certainly
acceptable in that case since it should really never happen.
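
As a rough sketch of what the in-application adjustment could look like, assuming the file descriptor limit is the resource in question: the helper name adjust_soft_nofile is hypothetical, and how the request reaches the process (the remote procedure call) is left out.

```c
#include <sys/resource.h>

/* Hypothetical helper: move the RLIMIT_NOFILE soft limit to "wanted",
   clamped so it never exceeds the hard limit. */
static int adjust_soft_nofile(rlim_t wanted)
{
    struct rlimit rl;

    if (getrlimit(RLIMIT_NOFILE, &rl) != 0)
        return -1;

    if (wanted > rl.rlim_max)
        wanted = rl.rlim_max;   /* the hard limit stays untouched */

    rl.rlim_cur = wanted;
    return setrlimit(RLIMIT_NOFILE, &rl);
}
```

Because only the process itself changes the value, code that sized its arrays from the limit can be told about the change at the same time.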


mremap() use is racy

2005-08-23 Thread Ulrich Drepper
Not the mremap() implementation itself, so don't worry.

If mremap() is to be used without the MREMAP_MAYMOVE flag the call will
only succeed if the address space after the block which is to be
remapped is empty.  This is rarely the case since there are many users
of mmap and memory is allocated consecutively in many cases.

So what programs have to do is to make sure ahead of time that the
mremap() call can succeed.  The best way to do this is using an
anonymous, unused, unusable mapping.  Code like this:

p = mmap(NULL, 65536, PROT_NONE, MAP_PRIVATE|MAP_ANON, -1, 0);

mmap(p, 16384, PROT_READ|PROT_WRITE, MAP_SHARED|MAP_FIXED, fd, 0);


Then when the mapping has to be extended one should be able to use mremap():


mremap(p, 16384, 32768, 0);


But this is not possible since mremap() respects the anonymous mapping.
 So one has to use


munmap((char*)p + 16384, 16384);


before the mremap() call.  But this is where the race comes in.  Some
other thread might allocate these blocks before the mremap() call can do it.


One possible solution would be to add a flag to mremap() which allows
mremap() to steal memory.  In general that would be too dangerous but we
could limit it to private, anonymous mappings which have no access
permissions (i.e., PROT_NONE with MAP_PRIVATE and MAP_ANON).  One
explicitly has to allocate such blocks, they don't appear naturally.
And the program in any case knows about the address space layout.

So, how about adding MREMAP_MAPOVERNONE or so?

-- 
➧ Ulrich Drepper ➧ Red Hat, Inc. ➧ 444 Castro St ➧ Mountain View, CA ❖





Re: mremap() use is racy

2005-08-23 Thread Ulrich Drepper
Hugh Dickins wrote:
> If the app can plan ahead as you're proposing, why doesn't it just
> mmap the maximum it might need, mprotect PROT_NONE the end it doesn't
> need yet, then progressively re-mprotect parts to make them accessible
> as needed?

Because the underlying file isn't larger than the initial mapping.  In
the one case I'm working on now the file can grow over time.  More data
is added at the end but the mapping cannot move in the address space.

Using mmap with a too-large size for the underlying file and then hoping
that future file growth is magically handled when those pages are
accessed is not valid.


> I'm missing what mremap gives you here that mprotect doesn't.  Though
> I do see that it would be nice not to be forced into mremap moving
> all the time, because of other maps blocking you off: nice perhaps
> to know what region of the layout is least likely to be so affected.

Just accept here that moving is not an option.  If remap cannot be used
then a complete new mmap() with adjusted length is needed.  That is
unnecessarily expensive.  It is the reason why there is mremap().  But
mremap() with MREMAP_MAYMOVE is unreliable as it is implemented today.

-- 
➧ Ulrich Drepper ➧ Red Hat, Inc. ➧ 444 Castro St ➧ Mountain View, CA ❖





Re: mremap() use is racy

2005-08-23 Thread Ulrich Drepper
Linus Torvalds wrote:
> Actually, it should be pretty much as valid as using mremap - ie it works 
> on Linux. 
> 
> Especially if you use MAP_SHARED, you don't even need to mprotect 
> anything: you'll get a nice SIGBUS if you ever try to access past the last 
> page that maps the file.

If you guarantee this (and test for this) it's fine with me.  The POSIX
spec explicitly leaves this undefined and requiring to use mremap()
would be a nice way to work around this without allowing the
introduction of undefined behavior into programs.  I probably would
prefer to use mremap() since this makes it clear what should happen but
I can live with using the too-large mapping.

-- 
➧ Ulrich Drepper ➧ Red Hat, Inc. ➧ 444 Castro St ➧ Mountain View, CA ❖





Re: Add pselect, ppoll system calls.

2005-08-24 Thread Ulrich Drepper
David Woodhouse wrote:
> If it's mandatory that we actually call the signal handler, then we need
> to play tricks like sigsuspend() does to leave the old signal mask on
> the stack frame. That's a bit painful atm because do_signal is different
> between architectures. 

It is necessary that the handler is called.  This is the purpose of
these interfaces.  If this means more complexity is needed then this is
how the cookie crumbles.  One use case for pselect would be something
like this:


int got_signal;
void sigint_handler(int sig) {
  got_signal = 1;
}

{
  ...
  while (1) {
    if (!got_signal)
      pselect()

    if (got_signal) {
      handle signal
      got_signal = 0;
    }
  }
  ...
}

-- 
➧ Ulrich Drepper ➧ Red Hat, Inc. ➧ 444 Castro St ➧ Mountain View, CA ❖





Re: [PATCH] RUSAGE_THREAD

2008-01-18 Thread Ulrich Drepper
-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1

Roland McGrath wrote:
> +#define  RUSAGE_LWP  RUSAGE_THREAD   /* Solaris name for same */

No need to clutter the kernel header with this, it'll be in the libc header.

Aside from that:

Acked-by: Ulrich Drepper <[EMAIL PROTECTED]>

- --
➧ Ulrich Drepper ➧ Red Hat, Inc. ➧ 444 Castro St ➧ Mountain View, CA ❖
-BEGIN PGP SIGNATURE-
Version: GnuPG v1.4.7 (GNU/Linux)
Comment: Using GnuPG with Fedora - http://enigmail.mozdev.org

iD8DBQFHkZbk2ijCOnn/RHQRAtohAKCyWgJsm20LSqxTznvff3LI8zplvgCgwttu
16eJFNgQXWNEk76b141uZvo=
=DzhA
-END PGP SIGNATURE-


Re: [PATCHv3 0/4] sys_indirect system call

2007-11-19 Thread Ulrich Drepper
-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1

Eric Dumazet wrote:
>>   union indirect_params i;
>>   i.file_flags.flags = O_CLOEXEC;
> 
> This setup forbids future addons to file_flags
> 
> In three years, when we want to add a new indirect feature to socket() 
> call, do we need a new indirect2() syscall ?

No, it doesn't.  The setup is indefinitely expandable.

All you need to do, if it becomes necessary to have more than an int, is
to define a little structure for the system call and then use it.  The
only requirement is that the code has to assume a value of zero is what
is used today.  That's the whole point.

union indirect_params {
  struct {
int flags;
  } file_flags;
  struct {
int flags;
int new_syscall_data1;
sigset_t and_a_sigmask;
  } new_data;
};

Old programs will set only the 'flags' member of 'new_data' while new
ones can also set the new elements.  New programs on old kernels will
either have failing calls since the structure is too big or the call will
not have all the desired effects.  The latter can be tested for.


> Or better, you could avoid using 'union indirect_params' in user code, and 
> only use the substructs for each function.

There is no overhead introduced through the union.  The only reason the
union is there in the first place is to allocate sufficient data in
task_struct to cover all cases.

- --
➧ Ulrich Drepper ➧ Red Hat, Inc. ➧ 444 Castro St ➧ Mountain View, CA ❖
-BEGIN PGP SIGNATURE-
Version: GnuPG v1.4.7 (GNU/Linux)

iD8DBQFHQafd2ijCOnn/RHQRAlSFAJ99lahwCDZGRSlIHCov5bWowrpoiQCgwvW4
LDSEusNUpMfIE1ywBCRDBfc=
=ChVT
-END PGP SIGNATURE-


Re: [PATCHv3 0/4] sys_indirect system call

2007-11-19 Thread Ulrich Drepper
-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1

Eric Dumazet wrote:
> So when you recompile your old program (as you post it and as I commented on),
> it will pass a >= 12 bytes data to kernel, with only first 4 bytes set to 
> O_CLOEXEC.
> 
> Other bytes will contain junk 

If you don't initialize the entire structure and you use it all, of
course you get undefined behavior.  That's nothing new.  The program I
attached is not an example, it's a test for the functionality in this patch.

Like with every kernel interface, you have to use it correctly.  The
good news is that user programs should never use this syscall directly
(just like they don't for existing ones).

I see no problem at all here.

- --
➧ Ulrich Drepper ➧ Red Hat, Inc. ➧ 444 Castro St ➧ Mountain View, CA ❖
-BEGIN PGP SIGNATURE-
Version: GnuPG v1.4.7 (GNU/Linux)

iD8DBQFHQbBH2ijCOnn/RHQRAkc3AKCxVTWQ3BiQnCBwdbAsT122QWWaiwCggKXN
Z5Sz9/NFojMHZXXTzIMoxX4=
=slte
-END PGP SIGNATURE-


Re: [PATCHv3 0/4] sys_indirect system call

2007-11-19 Thread Ulrich Drepper
-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1

H. Peter Anvin wrote:
> What bothers me about the sys_indirect approach is that it will get
> increasingly expensive as time goes on, and in doing so it does a
> user-space memory reference, which are extra expensive.  The extra table
> can be colocated with the main table (a structure, in effect) so they'll
> share the same cache line.

You assume that using sys_indirect will be the norm.  It won't.  We
mustn't design system calls deliberately wrong so that they require the
indirection.

Besides, if the number of syscalls which have to be handled this way grows
we can use something more efficient for large numbers of tests than a
switch statement.  It could even be a word next to the system call table.

But I still don't see that the magic encoding is a valid solution, it
doesn't address the limited parameter number.  Plus, using sys_indirect
could in future be used to transport entire parameters (like a sigset_t)
along with other information, thereby saving individual copy operations.

I think the sys_indirect approach is the way forward.  I'll submit a
last version of the patch in a bit.

- --
➧ Ulrich Drepper ➧ Red Hat, Inc. ➧ 444 Castro St ➧ Mountain View, CA ❖
-BEGIN PGP SIGNATURE-
Version: GnuPG v1.4.7 (GNU/Linux)

iD8DBQFHQlRw2ijCOnn/RHQRApifAKDE1nZqRbm4cJxbhobBb7jCx1T00QCgiSa0
EXKjL2Gwu3atSLSD+Rb4yO4=
=6ZGt
-END PGP SIGNATURE-


[PATCHv4 4/6] Allow setting FD_CLOEXEC flag for new sockets

2007-11-19 Thread Ulrich Drepper
This is a first user of sys_indirect.  Several of the socket-related system
calls which produce a file handle now can be passed an additional parameter
to set the FD_CLOEXEC flag.

 arch/x86/ia32/Makefile|1 +
 arch/x86/ia32/sys_ia32.c  |4 
 include/asm-x86/ia32_unistd.h |1 +
 include/linux/indirect.h  |   33 +
 kernel/Makefile   |2 ++
 kernel/indirect.c |4 
 net/socket.c  |   21 +
 7 files changed, 58 insertions(+), 8 deletions(-)

--- arch/x86/ia32/Makefile
+++ arch/x86/ia32/Makefile
@@ -36,6 +36,7 @@ $(obj)/vsyscall-sysenter.so.dbg $(obj)/vsyscall-syscall.so.dbg: \
 $(obj)/vsyscall-%.so.dbg: $(src)/vsyscall.lds $(obj)/vsyscall-%.o FORCE
$(call if_changed,syscall)
 
+CFLAGS_sys_ia32.o = -Wno-undef
 AFLAGS_vsyscall-sysenter.o = -m32 -Wa,-32
 AFLAGS_vsyscall-syscall.o = -m32 -Wa,-32
 
--- kernel/Makefile
+++ kernel/Makefile
@@ -67,6 +67,8 @@ ifneq ($(CONFIG_SCHED_NO_NO_OMIT_FRAME_POINTER),y)
 CFLAGS_sched.o := $(PROFILING) -fno-omit-frame-pointer
 endif
 
+CFLAGS_indirect.o = -Wno-undef
+
 $(obj)/configs.o: $(obj)/config_data.h
 
 # config_data.h contains the same information as ikconfig.h but gzipped.
diff -u net/socket.c net/socket.c
--- net/socket.c
+++ net/socket.c
@@ -344,11 +344,11 @@
  * but we take care of internal coherence yet.
  */
 
-static int sock_alloc_fd(struct file **filep)
+static int sock_alloc_fd(struct file **filep, int flags)
 {
int fd;
 
-   fd = get_unused_fd();
+   fd = get_unused_fd_flags(flags);
if (likely(fd >= 0)) {
struct file *file = get_empty_filp();
 
@@ -391,10 +391,10 @@
return 0;
 }
 
-int sock_map_fd(struct socket *sock)
+static int sock_map_fd_flags(struct socket *sock, int flags)
 {
struct file *newfile;
-   int fd = sock_alloc_fd(&newfile);
+   int fd = sock_alloc_fd(&newfile, flags);
 
if (likely(fd >= 0)) {
int err = sock_attach_fd(sock, newfile);
@@ -409,6 +409,11 @@
return fd;
 }
 
+int sock_map_fd(struct socket *sock)
+{
+   return sock_map_fd_flags(sock, 0);
+}
+
 static struct socket *sock_from_file(struct file *file, int *err)
 {
if (file->f_op == &socket_file_ops)
@@ -1208,7 +1213,7 @@
if (retval < 0)
goto out;
 
-   retval = sock_map_fd(sock);
+   retval = sock_map_fd_flags(sock, INDIRECT_PARAM(file_flags, flags));
if (retval < 0)
goto out_release;
 
@@ -1249,13 +1254,13 @@
if (err < 0)
goto out_release_both;
 
-   fd1 = sock_alloc_fd(&newfile1);
+   fd1 = sock_alloc_fd(&newfile1, INDIRECT_PARAM(file_flags, flags));
if (unlikely(fd1 < 0)) {
err = fd1;
goto out_release_both;
}
 
-   fd2 = sock_alloc_fd(&newfile2);
+   fd2 = sock_alloc_fd(&newfile2, INDIRECT_PARAM(file_flags, flags));
if (unlikely(fd2 < 0)) {
err = fd2;
put_filp(newfile1);
@@ -1411,7 +1416,7 @@
 */
__module_get(newsock->ops->owner);
 
-   newfd = sock_alloc_fd(&newfile);
+   newfd = sock_alloc_fd(&newfile, INDIRECT_PARAM(file_flags, flags));
if (unlikely(newfd < 0)) {
err = newfd;
sock_release(newsock);
diff -u arch/x86/ia32/sys_ia32.c arch/x86/ia32/sys_ia32.c
--- arch/x86/ia32/sys_ia32.c
+++ arch/x86/ia32/sys_ia32.c
@@ -902,6 +902,10 @@
 
switch (INDIRECT_SYSCALL32(&regs))
{
+#define INDSYSCALL(name) __NR_ia32_##name
+#include 
+   break;
+
default:
return -EINVAL;
}
diff -u include/linux/indirect.h include/linux/indirect.h
--- include/linux/indirect.h
+++ include/linux/indirect.h
@@ -1,6 +1,39 @@
+#ifndef INDSYSCALL
 #ifndef _LINUX_INDIRECT_H
 #define _LINUX_INDIRECT_H
 
 #include 
 
+
+union indirect_params {
+  struct {
+int flags;
+  } file_flags;
+};
+
+#define INDIRECT_PARAM(set, name) current->indirect_params.set.name
+
+#endif
+#else
+
+/* Here comes the list of system calls which can be called through
+   sys_indirect.  When the list of supported system calls is needed the
+   file including this header is supposed to define a macro "INDSYSCALL"
+   which adds a prefix fitting to the use.  If the resulting macro is
+   defined we generate a line
+   case MACRO:
+   */
+#if INDSYSCALL(accept)
+  case INDSYSCALL(accept):
+#endif
+#if INDSYSCALL(socket)
+  case INDSYSCALL(socket):
+#endif
+#if INDSYSCALL(socketcall)
+  case INDSYSCALL(socketcall):
+#endif
+#if INDSYSCALL(socketpair)
+  case INDSYSCALL(socketpair):
+#endif
+
 #endif
diff -u kernel/indirect.c kernel/indirect.c
--- kernel/indirect.c
+++ kernel/indirect.c
@@ -19,6 +19,10 @@
 
switch (INDIRECT_SYSCALL (&regs))
{
+#define INDSYSCALL(name) __NR_##name
+#include 
+   break;
+
default:
return -EINVAL;
}
--- include/

[PATCHv4 1/6] actual sys_indirect code

2007-11-19 Thread Ulrich Drepper
This is the actual architecture-independent part of the system call
implementation.

 include/linux/indirect.h |6 ++
 include/linux/sched.h|4 
 include/linux/syscalls.h |4 
 kernel/Makefile  |2 +-
 kernel/indirect.c|   36 
 5 files changed, 51 insertions(+), 1 deletion(-)

--- /dev/null
+++ include/linux/indirect.h
@@ -0,0 +1,6 @@
+#ifndef _LINUX_INDIRECT_H
+#define _LINUX_INDIRECT_H
+
+#include 
+
+#endif
--- include/linux/sched.h
+++ include/linux/sched.h
@@ -80,6 +80,7 @@ struct sched_param {
 #include 
 #include 
 #include 
+#include 
 
 #include 
 #include 
@@ -1174,6 +1175,9 @@ struct task_struct {
int make_it_fail;
 #endif
struct prop_local_single dirties;
+
+   /* Additional system call parameters.  */
+   union indirect_params indirect_params;
 };
 
 /*
--- include/linux/syscalls.h
+++ include/linux/syscalls.h
@@ -54,6 +54,7 @@ struct compat_stat;
 struct compat_timeval;
 struct robust_list_head;
 struct getcpu_cache;
+struct indirect_registers;
 
 #include 
 #include 
@@ -611,6 +612,9 @@ asmlinkage long sys_timerfd(int ufd, int clockid, int flags,
const struct itimerspec __user *utmr);
 asmlinkage long sys_eventfd(unsigned int count);
 asmlinkage long sys_fallocate(int fd, int mode, loff_t offset, loff_t len);
+asmlinkage long sys_indirect(struct indirect_registers __user *userregs,
+void __user *userparams, size_t paramslen,
+int flags);
 
 int kernel_execve(const char *filename, char *const argv[], char *const envp[]);
 
--- /dev/null
+++ kernel/indirect.c
@@ -0,0 +1,36 @@
+#include 
+#include 
+#include 
+#include 
+
+
+asmlinkage long sys_indirect(struct indirect_registers __user *userregs,
+void __user *userparams, size_t paramslen,
+int flags)
+{
+   struct indirect_registers regs;
+   long result;
+
+   if (unlikely(flags != 0))
+   return -EINVAL;
+
+   if (copy_from_user(&regs, userregs, sizeof(regs)))
+   return -EFAULT;
+
+   switch (INDIRECT_SYSCALL (&regs))
+   {
+   default:
+   return -EINVAL;
+   }
+
+   if (paramslen > sizeof(union indirect_params))
+   return -EINVAL;
+
+   result = -EFAULT;
+   if (!copy_from_user(&current->indirect_params, userparams, paramslen))
+   result = CALL_INDIRECT(&regs);
+
+   memset(&current->indirect_params, '\0', paramslen);
+
+   return result;
+}
--- kernel/Makefile
+++ kernel/Makefile
@@ -9,7 +9,7 @@ obj-y = sched.o fork.o exec_domain.o panic.o printk.o profile.o \
rcupdate.o extable.o params.o posix-timers.o \
kthread.o wait.o kfifo.o sys_ni.o posix-cpu-timers.o mutex.o \
hrtimer.o rwsem.o latency.o nsproxy.o srcu.o \
-   utsname.o notifier.o
+   utsname.o notifier.o indirect.o
 
 obj-$(CONFIG_SYSCTL) += sysctl_check.o
 obj-$(CONFIG_STACKTRACE) += stacktrace.o


[PATCHv4 3/6] UML support for sys_indirect

2007-11-19 Thread Ulrich Drepper
This part adds support for sys_indirect for UML.

 indirect.h |6 ++
 1 file changed, 6 insertions(+)

--- /dev/null
+++ include/asm-um/indirect.h
@@ -0,0 +1,6 @@
+#ifndef __UM_INDIRECT_H
+#define __UM_INDIRECT_H
+
+#include "asm/arch/indirect.h"
+
+#endif


[PATCHv4 2/6] x86&x86-64 support for sys_indirect

2007-11-19 Thread Ulrich Drepper
This part adds support for sys_indirect on x86 and x86-64.

 arch/x86/ia32/ia32entry.S  |2 ++
 arch/x86/ia32/sys_ia32.c   |   31 +++
 arch/x86/kernel/syscall_table_32.S |1 +
 include/asm-x86/indirect.h |5 +
 include/asm-x86/indirect_32.h  |   23 +++
 include/asm-x86/indirect_64.h  |   34 ++
 include/asm-x86/unistd_32.h|3 ++-
 include/asm-x86/unistd_64.h|2 ++
 8 files changed, 100 insertions(+), 1 deletion(-)

--- arch/x86/ia32/ia32entry.S
+++ arch/x86/ia32/ia32entry.S
@@ -400,6 +400,7 @@ END(ia32_ptregs_common)
 
.section .rodata,"a"
.align 8
+   .globl ia32_sys_call_table
 ia32_sys_call_table:
.quad sys_restart_syscall
.quad sys_exit
@@ -726,4 +727,5 @@ ia32_sys_call_table:
.quad compat_sys_timerfd
.quad sys_eventfd
.quad sys32_fallocate
+   .quad sys32_indirect/* 325  */
 ia32_syscall_end:
--- arch/x86/ia32/sys_ia32.c
+++ arch/x86/ia32/sys_ia32.c
@@ -887,3 +887,37 @@ asmlinkage long sys32_fallocate(int fd, int mode, unsigned offset_lo,
return sys_fallocate(fd, mode, ((u64)offset_hi << 32) | offset_lo,
 ((u64)len_hi << 32) | len_lo);
 }
+
+asmlinkage long sys32_indirect(struct indirect_registers32 __user *userregs,
+  void __user *userparams, size_t paramslen,
+  int flags)
+{
+   extern long (*ia32_sys_call_table[])(u32, u32, u32, u32, u32, u32);
+
+   struct indirect_registers32 regs;
+   long result;
+
+   if (flags != 0)
+   return -EINVAL;
+
+   if (copy_from_user(&regs, userregs, sizeof(regs)))
+   return -EFAULT;
+
+   switch (INDIRECT_SYSCALL32(&regs))
+   {
+   default:
+   return -EINVAL;
+   }
+
+   if (paramslen > sizeof(union indirect_params))
+   return -EINVAL;
+   result = -EFAULT;
+   if (!copy_from_user(&current->indirect_params, userparams, paramslen))
+   result = ia32_sys_call_table[regs.eax](regs.ebx, regs.ecx,
+  regs.edx, regs.esi,
+  regs.edi, regs.ebp);
+
+   memset(&current->indirect_params, '\0', paramslen);
+
+   return result;
+}
--- arch/x86/kernel/syscall_table_32.S
+++ arch/x86/kernel/syscall_table_32.S
@@ -324,3 +324,4 @@ ENTRY(sys_call_table)
.long sys_timerfd
.long sys_eventfd
.long sys_fallocate
+   .long sys_indirect  /* 325 */
--- /dev/null
+++ include/asm-x86/indirect_32.h
@@ -0,0 +1,23 @@
+#ifndef _ASM_X86_INDIRECT_32_H
+#define _ASM_X86_INDIRECT_32_H
+
+struct indirect_registers {
+   __u32 eax;
+   __u32 ebx;
+   __u32 ecx;
+   __u32 edx;
+   __u32 esi;
+   __u32 edi;
+   __u32 ebp;
+};
+
+#define INDIRECT_SYSCALL(regs) (regs)->eax
+
+#define CALL_INDIRECT(regs) \
+  ({ extern long (*sys_call_table[]) (__u32, __u32, __u32, __u32, __u32, __u32); \
+ sys_call_table[INDIRECT_SYSCALL(regs)] ((regs)->ebx, (regs)->ecx, \
+(regs)->edx, (regs)->esi, \
+(regs)->edi, (regs)->ebp); \
+ })
+
+#endif
--- /dev/null
+++ include/asm-x86/indirect_64.h
@@ -0,0 +1,34 @@
+#ifndef _ASM_X86_INDIRECT_64_H
+#define _ASM_X86_INDIRECT_64_H
+
+struct indirect_registers {
+   __u64 rax;
+   __u64 rdi;
+   __u64 rsi;
+   __u64 rdx;
+   __u64 r10;
+   __u64 r8;
+   __u64 r9;
+};
+
+struct indirect_registers32 {
+   __u32 eax;
+   __u32 ebx;
+   __u32 ecx;
+   __u32 edx;
+   __u32 esi;
+   __u32 edi;
+   __u32 ebp;
+};
+
+#define INDIRECT_SYSCALL(regs) (regs)->rax
+#define INDIRECT_SYSCALL32(regs) (regs)->eax
+
+#define CALL_INDIRECT(regs) \
+  ({ extern long (*sys_call_table[]) (__u64, __u64, __u64, __u64, __u64, __u64); \
+ sys_call_table[INDIRECT_SYSCALL(regs)] ((regs)->rdi, (regs)->rsi, \
+(regs)->rdx, (regs)->r10, \
+(regs)->r8, (regs)->r9); \
+ })
+
+#endif
--- /dev/null
+++ include/asm-x86/indirect.h
@@ -0,0 +1,5 @@
+#ifdef CONFIG_X86_32
+# include "indirect_32.h"
+#else
+# include "indirect_64.h"
+#endif
--- include/asm-x86/unistd_32.h
+++ include/asm-x86/unistd_32.h
@@ -330,10 +330,11 @@
 #define __NR_timerfd   322
 #define __NR_eventfd   323
 #define __NR_fallocate 324
+#define __NR_indirect  325
 
 #ifdef __KERNEL__
 
-#define NR_syscalls 325
+#define NR_syscalls 326
 
 #define __ARCH_WANT_IPC_PARSE_VERSION
 #define __ARCH_WANT_OLD_READDIR
--- include/asm-x86/unistd_64.h
+++ include/asm-x86/unistd_64.h
@@ -635,6 +635,8 @@ __SYSCALL(__NR_timerfd, sys_timerfd)
 __SYSCALL(__NR_eventfd, sys_eventfd)
 #define __NR_fallocate

[PATCHv4 0/6] sys_indirect system call

2007-11-19 Thread Ulrich Drepper

The following patches provide an alternative implementation of the
sys_indirect system call which has been discussed a few times.
This new system call allows us to extend existing system call
interfaces without adding more system calls.

Davide's previous implementation is IMO far more complex than
warranted.  This code here is trivial, as you can see.  I've
discussed this approach with Linus last week and for a brief moment
we actually agreed on something.

We pass an additional block of data to the kernel, it is copied into
the task_struct, and then it is up to the function implementing the system
call to interpret the data.  Each system call, which is meant to be
extended this way, has to be white-listed in sys_indirect.  The
alternative is to filter out those system calls which absolutely cannot
be handled using sys_indirect (like clone, execve) since they require
the stack layout of an ordinary system call.  This is more dangerous
since it is too easy to miss a call.
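
The control flow described above can be modelled in plain user-space C as a sketch. Everything here is a stand-in: the syscall numbers are made up, current_params plays the role of the task_struct member, and do_socket plays the role of a white-listed system call that reads its extra flags from there.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

enum { NR_SOCKET = 1, NR_CLONE = 2 };   /* hypothetical numbers */

union params {
    struct {
        int flags;
    } file_flags;
};

static union params current_params;     /* stand-in for the task_struct field */

/* Stand-in for a white-listed system call: it interprets the extra data. */
static long do_socket(void)
{
    return current_params.file_flags.flags;
}

static long indirect(int nr, const void *userparams, size_t paramslen)
{
    long result;

    switch (nr) {
    case NR_SOCKET:                     /* explicitly white-listed */
        break;
    default:                            /* everything else is refused */
        return -EINVAL;
    }

    if (paramslen > sizeof(current_params))
        return -EINVAL;

    memcpy(&current_params, userparams, paramslen);
    result = do_socket();
    memset(&current_params, 0, paramslen);  /* back to "no extra data" */

    return result;
}
```

With a white-list, a call that was never considered (NR_CLONE in this model) fails safely instead of running with an unexpected stack layout.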

The code for x86 and x86-64 gets by without a single line of assembly
code.  This is likely to be true for most/all the other archs as well.
There is architecture-dependent code, though.  For x86 and x86-64 I've
also fixed up UML (although only x86-64 is tested, that's my setup).

The last three patches show the first application of the functionality.
They also show a complication: we need the test for valid sub-syscalls in the
main implementation and in the compatibility code.  And more: the actual
sources and generated binary for the test are very different (the numbers
differ).  Duplicating the information is a big problem, though.  I've used
some macro tricks to avoid this.  All the information about the flags and
the system calls using them is concentrated in one header.  This should
make maintenance bearable.

This patch to use sys_indirect is just the beginning.  More will follow,
but I want to see how these patches are received before I spend more time
on it.  This code is enough to test the implementation with the following
test program.  Adjust it for architectures other than x86 and x86-64.


#include 
#include 
#include 
#include 
#include 
#include 
#include 
#include 

typedef uint32_t __u32;
typedef uint64_t __u64;

union indirect_params {
  struct {
int flags;
  } file_flags;
};

#ifdef __x86_64__
# define __NR_indirect 286
struct indirect_registers {
  __u64 rax;
  __u64 rdi;
  __u64 rsi;
  __u64 rdx;
  __u64 r10;
  __u64 r8;
  __u64 r9;
};
#elif defined __i386__
# define __NR_indirect 325
struct indirect_registers {
  __u32 eax;
  __u32 ebx;
  __u32 ecx;
  __u32 edx;
  __u32 esi;
  __u32 edi;
  __u32 ebp;
};
#else
# error "need to define __NR_indirect and struct indirect_params"
#endif

#define FILL_IN(var, values...) \
  var = (struct indirect_registers) { values }

int
main (void)
{
  int fd = socket (AF_INET, SOCK_DGRAM, IPPROTO_IP);
  int s1 = fcntl (fd, F_GETFD);
  int t1 = fcntl (fd, F_GETFL);
  printf ("old: FD_CLOEXEC %s set, NONBLOCK %s set\n",
  s1 == 0 ? "not" : "is", (t1 & O_NONBLOCK) ? "is" : "not");
  close (fd);

  union indirect_params i;
  i.file_flags.flags = O_CLOEXEC|O_NONBLOCK;

  struct indirect_registers r;
#ifdef __NR_socketcall
# define SOCKOP_socket   1
  long args[3] = { AF_INET, SOCK_DGRAM, IPPROTO_IP };
  FILL_IN (r, __NR_socketcall, SOCKOP_socket, (long) args);
#else
  FILL_IN (r, __NR_socket, AF_INET, SOCK_DGRAM, IPPROTO_IP);
#endif

  fd = syscall (__NR_indirect, &r, &i, sizeof (i));
  int s2 = fcntl (fd, F_GETFD);
  int t2 = fcntl (fd, F_GETFL);
  printf ("new: FD_CLOEXEC %s set, NONBLOCK %s set\n",
  s2 == 0 ? "not" : "is", (t2 & O_NONBLOCK) ? "is" : "not");
  close (fd);

  i.file_flags.flags = O_CLOEXEC;
  sigset_t ss;
  sigemptyset(&ss);
  FILL_IN(r, __NR_signalfd, -1, (long) &ss, 8);
  fd = syscall (__NR_indirect, &r, &i, sizeof (i));
  int s3 = fcntl (fd, F_GETFD);
  printf ("signalfd: FD_CLOEXEC %s set\n", s3 == 0 ? "not" : "is");
  close (fd);

  FILL_IN(r, __NR_eventfd, 8);
  fd = syscall (__NR_indirect, &r, &i, sizeof (i));
  int s4 = fcntl (fd, F_GETFD);
  printf ("eventfd: FD_CLOEXEC %s set\n", s4 == 0 ? "not" : "is");
  close (fd);

  return s1 != 0 || s2 == 0 || t1 != 0 || t2 == 0 || s3 == 0 || s4 == 0;
}


Signed-off-by: Ulrich Drepper <[EMAIL PROTECTED]>


 arch/x86/ia32/Makefile |1 
 arch/x86/ia32/ia32entry.S  |2 +
 arch/x86/ia32/sys_ia32.c   |   37 +-
 arch/x86/kernel/syscall_table_32.S |1 
 include/asm-um/indirect.h  |6 +
 include/asm-x86/ia32_unistd.h  |1 
 include/asm-x86/indi

[PATCHv4 6/6] FD_CLOEXEC support for eventfd, signalfd, timerfd

2007-11-19 Thread Ulrich Drepper
This patch adds support to set the FD_CLOEXEC flag for the file descriptors
returned by eventfd, signalfd, timerfd.

 fs/anon_inodes.c  |   15 +++
 fs/eventfd.c  |5 +++--
 fs/signalfd.c |6 --
 fs/timerfd.c  |6 --
 include/asm-x86/ia32_unistd.h |3 +++
 include/linux/anon_inodes.h   |3 +++
 include/linux/indirect.h  |3 +++
 7 files changed, 31 insertions(+), 10 deletions(-)

--- fs/anon_inodes.c
+++ fs/anon_inodes.c
@@ -70,9 +70,9 @@ static struct dentry_operations anon_inodefs_dentry_operations = {
  * hence saving memory and avoiding code duplication for the file/inode/dentry
  * setup.
  */
-int anon_inode_getfd(int *pfd, struct inode **pinode, struct file **pfile,
-const char *name, const struct file_operations *fops,
-void *priv)
+int anon_inode_getfd_flags(int *pfd, struct inode **pinode, struct file **pfile,
+  const char *name, const struct file_operations *fops,
+  void *priv, int flags)
 {
struct qstr this;
struct dentry *dentry;
@@ -85,7 +85,7 @@ int anon_inode_getfd(int *pfd, struct inode **pinode, struct file **pfile,
if (!file)
return -ENFILE;
 
-   error = get_unused_fd();
+   error = get_unused_fd_flags(flags);
if (error < 0)
goto err_put_filp;
fd = error;
@@ -138,6 +138,13 @@ err_put_filp:
put_filp(file);
return error;
 }
+
+int anon_inode_getfd(int *pfd, struct inode **pinode, struct file **pfile,
+const char *name, const struct file_operations *fops,
+void *priv)
+{
+   return anon_inode_getfd_flags(pfd, pinode, pfile, name, fops, priv, 0);
+}
 EXPORT_SYMBOL_GPL(anon_inode_getfd);
 
 /*
--- fs/eventfd.c
+++ fs/eventfd.c
@@ -215,8 +215,9 @@ asmlinkage long sys_eventfd(unsigned int count)
 * When we call this, the initialization must be complete, since
 * anon_inode_getfd() will install the fd.
 */
-   error = anon_inode_getfd(&fd, &inode, &file, "[eventfd]",
-&eventfd_fops, ctx);
+   error = anon_inode_getfd_flags(&fd, &inode, &file, "[eventfd]",
+  &eventfd_fops, ctx,
+  INDIRECT_PARAM(file_flags, flags));
if (!error)
return fd;
 
--- fs/signalfd.c
+++ fs/signalfd.c
@@ -224,8 +224,10 @@ asmlinkage long sys_signalfd(int ufd, sigset_t __user *user_mask, size_t sizemas
 * When we call this, the initialization must be complete, since
 * anon_inode_getfd() will install the fd.
 */
-   error = anon_inode_getfd(&ufd, &inode, &file, "[signalfd]",
-&signalfd_fops, ctx);
+   error = anon_inode_getfd_flags(&ufd, &inode, &file,
+  "[signalfd]", &signalfd_fops,
+  ctx, INDIRECT_PARAM(file_flags,
+  flags));
if (error)
goto err_fdalloc;
} else {
--- fs/timerfd.c
+++ fs/timerfd.c
@@ -182,8 +182,10 @@ asmlinkage long sys_timerfd(int ufd, int clockid, int flags,
 * When we call this, the initialization must be complete, since
 * anon_inode_getfd() will install the fd.
 */
-   error = anon_inode_getfd(&ufd, &inode, &file, "[timerfd]",
-&timerfd_fops, ctx);
+   error = anon_inode_getfd_flags(&ufd, &inode, &file, "[timerfd]",
+  &timerfd_fops, ctx,
+  INDIRECT_PARAM(file_flags,
+ flags));
if (error)
goto err_tmrcancel;
} else {
--- include/asm-x86/ia32_unistd.h
+++ include/asm-x86/ia32_unistd.h
@@ -15,5 +15,8 @@
 #define __NR_ia32_socketcall   102
 #define __NR_ia32_sigreturn119
 #define __NR_ia32_rt_sigreturn 173
+#define __NR_ia32_signalfd 321
+#define __NR_ia32_timerfd  322
+#define __NR_ia32_eventfd  323
 
 #endif /* _ASM_X86_64_IA32_UNISTD_H_ */
--- include/linux/anon_inodes.h
+++ include/linux/anon_inodes.h
@@ -8,6 +8,9 @@
 #ifndef _LINUX_ANON_INODES_H
 #define _LINUX_ANON_INODES_H
 
+int anon_inode_getfd_flags(int *pfd, struct inode **pinode, struct file **pfile,
+  const char *name, const struct file_operations *fops,
+  void *priv, int flags);
 int anon_inode_getfd(int *pfd, struct inode **pinode, struct file **pfile,
 const char *name, const struct file_operations *fops,
 void *priv);
--- include/

[PATCHv4 5/6] Allow setting O_NONBLOCK flag for new sockets

2007-11-19 Thread Ulrich Drepper
This patch adds support for setting the O_NONBLOCK flag of the file
descriptors returned by socket, socketpair, and accept.

 socket.c |   15 +--
 1 file changed, 9 insertions(+), 6 deletions(-)

--- net/socket.c
+++ net/socket.c
@@ -362,7 +362,7 @@ static int sock_alloc_fd(struct file **filep, int flags)
return fd;
 }
 
-static int sock_attach_fd(struct socket *sock, struct file *file)
+static int sock_attach_fd(struct socket *sock, struct file *file, int flags)
 {
struct dentry *dentry;
struct qstr name = { .name = "" };
@@ -384,7 +384,7 @@ static int sock_attach_fd(struct socket *sock, struct file *file)
init_file(file, sock_mnt, dentry, FMODE_READ | FMODE_WRITE,
  &socket_file_ops);
SOCK_INODE(sock)->i_fop = &socket_file_ops;
-   file->f_flags = O_RDWR;
+   file->f_flags = O_RDWR | (flags & O_NONBLOCK);
file->f_pos = 0;
file->private_data = sock;
 
@@ -397,7 +397,7 @@ static int sock_map_fd_flags(struct socket *sock, int flags)
int fd = sock_alloc_fd(&newfile, flags);
 
if (likely(fd >= 0)) {
-   int err = sock_attach_fd(sock, newfile);
+   int err = sock_attach_fd(sock, newfile, flags);
 
if (unlikely(err < 0)) {
put_filp(newfile);
@@ -1268,12 +1268,14 @@ asmlinkage long sys_socketpair(int family, int type, int protocol,
goto out_release_both;
}
 
-   err = sock_attach_fd(sock1, newfile1);
+   err = sock_attach_fd(sock1, newfile1,
+INDIRECT_PARAM(file_flags, flags));
if (unlikely(err < 0)) {
goto out_fd2;
}
 
-   err = sock_attach_fd(sock2, newfile2);
+   err = sock_attach_fd(sock2, newfile2,
+INDIRECT_PARAM(file_flags, flags));
if (unlikely(err < 0)) {
fput(newfile1);
goto out_fd1;
@@ -1423,7 +1425,8 @@ asmlinkage long sys_accept(int fd, struct sockaddr __user *upeer_sockaddr,
goto out_put;
}
 
-   err = sock_attach_fd(newsock, newfile);
+   err = sock_attach_fd(newsock, newfile,
+INDIRECT_PARAM(file_flags, flags));
if (err < 0)
goto out_fd_simple;
 


Re: [PATCHv4 5/6] Allow setting O_NONBLOCK flag for new sockets

2007-11-20 Thread Ulrich Drepper
David Miller wrote:
> FWIW, I think this indirect syscall stuff is the most ugly interface
> I've ever seen proposed for the kernel.

Well, the alternative is to introduce dozens of new interfaces.  It
was Linus who suggested this alternative.  Plus, it seems that for
syslets we need basically the same interface anyway.


> And I agree with all of the objections raised by both H. Pater Anvin
> and Eric Dumazet.

Eric had no arguments and HP's comments lack a viable alternative proposal.


> Where does this INDIRECT_PARAM() macro get defined?  I do not
> see it being defined anywhere in these patches.

Defined in <linux/indirect.h>:

+#define INDIRECT_PARAM(set, name) current->indirect_params.set.name

Not my idea, I was following one review comment.
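
As a rough userspace illustration (not part of the patch), the sketch
below shows what the macro expands to.  The `mock_task` variable, the
`current` stand-in, and `callee_sees_flags()` are assumptions made only
for this mock; in the kernel, `current` is the running task and the
union lives in its task_struct.

```c
#include <assert.h>
#include <string.h>

/* Userspace stand-ins for the kernel types.  The names mirror the
   patch, but this is a mock, not kernel code.  */
union indirect_params {
  struct {
    int flags;
  } file_flags;
};

struct task_struct {
  union indirect_params indirect_params;
};

/* Hypothetical replacement for the kernel's per-task "current".  */
static struct task_struct mock_task;
#define current (&mock_task)

/* The macro as posted in the patch.  */
#define INDIRECT_PARAM(set, name) current->indirect_params.set.name

/* What a callee such as sys_eventfd() would read.  */
static int callee_sees_flags(void)
{
  return INDIRECT_PARAM(file_flags, flags);
}

static void demo(void)
{
  /* A plain system call leaves the block zeroed: no extra flags.  */
  memset(&current->indirect_params, 0, sizeof(current->indirect_params));
  assert(callee_sees_flags() == 0);

  /* 0x1 is an arbitrary illustrative value, not a real flag.  */
  INDIRECT_PARAM(file_flags, flags) = 0x1;
  assert(callee_sees_flags() == 0x1);
}
```

Because the macro is a plain member access, it works both as an rvalue
in the callee and as an lvalue when the block is filled in.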

- --
➧ Ulrich Drepper ➧ Red Hat, Inc. ➧ 444 Castro St ➧ Mountain View, CA ❖


Re: [PATCHv4 0/6] sys_indirect system call

2007-11-20 Thread Ulrich Drepper
Eric Dumazet wrote:
> I am wondering if some parts are missing from your ChangeLog
> 
> You apparently added in v3 a new 'flags' parameter to indirect syscall
> but no trace of this change in Changelog, and why it was added. This
> seems to imply a future multiplexor.

This was mentioned in one of my mails.  I added the parameter to
accommodate Linus's and Zack's idea to use the functionality for syslets
as well.  Not really a multiplexer, it is meant to be an "execute
synchronously or asynchronously" flag.  In the latter case an additional
parameter might be needed to indicate the notification mechanism.


> And no change in the test program reflecting this 'flags' new param, so
> it fails.

Yep, sorry, I didn't update the text by including the most recent test
program.  I'll do that for the next version.

- --
➧ Ulrich Drepper ➧ Red Hat, Inc. ➧ 444 Castro St ➧ Mountain View, CA ❖


Re: [PATCHv3 0/4] sys_indirect system call

2007-11-20 Thread Ulrich Drepper
dean gaudet wrote:
> as an application writer how do i access accept(2) with FD_CLOEXEC 
> functionality?  will glibc expose an accept2() with a flags param?

Not yet decided.  One alternative is to extend accept() so that both
forms are available:

  int accept(int, struct sockaddr *, socklen_t *);
and
  int accept(int, struct sockaddr *, socklen_t *, int);

We can do this with type safety even in C nowadays.
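
The mail does not say which mechanism would be used.  One possible way,
shown here purely as an assumption, is a variadic macro that dispatches
on the number of arguments to two ordinary, fully prototyped functions.
The names `my_accept`, `my_accept3`, `my_accept4`, and `MY_ACCEPT_PICK`
are hypothetical, and the stub bodies only report which variant ran.

```c
#include <assert.h>

/* Hypothetical back ends standing in for the two accept() forms.  */
static int my_accept3(int fd, void *addr, unsigned int *len)
{
  (void) fd; (void) addr; (void) len;
  return 3;
}

static int my_accept4(int fd, void *addr, unsigned int *len, int flags)
{
  (void) fd; (void) addr; (void) len; (void) flags;
  return 4;
}

/* Select the fifth argument.  With three call arguments that slot holds
   my_accept3, with four it holds my_accept4.  */
#define MY_ACCEPT_PICK(a1, a2, a3, a4, name, ...) name
#define my_accept(...) \
  MY_ACCEPT_PICK(__VA_ARGS__, my_accept4, my_accept3, 0, 0)(__VA_ARGS__)

static void demo(void)
{
  unsigned int len = 0;

  assert(my_accept(0, 0, &len) == 3);        /* three-argument form */
  assert(my_accept(0, 0, &len, 1) == 4);     /* form with flags */
}
```

Since the macro resolves to a call of a real function, the compiler
still checks every argument against that function's prototype.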


> if so... why don't we just have an accept2() syscall?

If you read the mails of my first submission you'll find that I
explained this.  I talked to Andrew and he favored new syscalls.  But
then I talked to Linus and he favored this approach.  Probably
especially because it can be used for syslets as well.  And it is less
code and data than introducing new syscalls.

- --
➧ Ulrich Drepper ➧ Red Hat, Inc. ➧ 444 Castro St ➧ Mountain View, CA ❖


Re: [PATCHv4 2/6] x86&x86-64 support for sys_indirect

2007-11-20 Thread Ulrich Drepper
Heiko Carstens wrote:
> All these macros could be functions, or? Would give us some type checking
> and avoids the capital letters.

Should be possible now.  I didn't do it initially since the macro used
the macro for the largest syscall number.  That macro wasn't always
available.  I'll test it.

- --
➧ Ulrich Drepper ➧ Red Hat, Inc. ➧ 444 Castro St ➧ Mountain View, CA ❖


Re: [PATCHv4 0/6] sys_indirect system call

2007-11-20 Thread Ulrich Drepper
Zach Brown wrote:
> I'm sure the additional parameter will be needed, and it might be pretty
> involved.  I think the current notion of syslets needs, at the very least:

All correct.  I just want to point out that the proposed interface is
sufficiently prepared for this and that there is no need to hold back
this initial, synchronous syscall stuff until the syslet stuff is
ready.  These interface changes are security-relevant and should be
added ASAP.

- --
➧ Ulrich Drepper ➧ Red Hat, Inc. ➧ 444 Castro St ➧ Mountain View, CA ❖


Re: [PATCHv4 4/6] Allow setting FD_CLOEXEC flag for new sockets

2007-11-20 Thread Ulrich Drepper
Zach Brown wrote:
> Have you given thought to having to perform compat translation on this?
>   Today it's only copied directly from the user pointer into the union
> in the task_struct.

Since there is no legacy interface to worry about all members added to
the structure can and should be neutral of the word size.  We've done
this with some syscalls already (like pread64) where we always use the
wide form in the parameter list.  It's even simpler here since the
value does not have to be split across two 32-bit registers.
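
To make the point concrete, here is a small sketch of a parameter block
that needs no compat translation next to one that would.  It is only an
illustration: `neutral_params`, its `hypothetical_wide` member,
`needs_compat`, and `params_size()` are made-up names, not part of the
posted patches.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Word-size neutral: only fixed-width members, so 32-bit and 64-bit
   callers produce the same layout.  The second member is hypothetical
   and always uses the wide form, as pread64 does for its offset.  */
union neutral_params {
  struct {
    int32_t flags;
  } file_flags;
  struct {
    uint64_t offset;
  } hypothetical_wide;
};

/* Counter-example: long and pointers differ in size between 32-bit and
   64-bit ABIs, so a compat layer would have to translate this.  */
struct needs_compat {
  long value;
  void *ptr;
};

/* The neutral block has one size regardless of the caller's ABI.  */
static_assert(sizeof(union neutral_params) == 8,
              "layout must not depend on the word size");

static size_t params_size(void)
{
  return sizeof(union neutral_params);
}
```

With such a layout the kernel can copy the block straight from the user
pointer for both native and compat callers.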

- --
➧ Ulrich Drepper ➧ Red Hat, Inc. ➧ 444 Castro St ➧ Mountain View, CA ❖


Re: 2.6.24-rc3: find complains about /proc/net

2007-11-20 Thread Ulrich Drepper
Roland McGrath wrote:
> Oh, it seems it has indeed been that way for a very long time, so I was
> mistaken.  It still seems a little odd to me.  Ulrich can say definitively
> whether the kind of concern I mentioned really matters one way or the other
> for glibc.

glibc (at least NPTL) cannot survive if somebody uses funny CLONE_*
flags to separate various pieces of information, e.g., file descriptors.
So all the information in each thread's /proc/self should be identical.

When the information is not the same, the current semantics seem to be
more useful.  So I guess no change is the way to go here.

- --
➧ Ulrich Drepper ➧ Red Hat, Inc. ➧ 444 Castro St ➧ Mountain View, CA ❖


[PATCHv5 5/5] FD_CLOEXEC support for eventfd, signalfd, timerfd

2007-11-20 Thread Ulrich Drepper
This patch adds support to set the FD_CLOEXEC flag for the file descriptors
returned by eventfd, signalfd, timerfd.

 fs/anon_inodes.c  |   15 +++
 fs/eventfd.c  |5 +++--
 fs/signalfd.c |6 --
 fs/timerfd.c  |6 --
 include/asm-x86/ia32_unistd.h |3 +++
 include/linux/anon_inodes.h   |3 +++
 include/linux/indirect.h  |3 +++
 7 files changed, 31 insertions(+), 10 deletions(-)


--- linux/include/linux/indirect.h
+++ linux/include/linux/indirect.h
@@ -40,5 +40,8 @@ union indirect_params {
 #if INDSYSCALL(socketpair)
   case INDSYSCALL(socketpair):
 #endif
+  case INDSYSCALL(eventfd):
+  case INDSYSCALL(signalfd):
+  case INDSYSCALL(timerfd):
 
 #endif
--- linux/fs/anon_inodes.c
+++ linux/fs/anon_inodes.c
@@ -70,9 +70,9 @@ static struct dentry_operations anon_inodefs_dentry_operations = {
  * hence saving memory and avoiding code duplication for the file/inode/dentry
  * setup.
  */
-int anon_inode_getfd(int *pfd, struct inode **pinode, struct file **pfile,
-const char *name, const struct file_operations *fops,
-void *priv)
+int anon_inode_getfd_flags(int *pfd, struct inode **pinode, struct file **pfile,
+  const char *name, const struct file_operations *fops,
+  void *priv, int flags)
 {
struct qstr this;
struct dentry *dentry;
@@ -85,7 +85,7 @@ int anon_inode_getfd(int *pfd, struct inode **pinode, struct file **pfile,
if (!file)
return -ENFILE;
 
-   error = get_unused_fd();
+   error = get_unused_fd_flags(flags);
if (error < 0)
goto err_put_filp;
fd = error;
@@ -138,6 +138,13 @@ err_put_filp:
put_filp(file);
return error;
 }
+
+int anon_inode_getfd(int *pfd, struct inode **pinode, struct file **pfile,
+const char *name, const struct file_operations *fops,
+void *priv)
+{
+   return anon_inode_getfd_flags(pfd, pinode, pfile, name, fops, priv, 0);
+}
 EXPORT_SYMBOL_GPL(anon_inode_getfd);
 
 /*
--- linux/include/linux/anon_inodes.h
+++ linux/include/linux/anon_inodes.h
@@ -8,6 +8,9 @@
 #ifndef _LINUX_ANON_INODES_H
 #define _LINUX_ANON_INODES_H
 
+int anon_inode_getfd_flags(int *pfd, struct inode **pinode, struct file **pfile,
+  const char *name, const struct file_operations *fops,
+  void *priv, int flags);
 int anon_inode_getfd(int *pfd, struct inode **pinode, struct file **pfile,
 const char *name, const struct file_operations *fops,
 void *priv);
--- linux/fs/eventfd.c
+++ linux/fs/eventfd.c
@@ -215,8 +215,9 @@ asmlinkage long sys_eventfd(unsigned int count)
 * When we call this, the initialization must be complete, since
 * anon_inode_getfd() will install the fd.
 */
-   error = anon_inode_getfd(&fd, &inode, &file, "[eventfd]",
-&eventfd_fops, ctx);
+   error = anon_inode_getfd_flags(&fd, &inode, &file, "[eventfd]",
+  &eventfd_fops, ctx,
+  INDIRECT_PARAM(file_flags, flags));
if (!error)
return fd;
 
--- linux/fs/signalfd.c
+++ linux/fs/signalfd.c
@@ -224,8 +224,10 @@ asmlinkage long sys_signalfd(int ufd, sigset_t __user *user_mask, size_t sizemas
 * When we call this, the initialization must be complete, since
 * anon_inode_getfd() will install the fd.
 */
-   error = anon_inode_getfd(&ufd, &inode, &file, "[signalfd]",
-&signalfd_fops, ctx);
+   error = anon_inode_getfd_flags(&ufd, &inode, &file,
+  "[signalfd]", &signalfd_fops,
+  ctx, INDIRECT_PARAM(file_flags,
+  flags));
if (error)
goto err_fdalloc;
} else {
--- linux/fs/timerfd.c
+++ linux/fs/timerfd.c
@@ -182,8 +182,10 @@ asmlinkage long sys_timerfd(int ufd, int clockid, int flags,
 * When we call this, the initialization must be complete, since
 * anon_inode_getfd() will install the fd.
 */
-   error = anon_inode_getfd(&ufd, &inode, &file, "[timerfd]",
-&timerfd_fops, ctx);
+   error = anon_inode_getfd_flags(&ufd, &inode, &file, "[timerfd]",
+  &timerfd_fops, ctx,
+  INDIRECT_PARAM(file_flags,
+ flags));
if (error)
goto err_tmrcancel;
} else {
--- linux/include/

[PATCHv5 4/5] Allow setting O_NONBLOCK flag for new sockets

2007-11-20 Thread Ulrich Drepper
This patch adds support for setting the O_NONBLOCK flag of the file
descriptors returned by socket, socketpair, and accept.

 socket.c |   15 +--
 1 file changed, 9 insertions(+), 6 deletions(-)


--- linux/net/socket.c
+++ linux/net/socket.c
@@ -362,7 +362,7 @@ static int sock_alloc_fd(struct file **filep, int flags)
return fd;
 }
 
-static int sock_attach_fd(struct socket *sock, struct file *file)
+static int sock_attach_fd(struct socket *sock, struct file *file, int flags)
 {
struct dentry *dentry;
struct qstr name = { .name = "" };
@@ -384,7 +384,7 @@ static int sock_attach_fd(struct socket *sock, struct file *file)
init_file(file, sock_mnt, dentry, FMODE_READ | FMODE_WRITE,
  &socket_file_ops);
SOCK_INODE(sock)->i_fop = &socket_file_ops;
-   file->f_flags = O_RDWR;
+   file->f_flags = O_RDWR | (flags & O_NONBLOCK);
file->f_pos = 0;
file->private_data = sock;
 
@@ -397,7 +397,7 @@ static int sock_map_fd_flags(struct socket *sock, int flags)
int fd = sock_alloc_fd(&newfile, flags);
 
if (likely(fd >= 0)) {
-   int err = sock_attach_fd(sock, newfile);
+   int err = sock_attach_fd(sock, newfile, flags);
 
if (unlikely(err < 0)) {
put_filp(newfile);
@@ -1268,12 +1268,14 @@ asmlinkage long sys_socketpair(int family, int type, int protocol,
goto out_release_both;
}
 
-   err = sock_attach_fd(sock1, newfile1);
+   err = sock_attach_fd(sock1, newfile1,
+INDIRECT_PARAM(file_flags, flags));
if (unlikely(err < 0)) {
goto out_fd2;
}
 
-   err = sock_attach_fd(sock2, newfile2);
+   err = sock_attach_fd(sock2, newfile2,
+INDIRECT_PARAM(file_flags, flags));
if (unlikely(err < 0)) {
fput(newfile1);
goto out_fd1;
@@ -1423,7 +1425,8 @@ asmlinkage long sys_accept(int fd, struct sockaddr __user *upeer_sockaddr,
goto out_put;
}
 
-   err = sock_attach_fd(newsock, newfile);
+   err = sock_attach_fd(newsock, newfile,
+INDIRECT_PARAM(file_flags, flags));
if (err < 0)
goto out_fd_simple;
 


[PATCHv5 3/5] Allow setting FD_CLOEXEC flag for new sockets

2007-11-20 Thread Ulrich Drepper
This is a first user of sys_indirect.  Several of the socket-related system
calls which produce a file handle now can be passed an additional parameter
to set the FD_CLOEXEC flag.

 include/asm-x86/ia32_unistd.h |1 +
 include/linux/indirect.h  |   27 +++
 net/socket.c  |   21 +
 3 files changed, 41 insertions(+), 8 deletions(-)


diff -u linux/include/linux/indirect.h linux/include/linux/indirect.h
--- linux/include/linux/indirect.h
+++ linux/include/linux/indirect.h
@@ -1,3 +1,4 @@
+#ifndef INDSYSCALL
 #ifndef _LINUX_INDIRECT_H
 #define _LINUX_INDIRECT_H
 
@@ -13,5 +14,31 @@
+  struct {
+int flags;
+  } file_flags;
 };
 
 #define INDIRECT_PARAM(set, name) current->indirect_params.set.name
 
 #endif
+#else
+
+/* Here comes the list of system calls which can be called through
+   sys_indirect.  When the list of supported system calls is needed the
+   file including this header is supposed to define a macro "INDSYSCALL"
+   which adds a prefix fitting to the use.  If the resulting macro is
+   defined we generate a line
+   case MACRO:
+   */
+#if INDSYSCALL(accept)
+  case INDSYSCALL(accept):
+#endif
+#if INDSYSCALL(socket)
+  case INDSYSCALL(socket):
+#endif
+#if INDSYSCALL(socketcall)
+  case INDSYSCALL(socketcall):
+#endif
+#if INDSYSCALL(socketpair)
+  case INDSYSCALL(socketpair):
+#endif
+
+#endif
--- linux/include/asm-x86/ia32_unistd.h
+++ linux/include/asm-x86/ia32_unistd.h
@@ -12,6 +12,7 @@
 #define __NR_ia32_exit   1
 #define __NR_ia32_read   3
 #define __NR_ia32_write  4
+#define __NR_ia32_socketcall   102
 #define __NR_ia32_sigreturn119
 #define __NR_ia32_rt_sigreturn 173
 
diff -u linux/net/socket.c linux/net/socket.c
--- linux/net/socket.c
+++ linux/net/socket.c
@@ -344,11 +344,11 @@
  * but we take care of internal coherence yet.
  */
 
-static int sock_alloc_fd(struct file **filep)
+static int sock_alloc_fd(struct file **filep, int flags)
 {
int fd;
 
-   fd = get_unused_fd();
+   fd = get_unused_fd_flags(flags);
if (likely(fd >= 0)) {
struct file *file = get_empty_filp();
 
@@ -391,10 +391,10 @@
return 0;
 }
 
-int sock_map_fd(struct socket *sock)
+static int sock_map_fd_flags(struct socket *sock, int flags)
 {
struct file *newfile;
-   int fd = sock_alloc_fd(&newfile);
+   int fd = sock_alloc_fd(&newfile, flags);
 
if (likely(fd >= 0)) {
int err = sock_attach_fd(sock, newfile);
@@ -409,6 +409,11 @@
return fd;
 }
 
+int sock_map_fd(struct socket *sock)
+{
+   return sock_map_fd_flags(sock, 0);
+}
+
 static struct socket *sock_from_file(struct file *file, int *err)
 {
if (file->f_op == &socket_file_ops)
@@ -1208,7 +1213,7 @@
if (retval < 0)
goto out;
 
-   retval = sock_map_fd(sock);
+   retval = sock_map_fd_flags(sock, INDIRECT_PARAM(file_flags, flags));
if (retval < 0)
goto out_release;
 
@@ -1249,13 +1254,13 @@
if (err < 0)
goto out_release_both;
 
-   fd1 = sock_alloc_fd(&newfile1);
+   fd1 = sock_alloc_fd(&newfile1, INDIRECT_PARAM(file_flags, flags));
if (unlikely(fd1 < 0)) {
err = fd1;
goto out_release_both;
}
 
-   fd2 = sock_alloc_fd(&newfile2);
+   fd2 = sock_alloc_fd(&newfile2, INDIRECT_PARAM(file_flags, flags));
if (unlikely(fd2 < 0)) {
err = fd2;
put_filp(newfile1);
@@ -1411,7 +1416,7 @@
 */
__module_get(newsock->ops->owner);
 
-   newfd = sock_alloc_fd(&newfile);
+   newfd = sock_alloc_fd(&newfile, INDIRECT_PARAM(file_flags, flags));
if (unlikely(newfd < 0)) {
err = newfd;
sock_release(newsock);


[PATCHv5 1/5] actual sys_indirect code

2007-11-20 Thread Ulrich Drepper
This is the actual architecture-independent part of the system call
implementation.

 include/linux/indirect.h |   17 +
 include/linux/sched.h|4 
 include/linux/syscalls.h |4 
 kernel/Makefile  |3 +++
 kernel/indirect.c|   40 
 5 files changed, 68 insertions(+)


diff -u linux/include/linux/indirect.h linux/include/linux/indirect.h
--- linux/include/linux/indirect.h
+++ linux/include/linux/indirect.h
@@ -0,0 +1,17 @@
+#ifndef _LINUX_INDIRECT_H
+#define _LINUX_INDIRECT_H
+
+#include 
+
+
+/* IMPORTANT:
+   All the elements of this union must be neutral to the word size
+   and must not require reworking when used in compat syscalls.  Used
+   fixed-size types or types which are known to not vary in size across
+   architectures.  */
+union indirect_params {
+};
+
+#define INDIRECT_PARAM(set, name) current->indirect_params.set.name
+
+#endif
diff -u linux/kernel/Makefile linux/kernel/Makefile
--- linux/kernel/Makefile
+++ linux/kernel/Makefile
@@ -57,6 +57,7 @@
 obj-$(CONFIG_TASK_DELAY_ACCT) += delayacct.o
 obj-$(CONFIG_TASKSTATS) += taskstats.o tsacct.o
 obj-$(CONFIG_MARKERS) += marker.o
+obj-$(CONFIG_ARCH_HAS_INDIRECT_SYSCALLS) += indirect.o
 
 ifneq ($(CONFIG_SCHED_NO_NO_OMIT_FRAME_POINTER),y)
 # According to Alan Modra <[EMAIL PROTECTED]>, the -fno-omit-frame-pointer is
@@ -67,6 +68,8 @@
 CFLAGS_sched.o := $(PROFILING) -fno-omit-frame-pointer
 endif
 
+CFLAGS_indirect.o = -Wno-undef
+
 $(obj)/configs.o: $(obj)/config_data.h
 
 # config_data.h contains the same information as ikconfig.h but gzipped.
diff -u linux/kernel/indirect.c linux/kernel/indirect.c
--- linux/kernel/indirect.c
+++ linux/kernel/indirect.c
@@ -0,0 +1,40 @@
+#include 
+#include 
+#include 
+#include 
+
+
+asmlinkage long sys_indirect(struct indirect_registers __user *userregs,
+void __user *userparams, size_t paramslen,
+int flags)
+{
+   struct indirect_registers regs;
+   long result;
+
+   if (unlikely(flags != 0))
+   return -EINVAL;
+
+   if (copy_from_user(&regs, userregs, sizeof(regs)))
+   return -EFAULT;
+
+   switch (INDIRECT_SYSCALL (&regs))
+   {
+#define INDSYSCALL(name) __NR_##name
+#include 
+   break;
+
+   default:
+   return -EINVAL;
+   }
+
+   if (paramslen > sizeof(union indirect_params))
+   return -EINVAL;
+
+   result = -EFAULT;
+   if (!copy_from_user(&current->indirect_params, userparams, paramslen))
+   result = call_indirect(&regs);
+
+   memset(&current->indirect_params, '\0', paramslen);
+
+   return result;
+}
diff -u linux/include/linux/syscalls.h linux/include/linux/syscalls.h
--- linux/include/linux/syscalls.h
+++ linux/include/linux/syscalls.h
@@ -54,6 +54,7 @@
 struct compat_timeval;
 struct robust_list_head;
 struct getcpu_cache;
+struct indirect_registers;
 
 #include 
 #include 
@@ -611,6 +612,9 @@
const struct itimerspec __user *utmr);
 asmlinkage long sys_eventfd(unsigned int count);
 asmlinkage long sys_fallocate(int fd, int mode, loff_t offset, loff_t len);
+asmlinkage long sys_indirect(struct indirect_registers __user *userregs,
+void __user *userparams, size_t paramslen,
+int flags);
 
 int kernel_execve(const char *filename, char *const argv[], char *const envp[]);
 
--- linux/include/linux/sched.h
+++ linux/include/linux/sched.h
@@ -80,6 +80,7 @@ struct sched_param {
 #include 
 #include 
 #include 
+#include 
 
 #include 
 #include 
@@ -1174,6 +1175,9 @@ struct task_struct {
int make_it_fail;
 #endif
struct prop_local_single dirties;
+
+   /* Additional system call parameters.  */
+   union indirect_params indirect_params;
 };
 
 /*


[PATCHv5 0/5] sys_indirect system call

2007-11-20 Thread Ulrich Drepper
The following patches provide an alternative implementation of the
sys_indirect system call which has been discussed a few times.
This is a system call that allows us to extend existing system call
interfaces by adding more system call parameters.

Davide's previous implementation is IMO far more complex than
warranted.  This code here is trivial, as you can see.  I've
discussed this approach with Linus recently and for a brief moment
we actually agreed on something.

We pass an additional block of data to the kernel, it is copied into
the task_struct, and then it is up to the function implementing the system
call to interpret the data.  Each system call that is meant to be
extended this way has to be white-listed in sys_indirect.  The
alternative is to filter out those system calls which absolutely cannot
be handled using sys_indirect (like clone, execve) since they require
the stack layout of an ordinary system call.  That is more dangerous
since it is too easy to miss a call.

Note that the sys_indirect system call takes an additional parameter which
is for now forced to be zero.  This parameter is meant to enable the use
of sys_indirect to create syslets, asynchronously executed system calls.
This syslet approach is also the main reason for the interface in the form
proposed here.
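
The flow described above can be sketched in userspace as follows.  This
is a mock under stated assumptions, not the kernel code: `task_params`
stands in for the field in task_struct, memcpy replaces
copy_from_user, and the `MOCK_NR_*` numbers, `mock_sys_socket()`, and
`mock_sys_indirect()` are hypothetical names used only here.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

union indirect_params {
  struct {
    int flags;
  } file_flags;
};

/* Mock of the per-task storage that lives in task_struct.  */
static union indirect_params task_params;

/* Hypothetical syscall numbers for the mock.  */
enum { MOCK_NR_socket = 1, MOCK_NR_execve = 2 };

/* Stand-in for the wrapped system call: it reports the extra flags it
   found in the per-task block.  */
static long mock_sys_socket(void)
{
  return task_params.file_flags.flags;
}

static long mock_sys_indirect(int nr, const void *params, size_t paramslen,
                              int flags)
{
  long result;

  if (flags != 0)               /* reserved for the syslet use */
    return -EINVAL;

  switch (nr) {
  case MOCK_NR_socket:          /* white-listed */
    break;
  default:                      /* everything else is refused */
    return -EINVAL;
  }

  if (paramslen > sizeof(task_params))
    return -EINVAL;

  memcpy(&task_params, params, paramslen);   /* copy the extra block in */
  result = mock_sys_socket();                /* run the real call */
  memset(&task_params, 0, paramslen);        /* clear it again */

  return result;
}

static void demo(void)
{
  union indirect_params p = { .file_flags = { .flags = 5 } };

  assert(mock_sys_indirect(MOCK_NR_socket, &p, sizeof(p), 0) == 5);
  assert(task_params.file_flags.flags == 0);
  assert(mock_sys_indirect(MOCK_NR_execve, &p, sizeof(p), 0) == -EINVAL);
  assert(mock_sys_indirect(MOCK_NR_socket, &p, sizeof(p), 1) == -EINVAL);
}
```

Clearing the block after the call is what keeps a later, ordinary
system call from seeing stale extra parameters.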

The code for x86 and x86-64 gets by without a single line of assembly
code.  This is likely to be true for many other archs as well.
There is architecture-dependent code, though.

The last three patches show the first application of the functionality.
They also show a complication: we need the test for valid sub-syscalls in the
main implementation and in the compatibility code.  And more: the actual
sources and generated binary for the test are very different (the numbers
differ).  Duplicating the information is a big problem, though.  I've used
some macro tricks to avoid this.  All the information about the flags and
the system calls using them is concentrated in one header.  This should
keep maintenance bearable.
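
The patches do this by including one header with different definitions
of INDSYSCALL.  The single-file sketch below uses a list macro instead,
which is a swapped-in technique that shows the same idea: one list of
names, expanded once per numbering scheme.  All names and all syscall
numbers here are made up for the illustration, and the `#if` handling
for calls that exist on only one ABI is left out.

```c
#include <assert.h>

/* The one place where the extendable system calls are listed.  */
#define INDIRECT_SYSCALL_LIST(X) \
  X(accept)                      \
  X(socket)                      \
  X(socketpair)

/* Illustrative numbers only: the two tables disagree on purpose.  */
enum { NATIVE_NR_accept = 43, NATIVE_NR_socket = 41,
       NATIVE_NR_socketpair = 53 };
enum { COMPAT_NR_accept = 143, COMPAT_NR_socket = 141,
       COMPAT_NR_socketpair = 153 };

#define NATIVE_CASE(name) case NATIVE_NR_##name:
#define COMPAT_CASE(name) case COMPAT_NR_##name:

/* White-list check for the native entry point.  */
static int native_allowed(int nr)
{
  switch (nr) {
  INDIRECT_SYSCALL_LIST(NATIVE_CASE)
    return 1;
  default:
    return 0;
  }
}

/* Same list, different numbers, for the compat entry point.  */
static int compat_allowed(int nr)
{
  switch (nr) {
  INDIRECT_SYSCALL_LIST(COMPAT_CASE)
    return 1;
  default:
    return 0;
  }
}

static void demo(void)
{
  assert(native_allowed(NATIVE_NR_socket));
  assert(!native_allowed(COMPAT_NR_socket));
  assert(compat_allowed(COMPAT_NR_socket));
  assert(!compat_allowed(NATIVE_NR_socket));
}
```

Adding a system call then means touching only the list, and both checks
pick it up.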

This patch to use sys_indirect is just the beginning.  More will follow,
but I want to see how these patches are received before I spend more time
on it.  This code is enough to test the implementation with the following
test program.  Adjust it for architectures other than x86 and x86-64.

What is not addressed are differences in opinion about the whole approach.
Maybe Linus can chime in and defend what is basically his design.


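#include <fcntl.h>
#include <signal.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <sys/syscall.h>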

typedef uint32_t __u32;
typedef uint64_t __u64;

union indirect_params {
  struct {
int flags;
  } file_flags;
};

#ifdef __x86_64__
# define __NR_indirect 286
struct indirect_registers {
  __u64 rax;
  __u64 rdi;
  __u64 rsi;
  __u64 rdx;
  __u64 r10;
  __u64 r8;
  __u64 r9;
};
#elif defined __i386__
# define __NR_indirect 325
struct indirect_registers {
  __u32 eax;
  __u32 ebx;
  __u32 ecx;
  __u32 edx;
  __u32 esi;
  __u32 edi;
  __u32 ebp;
};
#else
# error "need to define __NR_indirect and struct indirect_registers"
#endif

#define FILL_IN(var, values...) \
  var = (struct indirect_registers) { values }

int
main (void)
{
  int fd = socket (AF_INET, SOCK_DGRAM, IPPROTO_IP);
  int s1 = fcntl (fd, F_GETFD);
  int t1 = fcntl (fd, F_GETFL);
  printf ("old: FD_CLOEXEC %s set, NONBLOCK %s set\n",
  s1 == 0 ? "not" : "is", (t1 & O_NONBLOCK) ? "is" : "not");
  close (fd);

  union indirect_params i;
  memset(&i, '\0', sizeof(i));
  i.file_flags.flags = O_CLOEXEC|O_NONBLOCK;

  struct indirect_registers r;
#ifdef __NR_socketcall
# define SOCKOP_socket   1
  long args[3] = { AF_INET, SOCK_DGRAM, IPPROTO_IP };
  FILL_IN (r, __NR_socketcall, SOCKOP_socket, (long) args);
#else
  FILL_IN (r, __NR_socket, AF_INET, SOCK_DGRAM, IPPROTO_IP);
#endif

  fd = syscall (__NR_indirect, &r, &i, sizeof (i), 0);
  int s2 = fcntl (fd, F_GETFD);
  int t2 = fcntl (fd, F_GETFL);
  printf ("new: FD_CLOEXEC %s set, NONBLOCK %s set\n",
  s2 == 0 ? "not" : "is", (t2 & O_NONBLOCK) ? "is" : "not");
  close (fd);

  i.file_flags.flags = O_CLOEXEC;
  sigset_t ss;
  sigemptyset(&ss);
  FILL_IN(r, __NR_signalfd, -1, (long) &ss, 8);
  fd = syscall (__NR_indirect, &r, &i, sizeof (i), 0);
  int s3 = fcntl (fd, F_GETFD);
  printf ("signalfd: FD_CLOEXEC %s set\n", s3 == 0 ? "not" : "is");
  close (fd);

  FILL_IN(r, __NR_eventfd, 8);
  fd = syscall (__NR_indirect, &r, &i, sizeof (i), 0);
  int s4 = fcntl (fd, F_GETFD);
  printf ("eventfd: FD_CLOEXEC %s set\n", s4 == 0 ? "not" : "is");
  close (fd);

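  /* F_GETFL also reports the access mode, so test only the O_NONBLOCK bit.  */
  return s1 != 0 || s2 == 0
         || (t1 & O_NONBLOCK) != 0 || (t2 & O_NONBLOCK) == 0
         || s3 == 0 || s4 == 0;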
}

[PATCHv5 2/5] x86&x86-64 support for sys_indirect

2007-11-20 Thread Ulrich Drepper
This part adds support for sys_indirect on x86 and x86-64.

 arch/x86/Kconfig   |3 ++
 arch/x86/ia32/Makefile |1 
 arch/x86/ia32/ia32entry.S  |2 +
 arch/x86/ia32/sys_ia32.c   |   38 +
 arch/x86/kernel/syscall_table_32.S |1 
 include/asm-x86/indirect.h |5 
 include/asm-x86/indirect_32.h  |   25 
 include/asm-x86/indirect_64.h  |   36 +++
 include/asm-x86/unistd_32.h|3 +-
 include/asm-x86/unistd_64.h|2 +
 10 files changed, 115 insertions(+), 1 deletion(-)


--- linux/arch/x86/Kconfig
+++ linux/arch/x86/Kconfig
@@ -112,6 +112,9 @@ config GENERIC_TIME_VSYSCALL
bool
default X86_64
 
+config ARCH_HAS_INDIRECT_SYSCALLS
+   def_bool y
+
 
 
 
diff -u linux/include/asm-x86/indirect_32.h linux/include/asm-x86/indirect_32.h
--- linux/include/asm-x86/indirect_32.h
+++ linux/include/asm-x86/indirect_32.h
@@ -0,0 +1,25 @@
+#ifndef _ASM_X86_INDIRECT_32_H
+#define _ASM_X86_INDIRECT_32_H
+
+struct indirect_registers {
+   __u32 eax;
+   __u32 ebx;
+   __u32 ecx;
+   __u32 edx;
+   __u32 esi;
+   __u32 edi;
+   __u32 ebp;
+};
+
+#define INDIRECT_SYSCALL(regs) (regs)->eax
+
+static inline long call_indirect(struct indirect_registers *regs)
+{
+  extern long (*sys_call_table[]) (__u32, __u32, __u32, __u32, __u32, __u32);
+
+  return sys_call_table[INDIRECT_SYSCALL(regs)](regs->ebx, regs->ecx,
+   regs->edx, regs->esi,
+   regs->edi, regs->ebp);
+}
+
+#endif
diff -u linux/include/asm-x86/indirect_64.h linux/include/asm-x86/indirect_64.h
--- linux/include/asm-x86/indirect_64.h
+++ linux/include/asm-x86/indirect_64.h
@@ -0,0 +1,36 @@
+#ifndef _ASM_X86_INDIRECT_64_H
+#define _ASM_X86_INDIRECT_64_H
+
+struct indirect_registers {
+   __u64 rax;
+   __u64 rdi;
+   __u64 rsi;
+   __u64 rdx;
+   __u64 r10;
+   __u64 r8;
+   __u64 r9;
+};
+
+struct indirect_registers32 {
+   __u32 eax;
+   __u32 ebx;
+   __u32 ecx;
+   __u32 edx;
+   __u32 esi;
+   __u32 edi;
+   __u32 ebp;
+};
+
+#define INDIRECT_SYSCALL(regs) (regs)->rax
+#define INDIRECT_SYSCALL32(regs) (regs)->eax
+
+static inline long call_indirect(struct indirect_registers *regs)
+{
+  extern long (*sys_call_table[]) (__u64, __u64, __u64, __u64, __u64, __u64);
+
+  return sys_call_table[INDIRECT_SYSCALL(regs)](regs->rdi, regs->rsi,
+   regs->rdx, regs->r10,
+   regs->r8, regs->r9);
+}
+
+#endif
diff -u linux/arch/x86/ia32/sys_ia32.c linux/arch/x86/ia32/sys_ia32.c
--- linux/arch/x86/ia32/sys_ia32.c
+++ linux/arch/x86/ia32/sys_ia32.c
@@ -889,0 +890,38 @@
+
+asmlinkage long sys32_indirect(struct indirect_registers32 __user *userregs,
+  void __user *userparams, size_t paramslen,
+  int flags)
+{
+   extern long (*ia32_sys_call_table[])(u32, u32, u32, u32, u32, u32);
+
+   struct indirect_registers32 regs;
+   long result;
+
+   if (flags != 0)
+       return -EINVAL;
+
+   if (copy_from_user(&regs, userregs, sizeof(regs)))
+       return -EFAULT;
+
+   switch (INDIRECT_SYSCALL32(&regs))
+   {
+#define INDSYSCALL(name) __NR_ia32_##name
+#include 
+       break;
+
+   default:
+       return -EINVAL;
+   }
+
+   if (paramslen > sizeof(union indirect_params))
+       return -EINVAL;
+   result = -EFAULT;
+   if (!copy_from_user(&current->indirect_params, userparams, paramslen))
+       result = ia32_sys_call_table[regs.eax](regs.ebx, regs.ecx,
+                                              regs.edx, regs.esi,
+                                              regs.edi, regs.ebp);
+
+   memset(&current->indirect_params, '\0', paramslen);
+
+   return result;
+}
--- linux/arch/x86/ia32/Makefile
+++ linux/arch/x86/ia32/Makefile
@@ -36,6 +36,7 @@ $(obj)/vsyscall-sysenter.so.dbg 
$(obj)/vsyscall-syscall.so.dbg: \
 $(obj)/vsyscall-%.so.dbg: $(src)/vsyscall.lds $(obj)/vsyscall-%.o FORCE
$(call if_changed,syscall)
 
+CFLAGS_sys_ia32.o = -Wno-undef
 AFLAGS_vsyscall-sysenter.o = -m32 -Wa,-32
 AFLAGS_vsyscall-syscall.o = -m32 -Wa,-32
 
--- linux/arch/x86/ia32/ia32entry.S
+++ linux/arch/x86/ia32/ia32entry.S
@@ -400,6 +400,7 @@ END(ia32_ptregs_common)
 
.section .rodata,"a"
.align 8
+   .globl ia32_sys_call_table
 ia32_sys_call_table:
.quad sys_restart_syscall
.quad sys_exit
@@ -726,4 +727,5 @@ ia32_sys_call_table:
.quad compat_sys_timerfd
.quad sys_eventfd
.quad sys32_fallocate
+   .quad sys32_indirect/* 325  */
 ia32_syscall_end:
--- linux/arch/x86/kernel/syscall_table_32.S
+++ linux/arch/x86

Re: Where is the new timerfd?

2007-11-23 Thread Ulrich Drepper
On Nov 23, 2007 9:29 AM, Davide Libenzi <[EMAIL PROTECTED]> wrote:
> Yes, it's disabled, and yes, I'll repost today ...

I haven't seen the patch and don't feel like searching.  So I say it
here: please make sure you add a flags parameter to the system call
itself (instead of adding it on as for eventfd and signalfd).  We need
to be able to use O_CLOEXEC some way or another.
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCHv5 4/5] Allow setting O_NONBLOCK flag for new sockets

2007-11-23 Thread Ulrich Drepper
Eric Dumazet wrote:
> 1) Can the fd passing with recvmsg() on AF_UNIX also gets O_CLOEXEC
> support ?

Already there, see MSG_CMSG_CLOEXEC.


> 2) Why this O_NONBLOCK ability is needed for sockets ? Is it a security
> issue, and if yes could you remind it to me ?

No security issue.  But look at any correct network program, all need to
set the mode to non-blocking.  Adding this support to the syscall comes
at almost no cost and it cuts the cost for every program down by one or
two syscalls.
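
To make the "one or two syscalls" concrete, here is a minimal sketch of what a program has to do after socket() when the flag cannot be passed at creation time.  The helper name make_nonblocking is made up for this example.

```c
#include <fcntl.h>

/* Hypothetical helper: switch an already-open descriptor to
   non-blocking mode.  This costs two fcntl() syscalls on top of the
   socket() call that created the descriptor.  */
static int
make_nonblocking (int fd)
{
  int fl = fcntl (fd, F_GETFL);                  /* extra syscall #1 */
  if (fl == -1)
    return -1;
  return fcntl (fd, F_SETFL, fl | O_NONBLOCK);   /* extra syscall #2 */
}
```

A caller would use it as fd = socket (...); make_nonblocking (fd);.  A program that knows no other status flags matter can skip the F_GETFL step, which is where "one or two" comes from.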

--
➧ Ulrich Drepper ➧ Red Hat, Inc. ➧ 444 Castro St ➧ Mountain View, CA ❖


Re: [PATCHv5 4/5] Allow setting O_NONBLOCK flag for new sockets

2007-11-24 Thread Ulrich Drepper
On Nov 24, 2007 12:28 AM, Eric Dumazet <[EMAIL PROTECTED]> wrote:
> OK, but maybe for consistency, we might accept the two mechanisms.

It's not a question of the kernel interface.  The issue with all these
extensions is the userlevel interface.  Ideally no new userlevel
interface is needed.  This is the case for open() and incidentally
also for this case (through the flags parameter for recvmsg).  For
socket(), accept(), the situation is unfortunately different and we
need a new interface.

With your proposed patch, we would have to introduce another recvmsg()
interface to take advantage of the additional functionality.  This
just doesn't make any sense.  This is no contest in aesthetics.  You
first have to think about the interface presented to the programmer at
userlevel and then design the syscall interface.  This is how
MSG_CMSG_CLOEXEC came about.


Re: [PATCHv4 5/6] Allow setting O_NONBLOCK flag for new sockets

2007-11-26 Thread Ulrich Drepper
H. Peter Anvin wrote:
> The 6-word limit is a red herring.  There is at least two ways to deal
> with it (and this doesn't mean wiping the legacy stuff we already have):
> 
> - Let each architecture pick a calling convention and redefine the
> architecture-independent bits to take an arbitrary number of arguments.
>  This is a one-time panarchitectural change.
> [...]

Just think beyond wishful thinking for a moment.  What does it take to
come up with something completely new and grand?

Let's start with the basics: you need to signal that the new syscall
calling convention is used.  Since the syscall entry code is limited (at
least the likes of syscall/sysenter, it would be easy enough to use int
$0x81 in addition to int $0x80) you would have to extend the use of the
syscall number while keeping binary compatibility.  This means
additional costs for every single syscall.

Once you're past that, how do you implement the expandable syscall
parameter count?  There are two ways:

- pass to the real sys_* implementations the number of provided syscall
parameters and have each function figure out what this means

- dynamically construct a call to the sys_* functions where the syscall
magic adds an appropriate number of parameters filled with zeros.  This
is quite complicated and, more importantly, it requires that you have
code/data somewhere which specifies how many parameters each of the
sys_* functions actually requires.  The actual sys_* code and the data
have to be kept in sync at all times.  A maintenance nightmare.
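
A userlevel sketch of the second variant, only to show where the duplicated information lives.  Nothing here is kernel code; MAX_ARGS, struct sys_entry, demo_sum, demo_table and dispatch are all made up for the illustration.

```c
#include <stddef.h>

#define MAX_ARGS 8

typedef long (*sys_fn) (long, long, long, long, long, long, long, long);

/* The table has to repeat, for every entry, how many parameters the
   function really takes.  This is the data that must be kept in sync
   with the code by hand.  */
struct sys_entry {
  sys_fn fn;
  unsigned nargs;
};

/* Stand-in for a sys_* function that takes all eight parameters.  */
static long
demo_sum (long a, long b, long c, long d, long e, long f, long g, long h)
{
  return a + b + c + d + e + f + g + h;
}

static const struct sys_entry demo_table[] = {
  { demo_sum, 8 },
};

/* Call entry NR with PROVIDED arguments; the missing ones are filled
   with zeros.  Returns -1 for an unknown entry or too many arguments.  */
static long
dispatch (unsigned nr, const long *user_args, unsigned provided)
{
  long a[MAX_ARGS] = { 0 };

  if (nr >= sizeof (demo_table) / sizeof (demo_table[0]))
    return -1;
  const struct sys_entry *e = &demo_table[nr];
  if (provided > e->nargs)
    return -1;
  for (unsigned i = 0; i < provided; ++i)
    a[i] = user_args[i];
  return e->fn (a[0], a[1], a[2], a[3], a[4], a[5], a[6], a[7]);
}
```

If somebody adds a parameter to demo_sum and forgets to update the 8 in demo_table, nothing in the compiler catches it.  Multiply that by every syscall and every architecture.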


The handling of syscalls with many parameters should not be a
driver of this design at all.  Syscalls shouldn't be that complicated, I
completely agree with Ingo.


I'm perfectly willing to give you the benefit of the doubt: show us a design
for what you're proposing which is not slower than the current code,
doesn't impact existing code, and solves the problem in a nice and clean
way.  I cannot really see it now but I might be missing something.  The
sys_indirect approach ain't pretty but it does its job, doesn't impact
performance, and is expandable in a direction we *know* we will want to go
very soon.

--
➧ Ulrich Drepper ➧ Red Hat, Inc. ➧ 444 Castro St ➧ Mountain View, CA ❖


Re: [PATCHv4 5/6] Allow setting O_NONBLOCK flag for new sockets

2007-11-26 Thread Ulrich Drepper
H. Peter Anvin wrote:
> No.
> 
> I already said I'm not looking at changing the calling convention for
> existing syscalls.

I did not suggest or ask for that at all.

I was asking you to consider the real implementation details for a new
syscall mechanism.

We do not want to abandon the use of syscall/sysenter and go back to int
(on x86/x86-64).  This means that you have to come up with a mechanism
which hooks into the current syscall/sysenter path while preserving full
backward compatibility.

Now it's your turn.  How do you do this without additional costs?


> Hardly so, as evidenced by the fact that we have successfully done so
> for 15 years already; a number of Linux architectures require this
> information for the existing system calls.

Nothing at this scale is there at the moment, as far as I can see.  And
nothing so critical to get right.

Talk is cheap.  You still haven't shown one bit of design for how you want
to achieve your grand goal.  The time for hand-waving is over.  Do some
work or step out of the way.  Nothing you have said so far convinces me
in the least, and your arguments like "sys_indirect adds parameters" are
not really contested.  Yes, that's what sys_indirect does.  So what?  It
does this with almost no cost, which outweighs the ugliness factor in my
book.

--
➧ Ulrich Drepper ➧ Red Hat, Inc. ➧ 444 Castro St ➧ Mountain View, CA ❖


Re: [RFC] Per-thread getrusage

2008-01-17 Thread Ulrich Drepper
Vinay Sridhar wrote:
> There are two ways to implement this in the kernel:
> 1) Introduce an additional parameter 'tid' to sys_getrusage() and put
> code in glibc to handle getrusage() and pthread_getrusage() calls
> correctly.
> 2) Introduce a new system call to handle pthread_getrusage() and leave
> sys_getrusage() untouched.

You're doing two things at once:

a) provide a way to get a thread's usage

b) provide a way to get another process's/thread's usage


The former is a trivial extension and I completely agree.  RUSAGE_THREAD
is trivial to implement and should go in ASAP.
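
A sketch of what the userlevel side of part a) looks like, assuming RUSAGE_THREAD is accepted as a third "who" value next to RUSAGE_SELF and RUSAGE_CHILDREN (the value 1 in the fallback define is the one Linux headers use).  The helper name thread_cpu_usec is made up.

```c
#ifndef _GNU_SOURCE
# define _GNU_SOURCE          /* RUSAGE_THREAD is a Linux extension */
#endif
#include <sys/resource.h>
#include <sys/time.h>

#ifndef RUSAGE_THREAD
# define RUSAGE_THREAD 1      /* fallback if the headers hide it */
#endif

/* Hypothetical helper: CPU time (user + system) consumed so far by the
   calling thread only, in microseconds, or -1 on error.  */
static long long
thread_cpu_usec (void)
{
  struct rusage ru;

  if (getrusage (RUSAGE_THREAD, &ru) != 0)
    return -1;
  return (ru.ru_utime.tv_sec + ru.ru_stime.tv_sec) * 1000000LL
         + ru.ru_utime.tv_usec + ru.ru_stime.tv_usec;
}
```

No new function and no TID parameter is involved; each thread asks about itself, which is why this half raises no security question.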

The second part isn't that easy.  The first question is: do we really
need this?  It is a new type of interface.  We have the /proc filesystem
etc for programs which want to look at other process' data.  Second,
more importantly right now, your patch seems not to include any security
support.  Correct me if I'm wrong, but find_task_by_pid will always
succeed, regardless of whether the calling thread belongs to another UID
or not.  I.e., your patch enables any process to read any other process'
usage.  That's a no-no.


I suggest that you split the patch in two.  The first should implement
RUSAGE_THREAD.  You'll immediately get an ACK from me for that.  The
second part then should introduce a way to get another process' usage.
This patch should only be used initially as a starting point for
discussions.  You'll have to argue why it is necessary in the first place.

The argument might have to do with why you want a pthread_getrusage()
interface (which, btw, is a bad name since the interface is nothing like
getrusage; getrusage doesn't allow requesting any other process' data).
Yes, for intra-process lookups relying on /proc is not a good idea.  But
then, I have not seen any reason so far why such an API is needed and
why a thread cannot just be responsible for reading its own usage data.
Anyway, if pthread_getrusage (or whatever it'll be called) is the only
use then the syscall should require that the TID parameter is from a
thread in the same process, which would solve the security problem.

--
➧ Ulrich Drepper ➧ Red Hat, Inc. ➧ 444 Castro St ➧ Mountain View, CA ❖


Re: sigwait() and 2.6

2005-02-15 Thread Ulrich Drepper
On Tue, 15 Feb 2005 13:58:28 +0100, Yves Crespin
<[EMAIL PROTECTED]> wrote:
>ThreadUnblockSignal();
>signo = WaitSignal();
>ThreadBlockSignal();

You expect this to work?  Just read the POSIX spec or even the man
pages.  All signals sigwait() waits for must be blocked before the
call.  You deliberately do the opposite.  Swap the ThreadUnblockSignal
and ThreadBlockSignal lines and suddenly the program doesn't crash
anymore.


Re: close-exec flag not working in 2.6.9?

2005-01-31 Thread Ulrich Drepper
On Sun, 30 Jan 2005 23:56:07 -0800, Ben Greear <[EMAIL PROTECTED]> wrote:
>flags = fcntl(s, F_GETFL);
>flags |= (FD_CLOEXEC);
>if (fcntl(s, F_SETFL, flags) < 0) {

These have to be F_GETFD and F_SETFD respectively.  Note L -> D.
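
For reference, a corrected version of the quoted fragment as a small sketch.  The helper name set_cloexec is made up.

```c
#include <fcntl.h>

/* Hypothetical helper: mark FD close-on-exec.  FD_CLOEXEC is a file
   descriptor flag, so it is read and written with F_GETFD/F_SETFD.
   F_GETFL/F_SETFL handle file status flags such as O_NONBLOCK.  */
static int
set_cloexec (int fd)
{
  int flags = fcntl (fd, F_GETFD);

  if (flags == -1)
    return -1;
  return fcntl (fd, F_SETFD, flags | FD_CLOEXEC);
}
```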


Re: short read from /dev/urandom

2005-01-15 Thread Ulrich Drepper
Matt Mackall wrote:
> _Neither_ case mentions signals and the "and will return as many bytes
> as requested" is clearly just a restatement of "does not have this
> limit". Whoever copied this comment to the manpage was a bit sloppy
> and dropped the first clause rather than the second:

It still means the documented API says there are no short reads.

> So anyone doing a read() can expect a short read regardless of the fd
> and is quite clear that reads can be interrupted by signals. "It is
> not an error". Ever.

Of course, signal interruptions are wrong if the signal uses SA_RESTART.
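
Whichever reading of the documentation is right, a caller can protect itself with a loop like the following sketch.  The helper name read_full is made up, and the loop is the generic defensive pattern, not something specific to /dev/urandom.

```c
#include <errno.h>
#include <unistd.h>

/* Hypothetical helper: read until LEN bytes have arrived, the file
   ends, or a real error occurs.  Returns the number of bytes read or
   -1 on error.  */
static ssize_t
read_full (int fd, void *buf, size_t len)
{
  size_t done = 0;

  while (done < len)
    {
      ssize_t n = read (fd, (char *) buf + done, len - done);
      if (n < 0)
        {
          if (errno == EINTR)
            continue;           /* handler installed without SA_RESTART */
          return -1;
        }
      if (n == 0)
        break;                  /* end of file */
      done += (size_t) n;
    }
  return (ssize_t) done;
}
```

With SA_RESTART handlers the EINTR branch should never be taken, which is the point made above; the short-read branch is what the disagreement is about.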
--
➧ Ulrich Drepper ➧ Red Hat, Inc. ➧ 444 Castro St ➧ Mountain View, CA ❖




Re: Pollable Semaphores

2005-01-21 Thread Ulrich Drepper
On Fri, 21 Jan 2005 17:17:51 -0600, Brent Casavant <[EMAIL PROTECTED]> wrote:

>   2. select/poll on the fd return EWOULDBLOCK if the current value of
>  the futex is not equal to the value of interest.  Otherwise it
>  behaves as FUTEX_FD currently does.

This is the problematic part.  The expected value, as you suggested,
can be handled with a write() and since the expected value is often
constant, this is a low-overhead method.

But the poll() interface is not so easy.  You cannot change the poll()
semantics to return such an error.  It really makes no sense.

What I thought could be done is to define instead a new POLL* constant
which signals the EWOULDBLOCK condition of the futex() syscall in the
revents member.  The poll/epoll syscall would do its normal work and
just fill all the appropriate revents.  A futex value mismatch would
mean the call is not blocking at all, just as available data would be
for POLLIN.

For select, I would use the exception bitmap.  The bit is set for
futex fds in the EWOULDBLOCK case.

All this _could_ work.  But we've been bitten quite a few times in the
past.  There might be special cases which may need at least some
additional functionality.  This should be taken into account in the
original design.

So, if people are interested in this, code something up and try it.
Stress it as much as you can.  I would oppose adding any new futex
interface created on a hunch if I were Andrew.

And there is another thing to consider.  There is at least one other event
which should be pollable: process (maybe thread) deaths.  I was
hoping that we get support for this, perhaps in the form of polling
the /proc/PID directory.  For poll(), a POLLERR value could mean the
process/thread died.  For select(), once again a bit in the except
array could be set.

