Re: [Qemu-devel] glibc "linux: spawni.c: simplify error reporting to parent" breaks qemu-user/Windows Service For Linux

Adhemerval Zanella Mon, 27 Nov 2017 08:08:26 -0800


On 27/11/2017 13:24, Peter Maydell wrote:
> On 27 November 2017 at 12:57, Adhemerval Zanella
> <adhemerval.zane...@linaro.org> wrote:
>> We found this potentially bogus assert during 2.27 development [1],
>> which resulted in two fixes [2][3].
>>
>> It should not be an issue for generic posix_spawn usage, where there
>> is no expectation that the system, user, or program kills random pids
>> (since the posix_spawn auxiliary process has not yet returned). Some
>> say this kind of behaviour is undefined anyway, but it still shouldn't
>> trigger an assert.
>>
>> I am not really sure what is happening in qemu usermode, because
>> comment #4 in the bug reports states that clone is returning an error,
>> which should not trigger the assert in the first place. What seems to
>> be happening in this scenario is that clone actually returns success,
>> but the auxiliary process is killed before it calls execve.
> 
> The bug report is a bit confused, but I think what is happening
> in the QEMU case is that QEMU implements clone(CLONE_VFORK) as having
> the same semantics as fork() (ie the parent will not autowait for
> the child, and the child does not share a memory map with the parent).
> (ie QEMU treats it as having the semantics of a vfork() call, which
> is allowed to be implemented as fork()).


Right, that explains what is happening. 

> Previous versions of glibc's posix_spawn() could cope with this
> divergence from the kernel's native clone() behaviour, but the
> rewrite can't. It's not unreasonable for glibc to rely on the
> kernel behaviour, but on the other hand it's not too surprising
> if this breaks non-kernel implementations of the syscall ABI
> like QEMU and the MS Linux subsystem, because it's a tricky
> corner case that previously nobody was trying to use.

The problem is that vfork is such a broken API [1] that POSIX removed
it in the 2008 edition of the standard. GLIBC's posix_spawn used it
only in specific cases (the old POSIX_SPAWN_USEVFORK flag), because it
was 'faster' than fork; however, it also created its own set of bugs
[2][3][4][5].

The current implementation is as fast as vfork on Linux, while relying
on clone flags and assumptions that should be platform neutral (in
fact, we found that Linux does not behave as expected for the sequence
clone (CLONE_VFORK | CLONE_VM) -> exit -> waitpid (WNOHANG), which
resulted in commit aa95a2414).
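For readers less familiar with the pattern, here is a minimal sketch of
spawning with clone (CLONE_VM | CLONE_VFORK). It is not glibc's code:
spawn_clone_vfork and struct spawn_args are hypothetical names, and the
real implementation also blocks signals, uses an mmap'd stack, and runs
the file actions. The point it shows is that the child reports an
execve failure by writing errno into memory the parent can see only
because of CLONE_VM, and CLONE_VFORK guarantees the parent does not
read it until the child has exec'd or exited.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <sched.h>
#include <signal.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical argument block shared with the child through CLONE_VM. */
struct spawn_args {
    const char *path;
    char *const *argv;
    char *const *envp;
    int err;               /* stays 0 unless execve fails */
};

static int child_fn(void *p)
{
    struct spawn_args *args = p;
    execve(args->path, args->argv, args->envp);
    /* Only reached on failure. With CLONE_VM this store lands in the
       parent's memory; CLONE_VFORK keeps the parent suspended until
       we exit. */
    args->err = errno;
    _exit(127);
}

/* Returns the child's pid (> 0) on success, or -errno on failure. */
static pid_t spawn_clone_vfork(const char *path, char *const argv[],
                               char *const envp[])
{
    enum { STACK_SIZE = 64 * 1024 };
    char *stack = malloc(STACK_SIZE);
    if (stack == NULL)
        return -ENOMEM;

    struct spawn_args args = { path, argv, envp, 0 };
    /* The stack grows down on most architectures, so pass its top. */
    pid_t pid = clone(child_fn, stack + STACK_SIZE,
                      CLONE_VM | CLONE_VFORK | SIGCHLD, &args);
    int clone_errno = errno;
    /* Safe: the child has either exec'd (new address space) or exited. */
    free(stack);

    if (pid < 0)
        return -clone_errno;
    if (args.err != 0) {
        waitpid(pid, NULL, 0);   /* reap the failed child */
        return -args.err;
    }
    return pid;
}
```

On a native Linux kernel a missing binary is reported as -ENOENT, while
a successful exec returns the pid for the caller to wait on.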

GLIBC also maintains another implementation in
sysdeps/posix/spawni.c, which should be more platform neutral since it
relies only on semantics POSIX guarantees (synchronization is done with
a pipe2 pipe instead of CLONE_VM, so a vfork that behaves like fork
shouldn't be a problem). It is not currently used by any architecture
in GLIBC.
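A minimal sketch of the pipe-based technique, assuming a Linux/glibc
host (pipe2 is a GNU extension). spawn_pipe is a hypothetical name and
this is not the code in sysdeps/posix/spawni.c. Because the error
travels through a pipe opened with O_CLOEXEC rather than through shared
memory, the parent gets the same answer whether or not the child
shares its memory: EOF means execve succeeded (the exec closed the
write end), and a full int means it failed.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Returns the child's pid (> 0) on success, or -errno on failure. */
static pid_t spawn_pipe(const char *path, char *const argv[],
                        char *const envp[])
{
    int fds[2];
    /* O_CLOEXEC: a successful execve closes the write end for us. */
    if (pipe2(fds, O_CLOEXEC) != 0)
        return -errno;

    /* Nothing below depends on the child sharing our memory. */
    pid_t pid = fork();
    if (pid < 0) {
        int e = errno;
        close(fds[0]);
        close(fds[1]);
        return -e;
    }
    if (pid == 0) {
        close(fds[0]);
        execve(path, argv, envp);
        /* execve failed: report errno through the pipe. */
        int e = errno;
        ssize_t unused = write(fds[1], &e, sizeof e);
        (void)unused;
        _exit(127);
    }

    close(fds[1]);
    int child_err = 0;
    ssize_t n;
    do {
        n = read(fds[0], &child_err, sizeof child_err);
    } while (n < 0 && errno == EINTR);
    close(fds[0]);

    if (n == (ssize_t)sizeof child_err) {
        waitpid(pid, NULL, 0);   /* reap the failed child */
        return -child_err;
    }
    /* EOF: execve succeeded. (A short or failed read is also treated
       as success in this sketch.) */
    return pid;
}
```

The cost is an extra pipe and its file descriptors per spawn, which is
the kind of resource difference the Linux-specific version avoids.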

However, I am not very inclined to change the internal posix_spawn
on Linux, mainly because it uses slightly fewer resources than the
generic POSIX one (see commit e83be730910c) and it works as expected
on the Linux kernel.

[1] https://ewontfix.com/7/
[2] https://sourceware.org/bugzilla/show_bug.cgi?id=14750
[3] https://sourceware.org/bugzilla/show_bug.cgi?id=14749
[4] https://sourceware.org/bugzilla/show_bug.cgi?id=14499
[5] https://sourceware.org/bugzilla/show_bug.cgi?id=10354

> 
> Unfortunately I can't really think of a mechanism for implementing
> this in QEMU usermode, because the only tools we have available
> for creating new threads and processes are the ones the host libc
> gives us: so we can spawn new threads with pthread_create() and
> fork the process with fork(), but we don't have a safe way to
> create a new process which shares the memory map and where the
> new process can call the various libc functions which QEMU will
> do as it executes the guest code.

Current GLIBC no longer triggers the assert (the fix was also
backported to the 2.25 and 2.26 branches); however, I am not sure
posix_spawn semantics will work for all the expected scenarios in qemu
user-mode.

Most likely, any failure (sched_set{param,scheduler}, setsid, setpgid,
seteuid, any file action, or execve itself) won't be reported to the
parent process: err defaults to 0, and the auxiliary process signals an
issue by writing to memory it expects to share with the parent. Under
fork semantics that write only reaches the child's private copy.
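A small demonstration of this failure mode, assuming QEMU's fork-like
treatment of CLONE_VFORK can be modelled with a plain fork().
observed_error_under_fork and observed_error_with_shared_mapping are
hypothetical names. In the first function the child stores the execve
errno in a variable it expects to share with the parent, and the parent
still sees 0. The second uses an explicit MAP_SHARED mapping as a
stand-in for CLONE_VM, and there the error does arrive.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Fork semantics: the child's store goes to its private copy. */
static int observed_error_under_fork(const char *path)
{
    int err = 0;                       /* default: "no error" */
    char *argv[] = { (char *)path, NULL };
    char *envp[] = { NULL };

    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        execve(path, argv, envp);
        err = errno;                   /* invisible to the parent */
        _exit(127);
    }
    waitpid(pid, NULL, 0);
    return err;                        /* still 0: the failure is lost */
}

/* Shared mapping standing in for CLONE_VM: the store is visible. */
static int observed_error_with_shared_mapping(const char *path)
{
    int *err = mmap(NULL, sizeof *err, PROT_READ | PROT_WRITE,
                    MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    if (err == MAP_FAILED)
        return -1;
    *err = 0;
    char *argv[] = { (char *)path, NULL };
    char *envp[] = { NULL };

    pid_t pid = fork();
    if (pid < 0) {
        munmap(err, sizeof *err);
        return -1;
    }
    if (pid == 0) {
        execve(path, argv, envp);
        *err = errno;                  /* lands in the shared page */
        _exit(127);
    }
    waitpid(pid, NULL, 0);
    int result = *err;
    munmap(err, sizeof *err);
    return result;
}
```

With a nonexistent path the first function returns 0 and the second
returns ENOENT.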

Also, I don't think trying to emulate "CLONE_VM | CLONE_VFORK" with
pthread_create without actually synchronizing the threads will work
as expected. If clone is called with CLONE_VFORK, I would expect qemu
user-mode to block the calling thread (using a condition variable or a
barrier) and release it only when the callee calls execve or exits. I
am not very versed in the qemu code, so I am not sure how complex this
would be.
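A minimal sketch of the blocking part of that suggestion, using a
condition variable. struct vfork_gate, its functions, and the demo are
hypothetical names, not QEMU APIs. The parent thread waits until the
child signals that it has reached execve or exit, so any error code the
child stores beforehand is visible once the parent resumes. It
deliberately leaves open the harder problem Peter describes: execve
from a thread would replace the whole QEMU process, so the child would
still have to become a separate process at that point.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical gate: the parent blocks here, mimicking CLONE_VFORK. */
struct vfork_gate {
    pthread_mutex_t lock;
    pthread_cond_t cond;
    bool released;
};

static void vfork_gate_init(struct vfork_gate *g)
{
    pthread_mutex_init(&g->lock, NULL);
    pthread_cond_init(&g->cond, NULL);
    g->released = false;
}

/* Called by the child at its execve or exit point. */
static void vfork_gate_release(struct vfork_gate *g)
{
    pthread_mutex_lock(&g->lock);
    g->released = true;
    pthread_cond_signal(&g->cond);
    pthread_mutex_unlock(&g->lock);
}

/* Called by the parent right after starting the child. */
static void vfork_gate_wait(struct vfork_gate *g)
{
    pthread_mutex_lock(&g->lock);
    while (!g->released)               /* handles spurious wakeups */
        pthread_cond_wait(&g->cond, &g->lock);
    pthread_mutex_unlock(&g->lock);
}

/* Demo: a "child" thread records an error, then releases the parent. */
struct demo {
    struct vfork_gate gate;
    int err;
};

static void *demo_child(void *p)
{
    struct demo *d = p;
    d->err = ENOENT;                   /* e.g. a failed execve */
    vfork_gate_release(&d->gate);      /* "exit": let the parent resume */
    return NULL;
}

static int demo_run(void)
{
    struct demo d = { .err = 0 };
    vfork_gate_init(&d.gate);
    pthread_t t;
    if (pthread_create(&t, NULL, demo_child, &d) != 0)
        return -1;
    vfork_gate_wait(&d.gate);          /* blocked until the release */
    int seen = d.err;                  /* ordered after the child's store */
    pthread_join(t, NULL);
    pthread_mutex_destroy(&d.gate.lock);
    pthread_cond_destroy(&d.gate.cond);
    return seen;
}
```

The mutex hand-off is what makes the child's store visible to the
parent; a barrier would provide the same ordering.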
