Re: [RFC PATCH glibc 24/34] hurd: Only check for TLS initialization inside rtld or in static builds

Sergey Bugaev Fri, 14 Apr 2023 01:30:16 -0700

On Fri, Apr 14, 2023 at 12:47 AM Samuel Thibault
<samuel.thiba...@gnu.org> wrote:
> #10 0x0106bc20 in __libc_assert_fail (assertion=0x1222762 "port == arg",
>     file=0x121c2a0 "../sysdeps/mach/hurd/mig-reply.c", line=92,
>     function=0x121c2c4 <__PRETTY_FUNCTION__.0> "__mig_dealloc_reply_port")
>     at __libc_assert_fail.c:31
> #11 0x010283d2 in __GI___mig_dealloc_reply_port (arg=<optimized out>)
>     at ../sysdeps/mach/hurd/mig-reply.c:92
> #12 0x012eb23e in __msg_sig_post (process=27, signal=15, sigcode=0, 
> refport=35)
>     at /usr/src/glibc-upstream/build/hurd/RPC_msg_sig_post.c:160
> #13 0x01074dbc in kill_port (refport=<optimized out>, msgport=<optimized out>)
>     at ../sysdeps/mach/hurd/kill.c:67


Interesting...

So first of all we're wrong, and the signal code does not destroy the
reply port. In fact it takes care to restore it:

  /* Destroy the MiG reply port used by the signal handler, and restore the
     reply port in use by the thread when interrupted.  */
  reply_port = THREAD_GETMEM (THREAD_SELF, reply_port);
  THREAD_SETMEM (THREAD_SELF, reply_port, scp->sc_reply_port);
  if (MACH_PORT_VALID (reply_port))
    __mach_port_destroy (__mach_task_self (), reply_port);

(Yes, that's the very same code we've altered recently, but it also
restored the reply port before my changes.)

See, it destroys the current reply port, and restores
scp->sc_reply_port. So then when we jump back to the user's code
(which should be in intr-msg), it gets MACH_RCV_TIMED_OUT (if that's
what happened), and proceeds to destroy the port itself.

One notable thing here: the sigreturn code (above) uses sc_reply_port
to destroy the current reply port. So:
* if the rcv timed out, we're going to destroy the reply port, but we
  haven't yet, so the server could fake a reply to mach_port_destroy
* if the mach_port_destroy RPC itself fails, we'll dealloc the user's port
  right here, and then it'll mismatch what the user expects it to be

The latter is in fact what's tripping the assertion in your test case.
mach_port_destroy (well, your test case contains my
mach_port_mod_refs) expects to receive a matching reply, but instead
we get a msg_sig_post_reply. We'd have caught this earlier if we
didn't neglect the result of mach_port_destroy.

Q: But why do we get msg_sig_post_reply here, why not in
   rpc_wait_trampoline, *before* running the signal handler?
A: Because msg_sig_post is uninterruptible. (and this is stupid! and so
   frustrating! we should just make it interruptible)

So: we should not use the user's reply port for this RPC. This is my
fault (747812349d42427c835aeac987aa67641d84f1ad). In my defense, there
was no comment explaining why this would be a bad idea; we should
write one. I'll prepare a patch.

But secondly, if we are not destroying the user's reply port, but
restoring it, then I don't think the "port = MACH_PORT_NULL, arg =
stale name" case can happen? So we don't need to handle it?

We should really just assert (arg == port), and this will help us find
more broken code.

Thoughts?

Sergey

Re: [RFC PATCH glibc 24/34] hurd: Only check for TLS initialization inside rtld or in static builds

Reply via email to