Hello, Hm, this was always the weakest part of the patch series. And I'm still not satisfied with it.
Quoting Samuel Thibault (2014-12-02 01:51:36) > Justus Winter, le Thu 27 Nov 2014 14:19:08 +0100, a écrit : > > + /* We are the only one updating generation, so this is safe. */ > > + old = generation; > > + > > + /* Update generation. */ > > + __atomic_store_n (&generation, (old + 1) % 3, __ATOMIC_RELAXED); > > + > > + /* This is the garbage generation. As all writers are long > > + gone, we do not need to bother with atomic operations. */ > > It seems unsafe to me: "long" may be not long enough some day, you can > never know with extra-loaded systems, you may see the following: > > - thread reads generation, gets preempted for a "long" time > - GC increases and handles generation > - GC increases and handles generation > - thread (at last!) gets to queue its thing > - GC increase and handles generation > > The last two are then working on the same queue concurrently. > > The GC is not on the critical path anyway, so we can use an atomic > operation there. We can probably just get away with the generation, and > just do something like the following: > [...] I'm not sure we can just drop the generational part and use your code, because what the generation thing tried to ensure (poorly), was that there is always enough time for the notification to be processed. Waiting a fixed amount of time is of course not the right thing to do. So here is the problem: When libports hands out a send right for a receive right X, it asks the kernel for a no-senders notification that is to be send to X. If we now `ports_destroy_right' X, and at the same time the last send right to X is destroyed, there's a chance that said notification is already delivered to X. If we are not using protected payloads, then the receiver is denoted by a port name, it is looked up in a hash table and this fails. Nothing happens. But with protected payloads, we'll get a stale payload that points to a port_info object that has been freed. I see two options how to solve this properly. i) Fix what "long" means, or ii) Fix the no-senders notification. i. Instead of waiting "long" enough, we could wait for all currently running userspace threads to finish whatever they are doing. By incrementing a counter every time a request was processed, we can decide when it is safe to dereference that object similar to how RCU handles deallocation. ii. Add a new notification type `MACH_NOTIFY_NO_SENDERS2' that carries the name of the port similarly to how `MACH_NOTIFY_DEAD_NAME' carries a name. Then we can make libports use a single port to receive all no-sender notifications, the problem of destroying the right that the notification is sent to disappears. A survey of the Hurds source shows that it would be easy to implement this. Currently, the notification protocol is handled in many demuxers, but then they just call `ports_no_senders'. I kind of prefer ii because it is an more obviously correct fix for the problem at hand, even if it means adding a new notification type to Gnumach. But it would work like the MACH_NOTIFY_NO_SENDERS, which would continue to work of course. Justus