olafbuddenha...@gmx.net wrote:
In order to track all tasks in subhurd, boot works as a proxy for all
RPCs on the task port,
[...]
However, it seems to be the source of the most serious bug in my
modified boot.
BUG: After I added the proxy for all RPCs to 'boot', I find that
subhurd sometimes fails to boot. For example, it sometimes stops
booting after the system displays "GNU 0.3 (hurd) (console)", and it
sometimes boots successfully and displays "login>" but stops working
after I try to log in. Sometimes it even prints an error message
like
getty[47]: /bin/login: No such file or directory
Use `login USER' to login, or `help' for more information.
Of course, sometimes subhurd can boot and I can log in successfully.
Sounds like some kind of race condition... But I don't know where.
You could try tracing all RPCs made to the proxy (using some logging
mechanism in the proxy itself, or perhaps rpctrace), and comparing the
results of various runs...
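As a rough illustration of the logging idea, here is a minimal sketch in C.
log_rpc is a hypothetical helper, not something that exists in boot; it
assumes the proxy can pass in a running sequence number, the message id and
the target port name. Timestamps are left out deliberately so that the logs
of two runs can be compared with diff.

```c
#include <stdio.h>

/* Hypothetical helper (an assumption, not part of boot): write one line
   per proxied message.  No timestamps or addresses are printed, so the
   logs of two runs can be compared line by line.  */
static int
log_rpc (FILE *out, unsigned long seq, int msgh_id, unsigned int target,
         const char *phase)
{
  int n = fprintf (out, "%lu %s id=%d target=%u\n",
                   seq, phase, msgh_id, target);
  /* Flush at once so the last line survives if the proxy hangs.  */
  fflush (out);
  return n;
}
```

Calling it once with phase "request" before forwarding and once with "done"
afterwards would show which message was in flight when a run stopped.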
As I mentioned before, the subhurd sometimes hangs. I think I have found
one of the places where subhurd hangs.
boot now proxies all RPCs that are sent on the task port.
The proxy runs in a single thread. For most RPCs it only forwards the
request, and the kernel sends the reply back to subhurd. But
task_create, vm_set_default_memory_manager, processor_set_tasks and
host_processor_set_priv are handled by the proxy itself, which sends
their replies back directly.
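The split between forwarded and locally handled calls could be sketched
roughly as follows. The enum and the function name are made up for
illustration, and the enum values are placeholders rather than the real MIG
message ids; the actual proxy would presumably decide on inp->msgh_id.

```c
#include <stdbool.h>

/* Placeholder identifiers for the calls named in the text.  The values
   are NOT the real MIG message ids; they only make the sketch compile.  */
enum rpc_kind
{
  RPC_TASK_CREATE,
  RPC_VM_SET_DEFAULT_MEMORY_MANAGER,
  RPC_PROCESSOR_SET_TASKS,
  RPC_HOST_PROCESSOR_SET_PRIV,
  RPC_OTHER                     /* anything else, e.g. vm_map */
};

/* Return true if the proxy answers the call itself, false if it only
   forwards the request and leaves the reply to the kernel.  */
static bool
handled_by_proxy (enum rpc_kind kind)
{
  switch (kind)
    {
    case RPC_TASK_CREATE:
    case RPC_VM_SET_DEFAULT_MEMORY_MANAGER:
    case RPC_PROCESSOR_SET_TASKS:
    case RPC_HOST_PROCESSOR_SET_PRIV:
      return true;
    default:
      return false;
    }
}
```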
One place where subhurd hangs is when the exec server calls vm_map at
some point. The proxy fails to forward the vm_map request, and
mach_msg blocks.
The code that forwards messages is as follows:
  debug ("request %d to %d, real target: %d", inp->msgh_id, target,
         task_pi->task_port);
  /* Resend the message to the tracee. */
  err = mach_msg (inp, MACH_SEND_MSG | MACH_SEND_TIMEOUT, inp->msgh_size, 0,
                  MACH_PORT_NULL, MACH_MSG_TIMEOUT_NONE, MACH_PORT_NULL);
  outp->RetCode = MIG_NO_REPLY;
  if (err)
    {
      info ("mach_msg %d to %d: %s", inp->msgh_id, target, strerror (err));
      debug ("mach_msg %d to %d: %s", inp->msgh_id, target, strerror (err));
      outp->RetCode = err;
    }
 out:
  debug ("request %d to %d ends", inp->msgh_id, target);
I have enabled the send timeout, and the time to wait before giving up
is 0 (I tried some other values, and they didn't seem to work either).
I don't understand why mach_msg still blocks even though the send
timeout is enabled.
It is also weird that, among the forwarded RPCs, subhurd hangs only on
vm_map called by the exec server (though I sometimes see the subhurd
hang on something else, which is definitely not one of the RPCs
forwarded by boot).
I wonder whether it has something to do with memory management,
e.g. some memory is swapped out but cannot be read back from the disk.
But that should not be possible, because the subhurd doesn't have its
own default memory manager and doesn't have its own swap partitions.
Does anyone have a clue why mach_msg blocks here?
Thank you,
Zheng Da