olafbuddenha...@gmx.net wrote:
However, it seems to be the source of the most serious bug in my
modified boot.
BUG: After I added the proxy for all RPCs to 'boot', I find that
subhurd sometimes fails to boot. For example, it sometimes stops
booting after the system displays "GNU 0.3 (hurd) (console)", and it
sometimes boots successfully and displays "login>" but stops working
after I try to log in. Sometimes it even prints an error message
like
getty[47]: /bin/login: No such file or directory
Use `login USER' to login, or `help' for more information.
Of course, sometimes subhurd can boot and I can log in successfully.
Sounds like some kind of race condition... But I don't know where.
You could try tracing all RPCs made to the proxy (using some logging
mechanism in the proxy itself, or perhaps rpctrace), and comparing the
results of various runs...
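The comparison of runs suggested above could be sketched roughly as
follows. This is only an illustration: the log format (one line per
RPC, "task rpc_name result") and the names tally and diff_runs are
hypothetical and do not come from the proxy or from rpctrace. Note that
it compares counts only, so a race that merely reorders RPCs would not
show up here.

```python
from collections import Counter


def tally(log_lines):
    """Count (rpc, result) pairs in one run's log.

    Assumed (hypothetical) line format: "<task> <rpc_name> <result>".
    Lines with fewer than three fields are skipped.
    """
    counts = Counter()
    for line in log_lines:
        fields = line.split()
        if len(fields) < 3:
            continue
        rpc = fields[1]
        result = " ".join(fields[2:])
        counts[(rpc, result)] += 1
    return counts


def diff_runs(good_run, bad_run):
    """Return {(rpc, result): (good_count, bad_count)} where the counts differ."""
    good, bad = tally(good_run), tally(bad_run)
    return {
        key: (good[key], bad[key])
        for key in sorted(set(good) | set(bad))
        if good[key] != bad[key]
    }
```

Feeding it the log of a run that booted and the log of one that hung
would list only the RPC/result pairs whose frequency changed, which is
a much shorter list to read than tens of thousands of raw entries.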
I logged all RPCs and tried to analyze them. (antrik, I was wrong. There
aren't 100,000 RPCs. The number of RPCs sent to Mach while the subhurd
boots is about 20,000 to 60,000.)
I found some abnormal results, but I am not sure whether they should be
considered errors. I list all of them below:
* Many mach_port_deallocate calls try to deallocate the port name 0.
A few of them try to deallocate the name -1. Some try to deallocate a
port whose name is positive but still fail.
* Some mach_port_request_notification calls return "invalid name". All
of these failed RPCs are sent from the same task and try to cancel a
previous dead-name notification request.
* Some vm_region calls return "no space available". I assume this is
normal: there may simply be no region at or above the given address in
the specified task.
* Some task_info calls return an "invalid argument" error, probably
because the task has already died.
The errors below are rarer. They don't always appear and don't seem to
be related to whether the subhurd boots successfully.
* Some vm_allocate calls return an "invalid argument" error. The Mach
reference doesn't mention that vm_allocate can return this type of
error. I assume the target task has died.
* I also saw mach_port_allocate and mach_port_mod_refs return an
"invalid task" error once.
I don't understand why some programs try to deallocate MACH_PORT_NULL or
even -1.
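One possible reading of the -1 case: the Mach headers define
MACH_PORT_NULL as 0 and MACH_PORT_DEAD as ~0, and the latter prints as
-1 when logged as a signed integer. So those calls may be passing on a
dead-name value rather than garbage. A small sketch for sorting logged
names into these cases, assuming 32-bit port names (the helper name
classify_port_name is hypothetical):

```python
# Values as defined in the Mach headers; the 32-bit width is an assumption.
MACH_PORT_NULL = 0
MACH_PORT_DEAD = 0xFFFFFFFF  # ~0, shown as -1 when printed as a signed int


def classify_port_name(name):
    """Classify a logged port name as NULL, DEAD, or an ordinary name."""
    name &= 0xFFFFFFFF  # normalize a signed log value such as -1
    if name == MACH_PORT_NULL:
        return "MACH_PORT_NULL"
    if name == MACH_PORT_DEAD:
        return "MACH_PORT_DEAD"
    return "ordinary name"
```

With this, only the failures on ordinary (positive) names remain as
candidates for a real double deallocation or a stale name.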
With regard to the failure of mach_port_request_notification, I guess
it's because the port of that name has already died. But I fail to
understand why there are more failures than successes when the program
tries to cancel a previous notification request. As far as I can see,
only 'console' tries to cancel notification requests.
That's all I have found so far.
Zheng Da