On 18/03/2026 19:57, Michael Kelly wrote:
On 18/03/2026 09:42, Michael Kelly wrote:
The mypy package is certainly a good example. It crashed my VM yesterday
by running out of swap space, although admittedly I have only 4GB of
swap available. I'm going to run the same build on a similarly sized
Debian/Linux VM to see what memory resources are used on that OS, for
comparison.
I did succeed in getting the Hurd mypy build to lock up. This is
perhaps what Samuel is experiencing on the buildd? In my case, several
of the mach-defpager threads are stuck and the build stops. See the
kernel debugger output appended. I think that thread 4 is blocked
because the page is 'busy', but why it's busy, given that it's a real
(not fictitious) page, I don't know yet. Why is $map21 locked? I have
this state preserved in a VM snapshot, but I've run out of time today
to look further.
I have a theory as to what is happening, which will hopefully withstand
analysis by those more knowledgeable than I am.
The kernel debugger confirms that the map is locked by thread 4 and has
a read count of 1. The first member of 'struct vm_map' is the lock
itself; lock->thread is at offset 0 and lock->read_count at offset 8, so:
db> x /x $map21,4
df9c76c0 ffffffff e0001 0
db> print $task21.4
ffffffffdf9c76c0
This shows that map->lock has been converted to a recursive lock within
thread 4, whose stack trace is:
[...]
thread_block(...)+0x5d
vm_fault_page(...)+0x121b
vm_fault(...)+0x4e0
vm_fault_wire(...)+0x75
vm_map_pageable_scan(...)+0x154
vm_map_pageable(...)+0x141
vm_map_copyout(...)+0x457
ipc_kmsg_copyout_body(...)+0x70
ipc_kmsg_copyout(...)+0x51
mach_msg_continue(...)+0x9c
$map21 is initially write-locked by vm_map_find_entry_anywhere() within
vm_map_copyout(). The final part of the copyout is to wire the pages
that require it, via a call to vm_map_pageable(), which calls
vm_map_pageable_scan() to do the work. This is where the write lock is
downgraded to a read lock, which converts it to a recursive lock.
Each map entry requiring wiring gets a call to vm_fault_wire(). One of
the pages for this map entry must fail the call to vm_wire_fast() and so
ends up in vm_fault(). Normally, when vm_fault() is called, the map is
not locked, and vm_map_lookup() locks and unlocks the map so that any
resulting thread blocks do not occur with the map locked. In this
instance, however, the map is already read-locked, so the map lookup
only increments the read count and then decrements it back to 1 before
the remainder of the fault handling takes place. This is how the map
lock remains held within thread_block(), which is bad news.
As to why the page is blocked on the busy state, I don't know. It's
possible that this is normal behaviour and it only shows up because of
the map locking issue.
The recursive lock is also set in vm_fault_unwire(), so it seems that
this strategy is intentional, but unwiring is perhaps less hazardous
than wiring, since the page is guaranteed to be available.
If this analysis is agreed with, then it seems to me that it will be
necessary to rearrange the code so that vm_fault() is called without
the map lock held for those virtual addresses that cannot be wired fast.
Cheers,
Mike.