On 18/03/2026 19:57, Michael Kelly wrote:
On 18/03/2026 09:42, Michael Kelly wrote:
The mypy package is certainly a good example. It crashed my VM yesterday
by running out of swap space, although admittedly I have only 4GB of
swap available. I'm going to run the same build on a similarly sized
Debian/Linux VM to see what memory resources are used on that OS, for
comparison.
I did succeed in getting the Hurd mypy build to lock up. This is
perhaps what Samuel is experiencing on the buildd? In my case, several
of the mach-defpager threads are stuck and the build stops. See the
kernel debugger output appended. I think that thread 4 is blocked
because the page is 'busy', but why it's busy, given that it's a real
(not fictitious) page, I don't know yet. Why is $map21 locked? I have
this state preserved in a VM snapshot, but I've run out of time today
to look further.
I have a theory as to what is happening, which will hopefully withstand
analysis by those more knowledgeable than I am.
The kernel debugger confirms that the map is locked by thread 4 and has
a read count of 1. The first member of 'struct vm_map' is the lock
itself; lock->thread is at offset 0 and lock->read_count at offset 8, so:
db> x /x $map21,4
df9c76c0 ffffffff e0001 0
db> print $task21.4
ffffffffdf9c76c0
This shows that map->lock has been converted to a recursive lock within
thread 4, whose stack trace is:
[...]
thread_block(...)+0x5d
vm_fault_page(...)+0x121b
vm_fault(...)+0x4e0
vm_fault_wire(...)+0x75
vm_map_pageable_scan(...)+0x154
vm_map_pageable(...)+0x141
vm_map_copyout(...)+0x457
ipc_kmsg_copyout_body(...)+0x70
ipc_kmsg_copyout(...)+0x51
mach_msg_continue(...)+0x9c
$map21 is initially write-locked by vm_map_find_entry_anywhere() within
vm_map_copyout(). The final part of the copyout is to wire the pages
that require it, via a call to vm_map_pageable(), which calls
vm_map_pageable_scan() to do the work. This is where the write lock is
downgraded to a read lock, which converts it to a recursive lock.
Each map entry requiring wiring gets a call to vm_fault_wire(). One of
the pages for this map entry must fail the call to vm_wire_fast() and so
ends up in vm_fault(). Normally, when vm_fault() is called, the map is
not locked, and vm_map_lookup() locks and unlocks the map so that any
resulting thread blocks do not occur with the map locked. In this
instance, however, the map is already read-locked, so the map lookup
only increments the read count and then decrements it back to 1 before
the remainder of the fault handling takes place. This is how the map
lock remains held within thread_block(), which is bad news.
As to why the page is blocked on the busy state, I don't know. It's
possible that this is normal behaviour and it only shows up because of
the map locking issue.
The recursive lock is also set in vm_fault_unwire(), so it seems that
this strategy is intentional, but unwiring is perhaps less hazardous
than wiring, since the page is guaranteed to be available.
If this analysis is agreed with, then it seems to me that it will be
necessary to rearrange the code so that vm_fault() is called without
the map lock held for those virtual addresses that cannot be wired fast.
Cheers,
Mike.