Mischa <open...@mlst.nl> writes:

> On 2023-09-05 14:27, Dave Voutila wrote:
>> Mike Larkin <mlar...@nested.page> writes:
>>
>>> On Mon, Sep 04, 2023 at 07:57:18PM +0200, Mischa wrote:
>>>> On 2023-09-04 18:58, Mischa wrote:
>>>> > On 2023-09-04 18:55, Mischa wrote:
>> /snip
>>
>>>> > > Adding the sleep 2 does indeed help. I managed to get 20 VMs
>>>> > > started this way, whereas before it would choke after 2-3.
>>>> > >
>>>> > > Do I only need the unpatched kernel or also the vmd/vmctl from snap?
>>>> >
>>>> > I do still get the same message on the console, but the machine isn't
>>>> > freezing up.
>>>> >
>>>> > > [umd173152/210775 sp=7a5f577a1780 inside 702698535000-702698d34fff: not MAP_STACK
>>>> Starting 30 VMs this way caused the machine to become unresponsive
>>>> again, but nothing on the console. :(
>>>> Mischa
>>> Were you seeing these uvm errors before this diff? If so, this
>>> isn't causing the problem and something else is.
>> I don't believe we solved any of the underlying uvm issues in Bruges
>> last year. Mischa, can you test with just the latest snapshot/-current?
>> I'd imagine starting and stopping many VMs now is exacerbating the
>> issue because of the fork/exec for devices plus the ioctl to do a uvm
>> share into the device process address space.
>>
>>> If this diff causes the errors to occur, and without the diff it's
>>> fine, then we need to look into that.
>>> Also, I think a pid number in that printf might be useful; I'll see
>>> what I can find. If it's not vmd causing this but rather some other
>>> process, that would be good to know too.
>> Sadly it looks like that printf doesn't spit out the offending
>> pid. :(
>
> Just to confirm: I am seeing this behavior on the latest snap without
> the patch as well.

Since this diff isn't the cause, I've committed it. Thanks for
testing. I'll see if I can reproduce your MAP_STACK issues.

> Just started 10 VMs with sleep 2, machine freezes, but nothing on the
> console. :(

For now, I'd recommend spacing out VM launches. I'm pretty sure this is
related to the uvm corruption we saw last year when creating, starting,
and destroying VMs rapidly in a loop.
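
Something like this is what I have in mind (again just a sketch; the
VM names, disk images, and memory size are placeholders for whatever
you have in vm.conf or on disk):

    #!/bin/sh
    # Stagger the launches so each vmd device process finishes its
    # fork/exec and uvm share before the next VM comes up.
    for i in $(jot 30); do
        vmctl start -d "vm$i.qcow2" -m 512M "vm$i"
        sleep 2
    done

If the VMs are already defined in vm.conf, "vmctl start vm$i" alone
should do; the point is just the sleep between launches.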

-dv
