Mischa <open...@mlst.nl> writes:
> On 2023-09-05 14:27, Dave Voutila wrote:
>> Mike Larkin <mlar...@nested.page> writes:
>>
>>> On Mon, Sep 04, 2023 at 07:57:18PM +0200, Mischa wrote:
>>>> On 2023-09-04 18:58, Mischa wrote:
>>>> > On 2023-09-04 18:55, Mischa wrote:
>> /snip
>>
>>>> > > Adding the sleep 2 does indeed help. I managed to get 20 VMs started
>>>> > > this way, before it would choke on 2-3.
>>>> > >
>>>> > > Do I only need the unpatched kernel or also the vmd/vmctl from snap?
>>>> >
>>>> > I do still get the same message on the console, but the machine isn't
>>>> > freezing up.
>>>> >
>>>> > [umd173152/210775 sp=7a5f577a1780 inside 702698535000-702698d34fff: not
>>>> > MAP_STACK
>>>> Starting 30 VMs this way caused the machine to become unresponsive
>>>> again, but nothing on the console. :(
>>>> Mischa
>>> Were you seeing these uvm errors before this diff? If so, this isn't
>>> causing the problem and something else is.
>> I don't believe we solved any of the underlying uvm issues in Bruges
>> last year. Mischa, can you test with just the latest snapshot/-current?
>> I'd imagine starting and stopping many vm's now is exacerbating the
>> issue because of the fork/exec for devices plus the ioctl to do a uvm
>> share into the device process address space.
>>
>>> If this diff causes the errors to occur, and without the diff it's
>>> fine, then we need to look into that.
>>> Also I think a pid number in that printf might be useful, I'll see
>>> what I can find. If it's not vmd causing this and rather some other
>>> process then that would be good to know also.
>> Sadly it looks like that printf doesn't spit out the offending
>> pid. :(
>
> Just to confirm I am seeing this behavior on the latest snap without
> the patch as well.

Since this diff isn't the cause, I've committed it. Thanks for
testing. I'll see if I can reproduce your MAP_STACK issues.

> Just started 10 VMs with sleep 2, machine freezes, but nothing on the
> console. :(

For now, I'd recommend spacing out vm launches. I'm pretty sure it's
related to the uvm corruption we saw last year when creating, starting,
and destroying vm's rapidly in a loop.

-dv
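
For reference, spacing out launches could look roughly like the sketch
below. It assumes the VMs are already defined (for example in vm.conf)
under the names passed in. VMCTL and DELAY are variables made up for this
sketch, not vmd or vmctl options, and the 2-second default only mirrors
the "sleep 2" tried earlier in the thread; it is not a known-safe value,
so a longer delay may be needed.

```sh
#!/bin/sh
# Sketch: start VMs one at a time with a pause between launches.
# VMCTL and DELAY are hypothetical knobs for this example only.
VMCTL=${VMCTL:-vmctl}
DELAY=${DELAY:-2}

start_spaced() {
	for vm in "$@"; do
		# Stop at the first failure instead of queueing more launches.
		"$VMCTL" start "$vm" || return 1
		# Give vmd time to finish setting up this VM before the next one.
		sleep "$DELAY"
	done
}

# Example: DELAY=5 start_spaced vm01 vm02 vm03
```

Stopping at the first failed start is a deliberate choice here, so a
struggling host is not hit with further launches.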