Mischa <open...@mlst.nl> writes:
> On 2023-09-06 05:36, Dave Voutila wrote:
>> Mischa <open...@mlst.nl> writes:
>>> On 2023-09-05 14:27, Dave Voutila wrote:
>>>> Mike Larkin <mlar...@nested.page> writes:
>>>>
>>>>> On Mon, Sep 04, 2023 at 07:57:18PM +0200, Mischa wrote:
>>>>>> On 2023-09-04 18:58, Mischa wrote:
>>>>>> > On 2023-09-04 18:55, Mischa wrote:
>>>> /snip
>>>>
>>>>>> > > Adding the sleep 2 does indeed help. I managed to get 20 VMs started
>>>>>> > > this way, before it would choke on 2-3.
>>>>>> > >
>>>>>> > > Do I only need the unpatched kernel or also the vmd/vmctl from snap?
>>>>>> >
>>>>>> > I do still get the same message on the console, but the machine isn't
>>>>>> > freezing up.
>>>>>> >
>>>>>> > [umd173152/210775 sp=7a5f577a1780 inside 702698535000-702698d34fff: not
>>>>>> > MAP_STACK
>>>>>> Starting 30 VMs this way caused the machine to become unresponsive
>>>>>> again, but nothing on the console. :(
>>>>>> Mischa
>>>>> Were you seeing these uvm errors before this diff? If so, this isn't
>>>>> causing the problem and something else is.
>>>> I don't believe we solved any of the underlying uvm issues in Bruges
>>>> last year. Mischa, can you test with just the latest
>>>> snapshot/-current?
>>>> I'd imagine starting and stopping many vm's now is exacerbating the
>>>> issue because of the fork/exec for devices plus the ioctl to do a uvm
>>>> share into the device process address space.
>>>>
>>>>> If this diff causes the errors to occur, and without the diff it's
>>>>> fine, then we need to look into that.
>>>>> Also I think a pid number in that printf might be useful, I'll see
>>>>> what I can find. If it's not vmd causing this and rather some other
>>>>> process then that would be good to know also.
>>>> Sadly it looks like that printf doesn't spit out the offending
>>>> pid. :(
>>> Just to confirm I am seeing this behavior on the latest snap without
>>> the patch as well.
>> Since this diff isn't the cause, I've committed it.
>> Thanks for testing. I'll see if I can reproduce your MAP_STACK issues.
>>
>>> Just started 10 VMs with sleep 2, machine freezes, but nothing on the
>>> console. :(
>> For now, I'd recommend spacing out vm launches. I'm pretty sure it's
>> related to the uvm corruption we saw last year when creating, starting,
>> and destroying vm's rapidly in a loop.
>
> That could very well be the case. I will adjust my start script, so
> far I've got good results with a 10 second sleep.
>
> Is there some additional debugging I can turn that makes sense for
> this? I can easily replicate.
>

Highly doubtful if the issue is what I think. The only thing would be
making sure you're running in a way to see any panic and drop into
ddb. If you're using X, or you're not on the primary console or a
serial connection, it might just appear as a deadlocked system during
a panic.

-dv
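
For anyone wanting to space out launches as discussed in this thread, here is a minimal sketch of a start script in Python. It assumes each VM is already defined in vm.conf and is started by name with `vmctl start`. The function name `start_vms`, its parameters, and the VM names in the usage note are hypothetical. The 10 second default only mirrors the delay reported above as giving good results; it is not a known-safe value.

```python
import subprocess
import time


def start_vms(names, delay=10.0, run=subprocess.run, sleep=time.sleep):
    """Start VMs one at a time, pausing `delay` seconds between launches.

    `run` and `sleep` are injectable so the loop can be exercised without
    vmctl being present. Returns the names that started successfully and
    stops at the first failure.
    """
    started = []
    for index, name in enumerate(names):
        if index > 0:
            # Pause between launches, but not before the first one.
            sleep(delay)
        result = run(["vmctl", "start", name])
        if result.returncode != 0:
            # Stop rather than keep piling launches onto a struggling host.
            break
        started.append(name)
    return started
```

Calling `start_vms(["vm01", "vm02", "vm03"])` as root would launch the three (hypothetical) VMs roughly 10 seconds apart. Stopping at the first failed `vmctl start` is a design choice for this sketch, on the assumption that further launches are unlikely to help once the host is in trouble.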