On 2023-09-06 19:38, Dave Voutila wrote:
Mischa <open...@mlst.nl> writes:
On 2023-09-06 05:36, Dave Voutila wrote:
Mischa <open...@mlst.nl> writes:
On 2023-09-05 14:27, Dave Voutila wrote:
Mike Larkin <mlar...@nested.page> writes:
On Mon, Sep 04, 2023 at 07:57:18PM +0200, Mischa wrote:
On 2023-09-04 18:58, Mischa wrote:
> On 2023-09-04 18:55, Mischa wrote:
/snip
> > Adding the sleep 2 does indeed help. I managed to get 20 VMs started
> > this way, before it would choke on 2-3.
> >
> > Do I only need the unpatched kernel or also the vmd/vmctl from snap?
>
> I do still get the same message on the console, but the machine isn't
> freezing up.
>
> [umd173152/210775 sp=7a5f577a1780 inside 702698535000-702698d34fff: not MAP_STACK
Starting 30 VMs this way caused the machine to become unresponsive again, but nothing on the console. :(
Mischa
Were you seeing these uvm errors before this diff? If so, this isn't causing the problem and something else is.
I don't believe we solved any of the underlying uvm issues in Bruges last year. Mischa, can you test with just the latest snapshot/-current? I'd imagine starting and stopping many vm's now is exacerbating the issue because of the fork/exec for devices plus the ioctl to do a uvm share into the device process address space.
If this diff causes the errors to occur, and without the diff it's fine, then we need to look into that. Also, I think a pid number in that printf might be useful; I'll see what I can find. If it's not vmd causing this but rather some other process, then that would be good to know also.
Sadly it looks like that printf doesn't spit out the offending pid. :(
Just to confirm, I am seeing this behavior on the latest snap without the patch as well.
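
A minimal sketch of getting a test box onto the latest snapshot, in case it is useful to anyone following along; sysupgrade(8) is the standard tool, but the exact procedure wasn't spelled out in this thread:

    # as root: fetch the latest snapshot sets and reboot into them
    sysupgrade -s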
Since this diff isn't the cause, I've committed it. Thanks for
testing. I'll see if I can reproduce your MAP_STACK issues.
Just started 10 VMs with sleep 2, machine freezes, but nothing on the console. :(
For now, I'd recommend spacing out vm launches. I'm pretty sure it's related to the uvm corruption we saw last year when creating, starting, and destroying vm's rapidly in a loop.
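
As an illustration only, a spaced-out start script could look something like the sketch below; the vm names and the sleep interval are placeholders, not values anyone in this thread prescribed:

    #!/bin/sh
    # start each vm defined in vm.conf with a pause in between, so the
    # fork/exec of vmd's device processes doesn't all happen at once
    for vm in vm01 vm02 vm03; do
            vmctl start "$vm"
            sleep 10
    done

The same effect can be had by simply inserting a sleep between existing vmctl start lines.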
That could very well be the case. I will adjust my start script; so far I've got good results with a 10 second sleep. Is there some additional debugging I can turn on that makes sense for this? I can easily replicate it.
Highly doubtful, if the issue is what I think it is. The only thing would be making sure you're running in a way that lets you see any panic and drop into ddb. If you're using X, or you're not on the primary console or a serial connection, it might just appear as a deadlocked system during a panic.
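
For what it's worth, a minimal sketch of making a panic visible over serial instead of a silent hang; these are stock OpenBSD knobs, assuming com0 is actually wired up, not settings requested in this thread:

    # ensure the kernel drops into ddb on panic (1 is already the default)
    echo 'ddb.panic=1' >> /etc/sysctl.conf
    # put the boot loader and kernel console on the first serial port
    echo 'set tty com0' >> /etc/boot.conf

After a reboot, a panic message and the ddb> prompt should then show up on the serial line where they can be captured.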
I am using the console via iDRAC; there isn't any information on it anymore. :(
Mischa