Hi SviridoffA

We have the same setup as you, and we see a similar issue where the very 
first VM created from a template takes a long time to start up. It used to 
take 7 minutes for us, but yours sounds extreme. What are your specs?

We reported this to LINBIT and they made quite a number of improvements. Now 
we are down to 3-5 minutes. Are you using the latest LINSTOR version?

But in our case it takes at most 5 minutes for the VM to start up the first 
time (it must be a new template that has never been used to create a VM 
before).

We hit this issue mainly when creating autoscale VMs. We create a template 
from a VM with running Docker containers and use it for the autoscaler. The 
first autoscale VM takes 3-5 minutes for us; subsequent VMs take 30 seconds.

I think it's got to do with the template having to be copied from secondary 
storage to primary storage on first use. If you have slow secondary storage, 
that might contribute to this issue.
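
If you want to rule that out, a quick sanity check is to time a raw 
sequential read of the template file straight off the secondary storage 
mount. This is only a sketch; the mount point and template path below are 
assumptions, adjust them to your environment:

    # Hypothetical paths - adjust to your setup
    SEC=/mnt/secondary                  # assumed NFS secondary storage mount
    TMPL="$SEC/template/tmpl/2/201/<template-file>.qcow2"

    # Time a full sequential read; if this alone takes tens of minutes,
    # the secondary -> primary copy on first VM start will too.
    time dd if="$TMPL" of=/dev/null bs=16M status=progress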

Or you can try using the latest packages from LINBIT. If that doesn't help, 
try reaching out to them?
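
To compare versions, the stock client and package queries should be enough 
(assuming the Ubuntu/Debian packages from the LINBIT repository):

    # Controller and client version as reported by the linstor CLI
    linstor controller version

    # Installed linstor packages on each node
    dpkg -l | grep -i linstor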

I didn't notice the CPU utilisation, as we didn't measure it.
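
If you want to measure it on your side while the first VM starts, something 
along these lines would show whether those vgs/lvs processes are actually 
burning CPU or just stuck in uninterruptible sleep waiting on I/O (pidstat is 
from the sysstat package; the grep patterns are only illustrative):

    # Sample per-process CPU usage every 5 seconds, with full command lines
    pidstat -u -l 5 | grep -E 'lvs|vgs'

    # Count processes per state; a growing 'D' bucket means I/O wait, not CPU
    watch -n 5 "ps -eo state,comm | sort | uniq -c | grep ' D '"

Given your ps output already shows them in D state, I would guess I/O wait 
rather than CPU, but measuring would confirm it.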

Regards, Bryan
On 22 Aug 2024 at 6:41 PM +0800, SviridoffA (via GitHub) <[email protected]>, 
wrote:
>
> GitHub user SviridoffA created a discussion: Abnormal Load average on first 
> vm start from template
>
> Hi everyone! Maybe someone has faced a similar problem and can help me.
>
> First of all, I'm using:
> CloudStack 4.19.1.1
> Hypervisor: KVM on Ubuntu 24.04
> Primary storage: LINSTOR
>
> When I create a template or register one from a URL and then try to start a 
> VM from it for the first time, the load average starts to grow as soon as 
> the CloudStack UI shows the template as downloaded to primary storage, and 
> instance creation takes far too long. It can take about 40 minutes, or it 
> can even totally overload the host so that the only way to bring it back to 
> a healthy state is a reboot. The high load average is present on all disk 
> nodes where LINSTOR creates the related devices. After the first instance 
> is created, all following attempts complete without any problems and VMs 
> start immediately.
>
> The ps utility shows a growing number of vgs and lvs processes, like this:
>
> 393650 ? D 0:00 vgs -o vg_name,vg_extent_size,vg_size,vg_free --separator ; --units k --noheadings --nosuffix
> 394149 ? D 0:00 lvs -o lv_name,lv_path,lv_size,vg_name,pool_lv,data_percent,lv_attr,metadata_percent,chunk_size --separator ; --noheadings --units k --nosuffix
>
> Nothing suspicious in the CloudStack agent logs, but in the LINSTOR 
> satellite logs I have found many reports like this:
>
> ERROR REPORT 66C5C167-EF4D9-000006
>
> ============================================================
>
> Application: LINBIT® LINSTOR
> Module: Satellite
> Version: 1.29.0
> Build ID: b2be7208a777f0743d4c7187062678cd5416fccf
> Build time: 2024-07-31T11:02:51+00:00
> Error time: 2024-08-22 09:59:58
> Node: node2
> Thread: MainWorkerPool-4
>
> ============================================================
>
> Reported error:
> ===============
>
> Category: LinStorException
> Class name: StorageException
> Class canonical name: com.linbit.linstor.storage.StorageException
> Generated at: Method 'genericExecutor', Source file 'Commands.java', Line #120
>
> Error message: Failed to query 'lvs' info
>
> ErrorContext:
> Cause: External command timed out
> Details: External command: lvs -o lv_name,lv_path,lv_size,vg_name,pool_lv,data_percent,lv_attr,metadata_percent,chunk_size --separator ';' --noheadings --units k --nosuffix
>
>
> Call backtrace:
>
> Method Native Class:Line number
> genericExecutor N com.linbit.linstor.storage.utils.Commands:120
> genericExecutor N com.linbit.linstor.storage.utils.Commands:63
> lvs N com.linbit.linstor.layer.storage.lvm.utils.LvmCommands:99
> lambda$getLvsInfoImpl$1 N com.linbit.linstor.layer.storage.lvm.utils.LvmUtils:320
> execWithRetry N com.linbit.linstor.layer.storage.lvm.utils.LvmUtils:669
> getLvsInfoImpl N com.linbit.linstor.layer.storage.lvm.utils.LvmUtils:317
> getLvsInfo N com.linbit.linstor.layer.storage.lvm.utils.LvmUtils:282
> fetchAllocatedSizes N com.linbit.linstor.layer.storage.lvm.LvmThinProvider:710
> getVlmAllocatedCapacities N com.linbit.linstor.core.apicallhandler.StltApiCallHandlerUtils:185
> executeInScope N com.linbit.linstor.api.protobuf.ReqVlmAllocated:84
> lambda$executeReactive$0 N com.linbit.linstor.api.protobuf.ReqVlmAllocated:69
> doInScope N com.linbit.linstor.core.apicallhandler.ScopeRunner:175
> lambda$fluxInScope$0 N com.linbit.linstor.core.apicallhandler.ScopeRunner:100
> call N reactor.core.publisher.MonoCallable:72
> trySubscribeScalarMap N reactor.core.publisher.FluxFlatMap:127
> subscribeOrReturn N reactor.core.publisher.MonoFlatMapMany:49
> subscribe N reactor.core.publisher.Flux:8759
> onNext N reactor.core.publisher.MonoFlatMapMany$FlatMapManyMain:195
> request N reactor.core.publisher.Operators$ScalarSubscription:2545
> onSubscribe N reactor.core.publisher.MonoFlatMapMany$FlatMapManyMain:141
> subscribe N reactor.core.publisher.MonoJust:55
> subscribe N reactor.core.publisher.MonoDeferContextual:55
> subscribe N reactor.core.publisher.Flux:8773
> trySubscribeScalarMap N reactor.core.publisher.FluxFlatMap:200
> subscribeOrReturn N reactor.core.publisher.MonoFlatMapMany:49
> subscribe N reactor.core.publisher.Flux:8759
> onNext N reactor.core.publisher.FluxFlatMap$FlatMapMain:427
> slowPath N reactor.core.publisher.FluxArray$ArraySubscription:127
> request N reactor.core.publisher.FluxArray$ArraySubscription:100
> onSubscribe N reactor.core.publisher.FluxFlatMap$FlatMapMain:371
> subscribe N reactor.core.publisher.FluxMerge:70
> subscribe N reactor.core.publisher.Flux:8773
> onComplete N reactor.core.publisher.FluxConcatArray$ConcatArraySubscriber:258
> subscribe N reactor.core.publisher.FluxConcatArray:78
> subscribe N reactor.core.publisher.InternalFluxOperator:62
> subscribe N reactor.core.publisher.FluxDefer:54
> subscribe N reactor.core.publisher.Flux:8773
> onNext N reactor.core.publisher.FluxFlatMap$FlatMapMain:427
> drainAsync N reactor.core.publisher.FluxFlattenIterable$FlattenIterableSubscriber:453
> drain N reactor.core.publisher.FluxFlattenIterable$FlattenIterableSubscriber:724
> onNext N reactor.core.publisher.FluxFlattenIterable$FlattenIterableSubscriber:256
> drainFused N reactor.core.publisher.SinkManyUnicast:319
> drain N reactor.core.publisher.SinkManyUnicast:362
> tryEmitNext N reactor.core.publisher.SinkManyUnicast:237
> tryEmitNext N reactor.core.publisher.SinkManySerialized:100
> processInOrder N com.linbit.linstor.netcom.TcpConnectorPeer:420
> doProcessMessage N com.linbit.linstor.proto.CommonMessageProcessor:228
> lambda$processMessage$2 N com.linbit.linstor.proto.CommonMessageProcessor:165
> onNext N reactor.core.publisher.FluxPeek$PeekSubscriber:185
> runAsync N reactor.core.publisher.FluxPublishOn$PublishOnSubscriber:440
> run N reactor.core.publisher.FluxPublishOn$PublishOnSubscriber:527
> call N reactor.core.scheduler.WorkerTask:84
> call N reactor.core.scheduler.WorkerTask:37
> run N java.util.concurrent.FutureTask:317
> run N java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask:304
> runWorker N java.util.concurrent.ThreadPoolExecutor:1144
> run N java.util.concurrent.ThreadPoolExecutor$Worker:642
> run N java.lang.Thread:1583
>
> Caused by:
> ==========
>
> Category: Exception
> Class name: ChildProcessTimeoutException
> Class canonical name: com.linbit.ChildProcessTimeoutException
> Generated at: Method 'waitFor', Source file 'ChildProcessHandler.java', Line #184
>
>
> Call backtrace:
>
> Method Native Class:Line number
> waitFor N com.linbit.extproc.ChildProcessHandler:184
> waitFor N com.linbit.extproc.ChildProcessHandler:124
> syncProcess N com.linbit.extproc.ExtCmd:160
> exec N com.linbit.extproc.ExtCmd:92
> genericExecutor N com.linbit.linstor.storage.utils.Commands:79
> genericExecutor N com.linbit.linstor.storage.utils.Commands:63
> lvs N com.linbit.linstor.layer.storage.lvm.utils.LvmCommands:99
> lambda$getLvsInfoImpl$1 N com.linbit.linstor.layer.storage.lvm.utils.LvmUtils:320
> execWithRetry N com.linbit.linstor.layer.storage.lvm.utils.LvmUtils:669
> getLvsInfoImpl N com.linbit.linstor.layer.storage.lvm.utils.LvmUtils:317
> getLvsInfo N com.linbit.linstor.layer.storage.lvm.utils.LvmUtils:282
> fetchAllocatedSizes N com.linbit.linstor.layer.storage.lvm.LvmThinProvider:710
> getVlmAllocatedCapacities N com.linbit.linstor.core.apicallhandler.StltApiCallHandlerUtils:185
> executeInScope N com.linbit.linstor.api.protobuf.ReqVlmAllocated:84
> lambda$executeReactive$0 N com.linbit.linstor.api.protobuf.ReqVlmAllocated:69
> doInScope N com.linbit.linstor.core.apicallhandler.ScopeRunner:175
> lambda$fluxInScope$0 N com.linbit.linstor.core.apicallhandler.ScopeRunner:100
> call N reactor.core.publisher.MonoCallable:72
> trySubscribeScalarMap N reactor.core.publisher.FluxFlatMap:127
> subscribeOrReturn N reactor.core.publisher.MonoFlatMapMany:49
> subscribe N reactor.core.publisher.Flux:8759
> onNext N reactor.core.publisher.MonoFlatMapMany$FlatMapManyMain:195
> request N reactor.core.publisher.Operators$ScalarSubscription:2545
> onSubscribe N reactor.core.publisher.MonoFlatMapMany$FlatMapManyMain:141
> subscribe N reactor.core.publisher.MonoJust:55
> subscribe N reactor.core.publisher.MonoDeferContextual:55
> subscribe N reactor.core.publisher.Flux:8773
> trySubscribeScalarMap N reactor.core.publisher.FluxFlatMap:200
> subscribeOrReturn N reactor.core.publisher.MonoFlatMapMany:49
> subscribe N reactor.core.publisher.Flux:8759
> onNext N reactor.core.publisher.FluxFlatMap$FlatMapMain:427
> slowPath N reactor.core.publisher.FluxArray$ArraySubscription:127
> request N reactor.core.publisher.FluxArray$ArraySubscription:100
> onSubscribe N reactor.core.publisher.FluxFlatMap$FlatMapMain:371
> subscribe N reactor.core.publisher.FluxMerge:70
> subscribe N reactor.core.publisher.Flux:8773
> onComplete N reactor.core.publisher.FluxConcatArray$ConcatArraySubscriber:258
> subscribe N reactor.core.publisher.FluxConcatArray:78
> subscribe N reactor.core.publisher.InternalFluxOperator:62
> subscribe N reactor.core.publisher.FluxDefer:54
> subscribe N reactor.core.publisher.Flux:8773
> onNext N reactor.core.publisher.FluxFlatMap$FlatMapMain:427
> drainAsync N reactor.core.publisher.FluxFlattenIterable$FlattenIterableSubscriber:453
> drain N reactor.core.publisher.FluxFlattenIterable$FlattenIterableSubscriber:724
> onNext N reactor.core.publisher.FluxFlattenIterable$FlattenIterableSubscriber:256
> drainFused N reactor.core.publisher.SinkManyUnicast:319
> drain N reactor.core.publisher.SinkManyUnicast:362
> tryEmitNext N reactor.core.publisher.SinkManyUnicast:237
> tryEmitNext N reactor.core.publisher.SinkManySerialized:100
> processInOrder N com.linbit.linstor.netcom.TcpConnectorPeer:420
> doProcessMessage N com.linbit.linstor.proto.CommonMessageProcessor:228
> lambda$processMessage$2 N com.linbit.linstor.proto.CommonMessageProcessor:165
> onNext N reactor.core.publisher.FluxPeek$PeekSubscriber:185
> runAsync N reactor.core.publisher.FluxPublishOn$PublishOnSubscriber:440
> run N reactor.core.publisher.FluxPublishOn$PublishOnSubscriber:527
> call N reactor.core.scheduler.WorkerTask:84
> call N reactor.core.scheduler.WorkerTask:37
> run N java.util.concurrent.FutureTask:317
> run N java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask:304
> runWorker N java.util.concurrent.ThreadPoolExecutor:1144
> run N java.util.concurrent.ThreadPoolExecutor$Worker:642
> run N java.lang.Thread:1583
>
>
> END OF ERROR REPORT.
>
> I have tried disabling AppArmor and the firewall, but it hasn't helped.
> I would appreciate any help with this.
>
>
>
> GitHub link: https://github.com/apache/cloudstack/discussions/9570
>
