Hi Alireza,

Table details below, as far as I know. @Dev, please correct me if any detail is wrong.

- sync_queue and sync_queue_item - used for handling the entity (VM, host, etc.) queues and concurrency control. Mainly, all the VM sync jobs pass through this queuing.
- async_job - all the async jobs and related placeholder VM async jobs (if any).
- vm_work_job - extension to the placeholder VM async job in async_job, which holds the VM id and the job stage.
- op_ha_work - holds the VM work items to perform HA on the VMs, scheduled or cancelled based on the VM state.
- op_lock - used to acquire a lock on a record in the given table (key: <tablename> + <entityid>) for a transaction by a running thread in the Management Server. The lock is released once the transaction is completed, and the corresponding record is deleted.

Hope this helps!

-Suresh

On Thu, Jan 24, 2019 at 12:49 AM Alireza Eskandari <astro.alir...@gmail.com> wrote:

> Dear Suresh and Andrei
> Thanks for your help.
> I have upgraded CloudStack from 4.9.3 to 4.11.2 but the problem still
> persists.
> Then I inspected the database tables and found that these three tables
> could be the root cause:
> - op_ha_work
> - op_lock
> - vm_work_job
> So I deleted all records in those tables and the problem was solved.
> The contents of those tables are submitted as a comment in the bug report
> in jira:
> https://issues.apache.org/jira/browse/CLOUDSTACK-10401
> Suresh, could you tell me more about the role of those tables in CS?
> I think CS has become more sensitive about concurrent jobs. Previous
> versions worked better.
> Regards
>
> On Wed, Jan 23, 2019 at 9:43 PM Suresh Kumar Anaparti <
> sureshkumar.anapa...@gmail.com> wrote:
>
> > Hi Alireza,
> >
> > The *sync_queue* table is the actual VM sync queue, which holds a queue id
> > for each VM (*sync_objtype*: VmWorkJobQueue, *sync_objid*: <VM-Id>), and the
> > VM jobs reside in the *sync_queue_item* table against that queue id. Only
> > one running job is allowed per VM queue (*queue_size_limit*: 1 in the
> > *sync_queue* table).
> > The active/running job would have *queue_proc_id*,
> > *queue_proc_number* and *queue_proc_time* set in the *sync_queue_item*
> > table, and the remaining jobs with that queue id would be waiting for the
> > active job to complete. So, to delete pending jobs, the records in the
> > *sync_queue_item* table have to be cleared for the respective VMs, not the
> > *sync_queue* table.
> >
> > I think, in your case, snapshots are taking a long time and other jobs on
> > that VM are pending for a long time, as they are in the queue waiting for
> > the snapshot job to complete. What are the config values set for
> > "job.cancel.threshold.minutes", "job.expire.minutes" and
> > "volume.snapshot.job.cancel.threshold"? Are the jobs cancelled after the
> > threshold time?
> >
> > Thanks,
> > Suresh
> >
> > On Wed, Jan 23, 2019 at 7:14 PM Andrei Mikhailovsky
> > <and...@arhont.com.invalid> wrote:
> >
> > > Hi
> > >
> > > I've had this issue a few times in 2018 and managed to get it fixed
> > > pretty easily, although I had spent a number of hours initially trying
> > > to figure out WTF was going on. This issue looks like one of those
> > > artefacts that crept in with one of the versions released in 2018 and
> > > hasn't been addressed by the dev team.
> > >
> > > The way I fixed it was similar to what has been recommended earlier.
> > > However, the difference was that I am sure I've looked at more tables
> > > than just the two suggested. Basically, I stopped the management server,
> > > created the sql backup, connected to the sql db and listed all tables.
> > > Grepped for words like job/schedule/queue/sync. After that I went
> > > through all the tables and pretty much removed all the past / active /
> > > awaiting execution jobs. I started by looking at the vm related jobs
> > > (for the vm that I'd tried to start but wasn't able to). This worked
> > > once, but the second time I had to remove a lot more jobs which relate
> > > to other vms.
> > > After that I started the management server and all went well from
> > > there.
> > >
> > > What I have also noticed is that my snapshot jobs (I use KVM and Ceph)
> > > seem to be blocking jobs on the hypervisor hosts which are running these
> > > snapshots. So, if I am trying to perform various vm related jobs on a
> > > host server which is currently running a snapshot process, that job will
> > > not be executed until the snapshot process is done. I've tested this
> > > countless times and it's still the case. Again, this issue appeared in
> > > one of the 2018 releases, as I'd never seen it between 2012 - 2017.
> > >
> > > Both issues are annoying as hell!
> > >
> > > Cheers
> > >
> > > ----- Original Message -----
> > > > From: "Alireza Eskandari" <astro.alir...@gmail.com>
> > > > To: "dev" <dev@cloudstack.apache.org>
> > > > Sent: Wednesday, 23 January, 2019 12:40:48
> > > > Subject: Re: Help! Jobs stuck in pending state
> > >
> > > > I'm following this issue in github:
> > > > https://github.com/apache/cloudstack/issues/3104
> > > > Please leave your comments
> > > > Thanks
> > > >
> > > > On Wed, Jan 23, 2019 at 12:39 PM Wei ZHOU <ustcweiz...@gmail.com> wrote:
> > > >
> > > >> Hi Alireza,
> > > >>
> > > >> could you try again after restarting the mgt server?
> > > >>
> > > >> -Wei
> > > >>
> > > >> Alireza Eskandari <astro.alir...@gmail.com> wrote on Wed, Jan 23, 2019 at 6:22 AM:
> > > >>
> > > >> > First I deleted two jobs which existed in the vm_work_job table and
> > > >> > their related entry in the sync_queue table, but it didn't help.
> > > >> > Then I deleted all the entries in sync_queue tables and again no
> > > >> > success.
> > > >> > Any idea?
> > > >> >
> > > >> > On Wed, Jan 23, 2019 at 1:50 AM Wei ZHOU <ustcweiz...@gmail.com> wrote:
> > > >> >
> > > >> > > If you know the instance id and mysql password, it should work
> > > >> > > after removing some records in mysql.
> > > >> > >
> > > >> > > ```
> > > >> > > set @id=XXXXX;
> > > >> > >
> > > >> > > delete from vm_work_job where vm_instance_id=@id;
> > > >> > > delete from sync_queue where sync_objid=@id;
> > > >> > > ```
> > > >> > >
> > > >> > > Alireza Eskandari <astro.alir...@gmail.com> wrote on Tue, Jan 22, 2019 at 10:59 PM:
> > > >> > >
> > > >> > > > Hi guys
> > > >> > > > I have opened a bug in jira about my problem in CS:
> > > >> > > > https://issues.apache.org/jira/browse/CLOUDSTACK-10401
> > > >> > > > CloudStack doesn't process jobs! My cloud is totally unusable.
> > > >> > > > Thanks in advance for your help.
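
For anyone finding this thread later: Suresh's point in the thread is that pending jobs live in sync_queue_item, not sync_queue, so the cleanup needs to target the items attached to the VM's queue. A rough sketch of that is below. It is not a tested procedure. The table names, sync_objtype, sync_objid, queue_size_limit and the three setting names come from the messages in this thread; the `queue_id` join column on sync_queue_item and the `configuration` table with `name`/`value` columns are my assumptions about the schema, so check them with `DESCRIBE` first. As Andrei did, stop the management server and take a database backup before deleting anything.

```sql
-- Sketch only: verify column names against your own schema before running.
set @id=XXXXX;  -- VM instance id (placeholder, as in Wei's snippet)

-- Settings Suresh asked about (assumes a `configuration` table
-- with `name` and `value` columns).
select name, value
  from configuration
 where name in ('job.cancel.threshold.minutes',
                'job.expire.minutes',
                'volume.snapshot.job.cancel.threshold');

-- Inspect the VM's queue and the jobs sitting in it
-- (assumes sync_queue_item.queue_id references sync_queue.id).
select q.id as queue_id, q.sync_objtype, q.sync_objid,
       q.queue_size_limit, i.*
  from sync_queue q
  join sync_queue_item i on i.queue_id = q.id
 where q.sync_objtype = 'VmWorkJobQueue'
   and q.sync_objid = @id;

-- Clear the queued items for that VM only; the sync_queue row itself stays.
delete i
  from sync_queue_item i
  join sync_queue q on q.id = i.queue_id
 where q.sync_objtype = 'VmWorkJobQueue'
   and q.sync_objid = @id;
```

Running the select before the delete shows which item, if any, is the active one (the row with the queue_proc_* columns set) and which are just waiting behind it.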