Things to consider for Campo

Suresh Sadhu Mon, 13 Aug 2012 04:17:08 -0700

Hi all,

As I have heard, Campo involves major architecture changes. It would be good 
to consider the following items as improvements, so that they help QA, 
Support and customers, and also reduce the number of support calls.


Please feel free to add anything I have missed, or any further points for 
improvement, to the list below. Kindly correct me if my assumptions or views 
are wrong.

-

Job in waiting state
*****************
We don't set a time limit for job completion, because we don't know how long 
a particular job will take. But with this design, if an earlier job goes into 
a loop or hangs indefinitely, the other jobs are queued and wait for the 
first job to finish.

The only way out of this situation is to manually update the status field in 
the DB.

Is there an alternative (better) way to overcome this problem? Please share 
your views and thoughts.

My thought:
If we make job priority and job waiting period configurable parameters, the 
end user can set or update the priority and the waiting period based on their 
needs. Then, even if one job is stuck in the waiting state, the other waiting 
jobs can be triggered based on priority.

In the current design, if a job is in the waiting state, the end user can't 
stop it. If we introduce configurable parameters, a job in the waiting (hung) 
state can be taken out of that state once the configured duration expires.
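
To make the idea concrete, here is a rough sketch in Python (not CloudStack 
code). The names Job, JobQueue, max_wait_seconds and the status strings are 
all made up for illustration. It assumes one running job at a time, a lower 
number meaning higher priority, and a caller that passes in the current time.

```python
import heapq
import itertools
from dataclasses import dataclass, field
from typing import Optional


@dataclass(order=True)
class Job:
    # Hypothetical job record: ordered by (priority, seq) only.
    priority: int
    seq: int
    name: str = field(compare=False)
    status: str = field(default="QUEUED", compare=False)
    started_at: Optional[float] = field(default=None, compare=False)


class JobQueue:
    """Runs one job at a time; expires it after a configurable period."""

    def __init__(self, max_wait_seconds: float):
        self.max_wait_seconds = max_wait_seconds  # configurable parameter
        self._heap = []
        self._counter = itertools.count()  # keeps FIFO order within a priority
        self.running: Optional[Job] = None

    def submit(self, name: str, priority: int) -> Job:
        job = Job(priority, next(self._counter), name)
        heapq.heappush(self._heap, job)
        return job

    def complete(self) -> None:
        # Called when the running job finishes normally.
        if self.running is not None:
            self.running.status = "DONE"
            self.running = None

    def tick(self, now: float) -> Optional[Job]:
        # Expire a running job that has exceeded the configured period.
        if (self.running is not None
                and now - self.running.started_at >= self.max_wait_seconds):
            self.running.status = "TIMED_OUT"
            self.running = None
        # Start the highest-priority queued job if nothing is running.
        if self.running is None and self._heap:
            job = heapq.heappop(self._heap)
            job.status = "RUNNING"
            job.started_at = now
            self.running = job
        return self.running
```

With this, a hung job no longer blocks the queue forever: once the configured 
period is over, tick() marks it as timed out and starts the waiting job with 
the highest priority, without anyone editing the DB by hand.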



Issue no# http://bugs.cloud.com/show_bug.cgi?id=12061
Job failure/retry mechanism:
********************
If a job fails due to an exception, we don't retry it after some time.

For example (not a precise example, but it gives the idea):

In the VMware case, you can't take snapshots of the root disk and the data 
disk of a VM at the same time. If you trigger snapshots on both disks at the 
same time, the first request succeeds and the second request fails with a 
proper limitation message.

The end user then has to initiate the snapshot on the other disk (i.e. the 
data disk) again.

My Thought:
It would be good to keep the failed job in the queue. Once the first job 
completes, the job manager should pick up the waiting (failed) job from the 
queue and process it.
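
A minimal sketch of that behaviour in Python (again, not CloudStack code). 
RetryingJobManager, ResourceBusyError and max_attempts are hypothetical 
names. It assumes the failure is a temporary "resource busy" condition that 
the job signals with a dedicated exception, and that retries are capped.

```python
from collections import deque


class ResourceBusyError(Exception):
    """Hypothetical error: the resource is temporarily in use."""


class RetryingJobManager:
    def __init__(self, max_attempts: int = 3):
        self.max_attempts = max_attempts
        self.pending = deque()  # entries are (name, action, attempt number)
        self.results = {}

    def submit(self, name, action):
        self.pending.append((name, action, 1))

    def run_once(self):
        """Process each currently pending job once.

        A job that fails with ResourceBusyError goes back into the queue
        instead of being dropped, until max_attempts is reached.
        """
        for _ in range(len(self.pending)):
            name, action, attempt = self.pending.popleft()
            try:
                self.results[name] = action()
            except ResourceBusyError:
                if attempt < self.max_attempts:
                    self.pending.append((name, action, attempt + 1))
                else:
                    self.results[name] = "FAILED"
```

In the snapshot example, the data disk request would stay in the queue while 
the root disk snapshot is running, and would be picked up on a later 
run_once() call without the user submitting it again.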

Issue no# http://bugs.cloud.com/show_bug.cgi?id=11531


Please feel free to add a few more data points here.

Usability in terms of UI refresh:
************************
CS still has a caching issue: stale values are shown until you manually click 
the refresh button. Sometimes you still see the cached values.


Issue no# http://bugs.cloudstack.org/browse/CS-14988


Error & exception handling & coordination between tasks on the same resource
***************************************************************
I don't have many data points. If anybody has any, please share your views.

But I will give one example:

Problem:

Power on a stopped VM and at the same time take a snapshot of its root disk: 
fail. (Deploy VM failed with a lock problem, a java.lang.Exception occurred, 
but the snapshot job completed successfully. I tried startVM again and this 
time it deployed successfully.) Please check the attached log and execution 
logs.

Limitation:

This is not a problem under the current architecture. We currently don't 
coordinate tasks other than by throwing runtime errors. While a snapshot is 
being taken, VM operations may be temporarily unavailable to the user, and 
the user needs to retry.
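
One possible direction, sketched in Python only to illustrate the idea (not 
CloudStack code): serialize tasks per resource, so that a second task on the 
same VM waits instead of failing. ResourceCoordinator and the resource id 
"vm-1" are hypothetical, and this assumes all tasks run inside one process.

```python
import threading
from collections import defaultdict


class ResourceCoordinator:
    """Runs at most one task at a time per resource id."""

    def __init__(self):
        self._locks = defaultdict(threading.Lock)
        self._guard = threading.Lock()  # protects the lock table itself

    def _lock_for(self, resource_id):
        with self._guard:
            return self._locks[resource_id]

    def run(self, resource_id, task, timeout=None):
        lock = self._lock_for(resource_id)
        # timeout=-1 means "wait without limit" for threading.Lock.acquire.
        if not lock.acquire(timeout=-1 if timeout is None else timeout):
            raise TimeoutError("resource %s is still busy" % resource_id)
        try:
            return task()
        finally:
            lock.release()
```

With something like coordinator.run("vm-1", start_vm) and 
coordinator.run("vm-1", snapshot_root_disk), the start VM call would wait for 
the snapshot to finish (or time out with a clear error) rather than fail with 
a lock exception.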




Regards

Sadhu
