Job heartbeat (progress reporting, etc.), job expiration, job cancellation, and job throttling will be improved in the new architecture.
Kelven

On 8/13/12 4:46 AM, "Suresh Sadhu" <suresh.sa...@citrix.com> wrote:

>Including a few more points.
>
>Hi All,
>
>As I have heard, upcoming releases involve major architecture changes.
>It would be good to consider the following items for improvement, so
>that they help QA, Support, and customers, and reduce the number of
>support calls.
>
>Please feel free to add any data points I have missed, or further
>points for improvement, to the list below, and kindly correct me if my
>assumptions or views are wrong.
>
>Job in waiting state
>********************
>We don't set a time limit for job completion, because we don't know how
>long a particular job will take. Because of this design, if an earlier
>job goes into a loop or runs indefinitely, the other jobs stay queued
>and wait for the first job to finish.
>
>The only way out of this situation is to manually update the status
>field in the DB.
>
>Is there an alternative (better) way to overcome this problem? Please
>share your views and thoughts.
>
>My thought:
>If we make job priority and job waiting period configurable parameters,
>the end user can set or update the priority and the waiting period
>based on their needs, so that even if one job is in the waiting state,
>the other waiting jobs are triggered based on priority.
>
>In the current design, if a job is in the waiting state, the end user
>can't stop it. With configurable parameters, a job in the waiting (hung)
>state can be taken out once the configured duration has expired.
>
>Issue no# http://bugs.cloud.com/show_bug.cgi?id=12061
>
>Job failure/retry mechanism:
>****************************
>If a job fails due to an exception, we don't retry it after some time.
>
>For example:
>[It's not an accurate example, but it gives some idea]
>
>In the VMware case, you can't take snapshots of the root and data disks
>of a VM at the same time. If you try to trigger snapshots on both disks
>at the same time, the first request succeeds and the second request
>fails with a proper limitation message.
>
>The end user then has to initiate the snapshot on the other disk (i.e.
>the data disk) again.
>
>My thought:
>It would be good to keep the failed job in the queue. Once the first
>job completes, the job manager should pick up the waiting (failed) job
>from the queue and process it.
>
>Issue no# http://bugs.cloud.com/show_bug.cgi?id=11531
>
>Please feel free to add more data points here.
>
>Usability in terms of UI refresh:
>*********************************
>CS still has a caching issue unless you manually click the refresh
>button. Sometimes you still see the cached values.
>
>Issue no# http://bugs.cloudstack.org/browse/CS-14988
>
>Error and exception handling, and coordination between tasks on the
>same resource
>*******************************************************************
>I don't have many data points. If anybody has some, please share your
>views.
>
>I will give one example, though:
>
>Problem:
>
>Power on a stopped VM and at the same time take a snapshot of the root
>disk: fail. (Deploy VM failed with a lock problem, a
>java.lang.Exception occurred, but the snapshot job completed
>successfully. I tried startVM again and this time the VM deployed
>successfully.) Please check the attached log and execution logs.
>
>Limitation:
>
>This is not a problem under the current architecture. We currently
>don't coordinate tasks but throw runtime errors. While a snapshot is
>being taken, VM operations may be temporarily unavailable to the user,
>and the user needs to retry.
>
>Also, for HA: is the CloudStack HA/VMSync behavior (implementation)
>going to be the same for all hypervisors, or does the existing
>functionality remain unchanged in the upcoming release?
>
>Regards
>Sadhu
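
To make the two proposals quoted above a bit more concrete (a configurable priority and waiting period, and re-queuing a failed job instead of dropping it), here is a minimal Python sketch. It is not CloudStack code: `Job`, `JobQueue`, `max_wait_seconds`, and `max_retries` are hypothetical names chosen for illustration, and the sketch only expires jobs that are still waiting in the queue. Stopping a job that is already running and hung would additionally need cancellation support in the worker.

```python
import heapq
import itertools
import time


class Job:
    """A unit of work with a priority and a retry budget (hypothetical model)."""

    def __init__(self, name, action, priority=10, max_retries=1):
        self.name = name
        self.action = action            # callable doing the work; may raise
        self.priority = priority        # lower number runs first
        self.max_retries = max_retries  # how many times a failure is re-queued
        self.attempts = 0
        self.status = "QUEUED"


class JobQueue:
    """Priority queue that expires stale jobs and re-queues failed ones."""

    def __init__(self, max_wait_seconds, clock=time.monotonic):
        self.max_wait_seconds = max_wait_seconds  # configurable waiting period
        self.clock = clock
        self._heap = []
        self._counter = itertools.count()  # tie-breaker: FIFO within a priority

    def submit(self, job):
        job.status = "QUEUED"
        entry = (job.priority, next(self._counter), self.clock(), job)
        heapq.heappush(self._heap, entry)

    def run_next(self):
        """Process one job and return it, or return None if the queue is empty."""
        if not self._heap:
            return None
        _, _, enqueued_at, job = heapq.heappop(self._heap)

        # Expire a job that waited longer than the configured period.
        if self.clock() - enqueued_at > self.max_wait_seconds:
            job.status = "EXPIRED"
            return job

        job.attempts += 1
        try:
            job.action()
            job.status = "SUCCEEDED"
        except Exception:
            if job.attempts <= job.max_retries:
                self.submit(job)        # keep the failed job in the queue
            else:
                job.status = "FAILED"
        return job
```

In the VMware snapshot example, the data-disk snapshot would fail on its first attempt while the root-disk snapshot is running, go back into the queue, and succeed on a later `run_next()` call without the user having to resubmit it. A real job manager would likely also wait or back off before retrying, which this sketch leaves out.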