As we are already using a "list management servers" API call to script the shutdown/upgrade/start, I manually backported the code:
https://github.com/apache/cloudstack/pull/2578

On Tue, Apr 17, 2018 at 9:31 PM, Rafael Weingärtner <rafaelweingart...@gmail.com> wrote:

> Ron, that is a good analogy.
>
> There is something else that I forgot to mention. We discussed the issue
> of migrating jobs/tasks to other management servers. This is not easy to
> achieve because of the way it is currently implemented in ACS. However,
> as soon as we have a more comprehensive solution to a graceful shutdown,
> this becomes something feasible for us to work on.
>
> I do not know if Ilya is going to develop a graceful shutdown or if
> someone else will pick this up, but we are willing to work on it. Of
> course, it is not something that we would develop right away because it
> will probably take quite some work, and we have some other priorities.
> However, I will discuss this further internally and see what we can come
> up with.
>
> On Tue, Apr 17, 2018 at 1:46 PM, Ron Wheeler <rwheeler@artifact-software.com> wrote:
>
>> Part of this sounds like the Windows shutdown process, which is
>> familiar to many.
>>
>> For those who have never used Windows:
>>
>> Once you initiate the shutdown, it asks the tasks to shut down.
>> If tasks have not shut down within a "reasonable period", it lists them
>> and asks you if you want to wait a bit longer, force them to close, or
>> abort the shutdown so that you can manually shut them down.
>> If you "force" a shutdown, it closes all of the tasks using all of the
>> brutality at its command.
>> If you abort, then you have to redo the shutdown after you have manually
>> exited from the processes that you care about.
>>
>> This is pretty user friendly but requires that you have a way to signal
>> to a task that it is time to say goodbye.
>>
>> The "reasonable time" needs to have a default that is short enough to
>> make the operator happy and long enough to have a reasonable chance of
>> getting everything stopped without intervention.
>> If you allow the shutdown to proceed after the interval while the
>> operator waits, then you need to refresh the list of running tasks when
>> tasks end.
>>
>> Ron
>>
>> On 17/04/2018 11:27 AM, Rafael Weingärtner wrote:
>>
>>> Ilya and others,
>>>
>>> We have been discussing this idea of a graceful/nice shutdown. Our
>>> feeling is that we (in the CloudStack community) might have been
>>> trying to solve this problem with too much scripting. What if we
>>> developed a more integrated (native) solution?
>>>
>>> Let me explain our idea.
>>>
>>> ACS has a table called “mshost”, which is used to store management
>>> server information. During balancing and when jobs are dispatched to
>>> other management servers, this table is consulted/queried. Therefore,
>>> we have been discussing the idea of creating a management API for
>>> management servers. We could have an API method that changes the state
>>> of management servers to “prepare to maintenance” and then
>>> “maintenance” (as soon as all of the tasks/jobs it is managing
>>> finish). The idea is that during rebalancing we would remove the hosts
>>> of servers that are not in “Up” state (of course, we would also not
>>> let servers in the aforementioned states receive hosts to manage).
>>> Moreover, when we send/dispatch jobs to other management servers, we
>>> could ignore the ones that are not in “Up” state (which is something
>>> already done).
>>>
>>> By doing this, the nice shutdown could be executed in a few steps.
>>>
>>> 1 – issue the maintenance method for the management server you desire
>>> 2 – wait until the MS goes into maintenance mode; while there are
>>>     still running jobs, it (the management server) will be kept in
>>>     prepare for maintenance
>>> 3 – execute the Linux shutdown command
>>>
>>> We would need other API methods to manage MSs then. An (i) API method
>>> to list MSs, and we could even create an (ii) API to remove
>>> old/de-activated management servers, which we currently do not have
>>> (forcing users to apply changes directly in the database).
>>>
>>> Moreover, in this model, we would not kill hanging jobs; we would wait
>>> until they expire and ACS expunges them. Of course, it is possible to
>>> develop a forceful maintenance method as well. Then, when the “prepare
>>> for maintenance” takes longer than a parameter, we could kill hanging
>>> jobs.
>>>
>>> All of this would allow the MS to be kept up and receiving requests
>>> until it can be safely shut down. What do you guys think about this
>>> approach?
>>>
>>> On Tue, Apr 10, 2018 at 6:52 PM, Yiping Zhang <yzh...@marketo.com> wrote:
>>>
>>>> As a cloud admin, I would love to have this feature.
>>>>
>>>> It so happens that I just accidentally restarted my ACS management
>>>> server while two instances were migrating to another Xen cluster (via
>>>> storage migration, not live migration). As a result, both instances
>>>> ended up with corrupted data disks which can't be reattached or
>>>> migrated.
>>>>
>>>> Any feature which prevents this from happening would be great. A
>>>> low-hanging fruit is simply checking whether there are any async jobs
>>>> running, especially any kind of migration jobs or other known
>>>> long-running types of jobs, and warning the operator so that he has a
>>>> chance to abort server shutdowns.
>>>>
>>>> Yiping
>>>>
>>>> On 4/5/18, 3:13 PM, "ilya musayev" <ilya.mailing.li...@gmail.com> wrote:
>>>>
>>>>> Andrija
>>>>>
>>>>> This is a tough scenario.
>>>>>
>>>>> As an admin, the way I would have handled this situation is to
>>>>> advertise the upcoming outage and then take away specific API
>>>>> commands from a user a day before - so he does not cause any
>>>>> long-running async jobs. Once maintenance completes - enable the API
>>>>> commands back to the user. However - I don't know who your user base
>>>>> is and if this would be an acceptable solution.
>>>>>
>>>>> Perhaps also investigate what can be done to speed up your
>>>>> long-running tasks...
>>>>>
>>>>> As a side note, we will be working on a feature that would allow for
>>>>> a graceful termination of the process/job, meaning if the agent
>>>>> noticed a disconnect or termination request - it will abort the
>>>>> command in flight. We can also consider restarting these tasks again
>>>>> or what not - but it would not be part of this enhancement.
>>>>>
>>>>> Regards
>>>>> ilya
>>>>>
>>>>> On Thu, Apr 5, 2018 at 6:47 AM, Andrija Panic <andrija.pa...@gmail.com> wrote:
>>>>>
>>>>>> Hi Ilya,
>>>>>>
>>>>>> thanks for the feedback - but in the "real world", you need to
>>>>>> "understand" that 60min is a next-to-useless timeout for some jobs
>>>>>> (if I understand this specific parameter correctly ?? - job is
>>>>>> really canceled, not only job monitoring is canceled ???) -
>>>>>>
>>>>>> My value for the "job.cancel.threshold.minutes" is 2880 minutes (2
>>>>>> days?)
>>>>>>
>>>>>> I can tell you when you have CEPH/NFS (CEPH even "worse" case, since
>>>>>> slower read during qemu-img convert process...) of 500GB, then
>>>>>> imagine a snapshot job will take many hours. Should I mention 1TB
>>>>>> volumes (yes, we had clients like that...)
>>>>>> Then attaching a 1TB volume that was uploaded to ACS (lives
>>>>>> originally on Secondary Storage, and takes time to be copied over to
>>>>>> NFS/CEPH) will take up to a few hours.
>>>>>> Then migrating a 1TB volume from NFS to CEPH, or CEPH to NFS, also
>>>>>> takes time...etc.
>>>>>>
>>>>>> I'm just giving you feedback as "user", admin of the cloud, zero DEV
>>>>>> skills here :) , just to make sure you make practical decisions (and
>>>>>> I admit I might be wrong with my stuff, but just giving you feedback
>>>>>> from our public cloud setup)
>>>>>>
>>>>>> Cheers!
>>>>>>
>>>>>> On 5 April 2018 at 15:16, Tutkowski, Mike <mike.tutkow...@netapp.com> wrote:
>>>>>>
>>>>>>> Wow, there’s been a lot of good details noted from several people
>>>>>>> on how this process works today and how we’d like it to work in
>>>>>>> the near future.
>>>>>>>
>>>>>>> 1) Any chance this is already documented on the Wiki?
>>>>>>>
>>>>>>> 2) If not, any chance someone would be willing to do so (a flow
>>>>>>> diagram would be particularly useful).
>>>>>>>
>>>>>>> On Apr 5, 2018, at 3:37 AM, Marc-Aurèle Brothier <ma...@exoscale.ch> wrote:
>>>>>>>
>>>>>>>> Hi all,
>>>>>>>>
>>>>>>>> Good point ilya, but as stated by Sergey there are more things to
>>>>>>>> consider before being able to do a proper shutdown. I augmented
>>>>>>>> the script I gave you originally and changed code in CS. What
>>>>>>>> we're doing for our environment is as follows:
>>>>>>>>
>>>>>>>> 1. The mgmt server looks for a change in the file /etc/lb-agent,
>>>>>>>>    which contains keywords for HAProxy[2] (ready, maint), so that
>>>>>>>>    HAProxy can disable the mgmt server on the keyword "maint" and
>>>>>>>>    the mgmt server stops a couple of threads[1] to stop processing
>>>>>>>>    async jobs in the queue.
>>>>>>>> 2. Look for the async jobs and wait until there are none, to
>>>>>>>>    ensure you can send the reconnect commands (if jobs are
>>>>>>>>    running, a reconnect will result in a failed job since the
>>>>>>>>    result will never reach the management server - the agent waits
>>>>>>>>    for the current job to be done before reconnecting, and
>>>>>>>>    discards the result... room for improvement here!)
>>>>>>>> 3. Issue a reconnectHost command to all the hosts connected to the
>>>>>>>>    mgmt server so that they reconnect to another one; otherwise
>>>>>>>>    the mgmt server must be up since it is used to forward commands
>>>>>>>>    to agents.
>>>>>>>> 4. When all agents are reconnected, we can shut down the
>>>>>>>>    management server and perform the maintenance.
>>>>>>>>
>>>>>>>> One issue remains for me: during the reconnect, the commands that
>>>>>>>> are processed at the same time should be kept in a queue until the
>>>>>>>> agents have finished any current jobs and have reconnected. Today
>>>>>>>> the little time window during which the reconnect happens can lead
>>>>>>>> to failed jobs due to the agent not being connected at the right
>>>>>>>> moment.
>>>>>>>>
>>>>>>>> I could push a PR for the change to stop some processing threads
>>>>>>>> based on the content of a file. It's also possible to cancel the
>>>>>>>> drain of the management server by simply changing the content of
>>>>>>>> the file back to "ready" again, instead of "maint" [2].
>>>>>>>>
>>>>>>>> [1] AsyncJobMgr-Heartbeat, CapacityChecker, StatsCollector
>>>>>>>> [2] HAProxy documentation on agent checker:
>>>>>>>> https://cbonte.github.io/haproxy-dconv/1.6/configuration.html#5.2-agent-check
>>>>>>>>
>>>>>>>> Regarding your issue on the port blocking, I think it's fair to
>>>>>>>> consider that if you want to shut down your server at some point,
>>>>>>>> you have to stop serving (some) requests. Here the only way is to
>>>>>>>> stop serving everything. If the API had a REST design, we could
>>>>>>>> reject any POST/PUT/DELETE operations and allow GET ones. I don't
>>>>>>>> know how hard it would be today to only allow listBaseCmd
>>>>>>>> operations to be more friendly with the users.
>>>>>>>>
>>>>>>>> Marco
>>>>>>>>
>>>>>>>> On Thu, Apr 5, 2018 at 2:22 AM, Sergey Levitskiy <serg...@hotmail.com> wrote:
>>>>>>>>
>>>>>>>>> Now without spellchecking :)
>>>>>>>>>
>>>>>>>>> This is not simple e.g. for VMware. Each management server also
>>>>>>>>> acts as an agent proxy so tasks against a particular ESX host
>>>>>>>>> will always be forwarded. The right answer will be to support a
>>>>>>>>> native “maintenance mode” for the management server. When
>>>>>>>>> entering such a mode, the management server should release all
>>>>>>>>> agents including SSVM, block/redirect API calls and login
>>>>>>>>> requests, and finish all async jobs it originated.
>>>>>>>>>
>>>>>>>>> On Apr 4, 2018, at 5:15 PM, Sergey Levitskiy <serg...@hotmail.com> wrote:
>>>>>>>>>
>>>>>>>>> This is not simple e.g. for VMware. Each management server also
>>>>>>>>> acts as an agent proxy so tasks against a particular ESX host
>>>>>>>>> will be always forwarded. That right answer will be to a native
>>>>>>>>> support for “maintenance mode” for management server. When
>>>>>>>>> entered to such mode the management server should release all
>>>>>>>>> agents including save, block/redirect API calls and login
>>>>>>>>> request and finish all a sync job it originated.
>>>>>>>>>
>>>>>>>>> Sent from my iPhone
>>>>>>>>>
>>>>>>>>> On Apr 4, 2018, at 3:31 PM, Rafael Weingärtner <rafaelweingart...@gmail.com> wrote:
>>>>>>>>>
>>>>>>>>> Ilya, still regarding the issue of the management server that is
>>>>>>>>> being shut down: if other MSs or maybe system VMs (I am not sure
>>>>>>>>> whether they are able to do such tasks) can direct/redirect/send
>>>>>>>>> new jobs to this management server (the one being shut down),
>>>>>>>>> the process might never end because new tasks are always being
>>>>>>>>> created for the management server that we want to shut down. Is
>>>>>>>>> this scenario possible?
>>>>>>>>>
>>>>>>>>> That is why I mentioned blocking the port 8250 for the
>>>>>>>>> “graceful-shutdown”.
>>>>>>>>>
>>>>>>>>> If this scenario is not possible, then everything is fine.
>>>>>>>>>
>>>>>>>>> On Wed, Apr 4, 2018 at 7:14 PM, ilya musayev <ilya.mailing.li...@gmail.com> wrote:
>>>>>>>>>
>>>>>>>>> I'm thinking of using a configuration from
>>>>>>>>> "job.cancel.threshold.minutes" - it will be the longest
>>>>>>>>>
>>>>>>>>>     "category": "Advanced",
>>>>>>>>>     "description": "Time (in minutes) for async-jobs to be forcely cancelled if it has been in process for long",
>>>>>>>>>     "name": "job.cancel.threshold.minutes",
>>>>>>>>>     "value": "60"
>>>>>>>>>
>>>>>>>>> On Wed, Apr 4, 2018 at 1:36 PM, Rafael Weingärtner <rafaelweingart...@gmail.com> wrote:
>>>>>>>>>
>>>>>>>>> Big +1 for this feature; I only have a few doubts.
>>>>>>>>>
>>>>>>>>> * Regarding the tasks/jobs that management servers (MSs)
>>>>>>>>> execute: do these tasks originate from requests that come to the
>>>>>>>>> MS, or is it possible for requests received by one management
>>>>>>>>> server to be executed by another? I mean, if I execute a request
>>>>>>>>> against MS1, will this request always be executed/treated by
>>>>>>>>> MS1, or is it possible that this request is executed by another
>>>>>>>>> MS (e.g. MS2)?
>>>>>>>>>
>>>>>>>>> * I would suggest that after we block traffic coming from
>>>>>>>>> 8080/8443/8250 (we will need to block this as well, right?), we
>>>>>>>>> can log the execution of tasks. I mean, something saying: there
>>>>>>>>> are XXX tasks (enumerate tasks) still being executed, we will
>>>>>>>>> wait for them to finish before shutting down.
>>>>>>>>>
>>>>>>>>> * The timeout (60 minutes suggested) could be a global setting
>>>>>>>>> that we can load before executing the graceful-shutdown.
>>>>>>>>>
>>>>>>>>> On Wed, Apr 4, 2018 at 5:15 PM, ilya musayev <ilya.mailing.li...@gmail.com> wrote:
>>>>>>>>>
>>>>>>>>> Use case:
>>>>>>>>> In any environment - from time to time - an administrator needs
>>>>>>>>> to perform maintenance. The current stop sequence of the
>>>>>>>>> cloudstack management server will ignore the fact that there may
>>>>>>>>> be long-running async jobs - and terminate the process. This in
>>>>>>>>> turn can create a poor user experience and occasional
>>>>>>>>> inconsistency in the cloudstack db.
>>>>>>>>>
>>>>>>>>> This is especially painful in large environments where the user
>>>>>>>>> has thousands of nodes and there is continuous patching that
>>>>>>>>> happens around the clock - that requires migration of workload
>>>>>>>>> from one node to another.
>>>>>>>>>
>>>>>>>>> With that said - I've created a script that monitors the async
>>>>>>>>> job queue for a given MS and waits for it to complete all jobs.
>>>>>>>>> More details are posted below.
>>>>>>>>>
>>>>>>>>> I'd like to introduce "graceful-shutdown" into the
>>>>>>>>> systemctl/service of the cloudstack-management service.
>>>>>>>>>
>>>>>>>>> The details of how it will work are below:
>>>>>>>>>
>>>>>>>>> Workflow for graceful shutdown:
>>>>>>>>> Using iptables/firewalld - block any connection attempts on
>>>>>>>>> 8080/8443 (we can identify the ports dynamically)
>>>>>>>>> Identify the MSID for the node; using the proper msid - query
>>>>>>>>> the async_job table for
>>>>>>>>> 1) any jobs that are still running (or job_status="0")
>>>>>>>>> 2) job_dispatcher not like "pseudoJobDispatcher"
>>>>>>>>> 3) job_init_msid=$my_ms_id
>>>>>>>>>
>>>>>>>>> Monitor this async_job table for 60 minutes - until all async
>>>>>>>>> jobs for the MSID are done, then proceed with shutdown
>>>>>>>>> If failed for any reason or terminated, catch the exit via the
>>>>>>>>> trap command and unblock the 8080/8443
>>>>>>>>>
>>>>>>>>> Comments are welcome
>>>>>>>>>
>>>>>>>>> Regards,
>>>>>>>>> ilya
>>>>>>>>>
>>>>>>>>> --
>>>>>>>>> Rafael Weingärtner
>>>>>>>>>
>>>>>>>>> --
>>>>>>>>> Rafael Weingärtner
>>>>>>
>>>>>> --
>>>>>> Andrija Panić
>>
>> --
>> Ron Wheeler
>> President
>> Artifact Software Inc
>> email: rwhee...@artifact-software.com
>> skype: ronaldmwheeler
>> phone: 866-970-2435, ext 102
>
> --
> Rafael Weingärtner
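For anyone following along, the "monitor the async_job table" step from ilya's original workflow could look roughly like the sketch below. It is only an illustration under assumptions: the column names (job_status, job_dispatcher, job_init_msid) and the three filter conditions come from the thread, but the in-memory SQLite table is a stand-in for the real CloudStack database, and the function names, the demo rows, the dispatcher value "ApiAsyncJobDispatcher" and the msid values are hypothetical. A MySQL driver would also use a different parameter placeholder than "?".

```python
import sqlite3
import time

# Filter taken from the thread: still running, not a pseudo job,
# and initiated by the management server we want to drain.
PENDING_JOBS_SQL = """
    SELECT COUNT(*) FROM async_job
    WHERE job_status = 0
      AND job_dispatcher NOT LIKE 'pseudoJobDispatcher'
      AND job_init_msid = ?
"""


def count_pending_jobs(conn, msid):
    """Return how many async jobs started by this MS are still running."""
    return conn.execute(PENDING_JOBS_SQL, (msid,)).fetchone()[0]


def wait_for_drain(conn, msid, timeout_s=60 * 60, poll_s=30, sleep=time.sleep):
    """Poll until no jobs remain. True if drained, False if the timeout hit."""
    deadline = time.monotonic() + timeout_s
    while True:
        if count_pending_jobs(conn, msid) == 0:
            return True
        if time.monotonic() >= deadline:
            return False
        sleep(poll_s)


def make_demo_db():
    """Build a tiny stand-in async_job table (hypothetical rows)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE async_job (id INTEGER PRIMARY KEY, job_status INTEGER,"
        " job_dispatcher TEXT, job_init_msid INTEGER)"
    )
    conn.executemany(
        "INSERT INTO async_job (job_status, job_dispatcher, job_init_msid)"
        " VALUES (?, ?, ?)",
        [
            (0, "ApiAsyncJobDispatcher", 1001),  # running, ours: counted
            (0, "pseudoJobDispatcher", 1001),    # pseudo job: ignored
            (1, "ApiAsyncJobDispatcher", 1001),  # finished: ignored
            (0, "ApiAsyncJobDispatcher", 2002),  # other MS: ignored
        ],
    )
    return conn


if __name__ == "__main__":
    db = make_demo_db()
    print("pending for msid 1001:", count_pending_jobs(db, 1001))
```

A wrapper script would call wait_for_drain() after blocking the ports, and only stop the service when it returns True. On False or on any error it would unblock the ports again, which is the role the trap command plays in the workflow above.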