Re: [DISCUSS] CloudStack graceful shutdown

2018-04-24 Thread Rafael Weingärtner
Thanks for the feedback Ilya. Then, we would only need to adapt this new feature introduced by you and ShapeBlue. On Sat, Apr 21, 2018 at 4:03 PM, ilya musayev wrote: > Rafael > > What you are suggesting - was already implemented. We've created Load > Balancing algorithms - but we did not take

Re: [DISCUSS] CloudStack graceful shutdown

2018-04-21 Thread ilya musayev
Rafael What you are suggesting - was already implemented. We've created Load Balancing algorithms - but we did not take into account the LB algo for maintenance (yet). Rohit and ShapeBlue were the developers behind the feature. What needs to happen is a tweak to LB Algorithms to become MS mainten

Re: [DISCUSS] CloudStack graceful shutdown

2018-04-20 Thread Rafael Weingärtner
Is that management server load balancing feature using static configurations? I heard about it on the mailing list, but I did not follow the implementation. I do not see many problems with agents reconnecting. We can implement in agents (not just KVM, but also system VMs) a logic that instead of u

Re: [DISCUSS] CloudStack graceful shutdown

2018-04-20 Thread ilya musayev
Rafael and Community All is well and good and I think we are thinking along similar lines - the only issue that I see right now with any approach is KVM Agents (or direct agents) and using a LoadBalancer on 8250. Here is a scenario: You have a 2 Management Server setup fronted with a VIP on 8250

Re: [DISCUSS] CloudStack graceful shutdown

2018-04-18 Thread Marc-Aurèle Brothier
As we are already using list management server API calls to handle the scripting of the shutdown/upgrade/start, I manually backported the code: https://github.com/apache/cloudstack/pull/2578 On Tue, Apr 17, 2018 at 9:31 PM, Rafael Weingärtner < rafaelweingart...@gmail.com> wrote: > Ron, that i

Re: [DISCUSS] CloudStack graceful shutdown

2018-04-17 Thread Rafael Weingärtner
Ron, that is a good analogy. There is something else that I forgot to mention. We discussed the issue of migrating Jobs/tasks to other management servers. This is not something easy to achieve because of the way it is currently implemented in ACS. However, as soon as we have a more comprehensive s

Re: [DISCUSS] CloudStack graceful shutdown

2018-04-17 Thread Ron Wheeler
Part of this sounds like the Windows shutdown process, which is familiar to many. For those who have never used Windows: Once you initiate the shutdown, it asks the tasks to shut down. If tasks have not shut down within a "reasonable period", it lists them and asks you if you want to wait a bit
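The shutdown pattern described in this message can be sketched roughly as follows. This is only an illustration of the analogy, not CloudStack code: `SimulatedTask`, `graceful_shutdown`, `polls_per_round` and the `keep_waiting` callback are all hypothetical names introduced here, and the "reasonable period" is modelled as a fixed number of polls.

```python
import time
from typing import Callable, List


class SimulatedTask:
    """Hypothetical task that needs a few polls to finish once asked to stop."""

    def __init__(self, name: str, polls_until_stopped: int) -> None:
        self.name = name
        self._polls_left = polls_until_stopped
        self._stop_requested = False

    def request_stop(self) -> None:
        self._stop_requested = True

    def is_running(self) -> bool:
        # Simulation only: each poll after the stop request brings the
        # task one step closer to completion.
        if self._stop_requested and self._polls_left > 0:
            self._polls_left -= 1
        return self._polls_left > 0


def graceful_shutdown(
    tasks: List[SimulatedTask],
    polls_per_round: int,
    keep_waiting: Callable[[List[str]], bool],
    poll_interval: float = 0.0,
) -> List[str]:
    """Ask every task to stop, wait a bounded period, then ask the operator.

    Returns the names of tasks still running when the operator gave up
    (an empty list means everything stopped cleanly).
    """
    for task in tasks:
        task.request_stop()

    remaining = list(tasks)
    while True:
        # One "reasonable period": poll a fixed number of times.
        for _ in range(polls_per_round):
            remaining = [t for t in remaining if t.is_running()]
            if not remaining:
                return []
            time.sleep(poll_interval)
        # Period expired: list the stragglers and let the operator decide.
        names = [t.name for t in remaining]
        if not keep_waiting(names):
            return names
```

A caller would pass a `keep_waiting` function that prompts the administrator; returning `False` hands back the list of tasks that would have to be terminated forcibly.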

Re: [DISCUSS] CloudStack graceful shutdown

2018-04-17 Thread Rafael Weingärtner
Ilya and others, We have been discussing this idea of graceful/nice shutdown. Our feeling is that we (in the CloudStack community) might have been trying to solve this problem with too much scripting. What if we developed a more integrated (native) solution? Let me explain our idea. ACS has a tab

Re: [DISCUSS] CloudStack graceful shutdown

2018-04-10 Thread Yiping Zhang
As a cloud admin, I would love to have this feature. It so happens that I just accidentally restarted my ACS management server while two instances were migrating to another Xen cluster (via storage migration, not live migration). As a result, both instances ended up with corrupted data disk whi

Re: [DISCUSS] CloudStack graceful shutdown

2018-04-05 Thread ilya musayev
Andrija This is a tough scenario. As an admin, the way I would have handled this situation is to advertise the upcoming outage and then take away specific API commands from a user a day before - so he does not cause any long running async jobs. Once maintenance completes - enable the API comman

Re: [DISCUSS] CloudStack graceful shutdown

2018-04-05 Thread ilya musayev
After much useful input from many of you - I realize my approach is somewhat incomplete and possibly very optimistic. Speaking to Marcus, here is what we propose as an alternate solution. I was hoping to stay outside of the "core" - but it looks like there is no other way around it. Proposed functi

Re: [DISCUSS] CloudStack graceful shutdown

2018-04-05 Thread ilya musayev
Hi Sergey Glad to see you are doing well, I was gonna say drop "enterprise virtualization company" and save a $fortune$ - but it's not for everyone :) I'll post another proposed solution to the bottom of this thread. Regards ilya On Wed, Apr 4, 2018 at 5:22 PM, Sergey Levitskiy wrote: > Now with

Re: [DISCUSS] CloudStack graceful shutdown

2018-04-05 Thread ilya musayev
Marc Thank you for posting the details on how your implementation works. Unfortunately for us - HAproxy is not an option - hence we can't take advantage of this implementation, but please do share with the community - perhaps it will help someone else. I'm going to post to the bottom of this thread wi

Re: [DISCUSS] CloudStack graceful shutdown

2018-04-05 Thread Andrija Panic
Hi Ilya, thanks for the feedback - but in the "real world", you need to "understand" that 60 min is a next-to-useless timeout for some jobs (if I understand this specific parameter correctly? - the job is really canceled, not only the job monitoring?) - My value for the "job.cancel.threshold.m

Re: [DISCUSS] CloudStack graceful shutdown

2018-04-05 Thread Tutkowski, Mike
Wow, there’s been a lot of good details noted from several people on how this process works today and how we’d like it to work in the near future. 1) Any chance this is already documented on the Wiki? 2) If not, any chance someone would be willing to do so (a flow diagram would be particularly

Re: [DISCUSS] CloudStack graceful shutdown

2018-04-05 Thread Marc-Aurèle Brothier
Hi all, Good point ilya, but as stated by Sergey there are more things to consider before being able to do a proper shutdown. I augmented the script I gave you originally and changed code in CS. What we're doing for our environment is as follows: 1. the MGMT looks for a change in the file /etc/lb-agent

Re: [DISCUSS] CloudStack graceful shutdown

2018-04-04 Thread Sergey Levitskiy
Now without spellchecking :) This is not simple e.g. for VMware. Each management server also acts as an agent proxy so tasks against a particular ESX host will always be forwarded. The right answer will be to support a native “maintenance mode” for management server. When entered to such mode

Re: [DISCUSS] CloudStack graceful shutdown

2018-04-04 Thread Sergey Levitskiy
This is not simple e.g. for VMware. Each management server also acts as an agent proxy so tasks against a particular ESX host will always be forwarded. The right answer will be native support for “maintenance mode” for management server. When entered to such mode the management server shou

Re: [DISCUSS] CloudStack graceful shutdown

2018-04-04 Thread Rafael Weingärtner
Ilya, still regarding the issue of the management server that is being shut down: if other MSs or maybe system VMs (I am not sure whether they are able to do such tasks) can direct/redirect/send new jobs to this management server (the one being shut down), the process might never end because new tasks

Re: [DISCUSS] CloudStack graceful shutdown

2018-04-04 Thread ilya musayev
I'm thinking of using a configuration from "job.cancel.threshold.minutes" - it will be the longest "category": "Advanced", "description": "Time (in minutes) for async-jobs to be forcely cancelled if it has been in process for long", "name": "job.cancel.threshold.minutes",
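As a rough illustration of how a setting like "job.cancel.threshold.minutes" could bound the wait for running jobs, here is a minimal sketch. The message is cut off, so the exact use intended is not known; treating the threshold as the upper limit on how long a shutdown waits is an assumption made here, and `shutdown_decision` and its parameters are hypothetical names, not CloudStack APIs.

```python
from datetime import datetime, timedelta


def shutdown_decision(
    shutdown_requested_at: datetime,
    now: datetime,
    pending_async_jobs: int,
    cancel_threshold_minutes: int,
) -> str:
    """Decide what a draining management server should do next.

    Assumption: the value of job.cancel.threshold.minutes is used as the
    longest time the shutdown is willing to wait for pending async jobs.
    """
    if pending_async_jobs == 0:
        return "stop"  # nothing left to wait for
    waited = now - shutdown_requested_at
    if waited >= timedelta(minutes=cancel_threshold_minutes):
        return "force-stop"  # jobs exceeded the configured threshold
    return "wait"
```

With a threshold of 60 minutes, a server that still has pending jobs 30 minutes after the shutdown request would keep waiting, and would give up once the full hour has passed.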

Re: [DISCUSS] CloudStack graceful shutdown

2018-04-04 Thread ilya musayev
Rafael > * Regarding the tasks/jobs that management servers (MSs) execute; do these tasks originate from requests that come to the MS, or is it possible for requests received by one management server to be executed by another? I mean, if I execute a request against MS1, will this request always be

Re: [DISCUSS] CloudStack graceful shutdown

2018-04-04 Thread ilya musayev
Andrija This is the reason for this enhancement: snapshot, migration and others are all async jobs - and therefore should be tracked in the async_job table under a specific MS. It is known they may take a while to complete and the last thing we want is to interrupt them. Depending on what value you have set

Re: [DISCUSS] CloudStack graceful shutdown

2018-04-04 Thread Andrija Panic
One comment here (I had to shut down a whole DC for a few hours recently), please make sure to at least consider the snapshotting process as a special case - it can really take a few hours for a snapshot to complete (copy process from Primary to Secondary Storage) I did (in my recent unfortunate D

Re: [DISCUSS] CloudStack graceful shutdown

2018-04-04 Thread Tutkowski, Mike
I may be remembering this incorrectly, but from what I recall, if a resource is owned by one MS and a request related to that resource comes in to another MS, the MS that received the request passes it on to the other MS. > On Apr 4, 2018, at 2:36 PM, Rafael Weingärtner > wrote: > > Big +1 fo
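The forwarding behaviour recalled at the start of this message can be pictured with a small sketch. It only models the recollection as stated (which the author himself flags as possibly inaccurate); `handling_server`, the ownership map, and the server names are hypothetical and do not come from the CloudStack code base.

```python
from typing import Dict, Tuple


def handling_server(
    resource_id: str,
    receiving_ms: str,
    owner_by_resource: Dict[str, str],
) -> Tuple[str, bool]:
    """Return (server that handles the request, whether it was forwarded).

    Assumption: a request for a resource owned by another management
    server is passed on to that owner; otherwise the receiver handles it.
    """
    owner = owner_by_resource.get(resource_id, receiving_ms)
    return owner, owner != receiving_ms


# Hypothetical ownership map: host-1 is owned by ms1, host-2 by ms2.
owners = {"host-1": "ms1", "host-2": "ms2"}
print(handling_server("host-1", "ms2", owners))  # forwarded to the owner
print(handling_server("host-2", "ms2", owners))  # handled locally
```

If this recollection holds, a server that is being drained could still receive forwarded work for the resources it owns, which is relevant to the shutdown discussion in this thread.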

Re: [DISCUSS] CloudStack graceful shutdown

2018-04-04 Thread Rafael Weingärtner
Big +1 for this feature; I only have a few doubts. * Regarding the tasks/jobs that management servers (MSs) execute; do these tasks originate from requests that come to the MS, or is it possible for requests received by one management server to be executed by another? I mean, if I execute a reques

[DISCUSS] CloudStack graceful shutdown

2018-04-04 Thread ilya musayev
Use case: In any environment, from time to time, an administrator needs to perform maintenance. The current stop sequence of the CloudStack management server will ignore the fact that there may be long running async jobs - and terminate the process. This in turn can create a poor user experience and occasional