Re: [DISCUSS] CloudStack graceful shutdown

Rafael Weingärtner Fri, 20 Apr 2018 13:18:35 -0700

Is that management server load balancing feature using static
configurations? I heard about it on the mailing list, but I did not follow
the implementation.


I do not see many problems with agents reconnecting. We can implement logic
in the agents (not just KVM, but also system VMs) so that, instead of using a
static pool of management servers configured in a properties file, they
dynamically request the list of available management servers via that
list-management-servers API method. This would require us to configure the
agents with a load balancer URL that balances these requests across the
management servers.
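
A minimal sketch of that agent-side selection, in Python only for brevity.
Everything specific here is an assumption for illustration: the JSON shape,
the "managementserver", "address" and "state" field names, and the fetch
callable that stands in for an HTTP request to the load balancer URL. None of
it is an existing CloudStack API.

```python
import json
import random
from typing import Callable, List, Optional


def parse_up_servers(payload: str) -> List[str]:
    """Return the addresses of management servers reported as "Up".

    The payload layout is hypothetical, not a real CloudStack response.
    """
    servers = json.loads(payload).get("managementserver", [])
    return [s["address"] for s in servers if s.get("state") == "Up"]


def choose_server(fetch: Callable[[], str],
                  current: Optional[str] = None,
                  rng=random) -> Optional[str]:
    """Pick a management server to (re)connect to.

    fetch   -- callable returning the JSON list of management servers
    current -- address the agent was just asked to leave, if any
    Returns None when no other "Up" server is available.
    """
    candidates = [a for a in parse_up_servers(fetch()) if a != current]
    return rng.choice(candidates) if candidates else None
```

An agent that is told to disconnect would call choose_server() with its
current MS address, so a server that is preparing for maintenance is never
picked again.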

I am +1 to removing the need for that VIP, which load-balances the agents'
connections to the management servers.
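
To make the VIP problem from Ilya's scenario (quoted below) concrete, here is
a toy simulation in Python. It is a sketch under simplifying assumptions that
are mine, not measurements: two servers start with the same number of agents,
MS1 keeps accepting connections while it drains, and on a tie the balancer
picks MS1. The function name and parameters are made up for the example.

```python
from itertools import cycle


def drain_ms1(agents_per_ms, algorithm, max_rounds=100):
    """Simulate MS1 repeatedly asking its agents to reconnect via the VIP.

    algorithm is "leastconn" or anything else for round robin.
    Returns (rounds used, final connection counts per server).
    """
    conns = {"MS1": agents_per_ms, "MS2": agents_per_ms}
    round_robin = cycle(["MS1", "MS2"])
    rounds = 0
    while conns["MS1"] > 0 and rounds < max_rounds:
        # MS1 drops all of its agents; each one reconnects through the LB.
        moving, conns["MS1"] = conns["MS1"], 0
        for _ in range(moving):
            if algorithm == "leastconn":
                # Fewest connections wins; a tie goes to MS1 (dict order).
                target = min(conns, key=conns.get)
            else:
                target = next(round_robin)
            conns[target] += 1
        rounds += 1
    return rounds, conns
```

With 5 agents per server, the least-connections run hits the round limit with
MS1 still holding its 5 agents, while the round-robin run ends with all 10
agents on MS2.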

On Fri, Apr 20, 2018 at 4:41 PM, ilya musayev <ilya.mailing.li...@gmail.com>
wrote:

> Rafael and Community
>
> All is well and good, and I think we are thinking along similar lines -
> the only issue that I see right now with any approach is KVM agents (or
> direct agents) and using a load balancer on 8250.
>
> Here is a scenario:
>
> You have a setup of 2 management servers fronted by a VIP on 8250.
> The LB algorithm used is either Round Robin or Least Connections.
> You initiate a maintenance mode operation on one of the MS servers (call it
> MS1) - assume you have a long running migration job that needs 60 minutes
> to complete.
> We attempt to evacuate the agents by telling them to disconnect and
> reconnect again.
> If we are using an LB on 8250 with
> 1) Least Connections - then all agents will continuously try to connect
> to the MS1 node that is attempting to go down for maintenance. Essentially,
> with this LB configuration this operation will never complete.
> 2) Round Robin - this will take a while - but eventually - you will get all
> nodes connected to MS2.
>
> The current limitation is the usage of an external LB on 8250. For this
> operation to work without issue, agents must connect to the MS servers
> without an LB. This is a recent feature we've developed with ShapeBlue -
> where we maintain the list of CloudStack management servers in the
> agent.properties file.
>
> Unless you can think of another solution - it appears we may be forced
> to bypass the 8250 VIP LB and use the new feature to maintain the list of
> management servers within agent.properties.
>
>
> I need to run now, let me know what your thoughts are.
>
> Regards
> ilya
>
>
>
> On Tue, Apr 17, 2018 at 8:27 AM, Rafael Weingärtner <
> rafaelweingart...@gmail.com> wrote:
>
> > Ilya and others,
> >
> > We have been discussing this idea of a graceful/nice shutdown.  Our
> > feeling is that we (in the CloudStack community) might have been trying
> > to solve this problem with too much scripting. What if we developed a
> > more integrated (native) solution?
> >
> > Let me explain our idea.
> >
> > ACS has a table called “mshost”, which is used to store management server
> > information. During balancing and when jobs are dispatched to other
> > management servers this table is consulted/queried.  Therefore, we have
> > been discussing the idea of creating a management API for management
> > servers.  We could have an API method that changes the state of a
> > management server to “prepare for maintenance” and then “maintenance” (as
> > soon as all of the tasks/jobs it is managing finish). The idea is that
> > during rebalancing we would remove the hosts from servers that are not in
> > the “Up” state (of course we would also not consider servers in the
> > aforementioned states as candidates to receive hosts to manage).
> > Moreover, when we send/dispatch jobs to other management servers, we
> > could ignore the ones that are not in the “Up” state (which is something
> > already done).
> >
> > By doing this, the graceful shutdown could be executed in a few steps.
> >
> > 1 – issue the maintenance method for the management server you desire
> > 2 – wait until the MS goes into maintenance mode; while there are still
> > running jobs, it (the management server) will be kept in prepare for
> > maintenance
> > 3 – execute the Linux shutdown command
> >
> > We would need other API methods to manage MSs then: (i) an API method to
> > list MSs, and we could even create (ii) an API method to remove
> > old/de-activated management servers, which we currently do not have
> > (forcing users to apply changes directly in the database).
> >
> > Moreover, in this model, we would not kill hanging jobs; we would wait
> > until they expire and ACS expunges them. Of course, it is possible to
> > develop a forceful maintenance method as well. Then, when the “prepare
> > for maintenance” takes longer than a configured parameter, we could kill
> > hanging jobs.
> >
> > All of this would allow the MS to be kept up and receiving requests until
> > it can be safely shut down. What do you guys think about this approach?
> >
> > On Tue, Apr 10, 2018 at 6:52 PM, Yiping Zhang <yzh...@marketo.com>
> wrote:
> >
> > > As a cloud admin, I would love to have this feature.
> > >
> > > It so happens that I just accidentally restarted my ACS management
> > > server while two instances were migrating to another Xen cluster (via
> > > storage migration, not live migration).  As a result, both instances
> > > ended up with a corrupted data disk which can't be reattached or
> > > migrated.
> > >
> > > Any feature which prevents this from happening would be great.  A low
> > > hanging fruit is simply checking whether there are any async jobs
> > > running, especially any kind of migration jobs or other known long
> > > running types of jobs, and warning the operator so that he has a chance
> > > to abort the server shutdown.
> > >
> > > Yiping
> > >
> > > On 4/5/18, 3:13 PM, "ilya musayev" <ilya.mailing.li...@gmail.com>
> > wrote:
> > >
> > >     Andrija
> > >
> > >     This is a tough scenario.
> > >
> > >     As an admin, the way I would have handled this situation is to
> > >     advertise the upcoming outage and then take away specific API
> > >     commands from a user a day before - so he does not cause any long
> > >     running async jobs. Once maintenance completes - enable the API
> > >     commands back to the user. However - I don't know who your user
> > >     base is and if this would be an acceptable solution.
> > >
> > >     Perhaps also investigate what can be done to speed up your long
> > >     running tasks...
> > >
> > >     As a side note, we will be working on a feature that would allow
> > >     for a graceful termination of the process/job, meaning if the agent
> > >     notices a disconnect or termination request - it will abort the
> > >     command in flight. We can also consider restarting these tasks
> > >     again or what not - but it would not be part of this enhancement.
> > >
> > >     Regards
> > >     ilya
> > >
> > >     On Thu, Apr 5, 2018 at 6:47 AM, Andrija Panic <
> > andrija.pa...@gmail.com
> > > >
> > >     wrote:
> > >
> > >     > Hi Ilya,
> > >     >
> > >     > thanks for the feedback - but in the "real world", you need to
> > >     > "understand" that 60min is a next-to-useless timeout for some jobs
> > >     > (if I understand this specific parameter correctly ?? - the job is
> > >     > really canceled, not only job monitoring is canceled ???) -
> > >     >
> > >     > My value for the "job.cancel.threshold.minutes" is 2880 minutes
> > >     > (2 days?)
> > >     >
> > >     > I can tell you when you have CEPH/NFS (CEPH even "worse" case,
> > >     > since slower read during qemu-img convert process...) of 500GB,
> > >     > then imagine the snapshot job will take many hours. Should I
> > >     > mention 1TB volumes (yes, we had clients like that...)
> > >     > Then attaching a 1TB volume that was uploaded to ACS (lives
> > >     > originally on Secondary Storage, and takes time to be copied over
> > >     > to NFS/CEPH) will take up to a few hours.
> > >     > Then migrating a 1TB volume from NFS to CEPH, or CEPH to NFS, also
> > >     > takes time...etc.
> > >     >
> > >     > I'm just giving you feedback as a "user", admin of the cloud, zero
> > >     > DEV skills here :) , just to make sure you make practical decisions
> > >     > (and I admit I might be wrong with my stuff, but just giving you
> > >     > feedback from our public cloud setup)
> > >     >
> > >     >
> > >     > Cheers!
> > >     >
> > >     >
> > >     >
> > >     >
> > >     > On 5 April 2018 at 15:16, Tutkowski, Mike <
> > mike.tutkow...@netapp.com
> > > >
> > >     > wrote:
> > >     >
> > >     > > Wow, there have been a lot of good details noted from several
> > >     > > people on how this process works today and how we’d like it to
> > >     > > work in the near future.
> > >     > >
> > >     > > 1) Any chance this is already documented on the Wiki?
> > >     > >
> > >     > > 2) If not, any chance someone would be willing to do so (a flow
> > >     > > diagram would be particularly useful)?
> > >     > >
> > >     > > > On Apr 5, 2018, at 3:37 AM, Marc-Aurèle Brothier <
> > > ma...@exoscale.ch>
> > >     > > wrote:
> > >     > > >
> > >     > > > Hi all,
> > >     > > >
> > >     > > > Good point ilya, but as stated by Sergey there are more things
> > >     > > > to consider before being able to do a proper shutdown. I
> > >     > > > augmented the script I gave you originally and changed code in
> > >     > > > CS. What we're doing for our environment is as follows:
> > >     > > >
> > >     > > > 1. The MGMT looks for a change in the file /etc/lb-agent, which
> > >     > > > contains keywords for HAProxy[2] (ready, maint), so that
> > >     > > > HAProxy can disable the mgmt on the keyword "maint" and the
> > >     > > > mgmt server stops a couple of threads[1] to stop processing
> > >     > > > async jobs in the queue.
> > >     > > > 2. Look for the async jobs and wait until there are none, to
> > >     > > > ensure you can send the reconnect commands (if jobs are
> > >     > > > running, a reconnect will result in a failed job since the
> > >     > > > result will never reach the management server - the agent waits
> > >     > > > for the current job to be done before reconnecting, and
> > >     > > > discards the result... room for improvement here!)
> > >     > > > 3. Issue a reconnectHost command to all the hosts connected to
> > >     > > > the mgmt server so that they reconnect to another one;
> > >     > > > otherwise the mgmt must be up since it is used to forward
> > >     > > > commands to agents.
> > >     > > > 4. When all agents are reconnected, we can shut down the
> > >     > > > management server and perform the maintenance.
> > >     > > >
> > >     > > > One issue remains for me: during the reconnect, the commands
> > >     > > > that are processed at the same time should be kept in a queue
> > >     > > > until the agents have finished any current jobs and have
> > >     > > > reconnected. Today the little time window during which the
> > >     > > > reconnect happens can lead to failed jobs due to the agent not
> > >     > > > being connected at the right moment.
> > >     > > >
> > >     > > > I could push a PR for the change to stop some processing
> > >     > > > threads based on the content of a file. It's also possible to
> > >     > > > cancel the drain of the management server by simply changing
> > >     > > > the content of the file back to "ready" again, instead of
> > >     > > > "maint" [2].
> > >     > > >
> > >     > > > [1] AsyncJobMgr-Heartbeat, CapacityChecker, StatsCollector
> > >     > > > [2] HAProxy documentation on agent check:
> > >     > > > https://cbonte.github.io/haproxy-dconv/1.6/configuration.html#5.2-agent-check
> > >     > > >
> > >     > > > Regarding your issue on the port blocking, I think it's fair to
> > >     > > > consider that if you want to shut down your server at some
> > >     > > > point, you have to stop serving (some) requests. Here the only
> > >     > > > way is to stop serving everything. If the API had a REST
> > >     > > > design, we could reject any POST/PUT/DELETE operations and
> > >     > > > allow GET ones. I don't know how hard it would be today to only
> > >     > > > allow listBaseCmd operations to be more friendly with the
> > >     > > > users.
> > >     > > >
> > >     > > > Marco
> > >     > > >
> > >     > > >
> > >     > > > On Thu, Apr 5, 2018 at 2:22 AM, Sergey Levitskiy <
> > > serg...@hotmail.com>
> > >     > > > wrote:
> > >     > > >
> > >     > > >> Now without spellchecking :)
> > >     > > >>
> > >     > > >> This is not simple e.g. for VMware. Each management server
> > >     > > >> also acts as an agent proxy so tasks against a particular ESX
> > >     > > >> host will be always forwarded. The right answer will be to
> > >     > > >> support a native “maintenance mode” for the management server.
> > >     > > >> When entering such a mode the management server should release
> > >     > > >> all agents including SSVM, block/redirect API calls and login
> > >     > > >> requests and finish all async jobs it originated.
> > >     > > >>
> > >     > > >>
> > >     > > >>
> > >     > > >> On Apr 4, 2018, at 5:15 PM, Sergey Levitskiy <
> > > serg...@hotmail.com
> > >     > > <mailto:
> > >     > > >> serg...@hotmail.com>> wrote:
> > >     > > >>
> > >     > > >> This is not simple e.g. for VMware. Each management server
> > also
> > > acts
> > >     > as
> > >     > > an
> > >     > > >> agent proxy so tasks against a particular ESX host will be
> > > always
> > >     > > >> forwarded. That right answer will be to a native support for
> > >     > > “maintenance
> > >     > > >> mode” for management server. When entered to such mode the
> > > management
> > >     > > >> server should release all agents including save,
> > block/redirect
> > > API
> > >     > > calls
> > >     > > >> and login request and finish all a sync job it originated.
> > >     > > >>
> > >     > > >> Sent from my iPhone
> > >     > > >>
> > >     > > >> On Apr 4, 2018, at 3:31 PM, Rafael Weingärtner <
> > >     > > >> rafaelweingart...@gmail.com<mailto:rafaelweingartner@
> > gmail.com
> > > >>
> > >     > wrote:
> > >     > > >>
> > >     > > >> Ilya, still regarding the issue of the management server that
> > >     > > >> is being shut down; if other MSs/or maybe system VMs (I am not
> > >     > > >> sure if they are able to do such tasks) can
> > >     > > >> direct/redirect/send new jobs to this management server (the
> > >     > > >> one being shut down), the process might never end because new
> > >     > > >> tasks are always being created for the management server that
> > >     > > >> we want to shut down. Is this scenario possible?
> > >     > > >>
> > >     > > >> That is why I mentioned blocking the port 8250 for the
> > >     > > >> “graceful-shutdown”.
> > >     > > >>
> > >     > > >> If this scenario is not possible, then everything is fine.
> > >     > > >>
> > >     > > >>
> > >     > > >> On Wed, Apr 4, 2018 at 7:14 PM, ilya musayev <
> > >     > > ilya.mailing.li...@gmail.com
> > >     > > >> <mailto:ilya.mailing.li...@gmail.com>>
> > >     > > >> wrote:
> > >     > > >>
> > >     > > >> I'm thinking of using a configuration from
> > >     > > "job.cancel.threshold.minutes" -
> > >     > > >> it will be the longest
> > >     > > >>
> > >     > > >>    "category": "Advanced",
> > >     > > >>
> > >     > > >>    "description": "Time (in minutes) for async-jobs to be
> > > forcely
> > >     > > >> cancelled if it has been in process for long",
> > >     > > >>
> > >     > > >>    "name": "job.cancel.threshold.minutes",
> > >     > > >>
> > >     > > >>    "value": "60"
> > >     > > >>
> > >     > > >>
> > >     > > >>
> > >     > > >>
> > >     > > >> On Wed, Apr 4, 2018 at 1:36 PM, Rafael Weingärtner <
> > >     > > >> rafaelweingart...@gmail.com<mailto:rafaelweingartner@
> > gmail.com
> > > >>
> > >     > wrote:
> > >     > > >>
> > >     > > >> Big +1 for this feature; I only have a few doubts.
> > >     > > >>
> > >     > > >> * Regarding the tasks/jobs that management servers (MSs)
> > >     > > >> execute: do these tasks originate from requests that come to
> > >     > > >> the MS, or is it possible for requests received by one
> > >     > > >> management server to be executed by another? I mean, if I
> > >     > > >> execute a request against MS1, will this request always be
> > >     > > >> executed/treated by MS1, or is it possible that this request is
> > >     > > >> executed by another MS (e.g. MS2)?
> > >     > > >>
> > >     > > >> * I would suggest that after we block traffic coming from
> > >     > > >> 8080/8443/8250 (we will need to block this as well, right?), we
> > >     > > >> can log the execution of tasks. I mean, something saying, there
> > >     > > >> are XXX tasks (enumerate tasks) still being executed, we will
> > >     > > >> wait for them to finish before shutting down.
> > >     > > >>
> > >     > > >> * The timeout (60 minutes suggested) could be a global setting
> > >     > > >> that we can load before executing the graceful-shutdown.
> > >     > > >>
> > >     > > >> On Wed, Apr 4, 2018 at 5:15 PM, ilya musayev <
> > >     > > >> ilya.mailing.li...@gmail.com<mailto:ilya.mailing.lists@
> > > gmail.com>
> > >     > > >>
> > >     > > >> wrote:
> > >     > > >>
> > >     > > >> Use case:
> > >     > > >> In any environment - from time to time - an administrator needs
> > >     > > >> to perform maintenance. The current stop sequence of the
> > >     > > >> cloudstack management server will ignore the fact that there
> > >     > > >> may be long running async jobs - and terminate the process.
> > >     > > >> This in turn can create a poor user experience and occasional
> > >     > > >> inconsistency in the cloudstack db.
> > >     > > >>
> > >     > > >> This is especially painful in large environments where the user
> > >     > > >> has thousands of nodes and there is continuous patching that
> > >     > > >> happens around the clock - that requires migration of workload
> > >     > > >> from one node to another.
> > >     > > >>
> > >     > > >> With that said - I've created a script that monitors the async
> > >     > > >> job queue for a given MS and waits for it to complete all jobs.
> > >     > > >> More details are posted below.
> > >     > > >>
> > >     > > >> I'd like to introduce "graceful-shutdown" into the
> > >     > > >> systemctl/service of the cloudstack-management service.
> > >     > > >>
> > >     > > >> The details of how it will work are below:
> > >     > > >>
> > >     > > >> Workflow for graceful shutdown:
> > >     > > >> Using iptables/firewalld - block any connection attempts on
> > >     > > >> 8080/8443 (we can identify the ports dynamically)
> > >     > > >> Identify the MSID for the node; using the proper msid - query
> > >     > > >> the async_job table for
> > >     > > >> 1) any jobs that are still running (or job_status=“0”)
> > >     > > >> 2) job_dispatcher not like “pseudoJobDispatcher"
> > >     > > >> 3) job_init_msid=$my_ms_id
> > >     > > >>
> > >     > > >> Monitor this async_job table for 60 minutes - until all async
> > >     > > >> jobs for the MSID are done, then proceed with shutdown
> > >     > > >> If failed for any reason or terminated, catch the exit via the
> > >     > > >> trap command and unblock the 8080/8443
> > >     > > >>
> > >     > > >> Comments are welcome
> > >     > > >>
> > >     > > >> Regards,
> > >     > > >> ilya
> > >     > > >>
> > >     > > >>
> > >     > > >>
> > >     > > >>
> > >     > > >> --
> > >     > > >> Rafael Weingärtner
> > >     > > >>
> > >     > > >>
> > >     > > >>
> > >     > > >>
> > >     > > >>
> > >     > > >> --
> > >     > > >> Rafael Weingärtner
> > >     > > >>
> > >     > >
> > >     >
> > >     >
> > >     >
> > >     > --
> > >     >
> > >     > Andrija Panić
> > >     >
> > >
> > >
> > >
> >
> >
> > --
> > Rafael Weingärtner
> >
>
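
For reference, the async_job check from Ilya's original workflow (quoted
above) can be sketched as below. The three filter conditions and the column
names come from his message; the rest is an assumption for illustration: an
in-memory SQLite connection stands in for the real CloudStack database, and
the function names are made up.

```python
import sqlite3

# Jobs still running, not pseudo jobs, and initiated by the given MS.
PENDING_JOBS_SQL = """
    SELECT COUNT(*) FROM async_job
    WHERE job_status = 0
      AND job_dispatcher NOT LIKE 'pseudoJobDispatcher'
      AND job_init_msid = ?
"""


def pending_jobs(conn, msid):
    """Count async jobs the management server `msid` still has in flight."""
    return conn.execute(PENDING_JOBS_SQL, (msid,)).fetchone()[0]


def safe_to_shut_down(conn, msid):
    """True once no matching async job is left for this management server."""
    return pending_jobs(conn, msid) == 0
```

A shutdown wrapper would poll safe_to_shut_down() until it returns True or
the configured timeout expires.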



-- 
Rafael Weingärtner
