+1
----- Original Message -----
From: "Alex Huang" <alex.hu...@citrix.com>
To: dev@cloudstack.apache.org
Sent: Thursday, 3 April, 2014 6:47:22 PM
Subject: RE: ALARM - ACS reboots host servers!!!

This is a severe bug if that's the case. It's supposed to stop the
heartbeat script when a primary storage is placed in maintenance.

--Alex

> -----Original Message-----
> From: France [mailto:mailingli...@isg.si]
> Sent: Thursday, April 3, 2014 1:06 AM
> To: dev@cloudstack.apache.org
> Subject: Re: ALARM - ACS reboots host servers!!!
>
> I'm also interested in this issue.
> Can anyone from the developers confirm this is expected behavior?
>
> On 2/4/14 2:32 PM, Andrei Mikhailovsky wrote:
> > Coming back to this issue.
> >
> > This time, to perform the maintenance of the nfs primary storage, I've
> > placed the storage in question in Maintenance mode. After about 20
> > minutes ACS showed the nfs storage is in Maintenance. However, none of
> > the virtual machines with volumes on that storage were stopped. I
> > manually stopped the virtual machines and went on to upgrade and
> > restart the nfs server.
> >
> > A few minutes after the nfs server shutdown all of my host servers went
> > into reboot, killing all vms!
> >
> > Thus, it seems that putting the nfs server in Maintenance mode does not
> > stop the ACS agent from restarting the host servers.
> >
> > Does anyone know a way to stop this behaviour?
> >
> > Thanks
> >
> > Andrei
> >
> >
> > ----- Original Message -----
> > From: "France" <mailingli...@isg.si>
> > To: us...@cloudstack.apache.org
> > Cc: dev@cloudstack.apache.org
> > Sent: Monday, 3 March, 2014 9:49:28 AM
> > Subject: Re: ALARM - ACS reboots host servers!!!
> >
> > I believe this is a bug too, because VMs not running on the storage
> > get destroyed too:
> >
> > Issue has been around for a long time, like with all others I reported.
> > They do not get fixed:
> > https://issues.apache.org/jira/browse/CLOUDSTACK-3367
> >
> > We even lost assignee today.
> >
> > Regards,
> > F.
> >
> > On 3/3/14 6:55 AM, Koushik Das wrote:
> >> The primary storage needs to be put in maintenance before doing any
> >> upgrade/reboot as mentioned in the previous mails.
> >>
> >> -Koushik
> >>
> >> On 03-Mar-2014, at 6:07 AM, Marcus <shadow...@gmail.com> wrote:
> >>
> >>> Also, please note that in the bug you referenced it doesn't have a
> >>> problem with the reboot being triggered, but with the fact that
> >>> reboot never completes due to hanging NFS mount (which is why the
> >>> reboot occurs, inaccessible primary storage).
> >>>
> >>> On Sun, Mar 2, 2014 at 5:26 PM, Marcus <shadow...@gmail.com> wrote:
> >>>> Or do you mean you have multiple primary storages and this one was
> >>>> not in use and put into maintenance?
> >>>>
> >>>> On Sun, Mar 2, 2014 at 5:25 PM, Marcus <shadow...@gmail.com> wrote:
> >>>>> I'm not sure I understand. How do you expect to reboot your
> >>>>> primary storage while vms are running? It sounds like the host is
> >>>>> being fenced since it cannot contact the resources it depends on.
> >>>>>
> >>>>> On Sun, Mar 2, 2014 at 3:24 PM, Nux! <n...@li.nux.ro> wrote:
> >>>>>> On 02.03.2014 21:17, Andrei Mikhailovsky wrote:
> >>>>>>> Hello guys,
> >>>>>>>
> >>>>>>>
> >>>>>>> I recently came across the bug CLOUDSTACK-5429, which has
> >>>>>>> rebooted all of my host servers without properly shutting down
> >>>>>>> the guest vms. I simply upgraded and rebooted one of the nfs
> >>>>>>> primary storage servers and a few minutes later, to my horror,
> >>>>>>> I found out that all of my host servers had been rebooted. Is it
> >>>>>>> just me, or should this bug be fixed ASAP and be a blocker for
> >>>>>>> any new ACS release? I mean not only does it cause downtime,
> >>>>>>> but also possible data loss and server corruption.
> >>>>>> Hi Andrei,
> >>>>>>
> >>>>>> Do you have HA enabled and did you put that primary storage in
> >>>>>> maintenance mode before rebooting it?
> >>>>>> It's my understanding that ACS relies on the shared storage to
> >>>>>> perform HA so if the storage goes it's expected to go berserk.
> >>>>>> I've noticed similar behaviour in Xenserver pools without ACS.
> >>>>>> I'd imagine a "cure" for this would be to use network distributed
> >>>>>> "filesystems" like GlusterFS or CEPH.
> >>>>>>
> >>>>>> Lucian
> >>>>>>
> >>>>>> --
> >>>>>> Sent from the Delta quadrant using Borg technology!
> >>>>>>
> >>>>>> Nux!
> >>>>>> www.nux.ro
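
For anyone following the thread, here is a minimal sketch of the fencing decision as it is described in the messages above: a host reboots itself when it can no longer reach a primary storage it depends on, and (per Alex's reply) the heartbeat is supposed to be stopped for a storage that is in maintenance. This is not CloudStack's actual heartbeat script. The names Pool and should_fence and the 60-second timeout are made up for illustration only.

```python
from dataclasses import dataclass


@dataclass
class Pool:
    """Hypothetical view of one primary storage pool as seen from a host."""
    name: str
    in_maintenance: bool
    last_heartbeat_ok: float  # timestamp (seconds) of the last successful heartbeat


def should_fence(pools, now, timeout=60.0):
    """Return True if the host should fence (reboot) itself.

    Assumed rule, taken from the thread and not from CloudStack source:
    a pool in maintenance is skipped because its heartbeat should have
    been stopped; any other pool whose heartbeat is older than `timeout`
    seconds triggers fencing.
    """
    for pool in pools:
        if pool.in_maintenance:
            continue  # heartbeat is expected to be stopped for this pool
        if now - pool.last_heartbeat_ok > timeout:
            return True
    return False


# One pool in maintenance and unreachable, one healthy pool.
pools = [
    Pool("nfs-primary-1", in_maintenance=True, last_heartbeat_ok=0.0),
    Pool("nfs-primary-2", in_maintenance=False, last_heartbeat_ok=95.0),
]
print(should_fence(pools, now=100.0))
```

Under this rule the host above would not reboot. What Andrei reports corresponds to the maintenance flag being ignored, so the stale heartbeat on the first pool fences the host anyway.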