My suggestion would be to use a "fake" volume for each host and use that to
check whether the host is still alive. The agent can update the volume
periodically, and the management server can then use the ListVolumeStats
API (quoted below) to query the volume's activity.
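
As a very rough, untested sketch (all names are placeholders, and I'm
assuming the "fake" volume shows up on the hypervisor as a raw block
device), the agent-side writer could look something like this:

// Assumption: the per-host heartbeat volume is attached at HEARTBEAT_DEVICE.
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;
import java.time.Instant;

public class HeartbeatWriter {
    private static final String HEARTBEAT_DEVICE = "/dev/sdx"; // placeholder path
    private static final long INTERVAL_MS = 60_000;            // write once a minute

    public static void main(String[] args) throws Exception {
        while (true) {
            // Each write produces real I/O on the volume, so the cluster's
            // per-volume counters keep increasing while the host is alive;
            // that growth is the signal the management server looks for.
            try (RandomAccessFile dev = new RandomAccessFile(HEARTBEAT_DEVICE, "rwd")) {
                dev.seek(0);
                dev.write(Instant.now().toString().getBytes(StandardCharsets.UTF_8));
            }
            Thread.sleep(INTERVAL_MS);
        }
    }
}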

On Tue, Jun 2, 2020 at 1:18 PM Tutkowski, Mike <mike.tutkow...@netapp.com>
wrote:

> Hi Sven,
>
> You can use the ListVolumeStats API call (I put in an example request and
> response below).
>
> Since this call goes over the management network, though, it's possible
> that if your management network is down but your storage network is up,
> the call could fail even though your VMs still have perfectly good access
> to their volumes.
>
> Talk to you later!
> Mike
>
> Request:
>
> {
>    "method": "ListVolumeStats",
>    "params": {
>         "volumeIDs": [1, 2]
>    },
>    "id" : 1
> }
>
> Response:
>
> {
>     "id": 1,
>     "result": {
>         "volumeStats": [
>             {
>                 "accountID": 1,
>                 "actualIOPS": 14,
>                 "asyncDelay": null,
>                 "averageIOPSize": 13763,
>                 "burstIOPSCredit": 0,
>                 "clientQueueDepth": 0,
>                 "desiredMetadataHosts": null,
>                 "latencyUSec": 552,
>                 "metadataHosts": {
>                     "deadSecondaries": [],
>                     "liveSecondaries": [],
>                     "primary": 5
>                 },
>                 "nonZeroBlocks": 10962174,
>                 "normalizedIOPS": 34,
>                 "readBytes": 747306804224,
>                 "readBytesLastSample": 0,
>                 "readLatencyUSec": 0,
>                 "readLatencyUSecTotal": 11041939920,
>                 "readOps": 19877559,
>                 "readOpsLastSample": 0,
>                 "samplePeriodMSec": 500,
>                 "throttle": 0,
>                 "timestamp": "2020-06-02T17:14:35.444789Z",
>                 "unalignedReads": 2176454,
>                 "unalignedWrites": 1438822,
>                 "volumeAccessGroups": [
>                     1
>                 ],
>                 "volumeID": 1,
>                 "volumeSize": 2147483648000,
>                 "volumeUtilization": 0.002266666666666667,
>                 "writeBytes": 3231402834432,
>                 "writeBytesLastSample": 106496,
>                 "writeLatencyUSec": 552,
>                 "writeLatencyUSecTotal": 44174792405,
>                 "writeOps": 340339085,
>                 "writeOpsLastSample": 7,
>                 "zeroBlocks": 513325826
>             },
>             {
>                 "accountID": 1,
>                 "actualIOPS": 0,
>                 "asyncDelay": null,
>                 "averageIOPSize": 11261,
>                 "burstIOPSCredit": 0,
>                 "clientQueueDepth": 0,
>                 "desiredMetadataHosts": null,
>                 "latencyUSec": 0,
>                 "metadataHosts": {
>                     "deadSecondaries": [],
>                     "liveSecondaries": [],
>                     "primary": 5
>                 },
>                 "nonZeroBlocks": 28816654,
>                 "normalizedIOPS": 0,
>                 "readBytes": 778768996864,
>                 "readBytesLastSample": 0,
>                 "readLatencyUSec": 0,
>                 "readLatencyUSecTotal": 7068679159,
>                 "readOps": 14977610,
>                 "readOpsLastSample": 0,
>                 "samplePeriodMSec": 500,
>                 "throttle": 0,
>                 "timestamp": "2020-06-02T17:14:35.445978Z",
>                 "unalignedReads": 890959,
>                 "unalignedWrites": 358758,
>                 "volumeAccessGroups": [
>                     1
>                 ],
>                 "volumeID": 2,
>                 "volumeSize": 2147483648000,
>                 "volumeUtilization": 0,
>                 "writeBytes": 8957684071424,
>                 "writeBytesLastSample": 0,
>                 "writeLatencyUSec": 0,
>                 "writeLatencyUSecTotal": 16780712096,
>                 "writeOps": 406101472,
>                 "writeOpsLastSample": 0,
>                 "zeroBlocks": 495471346
>             }
>         ]
>     }
> }
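>
> If you want to turn those counters into a liveness signal, one rough,
> untested sketch (the endpoint path and credentials are placeholders, and
> the JSON "parsing" is deliberately naive) is to sample the stats twice
> and look for growth in the cumulative read/write ops:
>
> import java.io.InputStream;
> import java.io.OutputStream;
> import java.net.HttpURLConnection;
> import java.net.URL;
> import java.nio.charset.StandardCharsets;
> import java.util.Base64;
> import java.util.regex.Matcher;
> import java.util.regex.Pattern;
>
> public class VolumeActivityProbe {
>
>     // POST a ListVolumeStats request for the given volume IDs.
>     static String listVolumeStats(String url, String auth, String ids) throws Exception {
>         String body = "{\"method\":\"ListVolumeStats\",\"params\":{\"volumeIDs\":[" + ids + "]},\"id\":1}";
>         HttpURLConnection c = (HttpURLConnection) new URL(url).openConnection();
>         c.setRequestMethod("POST");
>         c.setRequestProperty("Authorization", "Basic "
>                 + Base64.getEncoder().encodeToString(auth.getBytes(StandardCharsets.UTF_8)));
>         c.setDoOutput(true);
>         try (OutputStream os = c.getOutputStream()) {
>             os.write(body.getBytes(StandardCharsets.UTF_8));
>         }
>         try (InputStream in = c.getInputStream()) {
>             return new String(in.readAllBytes(), StandardCharsets.UTF_8);
>         }
>     }
>
>     // Sum every occurrence of the given counter in the raw JSON response.
>     static long sumCounter(String json, String field) {
>         Matcher m = Pattern.compile("\"" + field + "\":\\s*(\\d+)").matcher(json);
>         long sum = 0;
>         while (m.find()) sum += Long.parseLong(m.group(1));
>         return sum;
>     }
>
>     public static void main(String[] args) throws Exception {
>         String url = "https://mvip/json-rpc/11.0"; // placeholder endpoint
>         String auth = "admin:password";            // placeholder credentials
>         String ids = "1,2"; // volumes of the VMs on the suspect host
>
>         String before = listVolumeStats(url, auth, ids);
>         Thread.sleep(5_000); // long enough for real guest I/O to register
>         String after = listVolumeStats(url, auth, ids);
>
>         long opsBefore = sumCounter(before, "readOps") + sumCounter(before, "writeOps");
>         long opsAfter = sumCounter(after, "readOps") + sumCounter(after, "writeOps");
>
>         // Growing cumulative counters mean the VMs still reach their
>         // volumes, so the host must not be fenced and its VMs must not
>         // be restarted elsewhere.
>         System.out.println(opsAfter > opsBefore ? "volumes active" : "no activity observed");
>     }
> }
>
> Note also that a single ListVolumeStats call can batch many volumeIDs (as
> in the request above), so a per-host check does not have to turn into one
> API call per volume.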
>
> On 6/2/20, 9:11 AM, "Sven Vogel" <s.vo...@ewerk.com> wrote:
>
>     Hi Paul,
>
>     Thanks for the answer and help.
>
>     OK. From what I understand, secondary storage is not a good solution.
>
>     > 1. HAManager
>     > 2. HighAvailabilityManager
>     > 3. KVMHAConfig
>
>
>     Which of the three should we extend, and which one should be active?
>
>     @Mike, do you know of something like a check for volume activity?
>     Maybe we could poll the API, but I think polling separately for each
>     volume would create massive load. At the moment I don't have any idea
>     how this could work.
>
>     Cheers
>
>     Sven
>
>
>     __
>
>     Sven Vogel
>     Lead Cloud Solution Architect
>
>     EWERK DIGITAL GmbH
>     Brühl 24, D-04109 Leipzig
>     P +49 341 42649 - 99
>     F +49 341 42649 - 98
>     s.vo...@ewerk.com
>     www.ewerk.com
>
>     > On 01.06.2020 at 19:30, Paul Angus <paul.an...@shapeblue.com> wrote:
>     >
>     > Hi Sven,
>     >
>     > I think that there is a piece of the jigsaw that you are missing.
>     >
>     > Given that the only thing we know is that we can no longer
>     > communicate with the host agent: to avoid split-brain/corruption of
>     > VMs, CloudStack must determine whether the guest VMs are still
>     > running on the host or not. The only way we can do that is to look
>     > for disk activity created by those VMs.
>     >
>     > Using a secondary storage heartbeat would give a false 'host is
>     > down' if, say, a switch carrying secondary storage and management
>     > traffic went down.
>     >
>     > Wrt SolidFire, you could poll the SolidFire API for activity on the
>     > volumes that belong to the VMs on the unresponsive host. I don't
>     > know if there is an equivalent for Ceph.
>     >
>     > Kind regards
>     >
>     >
>     > Paul Angus
>     >
>     > -----Original Message-----
>     > From: Sven Vogel <s.vo...@ewerk.com>
>     > Sent: 01 June 2020 12:30
>     > To: dev <dev@cloudstack.apache.org>
>     > Subject: Managed Storage and HA
>     >
>     > Hi Community,
>     >
>     > I am trying to figure out how HA works. Our goal is to make it
>     > usable with managed storage like NetApp SolidFire (and maybe it
>     > works with Ceph too), if possible.
>     >
>     > This is a good guide, and over time we have fixed and added the
>     > missing keys:
>     >
>     > https://cwiki.apache.org/confluence/display/CLOUDSTACK/High+Availability+Developer%27s+Guide
>     > https://cwiki.apache.org/confluence/display/CLOUDSTACK/Host+HA
>     >
>     > In the database I found out there are three different types of HA.
>     >
>     > If you query the configuration table:
>     >
>     > SELECT * FROM `configuration` WHERE `component` LIKE '%ha%' LIMIT 0,1000;
>     >
>     > you will get three types of components.
>     >
>     > 1. HAManager
>     > 2. HighAvailabilityManager
>     > 3. KVMHAConfig
>     >
>     > "HAManager and HighAvailbilityManager" are the base which was
> extended from Rohit with „KVMHAConfig“ - KVM with stonith fencing.
>     >
>     > I understand that all these pieces work together, but I need to
>     > understand the process a little better.
>     >
>     >
> ------------------------------------------------------------------------------------
>     > To clarify, I will write down what I think about each of them. This
>     > is my understanding; please correct me or help me understand it a
>     > little better.
>     >
>     > --> I found out that if we use managed storage, a restart of virtual
>     > machines only works on the same host. As I understand it, this is
>     > due to the missing heartbeat file on shared storage, because we
>     > don't have shared storage like NFS.
>     >
>     > —
>     > "If the network ping investigation returns that it cannot detect the
> status of the host, CloudStack HA then relies on the hypervisor specific
> investigation. For VMware, there is no such investigation as the hypervisor
> host handles its own HA. For XenServer and KVM, CloudStack HA deploys a
> monitoring script that writes the current timestamp on to a heartbeat file
> on shared storage."
>     > —
>     >
>     > And
>     >
>     > —
>     > For the initial release, only KVM with NFS storage will be
>     > supported. However, the storage check component will be implemented
>     > in a modular fashion, allowing for checks using other storage
>     > platforms (e.g. Ceph) in the future.
>     > —
>     >
> ------------------------------------------------------------------------------------
>     >
>     > We would implement a plugin or extend this for managed storage, but
>     > at the moment I need to understand where this should happen. Since
>     > managed storage uses a separate volume for each VM, it is not easy
>     > to build a storage heartbeat the way the NFS one works: one inactive
>     > volume does not mean the whole storage has a problem, so you cannot
>     > draw conclusions about the entire storage from a single volume.
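>     >
>     > Purely to illustrate the shape I have in mind (these are NOT
>     > existing CloudStack interfaces; every name below is hypothetical),
>     > such a check could look at all volumes of the suspect host at once:
>     >
>     > import java.util.List;
>     >
>     > // Hypothetical plugin hook: "is there I/O on any volume of this host?"
>     > interface HostActivityChecker {
>     >     boolean isHostActive(long hostId, int windowSeconds) throws Exception;
>     > }
>     >
>     > // Hypothetical SolidFire-backed outline of such a plugin.
>     > class SolidFireActivityChecker implements HostActivityChecker {
>     >     private final VolumeStatsClient client; // assumed wrapper around ListVolumeStats
>     >
>     >     SolidFireActivityChecker(VolumeStatsClient client) {
>     >         this.client = client;
>     >     }
>     >
>     >     @Override
>     >     public boolean isHostActive(long hostId, int windowSeconds) throws Exception {
>     >         List<Long> volumeIds = client.volumeIdsForHost(hostId); // assumed mapping
>     >         long before = client.totalOps(volumeIds);
>     >         Thread.sleep(windowSeconds * 1000L);
>     >         long after = client.totalOps(volumeIds);
>     >         // One idle volume does not matter; only a host whose volumes
>     >         // are ALL idle over the whole window is treated as down.
>     >         return after > before;
>     >     }
>     > }
>     >
>     > // Assumed helper API, shown only to keep the sketch self-contained.
>     > interface VolumeStatsClient {
>     >     List<Long> volumeIdsForHost(long hostId) throws Exception;
>     >     long totalOps(List<Long> volumeIds) throws Exception;
>     > }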
>     >
>     > We don't use KVMHAConfig at the moment, and we have found that if a
>     > host goes down (offline), the virtual machines are not restarted on
>     > another host; they are only restarted once that host comes back
>     > online. We don't want hard fencing of the hosts, but we do want a
>     > correct determination of whether a host is still alive. Fencing
>     > would be a bit harsh in our case, because we don't have hard data
>     > corruption across the entire storage.
>     >
>     > Some questions:
>     > 1. Is it correct to assume that HA does not work without shared
>     > storage and the network ping? Is this the reason why our virtual
>     > machines are not restarted on another host, or do we have a config
>     > problem?
>     > 2. Where could the plugin be implemented? Is there a preferred place?
>     > 3. If point 1 is correct, would the idea be to add a global flag to
>     > use the secondary storage (NFS) as a heartbeat to find out whether a
>     > host is inactive?
>     >
>     > Thanks and Cheers
>     >
>     > Sven
>     >
>     >
>     > __
>     >
>     > Sven Vogel
>     > Lead Cloud Solution Architect
>     >
>     > EWERK DIGITAL GmbH
>     > Brühl 24, D-04109 Leipzig
>     > P +49 341 42649 - 99
>     > F +49 341 42649 - 98
>     > s.vo...@ewerk.com
>     > www.ewerk.com