Hi Sven,

You can use the ListVolumeStats API call (I put in an example request and 
response below).

One caveat: since this call goes over the management network, it's possible that 
if your management network is down but your storage network is up, the call 
could fail even though your VMs still have perfectly good access to their 
volumes.

Talk to you later!
Mike

Request:

{
   "method": "ListVolumeStats",
   "params": {
        "volumeIDs": [1, 2]
   },
   "id" : 1
}

Response:

{
    "id": 1,
    "result": {
        "volumeStats": [
            {
                "accountID": 1,
                "actualIOPS": 14,
                "asyncDelay": null,
                "averageIOPSize": 13763,
                "burstIOPSCredit": 0,
                "clientQueueDepth": 0,
                "desiredMetadataHosts": null,
                "latencyUSec": 552,
                "metadataHosts": {
                    "deadSecondaries": [],
                    "liveSecondaries": [],
                    "primary": 5
                },
                "nonZeroBlocks": 10962174,
                "normalizedIOPS": 34,
                "readBytes": 747306804224,
                "readBytesLastSample": 0,
                "readLatencyUSec": 0,
                "readLatencyUSecTotal": 11041939920,
                "readOps": 19877559,
                "readOpsLastSample": 0,
                "samplePeriodMSec": 500,
                "throttle": 0,
                "timestamp": "2020-06-02T17:14:35.444789Z",
                "unalignedReads": 2176454,
                "unalignedWrites": 1438822,
                "volumeAccessGroups": [
                    1
                ],
                "volumeID": 1,
                "volumeSize": 2147483648000,
                "volumeUtilization": 0.002266666666666667,
                "writeBytes": 3231402834432,
                "writeBytesLastSample": 106496,
                "writeLatencyUSec": 552,
                "writeLatencyUSecTotal": 44174792405,
                "writeOps": 340339085,
                "writeOpsLastSample": 7,
                "zeroBlocks": 513325826
            },
            {
                "accountID": 1,
                "actualIOPS": 0,
                "asyncDelay": null,
                "averageIOPSize": 11261,
                "burstIOPSCredit": 0,
                "clientQueueDepth": 0,
                "desiredMetadataHosts": null,
                "latencyUSec": 0,
                "metadataHosts": {
                    "deadSecondaries": [],
                    "liveSecondaries": [],
                    "primary": 5
                },
                "nonZeroBlocks": 28816654,
                "normalizedIOPS": 0,
                "readBytes": 778768996864,
                "readBytesLastSample": 0,
                "readLatencyUSec": 0,
                "readLatencyUSecTotal": 7068679159,
                "readOps": 14977610,
                "readOpsLastSample": 0,
                "samplePeriodMSec": 500,
                "throttle": 0,
                "timestamp": "2020-06-02T17:14:35.445978Z",
                "unalignedReads": 890959,
                "unalignedWrites": 358758,
                "volumeAccessGroups": [
                    1
                ],
                "volumeID": 2,
                "volumeSize": 2147483648000,
                "volumeUtilization": 0,
                "writeBytes": 8957684071424,
                "writeBytesLastSample": 0,
                "writeLatencyUSec": 0,
                "writeLatencyUSecTotal": 16780712096,
                "writeOps": 406101472,
                "writeOpsLastSample": 0,
                "zeroBlocks": 495471346
            }
        ]
    }
}
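For an activity check, the fields of interest are "readOpsLastSample" and 
"writeOpsLastSample", which count I/O during the last sample window 
("samplePeriodMSec", 500 ms in the response above). Here is a rough sketch of 
how a poller might use them, in Python; the endpoint URL and credentials in the 
commented usage are placeholders, not something from the example above:

```python
import json
import urllib.request

def volume_is_active(stats):
    """Treat a volume as active if it performed any I/O during the
    last sample window (samplePeriodMSec, 500 ms in the example)."""
    return stats["readOpsLastSample"] > 0 or stats["writeOpsLastSample"] > 0

def list_volume_stats(endpoint, auth_header, volume_ids):
    """POST a ListVolumeStats JSON-RPC request and return the list of
    per-volume stats from the response."""
    payload = json.dumps({
        "method": "ListVolumeStats",
        "params": {"volumeIDs": volume_ids},
        "id": 1,
    }).encode()
    req = urllib.request.Request(
        endpoint,
        data=payload,
        headers={"Content-Type": "application/json",
                 "Authorization": auth_header},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["result"]["volumeStats"]

# Hypothetical usage -- endpoint and credentials are placeholders:
# stats = list_volume_stats("https://mvip/json-rpc/<version>",
#                           "Basic <credentials>", [1, 2])
# host_looks_alive = any(volume_is_active(s) for s in stats)
```

Because ListVolumeStats accepts a list of volume IDs, all volumes belonging to 
the unresponsive host can be checked with a single request (as in the example 
with [1, 2]), which keeps the load to one API call per suspect host rather than 
one call per volume.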

On 6/2/20, 9:11 AM, "Sven Vogel" <s.vo...@ewerk.com> wrote:

    Hi Paul,

    Thanks for the answer and help.

    OK, so secondary storage is not a good solution, as I understand it.

    > 1. HAManager
    > 2. HighAvailabilityManager
    > 3. KVMHAConfig


    which of the three should we expand and which one should be active?

    @Mike, do you know of something like that, i.e. a check of volume activity?
    Maybe we could poll the API, but I think this would mean massive polling 
(overload) if we poll for each volume.
    At the moment I don't have any idea how this could work.

    Cheers

    Sven


    __

    Sven Vogel
    Lead Cloud Solution Architect

    EWERK DIGITAL GmbH
    Brühl 24, D-04109 Leipzig
    P +49 341 42649 - 99
    F +49 341 42649 - 98
    s.vo...@ewerk.com
    www.ewerk.com

    Geschäftsführer:
    Dr. Erik Wende, Hendrik Schubert, Tassilo Möschke
    Registergericht: Leipzig HRB 9065

    Zertifiziert nach:
    ISO/IEC 27001:2013
    DIN EN ISO 9001:2015
    DIN ISO/IEC 20000-1:2011

    EWERK-Blog | LinkedIn | Xing | Twitter | Facebook

    Information and offers sent by e-mail are subject to change and non-binding.

    Disclaimer Privacy:

    The contents of this e-mail (including any attachments) are confidential 
and may be legally privileged. If you are not the intended recipient of this 
e-mail, any disclosure, copying, distribution or use of its contents is 
strictly prohibited, and you should please notify the sender immediately and 
then delete it (including any attachments) from your system. Thank you.
    > Am 01.06.2020 um 19:30 schrieb Paul Angus <paul.an...@shapeblue.com>:
    >
    > Hi Sven,
    >
    > I think that there is a piece of the jigsaw that you are missing.
    >
    > Given that the only thing we know is that we can no longer communicate 
with the host agent, then to avoid split brain/corruption of VMs, CloudStack 
must determine whether the guest VMs are still running on the host or not. The 
only way we can do that is to look for disk activity created by those VMs.
    >
    > Using a secondary storage heartbeat would give a false 'host is down' if, 
say, a switch carrying both secondary storage and management traffic went down.
    >
    > With regard to SolidFire, you could poll the SolidFire API for activity on 
the volumes which belong to the VMs on the unresponsive host. I don't know if 
there is an equivalent for Ceph.
    >
    > Kind regards
    >
    >
    > Paul Angus
    >
    >
    >
    > paul.an...@shapeblue.com
    > www.shapeblue.com
    > 3 London Bridge Street,  3rd floor, News Building, London  SE1 9SGUK
    > @shapeblue
    >
    >
    >
    >
    > -----Original Message-----
    > From: Sven Vogel <s.vo...@ewerk.com>
    > Sent: 01 June 2020 12:30
    > To: dev <dev@cloudstack.apache.org>
    > Subject: Managed Storage and HA
    >
    > Hi Community,
    >
    > I am trying to work out how HA operates. Our goal is to make it usable 
with managed storage (NetApp SolidFire, and maybe it works with Ceph too), if 
that is possible.
    >
    > This is a good guide, and over time we have fixed and added the missing 
keys.
    > 
https://cwiki.apache.org/confluence/display/CLOUDSTACK/High+Availability+Developer%27s+Guide
    > https://cwiki.apache.org/confluence/display/CLOUDSTACK/Host+HA
    >
    > In the database I found that there are three different types of HA.
    >
    > If you query the configuration table with "SELECT * FROM `configuration` 
WHERE `component` LIKE '%ha%' LIMIT 0,1000;" you will get three types of 
components.
    >
    > 1. HAManager
    > 2. HighAvailabilityManager
    > 3. KVMHAConfig
    >
    > "HAManager" and "HighAvailabilityManager" are the base, which Rohit 
extended with "KVMHAConfig" - KVM with STONITH fencing.
    >
    > I understand that all these things work together, but maybe I need to 
understand the process a little better.
    >
    > 
------------------------------------------------------------------------------------
    > To clarify, I will write down what I think about each of them. This is 
what I understand, but please correct me or help me understand it a little 
better.
    >
    > -> I found out that if we use managed storage, a restart of virtual 
machines only works on the same host. As I understand it, this is due to the 
missing heartbeat file on shared storage, because we don't have shared storage 
like NFS.
    >
    > —
    > "If the network ping investigation returns that it cannot detect the 
status of the host, CloudStack HA then relies on the hypervisor specific 
investigation. For VMware, there is no such investigation as the hypervisor 
host handles its own HA. For XenServer and KVM, CloudStack HA deploys a 
monitoring script that writes the current timestamp on to a heartbeat file on 
shared storage."
    > —
    >
    > And
    >
    > —
    > For the initial release, only KVM with NFS storage will be supported. 
However, the storage check component will be implemented in a modular fashion 
allowing for checks using other storage platforms(e.g. Ceph) in the future.
    > —
    > 
------------------------------------------------------------------------------------
    >
    > We would implement a plugin or extend this for managed storage, but at the 
moment I need to understand where this should happen. Since managed storage uses 
a separate volume for each VM, it's not easy to build a storage heartbeat like 
the NFS one. The loss of a single volume doesn't mean the whole storage has a 
problem, so I think it's hard to infer the state of the complete storage from 
one volume.
    >
    > We don't use KVMHAConfig at the moment, and we observe that if a host goes 
down (offline), the virtual machines are not restarted on another host. They are 
only restarted on the same host once it comes back (online). We don't want hard 
fencing of the hosts, but we do want a correct determination of whether the host 
is still alive. Fencing would perhaps be a bit harsh in our case, because we 
don't have hard data corruption across the entire storage.
    >
    > Some questions:
    > 1. Is it correct to assume that HA doesn't work without shared storage and 
a network ping? Is this the reason why our virtual machines are not restarted on 
another host, or do we have a config problem?
    > 2. Where could the plugin be implemented? Is there a preferred place?
    > 3. If point 1 is correct, would the idea be to add a global flag to use 
the secondary storage (NFS) as a heartbeat to find out whether any host is 
inactive?
    >
    > Thanks and Cheers
    >
    > Sven
    >
    >
    > __
    >
    > Sven Vogel
    > Lead Cloud Solution Architect
    >
    > EWERK DIGITAL GmbH
    > Brühl 24, D-04109 Leipzig
    > P +49 341 42649 - 99
    > F +49 341 42649 - 98
    > s.vo...@ewerk.com
    > www.ewerk.com
    >
    > Geschäftsführer:
    > Dr. Erik Wende, Hendrik Schubert, Tassilo Möschke
    > Registergericht: Leipzig HRB 9065
    >
    > Support:
    > +49 341 42649 555
    >
    > Zertifiziert nach:
    > ISO/IEC 27001:2013
    > DIN EN ISO 9001:2015
    > DIN ISO/IEC 20000-1:2011
    >
    > ISAE 3402 Typ II Assessed
    >
    > EWERK-Blog<https://blog.ewerk.com/> | 
LinkedIn<https://www.linkedin.com/company/ewerk-group> | 
Xing<https://www.xing.com/company/ewerk> | 
Twitter<https://twitter.com/EWERK_Group> | 
Facebook<https://de-de.facebook.com/EWERK.IT/>
    >
    >

