GitHub user daviftorres edited a comment on the discussion: Additional Zone vs
Region - US East and West
Dear @justinestruch and @NuxRo,
This is a discussion that we circle back to every week: how to expand
geographically while keeping latency as low as possible and adding resiliency
to our infrastructure.
Currently, we have one primary region with the core management of CloudStack,
and multiple satellite (remote) Zones geographically dispersed. Here are my
observations:
Facts:
- We use 4 management servers, where:
  - The 1st primarily responds to UI/API. If it is unhealthy, traffic fails
    over to the 3rd, 4th, and 2nd.
  - The 2nd is the first priority for Agents to connect, followed by the 4th,
    3rd, and 1st.
  - Note:
    - The 1st and 2nd run on dedicated hosts (bare metal),
    - The 3rd and 4th run on shared computing resources (virtualized) in a
      different failure domain.
- Primary -> Secondary database replication.
  - The Primary has dedicated hardware,
  - The Secondary shares computing resources in a different failure domain.
  - Additionally, an encrypted database snapshot is taken every 15 min and
    sent to two different geographies for DR.
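The failover priorities listed above can be sketched as a simple ordered lookup. This is only an illustration, not how CloudStack or our load balancer implements it: the server labels (`ms1` to `ms4`), the `pick_server` helper, and the health-check callback are hypothetical, and it assumes the servers are tried strictly in the order listed.

```python
from typing import Callable, Optional, Sequence

# Hypothetical labels for the four management servers described above.
UI_API_ORDER = ["ms1", "ms3", "ms4", "ms2"]  # 1st, then 3rd, 4th, 2nd
AGENT_ORDER = ["ms2", "ms4", "ms3", "ms1"]   # 2nd, then 4th, 3rd, 1st


def pick_server(order: Sequence[str],
                is_healthy: Callable[[str], bool]) -> Optional[str]:
    """Return the first healthy server in priority order, or None if all are down."""
    for name in order:
        if is_healthy(name):
            return name
    return None


if __name__ == "__main__":
    down = {"ms1"}  # pretend the 1st server is unhealthy

    def healthy(name: str) -> bool:
        return name not in down

    print(pick_server(UI_API_ORDER, healthy))  # ms3
    print(pick_server(AGENT_ORDER, healthy))   # ms2
```

Keeping the two orders as mirror images means UI/API traffic and Agent traffic only land on the same server after several failures.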
Observations:
- Even with multiple Zones across North America connected to the same core
  management, we have:
  - No latency issues between the management servers and the database (all in
    the same geography),
  - No perceptible latency to the satellite Zones (~5,000 km apart),
  - Resiliency to a datacenter (region) blackout:
    - With automation, a new CloudStack core management can be deployed in
      minutes,
    - Only the DB needs to be restored, which is similar to proposal number 1
      from @NuxRo.
    - Note: promoting a DB to primary is not trivial and carries the risk of
      causing a split brain.
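For a rough sense of scale on the ~5,000 km figure, here is a back-of-the-envelope lower bound on round-trip time. It assumes light travels through fiber at about 200,000 km/s (roughly two thirds of the speed of light in vacuum) and a straight-line path. Real routes are longer and add switching delay, so measured values will be higher. The function name is made up for this example.

```python
# Assumption: signal speed in optical fiber is roughly 200,000 km/s.
FIBER_KM_PER_S = 200_000.0


def min_rtt_ms(distance_km: float,
               speed_km_per_s: float = FIBER_KM_PER_S) -> float:
    """Lower bound on round-trip time in milliseconds for a straight fiber path."""
    return 2 * distance_km * 1000.0 / speed_km_per_s


if __name__ == "__main__":
    print(min_rtt_ms(5000))  # 50.0
```

A floor of about 50 ms per round trip is generally tolerable for management-to-Agent traffic, but it would hurt chatty database queries, which fits with keeping the management servers and the DB in the same geography.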
Reflections:
- The focus of the architecture has to be on the DB.
  - Replication, Galera, InnoDB Cluster, you name it!
  - Point-in-time recovery: snapshots, dumps, replication, etc.
- Placing management servers and databases in different failure domains
  addresses partial DC outages.
- Automation plus snapshots covers the need for DR after a total DC outage.
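To make the snapshot-based DR point concrete, here is a minimal sketch of choosing which snapshot to restore and how much data would be lost. It uses the 15-minute interval mentioned earlier; the helper names and the example timestamps are invented for illustration and do not describe our actual tooling.

```python
from datetime import datetime, timedelta
from typing import Optional, Sequence

SNAPSHOT_INTERVAL = timedelta(minutes=15)  # interval stated in the post


def latest_snapshot_before(snapshots: Sequence[datetime],
                           failure: datetime) -> Optional[datetime]:
    """Return the newest snapshot taken at or before the failure time."""
    candidates = [s for s in snapshots if s <= failure]
    return max(candidates) if candidates else None


def data_loss_window(snapshots: Sequence[datetime],
                     failure: datetime) -> Optional[timedelta]:
    """Time between the restorable snapshot and the failure (data that is lost)."""
    snap = latest_snapshot_before(snapshots, failure)
    return None if snap is None else failure - snap


if __name__ == "__main__":
    start = datetime(2025, 1, 1, 12, 0)
    snaps = [start + i * SNAPSHOT_INTERVAL for i in range(4)]  # 12:00 to 12:45
    failure = datetime(2025, 1, 1, 12, 40)
    print(latest_snapshot_before(snaps, failure))  # 2025-01-01 12:30:00
    print(data_loss_window(snaps, failure))        # 0:10:00
```

With snapshots alone, the worst-case loss is just under one interval (15 min). Replication or binary logs would be needed to narrow that window further.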
There are many additional topics we can explore, such as serving ACS at the
edge (similar to a CDN) with caching and BGP advertising in multiple locations,
or placing one dedicated management server in each region to serve only that
Zone’s Agents.
Please share your thoughts; this is a highly relevant subject that warrants
thorough discussion.
GitHub link:
https://github.com/apache/cloudstack/discussions/12115#discussioncomment-15109007