Hi Darren

The buildings are connected by fibre, currently with equipment running at 
40 Gbps, soon to be 100 Gbps, with sub-millisecond latency, so data being 
pulled from either building isn't a massive issue. The bigger question for us 
is whether the cluster can stay operational if an entire building goes down 
for some reason. Would EC have any issues with this? And is there guidance 
anywhere on the amount of processing power needed for something like 4:6 
parity? From what I've read and heard, the performance hit could be 
substantial at this level.


Third monitor placement is not an issue; we've got plenty of locations we can 
drop that into. Can the monitor roles live on OSD hosts, or should they really 
be entirely independent servers/VMs?


Replication is also a possibility for us, although a part of me hates the idea 
of a cluster sitting there doing nothing, just being an insurance policy.


Thanks

Brett

--- original message ---
On June 11, 2020, 12:34 AM GMT+10 darren.sooth...@suse.com wrote:



Hi Brett,



So how far apart are your buildings, and what is the network connectivity 
between them? I am going to assume they are close and you have lots of 
bandwidth.



There are a couple of options depending on the protocol and the distance 
between the buildings.



You could build an EC cluster with something like 4:6, i.e. 4 data pieces and 
6 parity pieces (assuming you have 5 nodes in each DC).
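
For illustration only (the profile and pool names and the PG count below are 
made-up placeholders, not recommendations), that layout would be expressed as 
an erasure-code profile along these lines:

  # Sketch: k=4 data chunks, m=6 coding chunks; names and PG count are assumptions
  ceph osd erasure-code-profile set ec-4-6 k=4 m=6 crush-failure-domain=host
  ceph osd pool create media-ec 1024 1024 erasure ec-4-6

The default rule generated from that profile only separates chunks across 
hosts; the half-per-building split comes from a custom CRUSH rule like the one 
sketched further down.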



With this setup you can then lose an entire DC and still have access to your 
data, with some protection remaining. This is achieved by building the correct 
CRUSH map rules, which place half of the chunks in one DC and the other half 
in the other DC.
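
As a rough sketch of such a rule (the rule name and id, and the assumption 
that the CRUSH tree contains two 'datacenter' buckets, are mine): decompile 
the CRUSH map, add a rule that picks two datacenter buckets and then five 
hosts in each, so the ten chunks land five per building, and point the pool 
at it:

  # Sketch only: assumes two 'datacenter' buckets exist in the CRUSH hierarchy
  ceph osd getcrushmap -o crushmap.bin
  crushtool -d crushmap.bin -o crushmap.txt
  # append a rule like the following to crushmap.txt, then recompile:
  #   rule ec-two-dc {
  #       id 10
  #       type erasure
  #       step set_chooseleaf_tries 5
  #       step take default
  #       step choose indep 2 type datacenter
  #       step chooseleaf indep 5 type host
  #       step emit
  #   }
  crushtool -c crushmap.txt -o crushmap.new
  ceph osd setcrushmap -i crushmap.new
  # point the (hypothetical) EC pool from the earlier sketch at the new rule
  ceph osd pool set media-ec crush_rule ec-two-dc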



You would need to think about where you would put a third monitor in this case.



The downside of this is that you could be reading data from either DC. Not 
sure where your workloads are.



There is another alternative to this, which is to use LRC (locally repairable 
codes). This creates the ability to rebuild data within a DC, which helps when 
it comes to rebuilds, but it doesn't help with where data is read from.
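
As a minimal sketch of the profile side of that (the numbers are purely 
illustrative, not a recommendation for this cluster):

  # Sketch: lrc plugin; l=3 adds one local parity chunk per group of three
  # chunks, and crush-locality keeps each group inside a single datacenter
  ceph osd erasure-code-profile set lrc-example plugin=lrc k=4 m=2 l=3 \
      crush-failure-domain=host crush-locality=datacenter

The point is that a single failed OSD can usually be repaired from chunks held 
in the same DC, which cuts cross-building rebuild traffic.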



The other option would be replication: build two separate clusters, and you 
can configure S3 to replicate to the second site, or set up rsync to replicate 
if using CephFS. Not pretty, but an option.
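
If the CephFS/rsync route were taken, the mechanics are nothing more exotic 
than something like this (paths and host name are invented for the example; 
for S3 the replication would instead be configured through RGW multisite):

  # Illustrative only: mirror a CephFS subtree to the second cluster, e.g. from cron
  rsync -aH --delete /mnt/cephfs/media/ second-site:/mnt/cephfs/media/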



Darren







From: Brett Randall <brett.rand...@gmail.com>
Date: Wednesday, 10 June 2020 at 15:20
To: ceph-users@ceph.io <ceph-users@ceph.io>
Subject: [ceph-users] Combining erasure coding and replication?


Hi all


We are looking at setting up our first ever Ceph cluster to replace Gluster as 
our media asset storage and production system. The Ceph cluster will have 5 PB 
of usable storage. Whether we use it as object storage or put CephFS in front 
of it is still TBD.


Obviously we’re keen to protect this data well. Our current Gluster setup 
utilises RAID-6 on each of the nodes, and then we have a single replica of 
each brick. The Gluster bricks are split between buildings so that the replica 
is guaranteed to be in the other building. By doing it this way, we can 
tolerate a decent number of disk or node failures (even an entire building) 
before we lose both connectivity and data.


Our concern with Ceph is the cost of having three replicas. Storage may be 
cheap, but I’d rather not buy ANOTHER 5 PB for a third replica if there are 
ways to do this more efficiently. Site-level redundancy is important to us, so 
we can’t simply create an erasure-coded volume across two buildings: if we 
lose power to a building, the entire array would become unavailable. Likewise, 
we can’t simply have a single replica; our fault tolerance would drop way 
below what it is right now.
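
(For a rough sense of scale, if something like a 4+6 erasure-coded layout were 
used: 5 PB usable at 3x replication needs about 15 PB raw, whereas 5 PB at 
k=4, m=6 needs about 5 PB x (4+6)/4 = 12.5 PB raw, i.e. a 2.5x overhead 
instead of 3x.)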


Is there a way to use both erasure coding AND replication at the same time in 
Ceph to mimic the architecture we currently have in Gluster? I know we COULD 
just create RAID-6 volumes on each node and use the entire volume as a single 
OSD, but this is not the recommended way to use Ceph. So is there some other 
way?


Apologies if this is a nonsensical question; I’m still trying to wrap my head 
around Ceph, CRUSH maps, placement rules, volume types, etc.!


TIA


Brett


--- end of original message ---
_______________________________________________
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io
