Hello

I am thinking of building a Gluster file system for archival data.  Initially 
it will start as a 6-brick dispersed volume, then expand to distributed 
dispersed as we increase capacity.
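
To make the expansion path concrete, here is a rough sketch of how bricks group into dispersed sets as the volume grows. A distributed dispersed volume is made of whole dispersed sets, so the brick count has to stay a multiple of the set width. The function name is hypothetical, and keeping every later set at the initial width of 6 is an assumption for illustration.

```python
def dispersed_sets(total_bricks, bricks_per_set=6):
    """Return how many dispersed sets a given brick count forms.

    Assumes every set keeps the same width as the initial 6-brick volume.
    """
    if total_bricks <= 0 or total_bricks % bricks_per_set != 0:
        raise ValueError("brick count must be a positive multiple of the set width")
    return total_bricks // bricks_per_set


if __name__ == "__main__":
    # Growth steps chosen only as examples.
    for bricks in (6, 12, 36):
        print(bricks, "bricks ->", dispersed_sets(bricks), "dispersed set(s)")
```

In other words, growing past the first volume would mean adding six bricks at a time under this assumption.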

Since metadata in Gluster isn't centralized, it will eventually not perform 
well at scale, so I am wondering if anyone can help identify that point.  Ceph 
can scale to extremely high levels, though the complexity required for 
management seems much greater than with Gluster.

The first six bricks would be a little over 2PB of raw space.  Each server will 
have 24 NL-SAS drives (7200 RPM) without RAID.  I estimate we would max out at 
about 100 million files within these first six servers, though that can be 
reduced by having users tar their small files before writing to Gluster.  I/O 
patterns would be sequential upon initial copy, with very infrequent reads 
thereafter.  Given the demands of erasure coding, especially if we lose a 
brick, the CPUs will be high-thread-count AMD Rome.  The back-end network would 
be EDR InfiniBand, so I will mount via RDMA, while all bricks will be leaf 
local.
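
A back-of-the-envelope sketch of what those numbers imply. Assumptions that are not stated above: "a little over 2PB" is taken as exactly 2 PB in decimal units, each server is one brick, and the dispersed volume uses a redundancy of 2 (a 4+2 layout). The redundancy value and the function names are only for illustration.

```python
SERVERS = 6                # one brick per server (assumption)
DRIVES_PER_SERVER = 24
RAW_BYTES = 2e15           # "a little over 2PB", taken as 2 PB decimal
FILES = 100_000_000        # estimated file count at capacity
REDUNDANCY = 2             # assumption: redundancy level is not stated


def per_drive_tb(raw_bytes, servers, drives_per_server):
    """Raw capacity per drive, in TB."""
    return raw_bytes / (servers * drives_per_server) / 1e12


def usable_bytes(raw_bytes, bricks, redundancy):
    """Usable capacity of one dispersed set: (bricks - redundancy) / bricks of raw."""
    return raw_bytes * (bricks - redundancy) / bricks


def avg_file_mb(capacity_bytes, files):
    """Average file size, in MB, if the given capacity is filled by that many files."""
    return capacity_bytes / files / 1e6


if __name__ == "__main__":
    usable = usable_bytes(RAW_BYTES, SERVERS, REDUNDANCY)
    print(f"per drive: {per_drive_tb(RAW_BYTES, SERVERS, DRIVES_PER_SERVER):.1f} TB")
    print(f"usable:    {usable / 1e15:.2f} PB")
    print(f"avg file:  {avg_file_mb(usable, FILES):.1f} MB")
```

With a different redundancy level the usable capacity and average file size shift accordingly, so treat the output as an order-of-magnitude guide only.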

Given these variables, can anyone say whether Gluster would be able to operate 
at this level of metadata and continue to scale?  If so, where could it break: 
at 4PB, 12PB?  By "break" I mean I/O degrading dramatically while all bricks 
are still online.

Thank you!
Doug


--
Thanks,

Douglas Duckworth, MSc, LFCS
HPC System Administrator
Scientific Computing Unit <https://scu.med.cornell.edu/>
Weill Cornell Medicine
E: [email protected]
O: 212-746-6305
F: 212-746-8690
________

Gluster-users mailing list
[email protected]
https://lists.gluster.org/mailman/listinfo/gluster-users
