On 7/18/19 8:44 AM, Matthew Pounsett wrote:

I've recently inherited a database that is dangerously close to outgrowing the 
available storage on its existing hardware.  I'm looking for (pointers to) 
advice on scaling the storage in a financially constrained not-for-profit.

The current size of the DB's data directory is just shy of 23TB.  When I 
received the machine it's on, it was configured with 18x3TB drives in RAID10 
(9x 2-drive mirrors striped together) for about 28TB of available storage.  As 
a short term measure I've reconfigured them into RAID50 (3x 6-drive RAID5 
arrays).  This is obviously a poor choice for performance, but it'll get us 
through until we figure out what to do about upgrading/replacing the hardware.  
The host is constrained to 24x3TB drives, so we can't get much of an upgrade by 
just adding/replacing disks.
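
For reference, the nominal capacities behind the layouts above can be checked with a short Python sketch. The function names are hypothetical, and the figures are raw decimal terabytes before any filesystem or controller overhead, so the numbers are approximate.

```python
def raid10_usable_tb(drives, drive_tb):
    """Nominal capacity of striped 2-drive mirrors: half the drives hold data."""
    if drives % 2 != 0:
        raise ValueError("RAID10 with 2-drive mirrors needs an even drive count")
    return (drives // 2) * drive_tb


def raid50_usable_tb(groups, drives_per_group, drive_tb):
    """Nominal capacity of striped RAID5 groups: each group loses one drive to parity."""
    if drives_per_group < 3:
        raise ValueError("each RAID5 group needs at least 3 drives")
    return groups * (drives_per_group - 1) * drive_tb


if __name__ == "__main__":
    print("18x3TB RAID10:", raid10_usable_tb(18, 3), "TB")         # original layout
    print("3x 6-drive RAID50:", raid50_usable_tb(3, 6, 3), "TB")   # short-term layout
    print("24x3TB RAID10:", raid10_usable_tb(24, 3), "TB")         # chassis maximum
```

Under these assumptions the chassis tops out at 36TB in RAID10 even with all 24 bays filled, which is why adding disks alone does not buy much.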

One of my anticipated requirements for any replacement we design is that I 
should be able to do upgrades of Postgres for up to five years without needing 
major upgrades to the hardware.  My understanding of the standard upgrade 
process is that this requires that the data directory be smaller than the free 
storage (so that there is room to hold two copies of the data directory 
simultaneously).  I haven't got detailed growth statistics yet, but given that 
the DB has grown to 23TB in 5 years, I should assume that it could double to 
about 46TB in the next five years, requiring roughly 100TB of available storage 
to be able to hold two copies during upgrades.
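
The sizing estimate above can be written out as a small Python sketch. It assumes, as the paragraph does, that an upgrade needs room for two full copies of the data directory and that the database doubles over the planning period. The function name and the parameters `growth_factor` and `copies` are hypothetical, chosen only for illustration.

```python
def required_storage_tb(current_tb, growth_factor, copies=2):
    """Storage needed to hold `copies` full copies of the projected data directory.

    current_tb:    size of the data directory today, in TB
    growth_factor: assumed multiplier over the planning period (2 = doubling)
    copies:        copies that must fit at once (2 for a copy-based upgrade)
    """
    projected_tb = current_tb * growth_factor
    return projected_tb * copies


if __name__ == "__main__":
    # 23TB today, assumed to double in five years, two copies during an upgrade
    print(required_storage_tb(23, 2), "TB")
```

With the numbers from this message the result is 92TB, which rounds up to the 100TB figure once some headroom is allowed.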

This seems to be right on the cusp of what is possible to fit in a single 
chassis with a RAID10 configuration (at least, with commodity hardware), which 
means we're looking at pretty high cost:performance ratio.  I'd like to see if 
we can find designs that get that ratio down a bit, or a lot, but I'm a general 
sysadmin, and the detailed effects of those choices are outside of my limited 
DBA experience.

Are there good documents out there on sizing hardware for this sort of mid-range storage 
requirement, that is neither big data, nor "small data" able to fit on a single 
host?   I'm hoping for an overview of the tradeoffs between single-head setups, dual-head 
setups with a JBOD array, or whatever else is advisable to consider these days.  Corrections of 
any poor assumptions exposed above are also quite welcome. :)

Thanks in advance for any assistance!


Now might be a good time to consider splitting the database across multiple computers.  
That might be simpler with a mid-range database, and then your plan for the future is 
"add more computers".

-Andy

