Can you say more about how and how often these 200TB get used, queried, updated? Is a different usage profile needed? What kind of column families do you have in mind for them?
On Thu, Apr 19, 2012 at 8:24 AM, Franc Carter <franc.car...@sirca.org.au> wrote:

> On Thu, Apr 19, 2012 at 10:16 PM, Yiming Sun <yiming....@gmail.com> wrote:
>
>> 600 TB is really a lot, even 200 TB is a lot. In our organization,
>> storage at such scale is handled by our storage team and they purchase
>> specialized (and very expensive) equipment from storage hardware vendors,
>> because at this scale performance and reliability are absolutely critical.
>
> Yep, that's what we currently do. We have 200TB sitting on a set of
> high-end disk arrays running RAID6. I'm in the early stages of looking at
> whether this is still the best approach.
>
>> But it sounds like your team may not be able to afford such equipment.
>> 600GB per node will require a cloud, and you need a data center to house
>> them... but 2TB disks are commonplace nowadays and you can jam multiple
>> 2TB disks into each node to reduce the number of machines needed. It all
>> depends on what budget you have.
>
> The bit I am trying to understand is whether my figure of 400GB/node in
> practice for Cassandra is correct, or whether we can push the GB/node
> higher, and if so how high.
>
> cheers
>
>> -- Y.
>>
>> On Thu, Apr 19, 2012 at 7:54 AM, Franc Carter <franc.car...@sirca.org.au> wrote:
>>
>>> On Thu, Apr 19, 2012 at 9:38 PM, Romain HARDOUIN <romain.hardo...@urssaf.fr> wrote:
>>>
>>>> Cassandra supports data compression and, depending on your data, you
>>>> can gain a reduction in data size of up to 4x.
>>>
>>> The data is gzip'd already ;-)
>>>
>>>> 600 TB is a lot, hence requires lots of servers...
>>>>
>>>> Franc Carter <franc.car...@sirca.org.au> wrote on 19/04/2012 13:12:19:
>>>>
>>>> > Hi,
>>>> >
>>>> > One of the projects I am working on is going to need to store about
>>>> > 200TB of data - generally in manageable binary chunks. However, after
>>>> > doing some rough calculations based on rules of thumb I have seen for
>>>> > how much storage should be on each node, I'm worried.
>>>> >
>>>> > 200TB with RF=3 is 600TB = 600,000GB
>>>> > Which is 1000 nodes at 600GB per node
>>>> >
>>>> > I'm hoping I've missed something, as 1000 nodes is not viable for us.
>>>> >
>>>> > cheers
>>>> >
>>>> > --
>>>> > Franc Carter | Systems architect | Sirca Ltd
>>>> > franc.car...@sirca.org.au | www.sirca.org.au
>>>> > Tel: +61 2 9236 9118
>>>> > Level 9, 80 Clarence St, Sydney NSW 2000
>>>> > PO Box H58, Australia Square, Sydney NSW 1215
>>>
>>> --
>>> *Franc Carter* | Systems architect | Sirca Ltd
>>> franc.car...@sirca.org.au | www.sirca.org.au
>>> Tel: +61 2 9236 9118
>>> Level 9, 80 Clarence St, Sydney NSW 2000
>>> PO Box H58, Australia Square, Sydney NSW 1215
>
> --
> *Franc Carter* | Systems architect | Sirca Ltd
> franc.car...@sirca.org.au | www.sirca.org.au
> Tel: +61 2 9236 9118
> Level 9, 80 Clarence St, Sydney NSW 2000
> PO Box H58, Australia Square, Sydney NSW 1215
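
For reference, here is a minimal back-of-the-envelope sketch in Python of the node-count arithmetic in this thread: 200TB at RF=3 is 600,000GB, and the node count is just that divided by however much data you are willing to put on each node. The per-node densities (400GB, 600GB, 1TB, 2TB) are assumptions for illustration, not Cassandra recommendations, and the compression knob is left at 1.0 because the data is already gzip'd.

import math

def nodes_needed(raw_tb, replication_factor, gb_per_node, compression=1.0):
    """Node count implied by a given amount of data stored per node."""
    total_gb = raw_tb * 1000 * replication_factor / compression
    return math.ceil(total_gb / gb_per_node)

# 200TB raw data, RF=3; densities below are assumed values, not recommendations.
for density_gb in (400, 600, 1000, 2000):
    print(f"{density_gb} GB/node -> {nodes_needed(200, 3, density_gb)} nodes")

At 600GB/node this gives the 1,000 nodes that started the thread; at 2TB of data per node it drops to roughly 300, which is why the question of how far GB/node can be pushed matters so much here.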