Adding to what Benjamin said: it is hard to estimate disk space if you are using STCS for a table whose rows are updated frequently, leading to a lot of fragmentation. STCS can also lead to scenarios where tombstones are not evicted for a long time. You may go live and everything goes well for months; then you gradually realize that large SSTables are holding on to tombstones because they are not getting compacted. It is not easy to test disk space requirements with precision upfront unless you run your system with realistic data patterns for some time. Your life can be much easier if you take care of the following points with STCS:

1. If you can afford some extra IO, go for a slightly more aggressive STCS configuration using one or more of the following settings: min_threshold=2, bucket_high=2, unchecked_tombstone_compaction=true. Which of these to use depends on your use case, so study these settings (see the first sketch below).

2. Estimate the free disk required for compactions at any point in time. For example, suppose you have 5 tables with 3 TB of data in total and you estimate that the data will be distributed as follows: A: 800 GB, B: 700 GB, C: 600 GB, D: 500 GB, E: 400 GB. If you have concurrent_compactors=3 and 90% of the data of your three largest tables is being compacted simultaneously, you will need 90/100 * (800+700+600) GB = 1.9 TB of free disk space. So you won't need 6 TB of disk for 3 TB of data; 4.9 TB would do (see the second sketch below).

3. Take a 10-15% buffer for future schema changes and calculation errors. Better safe than sorry :)
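For point 1, here is a minimal sketch of how these subproperties could be applied using the Python cassandra-driver; the contact point and the table name my_ks.my_table are placeholders, adjust them to your own cluster and schema:

# A minimal sketch, assuming the Python cassandra-driver and a
# placeholder table my_ks.my_table -- adjust to your own schema.
from cassandra.cluster import Cluster

session = Cluster(['127.0.0.1']).connect()

# Aggressive STCS: start compacting once 2 similarly sized SSTables
# exist, widen the size bucket, and allow single-SSTable compactions
# to purge tombstones even when they would otherwise be skipped.
session.execute("""
    ALTER TABLE my_ks.my_table WITH compaction = {
        'class': 'SizeTieredCompactionStrategy',
        'min_threshold': '2',
        'bucket_high': '2',
        'unchecked_tombstone_compaction': 'true'
    }
""")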
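And a rough sketch of the headroom math in point 2, assuming the worst case is your concurrent_compactors largest tables compacting at once; the table sizes and the 90% fraction are just the example numbers above:

# A rough sketch of the free-space estimate from point 2.
def compaction_headroom_gb(table_sizes_gb, concurrent_compactors=3,
                           compaction_fraction=0.9):
    # Worst case: the N largest tables are being compacted simultaneously.
    largest = sorted(table_sizes_gb, reverse=True)[:concurrent_compactors]
    return compaction_fraction * sum(largest)

tables_gb = [800, 700, 600, 500, 400]        # tables A..E from the example
headroom = compaction_headroom_gb(tables_gb) # 0.9 * (800+700+600) = 1890 GB
total = sum(tables_gb) + headroom            # 3000 + 1890 = 4890 GB ~ 4.9 TB
print("headroom: %d GB, total disk needed: %d GB" % (headroom, total))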
Thanks,
Anuj

On Thu, 26 Jan 2017 at 2:41 PM, Benjamin Roth <benjamin.r...@jaumo.com> wrote:

Hi!

This is basically right, but:

1. How do you know the 3 TB of storage will be 3 TB on Cassandra? This depends on how the data is serialized and compressed, how often it changes, and on your compaction settings.
2. 50% free space on STCS is only required if you do a full compaction of a single CF that takes all the space. Normally you need as much free space as the target SSTable of a compaction will take. If you split your data across more CFs, it's unlikely you will really hit this value.

You should probably do some tests. But in the end it is always good to have some headroom. I personally would scale out if free space is < 30%, but that always depends on your model.

2017-01-26 9:56 GMT+01:00 Raphael Vogel <raphael.vo...@web.de>:

Hi
Just want to validate my estimate for a C* cluster which should have around 3 TB of usable storage, assuming an RF of 3 and SizeTieredCompactionStrategy. Is it correct that SizeTieredCompactionStrategy needs (in the worst case) 50% free disk space during compaction? Would this then result in a cluster of 3 TB x 3 x 2 == 18 TB of raw storage?

Thanks and Regards
Raphael Vogel

--
Benjamin Roth
Prokurist

Jaumo GmbH · www.jaumo.com
Wehrstraße 46 · 73035 Göppingen · Germany
Phone +49 7161 304880-6 · Fax +49 7161 304880-1
AG Ulm · HRB 731058 · Managing Director: Jens Kammerer