Hi,

Before looking for a sizing, have you tried applying application-side compression before inserting your data? (This paper is really interesting: https://aaltodoc.aalto.fi/bitstream/handle/123456789/29099/master_Burman_Michael_2017.pdf?sequence=1) For time-series use cases this is a major storage cost saving!
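As a rough illustration, here is a minimal Java sketch of compressing a serialized batch of points before writing it to a blob column. I am using gzip from the JDK as a simple stand-in (the paper above evaluates time-series-specific codecs), and the class/method names are mine:

    import java.io.ByteArrayOutputStream;
    import java.io.IOException;
    import java.nio.charset.StandardCharsets;
    import java.util.zip.GZIPOutputStream;

    public class PayloadGzip {

        // Gzip a serialized batch of points so that many samples can be
        // stored in a single blob column instead of one row per sample.
        static byte[] compress(byte[] raw) throws IOException {
            ByteArrayOutputStream buf = new ByteArrayOutputStream(raw.length / 2 + 16);
            try (GZIPOutputStream gz = new GZIPOutputStream(buf)) {
                gz.write(raw);
            }
            return buf.toByteArray();
        }

        public static void main(String[] args) throws IOException {
            // Fake a batch of 1000 samples to show the size reduction.
            StringBuilder sb = new StringBuilder();
            for (int i = 0; i < 1000; i++) {
                sb.append("ts=").append(1519632000 + i).append(",v=42.0;");
            }
            byte[] payload = sb.toString().getBytes(StandardCharsets.UTF_8);
            System.out.println(payload.length + " -> " + compress(payload).length + " bytes");
        }
    }

Note that this only pays off if many samples are batched into each blob; compressing tiny per-row payloads can even make them larger.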
IMHO the active data is whatever might still be involved in a compaction. That could be every SSTable in your table if you are using STCS/LCS, but TWCS stops looking at old SSTables once their time window is considered closed ("An SSTable from a bucket can never be compacted with an SSTable from another bucket" => http://thelastpickle.com/blog/2016/12/08/TWCS-part1.html). A sketch of such a table follows the quoted message below.

On 26 February 2018 at 09:58, onmstester onmstester <onmstes...@zoho.com> wrote:

> Another question on node density, in this scenario:
>
> 1. We should keep time-series data of some years for a heavy-write system in Cassandra (> 10K ops per second).
> 2. The system is insert-only, and inserted data would never be updated.
> 3. In the partition key, we used the number of months since 1970, so data for every month would be on a separate partition.
> 4. Because of rule 2, after the end of a month, previous partitions would never be accessed for write requests.
> 5. More than 90% of read requests would concern current-month partitions, so we rarely access old data; we just keep it for that 10% of reports!
> 6. The overall reads in comparison to writes are very small (like 0.0001% of overall time).
>
> So, finally, the question:
> Even in this scenario, would the active data be the whole data (this month + all previous months), or the data which would be accessed for most reads and writes (only the past two months)?
> Could I use more than 3 TB per node for this scenario?
> Something like:
> CPU: 5 cores
> RAM: 32 GB
> Disk: 5 TB
>
> Sent using Zoho Mail <https://www.zoho.com/mail/>
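Here is a minimal sketch of the schema described above with TWCS, using the DataStax Java driver. The contact point, keyspace/table/column names, and the 7-day window are my assumptions, not from the original mail:

    import com.datastax.driver.core.Cluster;
    import com.datastax.driver.core.Session;
    import java.time.Instant;
    import java.time.YearMonth;
    import java.time.ZoneOffset;
    import java.time.temporal.ChronoUnit;

    public class TwcsSketch {

        // Months elapsed since 1970-01, matching the partition key in the scenario.
        static int monthBucket(Instant ts) {
            YearMonth ym = YearMonth.from(ts.atZone(ZoneOffset.UTC));
            return (int) ChronoUnit.MONTHS.between(YearMonth.of(1970, 1), ym);
        }

        public static void main(String[] args) {
            // Assumes a local node and an existing "metrics" keyspace.
            try (Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
                 Session session = cluster.connect()) {
                session.execute(
                    "CREATE TABLE IF NOT EXISTS metrics.readings ("
                    + " month_bucket int,"   // months since 1970-01
                    + " sensor_id text,"
                    + " ts timestamp,"
                    + " value blob,"
                    + " PRIMARY KEY ((month_bucket, sensor_id), ts)"
                    + ") WITH compaction = {"
                    + " 'class': 'TimeWindowCompactionStrategy',"
                    + " 'compaction_window_unit': 'DAYS',"
                    + " 'compaction_window_size': '7'"
                    + "}");
            }
        }
    }

With TWCS, once a window closes its SSTables stop participating in compaction, so from a compaction point of view only the current window's data stays active.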