Hi,

Before looking at sizing, have you tried applying application-side
compression before inserting your data (this paper is really
interesting: https://aaltodoc.aalto.fi/bitstream/handle/123456789/29099/master_Burman_Michael_2017.pdf?sequence=1
)? For time series use cases this is a major storage cost saving!
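
For example, here is a minimal sketch of that idea in Java (the table
name, column names and plain GZIP are illustrative assumptions on my
side; time-series-specific encodings such as Facebook's Gorilla
typically compress much better than generic GZIP):

    import java.io.ByteArrayOutputStream;
    import java.io.DataOutputStream;
    import java.io.IOException;
    import java.util.zip.GZIPOutputStream;

    public class ChunkCompressor {
        // Pack (timestamp, value) pairs into a single compressed blob,
        // then insert that blob as one cell instead of one row per point,
        // e.g. INSERT INTO metrics (month, series_id, chunk) VALUES (?, ?, ?)
        public static byte[] compress(long[] timestamps, double[] values)
                throws IOException {
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            try (DataOutputStream out =
                    new DataOutputStream(new GZIPOutputStream(buf))) {
                long prev = 0L;
                for (int i = 0; i < timestamps.length; i++) {
                    out.writeLong(timestamps[i] - prev); // delta-encode timestamps
                    out.writeDouble(values[i]);
                    prev = timestamps[i];
                }
            }
            return buf.toByteArray();
        }
    }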

IMHO the "active" data is the data that might be involved in a
compaction, so it could be every row in your table if you are using
STCS/LCS, but TWCS will stop looking at old SSTables once their time
window is considered closed ("An SSTable from a bucket can never be
compacted with an SSTable from another bucket" =>
http://thelastpickle.com/blog/2016/12/08/TWCS-part1.html).
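
For reference, TWCS is configured per table; a schema matching your
months-since-1970 partitioning might look something like this (table
and column names are illustrative, and the 30-day window is just one
reasonable choice to approximate a month):

    CREATE TABLE metrics (
        month     int,        -- number of months since 1970
        series_id text,
        ts        timestamp,
        value     double,
        PRIMARY KEY ((month, series_id), ts)
    ) WITH compaction = {
        'class': 'TimeWindowCompactionStrategy',
        'compaction_window_unit': 'DAYS',
        'compaction_window_size': 30
    };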

On 26 February 2018 at 09:58, onmstester onmstester <onmstes...@zoho.com>
wrote:

> Another question on node density, in this scenario:
> 1. we should keep some years of time series data for a heavy-write system
> in Cassandra (>10K ops per second)
> 2. the system is insert-only and inserted data would never be updated
> 3. for the partition key, we used the number of months since 1970, so data
> for every month would be on a separate partition
> 4. because of rule 2, after the end of a month, previous partitions would
> never be accessed by write requests
> 5. more than 90% of read requests would concern the current month's
> partitions, so we rarely access old data; we just keep it for that 10% of
> reports!
> 6. overall, reads in comparison to writes are very rare (like 0.0001% of
> overall time)
>
> So, finally the question:
> Even in this scenario, would the active data be the whole data set (this
> month + all previous months), or only the data accessed by most reads and
> writes (the past two months)?
> Could I use more than 3 TB per node for this scenario?
> something like:
> CPU: 5 cores
> RAM: 32 GB
> Disk: 5 TB
