Re: Challenge with initial data load with TWCS

Jeff Jirsa Sat, 28 Sep 2019 18:52:14 -0700

We used to do either:

- Use CQLSSTableWriter and explicitly break between windows (then use nodetool 
refresh or sstableloader to push the resulting SSTables into the system), or

- Use the normal write path for a single window at a time, explicitly calling 
flush between windows. With this method you can’t have current data being 
written while the historical load is running.
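
Both options depend on splitting the historical rows by window before anything is written. A minimal sketch of that step, assuming a 1-day window aligned to UTC epoch days and rows that carry their original write timestamp in seconds (the names window_start and group_by_window are made up for illustration, not part of any Cassandra API):

```python
from collections import defaultdict

# Assumption: window unit = DAYS and window size = 1, as in the question.
WINDOW_SECONDS = 24 * 60 * 60


def window_start(ts_seconds, window_seconds=WINDOW_SECONDS):
    """Return the lower bound (epoch seconds) of the window containing ts_seconds."""
    return ts_seconds - (ts_seconds % window_seconds)


def group_by_window(rows, window_seconds=WINDOW_SECONDS):
    """Group (timestamp_seconds, payload) pairs by window, oldest window first."""
    buckets = defaultdict(list)
    for ts, payload in rows:
        buckets[window_start(ts, window_seconds)].append((ts, payload))
    # Sort so the load can proceed one window at a time, in order.
    return dict(sorted(buckets.items()))


if __name__ == "__main__":
    sample = [(0, "a"), (86399, "b"), (86400, "c"), (200000, "d")]
    for start, batch in group_by_window(sample).items():
        # Here each batch would be written as its own SSTable, or written
        # through the normal path and followed by a flush before the next one.
        print(start, len(batch))
```

Each batch then maps to one "break": a separate CQLSSTableWriter output, or one round of normal writes followed by a flush before the next window starts.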



> On Sep 28, 2019, at 1:31 PM, DuyHai Doan <doanduy...@gmail.com> wrote:
> 
> Hello users
> 
> TWCS works great in steady state. It creates SSTables of roughly
> fixed size if your insertion rate is fairly constant.
> 
> Now the big deal is about the initial load.
> 
> Let's say we configure a TWCS with window unit = day and window size =
> 1, we would have 1 SSTable per day and with TTL = 365 days all data
> would expire after 1 year
> 
> Now, since the cluster is still empty, we need to load one year's worth
> of data. If we use TWCS and the loading takes 7 days, we would have 7
> SSTables, each of them aggregating 365/7 (about 52) days of the annual
> data. Ideally we would like TWCS to split this data into 365 distinct
> SSTables.
> 
> So my question is: how do we manage this scenario? How do we perform an
> initial load for a table using TWCS and make the compaction split the
> data nicely based on the source data timestamp and not the insertion
> timestamp?
> 
> Regards
> 
> Duy Hai DOAN
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: user-unsubscr...@cassandra.apache.org
> For additional commands, e-mail: user-h...@cassandra.apache.org
> 
