Thanks Jeff for sharing the ideas. I have some questions though:

- CQLSSTableWriter and explicitly break between windows --> Even if we break
between windows, loading one year's worth of data would still require us to run
CQLSSTableWriter over a full year (365 days), because the write timestamp taken
into account when flushing to the SSTable is the current clock timestamp:
https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/io/sstable/CQLSSTableWriter.java#L252-L259

What we are looking for is a way to load one year of data while forcing the
write timestamp into the past, so that the initial load is seen by TWCS as if
the data had been written normally, day by day, over the whole year.
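To make this concrete, here is roughly what we were hoping to do. It is a
completely untested sketch, assuming CQLSSTableWriter accepts a USING TIMESTAMP
bind marker and binds it like any other marker; the keyspace, table and paths
below are made up for the example:

    import java.io.File;
    import java.util.Date;
    import org.apache.cassandra.io.sstable.CQLSSTableWriter;

    public class BackdatedSSTableWriter
    {
        public static void main(String[] args) throws Exception
        {
            String schema = "CREATE TABLE ks.sensor_data ("
                          + " sensor_id text, event_time timestamp, value double,"
                          + " PRIMARY KEY (sensor_id, event_time))";

            // Last bind marker is the write timestamp we want to force (microseconds)
            String insert = "INSERT INTO ks.sensor_data (sensor_id, event_time, value)"
                          + " VALUES (?, ?, ?) USING TIMESTAMP ?";

            CQLSSTableWriter writer = CQLSSTableWriter.builder()
                                                      .inDirectory(new File("/tmp/load/ks/sensor_data"))
                                                      .forTable(schema)
                                                      .using(insert)
                                                      .build();

            long dayMillis = 24L * 3600 * 1000;
            long now = System.currentTimeMillis();
            for (int daysAgo = 365; daysAgo > 0; daysAgo--)
            {
                long eventMillis = now - daysAgo * dayMillis;
                // Backdate the cell timestamp to the event time (CQL timestamps are microseconds)
                writer.addRow("sensor-1", new Date(eventMillis), 42.0, eventMillis * 1000L);
            }
            writer.close();
        }
    }

The resulting SSTables would then be pushed with sstableloader or nodetool
refresh, as you suggested. Does forcing the timestamp this way actually work
with CQLSSTableWriter, or does the code linked above always override it with
the clock?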
- Use the normal write path for a single window at a time, explicitly calling
flush between windows --> I don't understand how calling flush would trigger the
windowing in TWCS; as far as I know, the windows are based on the write
timestamps of the data. And by the way, can we load the data through the normal
CQL write path and simply force the write time into the past, so that TWCS
compacts the data into the proper windows?
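Again a completely untested sketch of what I mean, this time going through the
normal write path with the DataStax Java driver 3.x (just one possible driver,
an assumption on our side) and the same made-up table as above, with the write
timestamp forced into the past via USING TIMESTAMP:

    import com.datastax.driver.core.Cluster;
    import com.datastax.driver.core.PreparedStatement;
    import com.datastax.driver.core.Session;
    import java.util.Date;

    public class BackdatedCqlLoad
    {
        public static void main(String[] args)
        {
            try (Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
                 Session session = cluster.connect())
            {
                PreparedStatement ps = session.prepare(
                    "INSERT INTO ks.sensor_data (sensor_id, event_time, value)"
                  + " VALUES (?, ?, ?) USING TIMESTAMP ?");

                long dayMillis = 24L * 3600 * 1000;
                long now = System.currentTimeMillis();
                for (int daysAgo = 365; daysAgo > 0; daysAgo--)
                {
                    long eventMillis = now - daysAgo * dayMillis;
                    // Write timestamp in microseconds, forced back to the historical event time
                    session.execute(ps.bind("sensor-1", new Date(eventMillis), 42.0, eventMillis * 1000L));
                }
            }
        }
    }

If that is a valid approach, I suppose the per-row TTL would also have to be
shortened accordingly (USING TTL ?), since as far as I know expiration is
counted from the local write time and not from the forced timestamp.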