Re: Challenge with initial data load with TWCS

DuyHai Doan Sun, 29 Sep 2019 00:42:41 -0700

Thanks Jeff for sharing the ideas. I have some questions, though:

- CQLSSTableWriter and explicitly break between windows --> Even if
we break between windows, with 1 year's worth of data we would have to
run CQLSSTableWriter over 1 year (365 days), because the write time
taken into account when flushing to the SSTable is the current clock
timestamp:
https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/io/sstable/CQLSSTableWriter.java#L252-L259


What we're looking for is a way to load 1 year of data while forcing
the write timestamp into the past, so that TWCS sees the initial load
as if we had loaded the data normally, day by day, over 1 year.
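
To make the goal concrete: as far as I understand it, TWCS assigns an
SSTable to a window based on the maximum write timestamp it contains,
rounded down to the window size. The Python sketch below only mirrors
that arithmetic for day-based windows; it does not call any Cassandra
code, and the function names are hypothetical.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)
SECONDS_PER_DAY = 24 * 3600


def to_micros(dt: datetime) -> int:
    """Convert an aware datetime to microseconds since the Unix epoch."""
    return (dt - EPOCH) // timedelta(microseconds=1)


def window_start_seconds(write_ts_micros: int, window_size_days: int = 1) -> int:
    """Lower bound (epoch seconds) of the day-based window holding this write time.

    Assumption: windows are aligned to the Unix epoch, as with window unit = day.
    """
    window_seconds = SECONDS_PER_DAY * window_size_days
    ts_seconds = write_ts_micros // 1_000_000
    return (ts_seconds // window_seconds) * window_seconds


# One hypothetical event per hour over 365 days, starting at midnight UTC.
start = datetime(2018, 9, 29, tzinfo=timezone.utc)
events = [start + timedelta(hours=h) for h in range(365 * 24)]

# If each row carries its event time as write timestamp, the rows fall
# into one window per day.
windows = {window_start_seconds(to_micros(e)) for e in events}
print(len(windows))  # 365
```

With the write time taken from the loading clock instead, the same rows
would fall into only as many windows as the load takes days.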

- Use the normal write path for a single window at a time, explicitly
calling flush between windows. --> I don't understand how calling
flush will trigger windowing in TWCS; as far as I know, windowing is
based on write time. By the way, can we load data using normal CQL and
just force the write time to be in the past, so that TWCS triggers
compaction properly?
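
For illustration, forcing the write time from normal CQL would go
through USING TIMESTAMP, which takes microseconds since the epoch. The
Python sketch below only builds the statement text; the keyspace, table
and column names are hypothetical. It also shortens the TTL by the age
of the row, on the assumption that the TTL counts from the moment of
insertion rather than from the supplied write timestamp.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)
TABLE_TTL_SECONDS = 365 * 24 * 3600  # TTL = 365 days, as in the original question


def historical_insert(event_time: datetime, now: datetime):
    """Build a CQL INSERT whose write time is the event time.

    Returns None when the row would already have expired.
    metrics.readings and its columns are hypothetical names.
    """
    write_ts_micros = (event_time - EPOCH) // timedelta(microseconds=1)
    age_seconds = (now - event_time) // timedelta(seconds=1)
    remaining_ttl = TABLE_TTL_SECONDS - age_seconds
    if remaining_ttl <= 0:
        return None  # older than the TTL: nothing to load
    return (
        "INSERT INTO metrics.readings (sensor_id, event_time, value) "
        "VALUES (?, ?, ?) "
        f"USING TIMESTAMP {write_ts_micros} AND TTL {remaining_ttl}"
    )


now = datetime(2019, 9, 29, tzinfo=timezone.utc)
print(historical_insert(datetime(2019, 9, 28, tzinfo=timezone.utc), now))
```

Even with past timestamps, rows from different days that sit in the
same memtable would be flushed into one SSTable, which is presumably why
the suggestion is to write a single window at a time and flush in
between.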

Regards

On Sun, Sep 29, 2019 at 3:51 AM Jeff Jirsa <jji...@gmail.com> wrote:
>
>
>
> We used to do either:
>
> - CQLSSTableWriter and explicitly break between windows (then nodetool 
> refresh or sstableloader to push them into the system), or
>
> - Use the normal write path for a single window at a time, explicitly calling 
> flush between windows. You can’t have current data writing while you do your 
> historical load using this method
>
>
>
> > On Sep 28, 2019, at 1:31 PM, DuyHai Doan <doanduy...@gmail.com> wrote:
> >
> > Hello users
> >
> > TWCS works great in steady state. It creates SSTables of roughly
> > fixed size if your insertion rate is pretty constant.
> >
> > Now the big deal is about the initial load.
> >
> > Let's say we configure a TWCS with window unit = day and window size =
> > 1, we would have 1 SSTable per day and with TTL = 365 days all data
> > would expire after 1 year
> >
> > Now, since the cluster is still empty, we need to load 1 year's
> > worth of data. If we use TWCS and the loading takes 7 days, we would
> > have 7 SSTables, each of them aggregating 365/7 (about 52) days'
> > worth of the annual data. Ideally we would like TWCS to split this
> > data into 365 distinct SSTables
> >
> > So my question is: how to manage this scenario? How to perform an
> > initial load for a table using TWCS and make the compaction split
> > the data nicely, based on the source data timestamp and not the
> > insertion timestamp?
> >
> > Regards
> >
> > Duy Hai DOAN
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: user-unsubscr...@cassandra.apache.org
> > For additional commands, e-mail: user-h...@cassandra.apache.org
> >
>
