Hi all,

I'm unsure whether Cassandra is appropriate for my use case:

Maintain a query model.

Collect data from several sources (asynchronously) and merge it into
aggregates (rows) in one Cassandra table.
Most writes are updates to existing rows; inserts happen only during the
initial load or when new data ranges are added.
Some sources deliver a complete new data set each day, others only deltas.
Each aggregate consists of some flat columns and some list columns; a list is
always replaced as a whole (like a single column).
Updates to the same rows also arrive in parallel / asynchronously,
spread over the day.
There are no deletes on rows or columns.
The table will hold around 100 million rows.
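
A minimal sketch of this write pattern, in plain Python rather than against a
real cluster. It assumes that each column keeps the value with the newest
write timestamp, and that a list column is overwritten as one value. The names
Row, write, table and the sample columns are hypothetical and only serve the
illustration; ties between equal timestamps are resolved in favour of the
later call, which is a simplification.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Tuple


@dataclass
class Row:
    # column name -> (write timestamp, value); a list column is stored as one value
    cells: Dict[str, Tuple[int, Any]] = field(default_factory=dict)

    def upsert(self, columns: Dict[str, Any], timestamp: int) -> None:
        """Merge columns into the row; per column, the newest timestamp wins."""
        for name, value in columns.items():
            current = self.cells.get(name)
            if current is None or timestamp >= current[0]:
                self.cells[name] = (timestamp, value)

    def read(self) -> Dict[str, Any]:
        """Return the current value of every column, without timestamps."""
        return {name: value for name, (_, value) in self.cells.items()}


# Hypothetical in-memory stand-in for the "query model" table.
table: Dict[str, Row] = {}


def write(key: str, columns: Dict[str, Any], timestamp: int) -> None:
    """Upsert: create the row if it is missing, otherwise merge into it."""
    table.setdefault(key, Row()).upsert(columns, timestamp)


# One source delivers a full record, another only a delta.
# Arrival order differs from timestamp order on purpose.
write("item-1", {"name": "foo", "tags": ["a", "b"]}, timestamp=10)
write("item-1", {"tags": ["c"], "price": 5}, timestamp=30)
write("item-1", {"price": 4, "name": "bar"}, timestamp=20)  # late arrival

result = table["item-1"].read()
print(result)
```

In this sketch the late write still updates "name" (20 is newer than 10) but
does not overwrite "price" (20 is older than 30), so no locking between the
sources is needed.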

Analysis job

A job runs asynchronously one or more times a day. It scans the "query model"
table using a few criteria and
reads ranges of complete rows to generate a kind of analysis output.
The output, also several rows of aggregates, is inserted into a second
table;
old data there is never updated, but it is deleted after some weeks.
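
The job can be sketched the same way, again in plain Python and only as an
illustration. The column names (region, amount), the single filter criterion
and the four-week retention are assumptions; the description above only says
"few criteria" and "some weeks".

```python
from datetime import datetime, timedelta
from typing import Dict, Iterable, List

RETENTION = timedelta(weeks=4)  # hypothetical value for "some weeks"


def run_analysis(query_model: Iterable[Dict], region: str, now: datetime) -> List[Dict]:
    """Scan the rows, keep those matching the criterion, emit one aggregate row."""
    matching = [row for row in query_model if row["region"] == region]
    if not matching:
        return []
    return [{
        "region": region,
        "row_count": len(matching),
        "total": sum(row["amount"] for row in matching),
        "created_at": now,
        "expires_at": now + RETENTION,  # output rows are never updated, only expired
    }]


def purge_expired(output_table: List[Dict], now: datetime) -> List[Dict]:
    """Keep only the output rows whose retention period has not passed yet."""
    return [row for row in output_table if row["expires_at"] > now]


# Hypothetical content of the "query model" table.
query_model = [
    {"key": "a", "region": "eu", "amount": 3},
    {"key": "b", "region": "eu", "amount": 4},
    {"key": "c", "region": "us", "amount": 10},
]

start = datetime(2024, 1, 1)
output_table = run_analysis(query_model, "eu", start)
after_two_weeks = purge_expired(output_table, start + timedelta(weeks=2))
after_five_weeks = purge_expired(output_table, start + timedelta(weeks=5))
print(len(after_two_weeks), len(after_five_weeks))
```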


My hope was to use aggregates with in-place updates, lock-free and fast
writes, easy upserts and transparent scaling.
But I have concerns because of the update scenario / high churn rate.
Is this an anti-pattern for Cassandra?
Or would it be better to have multiple query models because of the updates,
at the cost of having to read from multiple tables (instead of one query
model with all data in one row)?

Thank you,

Ciao,
Matthias
