Hi all, I'm unsure whether Cassandra is appropriate for my use case:
Maintain a query model: collect data from several sources (asynchronously) and merge it into aggregates (rows) in one Cassandra table. The data is mostly updated, except for the initial load or when new data ranges are added. Some sources deliver completely new data each day, others only deltas. Each aggregate consists of some flat columns and some list columns; a list is always replaced as a whole (like a single column). The updates also run in parallel / asynchronously against the same rows, spread over the day. There are no deletes of rows or columns. The table will hold around 100 million rows.

Analysis job: a job runs asynchronously one or more times a day. It scans the "query model" table with a few criteria and reads ranges of complete rows to generate a kind of analysis output. The output, also several rows of aggregates, is inserted into a second table; that data is never updated, but it is deleted after a few weeks.

My hope was to use aggregates and in-place updates to get lock-free, fast writes, easy upserts and transparent scaling. But I have concerns because of the update scenario / high churn rate. Is this more of an anti-pattern for Cassandra? Or would it be better to have multiple query models because of the updates, at the cost of reading from multiple tables (instead of one query model with all the data in one row)?

Thank you,
Ciao, Matthias
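P.S. To make the intended merge semantics concrete, here is a rough in-memory Python sketch. It is not Cassandra or its driver: `AggregateTable`, `upsert`, `read` and `scan` are hypothetical names, and the keys and columns in the demo are made up. It assumes each column, including a whole list, is resolved independently by write timestamp (last write wins), which is how I understand Cassandra reconciles concurrent writes to the same column, so the sources could write to the same row in any order.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List, Optional, Tuple

# Hypothetical model of the "query model" table (not Cassandra's API):
# each row maps column name -> (value, write_timestamp).
Cell = Tuple[Any, int]


@dataclass
class AggregateTable:
    rows: Dict[str, Dict[str, Cell]] = field(default_factory=dict)

    def upsert(self, key: str, columns: Dict[str, Any], ts: int) -> None:
        """Merge a partial update (snapshot or delta) into a row, creating it if missing.

        Each column is resolved on its own: the write with the higher
        timestamp wins, so late-arriving older writes do not clobber newer
        values. Ties go to the incoming write here, which is a simplification.
        """
        row = self.rows.setdefault(key, {})
        for name, value in columns.items():
            current = row.get(name)
            if current is None or ts >= current[1]:
                # Lists are copied and replaced as a unit, never appended to.
                stored = list(value) if isinstance(value, list) else value
                row[name] = (stored, ts)

    def read(self, key: str) -> Optional[Dict[str, Any]]:
        """Return the complete row without timestamps, or None if it does not exist."""
        row = self.rows.get(key)
        if row is None:
            return None
        return {name: value for name, (value, _) in row.items()}

    def scan(self, predicate: Callable[[Dict[str, Any]], bool]) -> List[Tuple[str, Dict[str, Any]]]:
        """Read all complete rows matching a filter, like the analysis job's scan."""
        result = []
        for key in sorted(self.rows):
            row = self.read(key)
            if row is not None and predicate(row):
                result.append((key, row))
        return result


if __name__ == "__main__":
    table = AggregateTable()
    # Source A: full snapshot at ts=200.
    table.upsert("customer-1", {"name": "Acme", "tags": ["x", "y"]}, ts=200)
    # Source B: delta that arrives later but carries an older timestamp.
    table.upsert("customer-1", {"name": "Acme Corp", "region": "EU"}, ts=100)
    # Source C: replaces the whole list.
    table.upsert("customer-1", {"tags": ["z"]}, ts=300)
    print(table.read("customer-1"))  # {'name': 'Acme', 'tags': ['z'], 'region': 'EU'}
    print([k for k, _ in table.scan(lambda r: r.get("region") == "EU")])  # ['customer-1']
```

In the demo, the older delta only contributes the new `region` column and leaves the newer `name` alone, while the last `tags` write replaces the whole list rather than appending to it.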