Interesting approach, Oded. Is this similar to what is described here? http://radar.oreilly.com/2014/07/questioning-the-lambda-architecture.html

Regards,
Shahab

On Sun, Apr 26, 2015 at 4:29 AM, Peer, Oded <oded.p...@rsa.com> wrote:

> I would maintain two tables.
>
> An “archive” table that holds all the active and inactive records, and is
> updated hourly (re-inserting the same record has some compaction overhead
> but on the other side deleting records has tombstones overhead).
>
> An “active” table which holds all the records in the last external API
> invocation.
>
> To avoid tombstones and read-before-delete issues “active” should actually
> be a synonym, an alias, to the most recent active table.
>
> I suggest you create two identical tables, “active1” and “active2”, and an
> “active_alias” table that informs which of the two is the most recent.
>
> Thus when you query the external API you insert the data to “archive” and
> to the unaliased “activeN” table, switch the alias value in “active_alias”
> and truncate the new unaliased “activeM” table.
>
> No need to query the data before inserting it. Make sure truncating
> doesn’t create automatic snapshots.
>
> *From:* Narendra Sharma [mailto:narendra.sha...@gmail.com]
> *Sent:* Friday, April 24, 2015 6:53 AM
> *To:* user@cassandra.apache.org
> *Subject:* Re: Data model suggestions
>
> I think one table, say “record”, should be good. The primary key is record id.
> This will ensure good distribution.
> Just update the active attribute to true or false.
> For range queries on active vs archive records, maintain 2 indexes or try
> a secondary index.
>
> On Apr 23, 2015 1:32 PM, "Ali Akhtar" <ali.rac...@gmail.com> wrote:
>
> Good point about the range selects. I think they can be made to work with
> limits, though. Or, since the active records will never usually be > 500k,
> the ids may just be cached in memory.
>
> Most of the time, during reads, the queries will just consist of select *
> where primaryKey = someValue. One row at a time.
>
> The question is just whether to keep all records in one table (including
> archived records which won't be queried 99% of the time), or to keep active
> records in their own table, and delete them when they're no longer active.
> Will that produce tombstone issues?
>
> On Fri, Apr 24, 2015 at 12:56 AM, Manoj Khangaonkar <khangaon...@gmail.com>
> wrote:
>
> Hi,
>
> If your external API returns active records, then I am guessing you
> need to do a select * on the active table to figure out which records in
> the table are no longer active.
>
> You might be aware that range selects based on partition key will time out
> in Cassandra. They can however be made to work using the column cluster
> key.
>
> To comment more, we would need to see your proposed Cassandra tables and
> the queries that you might need to run.
>
> regards
>
> On Thu, Apr 23, 2015 at 9:45 AM, Ali Akhtar <ali.rac...@gmail.com> wrote:
>
> That's returned by the external API we're querying. We query them for
> active records; if a previous active record isn't included in the results,
> that means it's time to archive that record.
>
> On Thu, Apr 23, 2015 at 9:20 PM, Manoj Khangaonkar <khangaon...@gmail.com>
> wrote:
>
> Hi,
>
> How do you determine if the record is no longer active? Is it a periodic
> process that goes through every record and checks when the last update
> happened?
>
> regards
>
> On Thu, Apr 23, 2015 at 8:09 AM, Ali Akhtar <ali.rac...@gmail.com> wrote:
>
> Hey all,
>
> We are working on moving a MySQL-based application to Cassandra.
>
> The workflow in MySQL is this: We have two tables: active and archive.
> Every hour, we pull in data from an external API. The records which are
> active are kept in the 'active' table. Once a record is no longer active, it's
> deleted from 'active' and re-inserted into 'archive'.
>
> The reason for that is that most of the time, queries are only done
> against the active records rather than archived. Therefore keeping the
> active table small may help with faster queries, if it only has to search
> 200k records vs 3 million or more.
>
> Is it advisable to keep the same data model in Cassandra? I'm concerned
> about tombstone issues when records are deleted from active.
>
> Thanks.
>
> --
> http://khangaonkar.blogspot.com/
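
For anyone reading this thread later, here is a rough sketch of the two-table alias scheme Oded describes above. It is only an in-memory simulation in Python: plain dicts stand in for the "archive", "active1", "active2" and "active_alias" tables, and the class and method names (ActiveAliasStore, load_batch, get_active) are made up for illustration. It does not talk to Cassandra and is not anyone's actual implementation.

```python
class ActiveAliasStore:
    """In-memory stand-in for archive, active1, active2 and active_alias."""

    def __init__(self):
        self.archive = {}                             # all records ever seen
        self.active = {"active1": {}, "active2": {}}  # two identical tables
        self.alias = "active1"                        # the active_alias value

    def _unaliased(self):
        # The table that the alias does NOT currently point to.
        return "active2" if self.alias == "active1" else "active1"

    def load_batch(self, records):
        """Apply one hourly result set from the external API.

        `records` maps record id -> record. Nothing is read before
        writing and no individual row is ever deleted.
        """
        target = self._unaliased()  # empty: truncated by the previous load
        for record_id, record in records.items():
            self.archive[record_id] = record         # upsert into archive
            self.active[target][record_id] = record  # fill the unaliased table
        previous = self.alias
        self.alias = target            # switch the alias to the fresh table
        self.active[previous].clear()  # "truncate" the now-unaliased table

    def get_active(self, record_id):
        # Single-row lookup against whichever table the alias points to.
        return self.active[self.alias].get(record_id)


store = ActiveAliasStore()
store.load_batch({1: "a", 2: "b"})
store.load_batch({2: "b2", 3: "c"})  # record 1 is no longer active
print(store.get_active(1), store.get_active(2), sorted(store.archive))
```

In a real cluster the three steps in load_batch would be separate statements, so readers have to look up the alias first and may briefly see the old table until the switch lands.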