Sorry Gerard, I'm afraid I'm not familiar with that project. The time series I've described is a relatively minor component of an application which is already powered by Cassandra, so you can see why I'd prefer a viable way (which I'm quickly learning may not exist) to modelit in Cassandra.
On Fri, Mar 25, 2016 at 2:04 PM, Gerard Maas <gerard.m...@gmail.com> wrote: > Hi, > > It sounds to me like Apache Kafka would be a better fit for your > requirements. Have you considered that option? > > kr, Gerard > Datastax MVP for Apache Cassandra (so, I'm not suggesting other tech for > any other reason that seeing it as a better fit) > > On Fri, Mar 25, 2016 at 1:31 PM, K. Lawson <klawso...@gmail.com> wrote: > >> While adhering to best practices, I am trying to model a time series in >> Cassandra that is compliant with the following access pattern directives: >> >> - Is to be both read and shrank by a single party, grown by multiple >> parties >> - Is to be read as a queue (in other words, its entries, from first to >> last, are to be paged through in order) >> - Is to grown as a queue (in other words, new entries (the number of >> which is expected to fall in the range of 0 to a couple of hundred per day) >> are always APPENDED to the series) >> - Is to be shrunk by way of the removal of any entries which have been >> processed by the application (immediately upon completion of said >> processing) >> >> So far, I've come up with four solutions, listed below (along with their >> pros and cons), that are compliant with >> the directives given above; is there any solution superior to these, and >> if not, which one of these is most optimal? >> >> >> >> Solution #1: >> >> >> //Processing position markers (saved somewhere on disk) >> mostRecentProcessedItemInsertTime = 0 >> mostRecentProcessedItemInsertDayStartTime = 0 >> >> CREATE TABLE IF NOT EXISTS solution_table_1 >> ( >> itemInsertDayStartTime timestamp >> itemInsertTime timestamp >> itemId timeuuid >> PRIMARY KEY (itemInsertDayStartTime, itemInsertTime, itemId) >> ); >> //Initial row retrieval query (presumably, the position markers will be >> appropriately updated after each retrieval) >> >> SELECT * >> >> FROM solution_table_1 >> >> WHERE itemInsertDayStartTime IN >> (mostRecentProcessedItemInsertDayStartTime, >> mostRecentProcessedItemInsertDayStartTime + 86400000, ...) >> >> AND itemInsertTime > mostRecentProcessedItemInsertTime >> >> LIMIT 30 >> >> Pros: >> - Shards table data across the cluster >> >> Cons: >> - Requires the maintenance of position markers >> - Requires the explicit specification of partitions (which may or may not >> have data) to target for retrievals which page the table data by >> itemInsertTime >> - Requires correspondence with multiple nodes to satisfy retrievals which >> page the table data by itemInsertTime >> >> >> Solution #2: >> >> >> CREATE TABLE IF NOT EXISTS solution_table_2 >> ( >> itemInsertTime timestamp >> itemId timeuuid >> PRIMARY KEY (itemInserTime, itemId) >> ); >> CREATE INDEX IF NOT EXISTS ON solution_table_2 (itemInsertTime); >> >> //Initial row retrieval query >> SELECT * FROM solution_table_2 WHERE itemInsertTime > 0 LIMIT 30 ALLOW >> FILTERING >> >> Pros: >> - Shards table data across the cluster >> - Enables retrievals which page table data by itemInsertTime to be >> conducted without explicitly specifying partitions to target >> >> Cons: >> - Specifies the creation of an index on a high-cardinality column >> - Requires correspondence with multiple nodes, as well as data filtering, >> to satisfy retrievals which page the table data by itemInsertTime >> Solution #3: >> >> CREATE TABLE IF NOT EXISTS solution_table_3 >> ( >> itemInsertTime timestamp >> itemId timeuuid >> itemInsertDayStartTime timestamp >> PRIMARY KEY (itemInsertTime, itemId) >> ); >> CREATE INDEX IF NOT EXISTS ON solution_table_3 (itemInsertDayStartTime); >> //Initial row retrieval query >> SELECT * FROM solution_table_3 WHERE itemInsertDayStartTime > 0 LIMIT 30 >> ALLOW FILTERING >> >> Pros: >> - Shards table data across the cluster >> - Enables retrievals which page table data by itemInsertTime to be >> conducted without explicitly specifying partitions to target >> - Specifies the creation of an index on a column with anticipatively >> suitable cardinality >> >> Cons: >> - Requires correspondence with multiple nodes, as well as data filtering, >> to satisfy retrievals which page the table data by itemInsertTime >> Solution #4: >> >> CREATE TABLE IF NOT EXISTS solution_table_4 >> ( >> dummyPartitionInt int >> itemInsertTime timestamp >> itemId timeuuid >> PRIMARY KEY (dummyPartitionInt, itemInsertTime, itemId) >> ); >> //Initial row retrieval query (assuming all rows are inserted with a >> dummyPartitionInt value of 0) >> SELECT * FROM solution_table_4 WHERE dummyPartitionInt = 0 AND >> itemInsertTime > 0 LIMIT 30 >> >> >> Pros: >> - Enables retrieval to be satisfied with a single replica set >> - Enables retrievals which page table data by itemInsertTime to be >> conducted without explicitly specifying more than one partition to target >> >> Cons: >> - Requires the use of a "dummy" column >> - Specifies the constriction of table data (and as a result, all >> operations on it) to a single partition >> > >