The general rule in Cassandra data modeling is to list all of your queries first and then define a table for each query, even if that means storing multiple copies of the data. So, keep your per-device table and create a second table with bucketed time as the partition key (hour, 15 minutes, or whatever interval gives roughly 1 to 10 megabytes per partition) and the event time and device as the clustering columns. A "last week" query then reads only the buckets that cover that week instead of scanning every device partition.
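Here is a rough sketch of what that second table and the bucket arithmetic could look like, in Python with the CQL held as plain strings. The table name events_by_time, the column names and types, and the one-hour bucket width are all assumptions for illustration; nothing here talks to a cluster, so you would hand the statements to whatever driver you use.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical schema: table name, column names and types are assumptions.
CREATE_EVENTS_BY_TIME = """
CREATE TABLE IF NOT EXISTS events_by_time (
    bucket    timestamp,   -- start of the time bucket (partition key)
    event_ts  timestamp,   -- event time (clustering column)
    device_id text,        -- device (clustering column)
    latitude  double,
    longitude double,
    PRIMARY KEY ((bucket), event_ts, device_id)
)
"""

# One query per bucket; the time bounds trim the first and last bucket.
SELECT_BUCKET = (
    "SELECT device_id, event_ts, latitude, longitude "
    "FROM events_by_time "
    "WHERE bucket = ? AND event_ts >= ? AND event_ts < ?"
)

BUCKET = timedelta(hours=1)  # assumed width; tune it to your partition size
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def bucket_start(ts, width=BUCKET):
    """Round a timezone-aware timestamp down to the start of its bucket."""
    return ts - (ts - EPOCH) % width


def buckets_between(start, end, width=BUCKET):
    """List the bucket starts that cover the half-open range [start, end)."""
    result = []
    current = bucket_start(start, width)
    while current < end:
        result.append(current)
        current += width
    return result
```

On write, you would store bucket_start(event_ts) in the bucket column alongside the event. To read the last week, call buckets_between(now - timedelta(days=7), now) and run SELECT_BUCKET once per returned bucket, ideally asynchronously, then merge the results.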
Or, consider DSE Search, which lets you run whatever ad hoc queries you want using Solr. For an open source Lucene plugin, look at Stratio or TupleJump Stargate.

-- Jack Krupansky

On Mon, Nov 9, 2015 at 8:05 AM, Guillaume Charhon <guilla...@databerries.com> wrote:

> Hello,
>
> We are currently storing geolocation events (about 1 per 5 minutes) for
> each device we track. We currently have 2 TB of data. I would like to store
> the device_id, the timestamp of the event, latitude and longitude. I thought
> about using the device_id as the partition key and timestamp as the
> clustering column. It is great as events are naturally grouped by device
> (very useful for our Spark jobs). However, if I would like to retrieve all
> events of all devices for the last week, I understand that Cassandra will
> need to load all the data and filter it, which does not seem clean in the
> long term.
>
> How should I create my model?
>
> Best Regards