The general rule in Cassandra data modeling is to look at all of your
queries first and then declare a table for each query, even if that
means storing multiple copies of the data. So, create a second table with
bucketed time as the partition key (an hour, 15 minutes, or whatever time
interval yields roughly 1 to 10 megabytes per partition) and time and
device as the clustering columns.
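As a rough sketch (table and column names are illustrative, not a fixed recipe), the bucketed table might look like this, with the bucket string computed by the application when it writes each event:

```sql
-- Hourly buckets: one partition per hour, events ordered by time within it.
CREATE TABLE events_by_bucket (
    bucket     text,       -- e.g. '2015-11-09 08' for the 8:00 UTC hour
    event_time timestamp,
    device_id  uuid,
    latitude   double,
    longitude  double,
    PRIMARY KEY ((bucket), event_time, device_id)
);

-- "All devices, last week" becomes one query per hourly bucket in the
-- range (168 partitions for a week), each a fast single-partition read:
SELECT device_id, event_time, latitude, longitude
FROM events_by_bucket
WHERE bucket = '2015-11-09 08';
```

If hourly buckets turn out too large or too small for the 1 to 10 MB target, widen or narrow the interval; the query pattern stays the same, just with a different number of partitions to iterate.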

Or, consider DSE Search, and then you can do whatever ad hoc queries you
want using Solr. Or Stratio or TupleJump's Stargate for an open source
Lucene plugin.

-- Jack Krupansky

On Mon, Nov 9, 2015 at 8:05 AM, Guillaume Charhon <guilla...@databerries.com> wrote:

> Hello,
>
> We are currently storing geolocation events (about 1 per 5 minutes) for
> each device we track. We currently have 2 TB of data. I would like to store
> the device_id, the timestamp of the event, latitude and longitude. I thought
> about using the device_id as the partition key and timestamp as the
> clustering column. It is great as events are naturally grouped by device
> (very useful for our Spark jobs). However, if I would like to retrieve all
> events of all devices of the last week, I understand that Cassandra will
> need to load all the data and filter it, which does not seem sustainable in
> the long term.
>
> How should I create my model?
>
> Best Regards
>
