Oops, sorry. You wrote below that the shard is what I was suggesting. I didn't fully understand the problem you were having. I'll think about it a little and come up with something.
On Thu, Mar 10, 2016 at 9:47 AM Jonathan Haddad <j...@jonhaddad.com> wrote:

> My advice was to use the date that the reading was recorded as part of the
> partition key instead of some arbitrary shard id. Then you don't have to
> look anything up in a different table.
>
> create table sensorReadings (
>     sensorUnitId int,
>     sensorId int,
>     date_recorded date,
>     time timestamp,
>     readings blob,
>     primary key ((sensorUnitId, sensorId, date_recorded), time)
> );
>
> On Thu, Mar 10, 2016 at 9:29 AM Jason Kania <jason.ka...@ymail.com> wrote:
>
>> Hi Jonathan,
>>
>> Thanks for the response. To make this clearer, this is what we are doing:
>>
>> create table sensorReadings (
>>     sensorUnitId int,
>>     sensorId int,
>>     time timestamp,
>>     timeShard int,
>>     readings blob,
>>     primary key ((sensorUnitId, sensorId, timeShard), time)
>> );
>>
>> where timeShard is a combination of year and week of year.
>>
>> This works exactly as you mentioned when we know what time range we are
>> querying.
>>
>> The problem is that for those cases where we want to run through all the
>> readings for all timestamps, we don't know the first and last timeShard
>> values to use to constrain the query, or to iterate over each shard. Our
>> understanding is that updating another table with the maximum or minimum
>> timeShard values on every write to the above table would mean pounding a
>> single row with updates, and that running SELECT DISTINCT pulls all
>> partition keys.
>>
>> Hopefully this is clearer.
>>
>> Again, any suggestions would be appreciated.
>>
>> Thanks,
>>
>> Jason
>>
>> ------------------------------
>> *From:* Jonathan Haddad <j...@jonhaddad.com>
>> *To:* user@cassandra.apache.org; Jason Kania <jason.ka...@ymail.com>
>> *Sent:* Thursday, March 10, 2016 11:21 AM
>> *Subject:* Re: Strategy for dividing wide rows beyond just adding to the
>> partition key
>>
>> Have you considered making the date (or week, or whatever, some time
>> component) part of your partition key?
>>
>> Something like:
>>
>> create table sensordata (
>>     sensor_id int,
>>     day date,
>>     ts timestamp,
>>     reading int,
>>     primary key ((sensor_id, day), ts)
>> );
>>
>> Then if you know you need data for a particular date range, just issue
>> multiple async queries, one for each day you need.
>>
>> On Thu, Mar 10, 2016 at 5:57 AM Jason Kania <jason.ka...@ymail.com> wrote:
>>
>> Hi,
>>
>> We have sensor input that creates very wide rows, and operations on these
>> rows have started to time out regularly. We have been trying to find a
>> solution for dividing wide rows, but keep hitting limitations that move
>> the problem around instead of solving it.
>>
>> We have a partition key consisting of a sensorUnitId and a sensorId, and
>> use a time field to access each column in the row. We tried adding a
>> time-based entry, timeShardId, to the partition key, consisting of the
>> year and week of year during which the reading was taken. This works for
>> a number of queries, but for scanning all the readings for a particular
>> sensorUnitId and sensorId combination, we seem to be stuck.
>>
>> We won't know the range of valid timeShardId values for a given
>> sensorUnitId and sensorId combination, so we would have to write to an
>> additional table to track the valid timeShardId values. We suspect this
>> would create tombstone accumulation problems, given the number of updates
>> required to the same row, so we haven't tried this option.
>>
>> Alternatively, we hit a different bottleneck in the form of SELECT
>> DISTINCT when trying to access the partition keys directly. Since SELECT
>> DISTINCT does not allow a WHERE clause to filter on the partition key
>> values, we have to filter several hundred thousand partition keys just to
>> find those related to the relevant sensorUnitId and sensorId. This
>> problem will only grow worse for us.
>>
>> Are there any other approaches that can be suggested?
>> We have been
>> looking around, but haven't found any references beyond the initial
>> suggestion to add some sort of shard id to the partition key to handle
>> wide rows.
>>
>> Thanks,
>>
>> Jason
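One thought in the meantime: if each (sensorUnitId, sensorId) has a known earliest possible date (e.g. when the unit was installed), you don't need to track min/max shard values in a separate table at all. The application can enumerate every possible timeShard between that start date and today, then fan out one query per shard and simply get empty results for shards with no data. A rough sketch in Python, assuming the shard is encoded as ISO year * 100 + ISO week (the encoding is my guess; substitute whatever year/week packing you actually use):

```python
from datetime import date, timedelta

def time_shard(d: date) -> int:
    """Encode a date as ISO year * 100 + ISO week, e.g. a week-10 date in 2016 -> 201610."""
    iso_year, iso_week, _ = d.isocalendar()
    return iso_year * 100 + iso_week

def shards_between(start: date, end: date) -> list:
    """All distinct timeShard values covering [start, end], oldest first."""
    shards = []
    d = start
    while d <= end:
        s = time_shard(d)
        if not shards or shards[-1] != s:
            shards.append(s)
        d += timedelta(days=7)  # one step per ISO week
    last = time_shard(end)
    if shards[-1] != last:  # the weekly stride can step past the final week
        shards.append(last)
    return shards
```

With the shard list in hand, you can issue the per-shard queries asynchronously (e.g. `session.execute_async` in the DataStax Python driver) and merge results client-side, which avoids both the SELECT DISTINCT scan and the hot tracking row.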