Hi,

We have sensor input that creates very wide rows, and operations on these rows have started to time out regularly. We have been trying to find a way to divide the wide rows, but we keep hitting limitations that move the problem around instead of solving it.

Our partition key consists of a sensorUnitId and a sensorId, and we use a time field to access each column in the row. We tried adding a time-based component, timeShardId, to the partition key; it consists of the year and week of year in which the reading was taken. This works for a number of queries, but we seem to be stuck when scanning all the readings for a particular sensorUnitId and sensorId combination. We don't know the range of valid timeShardId values for a given sensorUnitId and sensorId combination, so we would have to write to an additional table to track the valid timeShardId values. We suspect this would cause tombstone accumulation problems, given the number of updates required to the same row, so we haven't tried this option.
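To make the timeShardId scheme concrete, here is a minimal sketch of how a year-plus-week shard id could be derived from a reading's timestamp. It assumes ISO 8601 week numbering, UTC timestamps, and an integer encoding of the form YYYYWW; none of these are stated above, and the function name `time_shard_id` is hypothetical.

```python
from datetime import datetime, timezone


def time_shard_id(ts: datetime) -> int:
    """Return an integer shard id of the form YYYYWW for a reading timestamp.

    Assumption: ISO 8601 week numbering, evaluated in UTC. Naive datetimes
    are treated as already being in UTC.
    """
    if ts.tzinfo is None:
        ts = ts.replace(tzinfo=timezone.utc)
    iso = ts.astimezone(timezone.utc).isocalendar()
    iso_year, iso_week = iso[0], iso[1]
    # The ISO year can differ from the calendar year around New Year,
    # so the ISO year is used to keep (year, week) pairs unambiguous.
    return iso_year * 100 + iso_week


if __name__ == "__main__":
    reading_time = datetime(2015, 2, 10, 14, 30, tzinfo=timezone.utc)
    print(time_shard_id(reading_time))
```

With this encoding, readings taken in the last days of December can land in week 1 of the following ISO year, which is worth keeping in mind when mapping a date range to shard ids.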
Alternatively, we tried accessing the partition keys directly with SELECT DISTINCT, but hit a different bottleneck. Since SELECT DISTINCT does not allow a WHERE clause to filter on the partition key values, we have to filter several hundred thousand partition keys just to find those for the relevant sensorUnitId and sensorId. This problem will only grow worse for us.

Are there any other approaches that can be suggested? We have been looking around, but haven't found any references beyond the initial suggestion to add some sort of shard id to the partition key to handle wide rows.

Thanks,
Jason