Thanks, that level of query detail gives us a better picture to focus on. I'll think through this some more over the weekend.
Also, these queries focus on raw, bulk retrieval of sensor data readings, but do you have reading-based queries, such as a range over actual sensor reading values?

-- Jack Krupansky

On Fri, Mar 11, 2016 at 7:08 PM, Jason Kania <jason.ka...@ymail.com> wrote:

> The 5000 readings mentioned would be against a single sensor on a single sensor unit.
>
> The scope of the queries on this table is intended to be fairly simple. Here are some example queries, without 'sharding', that we would perform on this table:
>
> SELECT "time","readings" FROM "sensorReadings"
> WHERE "sensorUnitId"=5123 AND "sensorId"=17 AND time<=?
> ORDER BY time DESC LIMIT 5000
>
> SELECT "time","readings" FROM "sensorReadings"
> WHERE "sensorUnitId"=5123 AND "sensorId"=17 AND time>=?
> ORDER BY time LIMIT 5000
>
> SELECT "time","readings" FROM "sensorReadings"
> WHERE "sensorUnitId"=5123 AND "sensorId"=17 AND time<=? AND classification=?
> ORDER BY time DESC LIMIT 5000
>
> where 'classification' is a secondary index that we expect to add.
>
> In some cases, we have to revisit all values too, so a complete table scan is needed:
>
> SELECT "time","readings" FROM "sensorReadings"
>
> Getting the "next" and "previous" 5000 readings is also something we do, but it is manageable from our standpoint, as we can look at the range-end timestamps that are returned and use those in the subsequent queries:
>
> SELECT "time","readings" FROM "sensorReadings"
> WHERE "sensorUnitId"=5123 AND "sensorId"=17 AND time>=? AND time<=?
> ORDER BY time LIMIT 5000
>
> Splitting the bulk content out of the main table is something we considered too, but we didn't find any detail on whether that would solve our timeout problem. If there is a reference for using this approach, it would be of interest to us, to avoid any assumptions on how we would approach it.
>
> A question: is the probability of a timeout directly linked to a longer seek time in reading through a partition's contents? If that is the case, splitting the partition keys into a separate table would be straightforward.
>
> Regards,
>
> Jason
>
> ------------------------------
> *From:* Jack Krupansky <jack.krupan...@gmail.com>
> *To:* user@cassandra.apache.org; Jason Kania <jason.ka...@ymail.com>
> *Sent:* Friday, March 11, 2016 6:22 PM
> *Subject:* Re: Strategy for dividing wide rows beyond just adding to the partition key
>
> Thanks for the additional information, but there is still not enough color on the queries and too much focus on a premature data model.
>
> Is this 5000 readings for a single sensor of a single sensor unit, or for all sensors of a specified unit, or... both?
>
> I presume you want the "next" and "previous" 5000 readings as well as the first and last, but... you will have to confirm that.
>
> One technique is to store the bulk of your raw sensor data in a separate table and then simply store the PK of that data in your time series. That way you can have a much wider row of time series (number of rows) without hitting a bulk size issue for the partition. But... I don't want to jump to solutions until we have a firmer handle on the query side of the fence.
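> Roughly, the shape I have in mind is something like the following. This is purely a sketch to illustrate the idea, untested, and the table and column names are made up:
>
> CREATE TABLE "sensorReadingIndex" (
>     "sensorUnitId" int,
>     "sensorId" int,
>     "timeShard" int,
>     "time" timestamp,
>     PRIMARY KEY (("sensorUnitId", "sensorId", "timeShard"), "time")
> );
>
> CREATE TABLE "sensorReadingData" (
>     "sensorUnitId" int,
>     "sensorId" int,
>     "time" timestamp,
>     "readings" blob,
>     PRIMARY KEY (("sensorUnitId", "sensorId", "time"))
> );
>
> The first table stays narrow (essentially a timestamp per row), so its partitions stay small in bytes even with a lot of rows, while each blob lands in its own small partition in the second table and is fetched by full PK once you know which timestamps you want. Whether the weekly timeShard could then be made coarser is something you would have to verify against your partition-size numbers.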
> -- Jack Krupansky
>
> On Fri, Mar 11, 2016 at 5:37 PM, Jason Kania <jason.ka...@ymail.com> wrote:
>
> Jack,
>
> Thanks for the response.
>
> We are targeting our database design to 10000 sensor units, and each sensor unit has 32 sensors. We are seeing about 700 events per day per sensor, each providing about 2 KB of data. Based on keeping each partition to about 10 MB (based on the performance numbers we observed), we chose to break our partitions on a weekly basis. This is possibly finer than we need, as we were seeing timeouts only once a single partition reached about 150 MB in size.
>
> When pulling in data, we will typically need to pull 1 to 4 months of data for our analysis and will use only the sensorUnitId and sensorId to uniquely identify the data source, with the timeShard value used to break up our partitions. We have handling to sequentially scan based on our "timeShard" value, but we don't have a good handle on determining the "timeShard" portion of the partition key at read time. The data starts coming in when a subscriber starts using our system and finishes when they discontinue service or put the service on hold temporarily.
>
> When I talk about hotspots, it isn't the time series data that is the concern; it is storing the maximum and minimum timeShard values in another table for subsequent lookup, or the cost of running the current implementation of SELECT DISTINCT. We need to run queries such as getting the first or last 5000 sensor readings when we don't know the time frame at which they occurred, so we cannot directly supply the timeShard portion of our partition key.
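> To be concrete, the lookup table we have been considering would look something like the following. This is only a sketch with made-up names, not something we have built:
>
> CREATE TABLE "sensorShardBounds" (
>     "sensorUnitId" int,
>     "sensorId" int,
>     "minTimeShard" int,
>     "maxTimeShard" int,
>     PRIMARY KEY (("sensorUnitId", "sensorId"))
> );
>
> The concern is that "maxTimeShard" would be rewritten for every active sensor of every sensor unit as each new week starts, and that every "first/last 5000" query would have to hit this table before touching "sensorReadings".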
> I appreciate your input.
>
> Thanks,
>
> Jason
>
> ------------------------------
> *From:* Jack Krupansky <jack.krupan...@gmail.com>
> *To:* "user@cassandra.apache.org" <user@cassandra.apache.org>
> *Sent:* Friday, March 11, 2016 4:45 PM
> *Subject:* Re: Strategy for dividing wide rows beyond just adding to the partition key
>
> I'll stay away from advising on a specific schema per se, but I'll stick to the advice that you need to make sure that your queries depend solely on the columns of the primary key or on relatively short slices/scans, rather than running the risk of very long scans or having to process multiple partitions for a single query. That's canned to some extent, but still essential.
>
> Of course we generally wish to avoid hotspots, but with time series they are unavoidable. I mean, sure, you could place successive events in separate partitions, but then you can't do any kind of scanning/slicing.
>
> But events for separate sensors are not true hotspots in the traditional sense - unless you have only a single sensor/unit.
>
> After considering your queries, the next step is to consider the cardinality of your data - how many sensors, how many units, rate of events, etc. That will feed back into the queries as well, such as how big a slice or scan might be, as well as the sizing of partitions.
>
> So, how many sensor units do you expect, how many sensors per unit, and what is the expected rate of events per sensor?
>
> Try not to jump too quickly to specific solutions - there really is a method to understanding all of this other stuff upfront.
>
> -- Jack Krupansky
>
> On Thu, Mar 10, 2016 at 12:39 PM, Jason Kania <jason.ka...@ymail.com> wrote:
>
> Jack,
>
> Thanks for the response. I don't think I provided enough information, and I used the wrong terminology, as your response reads more like the canned advice given in response to Cassandra antipatterns.
>
> To make this clearer, this is what we are doing:
>
> CREATE TABLE "sensorReadings" (
>     "sensorUnitId" int,
>     "sensorId" int,
>     "time" timestamp,
>     "timeShard" int,
>     "readings" blob,
>     PRIMARY KEY (("sensorUnitId", "sensorId", "timeShard"), "time")
> );
>
> where timeShard is a combination of the year and the week of the year.
>
> For known time-range-based queries, this works great. However, the specific problem is in knowing the maximum and minimum timeShard values when we want to select the entire range of data. Our understanding is that if we update another related table with the maximum and minimum timeShard values for a given sensorUnitId and sensorId combination, we will create a hotspot and lots of tombstones. If we SELECT DISTINCT, we get a huge list of partition keys for the table, because we cannot reduce the scope with a WHERE clause.
>
> If there is a recommended pattern that solves this, we haven't come across it.
>
> I hope this makes the problem clearer.
>
> Thanks,
>
> Jason
>
> ------------------------------
> *From:* Jack Krupansky <jack.krupan...@gmail.com>
> *To:* user@cassandra.apache.org; Jason Kania <jason.ka...@ymail.com>
> *Sent:* Thursday, March 10, 2016 10:42 AM
> *Subject:* Re: Strategy for dividing wide rows beyond just adding to the partition key
>
> There is an effort underway to support wider rows:
> https://issues.apache.org/jira/browse/CASSANDRA-9754
>
> This won't help you now, though. Even with that improvement you may still need a more optimal data model, since large-scale scanning/filtering is always a very bad idea with Cassandra.
>
> The data modeling methodology for Cassandra dictates that queries drive the data model and that each form of query requires a separate table (a "query table"). Materialized views can automate that process for a lot of cases, but in any case it does sound as if some of your queries do require additional tables.
>
> As a general proposition, Cassandra should not be used for heavy filtering - query tables with the filtering criteria baked into the PK are the way to go.
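> For example, the shard-discovery problem you describe could itself be handled as a query table along these lines. This is only a sketch with made-up names, so treat it as illustrative rather than a tested recommendation:
>
> CREATE TABLE "sensorShards" (
>     "sensorUnitId" int,
>     "sensorId" int,
>     "timeShard" int,
>     PRIMARY KEY (("sensorUnitId", "sensorId"), "timeShard")
> );
>
> -- newest shard for one sensor; ASC instead of DESC gives the oldest
> SELECT "timeShard" FROM "sensorShards"
> WHERE "sensorUnitId"=? AND "sensorId"=?
> ORDER BY "timeShard" DESC LIMIT 1;
>
> Writing the current shard on ingest is an idempotent upsert of the same row, so no deletes are involved, and each partition only grows by one small row per sensor per week.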
> -- Jack Krupansky
>
> On Thu, Mar 10, 2016 at 8:54 AM, Jason Kania <jason.ka...@ymail.com> wrote:
>
> Hi,
>
> We have sensor input that creates very wide rows, and operations on these rows have started to time out regularly. We have been trying to find a solution to dividing wide rows, but we keep hitting limitations that move the problem around instead of solving it.
>
> We have a partition key consisting of a sensorUnitId and a sensorId, and we use a time field to access each column in the row. We tried adding a time-based entry, timeShardId, to the partition key that consists of the year and week of the year during which the reading was taken. This works for a number of queries, but for scanning all the readings against a particular sensorUnitId and sensorId combination, we seem to be stuck.
>
> We won't know the range of valid values of the timeShardId for a given sensorUnitId and sensorId combination, so we would have to write to an additional table to track the valid timeShardId values. We suspect this would create tombstone accumulation problems, given the number of updates required to the same row, so we haven't tried this option.
>
> Alternatively, we hit a different bottleneck in the form of SELECT DISTINCT when trying to directly access the partition keys. Since SELECT DISTINCT does not allow a WHERE clause to filter on the partition key values, we have to filter several hundred thousand partition keys just to find those related to the relevant sensorUnitId and sensorId. This problem will only grow worse for us.
>
> Are there any other approaches that can be suggested? We have been looking around, but haven't found any references beyond the initial suggestion of adding some sort of shard id to the partition key to handle wide rows.
>
> Thanks,
>
> Jason
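> P.S. For reference, the shard-discovery query we are currently stuck with looks roughly like the following; we then have to filter the returned keys client-side for the one sensorUnitId and sensorId we care about:
>
> SELECT DISTINCT "sensorUnitId", "sensorId", "timeShardId" FROM "sensorReadings";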