Thanks, that level of query detail gives us a better picture to focus on. I'll think through this some more over the weekend.
Also, these queries focus on raw, bulk retrieval of sensor data readings, but do you have reading-based queries, such as a range over actual sensor reading values?

-- Jack Krupansky

On Fri, Mar 11, 2016 at 7:08 PM, Jason Kania <jason.ka...@ymail.com> wrote:

> The 5000 readings mentioned would be against a single sensor on a single sensor unit.
>
> The scope of the queries on this table is intended to be fairly simple. Here are some example queries, without 'sharding', that we would perform on this table:
>
> SELECT "time","readings" FROM "sensorReadings"
> WHERE "sensorUnitId"=5123 AND "sensorId"=17 AND time<=?
> ORDER BY time DESC LIMIT 5000
>
> SELECT "time","readings" FROM "sensorReadings"
> WHERE "sensorUnitId"=5123 AND "sensorId"=17 AND time>=?
> ORDER BY time LIMIT 5000
>
> SELECT "time","readings" FROM "sensorReadings"
> WHERE "sensorUnitId"=5123 AND "sensorId"=17 AND time<=? AND classification=?
> ORDER BY time DESC LIMIT 5000
>
> where 'classification' is a secondary index that we expect to add.
>
> In some cases, we have to revisit all values too, so a complete table scan is needed:
>
> SELECT "time","readings" FROM "sensorReadings"
>
> Getting the "next" and "previous" 5000 readings is also something we do, but it is manageable from our standpoint, as we can look at the range-end timestamps that are returned and use those in the subsequent queries:
>
> SELECT "time","readings" FROM "sensorReadings"
> WHERE "sensorUnitId"=5123 AND "sensorId"=17 AND time>=? AND time<=?
> ORDER BY time LIMIT 5000
>
> Splitting the bulk content out of the main table is something we considered too, but we didn't find any detail on whether that would solve our timeout problem. If there is a reference for using this approach, it would be of interest to us, to avoid any assumptions on how we would approach it.
>
> A question: is the probability of a timeout directly linked to a longer seek time in reading through a partition's contents? If that is the case, splitting the partition keys into a separate table would be straightforward.
>
> Regards,
>
> Jason
>
> ------------------------------
> *From:* Jack Krupansky <jack.krupan...@gmail.com>
> *To:* user@cassandra.apache.org; Jason Kania <jason.ka...@ymail.com>
> *Sent:* Friday, March 11, 2016 6:22 PM
> *Subject:* Re: Strategy for dividing wide rows beyond just adding to the partition key
>
> Thanks for the additional information, but there is still not enough color on the queries and too much focus on a premature data model.
>
> Is this 5000 readings for a single sensor of a single sensor unit, or for all sensors of a specified unit, or... both?
>
> I presume you want the "next" and "previous" 5000 readings as well as the first and last, but... you will have to confirm that.
>
> One technique is to store the bulk of your raw sensor data in a separate table and then simply store the PK of that data in your time series. That way you can have a much wider row of time series (number of rows) without hitting a bulk size issue for the partition. But... I don't want to jump to solutions until we have a firmer handle on the query side of the fence.
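> Roughly, the shape I have in mind is something like the following. This is purely a sketch to illustrate the idea, untested, and the table and column names are made up:
>
> CREATE TABLE "sensorReadingIndex" (
>     "sensorUnitId" int,
>     "sensorId" int,
>     "timeShard" int,
>     "time" timestamp,
>     PRIMARY KEY (("sensorUnitId", "sensorId", "timeShard"), "time")
> );
>
> CREATE TABLE "sensorReadingData" (
>     "sensorUnitId" int,
>     "sensorId" int,
>     "time" timestamp,
>     "readings" blob,
>     PRIMARY KEY (("sensorUnitId", "sensorId", "time"))
> );
>
> The first table stays narrow (essentially a timestamp per row), so its partitions stay small in bytes even with a lot of rows, while each blob lands in its own small partition in the second table and is fetched by full PK once you know which timestamps you want. Whether the weekly timeShard could then be made coarser is something you would have to verify against your partition-size numbers.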
> -- Jack Krupansky
>
> On Fri, Mar 11, 2016 at 5:37 PM, Jason Kania <jason.ka...@ymail.com> wrote:
>
> Jack,
>
> Thanks for the response.
>
> We are targeting our database design to 10000 sensor units, and each sensor unit has 32 sensors. We are seeing about 700 events per day per sensor, each providing about 2 KB of data. Based on keeping each partition to about 10 MB (based on the performance numbers we observed), we chose to break our partitions on a weekly basis. This is possibly finer than we need, as we were seeing timeouts only once a single partition reached about 150 MB in size.
>
> When pulling in data, we will typically need to pull 1 to 4 months of data for our analysis and will use only the sensorUnitId and sensorId to uniquely identify the data source, with the timeShard value used to break up our partitions. We have handling to sequentially scan based on our "timeShard" value, but we don't have a good handle on determining the "timeShard" portion of the partition key at read time. The data starts coming in when a subscriber starts using our system and finishes when they discontinue service or put the service on hold temporarily.
>
> When I talk about hotspots, it isn't the time series data that is the concern; it is storing the maximum and minimum timeShard values in another table for subsequent lookup, or the cost of running the current implementation of SELECT DISTINCT. We need to run queries such as getting the first or last 5000 sensor readings when we don't know the time frame at which they occurred, so we cannot directly supply the timeShard portion of our partition key.
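> To be concrete, the lookup table we have been considering would look something like the following. This is only a sketch with made-up names, not something we have built:
>
> CREATE TABLE "sensorShardBounds" (
>     "sensorUnitId" int,
>     "sensorId" int,
>     "minTimeShard" int,
>     "maxTimeShard" int,
>     PRIMARY KEY (("sensorUnitId", "sensorId"))
> );
>
> The concern is that "maxTimeShard" would be rewritten for every active sensor of every sensor unit as each new week starts, and that every "first/last 5000" query would have to hit this table before touching "sensorReadings".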
> I appreciate your input.
>
> Thanks,
>
> Jason
>
> ------------------------------
> *From:* Jack Krupansky <jack.krupan...@gmail.com>
> *To:* "user@cassandra.apache.org" <user@cassandra.apache.org>
> *Sent:* Friday, March 11, 2016 4:45 PM
> *Subject:* Re: Strategy for dividing wide rows beyond just adding to the partition key
>
> I'll stay away from advising on a specific schema per se, but I'll stick to the advice that you need to make sure that your queries depend solely on the columns of the primary key or on relatively short slices/scans, rather than running the risk of very long scans or having to process multiple partitions for a single query. That's canned to some extent, but still essential.
>
> Of course we generally wish to avoid hotspots, but with time series they are unavoidable. I mean, sure, you could place successive events in separate partitions, but then you can't do any kind of scanning/slicing.
>
> But events for separate sensors are not true hotspots in the traditional sense - unless you have only a single sensor/unit.
>
> After considering your queries, the next step is to consider the cardinality of your data - how many sensors, how many units, rate of events, etc. That will feed back into the queries as well, such as how big a slice or scan might be, as well as the sizing of partitions.
>
> So, how many sensor units do you expect, how many sensors per unit, and what is the expected rate of events per sensor?
>
> Try not to jump too quickly to specific solutions - there really is a method to understanding all of this other stuff upfront.
>
> -- Jack Krupansky
>
> On Thu, Mar 10, 2016 at 12:39 PM, Jason Kania <jason.ka...@ymail.com> wrote:
>
> Jack,
>
> Thanks for the response. I don't think I provided enough information, and I used the wrong terminology, as your response reads more like the canned advice given in response to Cassandra antipatterns.
>
> To make this clearer, this is what we are doing:
>
> CREATE TABLE "sensorReadings" (
>     "sensorUnitId" int,
>     "sensorId" int,
>     "time" timestamp,
>     "timeShard" int,
>     "readings" blob,
>     PRIMARY KEY (("sensorUnitId", "sensorId", "timeShard"), "time")
> );
>
> where timeShard is a combination of the year and the week of the year.
>
> For known time-range-based queries, this works great. However, the specific problem is in knowing the maximum and minimum timeShard values when we want to select the entire range of data. Our understanding is that if we update another related table with the maximum and minimum timeShard values for a given sensorUnitId and sensorId combination, we will create a hotspot and lots of tombstones. If we SELECT DISTINCT, we get a huge list of partition keys for the table, because we cannot reduce the scope with a WHERE clause.
>
> If there is a recommended pattern that solves this, we haven't come across it.
>
> I hope this makes the problem clearer.
>
> Thanks,
>
> Jason
>
> ------------------------------
> *From:* Jack Krupansky <jack.krupan...@gmail.com>
> *To:* user@cassandra.apache.org; Jason Kania <jason.ka...@ymail.com>
> *Sent:* Thursday, March 10, 2016 10:42 AM
> *Subject:* Re: Strategy for dividing wide rows beyond just adding to the partition key
>
> There is an effort underway to support wider rows:
> https://issues.apache.org/jira/browse/CASSANDRA-9754
>
> This won't help you now, though. Even with that improvement you may still need a more optimal data model, since large-scale scanning/filtering is always a very bad idea with Cassandra.
>
> The data modeling methodology for Cassandra dictates that queries drive the data model and that each form of query requires a separate table (a "query table"). Materialized views can automate that process for a lot of cases, but in any case it does sound as if some of your queries do require additional tables.
>
> As a general proposition, Cassandra should not be used for heavy filtering - query tables with the filtering criteria baked into the PK are the way to go.
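> For example, the shard-discovery problem you describe could itself be handled as a query table along these lines. This is only a sketch with made-up names, so treat it as illustrative rather than a tested recommendation:
>
> CREATE TABLE "sensorShards" (
>     "sensorUnitId" int,
>     "sensorId" int,
>     "timeShard" int,
>     PRIMARY KEY (("sensorUnitId", "sensorId"), "timeShard")
> );
>
> -- newest shard for one sensor; ASC instead of DESC gives the oldest
> SELECT "timeShard" FROM "sensorShards"
> WHERE "sensorUnitId"=? AND "sensorId"=?
> ORDER BY "timeShard" DESC LIMIT 1;
>
> Writing the current shard on ingest is an idempotent upsert of the same row, so no deletes are involved, and each partition only grows by one small row per sensor per week.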
> -- Jack Krupansky
>
> On Thu, Mar 10, 2016 at 8:54 AM, Jason Kania <jason.ka...@ymail.com> wrote:
>
> Hi,
>
> We have sensor input that creates very wide rows, and operations on these rows have started to time out regularly. We have been trying to find a solution to dividing wide rows, but we keep hitting limitations that move the problem around instead of solving it.
>
> We have a partition key consisting of a sensorUnitId and a sensorId, and we use a time field to access each column in the row. We tried adding a time-based entry, timeShardId, to the partition key that consists of the year and week of the year during which the reading was taken. This works for a number of queries, but for scanning all the readings against a particular sensorUnitId and sensorId combination, we seem to be stuck.
>
> We won't know the range of valid values of the timeShardId for a given sensorUnitId and sensorId combination, so we would have to write to an additional table to track the valid timeShardId values. We suspect this would create tombstone accumulation problems, given the number of updates required to the same row, so we haven't tried this option.
>
> Alternatively, we hit a different bottleneck in the form of SELECT DISTINCT when trying to directly access the partition keys. Since SELECT DISTINCT does not allow a WHERE clause to filter on the partition key values, we have to filter several hundred thousand partition keys just to find those related to the relevant sensorUnitId and sensorId. This problem will only grow worse for us.
>
> Are there any other approaches that can be suggested? We have been looking around, but haven't found any references beyond the initial suggestion of adding some sort of shard id to the partition key to handle wide rows.
>
> Thanks,
>
> Jason
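> P.S. For reference, the shard-discovery query we are currently stuck with looks roughly like the following; we then have to filter the returned keys client-side for the one sensorUnitId and sensorId we care about:
>
> SELECT DISTINCT "sensorUnitId", "sensorId", "timeShardId" FROM "sensorReadings";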