From: clohfin...@gmail.com
Subject: Re: data distribution along column family partitions
> not ok :) Don't let a single partition get to 1 GB; hundreds of MB
> should be when the flares go up. The main reasoning is that compactions
> would be horrifically slow and there will be a lot of GC pain. Bringing
> the time bucket down to a day will probably be sufficient. It would take
> billions of alarm events in a single time bucket, if that's the entire
> data payload, to get that bad.
>
> Wide rows work well, but keeping them smaller is an optimization that
> will save you a lot of pain down the road from troublesome JVM GCs,
> slower compactions, unbalanced nodes, and higher read latencies.

That's the point: I won't have many partitions with more than 15 GB. But
suppose I do have them for 1000 users out of 10 million. Almost all
partitions will have a good size, but won't I have a problem with the few
that are big? I am asking because in a prior experience I saw a huge
performance penalty reading updates from those 1000 users: there may be
few such cases, but since every time the data changes I have to process
the user again, I will hit the worst case very often.

> Chris
>
> On Wed, Feb 4, 2015 at 9:33 AM, Marcelo Valle (BLOOMBERG/ LONDON)
> <mvallemil...@bloomberg.net> wrote:
>
>>> The data model lgtm. You may need to balance the size of the time
>>> buckets with the amount of alarms to prevent partitions from getting
>>> too large. One month may be a little large; I would aim to keep
>>> partitions below 25 MB or so in size (you can check with nodetool
>>> cfstats) to keep everything happy. It's OK if occasional ones go
>>> larger; something like 1 GB can be bad... it would still work, just
>>> not very efficiently.
>>
>> What about 15 GB?
>>
>>> Deletes on an entire time bucket at a time seem like a good approach,
>>> but just setting a TTL would be far, far better IMHO (why not just
>>> set it to two years?). You may want to look into the new
>>> DateTieredCompactionStrategy or LeveledCompactionStrategy, or the
>>> obsoleted data will very rarely go away.
>>
>> Excellent hint, I will take a good look at this. I didn't know about
>> DateTieredCompactionStrategy.
>>
>>> When reading, just be sure to use paging (the good CQL drivers have
>>> it built in) and don't actually read it all in one massive query. If
>>> you decrease the size of your time bucket, you may end up having to
>>> page the query across multiple partitions if Y - X > bucket size.
>>
>> If I use paging, Cassandra won't try to allocate the whole partition
>> on the server node; it will just allocate heap memory for that page.
>> Correct?
>>
>> Marcelo Valle

From: user@cassandra.apache.org
Subject: Re: data distribution along column family partitions

The data model lgtm. You may need to balance the size of the time buckets
with the amount of alarms to prevent partitions from getting too large.
One month may be a little large; I would aim to keep partitions below
25 MB or so in size (you can check with nodetool cfstats) to keep
everything happy. It's OK if occasional ones go larger; something like
1 GB can be bad... it would still work, just not very efficiently.

Deletes on an entire time bucket at a time seem like a good approach, but
just setting a TTL would be far, far better IMHO (why not just set it to
two years?). You may want to look into the new DateTieredCompactionStrategy
or LeveledCompactionStrategy, or the obsoleted data will very rarely go
away.

When reading, just be sure to use paging (the good CQL drivers have it
built in) and don't actually read it all in one massive query. If you
decrease the size of your time bucket, you may end up having to page the
query across multiple partitions if Y - X > bucket size.
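For example, with the DataStack Python driver (cassandra-driver) it would
look roughly like this; just a sketch, and the keyspace, table, and column
names here are assumptions, not taken from your model:

    # Sketch only: assumes the DataStax Python driver and a table
    # alerts.alerts_by_user keyed by ((user_id, time_bucket), ts, alert_id).
    from uuid import UUID

    from cassandra.cluster import Cluster
    from cassandra.query import SimpleStatement

    session = Cluster(["127.0.0.1"]).connect("alerts")

    query = SimpleStatement(
        "SELECT ts, alert_id FROM alerts_by_user "
        "WHERE user_id = %s AND time_bucket = %s",
        fetch_size=500)  # rows fetched per round trip, not rows total

    user_id = UUID("f47ac10b-58cc-4372-a567-0e02b2c3d479")

    # Iterating the result set fetches the next page transparently, so at
    # most roughly fetch_size rows are held in client memory at a time.
    for row in session.execute(query, (user_id, "201502")):
        print(row.ts, row.alert_id)

The coordinator likewise only reads enough of the partition to serve each
page, so neither side should need to materialize the whole partition for
one query.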
Chris

On Wed, Feb 4, 2015 at 4:34 AM, Marcelo Elias Del Valle
<mvall...@gmail.com> wrote:

> Hello,
>
> I am designing a model to store the alerts users receive over time. I
> will probably want to store the last two years of alerts for each user.
>
> The first thought I had was having a column family partitioned by user +
> time bucket, where the time bucket could be something like year + month.
> For instance:
>
> partition key:
>     user-id = f47ac10b-58cc-4372-a567-0e02b2c3d479
>     time-bucket = 201502
> rest of primary key:
>     timestamp = column of type timestamp
>     alert-id = f47ac10b-58cc-4372-a567-0e02b2c3d480
>
> Question: would this make it easier to delete old data? Supposing I am
> not using TTL and I want to remove alerts older than 2 years, what would
> be better: just deleting the entire time bucket for each user-id
> (through a map/reduce process), or having just user-id as the partition
> key and deleting, for each user, where X > timestamp > Y? Is it the same
> for Cassandra, internally?
>
> Another question: would the data be distributed well enough if I just
> chose to partition by user-id? I will have some users with a large
> number of alerts, but on average the alerts would have a good
> distribution along user ids. The problem is I don't feel confident that
> having a few partitions with A LOT of alerts would not be a problem at
> read time.
>
> What happens at read time when I try to read data from a big partition?
> Say I want to read alerts for a user where X > timestamp > Y, but it
> would return 1 million alerts. As it's all in a single partition, this
> read will occur on a single node, thus allocating a lot of memory for
> this single operation, right? What if the memory needed for this
> operation is bigger than what fits in the Java heap? Would this be a
> problem for Cassandra?
>
> Best regards,
> --
> Marcelo Elias Del Valle
> http://mvalle.com - @mvallebr
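For reference, the model described in this thread, with the two-year TTL
and DateTieredCompactionStrategy suggestions folded in, might look
something like the sketch below. The keyspace, table, and column names
are assumed, not taken from the original post, and the replication
settings are only for illustration:

    # Sketch of the model discussed above, not a definitive schema.
    from cassandra.cluster import Cluster

    session = Cluster(["127.0.0.1"]).connect()

    # SimpleStrategy with RF=1 is for illustration only.
    session.execute("""
        CREATE KEYSPACE IF NOT EXISTS alerts WITH replication =
            {'class': 'SimpleStrategy', 'replication_factor': 1}
    """)

    session.execute("""
        CREATE TABLE IF NOT EXISTS alerts.alerts_by_user (
            user_id     uuid,
            time_bucket text,   -- e.g. '201502' (month) or '20150204' (day)
            ts          timestamp,
            alert_id    uuid,
            payload     text,
            PRIMARY KEY ((user_id, time_bucket), ts, alert_id)
        ) WITH CLUSTERING ORDER BY (ts DESC, alert_id ASC)
          AND default_time_to_live = 63072000  -- two years, in seconds
          AND compaction = {'class': 'DateTieredCompactionStrategy'}
    """)

With the default TTL, every alert expires on its own after two years, and
a date-tiered layout groups data of similar age so expired data can be
dropped efficiently, with no explicit delete pass (map/reduce or
otherwise) needed.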