From: clohfin...@gmail.com
Subject: Re: data distribution along column family partitions
> not ok :) Don't let a single partition get to 1 GB; hundreds of MB
> should be when the flares go up. The main reasoning is that compactions
> would be horrifically slow and there will be a lot of GC pain. Bringing
> the time bucket down to a day will probably be sufficient. It would take
> billions of alarm events in a single time bucket, if that's the entire
> data payload, to get that bad.
>
> Wide rows work well, but keeping them smaller is an optimization that
> will save you a lot of pain down the road from troublesome JVM GCs,
> slower compactions, unbalanced nodes, and higher read latencies.

That's the point: I won't have many partitions with more than 15 GB. But
suppose I do have them for 1000 users out of 10 million. Almost all
partitions will have a good size, but won't I have a problem with the few
that are big? I am asking because in a prior experience I saw a huge
performance penalty reading updates from those 1000 users: there may be
few such cases, but since every time the data changes I have to process
the user again, I will hit the worst case very often.

> Chris
>
> On Wed, Feb 4, 2015 at 9:33 AM, Marcelo Valle (BLOOMBERG/ LONDON)
> <mvallemil...@bloomberg.net> wrote:
>
>>> The data model lgtm. You may need to balance the size of the time
>>> buckets with the amount of alarms to prevent partitions from getting
>>> too large. One month may be a little large; I would aim to keep
>>> partitions below 25 MB or so in size (you can check with nodetool
>>> cfstats) to keep everything happy. It's OK if occasional ones go
>>> larger; something like 1 GB can be bad... it would still work, just
>>> not very efficiently.
>>
>> What about 15 GB?
>>
>>> Deletes on an entire time bucket at a time seem like a good approach,
>>> but just setting a TTL would be far, far better IMHO (why not just
>>> set it to two years?). You may want to look into the new
>>> DateTieredCompactionStrategy or LeveledCompactionStrategy, or the
>>> obsoleted data will very rarely go away.
>>
>> Excellent hint, I will take a good look at this. I didn't know about
>> DateTieredCompactionStrategy.
>>
>>> When reading, just be sure to use paging (the good CQL drivers have
>>> it built in) and don't actually read it all in one massive query. If
>>> you decrease the size of your time bucket, you may end up having to
>>> page the query across multiple partitions if Y - X > bucket size.
>>
>> If I use paging, Cassandra won't try to allocate the whole partition
>> on the server node; it will just allocate heap memory for that page.
>> Correct?
>>
>> Marcelo Valle

From: user@cassandra.apache.org
Subject: Re: data distribution along column family partitions

The data model lgtm. You may need to balance the size of the time buckets
with the amount of alarms to prevent partitions from getting too large.
One month may be a little large; I would aim to keep partitions below
25 MB or so in size (you can check with nodetool cfstats) to keep
everything happy. It's OK if occasional ones go larger; something like
1 GB can be bad... it would still work, just not very efficiently.

Deletes on an entire time bucket at a time seem like a good approach, but
just setting a TTL would be far, far better IMHO (why not just set it to
two years?). You may want to look into the new DateTieredCompactionStrategy
or LeveledCompactionStrategy, or the obsoleted data will very rarely go
away.

When reading, just be sure to use paging (the good CQL drivers have it
built in) and don't actually read it all in one massive query. If you
decrease the size of your time bucket, you may end up having to page the
query across multiple partitions if Y - X > bucket size.
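For example, with the DataStack Python driver (cassandra-driver) it would
look roughly like this; just a sketch, and the keyspace, table, and column
names here are assumptions, not taken from your model:

    # Sketch only: assumes the DataStax Python driver and a table
    # alerts.alerts_by_user keyed by ((user_id, time_bucket), ts, alert_id).
    from uuid import UUID

    from cassandra.cluster import Cluster
    from cassandra.query import SimpleStatement

    session = Cluster(["127.0.0.1"]).connect("alerts")

    query = SimpleStatement(
        "SELECT ts, alert_id FROM alerts_by_user "
        "WHERE user_id = %s AND time_bucket = %s",
        fetch_size=500)  # rows fetched per round trip, not rows total

    user_id = UUID("f47ac10b-58cc-4372-a567-0e02b2c3d479")

    # Iterating the result set fetches the next page transparently, so at
    # most roughly fetch_size rows are held in client memory at a time.
    for row in session.execute(query, (user_id, "201502")):
        print(row.ts, row.alert_id)

The coordinator likewise only reads enough of the partition to serve each
page, so neither side should need to materialize the whole partition for
one query.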
Chris

On Wed, Feb 4, 2015 at 4:34 AM, Marcelo Elias Del Valle
<mvall...@gmail.com> wrote:

> Hello,
>
> I am designing a model to store the alerts users receive over time. I
> will probably want to store the last two years of alerts for each user.
>
> The first thought I had was having a column family partitioned by user +
> time bucket, where the time bucket could be something like year + month.
> For instance:
>
> partition key:
>     user-id = f47ac10b-58cc-4372-a567-0e02b2c3d479
>     time-bucket = 201502
> rest of primary key:
>     timestamp = column of type timestamp
>     alert-id = f47ac10b-58cc-4372-a567-0e02b2c3d480
>
> Question: would this make it easier to delete old data? Supposing I am
> not using TTL and I want to remove alerts older than 2 years, what would
> be better: just deleting the entire time bucket for each user-id
> (through a map/reduce process), or having just user-id as the partition
> key and deleting, for each user, where X > timestamp > Y? Is it the same
> for Cassandra, internally?
>
> Another question: would the data be distributed well enough if I just
> chose to partition by user-id? I will have some users with a large
> number of alerts, but on average the alerts would have a good
> distribution along user ids. The problem is I don't feel confident that
> having a few partitions with A LOT of alerts would not be a problem at
> read time.
>
> What happens at read time when I try to read data from a big partition?
> Say I want to read alerts for a user where X > timestamp > Y, but it
> would return 1 million alerts. As it's all in a single partition, this
> read will occur on a single node, thus allocating a lot of memory for
> this single operation, right? What if the memory needed for this
> operation is bigger than what fits in the Java heap? Would this be a
> problem for Cassandra?
>
> Best regards,
> --
> Marcelo Elias Del Valle
> http://mvalle.com - @mvallebr
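For reference, the model described in this thread, with the two-year TTL
and DateTieredCompactionStrategy suggestions folded in, might look
something like the sketch below. The keyspace, table, and column names
are assumed, not taken from the original post, and the replication
settings are only for illustration:

    # Sketch of the model discussed above, not a definitive schema.
    from cassandra.cluster import Cluster

    session = Cluster(["127.0.0.1"]).connect()

    # SimpleStrategy with RF=1 is for illustration only.
    session.execute("""
        CREATE KEYSPACE IF NOT EXISTS alerts WITH replication =
            {'class': 'SimpleStrategy', 'replication_factor': 1}
    """)

    session.execute("""
        CREATE TABLE IF NOT EXISTS alerts.alerts_by_user (
            user_id     uuid,
            time_bucket text,   -- e.g. '201502' (month) or '20150204' (day)
            ts          timestamp,
            alert_id    uuid,
            payload     text,
            PRIMARY KEY ((user_id, time_bucket), ts, alert_id)
        ) WITH CLUSTERING ORDER BY (ts DESC, alert_id ASC)
          AND default_time_to_live = 63072000  -- two years, in seconds
          AND compaction = {'class': 'DateTieredCompactionStrategy'}
    """)

With the default TTL, every alert expires on its own after two years, and
a date-tiered layout groups data of similar age so expired data can be
dropped efficiently, with no explicit delete pass (map/reduce or
otherwise) needed.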