Thanks for the long explanation, I looked at the link you pointed to and it does seem to concur with my mental model.
Do you see any issues with that model and simplify this logic to 1. Create windows from start (min) to end (max) going from maximum possible size. 2. Scan all SS Tables and put them in appropriate buckets. To be honest, the Target class is really difficult to reason about. The reason I investigated this was we wanted to reason about how our SS Tables are looking, and I unfortunately can't. Thanks again for the explanation !! -----Original Message----- From: Björn Hegerfors [mailto:bj...@spotify.com] Sent: Thursday, March 17, 2016 11:19 AM To: dev@cassandra.apache.org Subject: Re: DTCS Question That is probably close to the actual way it works, but not quite equal. My mental model when making this went backwards in time, towards 0, not forwards. It's something like this (using the numbers from your first example): make a bucket of the specified "timeUnit" size (1000), that contains the "now" timestamp (4050), where the starting (and therefore also the ending) timestamp of the bucket is 0 modulo the size of the bucket. That last point is perhaps the trickiest point to follow. There is only one such place for the bucket, [4000-5000) in this case. No other bucket that is aligned with the 1000s can contain 4050. Now, the next bucket (backwards) is computed based on this [4000-5000) bucket. Most of the time it will simply be the same-sized bucket right before it, i.e. [3000-4000), but if the start timestamp of our bucket (4000), divided by its size (so 4), is 0 modulo "base" (2 in this case), which it happens to be here, then we increase out bucket size "base" times, and instead make the bucket of *that* size that ends right before our current bucket. So the result will be [2000-4000). This method of getting the next bucket is repeated until we reach timestamp 0. Using the above logic, we don't increase the size of the bucket this time, because we have a start timestamp of 2000 which becomes 1 when divided by the size (2000). So we end up with [0, 2000), and we're done. The buckets were [4000-5000), [2000-4000) and [0-2000). What's more important than understanding these rules is of course getting some kind of intuition for this. Here's what it boils down to: we want there to be "base" equally sized buckets right next to each other before we *coalesce* them. Every bucket is aligned with its own size (as an analogy, compilers typically align 4-byte integers on addresses divisible by 4, same concept). So, by extension, the bigger bucket they coalesce into must be aligned with *its* size. Not just any "base" adjacent buckets will do, it will be those that align with the next size. The remaining question is when do they coalesce? There will always be at least 1 and at most "base" buckets of every size. Say "base"=4, then there can be 4 bucket of some size (by necessity next to each other and aligned on 4 times their size). The moment a new bucket of the same size appears, the 4 buckets become one and this "fifth" bucket will be alone with its size (and the start of a new group of 4 such buckets). (The rule for making the bucket were the "now" timestamp lives in, is where new buckets come from). I wish this was easier to explain in simple terms. I personally find this to have very nice properties, in that it gives every bucket a fair amount of time to settle before it's time for the next compaction. Interestingly, I proposed an alternative algorithm in this ticket <https://na01.safelinks.protection.outlook.com/?url=https%3a%2f%2fissues.apache.org%2fjira%2fbrowse%2fCASSANDRA-9013&data=01%7c01%7cAnubhav.Kale%40microsoft.com%7ca41a30e8a81144de0cd008d34e90a502%7c72f988bf86f141af91ab2d7cd011db47%7c1&sdata=QvmtOPUWqYnoFLLY0jx7%2fqgg0tJSleAsdBzo%2bzh22G4%3d>, including a patch implementing it. My gut tells me that the mental model that you've used here is actually equivalent to that algorithm in the ticket. It's just expressed in a very different way. Might be something for me to try to prove when I'm really bored :) Hope this helped! Any particular reason you're investigating this? / Bj0rn On Thu, Mar 17, 2016 at 5:43 PM, Anubhav Kale <anubhav.k...@microsoft.com> wrote: > Hello, > > I am trying to concretely understand how DTCS makes buckets and I am > looking at the DateTieredCompactionStrategyTest.testGetBuckets method > and played with some of the parameters to GetBuckets method call > (Cassandra 2.1.12). > > I don't think I fully understand something there. Let me try to explain. > > Consider the second test there. I changed the pairs a bit for easier > explanation and changed base (initial window size)=1000L and > Min_Threshold=2 > > pairs = Lists.newArrayList( > Pair.create("a", 200L), > Pair.create("b", 2000L), > Pair.create("c", 3600L), > Pair.create("d", 3899L), > Pair.create("e", 3900L), > Pair.create("f", 3950L), > Pair.create("too new", 4125L) > ); > buckets = getBuckets(pairs, 1000L, 2, 4050L, Long.MAX_VALUE); > > In this case, the buckets should look like [0-4000] [4000-]. Is this > correct ? The buckets that I get back are different ("a" lives in its > bucket and everyone else in another). What I am missing here ? > > Another case, > > pairs = Lists.newArrayList( > Pair.create("a", 200L), > Pair.create("b", 2000L), > Pair.create("c", 3600L), > Pair.create("d", 3899L), > Pair.create("e", 3900L), > Pair.create("f", 3950L), > Pair.create("too new", 4125L) > ); > buckets = getBuckets(pairs, 50L, 4, 4050L, Long.MAX_VALUE); > > Here, the buckets should be [0-3200] [3200-4000] [4000-4050] [4050-]. > Is this correct ? Again, the buckets that come back are quite different. > > Note, that if I keep the base to original (100L) or increase it and > play with min_threshold the results are exactly what I would expect. > > The way I think about DTCS is, try to make buckets of maximum possible > sizes from 0, and once you can't make do that , make smaller buckets > (similar to what the comment suggests). Is this mental model wrong ? I > am afraid that the math in Target class is somewhat hard to follow so > I am thinking about it this way. > > Thanks a lot in advance. > > -Anubhav >