On Wed, Feb 24, 2016 at 6:28 PM, Jack Krupansky <jack.krupan...@gmail.com> wrote:
> Thanks. I didn't pay enough attention to that statement on my initial
> reading of that post (which was where I became aware of the 3.2 behavior in
> the first place.)
>
> Considering that the doc explicitly recommends that the byte ordered
> partitioner not be used, that implies that the 3.2 JBOD behavior should be
> used for all recommended partitioner use cases.
>
> I'm still not clear on when exactly a node would not have "localRanges" -
> in terms of how the user would hit that scenario, or is that merely a
> defensive check for a scenario which cannot normally be encountered? I
> mean, it means that the endpoint is not responsible for any range of
> tokens, but how can that ever be true, or is that simply if the user
> configures the node to own zero tokens? But other than that, is there any
> normal way a user could end up with a node that has no "localRanges"?
>

IIRC it is only defensive now - before
https://issues.apache.org/jira/browse/CASSANDRA-9317 it could be empty
during startup.

>
> But even if the node owns no "local" ranges, can't it have replicated data
> from RF=k-1 other nodes? Or does empty localRanges mean that the RF=k-1
> nodes that might have replicated data for this node are all also configured
> to own zero tokens? Seems that way. But is there any reasonable scenario
> under which the user would hit this? I mean, why would the code care either
> way with respect to JBOD strategy for the case where no local data is
> stored?
>

Local ranges are all ranges the node should store - if you have 256 vnode
tokens and RF=3, you will have 768 local ranges.

/Marcus

>
> -- Jack Krupansky
>
> On Wed, Feb 24, 2016 at 2:15 AM, Marcus Eriksson <krum...@gmail.com>
> wrote:
>
>> It is mentioned here btw: http://www.datastax.com/dev/blog/improving-jbod
>>
>> On Wed, Feb 24, 2016 at 8:14 AM, Marcus Eriksson <krum...@gmail.com>
>> wrote:
>>
>>> If you don't use RandomPartitioner/Murmur3Partitioner you will get the
>>> old behavior.
>>>
>>> On Wed, Feb 24, 2016 at 2:47 AM, Jack Krupansky <jack.krupan...@gmail.com> wrote:
>>>
>>>> I just wanted to confirm whether my understanding of how JBOD allocates
>>>> device space is correct or not...
>>>>
>>>> Pre-3.2:
>>>> On each memtable flush Cassandra will select the directory (device)
>>>> which has the most available space as a percentage of the total available
>>>> space on all of the listed directories/devices. A random weighted value is
>>>> used so it won't always pick the same directory/device with the most space,
>>>> the goal being to balance writes for performance.
>>>>
>>>> As of 3.2:
>>>> The ranges of tokens stored on the local node will be evenly
>>>> distributed among the configured storage devices - even by token range,
>>>> even if that may be uneven by actual partition sizes. The code presumes
>>>> that each of the configured local storage devices has the same capacity.
>>>>
>>>> The relevant change in 3.2 appears to be:
>>>> Make sure tokens don't exist in several data directories
>>>> (CASSANDRA-6696)
>>>>
>>>> The code for the pre-3.2 model is still in 3.x - is there some other
>>>> code path which will cause the pre-3.2 behavior even when running 3.2 or
>>>> later?
>>>>
>>>> I see this code which seems to allow for at least some cases where the
>>>> pre-3.2 behavior would still be invoked, but I'm not sure what user-level
>>>> cases that might be:
>>>>
>>>> if (!cfs.getPartitioner().splitter().isPresent() || localRanges.isEmpty())
>>>>     return Collections.singletonList(new FlushRunnable(lastReplayPosition.get(), txn));
>>>>
>>>> return createFlushRunnables(localRanges, txn);
>>>>
>>>> IOW, if the partitioner does not have a splitter present or the
>>>> localRanges for the node cannot be determined. But... what exactly would a
>>>> user do to cause that?
>>>>
>>>> There is no doc for this stuff - can a committer (or adventurous user!)
>>>> confirm what is actually implemented, both pre and post 3.2? (I already
>>>> pinged docs on this.)
>>>>
>>>> Or if anybody is actually using JBOD, what behavior they are seeing for
>>>> device space utilization.
>>>>
>>>> Thanks!
>>>>
>>>> -- Jack Krupansky
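
For anyone skimming the thread, here is a rough Java sketch of the two allocation models as they are described above. It is not Cassandra's code: the class `JbodSketch` and all of its methods are hypothetical names made up for illustration, tokens are simplified to small `long` values in sorted, non-overlapping, non-wrapping `[start, end)` ranges, and the pre-3.2 pick is assumed to be a simple weighted-random choice proportional to free space, following Jack's description.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch only; none of these names exist in Cassandra.
final class JbodSketch {

    // Ranges a node should store: one per vnode token, times the replication factor.
    // Matches the example in the thread (256 vnode tokens, RF=3).
    static int localRangeCount(int vnodeTokens, int replicationFactor) {
        return vnodeTokens * replicationFactor;
    }

    // Pre-3.2 model (assumed): choose a directory with probability
    // proportional to its free space. `roll` is a uniform value in [0, 1),
    // passed in by the caller so the sketch stays deterministic.
    static int pickByFreeSpace(long[] freeBytes, double roll) {
        long total = 0;
        for (long free : freeBytes) total += free;
        double threshold = roll * total;
        long cumulative = 0;
        for (int i = 0; i < freeBytes.length; i++) {
            cumulative += freeBytes[i];
            if (threshold < cumulative) return i;
        }
        return freeBytes.length - 1;
    }

    // 3.2 model (simplified): compute one exclusive upper-bound token per
    // directory so that every directory covers the same number of tokens,
    // regardless of how much data those tokens actually hold.
    static List<Long> directoryBoundaries(long[][] ranges, int directories) {
        long total = 0;
        for (long[] r : ranges) total += r[1] - r[0];
        List<Long> boundaries = new ArrayList<>();
        for (int k = 1; k <= directories; k++) {
            long target = total * k / directories; // tokens covered by the first k directories
            long seen = 0;
            for (long[] r : ranges) {
                long size = r[1] - r[0];
                if (seen + size >= target) {
                    boundaries.add(r[0] + (target - seen));
                    break;
                }
                seen += size;
            }
        }
        return boundaries;
    }

    public static void main(String[] args) {
        long[][] ranges = {{0, 100}, {200, 300}, {500, 600}};
        System.out.println(localRangeCount(256, 3));
        System.out.println(pickByFreeSpace(new long[] {100, 300}, 0.5));
        System.out.println(directoryBoundaries(ranges, 3));
    }
}
```

The point of the contrast: the first model reacts to how full each disk is at flush time, while the second fixes the token-to-directory mapping up front, which is why it only applies when the partitioner can split its token space and the node has local ranges to split.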