Thanks. I didn't pay enough attention to that statement on my initial
reading of that post (which was where I became aware of the 3.2 behavior in
the first place).

Considering that the doc explicitly recommends against using the
byte-ordered partitioner, the implication is that the 3.2 JBOD behavior
applies in all recommended partitioner use cases.
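
To make that concrete, here's a minimal, untested sketch of the splitter
check - my own illustration assuming the 3.2-era IPartitioner API (the
SplitterCheck class name is just mine):

import org.apache.cassandra.dht.ByteOrderedPartitioner;
import org.apache.cassandra.dht.IPartitioner;
import org.apache.cassandra.dht.Murmur3Partitioner;

public class SplitterCheck
{
    public static void main(String[] args)
    {
        // Murmur3Partitioner overrides splitter() to return a Splitter,
        // so it (and RandomPartitioner) get the new 3.2 JBOD behavior.
        IPartitioner m3p = Murmur3Partitioner.instance;
        System.out.println(m3p.splitter().isPresent()); // expected: true

        // ByteOrderedPartitioner inherits the default Optional.empty(),
        // so it falls back to the pre-3.2 behavior.
        IPartitioner bop = ByteOrderedPartitioner.instance;
        System.out.println(bop.splitter().isPresent()); // expected: false
    }
}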

I'm still not clear on when exactly a node would not have "localRanges" -
in terms of how a user would hit that scenario - or is that merely a
defensive check for a scenario which cannot normally be encountered? Empty
localRanges means that the endpoint is not responsible for any range of
tokens, but how can that ever be true? Is it simply the case where the user
configures the node to own zero tokens? Other than that, is there any
normal way a user could end up with a node that has no "localRanges"?

But even if the node owns no "local" ranges, can't it still hold data
replicated from the k-1 other nodes (for RF=k)? Or does empty localRanges
mean that the k-1 nodes whose data might be replicated to this node are all
also configured to own zero tokens? It seems that way. But is there any
reasonable scenario under which a user would hit this? I mean, why would
the code care either way about the JBOD strategy in the case where no local
data is stored?


-- Jack Krupansky

On Wed, Feb 24, 2016 at 2:15 AM, Marcus Eriksson <krum...@gmail.com> wrote:

> It is mentioned here btw: http://www.datastax.com/dev/blog/improving-jbod
>
> On Wed, Feb 24, 2016 at 8:14 AM, Marcus Eriksson <krum...@gmail.com>
> wrote:
>
>> If you don't use RandomPartitioner/Murmur3Partitioner you will get the
>> old behavior.
>>
>> On Wed, Feb 24, 2016 at 2:47 AM, Jack Krupansky <jack.krupan...@gmail.com> wrote:
>>
>>> I just wanted to confirm whether my understanding of how JBOD allocates
>>> device space is correct or not...
>>>
>>> Pre-3.2:
>>> On each memtable flush, Cassandra will select the directory (device)
>>> which has the most available space as a percentage of the total available
>>> space across all of the listed directories/devices. A weighted random
>>> value is used so that it won't always pick the directory/device with the
>>> most space, the goal being to balance writes for performance.
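>>>
>>> Something like this self-contained sketch is how I picture the pre-3.2
>>> selection - my own illustration of weighted-random choice by free space,
>>> not the actual Directories code:
>>>
>>> import java.util.Random;
>>>
>>> class PickDataDirectory
>>> {
>>>     // Pick an index into freeBytes with probability proportional to each
>>>     // directory's share of the total free space, so the emptiest disk is
>>>     // favored but not always chosen.
>>>     static int pick(long[] freeBytes, Random rng)
>>>     {
>>>         long total = 0;
>>>         for (long free : freeBytes)
>>>             total += free;
>>>         long r = (long) (rng.nextDouble() * total);
>>>         for (int i = 0; i < freeBytes.length; i++)
>>>         {
>>>             r -= freeBytes[i];
>>>             if (r < 0)
>>>                 return i;
>>>         }
>>>         return freeBytes.length - 1; // rounding fallback
>>>     }
>>> }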
>>>
>>> As of 3.2:
>>> The ranges of tokens stored on the local node will be evenly distributed
>>> among the configured storage devices - evenly by token range, even if
>>> that may be uneven in terms of actual partition sizes. The code presumes
>>> that each of the configured local storage devices has the same capacity.
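>>>
>>> As a back-of-the-envelope illustration of that idea (my own sketch, not
>>> the actual Splitter code), carving the Murmur3 token span into
>>> equal-width slices, one upper bound per data directory:
>>>
>>> // Compute the upper token bound for each of numDisks directories by
>>> // splitting [Long.MIN_VALUE, Long.MAX_VALUE] into equal-width slices -
>>> // equal by token range, not by actual data size.
>>> static long[] diskBoundaries(int numDisks)
>>> {
>>>     double min = (double) Long.MIN_VALUE;
>>>     double width = ((double) Long.MAX_VALUE - min) / numDisks;
>>>     long[] upperBounds = new long[numDisks];
>>>     for (int i = 0; i < numDisks; i++)
>>>         upperBounds[i] = (long) (min + width * (i + 1));
>>>     upperBounds[numDisks - 1] = Long.MAX_VALUE; // absorb rounding error
>>>     return upperBounds;
>>> }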
>>>
>>> The relevant change in 3.2 appears to be:
>>> Make sure tokens don't exist in several data directories (CASSANDRA-6696)
>>>
>>> The code for the pre-3.2 model is still in 3.x - is there some other
>>> code path which will cause the pre-3.2 behavior even when running 3.2 or
>>> later?
>>>
>>> I see this code which seems to allow for at least some cases where the
>>> pre-3.2 behavior would still be invoked, but I'm not sure what user-level
>>> cases that might be:
>>>
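>>> // (My annotation, not in the source: no splitter or no local ranges =>
>>> // a single old-style FlushRunnable for the whole memtable; otherwise
>>> // one flush runnable per data directory, split by token range.)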
>>> if (!cfs.getPartitioner().splitter().isPresent() || localRanges.isEmpty())
>>>     return Collections.singletonList(new FlushRunnable(lastReplayPosition.get(), txn));
>>>
>>> return createFlushRunnables(localRanges, txn);
>>>
>>> IOW, if the partitioner does not have a splitter present or the
>>> localRanges for the node are empty. But... what exactly would a user do
>>> to cause that?
>>>
>>> There is no doc for this stuff - can a committer (or adventurous user!)
>>> confirm what is actually implemented, both pre and post 3.2? (I already
>>> pinged docs on this.)
>>>
>>> Or, if anybody is actually using JBOD, I'd be curious what behavior they
>>> are seeing for device space utilization.
>>>
>>> Thanks!
>>>
>>> -- Jack Krupansky
>>>
>>
>>
>
