Yes. It greatly increases the number of splits and the random I/O.
-Tupshin
On Jul 2, 2014 6:09 PM, "Clint Kelly" wrote:
Sorry, BTW, in case what I wrote below is unclear: is the concern that
the Hadoop InputFormat (as an example) will need to have a separate
InputSplit (which corresponds to a "SELECT foo FROM bar WHERE
token(baz) > min AND token(baz) < max") for every vnode instead of for
every token?
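
To make the question above concrete, here is a rough sketch of how the number of per-range SELECTs grows with vnodes. It is only an illustration under stated assumptions: the helper names (token_ranges, split_queries) are hypothetical, tokens are spaced evenly across the Murmur3 token space (real vnode tokens are assigned randomly), 256 tokens per node is assumed as the vnode setting, and the query text simply copies the shape quoted in this thread. It does not reproduce what the actual Hadoop InputFormat does, which may subdivide or group ranges differently.

```python
# Sketch only: counts one query per token range, as described in the thread.
# Assumes the Murmur3 token space, which spans -2**63 .. 2**63 - 1.
MIN_TOKEN = -2**63
MAX_TOKEN = 2**63 - 1


def token_ranges(num_nodes, tokens_per_node):
    """Return (min, max) pairs covering the ring, one per token (hypothetical helper)."""
    total = num_nodes * tokens_per_node
    span = (MAX_TOKEN - MIN_TOKEN) // total
    # Evenly spaced boundaries; real vnode tokens are random, not evenly spaced.
    bounds = [MIN_TOKEN + i * span for i in range(total)] + [MAX_TOKEN]
    return list(zip(bounds[:-1], bounds[1:]))


def split_queries(ranges):
    """Build one SELECT per range, using the query shape quoted in the thread."""
    return [
        f"SELECT foo FROM bar WHERE token(baz) > {lo} AND token(baz) < {hi}"
        for lo, hi in ranges
    ]


# Six nodes with one token each versus six nodes with 256 vnodes each (assumed values).
single_token = split_queries(token_ranges(num_nodes=6, tokens_per_node=1))
vnodes = split_queries(token_ranges(num_nodes=6, tokens_per_node=256))

print(len(single_token), len(vnodes))
```

With these assumed values the sketch builds 6 queries in the single-token case and 1536 in the vnode case, each covering a much smaller slice of the ring.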
(I assume this
Hi Tupshin,
Thanks for the quick reply. Is the performance concern from the
Hadoop integration needing to set up separate SELECT operations for
all of the unique vnode ranges?
Best regards,
Clint
On Wed, Jul 2, 2014 at 6:00 PM, Tupshin Harper wrote:
For performance reasons, you shouldn't enable vnodes on any Cassandra/DSE
datacenter that is doing Hadoop analytics workloads. Other DCs in the
cluster can use vnodes.
-Tupshin
On Jul 2, 2014 5:50 PM, "Clint Kelly" wrote:
Hi everyone,
Apologies if this is the incorrect forum for a question like this.
I am going to set up a mixed-workload (real-time and analytics)
installation of DSE 4.5 using bring-your-own Hadoop (BYOH). We are
using CDH 5.0.
I was reviewing the installation instructions, and I came across the