Virtual node support for Hadoop workloads

Paulo Motta Thu, 17 Oct 2013 13:56:30 -0700

Hello,

According to the DSE 3.1 documentation [1], "DataStax recommends using virtual
nodes only on data centers running purely Cassandra workloads. You should
disable virtual nodes on data centers running either Hadoop or Solr
workloads by setting num_tokens to 1."


There was a thread on this mailing list earlier this year [2] that
suggested a workaround for the problem of needing at least one map task
per token (infeasible with vnodes). The suggestion was to implement a
new Hadoop InputFormat that combines many tokens from a single node into
one split, thus reducing the overhead of having too many tasks per node.
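
To make the idea concrete, here is a minimal sketch of the grouping step as I
understand it from that thread. It is plain Python rather than a real Hadoop
InputFormat, and TokenRange, combine_ranges and max_ranges_per_split are
hypothetical names I made up for illustration, not part of Cassandra or
Hadoop.

```python
from collections import defaultdict
from typing import NamedTuple


class TokenRange(NamedTuple):
    """Hypothetical stand-in for one vnode's token range."""
    start: int
    end: int
    node: str  # assumed: address of the node that owns this range


def combine_ranges(ranges, max_ranges_per_split):
    """Group token ranges by owning node, then pack each node's
    ranges into splits of at most max_ranges_per_split ranges."""
    by_node = defaultdict(list)
    for r in ranges:
        by_node[r.node].append(r)

    splits = []
    for node, owned in by_node.items():
        # Slice the node's ranges into fixed-size chunks; each chunk
        # would be handled by a single map task local to that node.
        for i in range(0, len(owned), max_ranges_per_split):
            splits.append((node, owned[i:i + max_ranges_per_split]))
    return splits
```

With two nodes holding 256 ranges each and max_ranges_per_split set to 64,
this yields 8 splits instead of 512 single-range tasks. A real implementation
would also have to account for replicas and for the estimated row count per
range, which this sketch ignores.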

Is there a JIRA ticket for this issue yet, or is anything being worked
on to support vnodes for Hadoop workloads? Or does the recommendation remain
to avoid vnodes for analytics workloads (Hadoop, Solr)?

Thanks,

-- 
Paulo

[1]
http://www.datastax.com/docs/datastax_enterprise3.1/deploy/configuring_replication
[2]
http://mail-archives.apache.org/mod_mbox/cassandra-user/201302.mbox/%3CCAJV_UYdqYmfStn5OetWrozQqbi+-yP3X-Ew9xtW=QY=2zgy...@mail.gmail.com%3E
