Hello,

According to the DSE 3.1 documentation [1], "DataStax recommends using virtual nodes only on data centers running purely Cassandra workloads. You should disable virtual nodes on data centers running either Hadoop or Solr workloads by setting num_tokens to 1."
Earlier this year, a thread on this mailing list [2] suggested a workaround for the problem of needing at least one map task per token, which is infeasible with vnodes. The suggestion was to implement a new Hadoop input format that combines many tokens from a single node into one split, reducing the overhead of having too many tasks per node.

Is there a JIRA ticket for this issue yet, or any work in progress to support vnodes for Hadoop workloads? Or does the recommendation remain to avoid vnodes for analytics workloads (Hadoop, Solr)?

Thanks,
--
Paulo

[1] http://www.datastax.com/docs/datastax_enterprise3.1/deploy/configuring_replication
[2] http://mail-archives.apache.org/mod_mbox/cassandra-user/201302.mbox/%3CCAJV_UYdqYmfStn5OetWrozQqbi+-yP3X-Ew9xtW=QY=2zgy...@mail.gmail.com%3E
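
P.S. To make the suggestion from [2] concrete, here is a rough Python sketch of the grouping idea only. It is not Hadoop or Cassandra code: the function name combine_ranges, the (start_token, end_token, host) tuples, and the max_per_split parameter are all made up for illustration, and a real version would have to live in the split computation of a Hadoop input format.

```python
from collections import defaultdict


def combine_ranges(ranges, max_per_split):
    """Group token ranges by owning host, then pack them into splits.

    ranges: iterable of (start_token, end_token, host) tuples (hypothetical shape).
    max_per_split: upper bound on token ranges packed into one split.
    Returns a list of (host, [(start_token, end_token), ...]) pairs,
    one pair per map task.
    """
    if max_per_split < 1:
        raise ValueError("max_per_split must be at least 1")

    # Collect every range under the host that owns it.
    by_host = defaultdict(list)
    for start, end, host in ranges:
        by_host[host].append((start, end))

    # Cut each host's list into chunks of at most max_per_split ranges.
    splits = []
    for host in sorted(by_host):
        host_ranges = by_host[host]
        for i in range(0, len(host_ranges), max_per_split):
            splits.append((host, host_ranges[i:i + max_per_split]))
    return splits
```

With, say, 2 nodes at 256 tokens each and max_per_split set to 64, this would yield 8 map tasks rather than 512, which is the kind of reduction the thread was after.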