If you are using replication factor 1 with 3 Cassandra nodes, I would expect the 256 virtual nodes to be evenly distributed across the 3 nodes, so there would be 256 virtual nodes in total. But in your experiment you saw 3*257 mappers. Is that because of the setting cassandra.input.split.size=3? In that case it would have nothing to do with the node count being 3. Otherwise, I am confused as to why there would be 256 virtual nodes on every Cassandra node.
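To make sure I understand the re-merge mck suggests in the quoted message below, here is a rough sketch of the idea in plain Python. It is not the code from his patch and it does not use the Hadoop or Cassandra APIs: the `Split` class, `merge_splits`, the host names, and the row estimates are all hypothetical, and `target_rows` only stands in for the role I assume cassandra.input.split.size plays.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import FrozenSet, List, Tuple


@dataclass(frozen=True)
class Split:
    """Hypothetical stand-in for an input split (not the Hadoop class)."""
    token_ranges: Tuple[Tuple[int, int], ...]  # (start_token, end_token) pairs
    locations: FrozenSet[str]                  # replica hosts owning the ranges
    estimated_rows: int


def merge_splits(splits: List[Split], target_rows: int) -> List[Split]:
    """Merge splits whose location sets match, up to about target_rows each."""
    # Group the per-vnode splits by the exact set of hosts that serve them.
    by_location = defaultdict(list)
    for split in splits:
        by_location[split.locations].append(split)

    merged = []
    for locations, group in by_location.items():
        ranges, rows = [], 0
        for split in group:
            # Close the current merged split before it would exceed the target.
            if ranges and rows + split.estimated_rows > target_rows:
                merged.append(Split(tuple(ranges), locations, rows))
                ranges, rows = [], 0
            ranges.extend(split.token_ranges)
            rows += split.estimated_rows
        if ranges:
            merged.append(Split(tuple(ranges), locations, rows))
    return merged


# Assumed scenario: 3 hosts, replication factor 1, 256 small ranges per host.
splits = [
    Split(((i, i + 1),), frozenset({f"node{i % 3}"}), 100)
    for i in range(3 * 256)
]
merged = merge_splits(splits, target_rows=64 * 1024)
print(len(splits), "->", len(merged))  # prints "768 -> 3"
```

In this made-up scenario each host's ranges hold far fewer rows than the target, so they collapse into one split per host. With a smaller target the same grouping would produce more splits per host, but still far fewer than one per vnode range.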

On Wed, Jan 28, 2015 at 12:29 AM, Shenghua(Daniel) Wan <wansheng...@gmail.com> wrote:

> I did another experiment to verify indeed 3*257 (1 of 257 ranges is null
> effectively) mappers were created.
>
> Thanks mcm for the information !
>
> On Wed, Jan 28, 2015 at 12:17 AM, mck <m...@apache.org> wrote:
>
>> Shenghua,
>>
>> > The problem is the user might only want all the data via a "select *"
>> > like statement. It seems that 257 connections to query the rows are
>> > necessary.
>> > However, is there any way to prohibit 257 concurrent connections?
>>
>>
>> Your reasoning is correct.
>> The number of connections should be tunable via the
>> "cassandra.input.split.size" property. See
>> ConfigHelper.setInputSplitSize(..)
>>
>> The problem is that vnodes completely trashes this, since splits
>> returned don't span across vnodes.
>> There's an issue out for this –
>> https://issues.apache.org/jira/browse/CASSANDRA-6091
>> but part of the problem is that the thrift stuff involved here is
>> getting rewritten¹ to be pure cql.
>>
>> In the meantime you override the CqlInputFormat and manually re-merge
>> splits together, where location sets match, so to better honour
>> inputSplitSize and to return to a more reasonable number of connections.
>> We do this, using code similar to this patch
>> https://github.com/michaelsembwever/cassandra/pull/2/files
>>
>> ~mck
>>
>> ¹ https://issues.apache.org/jira/browse/CASSANDRA-8358
>>
>
>
>
> --
>
> Regards,
> Shenghua (Daniel) Wan
>