I ran another experiment and verified that 3*257 mappers were indeed created (one of the 257 ranges is effectively null).
Thanks, mck, for the information!

On Wed, Jan 28, 2015 at 12:17 AM, mck <m...@apache.org> wrote:

> Shenghua,
>
> > The problem is the user might only want all the data via a "select *"
> > like statement. It seems that 257 connections to query the rows are
> > necessary.
> > However, is there any way to prohibit 257 concurrent connections?
>
> Your reasoning is correct.
> The number of connections should be tunable via the
> "cassandra.input.split.size" property. See
> ConfigHelper.setInputSplitSize(..)
>
> The problem is that vnodes completely trashes this, since splits
> returned don't span across vnodes.
> There's an issue out for this –
> https://issues.apache.org/jira/browse/CASSANDRA-6091
> but part of the problem is that the thrift stuff involved here is
> getting rewritten¹ to be pure cql.
>
> In the meantime you override the CqlInputFormat and manually re-merge
> splits together, where location sets match, so to better honour
> inputSplitSize and to return to a more reasonable number of connections.
> We do this, using code similar to this patch
> https://github.com/michaelsembwever/cassandra/pull/2/files
>
> ~mck
>
> ¹ https://issues.apache.org/jira/browse/CASSANDRA-8358

--
Regards,
Shenghua (Daniel) Wan
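
P.S. For anyone finding this thread later, here is a rough sketch of the re-merge idea as I understand it: group splits whose location sets match, then pack each group up to the target split size. It is not the code from the patch and it does not touch the CqlInputFormat API. The Split class, the merge method and the row estimates are all hypothetical stand-ins, and a real merged split would also have to carry the token ranges it covers.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class SplitMerger {

    /** Hypothetical stand-in for an input split: estimated rows plus replica hosts. */
    static final class Split {
        final long rows;
        final Set<String> locations;

        Split(long rows, Set<String> locations) {
            this.rows = rows;
            this.locations = locations;
        }
    }

    /**
     * Merges splits that share the same location set, closing a merged split
     * once adding the next one would exceed targetRows.
     */
    static List<Split> merge(List<Split> splits, long targetRows) {
        // Group by identical location set, preserving first-seen order.
        Map<Set<String>, List<Split>> byLocation = new LinkedHashMap<>();
        for (Split s : splits) {
            byLocation.computeIfAbsent(s.locations, k -> new ArrayList<>()).add(s);
        }

        List<Split> merged = new ArrayList<>();
        for (Map.Entry<Set<String>, List<Split>> group : byLocation.entrySet()) {
            long acc = 0;
            boolean open = false; // true once the current merged split holds at least one input
            for (Split s : group.getValue()) {
                if (open && acc + s.rows > targetRows) {
                    merged.add(new Split(acc, group.getKey()));
                    acc = 0;
                }
                acc += s.rows;
                open = true;
            }
            if (open) {
                merged.add(new Split(acc, group.getKey()));
            }
        }
        return merged;
    }
}
```

With 3 distinct location sets of 257 small splits each (771 in total), a target larger than any one group's row total collapses the list to 3 merged splits, so 3 mappers instead of 771. In a real cluster the location sets depend on the replication layout, so the reduction may be smaller.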