Re: cqlinputformat and retired cqlpagingingputformat creates lots of connections to query the server

Huiliang Zhang Tue, 27 Jan 2015 23:01:29 -0800

Hi Shenghua, as I understand it, each range is assigned to a mapper, and
mappers do not share connections. So reading all the data takes at least
256 connections in total. But those 256 connections should not all be open
at the same time unless you have 256 mappers running concurrently.
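
To illustrate the point, here is a minimal sketch in plain Java. It does not use any Hadoop or Cassandra classes; the class name SplitConnectionSketch, the method simulate, and the figure of 8 mapper slots are made up for illustration. It assumes one short-lived connection per split and models the mapper slots as a fixed thread pool, so the total number of connections equals the number of splits while the number open at any moment is bounded by the number of slots.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class SplitConnectionSketch {

    /**
     * Simulates processing `splits` input splits with at most `mapperSlots`
     * mappers running at once. Returns {total connections opened, peak
     * number of connections open at the same time}.
     */
    static int[] simulate(int splits, int mapperSlots) throws InterruptedException {
        AtomicInteger open = new AtomicInteger();   // connections open right now
        AtomicInteger peak = new AtomicInteger();   // highest value `open` reached
        AtomicInteger total = new AtomicInteger();  // connections opened overall

        // Hypothetical stand-in for the cluster's concurrent map slots.
        ExecutorService slots = Executors.newFixedThreadPool(mapperSlots);
        for (int i = 0; i < splits; i++) {
            slots.submit(() -> {
                // Each mapper opens its own connection; nothing is shared.
                int now = open.incrementAndGet();
                total.incrementAndGet();
                peak.accumulateAndGet(now, Math::max);
                try {
                    Thread.sleep(2); // pretend to read the rows of this split
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
                open.decrementAndGet(); // mapper finished, connection closed
            });
        }
        slots.shutdown();
        slots.awaitTermination(1, TimeUnit.MINUTES);
        return new int[] { total.get(), peak.get() };
    }

    public static void main(String[] args) throws InterruptedException {
        int[] result = simulate(257, 8);
        System.out.println("connections opened: " + result[0]
                + ", peak concurrent: " + result[1]);
    }
}
```

With 257 splits and 8 slots, the sketch opens 257 connections overall, but the peak never exceeds 8. In a real job the equivalent knob is however many map tasks the cluster is allowed to run at once.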


On Tue, Jan 27, 2015 at 9:34 PM, Shenghua(Daniel) Wan <wansheng...@gmail.com
> wrote:

> By default, each C* node is set with 256 tokens. On a local 1-node C*
> server, my Hadoop job creates 256 connections to the server. Is there any
> way to control this behavior? e.g. reduce the number of connections to a
> pre-configured cap.
>
> I debugged the C* source code and found that the client asks for partition
> ranges, or virtual nodes. The client was then told by the server that there
> were 257 ranges, corresponding to 257 column family splits.
>
> Here is a snapshot of my logs
>
> 15/01/27 18:02:20 DEBUG hadoop.AbstractColumnFamilyInputFormat: adding
> ColumnFamilySplit((9121856086738887846, '-9223372036854775808] @[localhost])
> ...
> 257 splits in total.
>
> The problem is that the user might simply want all the data via a
> "select *"-like statement. It seems that 257 connections are necessary to
> query the rows. However, is there any way to prevent 257 concurrent
> connections?
>
> My C* version is 2.0.11 and I also tried CqlPagingInputFormat, which has
> the same behavior.
>
> Thank you.
>
> --
>
> Regards,
> Shenghua (Daniel) Wan
>
