Re: Spark opening to many connection with zookeeper

Ted Yu Tue, 20 Oct 2015 09:03:08 -0700

I need to dig deeper into saveAsHadoopDataset to see what might have caused
the effect you observed.


Cheers

On Tue, Oct 20, 2015 at 8:57 AM, Amit Hora <[email protected]> wrote:

> Hi Ted,
>
> I made a mistake last time. Yes, the connections are well controlled when I
> use put: I iterated over each RDD and, within that, for each partition
> opened a connection and executed a list of puts against HBase.
>
> But why did the number of connections grow so large when I used hconf and
> the saveAsHadoopDataset method?
> ------------------------------
> From: Amit Hora <[email protected]>
> Sent: 20-10-2015 20:38
> To: Ted Yu <[email protected]>
> Cc: user <[email protected]>
> Subject: RE: Spark opening to many connection with zookeeper
>
> I used that too, but the number of connections kept increasing: it started
> from 10 and went up to 299.
> Then I changed my ZooKeeper configuration to set the maximum client
> connections to just 30 and restarted the job.
> Now the connections have stayed between 18 and 24 for the last 2 hours.
>
> I am unable to understand this behaviour.
> ------------------------------
> From: Ted Yu <[email protected]>
> Sent: 20-10-2015 20:19
> To: Amit Hora <[email protected]>
> Cc: user <[email protected]>
> Subject: Re: Spark opening to many connection with zookeeper
>
> Can you take a look at example 37 on page 225 of:
> http://hbase.apache.org/apache_hbase_reference_guide.pdf
>
> You can use the following method of Table:
>
>   void put(List<Put> puts) throws IOException;
>
> After the put() returns, the connection is closed.
>
> Cheers
>
> On Tue, Oct 20, 2015 at 2:40 AM, Amit Hora <[email protected]> wrote:
>
>> One region
>> ------------------------------
>> From: Ted Yu <[email protected]>
>> Sent: 20-10-2015 15:01
>> To: Amit Singh Hora <[email protected]>
>> Cc: user <[email protected]>
>> Subject: Re: Spark opening to many connection with zookeeper
>>
>> How many regions does your table have?
>>
>> Which HBase release do you use?
>>
>> Cheers
>>
>> On Tue, Oct 20, 2015 at 12:32 AM, Amit Singh Hora <[email protected]>
>> wrote:
>>
>>> Hi all,
>>>
>>> My Spark job started reporting ZooKeeper errors. After looking at the zkdump
>>> from the HBase master, I realized that a large number of connections are
>>> being made from the nodes where the Spark workers are running. I believe the
>>> connections are somehow not being closed, which leads to the errors.
>>>
>>> Please find the code below:
>>>
>>> import com.typesafe.config.ConfigFactory
>>> import org.apache.hadoop.hbase.HBaseConfiguration
>>> import org.apache.hadoop.hbase.mapred.TableOutputFormat
>>> import org.apache.hadoop.mapred.JobConf
>>>
>>> val conf = ConfigFactory.load("connection.conf").getConfig("connection")
>>> val hconf = HBaseConfiguration.create()
>>> // Target table for TableOutputFormat (set once; JobConf copies it from hconf)
>>> hconf.set(TableOutputFormat.OUTPUT_TABLE, conf.getString("hbase.tablename"))
>>> hconf.set("zookeeper.session.timeout", conf.getString("hbase.zookeepertimeout"))
>>> hconf.set("hbase.client.retries.number", Integer.toString(1))
>>> hconf.set("zookeeper.recovery.retry", Integer.toString(1))
>>> hconf.set("hbase.master", conf.getString("hbase.hbase_master"))
>>> // zkquorum consists of 5 nodes
>>> hconf.set("hbase.zookeeper.quorum", conf.getString("hbase.hbase_zkquorum"))
>>> hconf.set("zookeeper.znode.parent", "/hbase-unsecure")
>>> hconf.set("hbase.zookeeper.property.clientPort", conf.getString("hbase.hbase_zk_port"))
>>>
>>> val jobConfig: JobConf = new JobConf(hconf, this.getClass)
>>> jobConfig.set("mapreduce.output.fileoutputformat.outputdir", "/user/user01/out")
>>> jobConfig.setOutputFormat(classOf[TableOutputFormat])
>>>
>>> // Convert each JSON record to a Put and write it through TableOutputFormat
>>> rdd.map(convertToPut).saveAsHadoopDataset(jobConfig)
>>>
>>> The method convertToPut does nothing but convert the JSON to HBase Put
>>> objects.
>>>
>>> After I killed the application/driver, the number of connections decreased
>>> drastically.
>>>
>>> Kindly help me understand and resolve the issue.
>>>
>>>
>>>
>>>
>>
>
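
For readers following the thread, the per-partition approach described above
(open a connection inside each partition, send a list of puts, then close the
connection) might look roughly like the sketch below. It is only an
illustration under stated assumptions: it assumes an HBase 1.x client (the
thread does not say which release is in use), an RDD of JSON strings, and a
caller-supplied conversion function. The names savePartitions, tableName,
quorum and toPut are hypothetical and do not come from the original code.

```scala
import org.apache.hadoop.hbase.{HBaseConfiguration, TableName}
import org.apache.hadoop.hbase.client.{ConnectionFactory, Put}
import org.apache.spark.rdd.RDD
import scala.collection.JavaConverters._

// Hypothetical helper (not from the thread): writes one batch of puts per partition.
def savePartitions(rdd: RDD[String], tableName: String, quorum: String)
                  (toPut: String => Put): Unit = {
  rdd.foreachPartition { records =>
    // Build the configuration on the executor, since Hadoop's Configuration
    // is not serializable and cannot be shipped from the driver in a closure.
    val hconf = HBaseConfiguration.create()
    hconf.set("hbase.zookeeper.quorum", quorum)

    // One connection per partition, closed explicitly when the batch is done.
    val connection = ConnectionFactory.createConnection(hconf)
    try {
      val table = connection.getTable(TableName.valueOf(tableName))
      try {
        // Table.put(List<Put>) sends the whole partition as a single batch.
        table.put(records.map(toPut).toList.asJava)
      } finally {
        table.close()
      }
    } finally {
      connection.close()
    }
  }
}
```

With this shape, the number of open connections at any moment is bounded by the
number of partitions being processed concurrently, since every connection is
closed in a finally block before the task ends. Note that the whole partition
is materialized as a list before the put, so very large partitions may need to
be written in smaller groups.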
