Hi, I'm accessing multiple regions (~5k) of an HBase table using Spark's newAPIHadoopRDD, but the driver is calculating the region sizes of all the regions. It is not even reusing the HConnection: it creates a new connection for every request (see the logs below), which takes a lot of time.
Is there a better approach to do this?

[18 Nov 2016 22:25:22,759] [INFO Driver] RecoverableZooKeeper: Process identifier=*hconnection-0x1e7824af* connecting to ZooKeeper ensemble=hbase19.cloud.com:2181,hbase24.cloud.com:2181,hbase28.cloud.com:2181
[18 Nov 2016 22:25:22,759] [INFO Driver] ZooKeeper: Initiating client connection, connectString=hbase19.cloud.com:2181,hbase24.cloud.com:2181,hbase28.cloud.com:2181 sessionTimeout=60000 watcher=hconnection-0x1e7824af0x0, quorum=hbase19.cloud.com:2181,hbase24.cloud.com:2181,hbase28.cloud.com:2181, baseZNode=/hbase
[18 Nov 2016 22:25:22,761] [INFO Driver-SendThread(hbase24.cloud.com:2181)] ClientCnxn: Opening socket connection to server hbase24.cloud.com/10.193.150.217:2181. Will not attempt to authenticate using SASL (unknown error)
[18 Nov 2016 22:25:22,763] [INFO Driver-SendThread(hbase24.cloud.com:2181)] ClientCnxn: Socket connection established, initiating session, client: /10.193.138.145:47891, server: hbase24.cloud.com/10.193.150.217:2181
[18 Nov 2016 22:25:22,766] [INFO Driver-SendThread(hbase24.cloud.com:2181)] ClientCnxn: Session establishment complete on server hbase24.cloud.com/10.193.150.217:2181, sessionid = 0x2564f6f013e0e95, negotiated timeout = 60000
[18 Nov 2016 22:25:22,766] [INFO Driver] RegionSizeCalculator: Calculating region sizes for table "message".
[18 Nov 2016 22:25:27,867] [INFO Driver] ConnectionManager$HConnectionImplementation: Closing master protocol: MasterService
[18 Nov 2016 22:25:27,868] [INFO Driver] ConnectionManager$HConnectionImplementation: Closing zookeeper sessionid=0x2564f6f013e0e95
[18 Nov 2016 22:25:27,869] [INFO Driver] ZooKeeper: Session: 0x2564f6f013e0e95 closed
[18 Nov 2016 22:25:27,869] [INFO Driver-EventThread] ClientCnxn: EventThread shut down
[18 Nov 2016 22:25:27,880] [INFO Driver] RecoverableZooKeeper: Process identifier=*hconnection-0x6a8a1efa* connecting to ZooKeeper ensemble=hbase19.cloud.com:2181,hbase24.cloud.com:2181,hbase28.cloud.com:2181
[18 Nov 2016 22:25:27,880] [INFO Driver] ZooKeeper: Initiating client connection, connectString=hbase19.cloud.com:2181,hbase24.cloud.com:2181,hbase28.cloud.com:2181 sessionTimeout=60000 watcher=hconnection-0x6a8a1efa0x0, quorum=hbase19.cloud.com:2181,hbase24.cloud.com:2181,hbase28.cloud.com:2181, baseZNode=/hbase
[18 Nov 2016 22:25:27,883] [INFO Driver-SendThread(hbase24.cloud.com:2181)] ClientCnxn: Opening socket connection to server hbase24.cloud.com/10.193.150.217:2181. Will not attempt to authenticate using SASL (unknown error)
[18 Nov 2016 22:25:27,885] [INFO Driver-SendThread(hbase24.cloud.com:2181)] ClientCnxn: Socket connection established, initiating session, client: /10.193.138.145:47894, server: hbase24.cloud.com/10.193.150.217:2181
[18 Nov 2016 22:25:27,887] [INFO Driver-SendThread(hbase24.cloud.com:2181)] ClientCnxn: Session establishment complete on server hbase24.cloud.com/10.193.150.217:2181, sessionid = 0x2564f6f013e0e97, negotiated timeout = 60000
[18 Nov 2016 22:25:27,888] [INFO Driver] RegionSizeCalculator: Calculating region sizes for table "message".
....

--
Thanks & Regards,
*Mukesh Jha <me.mukesh....@gmail.com>*