For monotonically increasing data: can you try pre-splitting the destination table? That helps avoid a single region getting overloaded at the time of the bulk import.
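A minimal sketch with the 0.98-era Java client (the table and family names here are placeholders, and the 16-way hex split points assume your hashed keys carry a hex prefix; adjust the split points to your actual key distribution):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.HBaseAdmin;
import org.apache.hadoop.hbase.util.Bytes;

public class PreSplitTable {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    HBaseAdmin admin = new HBaseAdmin(conf);

    // Placeholder table/family names -- substitute your own schema.
    HTableDescriptor desc = new HTableDescriptor(TableName.valueOf("my_table"));
    desc.addFamily(new HColumnDescriptor("cf"));

    // 15 split points -> 16 regions, evenly spaced over a key space
    // with a hex (hashed) prefix: "10", "20", ..., "f0".
    byte[][] splits = new byte[15][];
    for (int i = 1; i <= 15; i++) {
      splits[i - 1] = Bytes.toBytes(String.format("%x0", i));
    }
    admin.createTable(desc, splits);
    admin.close();
  }
}

The shell equivalent is create with a SPLITS => [...] argument.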
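For the event-sequence family where the keys must stay monotonically increasing, one common workaround (not from this thread, just a sketch) is to salt the key with a small bucket prefix so consecutive writes fan out across the pre-split regions:

import org.apache.hadoop.hbase.util.Bytes;

public class SaltedKeys {
  // Keep the bucket count equal to the number of pre-split regions.
  private static final int BUCKETS = 16;

  // Prefix a monotonically increasing sequence number with a one-byte
  // salt so consecutive events land on different regions. The salt is
  // derived from the sequence itself, so point reads can recompute it.
  public static byte[] saltedRowKey(long seq) {
    byte bucket = (byte) (seq % BUCKETS);
    return Bytes.add(new byte[] { bucket }, Bytes.toBytes(seq));
  }
}

The cost is on the read side: a range scan over a time window now needs one scan per bucket. (On the RegionTooBusyException itself, see the P.S. below the quoted thread.)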
Regards,
Shahab

On Fri, Sep 5, 2014 at 12:14 PM, Jianshi Huang <jianshi.hu...@gmail.com> wrote:

> Thanks Ted, I'll try to do a major compact.
>
> Hi Steven,
>
> Yes, most of my rows are hashed to make them randomly distributed, but one
> column family has monotonically increasing rowkeys, and it's used for
> recording sequences of events.
>
> Do you have a solution for how to bulk import this kind of data?
>
> Jianshi
>
>
> On Sat, Sep 6, 2014 at 12:00 AM, Magana-zook, Steven Alan
> <maganazo...@llnl.gov> wrote:
>
> > Hi Jianshi,
> >
> > What are the field(s) in your row key? If your row key is monotonically
> > increasing, then you will be sending all of your requests to one region
> > server. Even after the region splits, all new entries will keep punishing
> > one server (the region responsible for the split containing the new keys).
> >
> > See these articles that may help if this is indeed your issue:
> > 1. http://hbase.apache.org/book/rowkey.design.html
> > 2. http://ikaisays.com/2011/01/25/app-engine-datastore-tip-monotonically-increasing-values-are-bad/
> >
> > Regards,
> > Steven Magana-Zook
> >
> >
> > On 9/5/14 8:54 AM, "Jianshi Huang" <jianshi.hu...@gmail.com> wrote:
> >
> > >Hi JM,
> > >
> > >What do you mean by the 'destination cluster'? The files are in the same
> > >Hadoop/HDFS cluster where HBase is running.
> > >
> > >Do you mean do the bulk importing on the HBase Master node?
> > >
> > >Jianshi
> > >
> > >
> > >On Fri, Sep 5, 2014 at 11:18 PM, Jean-Marc Spaggiari
> > ><jean-m...@spaggiari.org> wrote:
> > >
> > >> Hi Jianshi,
> > >>
> > >> You might want to upload the files to the destination cluster first and
> > >> then re-run your bulk load from there. That way the transfer time will
> > >> not be counted against the timeout, since the files will be local.
> > >>
> > >> JM
> > >>
> > >>
> > >> 2014-09-05 11:15 GMT-04:00 Jianshi Huang <jianshi.hu...@gmail.com>:
> > >>
> > >> > I'm importing 2TB of generated HFiles into HBase and I constantly get
> > >> > the following errors:
> > >> >
> > >> > Caused by: org.apache.hadoop.hbase.ipc.RemoteWithExtrasException(org.apache.hadoop.hbase.RegionTooBusyException):
> > >> > org.apache.hadoop.hbase.RegionTooBusyException: failed to get a lock in 60000 ms.
> > >> > regionName=grapple_edges_v2,ff000000,1409817320781.6d2955c780b39523de733f3565642d96.,
> > >> > server=xxxxx.xxx.xxx,60020,1404854700728
> > >> >         at org.apache.hadoop.hbase.regionserver.HRegion.lock(HRegion.java:5851)
> > >> >         at org.apache.hadoop.hbase.regionserver.HRegion.lock(HRegion.java:5837)
> > >> >         at org.apache.hadoop.hbase.regionserver.HRegion.startBulkRegionOperation(HRegion.java:5795)
> > >> >         at org.apache.hadoop.hbase.regionserver.HRegion.bulkLoadHFiles(HRegion.java:3543)
> > >> >         at org.apache.hadoop.hbase.regionserver.HRegion.bulkLoadHFiles(HRegion.java:3525)
> > >> >         at org.apache.hadoop.hbase.regionserver.HRegionServer.bulkLoadHFile(HRegionServer.java:3277)
> > >> >         at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:28863)
> > >> >         at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2008)
> > >> >         at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:92)
> > >> >         at org.apache.hadoop.hbase.ipc.SimpleRpcScheduler.consumerLoop(SimpleRpcScheduler.java:160)
> > >> >         at org.apache.hadoop.hbase.ipc.SimpleRpcScheduler.access$000(SimpleRpcScheduler.java:38)
> > >> >         at org.apache.hadoop.hbase.ipc.SimpleRpcScheduler$1.run(SimpleRpcScheduler.java:110)
> > >> >         at java.lang.Thread.run(Thread.java:724)
> > >> >
> > >> >         at org.apache.hadoop.hbase.ipc.RpcClient.call(RpcClient.java:1498)
> > >> >         at org.apache.hadoop.hbase.ipc.RpcClient.callBlockingMethod(RpcClient.java:1684)
> > >> >         at org.apache.hadoop.hbase.ipc.RpcClient$BlockingRpcChannelImplementation.callBlockingMethod(RpcClient.java:1737)
> > >> >         at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$BlockingStub.bulkLoadHFile(ClientProtos.java:29276)
> > >> >         at org.apache.hadoop.hbase.protobuf.ProtobufUtil.bulkLoadHFile(ProtobufUtil.java:1548)
> > >> >         ... 11 more
> > >> >
> > >> >
> > >> > What makes the region too busy? Is there a way to improve it?
> > >> >
> > >> > Does that also mean some part of my data was not correctly imported?
> > >> >
> > >> >
> > >> > Thanks,
> > >> >
> > >> > --
> > >> > Jianshi Huang
> > >> >
> > >> > LinkedIn: jianshi
> > >> > Twitter: @jshuang
> > >> > Github & Blog: http://huangjs.github.com/
> > >
> > >
> > >--
> > >Jianshi Huang
> > >
> > >LinkedIn: jianshi
> > >Twitter: @jshuang
> > >Github & Blog: http://huangjs.github.com/
>
>
> --
> Jianshi Huang
>
> LinkedIn: jianshi
> Twitter: @jshuang
> Github & Blog: http://huangjs.github.com/
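P.S. On "what makes the region too busy": as far as I can tell from the stack trace, bulkLoadHFiles has to take a region lock, and it cannot get one while the region is busy splitting or closing, so under a heavy import the 60000 ms wait can expire. Pre-splitting reduces mid-load splits, and the major compaction Ted suggested reduces compaction pressure before a retry. From the Java client that is just (table name taken from your trace; the call is asynchronous, the shell equivalent is major_compact):

import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HBaseAdmin;

public class MajorCompact {
  public static void main(String[] args) throws Exception {
    HBaseAdmin admin = new HBaseAdmin(HBaseConfiguration.create());
    // Asynchronous: queues the compaction and returns immediately.
    admin.majorCompact("grapple_edges_v2");
    admin.close();
  }
}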