For monotonically increasing data: can you try pre-splitting the
destination table? That can help avoid one region getting overloaded
during the bulk import.
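
A minimal sketch of pre-splitting with the Java admin API (the table name,
column family, and split points below are placeholders; pick split points
that match your actual key distribution):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.HColumnDescriptor;
    import org.apache.hadoop.hbase.HTableDescriptor;
    import org.apache.hadoop.hbase.TableName;
    import org.apache.hadoop.hbase.client.HBaseAdmin;
    import org.apache.hadoop.hbase.util.Bytes;

    public class PreSplitTable {
      public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        HBaseAdmin admin = new HBaseAdmin(conf);
        try {
          HTableDescriptor desc =
              new HTableDescriptor(TableName.valueOf("your_table"));
          desc.addFamily(new HColumnDescriptor("cf"));
          // Create the region boundaries up front so the bulk import is
          // spread over several region servers instead of one hot region.
          byte[][] splits = new byte[][] {
              Bytes.toBytes("40000000"),
              Bytes.toBytes("80000000"),
              Bytes.toBytes("c0000000")
          };
          admin.createTable(desc, splits);
        } finally {
          admin.close();
        }
      }
    }

The same can be done from the HBase shell with the SPLITS option on create.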

Regards,
Shahab


On Fri, Sep 5, 2014 at 12:14 PM, Jianshi Huang <jianshi.hu...@gmail.com>
wrote:

> Thanks Ted, I'll try to do a major compaction.
>
> Hi Steven,
>
> Yes, most of my rows are hashed to make them randomly distributed, but one
> column family has monotonically increasing rowkeys, and it's used for
> recording a sequence of events.
>
> Do you have a solution for how to bulk import this kind of data?
>
> Jianshi
>
>
>
> On Sat, Sep 6, 2014 at 12:00 AM, Magana-zook, Steven Alan <
> maganazo...@llnl.gov> wrote:
>
> > Hi Jianshi,
> >
> > What are the field(s) in your row key? If your row key is monotonically
> > increasing, then you will be sending all of your requests to one region
> > server. Even after the region splits, all new entries will keep punishing
> > one server (the region responsible for the split containing the new keys).
> >
> > See these articles that may help if this is indeed your issue:
> > 1. http://hbase.apache.org/book/rowkey.design.html
> > 2. http://ikaisays.com/2011/01/25/app-engine-datastore-tip-monotonically-increasing-values-are-bad/
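> >
> > If the rowkeys really have to be written in increasing order, one common
> > workaround is salting: prefix each key with a small hash-derived bucket so
> > that consecutive keys land on different regions. A rough sketch (the
> > bucket count and key layout here are only illustrative):
> >
> >     import org.apache.hadoop.hbase.util.Bytes;
> >
> >     public class SaltedKey {
> >       private static final int NUM_BUCKETS = 16; // illustrative choice
> >
> >       // Prepend a deterministic one-byte bucket derived from the key so
> >       // that readers can recompute the prefix when looking rows up.
> >       public static byte[] saltedRowKey(long sequenceId) {
> >         byte[] key = Bytes.toBytes(sequenceId);
> >         byte bucket = (byte) ((Bytes.hashCode(key) & 0x7fffffff) % NUM_BUCKETS);
> >         return Bytes.add(new byte[] { bucket }, key);
> >       }
> >     }
> >
> > Scans then have to fan out over all NUM_BUCKETS prefixes, so this trades
> > write distribution against read complexity.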
> >
> > Regards,
> > Steven Magana-Zook
> >
> >
> > On 9/5/14 8:54 AM, "Jianshi Huang" <jianshi.hu...@gmail.com> wrote:
> >
> > >Hi JM,
> > >
> > >What do you mean by the 'destination cluster'? The files are in the same
> > >Hadoop/HDFS cluster where HBase is running.
> > >
> > >Do you mean doing the bulk import on the HBase Master node?
> > >
> > >
> > >Jianshi
> > >
> > >
> > >On Fri, Sep 5, 2014 at 11:18 PM, Jean-Marc Spaggiari <
> > >jean-m...@spaggiari.org> wrote:
> > >
> > >> Hi Jianshi,
> > >>
> > >> You might want to upload the files to the destination cluster first and
> > >> then re-run your bulk load from there. That way the transfer time will
> > >> not be taken into consideration for the timeout, since the files will be
> > >> local.
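> > >>
> > >> For what it's worth, the load can then be kicked off from a node in that
> > >> cluster, either with the completebulkload tool or programmatically; a
> > >> minimal sketch with the client API (the HFile path and table name are
> > >> placeholders):
> > >>
> > >>     import org.apache.hadoop.conf.Configuration;
> > >>     import org.apache.hadoop.fs.Path;
> > >>     import org.apache.hadoop.hbase.HBaseConfiguration;
> > >>     import org.apache.hadoop.hbase.client.HTable;
> > >>     import org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles;
> > >>
> > >>     public class BulkLoad {
> > >>       public static void main(String[] args) throws Exception {
> > >>         Configuration conf = HBaseConfiguration.create();
> > >>         // Directory of generated HFiles, one subdirectory per column family
> > >>         Path hfileDir = new Path("/path/to/hfiles");
> > >>         HTable table = new HTable(conf, "your_table");
> > >>         new LoadIncrementalHFiles(conf).doBulkLoad(hfileDir, table);
> > >>       }
> > >>     }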
> > >>
> > >> JM
> > >>
> > >>
> > >> 2014-09-05 11:15 GMT-04:00 Jianshi Huang <jianshi.hu...@gmail.com>:
> > >>
> > >> > I'm importing 2TB of generated HFiles to HBase and I constantly get the
> > >> > following errors:
> > >> >
> > >> > Caused by:
> > >> > org.apache.hadoop.hbase.ipc.RemoteWithExtrasException(org.apache.hadoop.hbase.RegionTooBusyException):
> > >> > org.apache.hadoop.hbase.RegionTooBusyException: failed to get a lock in
> > >> > 60000 ms.
> > >> > regionName=grapple_edges_v2,ff000000,1409817320781.6d2955c780b39523de733f3565642d96.,
> > >> > server=xxxxx.xxx.xxx,60020,1404854700728
> > >> >         at org.apache.hadoop.hbase.regionserver.HRegion.lock(HRegion.java:5851)
> > >> >         at org.apache.hadoop.hbase.regionserver.HRegion.lock(HRegion.java:5837)
> > >> >         at org.apache.hadoop.hbase.regionserver.HRegion.startBulkRegionOperation(HRegion.java:5795)
> > >> >         at org.apache.hadoop.hbase.regionserver.HRegion.bulkLoadHFiles(HRegion.java:3543)
> > >> >         at org.apache.hadoop.hbase.regionserver.HRegion.bulkLoadHFiles(HRegion.java:3525)
> > >> >         at org.apache.hadoop.hbase.regionserver.HRegionServer.bulkLoadHFile(HRegionServer.java:3277)
> > >> >         at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:28863)
> > >> >         at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2008)
> > >> >         at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:92)
> > >> >         at org.apache.hadoop.hbase.ipc.SimpleRpcScheduler.consumerLoop(SimpleRpcScheduler.java:160)
> > >> >         at org.apache.hadoop.hbase.ipc.SimpleRpcScheduler.access$000(SimpleRpcScheduler.java:38)
> > >> >         at org.apache.hadoop.hbase.ipc.SimpleRpcScheduler$1.run(SimpleRpcScheduler.java:110)
> > >> >         at java.lang.Thread.run(Thread.java:724)
> > >> >
> > >> >         at org.apache.hadoop.hbase.ipc.RpcClient.call(RpcClient.java:1498)
> > >> >         at org.apache.hadoop.hbase.ipc.RpcClient.callBlockingMethod(RpcClient.java:1684)
> > >> >         at org.apache.hadoop.hbase.ipc.RpcClient$BlockingRpcChannelImplementation.callBlockingMethod(RpcClient.java:1737)
> > >> >         at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$BlockingStub.bulkLoadHFile(ClientProtos.java:29276)
> > >> >         at org.apache.hadoop.hbase.protobuf.ProtobufUtil.bulkLoadHFile(ProtobufUtil.java:1548)
> > >> >         ... 11 more
> > >> >
> > >> >
> > >> > What makes the region too busy? Is there a way to improve it?
> > >> >
> > >> > Does that also mean some part of my data is not correctly imported?
> > >> >
> > >> >
> > >> > Thanks,
> > >> >
> > >> > --
> > >> > Jianshi Huang
> > >> >
> > >> > LinkedIn: jianshi
> > >> > Twitter: @jshuang
> > >> > Github & Blog: http://huangjs.github.com/
> > >> >
> > >>
> > >
> > >
> > >
> > >--
> > >Jianshi Huang
> > >
> > >LinkedIn: jianshi
> > >Twitter: @jshuang
> > >Github & Blog: http://huangjs.github.com/
> >
> >
>
>
> --
> Jianshi Huang
>
> LinkedIn: jianshi
> Twitter: @jshuang
> Github & Blog: http://huangjs.github.com/
>
