Hi,

I'm scanning a relatively large table stored in HBase using Pig.
I've got a column family named event_data with three columns (tab_event_string, 
date and Id).
The table is indexed by a composite row key made of an event code and a timestamp, 
so a key looks roughly like 000100_<timestamp> (matching the -gte/-lte values in 
the script below).
Nothing special about this table except that it is relatively large.

Below is the Pig code for scanning the table (the parameters $GTE and $LTE are 
basically the begin and end timestamps).

RAW_1 = LOAD 'my_events'
USING org.apache.pig.backend.hadoop.hbase.HBaseStorage(
        'event_data:tab_event_string event_data:date event_data:Id',
        '-caching 100 -limit 2000 -gte 000100_$GTE -lte 000100_$LTE -caster HBaseBinaryConverter'
)
AS (tab_event_string:bytearray, date:bytearray, Id:bytearray);
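
For reference, the script is launched more or less like this (the script name and 
the timestamp values here are just placeholders, not the real ones):

pig -param GTE=20110701000000 -param LTE=20110708000000 scan_events.pig

so for such a run the scan would be bounded to row keys between 
000100_20110701000000 and 000100_20110708000000.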

Now, the problem is that one of the mappers scanning this table always takes 
too long to initialize. I always get two messages like these (for attempts 0 and 1):

Task attempt_201107151702_48728_m_000000_0 failed to report status for 600 seconds. Killing!
Task attempt_201107151702_48728_m_000000_1 failed to report status for 600 seconds. Killing!


And once it does initialize (e.g., at attempt 2), I always end up with a scanner 
exception:

org.apache.hadoop.hbase.client.ScannerTimeoutException: 61056ms passed since the last invocation, timeout is currently set to 60000
        at org.apache.hadoop.hbase.client.HTable$ClientScanner.next(HTable.java:1128)
        at org.apache.hadoop.hbase.mapreduce.TableRecordReaderImpl.nextKeyValue(TableRecordReaderImpl.java:143)
        at org.apache.hadoop.hbase.mapreduce.TableRecordReader.nextKeyValue(TableRecordReader.java:142)
        at org.apache.pig.backend.hadoop.hbase.HBaseTableInputFormat$HBaseTableRecordReader.nextKeyValue(HBaseTableInputFormat.java:162)
        at org.apache.pig.backend.hadoop.hbase.HBaseStorage.getNext(HBaseStorage.java:319)
        at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigRecordReader.nextKeyValue(PigRecordReader.java:187)
        at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.nextKeyValue(MapTask.java:423)
        at org.apache.hadoop.mapreduce.MapContext.nextKeyValue(MapContext.java:67)
        at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:143)
        at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:621)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
        at org.apache.hadoop.mapred.Child.main(Child.java:170)
Caused by: org.apache.hadoop.hbase.UnknownScannerException: org.apache.hadoop.hbase.UnknownScannerException: Name: 5364262096576298375
        at org.apache.hadoop.hbase.regionserver.HRegionServer.next(HRegionServer.java:1794)
        at sun.reflect.GeneratedMethodAccessor16.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at org.apache.hadoop.hbase.ipc.HBaseRPC$Server.call(HBaseRPC.java:570)
        at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1039)

        at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
        at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
        at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
        at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
        at org.apache.hadoop.hbase.RemoteExceptionHandler.decodeRemoteException(RemoteExceptionHandler.java:96)
        at org.apache.hadoop.hbase.client.ScannerCallable.call(ScannerCallable.java:83)
        at org.apache.hadoop.hbase.client.ScannerCallable.call(ScannerCallable.java:38)
        at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.getRegionServerWithRetries(HConnectionManager.java:1012)
        at org.apache.hadoop.hbase.client.HTable$ClientScanner.next(HTable.java:1119)
        ... 11 more

Other task attempts usually fail as well:

Task attempt_201107151702_48728_m_000000_3 failed to report status for 600 seconds. Killing!

...

Now, this *only happens for one mapper*: mapper 0.
No matter how I change the scan parameters (different begin/end timestamps, more 
data vs. less data, different caching settings, etc.; one such variation is shown 
below), it always ends up the same way: mapper 0 fails, even when the other ~500 
mappers succeed.
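
For example, one of the variations I tried (the caching value here is just 
illustrative) was to lower -caching so each client round trip to the region 
server fetches fewer rows, keeping everything else the same:

RAW_1 = LOAD 'my_events'
USING org.apache.pig.backend.hadoop.hbase.HBaseStorage(
        'event_data:tab_event_string event_data:date event_data:Id',
        '-caching 10 -limit 2000 -gte 000100_$GTE -lte 000100_$LTE -caster HBaseBinaryConverter'
)
AS (tab_event_string:bytearray, date:bytearray, Id:bytearray);

Mapper 0 still failed in exactly the same way.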

I really don't have a clue why this is happening. The most intriguing part is 
that it always happens for mapper 0, no matter which machine in the cluster it 
runs on.

Does anyone have a clue about this?

Thanks!

Luis