Hi, I'm scanning a relatively large table stored in HBase using Pig. I have a column family named event_data with three columns (tab_event_string, date, and Id). The table is indexed by a row key composed of an event code and a timestamp. Nothing special about this table except that it is relatively large.
Below is the Pig code for scanning the table (the parameters $GTE and $LTE are the begin and end timestamps).

RAW_1 = LOAD 'my_events' USING org.apache.pig.backend.hadoop.hbase.HBaseStorage(
    'event_data:tab_event_string event_data:date event_data:Id',
    '-caching 100 -limit 2000 -gte 000100_$GTE -lte 000100_$LTE -caster HBaseBinaryConverter')
  AS (tab_event_string:bytearray, date:bytearray, Id:bytearray);

Now, the problem is that one of the mappers scanning this table always takes too long to initialize. I always get two messages (for attempts 0 and 1):

Task attempt_201107151702_48728_m_000000_0 failed to report status for 600 seconds. Killing!
Task attempt_201107151702_48728_m_000000_1 failed to report status for 600 seconds. Killing!

And once it initializes - e.g. at attempt 2 - I always end up with a scanner exception:

org.apache.hadoop.hbase.client.ScannerTimeoutException: 61056ms passed since the last invocation, timeout is currently set to 60000
    at org.apache.hadoop.hbase.client.HTable$ClientScanner.next(HTable.java:1128)
    at org.apache.hadoop.hbase.mapreduce.TableRecordReaderImpl.nextKeyValue(TableRecordReaderImpl.java:143)
    at org.apache.hadoop.hbase.mapreduce.TableRecordReader.nextKeyValue(TableRecordReader.java:142)
    at org.apache.pig.backend.hadoop.hbase.HBaseTableInputFormat$HBaseTableRecordReader.nextKeyValue(HBaseTableInputFormat.java:162)
    at org.apache.pig.backend.hadoop.hbase.HBaseStorage.getNext(HBaseStorage.java:319)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigRecordReader.nextKeyValue(PigRecordReader.java:187)
    at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.nextKeyValue(MapTask.java:423)
    at org.apache.hadoop.mapreduce.MapContext.nextKeyValue(MapContext.java:67)
    at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:143)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:621)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
    at org.apache.hadoop.mapred.Child.main(Child.java:170)
Caused by: org.apache.hadoop.hbase.UnknownScannerException: org.apache.hadoop.hbase.UnknownScannerException: Name: 5364262096576298375
    at org.apache.hadoop.hbase.regionserver.HRegionServer.next(HRegionServer.java:1794)
    at sun.reflect.GeneratedMethodAccessor16.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.hbase.ipc.HBaseRPC$Server.call(HBaseRPC.java:570)
    at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1039)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
    at org.apache.hadoop.hbase.RemoteExceptionHandler.decodeRemoteException(RemoteExceptionHandler.java:96)
    at org.apache.hadoop.hbase.client.ScannerCallable.call(ScannerCallable.java:83)
    at org.apache.hadoop.hbase.client.ScannerCallable.call(ScannerCallable.java:38)
    at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.getRegionServerWithRetries(HConnectionManager.java:1012)
    at org.apache.hadoop.hbase.client.HTable$ClientScanner.next(HTable.java:1119)
    ... 11 more

Other task attempts usually also fail:

Task attempt_201107151702_48728_m_000000_3 failed to report status for 600 seconds. Killing!
...
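For context, my understanding is that the ScannerTimeoutException means more than the scanner lease period (60000 ms, which matches the "timeout is currently set to 60000" in the message above) passed between two consecutive next() calls, so the region server expired the scanner. Just to illustrate the kind of variation I have already tried (the -caching 10 value below is only an example, not a recommendation), here is the same LOAD with a smaller caching value so each scanner RPC returns fewer rows:

-- Same LOAD as above, only the -caching value lowered (illustrative value)
-- so that each next() call on the scanner fetches fewer rows per RPC.
RAW_1 = LOAD 'my_events' USING org.apache.pig.backend.hadoop.hbase.HBaseStorage(
    'event_data:tab_event_string event_data:date event_data:Id',
    '-caching 10 -limit 2000 -gte 000100_$GTE -lte 000100_$LTE -caster HBaseBinaryConverter')
  AS (tab_event_string:bytearray, date:bytearray, Id:bytearray);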
Now, this *only happens for one mapper* --> mapper_0. No matter how I change my scanning parameters - different begin/end timestamps, more data vs. less data, different caching, etc. - it always ends up the same way: mapper 0 fails, even when 500 other mappers succeed. I really don't have a clue why this is happening. The most intriguing part is that it always happens for mapper 0, no matter which machine of the cluster it runs on. Does anyone have a clue about this?

Thanks!
Luis