> This is definitely a curious problem.
>

It's data corruption. The file is tab-separated, so I created a quick Perl
pipeline that strips each line down to its tabs (rendered as dashes) and then
tallies how many lines have each tab count:

-bash-3.2$ hadoop fs -cat /user/hive/warehouse/ushb/2010-10-25/data-2010-10-25 \
    | perl -pe 's/[^\t\n]//g' | perl -pe 's/\t/-/g' | sort | uniq -c

The STDOUT was slightly disturbing:

      1 --
1552318 -------
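
For what it's worth, the stray two-tab line can be hunted down with a similar
one-liner. This is only a sketch, assuming seven tabs is the correct count for
a healthy row (as the tally above suggests); $. is just Perl's current line
number:

-bash-3.2$ hadoop fs -cat /user/hive/warehouse/ushb/2010-10-25/data-2010-10-25 \
    | perl -ne 'print "$.\t$_" if tr/\t// != 7'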

The STDERR was more disturbing still:

11/05/04 11:07:49 INFO hdfs.DFSClient: No node available for block:
blk_-1511269407958713809_10494
file=/user/hive/warehouse/ushb/2010-10-25/data-2010-10-25
11/05/04 11:07:49 INFO hdfs.DFSClient: Could not obtain block
blk_-1511269407958713809_10494 from any node: java.io.IOException: No live
nodes contain current block. Will get new block locations from namenode and
retry...
11/05/04 11:07:52 INFO hdfs.DFSClient: No node available for block:
blk_-1511269407958713809_10494
file=/user/hive/warehouse/ushb/2010-10-25/data-2010-10-25
11/05/04 11:07:52 INFO hdfs.DFSClient: Could not obtain block
blk_-1511269407958713809_10494 from any node: java.io.IOException: No live
nodes contain current block. Will get new block locations from namenode and
retry...
11/05/04 11:07:58 WARN hdfs.DFSClient: DFS Read: java.io.IOException: Could
not obtain block: blk_-1511269407958713809_10494
file=/user/hive/warehouse/ushb/2010-10-25/data-2010-10-25
    at
org.apache.hadoop.hdfs.DFSClient$DFSInputStream.chooseDataNode(DFSClient.java:1977)
    at
org.apache.hadoop.hdfs.DFSClient$DFSInputStream.blockSeekTo(DFSClient.java:1784)
    at
org.apache.hadoop.hdfs.DFSClient$DFSInputStream.read(DFSClient.java:1932)
    at java.io.DataInputStream.read(DataInputStream.java:83)
(...etc)
cat: Could not obtain block: blk_-1511269407958713809_10494
file=/user/hive/warehouse/ushb/2010-10-25/data-2010-10-25
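
The "Could not obtain block" messages say no live datanode could serve that
block. A quick way to check the health of that block's replicas (just the
standard fsck invocation, nothing exotic) would be:

hadoop fsck /user/hive/warehouse/ushb/2010-10-25/data-2010-10-25 -files -blocks -locations

which should report the file's block list, replica locations, and any missing
or corrupt blocks.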

-- 
Tim Ellis
Riot Games
