Hello, I have a small query and need a little help with it. I have a Hive table that loads its data from files partitioned by timestamp (every 15 minutes) and placed there in gzipped format. Some of the gzip files may be corrupted (a network error while transferring the files may have produced a truncated or corrupted file).
Now, when I run any job on this table that tries to dump data into another table, the Hive job fails with the following error:

```
FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.ExecDriver
```

*Is there a way to catch this error so that I can ignore the corrupted files and still get the job to complete?*

The *Hadoop log* shows that, if I read it correctly, the error occurs while uncompressing one of my gzipped files:

```
2010-10-15 10:38:48,027 INFO org.apache.hadoop.mapred.TaskInProgress (IPC Server handler 2 on 9001): Error from attempt_201010150837_0002_m_000041_0: java.io.EOFException: Unexpected end of input stream
    at org.apache.hadoop.io.compress.DecompressorStream.getCompressedData(DecompressorStream.java:98)
    at org.apache.hadoop.io.compress.DecompressorStream.decompress(DecompressorStream.java:86)
    at org.apache.hadoop.io.compress.DecompressorStream.read(DecompressorStream.java:74)
    at java.io.InputStream.read(InputStream.java:85)
    at org.apache.hadoop.mapred.LineRecordReader$LineReader.backfill(LineRecordReader.java:94)
    at org.apache.hadoop.mapred.LineRecordReader$LineReader.readLine(LineRecordReader.java:124)
    at org.apache.hadoop.mapred.LineRecordReader.next(LineRecordReader.java:266)
    at org.apache.hadoop.mapred.LineRecordReader.next(LineRecordReader.java:39)
    at org.apache.hadoop.hive.ql.io.HiveRecordReader.next(HiveRecordReader.java:58)
    at org.apache.hadoop.hive.ql.io.HiveRecordReader.next(HiveRecordReader.java:27)
    at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.next(MapTask.java:167)
    at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:45)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:231)
    at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:2216)
```
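One workaround I am considering is to validate each gzip file in the staging area before it is moved into the partition directory, so corrupted files never reach Hive in the first place. Here is a minimal sketch of what I mean (this assumes the files are still on a local staging disk where I can read them directly; the class name and paths are just placeholders):

```java
import java.io.FileInputStream;
import java.io.IOException;
import java.util.zip.GZIPInputStream;

public class GzipCheck {

    // Returns true if the whole file decompresses cleanly; a truncated or
    // corrupted stream throws an IOException (e.g. EOFException, ZipException),
    // which is exactly what the failing map tasks are hitting.
    static boolean isValidGzip(String path) {
        byte[] buf = new byte[64 * 1024];
        try (GZIPInputStream in = new GZIPInputStream(new FileInputStream(path))) {
            // Drain the stream; we only care whether it decodes end to end.
            while (in.read(buf) != -1) {
                // discard the decompressed bytes
            }
            return true;
        } catch (IOException e) {
            // Covers unreadable files as well as corrupted gzip data.
            return false;
        }
    }

    public static void main(String[] args) {
        for (String path : args) {
            System.out.println(path + " -> " + (isValidGzip(path) ? "OK" : "CORRUPTED"));
        }
    }
}
```

The shell equivalent would be `gzip -t <file>` on each file before transfer. I have also seen the Hadoop property `mapred.max.map.failures.percent` mentioned as a way to let a job succeed despite a few failed map tasks, but I am not sure whether Hive honors it for this kind of job, so any pointers would be appreciated.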