Hello, I'm using Hortonworks Data Platform 2.3.4, which includes Apache Hive 1.2.1 and Apache Storm 0.10. I've built a Storm topology using the Hive Bolt, which in turn uses the Hive Streaming API to stream data into a Hive table.
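The bolt is wired roughly like this (a minimal sketch, not my exact code; the metastore URI, column list, and batch settings are placeholders):

```java
import backtype.storm.tuple.Fields;
import org.apache.storm.hive.bolt.HiveBolt;
import org.apache.storm.hive.bolt.mapper.DelimitedRecordHiveMapper;
import org.apache.storm.hive.common.HiveOptions;

public class HiveBoltWiring {
    public static HiveBolt buildHiveBolt() {
        // Map tuple fields to table columns; "dt" is the partition column.
        DelimitedRecordHiveMapper mapper = new DelimitedRecordHiveMapper()
                .withColumnFields(new Fields("telcoId" /* , ...the other CDR columns */))
                .withPartitionFields(new Fields("dt"));

        // Metastore URI and batching values below are placeholders.
        HiveOptions options = new HiveOptions(
                "thrift://metastore-host:9083", "default", "cdr1", mapper)
                .withTxnsPerBatch(10)
                .withBatchSize(1000)
                .withIdleTimeout(10);

        return new HiveBolt(options);
    }
}
```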
In Hive I've created a transactional table:

```sql
CREATE EXTERNAL TABLE cdr1 (
  ........
)
PARTITIONED BY (dt INT)
CLUSTERED BY (telcoId) INTO 5 buckets
STORED AS ORC
LOCATION '/data/sorm3/cdr/cdr1'
TBLPROPERTIES ("transactional"="true")
```

Hive settings:

```
hive.support.concurrency=true
hive.enforce.bucketing=true
hive.exec.dynamic.partition.mode=nonstrict
hive.txn.manager=org.apache.hadoop.hive.ql.lockmgr.DbTxnManager
hive.compactor.initiator.on=true
hive.compactor.worker.threads=1
```

When I run my Storm topology it fails with an OutOfMemoryException. The Storm exception itself doesn't bother me, it was just a test. But after the topology fails, my Hive table is left inconsistent: a simple SELECT from the table throws an exception.

```sql
SELECT COUNT(*) FROM cdr1
```

```
ERROR : Status: Failed
ERROR : Vertex failed, vertexName=Map 1, vertexId=vertex_1453891518300_0098_1_00, diagnostics=[Task failed, taskId=task_1453891518300_0098_1_00_000000, diagnostics=[TaskAttempt 0 failed, info=[Error: Failure while running task:java.lang.RuntimeException: java.lang.RuntimeException: java.io.IOException: java.io.EOFException
....
Caused by: java.io.IOException: java.io.EOFException
	at org.apache.hadoop.hive.io.HiveIOExceptionHandlerChain.handleRecordReaderCreationException(HiveIOExceptionHandlerChain.java:97)
	at org.apache.hadoop.hive.io.HiveIOExceptionHandlerUtil.handleRecordReaderCreationException(HiveIOExceptionHandlerUtil.java:57)
	at org.apache.hadoop.hive.ql.io.HiveInputFormat.getRecordReader(HiveInputFormat.java:251)
	at org.apache.hadoop.mapred.split.TezGroupedSplitsInputFormat$TezGroupedSplitsRecordReader.initNextRecordReader(TezGroupedSplitsInputFormat.java:193)
	... 19 more
Caused by: java.io.EOFException
	at java.io.DataInputStream.readFully(DataInputStream.java:197)
	at org.apache.hadoop.hive.ql.io.orc.ReaderImpl.extractMetaInfoFromFooter(ReaderImpl.java:370)
	at org.apache.hadoop.hive.ql.io.orc.ReaderImpl.<init>(ReaderImpl.java:317)
	at org.apache.hadoop.hive.ql.io.orc.OrcFile.createReader(OrcFile.java:238)
	at org.apache.hadoop.hive.ql.io.orc.OrcRawRecordMerger.<init>(OrcRawRecordMerger.java:460)
	at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.getReader(OrcInputFormat.java:1269)
	at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.getRecordReader(OrcInputFormat.java:1151)
	at org.apache.hadoop.hive.ql.io.HiveInputFormat.getRecordReader(HiveInputFormat.java:249)
	... 20 more
]], Vertex did not succeed due to OWN_TASK_FAILURE, failedTasks:1 killedTasks:0, Vertex vertex_1453891518300_0098_1_00 [Map 1] killed/failed due to:OWN_TASK_FAILURE]
ERROR : Vertex killed, vertexName=Reducer 2, vertexId=vertex_1453891518300_0098_1_01, diagnostics=[Vertex received Kill while in RUNNING state., Vertex did not succeed due to OTHER_VERTEX_FAILURE, failedTasks:0 killedTasks:1, Vertex vertex_1453891518300_0098_1_01 [Reducer 2] killed/failed due to:OTHER_VERTEX_FAILURE]
ERROR : DAG did not succeed due to VERTEX_FAILURE. failedVertices:1 killedVertices:1
```

Compaction fails with the same exception:

```
2016-03-10 13:20:54,550 WARN [main] org.apache.hadoop.mapred.YarnChild: Exception running child : java.io.EOFException: Cannot seek after EOF
	at org.apache.hadoop.hdfs.DFSInputStream.seek(DFSInputStream.java:1488)
	at org.apache.hadoop.fs.FSDataInputStream.seek(FSDataInputStream.java:62)
	at org.apache.hadoop.hive.ql.io.orc.ReaderImpl.extractMetaInfoFromFooter(ReaderImpl.java:368)
	at org.apache.hadoop.hive.ql.io.orc.ReaderImpl.<init>(ReaderImpl.java:317)
	at org.apache.hadoop.hive.ql.io.orc.OrcFile.createReader(OrcFile.java:238)
	at org.apache.hadoop.hive.ql.io.orc.OrcRawRecordMerger.<init>(OrcRawRecordMerger.java:460)
	at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.getRawReader(OrcInputFormat.java:1362)
	at org.apache.hadoop.hive.ql.txn.compactor.CompactorMR$CompactorMap.map(CompactorMR.java:565)
	at org.apache.hadoop.hive.ql.txn.compactor.CompactorMR$CompactorMap.map(CompactorMR.java:544)
	at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:54)
	at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:453)
	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
	at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:422)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
	at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:162)
```

Looking through the files created by the streaming ingest, I found several zero-sized ORC files, and these are probably what cause the exception. Is this normal for a Hive transactional table? How can I prevent such behavior?
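For reference, this is roughly how I looked for the empty files (a quick sketch using the Hadoop FileSystem API; the path is the table location from the DDL above):

```java
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.LocatedFileStatus;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.RemoteIterator;

public class FindEmptyOrcFiles {
    public static void main(String[] args) throws IOException {
        // Table location from the CREATE TABLE statement above.
        Path tableLocation = new Path("/data/sorm3/cdr/cdr1");

        FileSystem fs = FileSystem.get(new Configuration());

        // Walk the partition/delta directories recursively and report zero-length files.
        RemoteIterator<LocatedFileStatus> it = fs.listFiles(tableLocation, true);
        while (it.hasNext()) {
            LocatedFileStatus status = it.next();
            if (status.isFile() && status.getLen() == 0) {
                System.out.println("zero-length file: " + status.getPath());
            }
        }
    }
}
```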