[ https://issues.apache.org/jira/browse/HIVE-15189?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Alan Gates resolved HIVE-15189.
-------------------------------
    Resolution: Not A Bug

> No base file for ACID table
> ---------------------------
>
>                 Key: HIVE-15189
>                 URL: https://issues.apache.org/jira/browse/HIVE-15189
>             Project: Hive
>          Issue Type: Bug
>          Components: Transactions
>    Affects Versions: 1.2.1
>         Environment: HDP 2.4, HDP 2.5
>            Reporter: Benjamin BONNET
>
> Hi,
> When one creates a new ACID table and inserts data into it using INSERT INTO, Hive does not write a 'base' file: it only creates delta directories (a minimal reproduction is sketched after this list). That can lead to at least two issues:
> # When you try to read the table, you might get a 'serious problem' error like the following:
> {noformat}
> java.lang.RuntimeException: serious problem
>     at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.generateSplitsInfo(OrcInputFormat.java:1021)
>     at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.getSplits(OrcInputFormat.java:1048)
>     at org.apache.spark.rdd.HadoopRDD.getPartitions(HadoopRDD.scala:199)
>     at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:239)
>     at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:237)
>     at scala.Option.getOrElse(Option.scala:120)
>     at org.apache.spark.rdd.RDD.partitions(RDD.scala:237)
>     at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:35)
>     at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:239)
>     at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:237)
>     at scala.Option.getOrElse(Option.scala:120)
>     at org.apache.spark.rdd.RDD.partitions(RDD.scala:237)
>     at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:35)
>     at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:239)
>     at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:237)
>     at scala.Option.getOrElse(Option.scala:120)
>     at org.apache.spark.rdd.RDD.partitions(RDD.scala:237)
>     at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:35)
>     at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:239)
>     at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:237)
>     at scala.Option.getOrElse(Option.scala:120)
>     at org.apache.spark.rdd.RDD.partitions(RDD.scala:237)
>     at org.apache.spark.SparkContext.runJob(SparkContext.scala:1929)
>     at org.apache.spark.rdd.RDD$$anonfun$collect$1.apply(RDD.scala:927)
>     at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:150)
>     at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:111)
>     at org.apache.spark.rdd.RDD.withScope(RDD.scala:316)
>     at org.apache.spark.rdd.RDD.collect(RDD.scala:926)
>     at org.apache.spark.sql.execution.SparkPlan.executeCollect(SparkPlan.scala:166)
>     at org.apache.spark.sql.execution.SparkPlan.executeCollectPublic(SparkPlan.scala:174)
>     at org.apache.spark.sql.hive.HiveContext$QueryExecution.stringResult(HiveContext.scala:635)
>     at org.apache.spark.sql.hive.thriftserver.SparkSQLDriver.run(SparkSQLDriver.scala:64)
>     at org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.processCmd(SparkSQLCLIDriver.scala:311)
>     at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:376)
>     at org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver$.main(SparkSQLCLIDriver.scala:226)
>     at org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.main(SparkSQLCLIDriver.scala)
>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>     at java.lang.reflect.Method.invoke(Method.java:498)
>     at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:731)
>     at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:181)
>     at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:206)
>     at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121)
>     at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
> Caused by: java.util.concurrent.ExecutionException: java.lang.IllegalArgumentException: delta_0000000_0000000 does not start with base_
>     at java.util.concurrent.FutureTask.report(FutureTask.java:122)
>     at java.util.concurrent.FutureTask.get(FutureTask.java:192)
>     at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.generateSplitsInfo(OrcInputFormat.java:998)
>     ... 44 more
> {noformat}
> I do not always get this error, but when I do, I have to drop and recreate the table.
> # Spark SQL does not see the data as long as there is no base file. So until a compaction has run, no data can be read using Spark SQL. See https://issues.apache.org/jira/browse/SPARK-16996. A major compaction is sketched below.
> I know Hive always creates a base file when you INSERT OVERWRITE, but you cannot always use OVERWRITE instead of INTO: in my use case, a single statement writes data both into partitions that already exist and into partitions that do not exist yet (see the dynamic-partition sketch below).
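> For reference, a minimal reproduction sketch. Table name and values are hypothetical; this assumes a Hive 1.2 setup with hive.support.concurrency=true and hive.txn.manager=org.apache.hadoop.hive.ql.lockmgr.DbTxnManager, since ACID tables there must be bucketed and stored as ORC:
> {noformat}
> -- hypothetical ACID table; in Hive 1.2 it must be bucketed ORC with transactional=true
> CREATE TABLE acid_test (id INT, msg STRING)
> CLUSTERED BY (id) INTO 2 BUCKETS
> STORED AS ORC
> TBLPROPERTIES ('transactional'='true');
>
> -- INSERT INTO only ever produces a delta directory, never a base
> INSERT INTO TABLE acid_test VALUES (1, 'hello');
>
> -- the table directory then holds only a delta, something like (exact name varies):
> --   .../warehouse/acid_test/delta_0000001_0000001
> {noformat}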
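> As a workaround for the second issue, a major compaction rewrites the deltas into a base directory. A sketch, using the same hypothetical table (ALTER TABLE ... COMPACT only queues the request; SHOW COMPACTIONS reports its progress):
> {noformat}
> ALTER TABLE acid_test COMPACT 'major';
> SHOW COMPACTIONS;
> -- once the compaction finishes, the table directory contains a base, e.g. base_0000001 (name varies)
> {noformat}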
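> For the last point, a sketch of the kind of statement that rules out OVERWRITE (hypothetical table and column names; assumes hive.exec.dynamic.partition.mode=nonstrict): with dynamic partitions, INSERT INTO appends to partitions that already exist and creates the missing ones, whereas INSERT OVERWRITE would replace the existing contents of every partition the query writes to.
> {noformat}
> -- dt is the dynamic partition column; existing partitions are appended to,
> -- new partitions are created as needed
> INSERT INTO TABLE acid_target PARTITION (dt)
> SELECT id, msg, dt FROM staging;
> {noformat}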
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)