[ https://issues.apache.org/jira/browse/HIVE-12091?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Rui Li updated HIVE-12091:
--------------------------
    Summary: Merge file doesn't work for ORC table when running on Spark. [Spark Branch]  (was: HiveException (Failed to close AbstractFileMergeOperator) occurs during loading data to ORC file, when hive.merge.sparkfiles is set to true. [Spark Branch])

> Merge file doesn't work for ORC table when running on Spark. [Spark Branch]
> ---------------------------------------------------------------------------
>
>                 Key: HIVE-12091
>                 URL: https://issues.apache.org/jira/browse/HIVE-12091
>             Project: Hive
>          Issue Type: Sub-task
>          Components: Spark
>            Reporter: Xin Hao
>            Assignee: Rui Li
>         Attachments: HIVE-12091.1-spark.patch, HIVE-12091.2-spark.patch
>
>
> This issue occurs when hive.merge.sparkfiles is set to true, and can be worked around by setting hive.merge.sparkfiles to false (see the settings sketch at the end of this message). For comparison, we ran the same case locally with the MR engine (set hive.merge.mapfiles=true; set hive.merge.mapredfiles=true;) and it passes.
> (1) Component versions:
> -- Hive Spark Branch 70eeadd2f019dcb2e301690290c8807731eab7a1 + the HIVE-11473 patch (HIVE-11473.3-spark.patch), which adds Spark 1.5 support for Hive on Spark
> -- Spark 1.5.1
> (2) Case used:
> -- Big-Bench data load (loading data from HDFS into the Hive warehouse, stored as ORC). The related HiveQL:
> {noformat}
> DROP TABLE IF EXISTS customer_temporary;
> CREATE EXTERNAL TABLE customer_temporary
> ( c_customer_sk bigint --not null
> , c_customer_id string --not null
> , c_current_cdemo_sk bigint
> , c_current_hdemo_sk bigint
> , c_current_addr_sk bigint
> , c_first_shipto_date_sk bigint
> , c_first_sales_date_sk bigint
> , c_salutation string
> , c_first_name string
> , c_last_name string
> , c_preferred_cust_flag string
> , c_birth_day int
> , c_birth_month int
> , c_birth_year int
> , c_birth_country string
> , c_login string
> , c_email_address string
> , c_last_review_date string
> )
> ROW FORMAT DELIMITED FIELDS TERMINATED BY '|'
> STORED AS TEXTFILE LOCATION '/user/root/benchmarks/bigbench_n1t/data/customer'
> ;
> DROP TABLE IF EXISTS customer;
> CREATE TABLE customer
> STORED AS ORC
> AS
> SELECT * FROM customer_temporary
> ;
> {noformat}
> (3) Error/Exception Message:
> {noformat}
> 15/10/12 14:28:38 INFO exec.Utilities: PLAN PATH = hdfs://bhx2:8020/tmp/hive/root/4e145415-d4ea-4751-9e16-ff31edb0c258/hive_2015-10-12_14-28-12_485_2093357701513622173-1/-mr-10005/d891fdec-eacc-4f66-8827-e2b650c24810/map.xml
> 15/10/12 14:28:38 INFO OrcFileMergeOperator: ORC merge file input path: hdfs://bhx2:8020/user/hive/warehouse/bigbench_n100g.db/.hive-staging_hive_2015-10-12_14-28-12_485_2093357701513622173-1/-ext-10003/000001_0
> 15/10/12 14:28:38 INFO OrcFileMergeOperator: Merged stripe from file hdfs://bhx2:8020/user/hive/warehouse/bigbench_n100g.db/.hive-staging_hive_2015-10-12_14-28-12_485_2093357701513622173-1/-ext-10003/000001_0 [ offset : 3 length: 10525754 row: 247500 ]
> 15/10/12 14:28:38 INFO spark.SparkMergeFileRecordHandler: Closing Merge Operator OFM
> 15/10/12 14:28:38 ERROR executor.Executor: Exception in task 1.0 in stage 1.0 (TID 4)
> java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Failed to close AbstractFileMergeOperator
>         at org.apache.hadoop.hive.ql.exec.spark.SparkMergeFileRecordHandler.close(SparkMergeFileRecordHandler.java:115)
>         at org.apache.hadoop.hive.ql.exec.spark.HiveMapFunctionResultList.closeRecordProcessor(HiveMapFunctionResultList.java:58)
>         at org.apache.hadoop.hive.ql.exec.spark.HiveBaseFunctionResultList$ResultIterator.hasNext(HiveBaseFunctionResultList.java:106)
>         at scala.collection.convert.Wrappers$JIteratorWrapper.hasNext(Wrappers.scala:41)
>         at scala.collection.Iterator$class.foreach(Iterator.scala:727)
>         at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
>         at org.apache.spark.rdd.AsyncRDDActions$$anonfun$foreachAsync$1$$anonfun$apply$15.apply(AsyncRDDActions.scala:118)
>         at org.apache.spark.rdd.AsyncRDDActions$$anonfun$foreachAsync$1$$anonfun$apply$15.apply(AsyncRDDActions.scala:118)
>         at org.apache.spark.SparkContext$$anonfun$37.apply(SparkContext.scala:1984)
>         at org.apache.spark.SparkContext$$anonfun$37.apply(SparkContext.scala:1984)
>         at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
>         at org.apache.spark.scheduler.Task.run(Task.scala:88)
>         at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>         at java.lang.Thread.run(Thread.java:745)
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Failed to close AbstractFileMergeOperator
>         at org.apache.hadoop.hive.ql.exec.AbstractFileMergeOperator.closeOp(AbstractFileMergeOperator.java:235)
>         at org.apache.hadoop.hive.ql.exec.OrcFileMergeOperator.closeOp(OrcFileMergeOperator.java:236)
>         at org.apache.hadoop.hive.ql.exec.spark.SparkMergeFileRecordHandler.close(SparkMergeFileRecordHandler.java:113)
>         ... 15 more
> Caused by: java.io.IOException: Unable to rename hdfs://bhx2:8020/user/hive/warehouse/bigbench_n100g.db/.hive-staging_hive_2015-10-12_14-28-12_485_2093357701513622173-1/_task_tmp.-ext-10001/_tmp.000001_0 to hdfs://bhx2:8020/user/hive/warehouse/bigbench_n100g.db/.hive-staging_hive_2015-10-12_14-28-12_485_2093357701513622173-1/_tmp.-ext-10001/000001_0
>         at org.apache.hadoop.hive.ql.exec.AbstractFileMergeOperator.closeOp(AbstractFileMergeOperator.java:208)
>         ... 17 more
> {noformat}

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
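[Editor's note] For readers who only need the mitigation, below is a minimal HiveQL sketch of the session settings discussed in the report. The hive.merge.sparkfiles, hive.merge.mapfiles, and hive.merge.mapredfiles flags are the ones named above; the hive.execution.engine property is an assumption about how the engine is switched in this setup, not something stated in the original report.

{noformat}
-- Workaround on the Spark engine: disable the merge-file stage so the
-- failing AbstractFileMergeOperator close/rename path is never reached.
set hive.execution.engine=spark;  -- assumed engine selector, not from the report
set hive.merge.sparkfiles=false;

-- Comparison run on the MR engine, where the reporter says the same
-- load passes with small-file merging enabled.
set hive.execution.engine=mr;     -- assumed engine selector, not from the report
set hive.merge.mapfiles=true;
set hive.merge.mapredfiles=true;
{noformat}

Note the trade-off: with hive.merge.sparkfiles=false the load succeeds but the resulting ORC table may contain more, smaller files, since the post-job merge step is skipped entirely.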