Hi,
This problem has blocked me for a whole week. Does anybody have any ideas? Many thanks.

Mh F

________________________________
From: 明浩 冯 <qiuff...@hotmail.com>
Sent: Monday, August 15, 2016 2:43:58 PM
To: user@hive.apache.org
Subject: hive throws ConcurrentModificationException when executing insert overwrite table

Hi everyone,

When I run the following SQL in beeline, Hive throws a ConcurrentModificationException. Does anybody know what is wrong with my Hive, or can you give me some ideas on how to track down the problem?

INSERT OVERWRITE TABLE kylin_intermediate_prom_group_by_ws_name_cur_cube_19700101010000_20100101010000
SELECT TBL_HIS_UWIP_SCAN_PROM.ORDER_NAME
FROM TESTMES.TBL_HIS_UWIP_SCAN_PROM AS TBL_HIS_UWIP_SCAN_PROM
WHERE (TBL_HIS_UWIP_SCAN_PROM.START_TIME >= '1970-01-01 01:00:00'
  AND TBL_HIS_UWIP_SCAN_PROM.START_TIME < '2010-01-01 01:00:00')
DISTRIBUTE BY RAND();

My environment is a 12-node cluster with:
Hadoop 2.7.2
Spark 1.6.2
ZooKeeper 3.4.6
HBase 1.2.2
Hive 2.1.0
Kylin 1.5.3

Some settings from hive-site.xml that may help you analyze the problem:
hive.support.concurrency=true
hive.lock.manager=org.apache.hadoop.hive.ql.lockmgr.zookeeper.ZooKeeperHiveLockManager
hive.execution.engine=spark
hive.server2.transport.mode=http
hive.server2.authentication=NONE

This is actually one step of building a Kylin cube. The SELECT query returns about 3,000,000 rows.
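One knob I am thinking about trying, since the failure in the log below happens in Hive's MoveTask (after the Spark stages finish), is the file-move thread pool. I am not sure this is the right setting for this problem, so please correct me if it is not:

```sql
-- hive.mv.files.thread controls the thread pool Hive uses to move result
-- files in MoveTask; setting it to 0 should make the moves single-threaded.
-- (My assumption, based on the setting's description; not a confirmed fix.)
SET hive.mv.files.thread=0;
```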
Here is the log I got from hive.log:

2016-08-12T18:43:07,473 INFO [HiveServer2-Background-Pool: Thread-83]: status.SparkJobMonitor (:()) - 2016-08-12 18:43:07,472 Stage-0_0: 58/58 Finished Stage-1_0: 13/13 Finished
2016-08-12T18:43:07,476 INFO [HiveServer2-Background-Pool: Thread-83]: status.SparkJobMonitor (:()) - Status: Finished successfully in 264.96 seconds
2016-08-12T18:43:07,488 INFO [HiveServer2-Background-Pool: Thread-83]: spark.SparkTask (:()) - =====Spark Job[85a00425-c044-4e22-b54a-f2c12feb4e82] statistics=====
2016-08-12T18:43:07,488 INFO [HiveServer2-Background-Pool: Thread-83]: spark.SparkTask (:()) - Spark Job[85a00425-c044-4e22-b54a-f2c12feb4e82] Metrics
2016-08-12T18:43:07,488 INFO [HiveServer2-Background-Pool: Thread-83]: spark.SparkTask (:()) - ExecutorDeserializeTime: 157772
2016-08-12T18:43:07,488 INFO [HiveServer2-Background-Pool: Thread-83]: spark.SparkTask (:()) - ExecutorRunTime: 4102583
2016-08-12T18:43:07,488 INFO [HiveServer2-Background-Pool: Thread-83]: spark.SparkTask (:()) - ResultSize: 149069
2016-08-12T18:43:07,488 INFO [HiveServer2-Background-Pool: Thread-83]: spark.SparkTask (:()) - JvmGCTime: 234246
2016-08-12T18:43:07,488 INFO [HiveServer2-Background-Pool: Thread-83]: spark.SparkTask (:()) - ResultSerializationTime: 23
2016-08-12T18:43:07,488 INFO [HiveServer2-Background-Pool: Thread-83]: spark.SparkTask (:()) - MemoryBytesSpilled: 0
2016-08-12T18:43:07,488 INFO [HiveServer2-Background-Pool: Thread-83]: spark.SparkTask (:()) - DiskBytesSpilled: 0
2016-08-12T18:43:07,488 INFO [HiveServer2-Background-Pool: Thread-83]: spark.SparkTask (:()) - BytesRead: 6831052047
2016-08-12T18:43:07,488 INFO [HiveServer2-Background-Pool: Thread-83]: spark.SparkTask (:()) - RemoteBlocksFetched: 702
2016-08-12T18:43:07,489 INFO [HiveServer2-Background-Pool: Thread-83]: spark.SparkTask (:()) - LocalBlocksFetched: 52
2016-08-12T18:43:07,489 INFO [HiveServer2-Background-Pool: Thread-83]: spark.SparkTask (:()) - TotalBlocksFetched: 754
2016-08-12T18:43:07,489 INFO [HiveServer2-Background-Pool: Thread-83]: spark.SparkTask (:()) - FetchWaitTime: 12
2016-08-12T18:43:07,489 INFO [HiveServer2-Background-Pool: Thread-83]: spark.SparkTask (:()) - RemoteBytesRead: 2611264054
2016-08-12T18:43:07,489 INFO [HiveServer2-Background-Pool: Thread-83]: spark.SparkTask (:()) - ShuffleBytesWritten: 2804791500
2016-08-12T18:43:07,489 INFO [HiveServer2-Background-Pool: Thread-83]: spark.SparkTask (:()) - ShuffleWriteTime: 56641742751
2016-08-12T18:43:07,489 INFO [HiveServer2-Background-Pool: Thread-83]: spark.SparkTask (:()) - HIVE
2016-08-12T18:43:07,489 INFO [HiveServer2-Background-Pool: Thread-83]: spark.SparkTask (:()) - CREATED_FILES: 13
2016-08-12T18:43:07,489 INFO [HiveServer2-Background-Pool: Thread-83]: spark.SparkTask (:()) - RECORDS_OUT_1_default.kylin_intermediate_prom_group_by_ws_name_cur_cube_19700101010000_20100101010000: 271942413
2016-08-12T18:43:07,489 INFO [HiveServer2-Background-Pool: Thread-83]: spark.SparkTask (:()) - RECORDS_IN: 1076808610
2016-08-12T18:43:07,489 INFO [HiveServer2-Background-Pool: Thread-83]: spark.SparkTask (:()) - RECORDS_OUT_INTERMEDIATE: 271942413
2016-08-12T18:43:07,489 INFO [HiveServer2-Background-Pool: Thread-83]: spark.SparkTask (:()) - DESERIALIZE_ERRORS: 0
2016-08-12T18:43:07,489 INFO [HiveServer2-Background-Pool: Thread-83]: spark.SparkTask (:()) - Execution completed successfully
2016-08-12T18:43:07,521 INFO [HiveServer2-Background-Pool: Thread-83]: exec.FileSinkOperator (:()) - Moving tmp dir: hdfs://bigdata/kylin/kylin_metadata/kylin-38250257-1649-4530-8ccb-975469aa6d22/kylin_intermediate_prom_group_by_ws_name_cur_cube_19700101010000_20100101010000/.hive-staging_hive_2016-08-12_18-37-50_610_2817977227856616745-2/_tmp.-ext-10000 to: hdfs://bigdata/kylin/kylin_metadata/kylin-38250257-1649-4530-8ccb-975469aa6d22/kylin_intermediate_prom_group_by_ws_name_cur_cube_19700101010000_20100101010000/.hive-staging_hive_2016-08-12_18-37-50_610_2817977227856616745-2/-ext-10000
2016-08-12T18:43:07,740 INFO [HiveServer2-Background-Pool: Thread-83]: ql.Driver (:()) - Starting task [Stage-0:MOVE] in serial mode
2016-08-12T18:43:07,741 INFO [HiveServer2-Background-Pool: Thread-83]: hive.metastore (:()) - Closed a connection to metastore, current connections: 1
2016-08-12T18:43:07,742 INFO [HiveServer2-Background-Pool: Thread-83]: exec.Task (:()) - Loading data to table default.kylin_intermediate_prom_group_by_ws_name_cur_cube_19700101010000_20100101010000 from hdfs://bigdata/kylin/kylin_metadata/kylin-38250257-1649-4530-8ccb-975469aa6d22/kylin_intermediate_prom_group_by_ws_name_cur_cube_19700101010000_20100101010000/.hive-staging_hive_2016-08-12_18-37-50_610_2817977227856616745-2/-ext-10000
2016-08-12T18:43:07,743 INFO [HiveServer2-Background-Pool: Thread-83]: hive.metastore (:()) - Trying to connect to metastore with URI thrift://bigdata-master:9083
2016-08-12T18:43:07,744 INFO [HiveServer2-Background-Pool: Thread-83]: hive.metastore (:()) - Opened a connection to metastore, current connections: 2
2016-08-12T18:43:07,769 INFO [HiveServer2-Background-Pool: Thread-83]: hive.metastore (:()) - Connected to metastore.
2016-08-12T18:43:08,110 INFO [Delete-Thread-1]: fs.TrashPolicyDefault (:()) - Namenode trash configuration: Deletion interval = 0 minutes, Emptier interval = 0 minutes.
2016-08-12T18:43:08,110 INFO [Delete-Thread-12]: fs.TrashPolicyDefault (:()) - Namenode trash configuration: Deletion interval = 0 minutes, Emptier interval = 0 minutes.
2016-08-12T18:43:08,110 INFO [Delete-Thread-0]: fs.TrashPolicyDefault (:()) - Namenode trash configuration: Deletion interval = 0 minutes, Emptier interval = 0 minutes.
2016-08-12T18:43:08,110 INFO [Delete-Thread-7]: fs.TrashPolicyDefault (:()) - Namenode trash configuration: Deletion interval = 0 minutes, Emptier interval = 0 minutes.
2016-08-12T18:43:08,110 INFO [Delete-Thread-4]: fs.TrashPolicyDefault (:()) - Namenode trash configuration: Deletion interval = 0 minutes, Emptier interval = 0 minutes.
2016-08-12T18:43:08,110 INFO [Delete-Thread-8]: fs.TrashPolicyDefault (:()) - Namenode trash configuration: Deletion interval = 0 minutes, Emptier interval = 0 minutes.
2016-08-12T18:43:08,111 INFO [Delete-Thread-2]: fs.TrashPolicyDefault (:()) - Namenode trash configuration: Deletion interval = 0 minutes, Emptier interval = 0 minutes.
2016-08-12T18:43:08,111 INFO [Delete-Thread-9]: fs.TrashPolicyDefault (:()) - Namenode trash configuration: Deletion interval = 0 minutes, Emptier interval = 0 minutes.
2016-08-12T18:43:08,112 INFO [Delete-Thread-10]: fs.TrashPolicyDefault (:()) - Namenode trash configuration: Deletion interval = 0 minutes, Emptier interval = 0 minutes.
2016-08-12T18:43:08,112 INFO [Delete-Thread-3]: fs.TrashPolicyDefault (:()) - Namenode trash configuration: Deletion interval = 0 minutes, Emptier interval = 0 minutes.
2016-08-12T18:43:08,112 INFO [Delete-Thread-5]: fs.TrashPolicyDefault (:()) - Namenode trash configuration: Deletion interval = 0 minutes, Emptier interval = 0 minutes.
2016-08-12T18:43:08,113 INFO [Delete-Thread-6]: fs.TrashPolicyDefault (:()) - Namenode trash configuration: Deletion interval = 0 minutes, Emptier interval = 0 minutes.
2016-08-12T18:43:08,113 INFO [Delete-Thread-11]: fs.TrashPolicyDefault (:()) - Namenode trash configuration: Deletion interval = 0 minutes, Emptier interval = 0 minutes.
2016-08-12T18:43:08,164 INFO [HiveServer2-Background-Pool: Thread-83]: common.FileUtils (:()) - Creating directory if it doesn't exist: hdfs://bigdata/kylin/kylin_metadata/kylin-38250257-1649-4530-8ccb-975469aa6d22/kylin_intermediate_prom_group_by_ws_name_cur_cube_19700101010000_20100101010000
2016-08-12T18:43:08,177 ERROR [HiveServer2-Background-Pool: Thread-83]: hdfs.KeyProviderCache (:()) - Could not find uri with key [dfs.encryption.key.provider.uri] to create a keyProvider !!
2016-08-12T18:43:08,285 ERROR [HiveServer2-Background-Pool: Thread-83]: exec.Task (:()) - Failed with exception java.util.ConcurrentModificationException
org.apache.hadoop.hive.ql.metadata.HiveException: java.util.ConcurrentModificationException
    at org.apache.hadoop.hive.ql.metadata.Hive.moveFile(Hive.java:2942)
    at org.apache.hadoop.hive.ql.metadata.Hive.replaceFiles(Hive.java:3198)
    at org.apache.hadoop.hive.ql.metadata.Hive.loadTable(Hive.java:1805)
    at org.apache.hadoop.hive.ql.exec.MoveTask.execute(MoveTask.java:355)
    at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:197)
    at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:100)
    at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1858)
    at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1562)
    at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1313)
    at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1084)
    at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1077)
    at org.apache.hive.service.cli.operation.SQLOperation.runQuery(SQLOperation.java:235)
    at org.apache.hive.service.cli.operation.SQLOperation.access$300(SQLOperation.java:90)
    at org.apache.hive.service.cli.operation.SQLOperation$2$1.run(SQLOperation.java:299)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
    at org.apache.hive.service.cli.operation.SQLOperation$2.run(SQLOperation.java:312)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
    at java.util.concurrent.FutureTask.run(FutureTask.java:262)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)
Caused by: java.util.ConcurrentModificationException
    at java.util.ArrayList$Itr.checkForComodification(ArrayList.java:859)
    at java.util.ArrayList$Itr.next(ArrayList.java:831)
    at org.apache.hadoop.hdfs.protocolPB.PBHelper.convertAclEntryProto(PBHelper.java:2325)
    at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.setAcl(ClientNamenodeProtocolTranslatorPB.java:1325)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:191)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
    at com.sun.proxy.$Proxy28.setAcl(Unknown Source)
    at org.apache.hadoop.hdfs.DFSClient.setAcl(DFSClient.java:3242)
    at org.apache.hadoop.hdfs.DistributedFileSystem$43.doCall(DistributedFileSystem.java:2052)
    at org.apache.hadoop.hdfs.DistributedFileSystem$43.doCall(DistributedFileSystem.java:2049)
    at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
    at org.apache.hadoop.hdfs.DistributedFileSystem.setAcl(DistributedFileSystem.java:2049)
    at org.apache.hadoop.hive.io.HdfsUtils.setFullFileStatus(HdfsUtils.java:126)
    at org.apache.hadoop.hive.ql.metadata.Hive$3.call(Hive.java:2919)
    at org.apache.hadoop.hive.ql.metadata.Hive$3.call(Hive.java:2911)
    ... 4 more
2016-08-12T18:43:08,286 ERROR [HiveServer2-Background-Pool: Thread-83]: ql.Driver (:()) - FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.MoveTask. java.util.ConcurrentModificationException
2016-08-12T18:43:08,286 INFO [HiveServer2-Background-Pool: Thread-83]: ql.Driver (:()) - Completed executing command(queryId=hadoop_20160812183750_2f4560e7-7a07-4443-8937-cd0ec03ee887); Time taken: 267.439 seconds
2016-08-12T18:43:08,664 ERROR [HiveServer2-Background-Pool: Thread-83]: operation.Operation (:()) - Error running hive query:
org.apache.hive.service.cli.HiveSQLException: Error while processing statement: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.MoveTask. java.util.ConcurrentModificationException
    at org.apache.hive.service.cli.operation.Operation.toSQLException(Operation.java:387)
    at org.apache.hive.service.cli.operation.SQLOperation.runQuery(SQLOperation.java:237)
    at org.apache.hive.service.cli.operation.SQLOperation.access$300(SQLOperation.java:90)
    at org.apache.hive.service.cli.operation.SQLOperation$2$1.run(SQLOperation.java:299)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
    at org.apache.hive.service.cli.operation.SQLOperation$2.run(SQLOperation.java:312)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
    at java.util.concurrent.FutureTask.run(FutureTask.java:262)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: java.util.ConcurrentModificationException
    at org.apache.hadoop.hive.ql.metadata.Hive.moveFile(Hive.java:2942)
    at org.apache.hadoop.hive.ql.metadata.Hive.replaceFiles(Hive.java:3198)
    at org.apache.hadoop.hive.ql.metadata.Hive.loadTable(Hive.java:1805)
    at org.apache.hadoop.hive.ql.exec.MoveTask.execute(MoveTask.java:355)
    at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:197)
    at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:100)
    at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1858)
    at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1562)
    at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1313)
    at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1084)
    at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1077)
    at org.apache.hive.service.cli.operation.SQLOperation.runQuery(SQLOperation.java:235)
    ... 11 more
Caused by: java.util.ConcurrentModificationException
    at java.util.ArrayList$Itr.checkForComodification(ArrayList.java:859)
    at java.util.ArrayList$Itr.next(ArrayList.java:831)
    at org.apache.hadoop.hdfs.protocolPB.PBHelper.convertAclEntryProto(PBHelper.java:2325)
    at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.setAcl(ClientNamenodeProtocolTranslatorPB.java:1325)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:191)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
    at com.sun.proxy.$Proxy28.setAcl(Unknown Source)
    at org.apache.hadoop.hdfs.DFSClient.setAcl(DFSClient.java:3242)
    at org.apache.hadoop.hdfs.DistributedFileSystem$43.doCall(DistributedFileSystem.java:2052)
    at org.apache.hadoop.hdfs.DistributedFileSystem$43.doCall(DistributedFileSystem.java:2049)
    at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
    at org.apache.hadoop.hdfs.DistributedFileSystem.setAcl(DistributedFileSystem.java:2049)
    at org.apache.hadoop.hive.io.HdfsUtils.setFullFileStatus(HdfsUtils.java:126)
    at org.apache.hadoop.hive.ql.metadata.Hive$3.call(Hive.java:2919)
    at org.apache.hadoop.hive.ql.metadata.Hive$3.call(Hive.java:2911)
    ... 4 more

An interesting thing: if I narrow down the WHERE clause so that the SELECT query returns only about 300,000 rows, the INSERT completes successfully.

Thanks,
Mh F
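P.S. To make sure I understand the nested cause, I wrote a minimal standalone Java sketch of the fail-fast iterator behavior that ArrayList$Itr.checkForComodification enforces. This is only my toy reproduction of the mechanism, not Hive's actual code path; in Hive the iteration (PBHelper.convertAclEntryProto) and the modification presumably happen on different threads sharing one ACL entry list, whereas here I do both on a single thread:

```java
import java.util.ArrayList;
import java.util.ConcurrentModificationException;
import java.util.Iterator;
import java.util.List;

public class CmeDemo {
    public static void main(String[] args) {
        // Stand-in for the shared AclEntry list; the string values are made up.
        List<String> aclEntries = new ArrayList<>();
        aclEntries.add("user::rwx");
        aclEntries.add("group::r-x");

        Iterator<String> it = aclEntries.iterator();
        it.next();            // start iterating, as PBHelper does over ACL entries
        aclEntries.remove(0); // structural change made outside the iterator

        try {
            it.next();        // fail-fast check detects the modification
            System.out.println("no exception");
        } catch (ConcurrentModificationException e) {
            System.out.println("ConcurrentModificationException, as in the log");
        }
    }
}
```

Running it prints "ConcurrentModificationException, as in the log", which matches the exception type and the ArrayList$Itr frames in my hive.log.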