Hi, thanks for the reply, Alan. Here's one more test.

Wed Jun 08 16:36:02 MSK 2016 Start thread 1
Wed Jun 08 16:36:05 MSK 2016 Start thread 2
Wed Jun 08 16:36:08 MSK 2016 Start thread 3
Wed Jun 08 16:36:11 MSK 2016 Start thread 4

Wed Jun 08 16:36:17 MSK 2016 Finish thread 1
Wed Jun 08 16:36:17 MSK 2016 Thread 1 result: '344186'
Wed Jun 08 16:36:17 MSK 2016 Thread 1 completed in 14443 ms
Wed Jun 08 16:36:19 MSK 2016 Finished 2
Wed Jun 08 16:36:19 MSK 2016 Thread 2 completed in 13967 ms

Wed Jun 08 16:36:20 MSK 2016 Finish thread 3
Wed Jun 08 16:36:20 MSK 2016 Thread 3 result: '344186'
Wed Jun 08 16:36:20 MSK 2016 Thread 3 completed in 11737 ms

java.sql.SQLException: Error while processing statement: FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.tez.TezTask. Vertex failed, vertexName=Map 1, vertexId=vertex_1461923723503_0931_1_00, diagnostics=[Vertex vertex_1461923723503_0931_1_00 [Map 1] killed/failed due to:ROOT_INPUT_INIT_FAILURE, Vertex Input: mobile_connections initializer failed, vertex=vertex_1461923723503_0931_1_00 [Map 1], java.lang.RuntimeException: serious problem
at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.generateSplitsInfo(OrcInputFormat.java:1059)

HiveMetaStore.log <https://drive.google.com/open?id=0ByB92PAoAkrKVU1rT25OcjJuOEk>
HiveServer2.log <https://drive.google.com/open?id=0ByB92PAoAkrKakhqYXhmdzlidkk>

I didn't find anything interesting in the metastore log, but the HiveServer2 log contains this:

Line 1023: 2016-06-08 16:36:04,456 INFO [HiveServer2-Background-Pool: Thread-42]: lockmgr.DbLockManager (DbLockManager.java:lock(98)) - Requesting: queryId=hive_20160608163602_542056d9-c524-4df4-af18-6aa5e906284f LockRequest(component:[LockComponent(type:SHARED_READ, level:TABLE, dbname:default, tablename:mobile_connections), LockComponent(type:SHARED_READ, level:PARTITION, dbname:default, tablename:mobile_connections, partitionname:dt=20151123/msisdn_last_digit=3)], txnid:0, user:hdfs, hostname:mercury)
Line 1043: 2016-06-08 16:36:04,546 INFO [HiveServer2-Background-Pool: Thread-42]: lockmgr.DbLockManager (DbLockManager.java:lock(101)) - Response to queryId=hive_20160608163602_542056d9-c524-4df4-af18-6aa5e906284f LockResponse(lockid:179728, state:ACQUIRED)
Line 1349: 2016-06-08 16:36:05,214 INFO [HiveServer2-Background-Pool: Thread-50]: lockmgr.DbLockManager (DbLockManager.java:lock(98)) - Requesting: queryId=hive_20160608163604_832abbff-6199-497e-b969-fd8ac1465abc LockRequest(component:[LockComponent(type:EXCLUSIVE, level:PARTITION, dbname:default, tablename:mobile_connections, partitionname:dt=20151123/msisdn_last_digit=3)], txnid:0, user:hdfs, hostname:mercury)
Line 1390: 2016-06-08 16:36:05,270 INFO [HiveServer2-Background-Pool: Thread-50]: lockmgr.DbLockManager (DbLockManager.java:lock(101)) - Response to queryId=hive_20160608163604_832abbff-6199-497e-b969-fd8ac1465abc LockResponse(lockid:179729, state:WAITING)
Line 2346: 2016-06-08 16:36:08,028 INFO [HiveServer2-Background-Pool: Thread-68]: lockmgr.DbLockManager (DbLockManager.java:lock(98)) - Requesting: queryId=hive_20160608163607_7b18da12-6f86-41c9-b4b1-be45252c18c2 LockRequest(component:[LockComponent(type:SHARED_READ, level:TABLE, dbname:default, tablename:mobile_connections), LockComponent(type:SHARED_READ, level:PARTITION, dbname:default, tablename:mobile_connections, partitionname:dt=20151123/msisdn_last_digit=3)], txnid:0, user:hdfs, hostname:mercury)
Line 2370: 2016-06-08 16:36:08,069 INFO [HiveServer2-Background-Pool: Thread-68]: lockmgr.DbLockManager (DbLockManager.java:lock(101)) - Response to queryId=hive_20160608163607_7b18da12-6f86-41c9-b4b1-be45252c18c2 LockResponse(lockid:179730, state:ACQUIRED)
Line 3561: 2016-06-08 16:36:11,000 INFO [HiveServer2-Background-Pool: Thread-91]: lockmgr.DbLockManager (DbLockManager.java:lock(98)) - Requesting: queryId=hive_20160608163610_b78a201b-ae6d-4040-9115-f92118d5b629 LockRequest(component:[LockComponent(type:SHARED_READ, level:TABLE, dbname:default, tablename:mobile_connections), LockComponent(type:SHARED_READ, level:PARTITION, dbname:default, tablename:mobile_connections, partitionname:dt=20151123/msisdn_last_digit=3)], txnid:0, user:hdfs, hostname:mercury)
Line 3587: 2016-06-08 16:36:11,060 INFO [HiveServer2-Background-Pool: Thread-91]: lockmgr.DbLockManager (DbLockManager.java:lock(101)) - Response to queryId=hive_20160608163610_b78a201b-ae6d-4040-9115-f92118d5b629 LockResponse(lockid:179731, state:ACQUIRED)
Line 4468: 2016-06-08 16:36:16,481 DEBUG [HiveServer2-Background-Pool: Thread-42]: lockmgr.DbLockManager (DbLockManager.java:unlock(182)) - Unlocking lockid:179728
Line 4473: 2016-06-08 16:36:16,522 DEBUG [HiveServer2-Background-Pool: Thread-42]: lockmgr.DbLockManager (DbLockManager.java:unlock(185)) - Removed a lock true
Line 4880: 2016-06-08 16:36:18,676 DEBUG [HiveServer2-Background-Pool: Thread-50]: lockmgr.DbLockManager (DbLockManager.java:unlock(182)) - Unlocking lockid:179729
Line 4889: 2016-06-08 16:36:18,829 DEBUG [HiveServer2-Background-Pool: Thread-50]: lockmgr.DbLockManager (DbLockManager.java:unlock(185)) - Removed a lock true
Line 4976: 2016-06-08 16:36:19,439 DEBUG [HiveServer2-Background-Pool: Thread-68]: lockmgr.DbLockManager (DbLockManager.java:unlock(182)) - Unlocking lockid:179730
Line 4979: 2016-06-08 16:36:19,486 DEBUG [HiveServer2-Background-Pool: Thread-68]: lockmgr.DbLockManager (DbLockManager.java:unlock(185)) - Removed a lock true
Line 5467: 2016-06-08 16:36:28,090 DEBUG [HiveServer2-Background-Pool: Thread-91]: lockmgr.DbLockManager (DbLockManager.java:unlock(182)) - Unlocking lockid:179731
Line 5470: 2016-06-08 16:36:28,131 DEBUG [HiveServer2-Background-Pool: Thread-91]: lockmgr.DbLockManager (DbLockManager.java:unlock(185)) - Removed a lock true
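In case it helps to reproduce, here is a minimal sketch of the asyncExecute helper these tests use. It is not the exact code from my test class: the JDBC URL, user, and class name are placeholders, and error handling is simplified. Each call opens its own Hive JDBC connection, runs the statement on a background thread, and prints the timestamps shown above, so the four statements overlap in time.

// Minimal sketch of the asyncExecute helper used in the tests above.
// Assumptions (not taken from the real test code): JDBC URL, user, and class name
// are placeholders; each thread opens its own connection.
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;
import java.util.Date;

public class ConcurrentPartitionDropTest {

    // Hypothetical connection settings; replace with your HiveServer2 host and credentials.
    private static final String JDBC_URL = "jdbc:hive2://mercury:10000/default";
    private static final String USER = "hdfs";
    private static final String PASSWORD = "";

    static void asyncExecute(final String sql, final int id) {
        new Thread(() -> {
            long start = System.currentTimeMillis();
            System.out.println(new Date() + " Start thread " + id);
            try (Connection conn = DriverManager.getConnection(JDBC_URL, USER, PASSWORD);
                 Statement stmt = conn.createStatement()) {
                String result = null;
                // execute() returns true when the statement produced a ResultSet (the selects);
                // the "alter table ... drop partition" statement returns false.
                if (stmt.execute(sql)) {
                    try (ResultSet rs = stmt.getResultSet()) {
                        if (rs.next()) {
                            result = rs.getString(1);
                        }
                    }
                }
                System.out.println(new Date() + " Finish thread " + id);
                if (result != null) {
                    System.out.println(new Date() + " Thread " + id + " result: '" + result + "'");
                }
            } catch (Exception e) {
                e.printStackTrace();
            } finally {
                System.out.println(new Date() + " Thread " + id + " completed in "
                        + (System.currentTimeMillis() - start) + " ms");
            }
        }).start();
    }

    public static void main(String[] args) throws Exception {
        Class.forName("org.apache.hive.jdbc.HiveDriver");
        asyncExecute("Select count(distinct in_info_msisdn) from mobile_connections where dt=20151123 and msisdn_last_digit=3", 1);
        Thread.sleep(3000);
        asyncExecute("alter table mobile_connections drop if exists partition (dt=20151123, msisdn_last_digit=3) purge", 2);
        Thread.sleep(3000);
        asyncExecute("Select count(distinct in_info_msisdn) from mobile_connections where dt=20151123 and msisdn_last_digit=3", 3);
        Thread.sleep(3000);
        asyncExecute("Select count(distinct in_info_msisdn) from mobile_connections where dt=20151123 and msisdn_last_digit=3", 4);
    }
}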
On Tue, Jun 7, 2016 at 9:01 PM, Alan Gates <alanfga...@gmail.com> wrote:

> Eugene Koifman pointed out to me that there is one other possibility. If
> we ignore query 3 for a minute, I can explain this with the following
> timeline:
>
> 1) Query one requests and gets a read lock, starts its select.
> 2) Query two (alter table…) requests an exclusive lock, and then blocks
> until query one has completed.
> 3) Query four requests a read lock, and then blocks until query two has
> completed. Once it can run it finds that part of its inputs have gone away
> and thus fails with an error.
>
> Hive’s metadata is not part of the transactional system. So we don’t keep
> the old dropped partition around after query two drops it. So the error
> from query four is actually the correct thing to do. If four ignored the
> fact that the partition had vanished and returned results minus that
> partition it would not be returning correct results as of the time its
> transaction started. Because the underlying data has been dropped, the
> query is now unanswerable, and an error is the best possible answer.
>
> This situation cannot be fixed until Hive DDL operations take part in
> transactions. I don’t know when that will happen.
>
> However, this explanation is incomplete because it doesn’t account for
> query three. That query should also have blocked. The code is not
> supposed to allow “jumping” exclusive locks. That is, query three
> shouldn’t be allowed to acquire the read lock along with query one because
> query two’s exclusive lock is in between. It’s possible there’s a bug in
> that code. It’s also possible that the above explanation is not what’s
> happening.
>
> Would it be possible for you to turn on debug logging on your thrift
> metastore process and rerun this test and post the logs somewhere? Apache
> lists strip attachments so you won’t be able to attach them here, you’ll
> have to put them up on paste bin or something.
>
> Alan.
>
> > On Jun 7, 2016, at 04:00, Igor Kuzmenko <f1she...@gmail.com> wrote:
> >
> > • hive.support.concurrency – true
> > • hive.enforce.bucketing – true (Not required as of Hive 2.0)
> > • hive.exec.dynamic.partition.mode – nonstrict
> > • hive.txn.manager – org.apache.hadoop.hive.ql.lockmgr.DbTxnManager
> > • hive.compactor.initiator.on – true (for exactly one instance of the Thrift metastore service)
> > • hive.compactor.worker.threads – a positive number on at least one instance of the Thrift metastore service
> >
> > I've double-checked all the settings and restarted Hive. The problem remained.
> > I've completed the same test with a little modification in the console output.
> >
> > asyncExecute("Select count(distinct in_info_msisdn) from mobile_connections where dt=20151124 and msisdn_last_digit=1", 1);
> > Thread.sleep(3000);
> > asyncExecute("alter table mobile_connections drop if exists partition (dt=20151124, msisdn_last_digit=1) purge", 2);
> > Thread.sleep(3000);
> > asyncExecute("Select count(distinct in_info_msisdn) from mobile_connections where dt=20151124 and msisdn_last_digit=1", 3);
> > Thread.sleep(3000);
> > asyncExecute("Select count(distinct in_info_msisdn) from mobile_connections where dt=20151124 and msisdn_last_digit=1", 4);
> >
> > Like in the previous test, four queries execute in parallel. The first, third, and fourth are simple selects. The second is a drop-partition command.
> > Here's the output:
> >
> > Tue Jun 07 13:45:18 MSK 2016 Start thread 1
> > Tue Jun 07 13:45:21 MSK 2016 Start thread 2
> > Tue Jun 07 13:45:24 MSK 2016 Start thread 3
> > Tue Jun 07 13:45:27 MSK 2016 Start thread 4
> >
> > Tue Jun 07 13:45:32 MSK 2016 Finish thread 1
> > Tue Jun 07 13:45:32 MSK 2016 Thread 1 result: '210802'
> > Tue Jun 07 13:45:32 MSK 2016 Thread 1 completed in 13810 ms
> >
> > Tue Jun 07 13:45:35 MSK 2016 Finished thread 2
> > Tue Jun 07 13:45:35 MSK 2016 Thread 2 completed in 13568 ms
> >
> > Tue Jun 07 13:45:39 MSK 2016 Finish tread 3
> > Tue Jun 07 13:45:39 MSK 2016 Thread 3 result: '210802'
> > Tue Jun 07 13:45:39 MSK 2016 Thread 3 completed in 14970 ms
> >
> > java.sql.SQLException: Error while processing statement: FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.tez.TezTask. Vertex failed, vertexName=Map 1, vertexId=vertex_1461923723503_0256_1_00, diagnostics=[Vertex vertex_1461923723503_0256_1_00 [Map 1] killed/failed due to:ROOT_INPUT_INIT_FAILURE, Vertex Input: mobile_connections initializer failed, vertex=vertex_1461923723503_0256_1_00 [Map 1], java.lang.RuntimeException: serious problem
> > at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.generateSplitsInfo(OrcInputFormat.java:1059)
> > at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.getSplits(OrcInputFormat.java:1086)
> > at org.apache.hadoop.hive.ql.io.HiveInputFormat.addSplitsForGroup(HiveInputFormat.java:305)
> > at org.apache.hadoop.hive.ql.io.HiveInputFormat.getSplits(HiveInputFormat.java:407)
> > at org.apache.hadoop.hive.ql.exec.tez.HiveSplitGenerator.initialize(HiveSplitGenerator.java:155)
> > at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable$1.run(RootInputInitializerManager.java:255)
> > at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable$1.run(RootInputInitializerManager.java:248)
> > at java.security.AccessController.doPrivileged(Native Method)
> > at javax.security.auth.Subject.doAs(Subject.java:422)
> > at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
> > at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable.call(RootInputInitializerManager.java:248)
> > at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable.call(RootInputInitializerManager.java:235)
> > at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> > at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> > at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> > at java.lang.Thread.run(Thread.java:745)
> > Caused by: java.util.concurrent.ExecutionException: java.io.FileNotFoundException: File hdfs://jupiter.bss:8020/apps/hive/warehouse/mobile_connections/dt=20151124/msisdn_last_digit=2 does not exist.
> > at java.util.concurrent.FutureTask.report(FutureTask.java:122)
> > at java.util.concurrent.FutureTask.get(FutureTask.java:192)
> > at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.generateSplitsInfo(OrcInputFormat.java:1036)
> > ... 15 more
> >
> > So the second thread definitely waits until the first thread completes and then drops the partition. Then, somehow, after the partition was dropped, the third query completes and shows a result. The fourth query doesn't complete at all, throwing an exception.
> >
> > On Mon, Jun 6, 2016 at 8:30 PM, Alan Gates <alanfga...@gmail.com> wrote:
> > Do you have the system configured to use the DbTxnManager? See https://cwiki.apache.org/confluence/display/Hive/Hive+Transactions#HiveTransactions-Configuration for details on how to set this up. The transaction manager is what manages locking and makes sure that your queries don’t stomp each other.
> >
> > Alan.
> >
> > > On Jun 6, 2016, at 06:01, Igor Kuzmenko <f1she...@gmail.com> wrote:
> > >
> > > Hello, I'm trying to find a safe way to delete a partition with all the data it includes.
> > >
> > > I'm using Hive 1.2.1 and Hive JDBC driver 1.2.1, and I perform a simple test on a transactional table:
> > >
> > > asyncExecute("Select count(distinct in_info_msisdn) from mobile_connections where dt=20151124 and msisdn_last_digit=2", 1);
> > > Thread.sleep(3000);
> > > asyncExecute("alter table mobile_connections drop if exists partition (dt=20151124, msisdn_last_digit=2) purge", 2);
> > > Thread.sleep(3000);
> > > asyncExecute("Select count(distinct in_info_msisdn) from mobile_connections where dt=20151124 and msisdn_last_digit=2", 3);
> > > Thread.sleep(3000);
> > > asyncExecute("Select count(distinct in_info_msisdn) from mobile_connections where dt=20151124 and msisdn_last_digit=2", 4);
> > > (full code here)
> > >
> > > I create several threads, each executing a query asynchronously. The first queries the partition, the second drops the partition, and the others are the same as the first. The first query takes about 10-15 seconds to complete, so the "alter table" query starts before the first query completes.
> > > As a result I get:
> > > • First query - successfully completes
> > > • Second query - successfully completes
> > > • Third query - successfully completes
> > > • Fourth query - throws an exception:
> > > java.sql.SQLException: Error while processing statement: FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.tez.TezTask. Vertex failed, vertexName=Map 1, vertexId=vertex_1461923723503_0189_1_00, diagnostics=[Vertex vertex_1461923723503_0189_1_00 [Map 1] killed/failed due to:ROOT_INPUT_INIT_FAILURE, Vertex Input: mobile_connections initializer failed, vertex=vertex_1461923723503_0189_1_00 [Map 1], java.lang.RuntimeException: serious problem
> > > at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.generateSplitsInfo(OrcInputFormat.java:1059)
> > > at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.getSplits(OrcInputFormat.java:1086)
> > > at org.apache.hadoop.hive.ql.io.HiveInputFormat.addSplitsForGroup(HiveInputFormat.java:305)
> > > at org.apache.hadoop.hive.ql.io.HiveInputFormat.getSplits(HiveInputFormat.java:407)
> > > at org.apache.hadoop.hive.ql.exec.tez.HiveSplitGenerator.initialize(HiveSplitGenerator.java:155)
> > > at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable$1.run(RootInputInitializerManager.java:255)
> > > at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable$1.run(RootInputInitializerManager.java:248)
> > > at java.security.AccessController.doPrivileged(Native Method)
> > > at javax.security.auth.Subject.doAs(Subject.java:422)
> > > at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
> > > at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable.call(RootInputInitializerManager.java:248)
> > > at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable.call(RootInputInitializerManager.java:235)
> > > at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> > > at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> > > at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> > > at java.lang.Thread.run(Thread.java:745)
> > > Caused by: java.util.concurrent.ExecutionException: java.io.FileNotFoundException: File hdfs://jupiter.bss:8020/apps/hive/warehouse/mobile_connections/dt=20151124/msisdn_last_digit=2 does not exist.
> > > at java.util.concurrent.FutureTask.report(FutureTask.java:122)
> > > at java.util.concurrent.FutureTask.get(FutureTask.java:192)
> > > at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.generateSplitsInfo(OrcInputFormat.java:1036)
> > > ... 15 more
> > > Caused by: java.io.FileNotFoundException: File hdfs://jupiter.bss:8020/apps/hive/warehouse/mobile_connections/dt=20151124/msisdn_last_digit=2 does not exist.
> > > at org.apache.hadoop.hdfs.DistributedFileSystem$DirListingIterator.<init>(DistributedFileSystem.java:958)
> > > at org.apache.hadoop.hdfs.DistributedFileSystem$DirListingIterator.<init>(DistributedFileSystem.java:937)
> > > at org.apache.hadoop.hdfs.DistributedFileSystem$19.doCall(DistributedFileSystem.java:882)
> > > at org.apache.hadoop.hdfs.DistributedFileSystem$19.doCall(DistributedFileSystem.java:878)
> > > at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
> > > at org.apache.hadoop.hdfs.DistributedFileSystem.listLocatedStatus(DistributedFileSystem.java:878)
> > > at org.apache.hadoop.fs.FileSystem.listLocatedStatus(FileSystem.java:1694)
> > > at org.apache.hadoop.hive.shims.Hadoop23Shims.listLocatedStatus(Hadoop23Shims.java:690)
> > > at org.apache.hadoop.hive.ql.io.AcidUtils.getAcidState(AcidUtils.java:366)
> > > at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$FileGenerator.call(OrcInputFormat.java:648)
> > > at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$FileGenerator.call(OrcInputFormat.java:634)
> > > ... 4 more
> > > ]Vertex killed, vertexName=Reducer 3, vertexId=vertex_1461923723503_0189_1_02, diagnostics=[Vertex received Kill in INITED state., Vertex vertex_1461923723503_0189_1_02 [Reducer 3] killed/failed due to:OTHER_VERTEX_FAILURE]Vertex killed, vertexName=Reducer 2, vertexId=vertex_1461923723503_0189_1_01, diagnostics=[Vertex received Kill in INITED state., Vertex vertex_1461923723503_0189_1_01 [Reducer 2] killed/failed due to:OTHER_VERTEX_FAILURE]DAG did not succeed due to VERTEX_FAILURE. failedVertices:1 killedVertices:2
> > > at org.apache.hive.jdbc.HiveStatement.execute(HiveStatement.java:296)
> > > at Test$MyRunnable.run(Test.java:54)
> > > at java.lang.Thread.run(Thread.java:745)
> > >
> > > Since I'm using a transactional table, I expect that all queries executed after the partition drop will complete successfully with no result. Am I doing something wrong? Is there another way to drop a partition with its data?