I've opened Jira issue <https://issues.apache.org/jira/browse/HIVE-13978>

On Wed, Jun 8, 2016 at 9:01 PM, Eugene Koifman <ekoif...@hortonworks.com>
wrote:

> This looks like proof of a bug.  The read locks 179730 and 179731 should
> have been blocked by 179729.
> As Alan said, this won’t prevent the exception you are getting, but it needs
> to be fixed to prevent a partition from disappearing while queries 3 and 4
> are in progress.
>
> Could you file a Jira please?
>
> thanks,
> Eugene
>
> From: Igor Kuzmenko <f1she...@gmail.com>
> Reply-To: "user@hive.apache.org" <user@hive.apache.org>
> Date: Wednesday, June 8, 2016 at 7:30 AM
> To: "user@hive.apache.org" <user@hive.apache.org>
> Subject: Re: Delete hive partition while executing query.
>
> Hi, thanks for reply, Alan.
> Here's one more test.
>
> Wed Jun 08 16:36:02 MSK 2016 Start thread 1
> Wed Jun 08 16:36:05 MSK 2016 Start thread 2
> Wed Jun 08 16:36:08 MSK 2016 Start thread 3
> Wed Jun 08 16:36:11 MSK 2016 Start thread 4
> Wed Jun 08 16:36:17 MSK 2016 Finish thread 1
> Wed Jun 08 16:36:17 MSK 2016 Thread 1 result: '344186'
> Wed Jun 08 16:36:17 MSK 2016 Thread 1 completed in 14443 ms
>
> Wed Jun 08 16:36:19 MSK 2016 Finished 2
> Wed Jun 08 16:36:19 MSK 2016 Thread 2 completed in 13967 ms
>
> Wed Jun 08 16:36:20 MSK 2016 Finish thread 3
> Wed Jun 08 16:36:20 MSK 2016 Thread 3 result: '344186'
> Wed Jun 08 16:36:20 MSK 2016 Thread 3 completed in 11737 ms
>
> java.sql.SQLException: Error while processing statement: FAILED: Execution
> Error, return code 2 from org.apache.hadoop.hive.ql.exec.tez.TezTask.
> Vertex failed, vertexName=Map 1, vertexId=vertex_1461923723503_0931_1_00,
> diagnostics=[Vertex vertex_1461923723503_0931_1_00 [Map 1] killed/failed
> due to:ROOT_INPUT_INIT_FAILURE, Vertex Input: mobile_connections
> initializer failed, vertex=vertex_1461923723503_0931_1_00 [Map 1],
> java.lang.RuntimeException: serious problem
> at
> org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.generateSplitsInfo(OrcInputFormat.java:1059)
>
>
>
> HiveMetaStore.log
> <https://drive.google.com/open?id=0ByB92PAoAkrKVU1rT25OcjJuOEk>
> HiveServer2.log
> <https://drive.google.com/open?id=0ByB92PAoAkrKakhqYXhmdzlidkk>
>
> I didn't find anything interesting in the metastore log, but the HiveServer2
> log contains this:
>
> Line 1023: 2016-06-08 16:36:04,456 INFO  [HiveServer2-Background-Pool:
> Thread-42]: lockmgr.DbLockManager (DbLockManager.java:lock(98)) -
> Requesting:
> queryId=hive_20160608163602_542056d9-c524-4df4-af18-6aa5e906284f
> LockRequest(component:[LockComponent(type:SHARED_READ, level:TABLE,
> dbname:default, tablename:mobile_connections),
> LockComponent(type:SHARED_READ, level:PARTITION, dbname:default,
> tablename:mobile_connections,
> partitionname:dt=20151123/msisdn_last_digit=3)], txnid:0, user:hdfs,
> hostname:mercury)
> Line 1043: 2016-06-08 16:36:04,546 INFO  [HiveServer2-Background-Pool:
> Thread-42]: lockmgr.DbLockManager (DbLockManager.java:lock(101)) - Response
> to queryId=hive_20160608163602_542056d9-c524-4df4-af18-6aa5e906284f
> LockResponse(lockid:179728, state:ACQUIRED)
> Line 1349: 2016-06-08 16:36:05,214 INFO  [HiveServer2-Background-Pool:
> Thread-50]: lockmgr.DbLockManager (DbLockManager.java:lock(98)) -
> Requesting:
> queryId=hive_20160608163604_832abbff-6199-497e-b969-fd8ac1465abc
> LockRequest(component:[LockComponent(type:EXCLUSIVE, level:PARTITION,
> dbname:default, tablename:mobile_connections,
> partitionname:dt=20151123/msisdn_last_digit=3)], txnid:0, user:hdfs,
> hostname:mercury)
> Line 1390: 2016-06-08 16:36:05,270 INFO  [HiveServer2-Background-Pool:
> Thread-50]: lockmgr.DbLockManager (DbLockManager.java:lock(101)) - Response
> to queryId=hive_20160608163604_832abbff-6199-497e-b969-fd8ac1465abc
> LockResponse(lockid:179729, state:WAITING)
> Line 2346: 2016-06-08 16:36:08,028 INFO  [HiveServer2-Background-Pool:
> Thread-68]: lockmgr.DbLockManager (DbLockManager.java:lock(98)) -
> Requesting:
> queryId=hive_20160608163607_7b18da12-6f86-41c9-b4b1-be45252c18c2
> LockRequest(component:[LockComponent(type:SHARED_READ, level:TABLE,
> dbname:default, tablename:mobile_connections),
> LockComponent(type:SHARED_READ, level:PARTITION, dbname:default,
> tablename:mobile_connections,
> partitionname:dt=20151123/msisdn_last_digit=3)], txnid:0, user:hdfs,
> hostname:mercury)
> Line 2370: 2016-06-08 16:36:08,069 INFO  [HiveServer2-Background-Pool:
> Thread-68]: lockmgr.DbLockManager (DbLockManager.java:lock(101)) - Response
> to queryId=hive_20160608163607_7b18da12-6f86-41c9-b4b1-be45252c18c2
> LockResponse(lockid:179730, state:ACQUIRED)
> Line 3561: 2016-06-08 16:36:11,000 INFO  [HiveServer2-Background-Pool:
> Thread-91]: lockmgr.DbLockManager (DbLockManager.java:lock(98)) -
> Requesting:
> queryId=hive_20160608163610_b78a201b-ae6d-4040-9115-f92118d5b629
> LockRequest(component:[LockComponent(type:SHARED_READ, level:TABLE,
> dbname:default, tablename:mobile_connections),
> LockComponent(type:SHARED_READ, level:PARTITION, dbname:default,
> tablename:mobile_connections,
> partitionname:dt=20151123/msisdn_last_digit=3)], txnid:0, user:hdfs,
> hostname:mercury)
> Line 3587: 2016-06-08 16:36:11,060 INFO  [HiveServer2-Background-Pool:
> Thread-91]: lockmgr.DbLockManager (DbLockManager.java:lock(101)) - Response
> to queryId=hive_20160608163610_b78a201b-ae6d-4040-9115-f92118d5b629
> LockResponse(lockid:179731, state:ACQUIRED)
> Line 4468: 2016-06-08 16:36:16,481 DEBUG [HiveServer2-Background-Pool:
> Thread-42]: lockmgr.DbLockManager (DbLockManager.java:unlock(182)) -
> Unlocking lockid:179728
> Line 4473: 2016-06-08 16:36:16,522 DEBUG [HiveServer2-Background-Pool:
> Thread-42]: lockmgr.DbLockManager (DbLockManager.java:unlock(185)) -
> Removed a lock true
> Line 4880: 2016-06-08 16:36:18,676 DEBUG [HiveServer2-Background-Pool:
> Thread-50]: lockmgr.DbLockManager (DbLockManager.java:unlock(182)) -
> Unlocking lockid:179729
> Line 4889: 2016-06-08 16:36:18,829 DEBUG [HiveServer2-Background-Pool:
> Thread-50]: lockmgr.DbLockManager (DbLockManager.java:unlock(185)) -
> Removed a lock true
> Line 4976: 2016-06-08 16:36:19,439 DEBUG [HiveServer2-Background-Pool:
> Thread-68]: lockmgr.DbLockManager (DbLockManager.java:unlock(182)) -
> Unlocking lockid:179730
> Line 4979: 2016-06-08 16:36:19,486 DEBUG [HiveServer2-Background-Pool:
> Thread-68]: lockmgr.DbLockManager (DbLockManager.java:unlock(185)) -
> Removed a lock true
> Line 5467: 2016-06-08 16:36:28,090 DEBUG [HiveServer2-Background-Pool:
> Thread-91]: lockmgr.DbLockManager (DbLockManager.java:unlock(182)) -
> Unlocking lockid:179731
> Line 5470: 2016-06-08 16:36:28,131 DEBUG [HiveServer2-Background-Pool:
> Thread-91]: lockmgr.DbLockManager (DbLockManager.java:unlock(185)) -
> Removed a lock true
>
>
> On Tue, Jun 7, 2016 at 9:01 PM, Alan Gates <alanfga...@gmail.com> wrote:
>
>> Eugene Koifman pointed out to me that there is one other possibility.  If
>> we ignore query 3 for a minute, I can explain this with the following
>> timeline:
>>
>> 1) Query one requests and gets a read lock, starts its select.
>> 2) Query two (alter table…) requests an exclusive lock, and then blocks
>> until query one has completed.
>> 3) Query four requests a read lock, and then blocks until query two has
>> completed.  Once it can run it finds that part of its inputs have gone away
>> and thus fails with an error.
>>
>> Hive’s metadata is not part of the transactional system.  So we don’t
>> keep the old dropped partition around after query two drops it.  So the
>> error from query four is actually the correct thing to do.  If four ignored
>> the fact that the partition had vanished and returned results minus that
>> partition it would not be returning correct results as of the time its
>> transaction started.  Because the underlying data has been dropped, the
>> query is now unanswerable, and an error is the best possible answer.
>>
>> This situation cannot be fixed until Hive DDL operations take part in
>> transactions.  I don’t know when that will happen.
>>
>> However, this explanation is incomplete because it doesn’t account for
>> query three.  That query should also have blocked.  The code is not
>> supposed to allow “jumping” exclusive locks.  That is, query three
>> shouldn’t be allowed to acquire the read lock along with query one because
>> query two’s exclusive lock is in between.  It’s possible there’s a bug in
>> that code.  It’s also possible that the above explanation is not what’s
>> happening.
>>
>> Would it be possible for you to turn on debug logging on your thrift
>> metastore process and rerun this test and post the logs somewhere?  Apache
>> lists strip attachments so you won’t be able to attach them here, you’ll
>> have to put them up on paste bin or something.
>>
>> Alan.
>>
>> > On Jun 7, 2016, at 04:00, Igor Kuzmenko <f1she...@gmail.com> wrote:
>> >
>> >       • hive.support.concurrency – true
>> >       • hive.enforce.bucketing – true (Not required as of Hive 2.0)
>> >       • hive.exec.dynamic.partition.mode – nonstrict
>> >       • hive.txn.manager –
>> org.apache.hadoop.hive.ql.lockmgr.DbTxnManager
>> >       • hive.compactor.initiator.on – true (for exactly one instance of
>> the Thrift metastore service)
>> >       • hive.compactor.worker.threads – a positive number on at least
>> one instance of the Thrift metastore service
>> > I've double-checked all settings and restarted Hive. The problem remained.
>> > I've completed the same test with a little modification to the console
>> output.
>> > asyncExecute("Select count(distinct in_info_msisdn) from
>> mobile_connections where dt=20151124 and msisdn_last_digit=1", 1);
>> > Thread.sleep(3000);
>> > asyncExecute("alter table mobile_connections drop if exists partition
>> (dt=20151124, msisdn_last_digit=1) purge", 2);
>> > Thread.sleep(3000);
>> > asyncExecute("Select count(distinct in_info_msisdn) from
>> mobile_connections where dt=20151124 and msisdn_last_digit=1", 3);
>> > Thread.sleep(3000);
>> > asyncExecute("Select count(distinct in_info_msisdn) from
>> mobile_connections where dt=20151124 and msisdn_last_digit=1", 4);
>> >
>> > Like in the previous test, four queries execute in parallel. The first,
>> third and fourth are simple selects; the second is a drop partition command.
>> > Here's the output:
>> >
>> > Tue Jun 07 13:45:18 MSK 2016 Start thread 1
>> > Tue Jun 07 13:45:21 MSK 2016 Start thread 2
>> > Tue Jun 07 13:45:24 MSK 2016 Start thread 3
>> > Tue Jun 07 13:45:27 MSK 2016 Start thread 4
>> >
>> > Tue Jun 07 13:45:32 MSK 2016 Finish thread 1
>> > Tue Jun 07 13:45:32 MSK 2016 Thread 1 result: '210802'
>> > Tue Jun 07 13:45:32 MSK 2016 Thread 1 completed in 13810 ms
>> >
>> > Tue Jun 07 13:45:35 MSK 2016 Finished thread 2
>> > Tue Jun 07 13:45:35 MSK 2016 Thread 2 completed in 13568 ms
>> >
>> > Tue Jun 07 13:45:39 MSK 2016 Finish thread 3
>> > Tue Jun 07 13:45:39 MSK 2016 Thread 3 result: '210802'
>> > Tue Jun 07 13:45:39 MSK 2016 Thread 3 completed in 14970 ms
>> >
>> > java.sql.SQLException: Error while processing statement: FAILED:
>> Execution Error, return code 2 from
>> org.apache.hadoop.hive.ql.exec.tez.TezTask. Vertex failed, vertexName=Map
>> 1, vertexId=vertex_1461923723503_0256_1_00, diagnostics=[Vertex
>> vertex_1461923723503_0256_1_00 [Map 1] killed/failed due
>> to:ROOT_INPUT_INIT_FAILURE, Vertex Input: mobile_connections initializer
>> failed, vertex=vertex_1461923723503_0256_1_00 [Map 1],
>> java.lang.RuntimeException: serious problem
>> >       at
>> org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.generateSplitsInfo(OrcInputFormat.java:1059)
>> >       at
>> org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.getSplits(OrcInputFormat.java:1086)
>> >       at
>> org.apache.hadoop.hive.ql.io.HiveInputFormat.addSplitsForGroup(HiveInputFormat.java:305)
>> >       at
>> org.apache.hadoop.hive.ql.io.HiveInputFormat.getSplits(HiveInputFormat.java:407)
>> >       at
>> org.apache.hadoop.hive.ql.exec.tez.HiveSplitGenerator.initialize(HiveSplitGenerator.java:155)
>> >       at
>> org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable$1.run(RootInputInitializerManager.java:255)
>> >       at
>> org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable$1.run(RootInputInitializerManager.java:248)
>> >       at java.security.AccessController.doPrivileged(Native Method)
>> >       at javax.security.auth.Subject.doAs(Subject.java:422)
>> >       at
>> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
>> >       at
>> org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable.call(RootInputInitializerManager.java:248)
>> >       at
>> org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable.call(RootInputInitializerManager.java:235)
>> >       at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>> >       at
>> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>> >       at
>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>> >       at java.lang.Thread.run(Thread.java:745)
>> > Caused by: java.util.concurrent.ExecutionException:
>> java.io.FileNotFoundException: File
>> hdfs://jupiter.bss:8020/apps/hive/warehouse/mobile_connections/dt=20151124/msisdn_last_digit=2
>> does not exist.
>> >       at java.util.concurrent.FutureTask.report(FutureTask.java:122)
>> >       at java.util.concurrent.FutureTask.get(FutureTask.java:192)
>> >       at
>> org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.generateSplitsInfo(OrcInputFormat.java:1036)
>> >       ... 15 more
>> >
>> > So the second thread definitely waits until the first thread completes and
>> then drops the partition. Then, somehow, after the partition was dropped, the
>> third query completes and shows a result. The fourth query doesn't complete
>> at all, throwing an exception.
>> >
>> >
>> >
>> >
>> >
>> >
>> > On Mon, Jun 6, 2016 at 8:30 PM, Alan Gates <alanfga...@gmail.com>
>> wrote:
>> > Do you have the system configured to use the DbTxnManager?  See
>> https://cwiki.apache.org/confluence/display/Hive/Hive+Transactions#HiveTransactions-Configuration
>> for details on how to set this up.  The transaction manager is what manages
>> locking and makes sure that your queries don’t stomp each other.
>> >
>> > Alan.
>> >
>> > > On Jun 6, 2016, at 06:01, Igor Kuzmenko <f1she...@gmail.com> wrote:
>> > >
>> > > Hello, I'm trying to find a safe way to delete a partition with all the
>> data it includes.
>> > >
>> > > I'm using Hive 1.2.1, Hive JDBC driver 1.2.1 and perform simple test
>> on transactional table:
>> > >
>> > > asyncExecute("Select count(distinct in_info_msisdn) from
>> mobile_connections where dt=20151124 and msisdn_last_digit=2", 1);
>> > > Thread.sleep(3000);
>> > > asyncExecute("alter table mobile_connections drop if exists partition
>> (dt=20151124, msisdn_last_digit=2) purge", 2);
>> > > Thread.sleep(3000);
>> > > asyncExecute("Select count(distinct in_info_msisdn) from
>> mobile_connections where dt=20151124 and msisdn_last_digit=2", 3);
>> > > Thread.sleep(3000);
>> > > asyncExecute("Select count(distinct in_info_msisdn) from
>> mobile_connections where dt=20151124 and msisdn_last_digit=2", 4);
>> > > (full code here)
>> > >
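>> > >
The asyncExecute helper itself is only behind the "full code here" link, so this is a hypothetical sketch of what it might look like, with the Hive JDBC call abstracted to a plain function (so the threading and timing logic is visible on its own; the real version presumably runs `Statement.execute` over a HiveServer2 connection):

```java
import java.util.Date;
import java.util.concurrent.CompletableFuture;
import java.util.function.Function;

// Hypothetical reconstruction of the test's asyncExecute helper: fire a
// statement on a background thread, log start/finish with the thread
// number, and print the single result value and elapsed time.
class AsyncRunner {
    // Stand-in for the JDBC execute-and-fetch call in the original test.
    private final Function<String, String> runStatement;

    AsyncRunner(Function<String, String> runStatement) {
        this.runStatement = runStatement;
    }

    CompletableFuture<String> asyncExecute(String sql, int threadNo) {
        System.out.println(new Date() + " Start thread " + threadNo);
        long start = System.currentTimeMillis();
        return CompletableFuture.supplyAsync(() -> {
            String result = runStatement.apply(sql);
            long took = System.currentTimeMillis() - start;
            System.out.println(new Date() + " Finish thread " + threadNo);
            System.out.println(new Date() + " Thread " + threadNo
                    + " result: '" + result + "'");
            System.out.println(new Date() + " Thread " + threadNo
                    + " completed in " + took + " ms");
            return result;
        });
    }
}
```

With the 3-second sleeps between calls, all four statements overlap because each one takes 10-15 seconds to run.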
>> > > I create several threads, each executing a query asynchronously. The
>> first queries the partition, the second drops it, and the others are the
>> same as the first. The first query takes about 10-15 seconds to complete,
>> so the "alter table" query starts before the first query completes.
>> > > As a result I get:
>> > >       • First query - successfully completes
>> > >       • Second query - successfully completes
>> > >       • Third query - successfully completes
>> > >       • Fourth query - throw exception:
>> > > java.sql.SQLException: Error while processing statement: FAILED:
>> Execution Error, return code 2 from
>> org.apache.hadoop.hive.ql.exec.tez.TezTask. Vertex failed, vertexName=Map
>> 1, vertexId=vertex_1461923723503_0189_1_00, diagnostics=[Vertex
>> vertex_1461923723503_0189_1_00 [Map 1] killed/failed due
>> to:ROOT_INPUT_INIT_FAILURE, Vertex Input: mobile_connections initializer
>> failed, vertex=vertex_1461923723503_0189_1_00 [Map 1],
>> java.lang.RuntimeException: serious problem
>> > >       at
>> org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.generateSplitsInfo(OrcInputFormat.java:1059)
>> > >       at
>> org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.getSplits(OrcInputFormat.java:1086)
>> > >       at
>> org.apache.hadoop.hive.ql.io.HiveInputFormat.addSplitsForGroup(HiveInputFormat.java:305)
>> > >       at
>> org.apache.hadoop.hive.ql.io.HiveInputFormat.getSplits(HiveInputFormat.java:407)
>> > >       at
>> org.apache.hadoop.hive.ql.exec.tez.HiveSplitGenerator.initialize(HiveSplitGenerator.java:155)
>> > >       at
>> org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable$1.run(RootInputInitializerManager.java:255)
>> > >       at
>> org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable$1.run(RootInputInitializerManager.java:248)
>> > >       at java.security.AccessController.doPrivileged(Native Method)
>> > >       at javax.security.auth.Subject.doAs(Subject.java:422)
>> > >       at
>> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
>> > >       at
>> org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable.call(RootInputInitializerManager.java:248)
>> > >       at
>> org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable.call(RootInputInitializerManager.java:235)
>> > >       at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>> > >       at
>> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>> > >       at
>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>> > >       at java.lang.Thread.run(Thread.java:745)
>> > > Caused by: java.util.concurrent.ExecutionException:
>> java.io.FileNotFoundException: File
>> hdfs://jupiter.bss:8020/apps/hive/warehouse/mobile_connections/dt=20151124/msisdn_last_digit=2
>> does not exist.
>> > >       at java.util.concurrent.FutureTask.report(FutureTask.java:122)
>> > >       at java.util.concurrent.FutureTask.get(FutureTask.java:192)
>> > >       at
>> org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.generateSplitsInfo(OrcInputFormat.java:1036)
>> > >       ... 15 more
>> > > Caused by: java.io.FileNotFoundException: File
>> hdfs://jupiter.bss:8020/apps/hive/warehouse/mobile_connections/dt=20151124/msisdn_last_digit=2
>> does not exist.
>> > >       at
>> org.apache.hadoop.hdfs.DistributedFileSystem$DirListingIterator.<init>(DistributedFileSystem.java:958)
>> > >       at
>> org.apache.hadoop.hdfs.DistributedFileSystem$DirListingIterator.<init>(DistributedFileSystem.java:937)
>> > >       at
>> org.apache.hadoop.hdfs.DistributedFileSystem$19.doCall(DistributedFileSystem.java:882)
>> > >       at
>> org.apache.hadoop.hdfs.DistributedFileSystem$19.doCall(DistributedFileSystem.java:878)
>> > >       at
>> org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
>> > >       at
>> org.apache.hadoop.hdfs.DistributedFileSystem.listLocatedStatus(DistributedFileSystem.java:878)
>> > >       at
>> org.apache.hadoop.fs.FileSystem.listLocatedStatus(FileSystem.java:1694)
>> > >       at
>> org.apache.hadoop.hive.shims.Hadoop23Shims.listLocatedStatus(Hadoop23Shims.java:690)
>> > >       at
>> org.apache.hadoop.hive.ql.io.AcidUtils.getAcidState(AcidUtils.java:366)
>> > >       at
>> org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$FileGenerator.call(OrcInputFormat.java:648)
>> > >       at
>> org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$FileGenerator.call(OrcInputFormat.java:634)
>> > >       ... 4 more
>> > > ]Vertex killed, vertexName=Reducer 3,
>> vertexId=vertex_1461923723503_0189_1_02, diagnostics=[Vertex received Kill
>> in INITED state., Vertex vertex_1461923723503_0189_1_02 [Reducer 3]
>> killed/failed due to:OTHER_VERTEX_FAILURE]Vertex killed, vertexName=Reducer
>> 2, vertexId=vertex_1461923723503_0189_1_01, diagnostics=[Vertex received
>> Kill in INITED state., Vertex vertex_1461923723503_0189_1_01 [Reducer 2]
>> killed/failed due to:OTHER_VERTEX_FAILURE]DAG did not succeed due to
>> VERTEX_FAILURE. failedVertices:1 killedVertices:2
>> > >       at
>> org.apache.hive.jdbc.HiveStatement.execute(HiveStatement.java:296)
>> > >       at Test$MyRunnable.run(Test.java:54)
>> > >       at java.lang.Thread.run(Thread.java:745)
>> > >
>> > > Since I'm using a transactional table, I expect all queries executed
>> after the partition drop to complete successfully with no result.
>> Am I doing something wrong? Is there another way to drop a partition with
>> its data?
>> >
>> >
>>
>>
>
