Hi, thanks for the reply, Alan. Here's one more test.

Wed Jun 08 16:36:02 MSK 2016 Start thread 1
Wed Jun 08 16:36:05 MSK 2016 Start thread 2
Wed Jun 08 16:36:08 MSK 2016 Start thread 3
Wed Jun 08 16:36:11 MSK 2016 Start thread 4

Wed Jun 08 16:36:17 MSK 2016 Finish thread 1
Wed Jun 08 16:36:17 MSK 2016 Thread 1 result: '344186'
Wed Jun 08 16:36:17 MSK 2016 Thread 1 completed in 14443 ms
Wed Jun 08 16:36:19 MSK 2016 Finished 2
Wed Jun 08 16:36:19 MSK 2016 Thread 2 completed in 13967 ms

Wed Jun 08 16:36:20 MSK 2016 Finish thread 3
Wed Jun 08 16:36:20 MSK 2016 Thread 3 result: '344186'
Wed Jun 08 16:36:20 MSK 2016 Thread 3 completed in 11737 ms

java.sql.SQLException: Error while processing statement: FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.tez.TezTask. Vertex failed, vertexName=Map 1, vertexId=vertex_1461923723503_0931_1_00, diagnostics=[Vertex vertex_1461923723503_0931_1_00 [Map 1] killed/failed due to:ROOT_INPUT_INIT_FAILURE, Vertex Input: mobile_connections initializer failed, vertex=vertex_1461923723503_0931_1_00 [Map 1], java.lang.RuntimeException: serious problem
at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.generateSplitsInfo(OrcInputFormat.java:1059)

HiveMetaStore.log <https://drive.google.com/open?id=0ByB92PAoAkrKVU1rT25OcjJuOEk>
HiveServer2.log <https://drive.google.com/open?id=0ByB92PAoAkrKakhqYXhmdzlidkk>

I didn't find anything interesting in the metastore log, but the HiveServer2 log contains this:

Line 1023: 2016-06-08 16:36:04,456 INFO [HiveServer2-Background-Pool: Thread-42]: lockmgr.DbLockManager (DbLockManager.java:lock(98)) - Requesting: queryId=hive_20160608163602_542056d9-c524-4df4-af18-6aa5e906284f LockRequest(component:[LockComponent(type:SHARED_READ, level:TABLE, dbname:default, tablename:mobile_connections), LockComponent(type:SHARED_READ, level:PARTITION, dbname:default, tablename:mobile_connections, partitionname:dt=20151123/msisdn_last_digit=3)], txnid:0, user:hdfs, hostname:mercury)
Line 1043: 2016-06-08 16:36:04,546 INFO [HiveServer2-Background-Pool: Thread-42]: lockmgr.DbLockManager (DbLockManager.java:lock(101)) - Response to queryId=hive_20160608163602_542056d9-c524-4df4-af18-6aa5e906284f LockResponse(lockid:179728, state:ACQUIRED)
Line 1349: 2016-06-08 16:36:05,214 INFO [HiveServer2-Background-Pool: Thread-50]: lockmgr.DbLockManager (DbLockManager.java:lock(98)) - Requesting: queryId=hive_20160608163604_832abbff-6199-497e-b969-fd8ac1465abc LockRequest(component:[LockComponent(type:EXCLUSIVE, level:PARTITION, dbname:default, tablename:mobile_connections, partitionname:dt=20151123/msisdn_last_digit=3)], txnid:0, user:hdfs, hostname:mercury)
Line 1390: 2016-06-08 16:36:05,270 INFO [HiveServer2-Background-Pool: Thread-50]: lockmgr.DbLockManager (DbLockManager.java:lock(101)) - Response to queryId=hive_20160608163604_832abbff-6199-497e-b969-fd8ac1465abc LockResponse(lockid:179729, state:WAITING)
Line 2346: 2016-06-08 16:36:08,028 INFO [HiveServer2-Background-Pool: Thread-68]: lockmgr.DbLockManager (DbLockManager.java:lock(98)) - Requesting: queryId=hive_20160608163607_7b18da12-6f86-41c9-b4b1-be45252c18c2 LockRequest(component:[LockComponent(type:SHARED_READ, level:TABLE, dbname:default, tablename:mobile_connections), LockComponent(type:SHARED_READ, level:PARTITION, dbname:default, tablename:mobile_connections, partitionname:dt=20151123/msisdn_last_digit=3)], txnid:0, user:hdfs, hostname:mercury)
Line 2370: 2016-06-08 16:36:08,069 INFO [HiveServer2-Background-Pool: Thread-68]: lockmgr.DbLockManager (DbLockManager.java:lock(101)) - Response to queryId=hive_20160608163607_7b18da12-6f86-41c9-b4b1-be45252c18c2 LockResponse(lockid:179730, state:ACQUIRED)
Line 3561: 2016-06-08 16:36:11,000 INFO [HiveServer2-Background-Pool: Thread-91]: lockmgr.DbLockManager (DbLockManager.java:lock(98)) - Requesting: queryId=hive_20160608163610_b78a201b-ae6d-4040-9115-f92118d5b629 LockRequest(component:[LockComponent(type:SHARED_READ, level:TABLE, dbname:default, tablename:mobile_connections), LockComponent(type:SHARED_READ, level:PARTITION, dbname:default, tablename:mobile_connections, partitionname:dt=20151123/msisdn_last_digit=3)], txnid:0, user:hdfs, hostname:mercury)
Line 3587: 2016-06-08 16:36:11,060 INFO [HiveServer2-Background-Pool: Thread-91]: lockmgr.DbLockManager (DbLockManager.java:lock(101)) - Response to queryId=hive_20160608163610_b78a201b-ae6d-4040-9115-f92118d5b629 LockResponse(lockid:179731, state:ACQUIRED)
Line 4468: 2016-06-08 16:36:16,481 DEBUG [HiveServer2-Background-Pool: Thread-42]: lockmgr.DbLockManager (DbLockManager.java:unlock(182)) - Unlocking lockid:179728
Line 4473: 2016-06-08 16:36:16,522 DEBUG [HiveServer2-Background-Pool: Thread-42]: lockmgr.DbLockManager (DbLockManager.java:unlock(185)) - Removed a lock true
Line 4880: 2016-06-08 16:36:18,676 DEBUG [HiveServer2-Background-Pool: Thread-50]: lockmgr.DbLockManager (DbLockManager.java:unlock(182)) - Unlocking lockid:179729
Line 4889: 2016-06-08 16:36:18,829 DEBUG [HiveServer2-Background-Pool: Thread-50]: lockmgr.DbLockManager (DbLockManager.java:unlock(185)) - Removed a lock true
Line 4976: 2016-06-08 16:36:19,439 DEBUG [HiveServer2-Background-Pool: Thread-68]: lockmgr.DbLockManager (DbLockManager.java:unlock(182)) - Unlocking lockid:179730
Line 4979: 2016-06-08 16:36:19,486 DEBUG [HiveServer2-Background-Pool: Thread-68]: lockmgr.DbLockManager (DbLockManager.java:unlock(185)) - Removed a lock true
Line 5467: 2016-06-08 16:36:28,090 DEBUG [HiveServer2-Background-Pool: Thread-91]: lockmgr.DbLockManager (DbLockManager.java:unlock(182)) - Unlocking lockid:179731
Line 5470: 2016-06-08 16:36:28,131 DEBUG [HiveServer2-Background-Pool: Thread-91]: lockmgr.DbLockManager (DbLockManager.java:unlock(185)) - Removed a lock true
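In case it helps to reproduce, here is a minimal sketch of the asyncExecute helper these tests use. It is not the exact code from my test class: the JDBC URL, user, and class name are placeholders, and error handling is simplified. Each call opens its own Hive JDBC connection, runs the statement on a background thread, and prints the timestamps shown above, so the four statements overlap in time.

// Minimal sketch of the asyncExecute helper used in the tests above.
// Assumptions (not taken from the real test code): JDBC URL, user, and class name
// are placeholders; each thread opens its own connection.
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;
import java.util.Date;

public class ConcurrentPartitionDropTest {

    // Hypothetical connection settings; replace with your HiveServer2 host and credentials.
    private static final String JDBC_URL = "jdbc:hive2://mercury:10000/default";
    private static final String USER = "hdfs";
    private static final String PASSWORD = "";

    static void asyncExecute(final String sql, final int id) {
        new Thread(() -> {
            long start = System.currentTimeMillis();
            System.out.println(new Date() + " Start thread " + id);
            try (Connection conn = DriverManager.getConnection(JDBC_URL, USER, PASSWORD);
                 Statement stmt = conn.createStatement()) {
                String result = null;
                // execute() returns true when the statement produced a ResultSet (the selects);
                // the "alter table ... drop partition" statement returns false.
                if (stmt.execute(sql)) {
                    try (ResultSet rs = stmt.getResultSet()) {
                        if (rs.next()) {
                            result = rs.getString(1);
                        }
                    }
                }
                System.out.println(new Date() + " Finish thread " + id);
                if (result != null) {
                    System.out.println(new Date() + " Thread " + id + " result: '" + result + "'");
                }
            } catch (Exception e) {
                e.printStackTrace();
            } finally {
                System.out.println(new Date() + " Thread " + id + " completed in "
                        + (System.currentTimeMillis() - start) + " ms");
            }
        }).start();
    }

    public static void main(String[] args) throws Exception {
        Class.forName("org.apache.hive.jdbc.HiveDriver");
        asyncExecute("Select count(distinct in_info_msisdn) from mobile_connections where dt=20151123 and msisdn_last_digit=3", 1);
        Thread.sleep(3000);
        asyncExecute("alter table mobile_connections drop if exists partition (dt=20151123, msisdn_last_digit=3) purge", 2);
        Thread.sleep(3000);
        asyncExecute("Select count(distinct in_info_msisdn) from mobile_connections where dt=20151123 and msisdn_last_digit=3", 3);
        Thread.sleep(3000);
        asyncExecute("Select count(distinct in_info_msisdn) from mobile_connections where dt=20151123 and msisdn_last_digit=3", 4);
    }
}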
On Tue, Jun 7, 2016 at 9:01 PM, Alan Gates <alanfga...@gmail.com> wrote:

> Eugene Koifman pointed out to me that there is one other possibility. If
> we ignore query 3 for a minute, I can explain this with the following
> timeline:
>
> 1) Query one requests and gets a read lock, starts its select.
> 2) Query two (alter table…) requests an exclusive lock, and then blocks
> until query one has completed.
> 3) Query four requests a read lock, and then blocks until query two has
> completed. Once it can run it finds that part of its inputs have gone away
> and thus fails with an error.
>
> Hive’s metadata is not part of the transactional system. So we don’t keep
> the old dropped partition around after query two drops it. So the error
> from query four is actually the correct thing to do. If four ignored the
> fact that the partition had vanished and returned results minus that
> partition it would not be returning correct results as of the time its
> transaction started. Because the underlying data has been dropped, the
> query is now unanswerable, and an error is the best possible answer.
>
> This situation cannot be fixed until Hive DDL operations take part in
> transactions. I don’t know when that will happen.
>
> However, this explanation is incomplete because it doesn’t account for
> query three. That query should also have blocked. The code is not
> supposed to allow “jumping” exclusive locks. That is, query three
> shouldn’t be allowed to acquire the read lock along with query one because
> query two’s exclusive lock is in between. It’s possible there’s a bug in
> that code. It’s also possible that the above explanation is not what’s
> happening.
>
> Would it be possible for you to turn on debug logging on your thrift
> metastore process and rerun this test and post the logs somewhere? Apache
> lists strip attachments so you won’t be able to attach them here, you’ll
> have to put them up on paste bin or something.
>
> Alan.
>
> > On Jun 7, 2016, at 04:00, Igor Kuzmenko <f1she...@gmail.com> wrote:
> >
> > • hive.support.concurrency – true
> > • hive.enforce.bucketing – true (Not required as of Hive 2.0)
> > • hive.exec.dynamic.partition.mode – nonstrict
> > • hive.txn.manager – org.apache.hadoop.hive.ql.lockmgr.DbTxnManager
> > • hive.compactor.initiator.on – true (for exactly one instance of the Thrift metastore service)
> > • hive.compactor.worker.threads – a positive number on at least one instance of the Thrift metastore service
> >
> > I've double-checked all the settings and restarted Hive. The problem remained.
> > I've completed the same test with a little modification in the console output.
> >
> > asyncExecute("Select count(distinct in_info_msisdn) from mobile_connections where dt=20151124 and msisdn_last_digit=1", 1);
> > Thread.sleep(3000);
> > asyncExecute("alter table mobile_connections drop if exists partition (dt=20151124, msisdn_last_digit=1) purge", 2);
> > Thread.sleep(3000);
> > asyncExecute("Select count(distinct in_info_msisdn) from mobile_connections where dt=20151124 and msisdn_last_digit=1", 3);
> > Thread.sleep(3000);
> > asyncExecute("Select count(distinct in_info_msisdn) from mobile_connections where dt=20151124 and msisdn_last_digit=1", 4);
> >
> > Like in the previous test, four queries execute in parallel. The first, third, and fourth are simple selects. The second is a drop-partition command.
> > Here's the output:
> >
> > Tue Jun 07 13:45:18 MSK 2016 Start thread 1
> > Tue Jun 07 13:45:21 MSK 2016 Start thread 2
> > Tue Jun 07 13:45:24 MSK 2016 Start thread 3
> > Tue Jun 07 13:45:27 MSK 2016 Start thread 4
> >
> > Tue Jun 07 13:45:32 MSK 2016 Finish thread 1
> > Tue Jun 07 13:45:32 MSK 2016 Thread 1 result: '210802'
> > Tue Jun 07 13:45:32 MSK 2016 Thread 1 completed in 13810 ms
> >
> > Tue Jun 07 13:45:35 MSK 2016 Finished thread 2
> > Tue Jun 07 13:45:35 MSK 2016 Thread 2 completed in 13568 ms
> >
> > Tue Jun 07 13:45:39 MSK 2016 Finish tread 3
> > Tue Jun 07 13:45:39 MSK 2016 Thread 3 result: '210802'
> > Tue Jun 07 13:45:39 MSK 2016 Thread 3 completed in 14970 ms
> >
> > java.sql.SQLException: Error while processing statement: FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.tez.TezTask. Vertex failed, vertexName=Map 1, vertexId=vertex_1461923723503_0256_1_00, diagnostics=[Vertex vertex_1461923723503_0256_1_00 [Map 1] killed/failed due to:ROOT_INPUT_INIT_FAILURE, Vertex Input: mobile_connections initializer failed, vertex=vertex_1461923723503_0256_1_00 [Map 1], java.lang.RuntimeException: serious problem
> > at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.generateSplitsInfo(OrcInputFormat.java:1059)
> > at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.getSplits(OrcInputFormat.java:1086)
> > at org.apache.hadoop.hive.ql.io.HiveInputFormat.addSplitsForGroup(HiveInputFormat.java:305)
> > at org.apache.hadoop.hive.ql.io.HiveInputFormat.getSplits(HiveInputFormat.java:407)
> > at org.apache.hadoop.hive.ql.exec.tez.HiveSplitGenerator.initialize(HiveSplitGenerator.java:155)
> > at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable$1.run(RootInputInitializerManager.java:255)
> > at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable$1.run(RootInputInitializerManager.java:248)
> > at java.security.AccessController.doPrivileged(Native Method)
> > at javax.security.auth.Subject.doAs(Subject.java:422)
> > at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
> > at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable.call(RootInputInitializerManager.java:248)
> > at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable.call(RootInputInitializerManager.java:235)
> > at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> > at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> > at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> > at java.lang.Thread.run(Thread.java:745)
> > Caused by: java.util.concurrent.ExecutionException: java.io.FileNotFoundException: File hdfs://jupiter.bss:8020/apps/hive/warehouse/mobile_connections/dt=20151124/msisdn_last_digit=2 does not exist.
> > at java.util.concurrent.FutureTask.report(FutureTask.java:122)
> > at java.util.concurrent.FutureTask.get(FutureTask.java:192)
> > at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.generateSplitsInfo(OrcInputFormat.java:1036)
> > ... 15 more
> >
> > So the second thread definitely waits until the first thread completes and then drops the partition. Then, somehow, after the partition was dropped, the third query completes and shows a result. The fourth query doesn't complete at all, throwing an exception.
> >
> > On Mon, Jun 6, 2016 at 8:30 PM, Alan Gates <alanfga...@gmail.com> wrote:
> > Do you have the system configured to use the DbTxnManager? See https://cwiki.apache.org/confluence/display/Hive/Hive+Transactions#HiveTransactions-Configuration for details on how to set this up. The transaction manager is what manages locking and makes sure that your queries don’t stomp each other.
> >
> > Alan.
> >
> > > On Jun 6, 2016, at 06:01, Igor Kuzmenko <f1she...@gmail.com> wrote:
> > >
> > > Hello, I'm trying to find a safe way to delete a partition with all the data it includes.
> > >
> > > I'm using Hive 1.2.1 and Hive JDBC driver 1.2.1, and I perform a simple test on a transactional table:
> > >
> > > asyncExecute("Select count(distinct in_info_msisdn) from mobile_connections where dt=20151124 and msisdn_last_digit=2", 1);
> > > Thread.sleep(3000);
> > > asyncExecute("alter table mobile_connections drop if exists partition (dt=20151124, msisdn_last_digit=2) purge", 2);
> > > Thread.sleep(3000);
> > > asyncExecute("Select count(distinct in_info_msisdn) from mobile_connections where dt=20151124 and msisdn_last_digit=2", 3);
> > > Thread.sleep(3000);
> > > asyncExecute("Select count(distinct in_info_msisdn) from mobile_connections where dt=20151124 and msisdn_last_digit=2", 4);
> > > (full code here)
> > >
> > > I create several threads, each executing a query asynchronously. The first queries the partition, the second drops the partition, and the others are the same as the first. The first query takes about 10-15 seconds to complete, so the "alter table" query starts before the first query completes.
> > > As a result I get:
> > > • First query - successfully completes
> > > • Second query - successfully completes
> > > • Third query - successfully completes
> > > • Fourth query - throws an exception:
> > > java.sql.SQLException: Error while processing statement: FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.tez.TezTask. Vertex failed, vertexName=Map 1, vertexId=vertex_1461923723503_0189_1_00, diagnostics=[Vertex vertex_1461923723503_0189_1_00 [Map 1] killed/failed due to:ROOT_INPUT_INIT_FAILURE, Vertex Input: mobile_connections initializer failed, vertex=vertex_1461923723503_0189_1_00 [Map 1], java.lang.RuntimeException: serious problem
> > > at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.generateSplitsInfo(OrcInputFormat.java:1059)
> > > at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.getSplits(OrcInputFormat.java:1086)
> > > at org.apache.hadoop.hive.ql.io.HiveInputFormat.addSplitsForGroup(HiveInputFormat.java:305)
> > > at org.apache.hadoop.hive.ql.io.HiveInputFormat.getSplits(HiveInputFormat.java:407)
> > > at org.apache.hadoop.hive.ql.exec.tez.HiveSplitGenerator.initialize(HiveSplitGenerator.java:155)
> > > at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable$1.run(RootInputInitializerManager.java:255)
> > > at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable$1.run(RootInputInitializerManager.java:248)
> > > at java.security.AccessController.doPrivileged(Native Method)
> > > at javax.security.auth.Subject.doAs(Subject.java:422)
> > > at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
> > > at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable.call(RootInputInitializerManager.java:248)
> > > at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable.call(RootInputInitializerManager.java:235)
> > > at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> > > at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> > > at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> > > at java.lang.Thread.run(Thread.java:745)
> > > Caused by: java.util.concurrent.ExecutionException: java.io.FileNotFoundException: File hdfs://jupiter.bss:8020/apps/hive/warehouse/mobile_connections/dt=20151124/msisdn_last_digit=2 does not exist.
> > > at java.util.concurrent.FutureTask.report(FutureTask.java:122)
> > > at java.util.concurrent.FutureTask.get(FutureTask.java:192)
> > > at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.generateSplitsInfo(OrcInputFormat.java:1036)
> > > ... 15 more
> > > Caused by: java.io.FileNotFoundException: File hdfs://jupiter.bss:8020/apps/hive/warehouse/mobile_connections/dt=20151124/msisdn_last_digit=2 does not exist.
> > > at org.apache.hadoop.hdfs.DistributedFileSystem$DirListingIterator.<init>(DistributedFileSystem.java:958)
> > > at org.apache.hadoop.hdfs.DistributedFileSystem$DirListingIterator.<init>(DistributedFileSystem.java:937)
> > > at org.apache.hadoop.hdfs.DistributedFileSystem$19.doCall(DistributedFileSystem.java:882)
> > > at org.apache.hadoop.hdfs.DistributedFileSystem$19.doCall(DistributedFileSystem.java:878)
> > > at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
> > > at org.apache.hadoop.hdfs.DistributedFileSystem.listLocatedStatus(DistributedFileSystem.java:878)
> > > at org.apache.hadoop.fs.FileSystem.listLocatedStatus(FileSystem.java:1694)
> > > at org.apache.hadoop.hive.shims.Hadoop23Shims.listLocatedStatus(Hadoop23Shims.java:690)
> > > at org.apache.hadoop.hive.ql.io.AcidUtils.getAcidState(AcidUtils.java:366)
> > > at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$FileGenerator.call(OrcInputFormat.java:648)
> > > at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$FileGenerator.call(OrcInputFormat.java:634)
> > > ... 4 more
> > > ]Vertex killed, vertexName=Reducer 3, vertexId=vertex_1461923723503_0189_1_02, diagnostics=[Vertex received Kill in INITED state., Vertex vertex_1461923723503_0189_1_02 [Reducer 3] killed/failed due to:OTHER_VERTEX_FAILURE]Vertex killed, vertexName=Reducer 2, vertexId=vertex_1461923723503_0189_1_01, diagnostics=[Vertex received Kill in INITED state., Vertex vertex_1461923723503_0189_1_01 [Reducer 2] killed/failed due to:OTHER_VERTEX_FAILURE]DAG did not succeed due to VERTEX_FAILURE. failedVertices:1 killedVertices:2
> > > at org.apache.hive.jdbc.HiveStatement.execute(HiveStatement.java:296)
> > > at Test$MyRunnable.run(Test.java:54)
> > > at java.lang.Thread.run(Thread.java:745)
> > >
> > > Since I'm using a transactional table, I expect that all queries executed after the partition drop will complete successfully with no result. Am I doing something wrong? Is there another way to drop a partition with its data?