[ https://issues.apache.org/jira/browse/HIVE-12258?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Furcy Pin updated HIVE-12258:
-----------------------------
Description:
When hive.support.concurrency is enabled, a query that reads data from one partition and writes data into another partition of the same table creates a deadlock. The worst part is that once the deadlock is active, you can't query the table until it times out.

* How to reproduce:

```
CREATE TABLE test_table (id INT)
PARTITIONED BY (part STRING)
;

INSERT INTO TABLE test_table PARTITION (part="test")
VALUES (1), (2), (3), (4)
;

INSERT OVERWRITE TABLE test_table PARTITION (part="test2")
SELECT id FROM test_table WHERE part="test1";
```

Nothing happens, and doing a SHOW LOCKS in another terminal gives:

```
+----------+-----------+------------+------------+-------------+--------------+-----------------+-----------------+----------------+
| lockid   | database  | table      | partition  | lock_state  | lock_type    | transaction_id  | last_heartbeat  | acquired_at    |
+----------+-----------+------------+------------+-------------+--------------+-----------------+-----------------+----------------+
| 3765     | default   | test_table | NULL       | WAITING     | SHARED_READ  | NULL            | 1440603633148   | NULL           |
| 3765     | default   | test_table | part=test2 | WAITING     | EXCLUSIVE    | NULL            | 1440603633148   | NULL           |
+----------+-----------+------------+------------+-------------+--------------+-----------------+-----------------+----------------+
```

This was tested on Hive 1.1.0-cdh5.4.2, but I believe the bug is still present in 1.2.1. I could not reproduce it easily locally, because enabling concurrency requires a pseudo-distributed setup with ZooKeeper.

From looking at the code, I believe the problem comes from the EmbeddedLockManager method
`public List<HiveLock> lock(List<HiveLockObj> objs, int numRetriesForLock, long sleepTime)`,
which keeps trying to acquire two incompatible locks and ends up failing after hive.lock.numretries * hive.lock.sleep.between.retries, which by default is 100 * 60s = 100 minutes.
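To illustrate the failure mode described above, here is a minimal, hypothetical sketch in plain Java. The class, method, and object names are illustrative, not Hive's actual implementation; only the shape of `lock(objs, numRetriesForLock, sleepTime)` is borrowed from EmbeddedLockManager. The point is that when a single query requests a table-level SHARED_READ and a partition-level EXCLUSIVE on the same table, no retry can ever succeed, so the loop spins until the retry budget is exhausted:

```java
import java.util.Arrays;
import java.util.List;

public class LockRetrySketch {
    enum Mode { SHARED_READ, EXCLUSIVE }

    record LockRequest(String object, Mode mode) {}

    // Simplified compatibility rule: on the same object hierarchy,
    // only SHARED_READ + SHARED_READ coexist; EXCLUSIVE conflicts with
    // everything, including a SHARED_READ held by the SAME query.
    static boolean compatible(LockRequest held, LockRequest wanted) {
        if (!wanted.object().startsWith(held.object())) return true;
        return held.mode() == Mode.SHARED_READ && wanted.mode() == Mode.SHARED_READ;
    }

    // Mirrors the shape of lock(List<HiveLockObj>, numRetriesForLock, sleepTime):
    // keep retrying the whole lock list until it is granted or retries run out.
    static boolean lockAll(List<LockRequest> requests, int numRetries, long sleepMs)
            throws InterruptedException {
        for (int attempt = 0; attempt < numRetries; attempt++) {
            boolean ok = true;
            for (int i = 0; i < requests.size() && ok; i++)
                for (int j = i + 1; j < requests.size() && ok; j++)
                    ok = compatible(requests.get(i), requests.get(j));
            if (ok) return true;   // all locks granted
            Thread.sleep(sleepMs); // the self-deadlock just waits here
        }
        return false;              // gives up only after numRetries * sleepMs
    }

    public static void main(String[] args) throws InterruptedException {
        // The two WAITING entries from the SHOW LOCKS output above:
        List<LockRequest> reqs = Arrays.asList(
            new LockRequest("default/test_table", Mode.SHARED_READ),
            new LockRequest("default/test_table/part=test2", Mode.EXCLUSIVE));
        // With Hive defaults this would be 100 retries * 60s = 100 minutes;
        // tiny values are used here so the sketch finishes instantly.
        System.out.println(lockAll(reqs, 3, 1)); // false: can never be granted
    }
}
```

Because both requests come from the same query, nothing will ever release the table-level SHARED_READ, so the partition-level EXCLUSIVE stays WAITING for the full retry window.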
> read/write into same partitioned table + concurrency = deadlock
> ---------------------------------------------------------------
>
>          Key: HIVE-12258
>          URL: https://issues.apache.org/jira/browse/HIVE-12258
>      Project: Hive
>   Issue Type: Bug
>     Reporter: Furcy Pin
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)