Hive LIMIT clause slows query

2017-11-02 Thread Igor Kuzmenko
I'm using HDP 2.5.0 with Hive 1.2.1. While running some tests I noticed that my query performs better if I don't use a LIMIT clause. My query is: insert into table results_table partition (task_id=xxx) select * from data_table where dt=20171102 and . limit 100 This query runs in about 30 s

Re: Hive locking mechanism on read partition.

2017-10-13 Thread Igor Kuzmenko
tps://cwiki.apache.org/confluence/display/Hive/Configuration+Properties#ConfigurationProperties-hive.txn.strict.locking.mode > > To change X lock on write to S lock to get around this but this may not be > appropriate for the rest of your logic. > > > > Eugene

Re: Hive locking mechanism on read partition.

2017-10-13 Thread Igor Kuzmenko
Hi, Eugene. Tables are not transactional and locks are backed by DbTxnManager. On Fri, Oct 13, 2017 at 2:30 AM, Eugene Koifman wrote: > Which lock manager are you using? > > Do you have acid enabled and if so are these tables transactional? > > > > Eugene > >

Hive locking mechanism on read partition.

2017-10-12 Thread Igor Kuzmenko
Hello, I'm using HDP 2.5.0.0 with the included Hive 1.2.1, and I have a problem with the locking mechanism. Most of my queries to Hive look like this: (1) insert into table results_table partition(task_id=${task_id}) select * from data_table where ; results_table partitioned by task_

Unexpected query result

2017-08-21 Thread Igor Kuzmenko
Running a simple 'select count(*) from test_table' query returned 500_000. But when I run 'select count(distinct field) from test_table' the result is 500_001. How could it happen that a table with 500_000 records has 500_001 unique field values? I'm using Hive from HDP 2.5.0 p

Re: Hive TxnHandler::lock method run into dead lock.

2017-03-28 Thread Igor Kuzmenko
Explicit configuration is a workaround, but it doesn't solve the deadlock problem. On Mon, Mar 27, 2017 at 8:28 PM, Eugene Koifman wrote: > There is an open ticket > > https://issues.apache.org/jira/browse/HIVE-13842 > > > > Eugene > > > > From: Igor Kuzmenk

Re: Hive TxnHandler::lock method run into dead lock.

2017-03-27 Thread Igor Kuzmenko
> if (systemProp != null) { > this.loadProperties(systemProp); > } > > } > > > > > > From: Igor Kuzmenko > Reply-To: "user@hive.apache.org" > Date: Saturday, March 25, 2017 at 5:05 PM > > To: "user@hive.apache.org"

Re: Hive TxnHandler::lock method run into dead lock.

2017-03-25 Thread Igor Kuzmenko
ry use “hikaricp” connection pool manager? It seems to be using > default which is no limit. > > > > > > Eugene > > > > *From: *Igor Kuzmenko > *Reply-To: *"user@hive.apache.org" > *Date: *Monday, March 20, 2017 at 2:17 PM > *To: *"user@hive.ap

Hive TxnHandler::lock method run into dead lock.

2017-03-20 Thread Igor Kuzmenko
Hello, I'm running Hortonworks Data Platform 2.5.0.0 with the included Hive. I'm using the Storm Hive bolt to load data into Hive. But launching many Hive bolts always leads to a TimeoutException when calling the Hive metastore. The metastore logs are full of exceptions like this: 2017-03-15 18:46:12,436 ERROR [pool-5

Re: Hive TxnHandler::lock method run into dead lock.

2017-03-20 Thread Igor Kuzmenko
pache/hadoop/hive/metastore/txn/TxnHandler.java I guess the closest branch in the Apache repo is: https://github.com/apache/hive/blob/branch-2.1/metastore/src/java/org/apache/hadoop/hive/metastore/txn/TxnHandler.java On Tue, Mar 21, 2017 at 12:07 AM, Igor Kuzmenko wrote: > Hello I'm runnin

Re: How to remove Hive table property?

2016-08-24 Thread Igor Kuzmenko
ty which may arise > from relying on this email's technical content is explicitly disclaimed. > The author will in no case be liable for any monetary damages arising from > such loss, damage or destruction. > > > > On 23 August 2016 at 12:42, Igor Kuzmenko wrote: > >

Re: Hive transaction doesn't release lock.

2016-08-24 Thread Igor Kuzmenko
t * from HIVE_LOCKS” but the output is not from > HIVE_LOCKS. > What entries do you have in HIVE_LOCKS for this txn_id? > > If all you see is an entry in TXN table in ‘a’ state – that is OK. that > just mean that this transaction was aborted. > > Eugene > > From: I

How to remove Hive table property?

2016-08-23 Thread Igor Kuzmenko
I've created a Hive table with property "serialization.null.format"="null" to interpret the string "null" as null. Now I no longer need it. How can I remove it? Alter table properties page

Re: Hive transaction doesn't release lock.

2016-08-23 Thread Igor Kuzmenko
lying on this email's technical content is explicitly disclaimed. > The author will in no case be liable for any monetary damages arising from > such loss, damage or destruction. > > > > On 22 August 2016 at 16:27, Igor Kuzmenko wrote: > >> Hello, I'm using Ap

Hive transaction doesn't release lock.

2016-08-22 Thread Igor Kuzmenko
Hello, I'm using Apache Hive 1.2.1 and Apache Storm to stream data into a Hive table. After running some tests I tried to truncate my table, but the SQL execution doesn't complete because of a lock on the table: select * from HIVE_LOCKS; # TXN_ID, TXN_STATE, TXN_STARTED, TXN_LAST_HEARTBEAT, TXN_USER, TXN_

Re: Malformed orc file

2016-08-05 Thread Igor Kuzmenko
t closed > and hence may not be flushed completely. Did the transaction commit > successfully? Or was there any exception thrown during writes/commit? > > Thanks > Prasanth > > On Aug 3, 2016, at 6:09 AM, Igor Kuzmenko wrote: > > Hello, I've got a malformed O

Re: Hive LIKE predicate. '_' wildcard decrease perfomance

2016-08-05 Thread Igor Kuzmenko
Thanks for the reply, Gopal. Very helpful. On Thu, Aug 4, 2016 at 10:15 PM, Gopal Vijayaraghavan wrote: > > where res_url like '%mts.ru%' > ... > > where res_url like '%mts_ru%' > ... > > Why does the '_' wildcard decrease performance? > > Because it misses the fast path by just one "_". > > ORC vectorized re
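To illustrate the difference between the two patterns in this thread: in SQL LIKE, '.' is an ordinary character while '_' matches any single character. So '%mts.ru%' is a plain substring test, whereas '%mts_ru%' needs a general pattern matcher. The Python sketch below only models LIKE semantics; it is not Hive's implementation, it ignores escape characters, and the helper names (like_to_regex, like, is_plain_substring) are hypothetical.

```python
import re


def like_to_regex(pattern: str) -> str:
    """Translate a SQL LIKE pattern into an equivalent regular expression.

    Assumption: no escape character is supported in this sketch.
    """
    parts = []
    for ch in pattern:
        if ch == "%":
            parts.append(".*")   # % matches any run of characters
        elif ch == "_":
            parts.append(".")    # _ matches exactly one character
        else:
            parts.append(re.escape(ch))  # everything else is literal
    return "".join(parts)


def like(value: str, pattern: str) -> bool:
    """Return True if value matches the LIKE pattern as a whole."""
    return re.fullmatch(like_to_regex(pattern), value, re.DOTALL) is not None


def is_plain_substring(pattern: str) -> bool:
    """True for patterns of the form %literal% with no other wildcard."""
    if len(pattern) < 2 or pattern[0] != "%" or pattern[-1] != "%":
        return False
    inner = pattern[1:-1]
    return "%" not in inner and "_" not in inner
```

A pattern for which is_plain_substring returns True could be answered with a simple substring search (in Python, `"mts.ru" in value`), which is the kind of shortcut the reply calls the fast path; a single '_' rules that shortcut out.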

Hive LIKE predicate. '_' wildcard decrease perfomance

2016-08-04 Thread Igor Kuzmenko
I've got a Hive transactional table 'data_http' in ORC format, containing around 100,000,000 rows. When I execute the query: select * from data_http where res_url like '%mts.ru%' it completes in 10 seconds. But executing the query select * from data_http where res_url like '%mts_ru%' takes more than 3

Malformed orc file

2016-08-03 Thread Igor Kuzmenko
Hello, I've got a malformed ORC file in my Hive table. File was created by Hive Streaming API and I have no idea under what circumstances it became corrupted. File on google drive: link Exception message when trying t

Re: Hive compaction didn't launch

2016-07-29 Thread Igor Kuzmenko
Eugene > > > On 7/28/16, 3:59 PM, "Alan Gates" wrote: > > >But until those transactions are closed you don't know that they won't > >write to partition B. After they write to A they may choose to write to > >B and then commit. The compactor can not m

Re: Hive compaction didn't launch

2016-07-28 Thread Igor Kuzmenko
orm > should be committing on some frequency even if it doesn’t have enough data > to commit. > > Alan. > > > On Jul 28, 2016, at 05:36, Igor Kuzmenko wrote: > > > > I made some research on that issue. > > The problem is in ValidCompactorTxnList::isTx

Re: Hive compaction didn't launch

2016-07-28 Thread Igor Kuzmenko
because of using Storm Hive Bolt. Hive Bolt gets a transaction and keeps it open with heartbeats until there's data to commit. So if I get a transaction and keep it open, all compactions will stop. Is this incorrect Hive behavior, or should Storm close the transaction? On Wed, Jul 27, 2016 at 8

Re: Hive compaction didn't launch

2016-07-27 Thread Igor Kuzmenko
eople have seen this issue before. > > Alan. > > > On Jul 27, 2016, at 03:31, Igor Kuzmenko wrote: > > > > One more thing. I'm using Apache Storm to stream data in Hive. And when > I turned off Storm topology compactions started to work properly. > > >

Re: Hive compaction didn't launch

2016-07-27 Thread Igor Kuzmenko
One more thing. I'm using Apache Storm to stream data into Hive. When I turned off the Storm topology, compactions started to work properly. On Tue, Jul 26, 2016 at 6:28 PM, Igor Kuzmenko wrote: > I'm using Hive 1.2.1 transactional table. Inserting data in it via Hive > Streaming

Hive compaction didn't launch

2016-07-26 Thread Igor Kuzmenko
I'm using a Hive 1.2.1 transactional table and inserting data into it via the Hive Streaming API. After some time I expected compaction to start, but it didn't happen. Here's part of the log, which shows that the compactor initiator thread doesn't see any delta files: 2016-07-26 18:06:52,459 INFO [Thread-8]: compacto

Does HIVE JDBC return same sequence of records?

2016-07-04 Thread Igor Kuzmenko
If I perform the query "SELECT * FROM table t WHERE t.partition = value" with Hive JDBC several times, is there a guarantee that when I iterate through the result set I get records in the same order every time? Intuitively, it feels like yes, because in that query there's no MapReduce and Hive just reads data f

What is the best way to store IPv6 address in Hive?

2016-06-28 Thread Igor Kuzmenko
Currently I'm using ORC transactional tables, and I need to store a lot of data containing IP addresses. With IPv4 it can be an Integer (exactly 4 bytes), but what about IPv6? Obviously it should be space efficient and easy to search for an exact match. As an extra feature it would be good to do fast search
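As an illustration of the size question only (not an answer taken from this thread): an IPv4 address fits in a 32-bit integer, while an IPv6 address is 128 bits, so one compact and exact-match-friendly representation is its 16-byte packed form. The sketch below uses Python's standard ipaddress module; the function names are hypothetical, and how such values would map onto a Hive column type is left open here.

```python
import ipaddress


def ipv4_to_int(text: str) -> int:
    """Return the 32-bit integer value of an IPv4 address."""
    return int(ipaddress.IPv4Address(text))


def ipv6_to_bytes(text: str) -> bytes:
    """Return the canonical 16-byte packed form of an IPv6 address."""
    return ipaddress.IPv6Address(text).packed


def bytes_to_ipv6(raw: bytes) -> str:
    """Convert 16 packed bytes back to the compressed text form."""
    return str(ipaddress.IPv6Address(raw))
```

Because the packed form is canonical, different spellings of the same address (for example "2001:db8::1" and "2001:0db8:0000:0000:0000:0000:0000:0001") produce identical bytes, which keeps exact-match comparison simple.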

Re: Delete hive partition while executing query.

2016-06-09 Thread Igor Kuzmenko
he exception you are getting but it needs > to be fixed to prevent a partition from disappearing while query 3 and 4 > are in progress. > > Could you file a Jira please? > > thanks, > Eugene > > From: Igor Kuzmenko > Reply-To: "user@hive.apache.org" > Date:

Re: Delete hive partition while executing query.

2016-06-08 Thread Igor Kuzmenko
ve explanation is not what’s > happening. > > Would it be possible for you to turn on debug logging on your thrift > metastore process and rerun this test and post the logs somewhere? Apache > lists strip attachments so you won’t be able to attach them here, you’ll > have to

Re: Delete hive partition while executing query.

2016-06-07 Thread Igor Kuzmenko
up. The transaction manager is what manages > locking and makes sure that your queries don’t stomp each other. > > Alan. > > > On Jun 6, 2016, at 06:01, Igor Kuzmenko wrote: > > > > Hello, I'm trying to find a safe way to delete partition with all data > it in

Delete hive partition while executing query.

2016-06-06 Thread Igor Kuzmenko
Hello, I'm trying to find a safe way to delete a partition with all the data it includes. I'm using Hive 1.2.1 and Hive JDBC driver 1.2.1, and I'm performing a simple test on a transactional table: asyncExecute("Select count(distinct in_info_msisdn) from mobile_connections where dt=20151124 and msisdn_last_digit=2",

Hive Hcatalog Streaming. Why hive table must be bucketed?

2016-04-08 Thread Igor Kuzmenko
Hello, I've got a few questions about Hive HCatalog streaming. This feature has a requirement: "The Hive table must be bucketed, but not sorted.

Re: Hive StreamingAPI leaves table in not consistent state

2016-03-14 Thread Igor Kuzmenko
s is an issue in the Storm Hive bolt. I don’t have an Apache > JIRA on it, but if you ask on the Hortonworks lists we can connect you with > the fix for the storm bolt. > > Alan. > > > On Mar 10, 2016, at 04:02, Igor Kuzmenko wrote: > > > > Hello, I'm using Ho

Hive StreamingAPI leaves table in not consistent state

2016-03-10 Thread Igor Kuzmenko
Hello, I'm using Hortonworks Data Platform 2.3.4, which includes Apache Hive 1.2.1 and Apache Storm 0.10. I've built a Storm topology using Hive Bolt, which in turn uses the Hive Streaming API to stream data into a Hive table. In Hive I've created a transactional table: 1. CREATE EXTERNAL TABLE cdr1 (