Hi Alan,

Thanks for replying.

I haven't tried the compactor yet, but I will. Can it be scheduled, or does it 
run automatically when it detects a high number of delta files? The 
documentation says 'All compactions are done in the background and do not 
prevent concurrent reads and writes of the data.' I would prefer to schedule it 
along with the daily imports to avoid the extra overhead.
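
In case it helps, this is roughly what I had in mind: triggering a major 
compaction from the same job that runs the daily Sqoop import (the table name 
is just a placeholder, and I haven't verified the syntax against our version yet):

  -- run once the daily import and the update/delete statements have finished
  ALTER TABLE transactions_orc COMPACT 'major';

  -- for a partitioned table, the wiki says the partition must be specified:
  -- ALTER TABLE transactions_orc PARTITION (load_date = '...') COMPACT 'major';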


Thanks & Regards,

Sai

________________________________
From: Alan Gates <alanfga...@gmail.com>
Sent: Saturday, November 21, 2015 3:47 AM
To: user@hive.apache.org
Subject: Re: Query performance correlated to increase in delta files?

Are you running the compactor as part of your metastore?  It occasionally 
compacts the delta files in order to reduce read time.  See 
https://cwiki.apache.org/confluence/display/Hive/Hive+Transactions for details.
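
If it isn't enabled, my understanding (from that page, worth double-checking 
against your Hive version) is that the metastore side needs at least:

  hive.compactor.initiator.on = true
  hive.compactor.worker.threads = 1    (or higher)

The initiator then decides when a table or partition has accumulated enough 
deltas to compact, based on hive.compactor.delta.num.threshold and 
hive.compactor.delta.pct.threshold, so you can also tune how eagerly it runs.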

Alan.

Sai Gopalakrishnan <sai.gopalakrish...@aspiresys.com>
November 19, 2015 at 21:17

Hello fellow developer,



Greetings!



I am using Hive for querying transactional data. I transfer data from the RDBMS 
to Hive using Sqoop and prefer the ORC format for its speed and ACID support. 
Since Sqoop has no support for reflecting records that were updated or deleted 
in the RDBMS, I import those modified records into HDFS and then run 
update/delete statements on the Hive tables to apply the changes. Every 
update/delete on a Hive table creates new delta files, and I have noticed a 
considerable drop in query speed over time; lookups take longer as the number 
of delta files grows. Is there any way to overcome this? Rewriting the table 
with INSERT OVERWRITE is costly, as I deal with about 1 TB of data and it keeps 
growing every day.
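
For reference, the tables and the daily update step look roughly like this 
(table, column and bucket choices are only illustrative, not our actual schema):

  CREATE TABLE transactions_orc (
    txn_id  BIGINT,
    status  STRING,
    amount  DECIMAL(10,2)
  )
  CLUSTERED BY (txn_id) INTO 8 BUCKETS
  STORED AS ORC
  TBLPROPERTIES ('transactional' = 'true');

  -- after each Sqoop import, the modified rows are applied like this;
  -- every such transaction writes a new delta directory under the table
  UPDATE transactions_orc SET status = 'SHIPPED' WHERE txn_id = 12345;
  DELETE FROM transactions_orc WHERE status = 'CANCELLED';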



Kindly reply with a suitable solution at your earliest convenience.



Thanks & Regards,

Saisubramaniam Gopalakrishnan

Aspire Systems (India) Pvt. Ltd.




This e-mail message and any attachments are for the sole use of the intended 
recipient(s) and may contain proprietary, confidential, trade secret or 
privileged information. Any unauthorized review, use, disclosure or 
distribution is prohibited and may be a violation of law. If you are not the 
intended recipient, please contact the sender by reply e-mail and destroy all 
copies of the original message.
