Hi Alan,
Thanks for replying. I haven't tried the compactor yet, will do. Can it be scheduled, or does it run automatically when it detects a very high number of delta files? The documentation says "All compactions are done in the background and do not prevent concurrent reads and writes of the data." I would prefer to schedule it along with the daily imports to avoid overhead.

Thanks & Regards,
Sai

________________________________
From: Alan Gates <alanfga...@gmail.com>
Sent: Saturday, November 21, 2015 3:47 AM
To: user@hive.apache.org
Subject: Re: Query performance correlated to increase in delta files?

Are you running the compactor as part of your metastore? It occasionally compacts the delta files in order to reduce read time. See https://cwiki.apache.org/confluence/display/Hive/Hive+Transactions for details.

Alan.

Sai Gopalakrishnan <mailto:sai.gopalakrish...@aspiresys.com>
November 19, 2015 at 21:17

Hello fellow developer,

Greetings! I am using Hive for querying transactional data. I transfer data from an RDBMS to Hive using Sqoop and prefer the ORC format for its speed and ACID support. I found that Sqoop has no support for reflecting records updated or deleted in the RDBMS, so I insert those modified records into HDFS and then update/delete the Hive tables to reflect the changes. Every update/delete on the Hive table creates new delta files, and I have noticed a considerable drop in speed over time. I realize that lookups tend to take longer as the files grow. Is there any way to overcome this issue? INSERT OVERWRITE on the table is costly; I deal with about 1 TB of data, and it keeps growing every day. Kindly reply with a suitable solution at the earliest.

Thanks & Regards,
Saisubramaniam Gopalakrishnan
Aspire Systems (India) Pvt. Ltd.
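
For reference, a minimal sketch of the two ways the compactor can be driven, based on the Hive Transactions wiki page linked above (the table name and partition spec below are placeholders, and the exact property defaults should be checked against your Hive version):

-- 1. Automatic: an initiator thread inside the metastore picks tables to
--    compact based on thresholds configured in the metastore's hive-site.xml,
--    for example:
--      hive.compactor.initiator.on        = true
--      hive.compactor.worker.threads      = 2
--      hive.compactor.delta.num.threshold = 10   -- delta dirs before a minor compaction
--      hive.compactor.delta.pct.threshold = 0.1  -- delta/base size ratio before a major compaction
--
-- 2. On demand: request a compaction yourself right after the daily import,
--    which is the closest thing to scheduling it:
ALTER TABLE sales PARTITION (load_date = '2015-11-21') COMPACT 'minor';
-- or rewrite the base plus all deltas into a new base:
ALTER TABLE sales PARTITION (load_date = '2015-11-21') COMPACT 'major';

Either way the compaction request is queued and executed in the background by a metastore worker thread, so it should not block the concurrent reads and writes mentioned in the documentation; SHOW COMPACTIONS lists the pending and running compactions.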
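
For context on where the delta files come from, a minimal sketch of the kind of ACID ORC table and statements described in the original message (table and column names are hypothetical, and Hive 1.x additionally requires the DbTxnManager transaction manager to be enabled):

-- ACID tables must be bucketed, stored as ORC, and marked transactional.
CREATE TABLE customer_txn (
  id     BIGINT,
  status STRING,
  amount DECIMAL(10,2)
)
CLUSTERED BY (id) INTO 8 BUCKETS
STORED AS ORC
TBLPROPERTIES ('transactional' = 'true');

-- Each UPDATE/DELETE writes a new delta directory under the table location,
-- which is why reads slow down until the compactor folds the deltas together.
UPDATE customer_txn SET status = 'CLOSED' WHERE id = 42;
DELETE FROM customer_txn WHERE status = 'VOID';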