Re: Hive on Spark - Hadoop 2 - Installation - Ubuntu

2015-11-20 Thread Sai Gopalakrishnan
Hi everyone, Thank you for your responses. I think Mich's suggestion is a great one, and I will go with it. As Alan suggested, using the compactor in Hive should help with managing the delta files. @Dasun, pardon me for deviating from the topic. Regarding configuration, you could try a packaged di

Re: Query performance correlated to increase in delta files?

2015-11-20 Thread Sai Gopalakrishnan
Storm open simultaneously. The streaming agent then writes that number of ... Read more...<https://cwiki.apache.org/confluence/display/Hive/Hive+Transactions> Alan. Sai Gopalakrishnan<mailto:sai.gopalakrish...@aspiresys.com> November 19, 2015 at 2

Re: Hive on Spark - Hadoop 2 - Installation - Ubuntu

2015-11-20 Thread Sai Gopalakrishnan
Hi Mich, Could you please explain further how to efficiently reflect in HDFS, via Sqoop, the updates and deletes made in the RDBMS? Even if Hive supports ACID properties in ORC, it still needs to know which records are to be updated/deleted, right? You had mentioned feeding deltas from RDBMS to Hive, but

Query performance correlated to increase in delta files?

2015-11-19 Thread Sai Gopalakrishnan
Hello fellow developer, Greetings! I am using Hive for querying transactional data. I transfer data from RDBMS to Hive using Sqoop and prefer the ORC format for speed and its ACID properties. I found out that Sqoop has no support for reflecting the updated and deleted records in RDBMS and henc