Re: Hive on Spark - Hadoop 2 - Installation - Ubuntu

Sai Gopalakrishnan Fri, 20 Nov 2015 07:49:26 -0800

Hi Mich,


Could you please explain more about how to efficiently reflect updates and 
deletes made in the RDBMS in HDFS via Sqoop? Even if Hive supports ACID 
properties with ORC, it still needs to know which records are to be updated or 
deleted, right? You mentioned feeding deltas from the RDBMS to Hive, but query 
performance degrades as the number of delta files increases. Is there an 
existing feature for this in Sqoop, or one planned for release any time soon?
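
To make the question concrete, here is a minimal sketch of the kind of 
reconciliation step that folds deltas back into a base snapshot, so queries do 
not have to read a growing pile of delta files. It is only an illustration, not 
a Sqoop or Hive feature: the column names `id`, `modified_at` and `op` (with 
values 'I', 'U', 'D') are assumptions, and the delete marker has to be produced 
by the source system (a soft-delete flag or a trigger-fed change table, for 
instance), since a plain incremental import cannot see rows that were 
physically deleted.

```python
from datetime import datetime


def reconcile(base_rows, delta_rows):
    """Return the current snapshot: latest version per key, minus deletes.

    Assumed (hypothetical) row layout: a dict with a primary key "id", a
    "modified_at" timestamp and, on delta rows, an "op" flag of I/U/D.
    """
    latest = {}
    for row in list(base_rows) + list(delta_rows):
        key = row["id"]
        seen = latest.get(key)
        # Keep the most recent version; on a tie the later (delta) row wins.
        if seen is None or row["modified_at"] >= seen["modified_at"]:
            latest[key] = row
    # Drop keys whose most recent change is a delete marker.
    return sorted(
        (r for r in latest.values() if r.get("op") != "D"),
        key=lambda r: r["id"],
    )


base = [
    {"id": 1, "name": "alice", "modified_at": datetime(2015, 11, 1)},
    {"id": 2, "name": "bob", "modified_at": datetime(2015, 11, 1)},
    {"id": 3, "name": "carol", "modified_at": datetime(2015, 11, 1)},
]
delta = [
    {"id": 2, "name": "robert", "modified_at": datetime(2015, 11, 20), "op": "U"},
    {"id": 3, "name": "carol", "modified_at": datetime(2015, 11, 20), "op": "D"},
    {"id": 4, "name": "dave", "modified_at": datetime(2015, 11, 20), "op": "I"},
]

snapshot = reconcile(base, delta)
print([(r["id"], r["name"]) for r in snapshot])
# prints [(1, 'alice'), (2, 'robert'), (4, 'dave')]
```

In Hive the same idea would typically be expressed as a query over the base and 
delta tables that rewrites the base table periodically; the Python above only 
shows the logic.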


Thanks & Regards,

Sai

________________________________
From: Mich Talebzadeh <m...@peridale.co.uk>
Sent: Friday, November 20, 2015 4:54 PM
To: user@hive.apache.org
Subject: RE: Hive on Spark - Hadoop 2 - Installation - Ubuntu


Right



Your steps look reasonable.



Let me try to understand your approach:



1.    You have a current RDBMS (Oracle, Sybase, MSSQL?)

2.    You want to feed that data daily in batch or real time from RDBMS to 
Hadoop as relational tables (that is where Hive comes into it)

3.    You need to have Hive fully installed and configured, including 
HiveServer2!

4.    You will need to use Sqoop (SQL to Hadoop) to get the DDL and data from 
the RDBMS created in Hive. This is a priority step

5.    You will use Hive/MapReduce for batch processing

6.    You want to use Spark for real time data processing on Hadoop



How about feeding deltas (daily/periodic changes) from the RDBMS to Hive? How 
are you going to do that? Remember, we are talking about inserts, deletes and 
updates.
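
One possible way to pull such deltas is Sqoop's incremental import in 
`lastmodified` mode. The sketch below only assembles the command line; the 
JDBC URL, table, column names and target directory are hypothetical 
placeholders. Note that this mode picks up inserted and updated rows through a 
timestamp column and does not detect rows deleted at the source, so deletes 
would still need separate handling.

```python
import shlex


def sqoop_incremental_import(jdbc_url, table, check_column, last_value,
                             merge_key, target_dir):
    """Build the argument list for a Sqoop incremental (lastmodified) import."""
    return [
        "sqoop", "import",
        "--connect", jdbc_url,
        "--table", table,
        "--target-dir", target_dir,
        # Import only rows whose check column is newer than last_value.
        "--incremental", "lastmodified",
        "--check-column", check_column,
        "--last-value", last_value,
        # Merge updated rows into previously imported data on this key.
        "--merge-key", merge_key,
    ]


cmd = sqoop_incremental_import(
    jdbc_url="jdbc:oracle:thin:@//dbhost:1521/ORCL",  # hypothetical host/service
    table="ORDERS",                                   # hypothetical table
    check_column="LAST_UPDATED",                      # hypothetical column
    last_value="2015-11-19 00:00:00",
    merge_key="ORDER_ID",                             # hypothetical key
    target_dir="/data/staging/orders",                # hypothetical HDFS path
)
command_line = " ".join(shlex.quote(arg) for arg in cmd)
print(command_line)
```

The list form can be passed straight to `subprocess.run(cmd, check=True)` on a 
host where Sqoop is installed; the quoted string is only for display or for 
pasting into a shell.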



HTH



Mich Talebzadeh









From: Dasun Hegoda [mailto:dasunheg...@gmail.com]
Sent: 20 November 2015 09:36
To: user@hive.apache.org
Subject: Hive on Spark - Hadoop 2 - Installation - Ubuntu



Hi,



What I'm planning to do is develop a reporting platform using existing data. I 
have an existing RDBMS which has a large number of records, so I'm using the 
following stack 
(http://stackoverflow.com/questions/33635234/hadoop-2-7-spark-hive-jasperreports-scoop-architecuture):



 - Sqoop - Extract data from RDBMS to Hadoop

 - Hadoop - Storage platform -> *Deployment Completed*

 - Hive - Data warehouse

 - Spark - Real-time processing -> *Deployment Completed*



I'm planning to deploy Hive on Spark but I can't find the installation steps. I 
tried to follow the official '[Hive on Spark][1]' guide but it has gaps. As an 
example, under 'Configuring Yarn' it says 
`yarn.resourcemanager.scheduler.class=org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler`
 but does not say where I should set it. Also, as per the guide, configurations 
are set in the Hive runtime shell, which is not permanent as far as I know.
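
For illustration, YARN properties such as the scheduler class normally live in 
`yarn-site.xml`, and Hive settings made with `set` in the shell can be made 
permanent by putting them in `hive-site.xml`. The sketch below renders the 
`<property>` entries in the Hadoop configuration format. The helper name is 
made up, `hive.execution.engine=spark` is the engine switch described in the 
Hive on Spark guide rather than something quoted above, and in practice the 
entries would be merged into the existing files instead of replacing them.

```python
import xml.etree.ElementTree as ET


def hadoop_config_xml(properties):
    """Render name/value pairs as a Hadoop-style <configuration> document."""
    root = ET.Element("configuration")
    for name, value in properties.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return ET.tostring(root, encoding="unicode")


# Entry intended for yarn-site.xml (restart the ResourceManager afterwards).
yarn_site = hadoop_config_xml({
    "yarn.resourcemanager.scheduler.class":
        "org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler",
})

# Entry intended for hive-site.xml, so it survives across Hive sessions.
hive_site = hadoop_config_xml({
    "hive.execution.engine": "spark",
})

print(yarn_site)
print(hive_site)
```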



Given that, I read [this][2], but it does not have any steps.



Could you please provide the steps to run Hive on Spark on Ubuntu as a 
production system?





  [1]: 
https://cwiki.apache.org/confluence/display/Hive/Hive+on+Spark%3A+Getting+Started

  [2]: 
http://stackoverflow.com/questions/26018306/how-to-configure-hive-to-use-spark



--

Regards,

Dasun Hegoda, Software Engineer
www.dasunhegoda.com<http://www.dasunhegoda.com/> | 
dasunheg...@gmail.com<mailto:dasunheg...@gmail.com>

