Hi,

 

I don’t think there is any packaged distribution that includes all these 
components. Indeed, one needs to get the architecture right to make this work 
seamlessly.

 

 

What I did was

 

1.    Installed and configured Hadoop

2.    Installed Hive

3.    Installed Sqoop

4.    Used Sqoop to get data out of the relational table and put it into a Hive 
table. You can use Sqoop to create the DDL and populate the Hive table for you. 
This is pretty fast for the base table.

5.    Used SAP Replication Server to get changed data out of the RDBMS table in 
real time and feed the Hive table. For an RDBMS you have a transactional table. 
You need to find a mechanism to flag rows in Hive as INSERTED/UPDATED/DELETED 
with a timestamp. Sqoop will not do that for you, whatever scheduling mechanism 
you use. If I am correct, Sqoop can do an “incremental append” based on the 
last ID value, but that will not cater for updates or deletes.

6.    As you have already gathered, Hive with MapReduce is best for batch 
processing, hence Spark sounds an attractive mechanism for real-time analytics.
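The Sqoop commands for steps 4 and 5 above can be sketched roughly as follows. 
This is a sketch only: the JDBC URL, credentials, and table/column names are 
placeholders, not from my actual setup.

```shell
# Step 4: initial full load. Sqoop generates the Hive DDL and populates
# the table for you.
sqoop import \
  --connect jdbc:mysql://dbhost:3306/sales \
  --username etl_user -P \
  --table orders \
  --hive-import \
  --create-hive-table \
  --hive-table staging.orders \
  --num-mappers 4

# Incremental append: only picks up rows whose id is beyond the last
# imported value. Note it catches new rows only -- it will not see
# updates or deletes, which is why a CDC tool is needed for those.
sqoop import \
  --connect jdbc:mysql://dbhost:3306/sales \
  --username etl_user -P \
  --table orders \
  --hive-import \
  --hive-table staging.orders \
  --incremental append \
  --check-column id \
  --last-value 1000000
```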

 

 

As for myself, I am in the process of getting to know Spark and making it work 
with Hive. Once I get it working I will publish and share the findings.

 

HTH

 

Mich Talebzadeh

 

Sybase ASE 15 Gold Medal Award 2008

A Winning Strategy: Running the most Critical Financial Data on ASE 15

http://login.sybase.com/files/Product_Overviews/ASE-Winning-Strategy-091908.pdf

Author of the book "A Practitioner’s Guide to Upgrading to Sybase ASE 15", 
ISBN 978-0-9563693-0-7.

Co-author of "Sybase Transact SQL Guidelines Best Practices", ISBN 
978-0-9759693-0-4

Publications due shortly:

Complex Event Processing in Heterogeneous Environments, ISBN: 978-0-9563693-3-8

Oracle and Sybase, Concepts and Contrasts, ISBN: 978-0-9563693-1-4, volume one 
out shortly

 

http://talebzadehmich.wordpress.com

 

NOTE: The information in this email is proprietary and confidential. This 
message is for the designated recipient only, if you are not the intended 
recipient, you should destroy it immediately. Any information in this message 
shall not be understood as given or endorsed by Peridale Technology Ltd, its 
subsidiaries or their employees, unless expressly so stated. It is the 
responsibility of the recipient to ensure that this email is virus free, 
therefore neither Peridale Ltd, its subsidiaries nor their employees accept any 
responsibility.

 

From: Dasun Hegoda [mailto:dasunheg...@gmail.com] 
Sent: 20 November 2015 11:54
To: user@hive.apache.org
Subject: Re: Hive on Spark - Hadoop 2 - Installation - Ubuntu

 

Where can I get a Hadoop distribution containing these technologies? Link?

 

On Fri, Nov 20, 2015 at 5:22 PM, Jörn Franke <jornfra...@gmail.com> wrote:

I recommend using a Hadoop distribution containing these technologies. I think 
you also get other useful tools for your scenario, such as auditing using 
Sentry or Ranger.


On 20 Nov 2015, at 10:48, Mich Talebzadeh <m...@peridale.co.uk> wrote:

Well

 

“I'm planning to deploy Hive on Spark but I can't find the installation steps. 
I tried to read the official '[Hive on Spark][1]' guide but it has problems. As 
an example, under 'Configuring Yarn' it says 
`yarn.resourcemanager.scheduler.class=org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler`
 but does not say where I should set it. Also, as per the guide, configurations 
are set in the Hive runtime shell, which is not permanent according to my 
knowledge.”

 

You can set that in the yarn-site.xml file, which is normally under 
$HADOOP_HOME/etc/hadoop.
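For example, assuming a standard Hadoop layout, the scheduler setting from the 
guide would go into $HADOOP_HOME/etc/hadoop/yarn-site.xml as a property 
element:

```xml
<!-- yarn-site.xml: use the Fair Scheduler for the ResourceManager -->
<property>
  <name>yarn.resourcemanager.scheduler.class</name>
  <value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler</value>
</property>
```

Similarly, Hive properties that you would otherwise `set` in the Hive shell 
can be made permanent in hive-site.xml under the Hive configuration directory.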

 

 

HTH

 

 

 


 

From: Dasun Hegoda [mailto:dasunheg...@gmail.com] 
Sent: 20 November 2015 09:36
To: user@hive.apache.org
Subject: Hive on Spark - Hadoop 2 - Installation - Ubuntu

 

Hi,

 

What I'm planning to do is develop a reporting platform using existing data. I 
have an existing RDBMS which has a large number of records, so I'm using the 
following stack 
(http://stackoverflow.com/questions/33635234/hadoop-2-7-spark-hive-jasperreports-scoop-architecuture):

 

 - Sqoop - Extract data from RDBMS to Hadoop

 - Hadoop - Storage platform -> *Deployment Completed*

 - Hive - Data warehouse

 - Spark - Real-time processing -> *Deployment Completed*

 

I'm planning to deploy Hive on Spark but I can't find the installation steps. I 
tried to read the official '[Hive on Spark][1]' guide but it has problems. As 
an example, under 'Configuring Yarn' it says 
`yarn.resourcemanager.scheduler.class=org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler`
 but does not say where I should set it. Also, as per the guide, configurations 
are set in the Hive runtime shell, which is not permanent according to my 
knowledge.

 

Given that, I read [this][2] but it does not have any steps.

 

Could you please provide the steps to run Hive on Spark on Ubuntu as a 
production system?

 

 

  [1]: 
https://cwiki.apache.org/confluence/display/Hive/Hive+on+Spark%3A+Getting+Started

  [2]: 
http://stackoverflow.com/questions/26018306/how-to-configure-hive-to-use-spark

 

-- 

Regards,

Dasun Hegoda, Software Engineer  
www.dasunhegoda.com | dasunheg...@gmail.com





 

