Hi, 


What I'm planning to do is develop a reporting platform using existing data. I 
have an existing RDBMS with a large number of records, so I'm using the 
following stack 
(http://stackoverflow.com/questions/33635234/hadoop-2-7-spark-hive-jasperreports-scoop-architecuture):


- Sqoop - Extract data from RDBMS to Hadoop (a sample import command is sketched after this list)
- Hadoop - Storage platform -> *Deployment Completed*
- Hive - Data warehouse
- Spark - Real-time processing -> *Deployment Completed*
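
For reference, this is roughly the Sqoop import I intend to run; the JDBC URL, 
credentials, table name, and target directory below are placeholders for my 
actual setup:

```
# Pull one source table from the RDBMS into HDFS.
# Placeholders: dbhost, reportdb, report_user, orders.
sqoop import \
  --connect jdbc:mysql://dbhost:3306/reportdb \
  --username report_user -P \
  --table orders \
  --target-dir /user/hive/warehouse/orders \
  --num-mappers 4
```

(As I understand it, using `--hive-import` instead of `--target-dir` would load 
the table straight into Hive.)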


I'm planning to deploy Hive on Spark but I can't find the installation steps. I 
tried to read the official '[Hive on Spark][1]' guide, but it has problems. For 
example, under 'Configuring YARN' it says 
`yarn.resourcemanager.scheduler.class=org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler`, 
but it does not say where this should be set. Also, as per the guide, 
configurations are set in the Hive runtime shell, which to my knowledge is not 
permanent.
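
My assumption is that the YARN property belongs in yarn-site.xml on the 
ResourceManager node, something like the following, but the guide doesn't 
confirm this:

```
<!-- yarn-site.xml: switch the ResourceManager to the Fair Scheduler
     (my assumption about where the guide's property is meant to go) -->
<property>
  <name>yarn.resourcemanager.scheduler.class</name>
  <value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler</value>
</property>
```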


I also read [this][2], but it does not give any concrete steps either.
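
From what I understand, the permanent equivalent of the `set` commands shown 
there would be entries in hive-site.xml; this is my assumption of what that 
would look like (the `spark.master` value assumes a YARN deployment):

```
<!-- hive-site.xml: make Spark the execution engine permanently
     (my assumption; the guides only show shell-session settings) -->
<property>
  <name>hive.execution.engine</name>
  <value>spark</value>
</property>
<property>
  <name>spark.master</name>
  <value>yarn</value>
</property>
```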


Could someone please provide the steps to run Hive on Spark on Ubuntu as a 
production system?




[1]: https://cwiki.apache.org/confluence/display/Hive/Hive+on+Spark%3A+Getting+Started
[2]: http://stackoverflow.com/questions/26018306/how-to-configure-hive-to-use-spark


Regards, 
Dasun Hegoda 
Senior Software Engineer @ ICTA 
dasunhegoda.com 
