Hi, 

I have data stored in Hive tables on which I want to do some simple manipulation.


Currently in Spark I do the following: I get the result set from the
Hive tables using SQL and register it as a temporary table in Spark.

Ideally, I would get the result set into a DataFrame (DF) and then work
on the DF to slice and dice the data using functional programming with
filter, map, split, etc.

I wanted to get some ideas on how to go about it. 

Thanks

// Run in spark-shell, where sc (the SparkContext) is already defined
val hiveContext = new org.apache.spark.sql.hive.HiveContext(sc)

hiveContext.sql("use oraclehadoop")

// Total sales per month and channel, read from the Hive tables
val rs = hiveContext.sql("""
SELECT t.calendar_month_desc, c.channel_desc, SUM(s.amount_sold) AS TotalSales
FROM smallsales s, times t, channels c
WHERE s.time_id = t.time_id
AND s.channel_id = c.channel_id
GROUP BY t.calendar_month_desc, c.channel_desc
""")

// Register the result set as a temporary table so it can be queried with SQL
rs.registerTempTable("tmp")

// All totals, ordered by month and channel
hiveContext.sql("""
SELECT calendar_month_desc AS MONTH, channel_desc AS CHANNEL, TotalSales
FROM tmp
ORDER BY MONTH, CHANNEL
""").collect.foreach(println)

// Highest monthly total per channel, largest first
hiveContext.sql("""
SELECT channel_desc AS CHANNEL, MAX(TotalSales) AS SALES
FROM tmp
GROUP BY channel_desc
ORDER BY SALES DESC
""").collect.foreach(println)

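To illustrate the DF approach I have in mind, here is a rough sketch of the same two queries written against DataFrames rather than SQL strings. It assumes the Spark 1.x DataFrame API (HiveContext.table, join, groupBy, agg, orderBy) in a spark-shell session where sc already exists. The names sales, times, channels and totals are only illustrative, and I have not checked that this is the best way to do it.

```scala
import org.apache.spark.sql.functions.{desc, max, sum}

// Assumption: running in spark-shell, so sc (the SparkContext) already exists
val hiveContext = new org.apache.spark.sql.hive.HiveContext(sc)
hiveContext.sql("use oraclehadoop")

// Load each Hive table as a DataFrame (illustrative variable names)
val sales    = hiveContext.table("smallsales")
val times    = hiveContext.table("times")
val channels = hiveContext.table("channels")

// Same joins and aggregation as the first SQL statement
val totals = sales
  .join(times, sales("time_id") === times("time_id"))
  .join(channels, sales("channel_id") === channels("channel_id"))
  .groupBy(times("calendar_month_desc"), channels("channel_desc"))
  .agg(sum(sales("amount_sold")).as("TotalSales"))

// All totals, ordered by month and channel
totals
  .orderBy("calendar_month_desc", "channel_desc")
  .collect.foreach(println)

// Highest monthly total per channel, largest first
totals
  .groupBy("channel_desc")
  .agg(max("TotalSales").as("SALES"))
  .orderBy(desc("SALES"))
  .collect.foreach(println)
```

With the result held in totals, further slicing should be possible with filter, select and similar DataFrame methods, without registering a temporary table.
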
-- 

Dr Mich Talebzadeh

LinkedIn
https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw

http://talebzadehmich.wordpress.com


 
