Hi,
I have data stored in Hive tables on which I want to do some simple manipulation.

Currently in Spark I run the code below: I get the result set from the Hive tables using SQL, register it as a temporary table in Spark, and then query that temporary table with SQL again.

Ideally I would get the result set into a DataFrame (DF) and work on the DF directly, slicing and dicing the data using functional programming with filter, map, split, etc.

I wanted to get some ideas on how to go about it.

Thanks

    val hiveContext = new org.apache.spark.sql.hive.HiveContext(sc)
    hiveContext.sql("use oraclehadoop")

    // Total sales per month and channel, read from the Hive tables
    val rs = hiveContext.sql("""
      SELECT t.calendar_month_desc, c.channel_desc, SUM(s.amount_sold) AS TotalSales
      FROM smallsales s, times t, channels c
      WHERE s.time_id = t.time_id
      AND   s.channel_id = c.channel_id
      GROUP BY t.calendar_month_desc, c.channel_desc
      """)

    // Register the result set as a temporary table in Spark
    rs.registerTempTable("tmp")

    // Sales per month and channel, ordered
    hiveContext.sql("""
      SELECT calendar_month_desc AS MONTH, channel_desc AS CHANNEL, TotalSales
      FROM tmp
      ORDER BY MONTH, CHANNEL
      """).collect.foreach(println)

    // Highest monthly sales figure per channel
    hiveContext.sql("""
      SELECT channel_desc AS CHANNEL, MAX(TotalSales) AS SALES
      FROM tmp
      GROUP BY channel_desc
      ORDER BY SALES DESC
      """).collect.foreach(println)

-- 
Dr Mich Talebzadeh

LinkedIn https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw

http://talebzadehmich.wordpress.com
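
For reference, the DataFrame route described above (result set in a DF, then filter, map, split) might look roughly like the sketch below. It is only an illustration under a few assumptions: it is run in a Spark 1.x spark-shell where sc already exists, the tables and columns are the ones from the SQL in this post, and the 1000 threshold and the "-" separator inside calendar_month_desc are hypothetical values picked for the example.

```scala
import org.apache.spark.sql.hive.HiveContext
import org.apache.spark.sql.functions.{col, desc, max, sum}

// Assumes a spark-shell session, so `sc` is already defined.
val hiveContext = new HiveContext(sc)
hiveContext.sql("use oraclehadoop")

// Load each Hive table as a DataFrame, keeping only the columns used.
val s = hiveContext.table("smallsales").select("amount_sold", "time_id", "channel_id")
val t = hiveContext.table("times").select("time_id", "calendar_month_desc")
val c = hiveContext.table("channels").select("channel_id", "channel_desc")

// Same join and aggregation as the first SQL statement, without a temp table.
val rs = s.join(t, "time_id")
  .join(c, "channel_id")
  .groupBy("calendar_month_desc", "channel_desc")
  .agg(sum("amount_sold").as("TotalSales"))

// Equivalent of the ORDER BY MONTH, CHANNEL query.
rs.orderBy("calendar_month_desc", "channel_desc").show()

// Equivalent of the MAX(TotalSales) per channel query.
val maxPerChannel = rs.groupBy("channel_desc")
  .agg(max("TotalSales").as("SALES"))
  .orderBy(desc("SALES"))
maxPerChannel.show()

// filter: keep only rows above a hypothetical sales threshold.
val bigSales = rs.filter(col("TotalSales") > 1000)

// map and split: going through the underlying RDD of Rows, take the part
// of calendar_month_desc before the first "-" (assumes a format like "1998-01").
val years = bigSales.rdd
  .map(row => row.getAs[String]("calendar_month_desc").split("-")(0))
  .distinct()
years.collect.foreach(println)
```

Going through rs.rdd for the map step keeps the sketch independent of how map on a DataFrame behaves in a given Spark version; everything before that stays in the DataFrame API, so no temporary table is needed.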