Hi,

Depending on your version of Hive & Spark, you may want to set hive.execution.engine=spark as a beeline command-line parameter (e.g. --hiveconf hive.execution.engine=spark), assuming you are running your Hive scripts through the beeline command line (which is the suggested practice for security purposes).
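If you stay with the hc.sql route described in the quoted mail below, a rough sketch of that approach might look like the following. This is only an illustration, not a tested implementation: the script path, the object/app names, and the naive split on ';' are my own assumptions.

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.hive.HiveContext

// Rough sketch: run an existing HiveQL script statement-by-statement
// through HiveContext. Path and splitting logic are illustrative only.
object RunHiveScript {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("RunHiveScript"))
    val hc = new HiveContext(sc)

    // Read the whole script on the driver; the statements have to be
    // issued sequentially from the driver anyway.
    val script = scala.io.Source.fromFile("/path/to/hivescript.hql").mkString

    // Naive split on ';': assumes no semicolons inside string literals.
    script.split(";").map(_.trim).filter(_.nonEmpty).foreach(stmt => hc.sql(stmt))

    sc.stop()
  }
}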
On Thu, Mar 9, 2017 at 2:09 PM, nancy henry <nancyhenry6...@gmail.com> wrote:
>
> Hi Team,
>
> Basically all our data is in Hive tables, and until now we have been
> processing it with Hive on MR. Now that HiveContext can run Hive queries
> on Spark, we are making all of these complex Hive scripts run through a
> hivecontext.sql(sc.textFile(hivescript)) kind of approach, i.e. basically
> running the Hive queries on Spark without writing anything in Scala yet.
> Even just making the Hive queries run on Spark already shows a big
> difference in runtime compared to running them on MR.
>
> So, since we already have the Hive scripts, should we just make those
> complex scripts run through hc.sql, given that hc.sql is able to do it?
>
> Or is that not best practice? Even though Spark can do it, is it still
> better to load all of those individual Hive tables into Spark, make RDDs,
> and write Scala code to get the same functionality we have in Hive?
>
> It is becoming difficult for us to choose whether to leave it to hc.sql
> to do the work of running the complex scripts, or to code it in Scala.
> Will the manual intervention be worth the effort in terms of performance?
>
> Example of our sample scripts:
>
> use db;
> create temporary function tempfunction1 as com.fgh.jkl.TestFunction;
>
> create desttable in hive;
> insert overwrite desttable select (big complex transformations and usage
> of hive udf)
> from table1, table2, table3 join table4 on some complex condition and
> join table7 on another complex condition where complex filtering
>
> So please help: what would be the best approach, and why should I not
> give the entire script to HiveContext to make its own RDDs and run on
> Spark, if we are able to do it?
>
> All the examples I see online only show hc.sql("select * from table1")
> and nothing more complex than that.

--
Best Regards,
Ayan Guha
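(For comparison, the hand-coded alternative weighed in the question above, loading the Hive tables as DataFrames and expressing the joins and filters in Scala, could look roughly like this. The table names, column names, and join condition are made up purely for illustration:)

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.hive.HiveContext

// Sketch of the DataFrame alternative; all table and column names
// below are hypothetical placeholders.
object DataFrameAlternative {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("DataFrameAlternative"))
    val hc = new HiveContext(sc)

    val t1 = hc.table("table1")
    val t4 = hc.table("table4")

    val result = t1.join(t4, t1("id") === t4("id"))   // stands in for the complex join condition
      .filter(t1("status") === "ACTIVE")              // stands in for the complex filtering
      .select(t1("id"), t4("value"))                  // stands in for the big transformations

    // Rough equivalent of the "insert overwrite desttable ..." step in the script above.
    result.write.mode("overwrite").saveAsTable("desttable")

    sc.stop()
  }
}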