I'm not sure why you are using SparkThriftServer. The out-of-the-box (OOTB) HiveServer2 should be good enough for this.
Is there any specific reason for moving from tez to spark as execution engine?

~Rajesh.B

On Mon, Mar 11, 2019 at 9:45 PM Daniel Mateus Pires <dmate...@gmail.com> wrote:

> Hi there,
>
> I would like to run Hive using Spark as the execution engine and I'm
> pretty confused with the set up.
>
> For reference I'm using AWS EMR.
>
> First, I'm confused at the difference between running Hive with Spark as
> its execution engine sending queries to Hive using HiveServer2 (Thrift),
> and using the SparkThriftServer (I thought it was built on top of
> HiveServer2) ? Could I read more about the differences somewhere ?
>
> I followed the following docs:
> https://cwiki.apache.org/confluence/display/Hive/Hive+on+Spark%3A+Getting+Started
> and after changing the execution engine from the EMR default (tez) to
> spark, I can see the difference on the HiveServer2 UI at port 10002 where
> now the steps show "spark" as the execution engine.
>
> However I've set up the following config to get the Spark History Server
> displaying queries coming through JDBC and I can see queries sent to the
> SparkThriftServer (port 10001) but not to the HiveServer2 with execution
> engine of Spark (port 10000)
>
> set spark.eventLog.enabled=true;
> set spark.master=localhost:18080;
> set spark.eventLog.dir=hdfs:///var/log/spark/apps;
> set spark.executor.memory=512m;
> set spark.serializer=org.apache.spark.serializer.KryoSerializer;
>
> Thanks!
>
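
One detail in the quoted settings may be worth checking: spark.master is set to localhost:18080, and 18080 is the default port of the Spark History Server web UI rather than a cluster manager address. Below is a minimal sketch of how the same HiveServer2 session settings might look if Spark is run on YARN. The value "yarn" is an assumption about the EMR setup and is not taken from the original message. hive.execution.engine is the switch described in the Hive on Spark guide linked above, and the remaining lines are copied from the quoted config.

```sql
-- Sketch only; anything marked "assumption" is not from the original message.

-- Engine switch described in the Hive on Spark "Getting Started" guide.
set hive.execution.engine=spark;

-- Assumption: Spark on EMR is submitted to YARN, so the master is "yarn".
set spark.master=yarn;

-- Unchanged from the quoted settings.
set spark.eventLog.enabled=true;
set spark.eventLog.dir=hdfs:///var/log/spark/apps;
set spark.executor.memory=512m;
set spark.serializer=org.apache.spark.serializer.KryoSerializer;
```

The History Server only lists applications whose event logs are written to the directory it reads (spark.history.fs.logDirectory), so it may also help to confirm that this directory matches spark.eventLog.dir above.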