Hi there, I would like to run Hive using Spark as the execution engine, and I'm pretty confused by the setup.
For reference, I'm using AWS EMR.

First, I'm confused about the difference between (a) running Hive with Spark as its execution engine and sending queries to it through HiveServer2 (Thrift), and (b) using the Spark Thrift Server, which I thought was built on top of HiveServer2. Is there somewhere I could read more about the differences?

I followed these docs: https://cwiki.apache.org/confluence/display/Hive/Hive+on+Spark%3A+Getting+Started

After changing the execution engine from the EMR default (tez) to spark, I can see the difference in the HiveServer2 UI on port 10002, where the steps now show "spark" as the execution engine.

However, I've set the following config to get the Spark History Server to display queries coming in through JDBC. I can see queries sent to the Spark Thrift Server (port 10001), but not queries sent to HiveServer2 with Spark as the execution engine (port 10000):

set spark.eventLog.enabled=true;
set spark.master=localhost:18080;
set spark.eventLog.dir=hdfs:///var/log/spark/apps;
set spark.executor.memory=512m;
set spark.serializer=org.apache.spark.serializer.KryoSerializer;

Thanks!
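P.S. For anyone trying to reproduce this, here is a minimal sketch that renders the session settings listed above as `set key=value;` statements, ready to paste into a Beeline or JDBC session. The values are copied unchanged from my config; the names `SPARK_SETTINGS` and `render_set_statements` are hypothetical and only there for illustration, not part of Hive or Spark.

```python
# Session-level settings exactly as listed in the post (values unchanged).
SPARK_SETTINGS = {
    "spark.eventLog.enabled": "true",
    "spark.master": "localhost:18080",
    "spark.eventLog.dir": "hdfs:///var/log/spark/apps",
    "spark.executor.memory": "512m",
    "spark.serializer": "org.apache.spark.serializer.KryoSerializer",
}


def render_set_statements(settings):
    """Return one `set key=value;` statement per setting (hypothetical helper)."""
    return [f"set {key}={value};" for key, value in settings.items()]


if __name__ == "__main__":
    # Print the statements in insertion order, one per line.
    for statement in render_set_statements(SPARK_SETTINGS):
        print(statement)
```

This only formats the statements; it does not connect to HiveServer2 or check that the values are appropriate for a given cluster.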