Hi there,

I would like to run Hive using Spark as the execution engine, and I'm pretty
confused by the setup.

For reference I'm using AWS EMR.

First, I'm confused about the difference between running Hive with Spark as
its execution engine (sending queries to Hive through HiveServer2 over
Thrift) and using the SparkThriftServer (which I thought was built on top of
HiveServer2). Is there somewhere I could read more about the differences?

I followed these docs:
https://cwiki.apache.org/confluence/display/Hive/Hive+on+Spark%3A+Getting+Started
After changing the execution engine from the EMR default (tez) to spark, I
can see the difference on the HiveServer2 UI at port 10002, where the steps
now show "spark" as the execution engine.
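
For reference, the engine switch described in that guide comes down to a
single property. A sketch as a session-level statement (whether it is set per
session like this or in hive-site.xml is an assumption about this setup):

set hive.execution.engine=spark;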

However, I've set the following config so that the Spark History Server
displays queries coming in through JDBC. I can see queries sent to the
SparkThriftServer (port 10001), but not queries sent to HiveServer2 with
Spark as the execution engine (port 10000):

set spark.eventLog.enabled=true;
set spark.master=localhost:18080;
set spark.eventLog.dir=hdfs:///var/log/spark/apps;
set spark.executor.memory=512m;
set spark.serializer=org.apache.spark.serializer.KryoSerializer;

Thanks!
