Re: Running Hive on Spark

Daniel Mateus Pires Tue, 12 Mar 2019 01:56:41 -0700

Hi Rajesh,

I'm trying to further my understanding of the various interactions and
set-ups for Hive + Spark.


My understanding so far is that queries run against the SparkThriftServer
use the SparkSQL engine, whereas HiveServer2 + Hive with Spark as the
execution engine uses Hive primitives and only uses Spark for the actual
computations.
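
To make that comparison concrete, here is a minimal sketch of the two
endpoints as I currently understand them. The ports are the ones from my EMR
setup quoted below; the host name, the class, and the field names are
hypothetical and only there for illustration. Both servers speak the
HiveServer2 Thrift protocol, so the same jdbc:hive2:// URL scheme applies to
either one.

```python
# Sketch only: ports are taken from the setup described in this thread;
# the host, the class name, and the field names are hypothetical.
from dataclasses import dataclass


@dataclass(frozen=True)
class ThriftEndpoint:
    name: str
    port: int
    planner: str   # component that parses and plans the SQL
    executor: str  # component that runs the resulting job

    def jdbc_url(self, host: str = "localhost", database: str = "default") -> str:
        # Both servers are reached with the HiveServer2 JDBC URL scheme.
        return f"jdbc:hive2://{host}:{self.port}/{database}"


# HiveServer2 with hive.execution.engine=spark: Hive plans, Spark executes.
HIVESERVER2 = ThriftEndpoint("HiveServer2 (engine=spark)", 10000, "Hive", "Spark")

# SparkThriftServer: Spark SQL plans and Spark executes.
SPARK_THRIFT_SERVER = ThriftEndpoint("SparkThriftServer", 10001, "Spark SQL", "Spark")

for ep in (HIVESERVER2, SPARK_THRIFT_SERVER):
    print(f"{ep.name}: {ep.jdbc_url()} (planned by {ep.planner}, run on {ep.executor})")
```

If this picture is right, the same SQL sent to each URL would go through a
different planner even though Spark does the work in both cases.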

I get your question about "why would I do that?", but my goal right now is
to understand "what does it mean if I do that?"

Best regards
Daniel

On Tue 12 Mar 2019, 02:21 Rajesh Balamohan, <rbalamo...@apache.org> wrote:

> Not sure why you are using SparkThriftServer. OOTB HiveServer2 would be
> good enough for this.
>
> Is there any specific reason for moving from Tez to Spark as the execution
> engine?
>
> ~Rajesh.B
>
> On Mon, Mar 11, 2019 at 9:45 PM Daniel Mateus Pires <dmate...@gmail.com>
> wrote:
>
>> Hi there,
>>
>> I would like to run Hive using Spark as the execution engine and I'm
>> pretty confused about the setup.
>>
>> For reference I'm using AWS EMR.
>>
>> First, I'm confused about the difference between running Hive with Spark
>> as its execution engine and sending queries to it through HiveServer2
>> (Thrift), versus using the SparkThriftServer (I thought it was built on
>> top of HiveServer2). Could I read more about the differences somewhere?
>>
>> I followed these docs:
>> https://cwiki.apache.org/confluence/display/Hive/Hive+on+Spark%3A+Getting+Started
>> After changing the execution engine from the EMR default (tez) to
>> spark, I can see the difference on the HiveServer2 UI at port 10002, where
>> the steps now show "spark" as the execution engine.
>>
>> However, I've set up the following config to get the Spark History Server
>> to display queries coming through JDBC. I can see queries sent to the
>> SparkThriftServer (port 10001), but not those sent to HiveServer2 with
>> Spark as the execution engine (port 10000):
>>
>> set spark.eventLog.enabled=true;
>> set spark.master=localhost:18080;
>> set spark.eventLog.dir=hdfs:///var/log/spark/apps;
>> set spark.executor.memory=512m;
>> set spark.serializer=org.apache.spark.serializer.KryoSerializer;
>>
>> Thanks!
>>
>
