Re: [EXTERNAL] Re: Need to make WHERE clause compulsory in Spark SQL

Saurabh Gulati Tue, 22 Feb 2022 07:37:49 -0800

To correct my last message: it's hive-metastore running as a service in a 
container, not Hive. We use Spark-thriftserver for query execution.
________________________________
From: Saurabh Gulati <saurabh.gul...@fedex.com>
Sent: 22 February 2022 16:33
To: Mich Talebzadeh <mich.talebza...@gmail.com>
Cc: user@spark.apache.org <user@spark.apache.org>
Subject: Re: [EXTERNAL] Re: Need to make WHERE clause compulsory in Spark SQL


Thanks Sean for your response.

@Mich Talebzadeh We run all workloads on GKE as Docker containers. So to 
answer your questions: Hive is running in a container as a K8S service, Spark 
thrift server in another container as a service, and Superset in a third 
container.

We use a Spark-on-GKE setup to run the thrift server, which spawns workers 
depending on the load. For buckets we use GCS.


TIA
Saurabh
________________________________
From: Mich Talebzadeh <mich.talebza...@gmail.com>
Sent: 22 February 2022 16:05
To: Saurabh Gulati <saurabh.gul...@fedex.com.invalid>
Cc: user@spark.apache.org <user@spark.apache.org>
Subject: [EXTERNAL] Re: Need to make WHERE clause compulsory in Spark SQL


Is your hive on prem with external tables in cloud storage?

Where is your spark running from and what cloud buckets are you using?

HTH

On Tue, 22 Feb 2022 at 12:36, Saurabh Gulati <saurabh.gul...@fedex.com.invalid> 
wrote:
Hello,
We are trying to set up Spark as the execution engine for exposing the data 
stored in our lake. We have Hive metastore running along with Spark thrift 
server and are using Superset as the UI.

We save all tables as external tables in Hive metastore, with the storage 
being in the cloud.

We see that right now, when users run a query in Superset SQL Lab, it scans the 
whole table. What we want is to limit the data scan by setting something like 
hive.mapred.mode=strict in Spark, so that users get an exception if they don't 
filter on a partition column.
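
To make the desired behaviour concrete, here is a minimal sketch in plain 
Python of the kind of check we have in mind. It is only an illustration and 
rests on assumptions: it uses no Spark or Hive API, it is a naive textual 
check rather than a SQL parser, and the function name, the exception, the 
table `events` and the partition column `event_date` are all hypothetical.

```python
import re


class MissingPartitionFilter(Exception):
    """Raised when a query does not filter on any partition column."""


def require_partition_filter(sql, partition_columns):
    """Return sql unchanged if its WHERE clause mentions a partition column.

    Naive textual check only: subqueries, comments and string literals
    are not handled, so this is a sketch and not a real guard.
    """
    # Take everything after the first WHERE keyword as the filter text.
    match = re.search(r"\bwhere\b(.*)", sql, flags=re.IGNORECASE | re.DOTALL)
    if match is None:
        raise MissingPartitionFilter("query has no WHERE clause")

    where_body = match.group(1)
    for column in partition_columns:
        # Whole-word, case-insensitive match on the column name.
        pattern = r"\b" + re.escape(column) + r"\b"
        if re.search(pattern, where_body, flags=re.IGNORECASE):
            return sql

    raise MissingPartitionFilter(
        "WHERE clause must reference one of: " + ", ".join(partition_columns)
    )


if __name__ == "__main__":
    # Hypothetical table and partition column, for illustration only.
    ok = "SELECT * FROM events WHERE event_date = '2022-02-22'"
    print(require_partition_filter(ok, ["event_date"]))
```

With this sketch, a query such as `SELECT * FROM events` would raise 
MissingPartitionFilter instead of scanning the whole table, which is the 
effect we are hoping to get from a Spark-side setting.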

We tried setting spark.hadoop.hive.mapred.mode=strict in spark-defaults.conf 
for the thrift server, but it still scans the whole table.
We also tried setting hive.mapred.mode=strict in hive-defaults.conf for the 
metastore container.

We use Spark 3.2 with hive-metastore version 3.1.2

Is there a way in the Spark settings to make this happen?


TIA
Saurabh
