To correct my last message: it's hive-metastore running as a service in a container, not Hive. We use Spark thrift server for query execution.

________________________________
From: Saurabh Gulati <saurabh.gul...@fedex.com>
Sent: 22 February 2022 16:33
To: Mich Talebzadeh <mich.talebza...@gmail.com>
Cc: user@spark.apache.org <user@spark.apache.org>
Subject: Re: [EXTERNAL] Re: Need to make WHERE clause compulsory in Spark SQL

Thanks Sean for your response.

@Mich Talebzadeh We run all workloads on GKE as Docker containers. To answer your questions: Hive is running in a container as a K8S service, the Spark thrift server runs in another container as a service, and Superset runs in a third container. We use a Spark-on-GKE setup to run the thrift server, which spawns workers depending on the load. For buckets we use GCS.

TIA
Saurabh

________________________________
From: Mich Talebzadeh <mich.talebza...@gmail.com>
Sent: 22 February 2022 16:05
To: Saurabh Gulati <saurabh.gul...@fedex.com.invalid>
Cc: user@spark.apache.org <user@spark.apache.org>
Subject: [EXTERNAL] Re: Need to make WHERE clause compulsory in Spark SQL

Is your Hive on-prem with external tables in cloud storage? Where is your Spark running from, and what cloud buckets are you using?

HTH

On Tue, 22 Feb 2022 at 12:36, Saurabh Gulati <saurabh.gul...@fedex.com.invalid> wrote:

Hello,

We are trying to set up Spark as the execution engine for exposing the data stored in our lake. We have hive metastore running along with Spark thrift server and are using Superset as the UI.

We save all tables as external tables in hive metastore, with storage being on cloud.

We see that right now, when users run a query in Superset SQL Lab, it scans the whole table. What we want is to limit the data scan by setting something like hive.mapred.mode=strict in Spark, so that users get an exception if they don't specify a partition column.

We tried setting spark.hadoop.hive.mapred.mode=strict in spark-defaults.conf in the thrift server, but it still scans the whole table. We also tried setting hive.mapred.mode=strict in hive-defaults.conf for the metastore container.

We use Spark 3.2 with hive-metastore version 3.1.2.

Is there a way in Spark settings to make it happen?

TIA
Saurabh
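
The behaviour asked for above (reject any query that does not filter on a partition column) can be illustrated with a small pre-check on the SQL text. This is only a minimal sketch of the idea, not a Spark setting and not something proposed in the thread. The function name `require_partition_filter`, the exception class, and the column name `event_date` are all hypothetical. The check is a naive text match rather than a SQL parser, so subqueries, comments, and string literals can fool it.

```python
import re
from typing import Iterable


class MissingPartitionFilterError(ValueError):
    """Raised when a query does not filter on any partition column."""


def require_partition_filter(sql: str, partition_columns: Iterable[str]) -> None:
    """Raise unless the WHERE clause mentions at least one partition column.

    Hypothetical helper for illustration only. It inspects the raw SQL text
    and does not understand SQL structure.
    """
    # Take everything after the first WHERE keyword (case-insensitive).
    match = re.search(r"\bwhere\b(.*)", sql, flags=re.IGNORECASE | re.DOTALL)
    if match is None:
        raise MissingPartitionFilterError("query has no WHERE clause")

    # Collect identifier-like tokens that appear in the WHERE clause.
    tokens = {t.lower() for t in re.findall(r"[A-Za-z_][A-Za-z0-9_]*", match.group(1))}
    wanted = {c.lower() for c in partition_columns}
    if not tokens & wanted:
        raise MissingPartitionFilterError(
            "WHERE clause must filter on one of: " + ", ".join(sorted(wanted))
        )


# Passes silently: the WHERE clause references the (assumed) partition column.
require_partition_filter(
    "SELECT * FROM sales WHERE event_date = '2022-02-22'", ["event_date"]
)
```

A check like this would have to sit in front of the thrift server, wherever queries can be intercepted in a given deployment. Enforcing the rule inside Spark itself would instead need to inspect the query plan, which this sketch does not attempt.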