Ok, interesting. I am surprised that you are using Hive rather than BigQuery. My assumption is that your Spark is version 3.1.1 on standard GKE with the auto-scaler. What benefits are you getting from using Hive here? Since your Hive tables are on GCS buckets, you could easily load them into BigQuery and run Spark against BigQuery.
HTH

On Tue, 22 Feb 2022 at 15:34, Saurabh Gulati <saurabh.gul...@fedex.com> wrote:

> Thanks Sean for your response.
>
> @Mich Talebzadeh <mich.talebza...@gmail.com> We run all workloads on GKE
> as Docker containers. So to answer your questions: Hive runs in one
> container as a K8S service, the Spark thrift-server in another container
> as a service, and Superset in a third container.
>
> We use the Spark-on-GKE setup to run the thrift-server, which spawns
> workers depending on the load. For buckets we use GCS.
>
> TIA
> Saurabh
> ------------------------------
> *From:* Mich Talebzadeh <mich.talebza...@gmail.com>
> *Sent:* 22 February 2022 16:05
> *To:* Saurabh Gulati <saurabh.gul...@fedex.com.invalid>
> *Cc:* user@spark.apache.org <user@spark.apache.org>
> *Subject:* [EXTERNAL] Re: Need to make WHERE clause compulsory in Spark SQL
>
> Is your Hive on-prem, with external tables in cloud storage?
>
> Where is your Spark running from, and which cloud buckets are you using?
>
> HTH
>
> On Tue, 22 Feb 2022 at 12:36, Saurabh Gulati
> <saurabh.gul...@fedex.com.invalid> wrote:
>
> Hello,
> We are trying to set up Spark as the execution engine for exposing the
> data stored in our lake. We have a Hive metastore running along with the
> Spark thrift-server, and we use Superset as the UI.
>
> We save all tables as external tables in the Hive metastore, with storage
> on cloud buckets.
>
> Right now, when users run a query in Superset SQL Lab, it scans the whole
> table. We want to limit the data scanned by setting something like
> hive.mapred.mode=strict in Spark, so that users get an exception if they
> don't filter on a partition column.
>
> We tried setting spark.hadoop.hive.mapred.mode=strict in
> spark-defaults.conf on the thrift-server, but it still scans the whole
> table.
> We also tried setting hive.mapred.mode=strict in hive-defaults.conf for
> the metastore container.
>
> We use Spark 3.2 with hive-metastore version 3.1.2.
>
> Is there a way to make this happen via Spark settings?
>
> TIA
> Saurabh
>
> --
>
> view my LinkedIn profile
> <https://www.linkedin.com/in/mich-talebzadeh-ph-d-5205b2/>
>
> https://en.everybodywiki.com/Mich_Talebzadeh
>
> *Disclaimer:* Use it at your own risk. Any and all responsibility for any
> loss, damage or destruction of data or any other property which may arise
> from relying on this email's technical content is explicitly disclaimed.
> The author will in no case be liable for any monetary damages arising from
> such loss, damage or destruction.
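[Editor's note] One likely reason the attempts above have no effect: hive.mapred.mode was deprecated in Hive 2.x in favour of the hive.strict.checks.* properties (Hive 3 has hive.strict.checks.no.partition.filter), and in any case Spark SQL plans and executes queries itself, so it does not consult these Hive execution-time settings. Until there is an engine-level switch, one pragmatic stopgap is a thin guard in front of query submission that rejects queries on partitioned tables that carry no partition-column predicate. Below is a minimal sketch; the table name, partition columns, and the regex-based check are purely illustrative, and a production guard should look up partition columns in the metastore and inspect Spark's analyzed logical plan rather than raw SQL text:

```python
import re

# Hypothetical partition columns per table; in practice these would be
# fetched from the Hive metastore (e.g. via DESCRIBE FORMATTED).
PARTITION_COLUMNS = {
    "sales": {"event_date", "country"},
}

def check_partition_filter(sql: str, table: str) -> None:
    """Naive guard: reject a query on `table` unless its WHERE clause
    mentions at least one of the table's partition columns.

    This is a text-level approximation only -- it does not parse SQL and
    will be fooled by subqueries, aliases, or comments.
    """
    cols = PARTITION_COLUMNS.get(table, set())
    # Grab everything after the first WHERE keyword (if any).
    m = re.search(r"\bwhere\b(.*)", sql, re.IGNORECASE | re.DOTALL)
    predicate = m.group(1) if m else ""
    if cols and not any(
        re.search(rf"\b{col}\b", predicate, re.IGNORECASE) for col in cols
    ):
        raise ValueError(
            f"Query on partitioned table '{table}' must filter on one of: "
            f"{sorted(cols)}"
        )

# Passes: the query filters on a partition column.
check_partition_filter(
    "SELECT * FROM sales WHERE event_date = '2022-02-22'", "sales"
)
```

A real deployment would hook such a check into whatever layer hands SQL to the thrift-server, for example Superset's SQL query-mutator hook, so that a full-table scan is refused before Spark ever sees the query.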