Hello,
We are trying to set up Spark as the execution engine for exposing the data 
stored in our lake. We have the Hive Metastore running along with the Spark 
Thrift Server, and we use Superset as the UI.

We save all tables as external tables in the Hive Metastore, with the storage 
being on cloud object storage.

We see that right now, when users run a query in Superset SQL Lab, it scans 
the whole table. What we want is to limit the data scanned by setting 
something like hive.mapred.mode=strict in Spark, so that users get an 
exception if they don't filter on a partition column.

We tried setting spark.hadoop.hive.mapred.mode=strict in spark-defaults.conf 
on the Thrift Server, but it still scans the whole table. We also tried 
setting hive.mapred.mode=strict in hive-defaults.conf for the Metastore 
container.
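For reference, this is roughly what we added to spark-defaults.conf on the 
Thrift Server (a sketch of our config; the spark.hadoop.* prefix is how we 
understand Hive properties are passed through to the Hadoop/Hive configuration):

    # spark-defaults.conf on the Spark Thrift Server
    # Intent: forward hive.mapred.mode=strict to the Hive configuration
    # so queries without a partition filter fail.
    spark.hadoop.hive.mapred.mode    strict

Neither this nor the Metastore-side setting changed the scan behavior.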

We use Spark 3.2 with Hive Metastore version 3.1.2.

Is there a Spark setting that makes this work?


TIA
Saurabh
