By default, Spark uses Apache Derby (running in embedded mode, with store
content kept in local files) to host the Hive metastore. You can
externalize the metastore to a JDBC-compliant database (e.g.,
PostgreSQL) and rely on the authentication provided by that
database. The JDBC configuration must be defined in a hive-site.xml
file in the Spark conf directory. See the metastore admin guide
for more details, including an init script for setting up your metastore
(https://cwiki.apache.org/confluence/display/Hive/AdminManual+Metastore+3.0+Administration).
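As a sketch, a minimal hive-site.xml pointing the metastore at a PostgreSQL database could look like the following (the hostname, database name, and credentials are placeholders, not values from this thread):

```xml
<configuration>
  <!-- JDBC URL of the external metastore database (placeholder host/db) -->
  <property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:postgresql://metastore-host:5432/metastore_db</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionDriverName</name>
    <value>org.postgresql.Driver</value>
  </property>
  <!-- Credentials checked by the database's own authentication -->
  <property>
    <name>javax.jdo.option.ConnectionUserName</name>
    <value>hive</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionPassword</name>
    <value>changeme</value>
  </property>
</configuration>
```

The PostgreSQL JDBC driver jar must also be on Spark's classpath for this to work.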
On 10/20/22 4:31 AM, second_co...@yahoo.com.INVALID wrote:
Currently my pyspark code is able to connect to the hive metastore at port
9083. However, using this approach I can't put in place any security
mechanism like LDAP or sql authentication control. Is there any way to
connect from pyspark to the spark thrift server on port 10000 without
exposing the hive metastore url to pyspark? I would like to
authenticate users before allowing them to execute spark sql, and users
should only be allowed to query the databases and tables that they have
access to.
Thank you,
comet