Hi community,
I am running hundreds of Spark jobs concurrently, which drives the Hive Metastore connection count very high (> 1K). The jobs do not actually use HMS, so I would like to disable the connection entirely. I have tried setting the spark.sql.catalogImplementation config to in-memory, which is said to help, but it has had no effect. Any suggestion would be appreciated!
code:
from pyspark.sql import SparkSession

# Build the session with the in-memory catalog so it should not touch HMS
spark = SparkSession \
    .builder \
    .appName("test") \
    .config("spark.sql.catalogImplementation", "in-memory") \
    .config("spark.executor.memory", "1g") \
    .getOrCreate()
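To confirm what is actually in effect at runtime, I print the resolved value on the driver. This is a minimal check; my assumption is that spark.conf.get can read this static config in Spark 2.x:

# Assumption: spark.conf.get exposes the resolved value of this static
# config; it should print "hive" or "in-memory".
print(spark.conf.get("spark.sql.catalogImplementation"))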
spark2-submit command:
spark2-submit \
  --master yarn \
  --deploy-mode cluster \
  --name "test" \
  --conf spark.sql.catalogImplementation=in-memory \
  test.py
Spark version: 2.2.0
Hadoop version: 2.6.0