There are many ways of interacting with the Hive data warehouse from Spark. You can either use Spark's native Hive integration (via `enableHiveSupport` on the SparkSession) or a JDBC connection to HiveServer2 (from a local or remote Spark job).
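As a minimal sketch of the two access paths (hostnames, the table name `sales`, and credentials below are placeholders, not from this thread; the Spark calls are shown in comments since they need a live cluster):

```python
# Path 1: native Hive integration -- Spark talks to the metastore directly.
# Requires a Spark build with Hive support; illustrative only:
#
#   from pyspark.sql import SparkSession
#   spark = (SparkSession.builder
#            .appName("hive-native")
#            .enableHiveSupport()
#            .getOrCreate())
#   df = spark.sql("SELECT * FROM sales LIMIT 10")

# Path 2: JDBC through HiveServer2 -- every query is funnelled through
# Hive, which is where the concurrency limits discussed below show up.
# 10000 is the default HiveServer2 port.
jdbc_url = "jdbc:hive2://hive-server.example.com:10000/default"
jdbc_props = {
    "driver": "org.apache.hive.jdbc.HiveDriver",  # Hive JDBC driver class
    "user": "spark_user",                          # placeholder credentials
}
#   df = spark.read.jdbc(url=jdbc_url, table="sales", properties=jdbc_props)

print(jdbc_url)
```

Either way the query load ultimately lands on Hive, which is why the metastore configuration matters more than Spark itself here.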
What is the reference to "the driver" in this context?

Bottom line: with concurrent queries you will have to go through Hive, and that is where, as you pointed out, you may run into concurrency issues. Spark, in my opinion, does not play such a significant role here. Your concurrency behaviour will come from the way Hive is configured to handle multiple threads. If the Hive metastore is on Oracle, you can expect very good performance. On the other hand, if you use something like MySQL, then the bottleneck will be on the Hive side.

HTH

View my LinkedIn profile
<https://www.linkedin.com/in/mich-talebzadeh-ph-d-5205b2/>

*Disclaimer:* Use it at your own risk. Any and all responsibility for any loss, damage or destruction of data or any other property which may arise from relying on this email's technical content is explicitly disclaimed. The author will in no case be liable for any monetary damages arising from such loss, damage or destruction.


On Fri, 27 Aug 2021 at 02:32, Tao Li <li...@apache.org> wrote:

> In the high-concurrency scenario, the query performance of Spark SQL is
> limited by the NameNode and the Hive metastore. There are some caches in
> the code, but their effect is limited. Do we have a practical and
> effective way to solve the time-consuming problem of the driver in
> concurrent queries?
>
> ---------------------------------------------------------------------
> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
>