Re: How to improve the concurrent query performance of spark SQL query

2021-08-26 Thread Mich Talebzadeh
There are many ways of interacting with Hive DW from Spark. You can either use Spark's native Hive API or a JDBC connection (from a local or remote Spark). What does "the driver" refer to in this context? Bottom line: with concurrent queries, you will have to go through Hive and

Re: [VOTE] Release Spark 3.2.0 (RC1)

2021-08-26 Thread Stephen Coy
Hi Sean, I think that maybe the https://www.mojohaus.org/flatten-maven-plugin/ will help you out here. Cheers, Steve C On 27 Aug 2021, at 12:29 pm, Sean Owen <sro...@gmail.com> wrote: OK right, you would have seen a different error otherwise. Yes profiles are only a compile-time thin

Re: [VOTE] Release Spark 3.2.0 (RC1)

2021-08-26 Thread Sean Owen
OK right, you would have seen a different error otherwise. Yes profiles are only a compile-time thing, but they should affect the effective POM for the artifact. mvn -Pscala-2.13 help:effective-pom shows scala-parallel-collections as a dependency in the POM as expected (not in a profile). However

Re: [VOTE] Release Spark 3.2.0 (RC1)

2021-08-26 Thread Stephen Coy
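I did indeed. The generated spark-core_2.13-3.2.0.pom that is created alongside the jar file in the local repo contains: <profile> <id>scala-2.13</id> <dependencies> <dependency> <groupId>org.scala-lang.modules</groupId> <artifactId>scala-parallel-collections_${scala.binary.version}</artifactId> </dependency> </dependencies> </profile> which means this dependency will be missing for unit tes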

Re: [VOTE] Release Spark 3.2.0 (RC1)

2021-08-26 Thread Sean Owen
Did you run ./dev/change-scala-version.sh 2.13? That's required first to update the POMs. It works fine for me. On Thu, Aug 26, 2021 at 8:33 PM Stephen Coy wrote: > Hi all, > > Being adventurous I have built the RC1 code with: > > -Pyarn -Phadoop-3.2 -Pyarn -Phadoop-cloud -Phive-thriftserver -Phiv

Re: [VOTE] Release Spark 3.2.0 (RC1)

2021-08-26 Thread Stephen Coy
Hi all, Being adventurous, I have built the RC1 code with: -Pyarn -Phadoop-3.2 -Pyarn -Phadoop-cloud -Phive-thriftserver -Phive-2.3 -Pscala-2.13 -Dhadoop.version=3.2.2 and then attempted to build my Java-based Spark application. However, I found a number of our unit tests were failing with: j

How to improve the concurrent query performance of spark SQL query

2021-08-26 Thread Tao Li
In high-concurrency scenarios, the query performance of Spark SQL is limited by the NameNode and the Hive Metastore. There are some caches in the code, but their effect is limited. Is there a practical and effective way to reduce the time the driver spends on concurrent queries?