There are many ways of interacting with the Hive data warehouse from Spark.
You can either use Spark's native Hive API or a JDBC
connection (from a local or remote Spark driver).
What does "the driver" refer to in this context? Bottom line: with
concurrent queries, you will have to go through Hive and
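The two access paths mentioned above can be sketched as follows. In the JDBC case, the "driver" being referenced is the Hive JDBC driver class, org.apache.hive.jdbc.HiveDriver. The host name and table below are hypothetical, and the SparkSession calls are shown only in comments so the snippet stands on its own without a Spark runtime.

```scala
// Reaching the Hive data warehouse from Spark: a minimal sketch.
//
// Path 1 (native): let Spark talk to the metastore directly.
//   val spark = SparkSession.builder()
//     .enableHiveSupport()          // requires hive-site.xml on the classpath
//     .getOrCreate()
//   spark.sql("SELECT * FROM default.sales")
//
// Path 2 (JDBC): go through HiveServer2 with the Hive JDBC driver.
// The option map below is plain Scala so it can be checked standalone;
// it would be passed to spark.read.format("jdbc").options(...).load().
val hiveJdbcOptions: Map[String, String] = Map(
  "url"     -> "jdbc:hive2://hive-server:10000/default", // hypothetical host
  "driver"  -> "org.apache.hive.jdbc.HiveDriver",        // the JDBC driver class
  "dbtable" -> "sales"                                   // hypothetical table
)

println(hiveJdbcOptions("driver"))
```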
Hi Sean,
I think that maybe the https://www.mojohaus.org/flatten-maven-plugin/ will help
you out here.
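For what it's worth, a minimal flatten-maven-plugin setup might look like the following; the version number is an assumption, and whether the profile-activated dependency ends up written inline in the installed POM depends on the flattenMode you choose.

```xml
<!-- Sketch of a flatten-maven-plugin configuration; version is an assumption. -->
<plugin>
  <groupId>org.codehaus.mojo</groupId>
  <artifactId>flatten-maven-plugin</artifactId>
  <version>1.2.7</version>
  <executions>
    <execution>
      <id>flatten</id>
      <phase>process-resources</phase>
      <goals>
        <goal>flatten</goal>
      </goals>
    </execution>
  </executions>
</plugin>
```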
Cheers,
Steve C
On 27 Aug 2021, at 12:29 pm, Sean Owen <sro...@gmail.com> wrote:
OK right, you would have seen a different error otherwise.
Yes profiles are only a compile-time thing, but they should affect the
effective POM for the artifact. mvn -Pscala-2.13 help:effective-pom shows
scala-parallel-collections as a dependency in the POM as expected (not in a
profile). However
I did indeed.
The generated spark-core_2.13-3.2.0.pom that is created alongside the jar file
in the local repo still carries this dependency inside the profile:

    <profile>
      <id>scala-2.13</id>
      ...
      <dependency>
        <groupId>org.scala-lang.modules</groupId>
        <artifactId>scala-parallel-collections_${scala.binary.version}</artifactId>
      </dependency>
      ...
    </profile>

which means this dependency will be missing for unit tests
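Until the published POM carries the dependency unconditionally, a downstream project could presumably work around the missing artifact by declaring it explicitly; the version and test scope below are assumptions, not something confirmed in this thread.

```xml
<!-- Hypothetical workaround for the consuming project's pom.xml. -->
<dependency>
  <groupId>org.scala-lang.modules</groupId>
  <artifactId>scala-parallel-collections_2.13</artifactId>
  <version>1.0.3</version>
  <scope>test</scope>
</dependency>
```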
Did you run ./dev/change-scala-version.sh 2.13 ? that's required first to
update POMs. It works fine for me.
On Thu, Aug 26, 2021 at 8:33 PM Stephen Coy
wrote:
Hi all,
Being adventurous I have built the RC1 code with:
-Pyarn -Phadoop-3.2 -Pyarn -Phadoop-cloud -Phive-thriftserver -Phive-2.3
-Pscala-2.13 -Dhadoop.version=3.2.2
And then attempted to build my Java based spark application.
However, I found a number of our unit tests were failing with:
In high-concurrency scenarios, the query performance of Spark SQL is limited
by the NameNode and the Hive Metastore. There are some caches in the code, but
their effect is limited. Do we have a practical and effective way to reduce
the time spent in the driver during concurrent queries?
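One common mitigation for this kind of driver-side bottleneck is to memoize expensive metastore or NameNode lookups so repeated concurrent queries do not re-fetch the same metadata. The sketch below is generic and illustrative; every name in it is hypothetical, and it is not a Spark API.

```scala
import java.util.concurrent.ConcurrentHashMap
import java.util.concurrent.atomic.AtomicInteger

// Generic memoizing cache for expensive driver-side lookups
// (e.g. table metadata fetched from the Hive Metastore).
// computeIfAbsent guarantees the backing fetch runs at most once per key,
// even under concurrent access.
final class LookupCache[K, V](fetch: K => V) {
  private val cache  = new ConcurrentHashMap[K, V]()
  private val misses = new AtomicInteger(0)

  def get(key: K): V =
    cache.computeIfAbsent(key, (k: K) => { misses.incrementAndGet(); fetch(k) })

  def missCount: Int = misses.get
}

// Hypothetical "metastore" call whose repetition we want to avoid.
def expensiveLookup(table: String): String = s"schema-of-$table"

val cache = new LookupCache[String, String](expensiveLookup)
cache.get("sales")        // first call hits the backing store
cache.get("sales")        // second call is served from the cache
println(cache.missCount)  // prints 1
```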