Support will be released as part of Spark 3.0.
Preview:
https://spark.apache.org/docs/3.0.0-preview2/#downloading
Refer to:
https://issues.apache.org/jira/browse/SPARK-24417
Is the hadoop-3.1 profile supported for this release? I see a lot of UTs failing
under this profile.
https://github.com/apache/spark/blob/v2.4.5-rc2/pom.xml
Example:
[INFO] Running org.apache.spark.sql.hive.JavaMetastoreDataSourcesSuite
[ERROR] Tests run: 3, Failures: 0, Errors: 3, Skipped: 0, Time e
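For anyone reproducing, the profile is activated with -P; the module below is
only an example (JavaMetastoreDataSourcesSuite lives in the hive module):
> mvn clean test -Phadoop-3.1 -pl :spark-hive_2.11 -am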
You can try using the -pl Maven option for this:
> mvn clean install -pl :spark-core_2.11
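If the module's upstream dependencies are not already in your local repository,
adding -am (--also-make) builds them as well, for example:
> mvn clean install -pl :spark-core_2.11 -am -DskipTests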
From: Qiu, Gerry
To: zhangliyun; dev@spark.apache.org
Date: 2019-03-26 14:34:20
Subject: RE: How to build single jar for single project in spark
You can try this:
https://spark.apache.org/docs/latest/building-spark
I have a question related to permanent UDFs for Spark with Hive support enabled.
When we do CREATE FUNCTION, it is registered with Hive via:
spark-sql>create function customfun as
'org.apache.hadoop.hive.ql.udf.generic.GenericUDFLastDay' using jar
'hdfs:///tmp/hive-exec.jar';
call stack:
org.apach
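Once registered, the function can be invoked like a builtin; for example (the
expected result below assumes GenericUDFLastDay's last_day semantics):
spark-sql> select customfun('2019-03-26');
2019-03-31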
Hi all
I see UGI credentials (e.g. sparkCookie) shared from the driver to executors
being lost on the driver side in YARN mode. Below is the analysis of
thriftserver startup.
Step 1. SparkSubmit creates the submit env, which does a loginUserFromKeytab:
"main@1" prio=5 tid=0x1 nid=NA runnable
java.lang.
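For context, the login in Step 1 boils down to the standard Hadoop UGI call; a
minimal Scala sketch, with placeholder principal and keytab path:

import org.apache.hadoop.security.UserGroupInformation

object LoginSketch {
  def main(args: Array[String]): Unit = {
    // What SparkSubmit effectively does when --principal/--keytab are supplied
    // (principal and keytab path below are placeholders):
    UserGroupInformation.loginUserFromKeytab(
      "user@EXAMPLE.COM", "/etc/security/keytabs/user.keytab")
    // Secrets such as sparkCookie are later attached to this UGI's credentials:
    val ugi = UserGroupInformation.getCurrentUser
    println(ugi)
  }
}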
To: Shivaram Venkataraman
Cc: Ryan Blue; Ajith shetty; dev@spark.apache.org
Subject: Re: [Spark][Scheduler] Spark DAGScheduler scheduling performance
hindered on JobSubmitted Event
It's mostly just hash maps from some IDs to some state, and those can be
replaced with concurrent hash maps.
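A minimal sketch of that direction, replacing a plain map guarded by the single
event loop with a ConcurrentHashMap (names are illustrative, not the actual
DAGScheduler fields):

import java.util.concurrent.ConcurrentHashMap

object SchedulerStateSketch {
  // e.g. jobId -> job state; safe to read and update from multiple threads
  private val jobIdToState = new ConcurrentHashMap[Int, String]()

  def main(args: Array[String]): Unit = {
    jobIdToState.put(1, "submitted")
    jobIdToState.replace(1, "running") // atomic per-key update
    println(jobIdToState.get(1))
  }
}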
DAGScheduler becomes a bottleneck in the cluster when multiple JobSubmitted
events have to be processed, as DAGSchedulerEventProcessLoop is single-threaded
and will block other events in the queue, like TaskCompletion.
The JobSubmitted event is time-consuming depending on the nature of the job
(Example:
Hi all
I am using Spark 2.1 and I encounter an exception when doing concurrent inserts
on a table. Here is my scenario and some analysis.
create table sample using csv options('path' '/tmp/f/')
When concurrent inserts are executed, we see an exception like the one below:
2017-12-29 13:41:11,117 | ERROR | main | A
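A hypothetical reproduction sketch (assumes the csv-backed table 'sample' from
the scenario already exists and a local SparkSession):

import org.apache.spark.sql.SparkSession

object ConcurrentInsertRepro {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[4]").getOrCreate()
    // Fire two inserts concurrently from separate threads
    val threads = (1 to 2).map { i =>
      new Thread(new Runnable {
        def run(): Unit = spark.sql(s"insert into sample values ('row$i')")
      })
    }
    threads.foreach(_.start())
    threads.foreach(_.join())
    spark.stop()
  }
}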
Hi all
The parallelism of queries executed in a given SparkContext can be controlled
via spark.default.parallelism.
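For example, it can be set when the context is created (the value here is
illustrative):

import org.apache.spark.sql.SparkSession

object ParallelismSketch {
  def main(args: Array[String]): Unit = {
    // spark.default.parallelism is read when the context is created
    val spark = SparkSession.builder()
      .master("local[4]")
      .config("spark.default.parallelism", "8")
      .getOrCreate()
    println(spark.sparkContext.defaultParallelism)
    spark.stop()
  }
}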
I have a scenario where I need to run multiple concurrent queries in a single
context, while ensuring the concurrent queries are able to utilize the
resources without resource