RE: Java 11

2020-02-03 Thread Ajith shetty
Java 11 support will be released as part of the Spark 3.0 preview: https://spark.apache.org/docs/3.0.0-preview2/#downloading See also: https://issues.apache.org/jira/browse/SPARK-24417

RE: [VOTE] Release Apache Spark 2.4.5 (RC2)

2020-02-02 Thread Ajith shetty
Is the hadoop-3.1 profile supported for this release? I see a lot of UTs failing under this profile. https://github.com/apache/spark/blob/v2.4.5-rc2/pom.xml Example: [INFO] Running org.apache.spark.sql.hive.JavaMetastoreDataSourcesSuite [ERROR] Tests run: 3, Failures: 0, Errors: 3, Skipped: 0, Time e

RE: How to build single jar for single project in spark

2019-03-26 Thread Ajith shetty
You can try using the Maven -pl option for this: > mvn clean install -pl :spark-core_2.11 From: Qiu, Gerry To: zhangliyun; dev@spark.apache.org Date: 2019-03-26 14:34:20 Subject: RE: How to build single jar for single project in spark You can try this: https://spark.apache.org/docs/latest/building-spark

Permanent UDF support across session

2018-09-19 Thread Ajith shetty
I have a question about permanent UDFs in Spark with Hive support enabled. When we run create function, the function is registered with Hive via spark-sql> create function customfun as 'org.apache.hadoop.hive.ql.udf.generic.GenericUDFLastDay' using jar 'hdfs:///tmp/hive-exec.jar'; call stack: org.apach

[Spark][Security] UGI credentials lost between driver and executor in yarn mode

2018-03-20 Thread Ajith shetty
Hi all, I see that UGI credentials (e.g. sparkCookie) shared from driver to executor are being lost on the driver side in YARN mode. Below is the analysis at thriftserver startup. Step 1. SparkSubmit creates the submit env, which does a loginUserFromKeytab "main@1" prio=5 tid=0x1 nid=NA runnable java.lang.

RE: [Spark][Scheduler] Spark DAGScheduler scheduling performance hindered on JobSubmitted Event

2018-03-07 Thread Ajith shetty
: Shivaram Venkataraman Cc: Ryan Blue; Ajith shetty; dev@spark.apache.org Subject: Re: [Spark][Scheduler] Spark DAGScheduler scheduling performance hindered on JobSubmitted Event It's mostly just hash maps from some ids to some state, and those can be replaced just with concurrent hash maps

[Spark][Scheduler] Spark DAGScheduler scheduling performance hindered on JobSubmitted Event

2018-03-04 Thread Ajith shetty
DAGScheduler becomes a bottleneck in a cluster when multiple JobSubmitted events have to be processed, because DAGSchedulerEventProcessLoop is single-threaded and blocks other events in the queue, such as TaskCompletion. Handling a JobSubmitted event can be time-consuming, depending on the nature of the job (Example:

Spark SQL : Exception on concurrent insert due to lease over _SUCCESS

2018-01-07 Thread Ajith shetty
Hi all, I am using Spark 2.1 and I encounter an exception when doing concurrent inserts on a table. Here is my scenario and some analysis: create table sample using csv options('path' '/tmp/f/') When concurrent inserts are executed, we see an exception like the one below: 2017-12-29 13:41:11,117 | ERROR | main | A

Spark SQL : How to make Spark support parallelism per sql

2017-11-23 Thread Ajith shetty
Hi all, The parallelism of queries executed in a given SparkContext can be controlled via spark.default.parallelism. I have a scenario where I need to run multiple concurrent queries in a single context, while ensuring that the concurrent queries are able to utilize the resources without resource