Re: Spark Thrift Server Concurrency

2016-06-26 Thread Prabhu Joseph
…nd not others? It sounds like an interesting problem… On Jun 23, 2016, at 5:21 AM, Prabhu Joseph wrote: Hi All, on submitting the same SQL query 20 times in parallel to Spark Thrift Server, the query execution time for some queries is less than a second and some a…

Spark Thrift Server Concurrency

2016-06-23 Thread Prabhu Joseph
…concurrency is affected by the single driver. How can the concurrency be improved, and what are the best practices? Thanks, Prabhu Joseph
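All Thrift Server sessions share one driver, and by default that driver schedules queries FIFO, so one heavy query can starve the rest. A common mitigation, sketched here with hypothetical pool names, is to run the Thrift Server with FAIR scheduling and assign sessions to pools:

```xml
<!-- fairscheduler.xml (pool names are illustrative) -->
<allocations>
  <pool name="interactive">
    <schedulingMode>FAIR</schedulingMode>
    <weight>2</weight>
    <minShare>4</minShare>
  </pool>
  <pool name="batch">
    <schedulingMode>FIFO</schedulingMode>
    <weight>1</weight>
  </pool>
</allocations>
```

Enable it with `spark.scheduler.mode=FAIR` and `spark.scheduler.allocation.file=/path/to/fairscheduler.xml`, then route a JDBC session into a pool with `SET spark.sql.thriftserver.scheduler.pool=interactive;`. This shares the driver more fairly but does not remove the single-driver bottleneck itself.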

Re: Spark Scheduler creating Straggler Node

2016-03-08 Thread Prabhu Joseph
…cate hot cached blocks, right? On Tuesday, March 8, 2016, Prabhu Joseph wrote: Hi All, when a Spark job is running and one of the Spark executors on Node A has some partitions cached, then later, for some other stage, the scheduler tries to…

Spark Scheduler creating Straggler Node

2016-03-08 Thread Prabhu Joseph
…shuffle files from an external service instead of from each other, which takes load off the Spark executors. We want to check whether a similar external service exists for transferring cached partitions to other executors. Thanks, Prabhu Joseph
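Spark has no analogue of the external shuffle service for cached blocks: a cached partition stays with the executor that computed it. Two built-in mitigations, sketched here (`logData` and the values are illustrative), are replicating the hot cached data and shortening the locality wait so tasks stop queuing on the one busy node:

```scala
import org.apache.spark.storage.StorageLevel

// Cache with replication factor 2, so a second node also holds the
// hot partitions and the scheduler has an alternative NODE_LOCAL target.
val cached = logData.persist(StorageLevel.MEMORY_ONLY_2)

// And/or in spark-defaults.conf: how long to wait for a node-local
// slot before launching the task on a less-local node (default 3s):
//   spark.locality.wait=1s
```

Replication doubles memory use, so it is usually reserved for genuinely hot datasets.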

Spark Custom Partitioner not picked

2016-03-06 Thread Prabhu Joseph
…= {
  val pieces = line.split(' ')
  val level = pieces(2).toString
  val one = pieces(0).toString
  val two = pieces(1).toString
  (level, LogClass(one, two))
}

val output = logData.map(x => parse(x))
val partitioned = output.partitionBy(new ExactPartitioner(5)).persist()
val groups = partitioned.groupByKey(new ExactPartitioner(5))
groups.count()
output.partitions.size
partitioned.partitions.size

Thanks, Prabhu Joseph
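The snippet partitions with `new ExactPartitioner(5)` and then passes another `new ExactPartitioner(5)` to `groupByKey`. Spark only reuses the existing partitioning if it considers the two partitioners equal, so a custom `Partitioner` must override `equals`/`hashCode`; note also that `map` discards the partitioner, while `mapValues` keeps it. A hedged reconstruction (only the class name and partition count come from the thread; the body is assumed):

```scala
import org.apache.spark.Partitioner

class ExactPartitioner(parts: Int) extends Partitioner {
  override def numPartitions: Int = parts
  override def getPartition(key: Any): Int =
    math.abs(key.hashCode % parts)
  // Without these overrides, two ExactPartitioner(5) instances compare
  // unequal and Spark inserts another shuffle instead of reusing the
  // persisted partitioning.
  override def equals(other: Any): Boolean = other match {
    case p: ExactPartitioner => p.numPartitions == numPartitions
    case _                   => false
  }
  override def hashCode: Int = numPartitions
}
```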

Re: Spark log4j fully qualified class name

2016-02-27 Thread Prabhu Joseph
: Looking at https://logging.apache.org/log4j/1.2/apidocs/org/apache/log4j/PatternLayout.html: "WARNING: Generating the caller class information is slow. Thus, its use should be avoided unless execution speed is not an issue." On Sat, Feb 27, 2016 at 12:40 PM, Prabhu…

Spark log4j fully qualified class name

2016-02-27 Thread Prabhu Joseph
…4:40 ERROR org.apache.spark.Logging$class: Failed to create any local dir.
16/02/27 15:34:40 INFO org.apache.spark.Logging$class: Shutdown hook called
16/02/27 15:34:40 INFO org.apache.spark.Logging$class: Deleting directory /tmp/spark-5544c349-0393-4bd0-8aab-c20331a9a1cf
Thanks, Prabhu Joseph
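The `org.apache.spark.Logging$class` in these lines is the logger category (`%c`): Spark 1.x logs through a shared `Logging` trait, so the category names the wrapper rather than the real caller. Printing the actual class requires log4j's caller-location conversion characters (`%C`, `%F`, `%L`), which are computed via a stack walk and documented as slow. An illustrative layout:

```properties
# log4j.properties: %c prints the logger category (the Logging wrapper
# here); %C prints the computed caller class, at a performance cost.
log4j.rootCategory=INFO, console
log4j.appender.console=org.apache.log4j.ConsoleAppender
log4j.appender.console.layout=org.apache.log4j.PatternLayout
log4j.appender.console.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %C: %m%n
```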

Re: Spark Job on YARN Hogging the entire Cluster resource

2016-02-24 Thread Prabhu Joseph
YARN-2026 has fixed the issue. On Thu, Feb 25, 2016 at 4:17 AM, Prabhu Joseph wrote: You are right, Hamel. It should get 10 TB / 2, and in hadoop-2.7.0 it works fine, but in hadoop-2.5.1 it gets only 10 TB / 230, with the same configuration used in both versions. So I think…

Re: Spark Job on YARN Hogging the entire Cluster resource

2016-02-24 Thread Prabhu Joseph
…dea of what your queues/actual resource usage is like. Logs from each of your Spark applications would also be useful; basically, the more info the better. On Wed, Feb 24, 2016 at 2:52 PM Prabhu Joseph wrote: Hi Hamel, thanks for looki…

Re: Spark Job on YARN Hogging the entire Cluster resource

2016-02-24 Thread Prabhu Joseph
…ty and reservation. The question is how aggressively preemption reclaims resources from queue A if it holds the entire cluster without releasing anything. I am not able to share the actual configuration, but an answer to this question will help us. Thanks, Prabhu Joseph On Wed, Feb 24, 2016 at 10:03 PM, Ham…
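For the CapacityScheduler, preemption is driven by a monitor policy plus a few yarn-site.xml knobs that bound how much is reclaimed per round and how long an over-capacity queue gets before containers are killed. Values below are illustrative:

```xml
<!-- yarn-site.xml: enable CapacityScheduler preemption (Hadoop 2.x) -->
<property>
  <name>yarn.resourcemanager.scheduler.monitor.enable</name>
  <value>true</value>
</property>
<property>
  <name>yarn.resourcemanager.scheduler.monitor.policies</name>
  <value>org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.ProportionalCapacityPreemptionPolicy</value>
</property>
<property>
  <!-- at most 10% of cluster resources preempted per round -->
  <name>yarn.resourcemanager.monitor.capacity.preemption.total_preemption_per_round</name>
  <value>0.1</value>
</property>
<property>
  <!-- grace period before a marked container is killed (ms) -->
  <name>yarn.resourcemanager.monitor.capacity.preemption.max_wait_before_kill</name>
  <value>15000</value>
</property>
```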

Spark Job on YARN Hogging the entire Cluster resource

2016-02-23 Thread Prabhu Joseph
…r new YARN application type with similar behavior. We want YARN to control this behavior by reclaiming the resources that the first job has held for a long period. Thanks, Prabhu Joseph
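Short of preemption, the usual way to stop one long-running job from holding a whole cluster is to cap its queue. A capacity-scheduler.xml sketch with a hypothetical queue name:

```xml
<!-- capacity-scheduler.xml: "longrun" is an illustrative queue name -->
<property>
  <name>yarn.scheduler.capacity.root.longrun.capacity</name>
  <value>50</value>
</property>
<property>
  <!-- hard ceiling: the queue can never grow past 60% of the cluster -->
  <name>yarn.scheduler.capacity.root.longrun.maximum-capacity</name>
  <value>60</value>
</property>
<property>
  <!-- keep a single user within the queue's configured capacity -->
  <name>yarn.scheduler.capacity.root.longrun.user-limit-factor</name>
  <value>1</value>
</property>
```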

Re: Concurreny does not improve for Spark Jobs with Same Spark Context

2016-02-18 Thread Prabhu Joseph
…java old threading is used somewhere. On Friday, February 19, 2016, Jörn Franke wrote: How did you configure the YARN queues? What scheduler? Preemption? On 19 Feb 2016, at 06:51, Prabhu Joseph wrote: Hi All, when running con…

Concurreny does not improve for Spark Jobs with Same Spark Context

2016-02-18 Thread Prabhu Joseph
…taking 2-3 times longer than A, which shows that concurrency does not improve with a shared SparkContext. [Spark Job Server] Thanks, Prabhu Joseph
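With a single SparkContext, jobs only run concurrently when they are submitted from separate threads, and fair sharing between them additionally needs the FAIR scheduler with per-thread pools. A minimal sketch (the pool name is assumed, and `spark.scheduler.mode=FAIR` with a matching fairscheduler.xml is presumed configured):

```scala
// Each thread submits its own job; setLocalProperty is per-thread,
// so each job lands in its own fair-scheduler pool.
val worker = new Thread {
  override def run(): Unit = {
    sc.setLocalProperty("spark.scheduler.pool", "poolA")
    sc.parallelize(1 to 1000000).count()
  }
}
worker.start()
```

If jobs still serialize, the bottleneck is usually that one job already occupies all executor slots, not the shared context itself.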

Re: Long running Spark job on YARN throws "No AMRMToken"

2016-02-11 Thread Prabhu Joseph
…ed your help to find the scenario in which "No AMRMToken" happens: a user is added with a token, but later that token is missing. Is the token removed because it expired? Thanks, Prabhu Joseph On Wed, Feb 10, 2016 at 12:59 AM, Hari Shreedharan <hshreedha...@cloudera.com> wrote: The cred…

Re: Spark Job on YARN accessing Hbase Table

2016-02-10 Thread Prabhu Joseph
…hadoop-2.5.1, and hence spark.yarn.dist.files does not work with hadoop-2.5.1. spark.yarn.dist.files works fine on hadoop-2.7.0, as CWD/* is included in the container classpath through some bug fix; searching for the JIRA. Thanks, Prabhu Joseph On Wed, Feb 10, 2016 at 4:04 PM, Ted Yu wrote: H…

Re: Spark Job on YARN accessing Hbase Table

2016-02-10 Thread Prabhu Joseph
…of hbase client jars. When I checked launch_container.sh, the classpath does not have $PWD/*, and hence all the hbase client jars are ignored. Is spark.yarn.dist.files not meant for adding jars to the executor classpath? Thanks, Prabhu Joseph On Tue, Feb 9, 2016 at 1:42 PM, Prabhu Joseph wrote:…
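spark.yarn.dist.files only copies files into the container working directory; whether they land on the classpath depends on `$PWD/*` being in the container classpath, which varies by Hadoop version as the thread notes. A more portable option is `--jars`, which distributes the jars and adds them to both driver and executor classpaths. The class name and jar paths below are placeholders, not from the thread:

```shell
# Illustrative: ship HBase client jars with --jars instead of
# spark.yarn.dist.files, so they are added to the classpath directly.
spark-submit \
  --master yarn-cluster \
  --class com.example.HBaseRead \
  --jars /opt/hbase/lib/hbase-client.jar,/opt/hbase/lib/hbase-common.jar,/opt/hbase/lib/hbase-protocol.jar \
  app.jar
```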

Re: Long running Spark job on YARN throws "No AMRMToken"

2016-02-08 Thread Prabhu Joseph
+ Spark-Dev. On Tue, Feb 9, 2016 at 10:04 AM, Prabhu Joseph wrote: Hi All, a long-running Spark job on YARN throws the below exception after running for a few days. yarn.ApplicationMaster: Reporter thread fails 1 time(s) in a row. org.apache.hadoop.yarn.exceptio…

Re: Spark job does not perform well when some RDD in memory and some on Disk

2016-02-04 Thread Prabhu Joseph
…must be the process of putting …" - Edsger Dijkstra. "If you pay peanuts you get monkeys." 2016-02-04 11:33 GMT+01:00 Prabhu Joseph: Okay, the reason for the task delay within the executor when some RDD is in memory and some in Hadoop i…

Re: Spark job does not perform well when some RDD in memory and some on Disk

2016-02-04 Thread Prabhu Joseph
…up and launching it on a less-local node. So after setting it to 0, all tasks started in parallel, but I learned that it is better not to reduce it to 0. On Mon, Feb 1, 2016 at 2:02 PM, Prabhu Joseph wrote: Hi All, a sample Spark application which reads a logfile from Hadoop (1.2 GB…
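The setting being discussed is `spark.locality.wait`: how long the scheduler holds a task hoping for a more-local slot before relaxing the locality level. Illustrative spark-defaults.conf values:

```properties
# Default is 3s per locality level (process-local -> node-local ->
# rack-local -> any). 0 disables the wait entirely; a small nonzero
# value keeps some data locality while avoiding long stalls.
spark.locality.wait 1s
```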

Spark saveAsHadoopFile stage fails with ExecutorLostfailure

2016-02-02 Thread Prabhu Joseph
, saveAsHadoopFile runs fine. What could be the reason for ExecutorLostFailure when the number of cores per executor is high?
Error: ExecutorLostFailure (executor 3 lost)
16/02/02 04:22:40 WARN TaskSetManager: Lost task 1.3 in stage 15.0 (TID 1318, hdnprd-c01-r01-14):
Thanks, Prabhu Joseph
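One frequent cause: with many cores per executor, that many tasks buffer output concurrently in a single JVM, and YARN kills the container once it exceeds its memory limit, which surfaces as ExecutorLostFailure. Illustrative knobs (the values are guesses, not from the thread):

```properties
# spark-defaults.conf: fewer concurrent tasks per JVM, plus extra
# off-heap headroom for the YARN container.
spark.executor.cores 5
spark.executor.memory 8g
# Spark 1.x name; later versions rename it spark.executor.memoryOverhead (MB)
spark.yarn.executor.memoryOverhead 1024
```

Checking the YARN NodeManager logs for "running beyond physical memory limits" would confirm or rule this out.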

Spark Executor retries infinitely

2016-02-01 Thread Prabhu Joseph
…ores, 2.0 GB RAM
16/02/01 06:54:28 INFO AppClient$ClientEndpoint: Executor updated: app-20160201065319-0014/2848 is now LOADING
16/02/01 06:54:28 INFO AppClient$ClientEndpoint: Executor updated: app-20160201065319-0014/2848 is now RUNNING
....
Thanks, Prabhu Joseph

Spark on YARN job continuously reports "Application does not exist in cache"

2016-01-13 Thread Prabhu Joseph
…application attempt, there are many finishApplicationMaster requests causing the ERROR. Need your help to understand in what scenario the above happens. Related JIRAs:
https://issues.apache.org/jira/browse/SPARK-1032
https://issues.apache.org/jira/browse/SPARK-3072
Thanks, Prabhu Joseph