Re: kubeflow spark operator & SparkHistoryService on k8s - spark driver/executor logs not showing up Spark History Server

2024-10-23 Thread Mat Schaffer
We have a similar setup (EKS/S3) and use promtail to collect pod logs to loki. We haven't tried to get the history UI log links working. Instead we link to both the history server and logs from the same job/cluster overview dashboards in grafana. On Wed, Oct 23, 2024 at 3:36 PM karan alang wrote

kubeflow spark operator & SparkHistoryService on k8s - spark driver/executor logs not showing up Spark History Server

2024-10-23 Thread karan alang
Hello All, I have the kubeflow spark operator installed on GKE (in namespace so350), as well as Spark History Server installed on GKE in namespace shs-350. The spark job is launched in a separate namespace - spark-apps. When I launch the spark job, it runs fine and I'm able to see the job details i

Re: Re: OOM issue in Spark Driver

2024-06-11 Thread Mich Talebzadeh
In a nutshell, the culprit for the OOM issue in your Spark driver appears to be memory leakage or inefficient memory usage within your application. This could be caused by factors such as: 1. Accumulation of data or objects in memory over time without proper cleanup. 2. Inefficient data
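
A minimal sketch of the cleanup habit the first point refers to, with illustrative paths and names (not from the thread): unpersist cached data once a batch is finished so references do not accumulate on the driver heap.

import org.apache.spark.sql.SparkSession
import org.apache.spark.storage.StorageLevel

object CacheHygiene {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("cache-hygiene").getOrCreate()
    val sc = spark.sparkContext

    val batch = sc.textFile("hdfs:///data/batch-0001") // illustrative input path
    val keyed = batch.map(line => (line.take(8), line)).persist(StorageLevel.MEMORY_AND_DISK)

    keyed.countByKey()                                    // first action using the cache
    keyed.values.saveAsTextFile("hdfs:///out/batch-0001") // second action reuses it

    // Release the cached blocks once the batch is done; holding references
    // across iterations is exactly the accumulation-without-cleanup case.
    keyed.unpersist(blocking = true)
    spark.stop()
  }
}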

Re: OOM issue in Spark Driver

2024-06-08 Thread Andrzej Zera
Hey, do you perform stateful operations? Maybe your state is growing indefinitely - a screenshot with state metrics would help (you can find it in Spark UI -> Structured Streaming -> your query). Do you have a driver-only cluster or do you have workers too? What's the memory usage profile at worker

OOM issue in Spark Driver

2024-06-07 Thread Karthick Nk
Hi All, I am using PySpark Structured Streaming with Azure Databricks for a data load process. In the pipeline I am using a job cluster and I am running only one pipeline; I am getting an OUT OF MEMORY issue when running for a long time. When I inspect the metrics of the cluster I found that,

Re: Spark driver thread

2020-03-06 Thread Enrico Minack
Hi James, You can configure the Spark Driver to use more than a single thread. It is something that depends on the application, but the Spark driver can take advantage of multiple threads in many situations. For instance, when the d

Re: Spark driver thread

2020-03-06 Thread Pol Santamaria
From: Pol Santamaria; Sent: Friday, March 6, 2020 12:59 AM; To: James Yu; Cc: user@spark.apache.org; Subject: Re: Spark driver thread. Hi James, You can configure the Spark Driver to

Re: Spark driver thread

2020-03-06 Thread Russell Spitzer
id still applicable in cluster mode. Thanks in advance for your further clarification. From: Pol Santamaria; Sent: Friday, March 6, 2020 12:59 AM; To: James Yu; Cc: user@spark.apache.org; Subject: Re: Spark driver th

Re: Spark driver thread

2020-03-06 Thread James Yu
Hi James, You can configure the Spark Driver to use more than a single thread. It is something that depends on the application, but the Spark driver can take advantage of multiple threads in many situations. For instance, when the driver

Re: Spark driver thread

2020-03-06 Thread Pol Santamaria
Hi James, You can configure the Spark Driver to use more than a single thread. It is something that depends on the application, but the Spark driver can take advantage of multiple threads in many situations. For instance, when the driver program gathers or sends data to the workers. So yes, if

Spark driver thread

2020-03-05 Thread James Yu
Hi, Does a Spark driver always work single-threaded? If yes, does it mean asking for more than one vCPU for the driver is wasteful? Thanks, James
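
A minimal sketch of what the replies in this thread describe: the driver can submit independent jobs from several threads, so more than one driver vCPU is not automatically wasted. Numbers and names are illustrative.

import org.apache.spark.sql.SparkSession
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration.Duration

object MultiThreadedDriver {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("threaded-driver").getOrCreate()
    val sc = spark.sparkContext

    // Two independent jobs submitted concurrently from driver-side threads;
    // the scheduler interleaves their tasks across the executors.
    val jobA = Future { sc.parallelize(1 to 1000000).sum() }
    val jobB = Future { sc.parallelize(1 to 1000000).map(_ * 2).count() }

    println(Await.result(jobA, Duration.Inf))
    println(Await.result(jobB, Duration.Inf))
    spark.stop()
  }
}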

Spark driver crashed with internal error

2019-04-07 Thread Manu Zhang
Hi all, Recently, our Spark application's (2.3.1) driver has been crashing before exiting with the following error: Could not load hsdis-amd64.so; library not loadable; PrintAssembly is disabled. # A fatal error has been detected by the Java Runtime Environment: # Internal Error

Fwd: Spark driver pod scheduling fails on auto scaled node

2019-01-31 Thread prudhvi ch
Forwarded message from prudhvi ch, Thu, Jan 31, 2019, 5:54 PM, Subject: Fwd: Spark driver pod scheduling fails on auto scaled node. Forwarded message from Prudhvi Chennuru (CONT), Thu, Jan 31, 2019, 5:01 PM, Subject: Fwd: Spark driver

Fwd: Spark driver pod scheduling fails on auto scaled node

2019-01-31 Thread prudhvi ch
Forwarded message from Prudhvi Chennuru (CONT), Thu, Jan 31, 2019, 5:01 PM: Hi, I am using kubernetes v1.11.5 and spark v2.3.0, calico (daemonset) as overlay network plugin and

Fwd: Spark driver pod scheduling fails on auto scaled node

2019-01-31 Thread Prudhvi Chennuru (CONT)
Hi, I am using kubernetes v1.11.5 and spark v2.3.0, calico (daemonset) as the overlay network plugin, and the kubernetes cluster autoscaler feature to autoscale the cluster if needed. When the cluster is autoscaling, calico pods are scheduled on those nodes but they are not ready for 40 to 50 se

java.lang.OutOfMemoryError: Java heap space - Spark driver.

2018-08-29 Thread Guillermo Ortiz Fernández
I got this error from the Spark driver; it seems that I should increase the driver memory although it's 5g (and 4 cores) right now. It seems weird to me because I'm not using Kryo or broadcast in this process, but in the log there are references to Kryo and broadcast. How could I figu

Re: spark driver pod stuck in Waiting: PodInitializing state in Kubernetes

2018-08-17 Thread purna pradeep
Resurfacing the question to get more attention. Hello, I'm running a Spark 2.3 job on a kubernetes cluster. kubectl version: Client Version: version.Info{Major:"1", Minor:"9", GitVersion:"v1.9.3", GitCommit:"d2835416544f298c919e2ead3be3d0864b52323b", GitTreeState:"clean", BuildDa

Re: spark driver pod stuck in Waiting: PodInitializing state in Kubernetes

2018-08-16 Thread purna pradeep
Hello, I'm running a Spark 2.3 job on a kubernetes cluster. kubectl version: Client Version: version.Info{Major:"1", Minor:"9", GitVersion:"v1.9.3", GitCommit:"d2835416544f298c919e2ead3be3d0864b52323b", GitTreeState:"clean", BuildDate:"2018-02-09T21:51:06Z", GoVersion:"go1.9.4", Compile

spark driver pod stuck in Waiting: PodInitializing state in Kubernetes

2018-08-15 Thread purna pradeep
I'm running a Spark 2.3 job on a kubernetes cluster. kubectl version: Client Version: version.Info{Major:"1", Minor:"9", GitVersion:"v1.9.3", GitCommit:"d2835416544f298c919e2ead3be3d0864b52323b", GitTreeState:"clean", BuildDate:"2018-02-09T21:51:06Z", GoVersion:"go1.9.4", Compiler:"gc", Platform:"da

Re: Spark driver pod garbage collection

2018-05-23 Thread Anirudh Ramanathan
that can exist before the terminated pod garbage collector starts deleting terminated pods. If <= 0, the terminated pod garbage collector is disabled. On Wed, May 23, 2018, 8:34 AM purna pradeep wrote: Hello, Currently I observe dead pods are not getting garbage collected (a

Spark driver pod garbage collection

2018-05-23 Thread purna pradeep
Hello, Currently I observe that dead pods are not getting garbage collected (i.e., spark driver pods which have completed execution), so pods could sit in the namespace for weeks. This makes listing, parsing, and reading pods slower, as well as leaving junk on the cluster. I believe

Re: Spark driver pod eviction Kubernetes

2018-05-22 Thread Anirudh Ramanathan
I think a pod disruption budget might actually work here. It can select the spark driver pod using a label. Using that with a minAvailable value that's appropriate here could do it. In a more general sense, we do plan on some future work to support driver recovery which should help long ru

Spark driver pod eviction Kubernetes

2018-05-22 Thread purna pradeep
Hi, What would be the recommended approach to wait for the spark driver pod to complete the currently running job before it gets evicted to new nodes while maintenance on the current node is going on (kernel upgrade, hardware maintenance, etc.) using the drain command? I don't think I can use

how sequence of chained jars in spark.(driver/executor).extraClassPath matters

2017-09-13 Thread Richard Xin
So let's say I have a chained path in spark.driver.extraClassPath/spark.executor.extraClassPath such as /path1/*:/path2/*, and I have different versions of the same jar under those 2 directories. How does Spark pick which version of the jar to use: from /path1/*? Thanks.
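
On a standard JVM classpath the entries are searched left to right and the first match wins, so with /path1/*:/path2/* the jar under /path1 shades the one under /path2. A small sketch for verifying which copy actually serves a class (the class name is a stand-in):

object WhichJar {
  def main(args: Array[String]): Unit = {
    val cl = getClass.getClassLoader
    // Resolves to the first copy on the classpath, mirroring how the
    // driver and executor JVMs choose between /path1/* and /path2/*.
    println("winner: " + cl.getResource("com/example/SomeClass.class")) // illustrative class

    // List every copy to spot shadowed duplicates:
    val all = cl.getResources("com/example/SomeClass.class")
    while (all.hasMoreElements) println("found: " + all.nextElement())
  }
}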

Re: Spark driver CPU usage

2017-03-01 Thread Yong Zhang
Does that configuration parameter affect the CPU usage of the driver? If it does, we have that property unchanged from its default value of "1", yet the same behaviour as before. Original message from: Rohit Verma [mailto:rohit.ve...@roki

RE: Spark driver CPU usage

2017-03-01 Thread Phadnis, Varun
To: Phadnis, Varun; Cc: user@spark.apache.org; Subject: Re: Spark driver CPU usage. Use conf spark.task.cpus to control the number of CPUs to use in a task. On Mar 1, 2017, at 5:41 PM, Phadnis, Varun wrote: Hello, Is there a way to control CPU usage for the driver when running app

Re: Spark driver CPU usage

2017-03-01 Thread Rohit Verma
Use conf spark.task.cpus to control the number of CPUs to use in a task. On Mar 1, 2017, at 5:41 PM, Phadnis, Varun wrote: Hello, Is there a way to control CPU usage for the driver when running applications in client mode? Currently we are observing that the driver occupies all the co
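
For context, spark.task.cpus reserves CPUs per task on the executors; for a client-mode driver hogging a machine, the more direct lever is the core count in the master URL. A minimal sketch showing both knobs (values illustrative):

import org.apache.spark.{SparkConf, SparkContext}

object DriverCpuDemo {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("cpu-demo")
      .setMaster("local[2]")       // cap the in-process driver/executor at 2 cores
      .set("spark.task.cpus", "1") // CPUs reserved per task on executors
    val sc = new SparkContext(conf)
    println(s"default parallelism: ${sc.defaultParallelism}")
    sc.stop()
  }
}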

Spark driver CPU usage

2017-03-01 Thread Phadnis, Varun
Hello, Is there a way to control CPU usage for the driver when running applications in client mode? Currently we are observing that the driver occupies all the cores. Launching just 3 driver instances of the WordCount sample application concurrently on the same machine brings the usage of its 4 cor

Re: RDD blocks on Spark Driver

2017-02-28 Thread Prithish
liangyihuai. Original: From: Jacek Laskowski, Date: 2017/2/25 02:45:20, To: prithish, Cc: user, Subject: Re: RDD blocks on Spark Driver. Hi,

Re: RDD blocks on Spark Driver

2017-02-28 Thread Jonathan Kelly
"partitions" is relative to spark. liangyihuai. Original: From: Jacek Laskowski, Date: 2017/2/25 02:45:20, To: prithish, Cc: user, Subject: Re: RDD blocks on Spark Driver. Hi, Guess you're using loca

Re: RDD blocks on Spark Driver

2017-02-26 Thread Prithish
which are local, standalone, yarn and Mesos. Also, "blocks" is relative to hdfs, "partitions" is relative to spark. liangyihuai. Original: From: Jacek Laskowski, Date: 2017/2/25 02:45:20, To: prithish,

Re: RDD blocks on Spark Driver

2017-02-25 Thread liangyhg...@gmail.com
Hi, I think you are using the local mode of Spark. There are mainly four modes: local, standalone, yarn and Mesos. Also, "blocks" is relative to hdfs, "partitions" is relative to spark. liangyihuai. Original: From: Jacek Laskowski, Date: 2017/2/25 02:45:20, To: prithish, Cc: user, Subject: Re: RDD blocks on Spark Driver.

Re: RDD blocks on Spark Driver

2017-02-24 Thread Jacek Laskowski
Hi, Guess you're using local mode, which has only one executor called driver. Is my guess correct? Jacek. On 23 Feb 2017 2:03 a.m., wrote: Hello, Had a question. When I look at the executors tab in the Spark UI, I notice that some RDD blocks are assigned to the driver as well. Can someone p

RDD blocks on Spark Driver

2017-02-22 Thread prithish
Hello, Had a question. When I look at the executors tab in Spark UI, I notice that some RDD blocks are assigned to the driver as well. Can someone please tell me why? Thanks for the help.

spark driver UI points to the wrong ip. dont know why?

2017-01-26 Thread kant kodali
Hi, I use a Spark Standalone cluster and I submitted my application in cluster mode. When I go to the Spark Master UI there is a table layout for "Running Applications", and in that table there is a column called "Name" which has the value of my app name; when I click on the link it doesn't work beca

Re: Spark driver not reusing HConnection

2016-11-23 Thread Mukesh Jha
Corresponding HBase bug: https://issues.apache.org/jira/browse/HBASE-12629. On Wed, Nov 23, 2016 at 1:55 PM, Mukesh Jha wrote: The solution is to disable the region size calculation check: hbase.regionsizecalculator.enable: false. On Sun, Nov 20, 2016 at 9:29 PM, Mukesh Jha wrote: A

Re: Spark driver not reusing HConnection

2016-11-23 Thread Mukesh Jha
The solution is to disable the region size calculation check: hbase.regionsizecalculator.enable: false. On Sun, Nov 20, 2016 at 9:29 PM, Mukesh Jha wrote: Any ideas folks? On Fri, Nov 18, 2016 at 3:37 PM, Mukesh Jha wrote: Hi, I'm accessing multiple regions (~5k) of an HBase tabl
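
A sketch of where that setting goes when building the RDD, assuming the stock HBase TableInputFormat; the table name is illustrative:

import org.apache.hadoop.hbase.HBaseConfiguration
import org.apache.hadoop.hbase.client.Result
import org.apache.hadoop.hbase.io.ImmutableBytesWritable
import org.apache.hadoop.hbase.mapreduce.TableInputFormat
import org.apache.spark.SparkContext

object HBaseScan {
  // assumes an already-constructed SparkContext
  def hbaseRdd(sc: SparkContext) = {
    val conf = HBaseConfiguration.create()
    conf.set(TableInputFormat.INPUT_TABLE, "my_table") // illustrative table name
    // Skip the per-region size probe that opens a connection per region.
    conf.set("hbase.regionsizecalculator.enable", "false")
    sc.newAPIHadoopRDD(conf, classOf[TableInputFormat],
      classOf[ImmutableBytesWritable], classOf[Result])
  }
}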

Re: Spark driver not reusing HConnection

2016-11-20 Thread Mukesh Jha
Any ideas folks? On Fri, Nov 18, 2016 at 3:37 PM, Mukesh Jha wrote: Hi, I'm accessing multiple regions (~5k) of an HBase table using spark's newAPIHadoopRDD, but the driver is trying to calculate the region size of all the regions. It is not even reusing the hconnection and is creating a

Spark driver not reusing HConnection

2016-11-18 Thread Mukesh Jha
Hi, I'm accessing multiple regions (~5k) of an HBase table using spark's newAPIHadoopRDD, but the driver is trying to calculate the region size of all the regions. It is not even reusing the hconnection and is creating a new connection for every request (see below), which is taking lots of time. Is the

appHandle.kill(), SparkSubmit Process, JVM questions related to SparkLauncher design and Spark Driver

2016-11-11 Thread Elkhan Dadashov
40550 MRAppMaster (this is the MR App Master container). Spark-related processes: 40602 SparkSubmit, 40875 CoarseGrainedExecutorBackend, 40846 CoarseGrainedExecutorBackend, 40815 ExecutorLauncher. When a Spark app is started via SparkLauncher#startApplication(), the Spark driver (inside SparkSubmit) is st
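
A minimal sketch of the SparkLauncher flow under discussion: startApplication() returns a SparkAppHandle, and kill() tears down the SparkSubmit-hosted driver. Paths and class names are placeholders.

import org.apache.spark.launcher.{SparkAppHandle, SparkLauncher}

object LaunchAndKill {
  def main(args: Array[String]): Unit = {
    val handle: SparkAppHandle = new SparkLauncher()
      .setAppResource("/path/to/app.jar") // placeholder
      .setMainClass("com.example.Main")   // placeholder
      .setMaster("yarn")
      .setDeployMode("cluster")
      .startApplication()

    // Poll until the app reaches a terminal state, or kill it early.
    while (!handle.getState.isFinal) {
      Thread.sleep(1000)
      if (shouldAbort) handle.kill() // forcefully stops the driver and the app
    }
  }
  private def shouldAbort: Boolean = false // illustrative guard
}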

Re: why spark driver program is creating so many threads? How can I limit this number?

2016-11-01 Thread kant kodali
I have attached the jstack dump here. I do a simple MapToPair and reduceByKey, and I have a window interval of 1 minute (60000 ms) and a batch interval of 1s (1000 ms). This is generating a lot of threads, at least 5 to 8 per second, and the total number of threads is monotonically increasing. So just for tweaking purposes I changed my window interval to 1 min (60000 ms) and batch interval to 10s (10000 ms); this looked a lot better but is still not ideal; at the very least it is not monotonic anymore (it goes up and down). Now my question really is how do I tune such that my number of threads is optimal while satisfying the window interval of 1 minute (60000 ms) and batch interval of 1s (1000 ms)? This jstack dump was taken after running my spark driver program for 2 mins, and there are about 1000 threads. Thanks!

Re: why spark driver program is creating so many threads? How can I limit this number?

2016-11-01 Thread Shixiong(Ryan) Zhu
minute (60000 ms) and batch interval of 1s (1000 ms). This is generating a lot of threads, at least 5 to 8 per second, and the total number of threads is monotonically increasing. So just for tweaking purposes I changed my window interval to 1 min (60000 ms) and batch interval to 10s (10000 ms); this looked a lot better but is still not ideal; at the very least it is not monotonic anymore (it goes up and down). Now my question really is how do I tune such that my number of threads is optimal while satisfying the window interval of 1 minute (60000 ms) and batch interval of 1s (1000 ms)? This jstack dump was taken after running my spark driver program for 2 mins, and there are about 1000 threads. Thanks!

Re: why spark driver program is creating so many threads? How can I limit this number?

2016-11-01 Thread kant kodali
interval of 10s (10000 ms); this looked a lot better but is still not ideal; at the very least it is not monotonic anymore (it goes up and down). Now my question really is how do I tune such that my number of threads is optimal while satisfying the window interval of 1 minute (60000 ms) and batch interval of 1s (1000 ms)? This jstack dump was taken after running my spark driver program for 2 mins, and there are about 1000 threads. Thanks!

Re: why spark driver program is creating so many threads? How can I limit this number?

2016-11-01 Thread Shixiong(Ryan) Zhu
This looked a lot better but is still not ideal; at the very least it is not monotonic anymore (it goes up and down). Now my question really is how do I tune such that my number of threads is optimal while satisfying the window interval of 1 minute (60000 ms) and batch interval of 1s (1000 ms)? This jstack dump was taken after running my spark driver program for 2 mins, and there are about 1000 threads. Thanks!

Re: why spark driver program is creating so many threads? How can I limit this number?

2016-11-01 Thread kant kodali
(it goes up and down). Now my question really is how do I tune such that my number of threads is optimal while satisfying the window interval of 1 minute (60000 ms) and batch interval of 1s (1000 ms)? This jstack dump was taken after running my spark driver program for 2 mins, and there are about 1000 threads. Thanks!

Re: why spark driver program is creating so many threads? How can I limit this number?

2016-11-01 Thread kant kodali
Now my question really is how do I tune such that my number of threads is optimal while satisfying the window interval of 1 minute (60000 ms) and batch interval of 1s (1000 ms)? This jstack dump was taken after running my spark driver program for 2 mins, and there are about 1000 threads. Thanks!

Re: why spark driver program is creating so many threads? How can I limit this number?

2016-11-01 Thread Sean Owen
and batch interval of 1s (1000 ms)? This jstack dump was taken after running my spark driver program for 2 mins, and there are about 1000 threads. Thanks!

Re: why spark driver program is creating so many threads? How can I limit this number?

2016-11-01 Thread kant kodali
number of threads is optimal while satisfying the window interval of 1 minute (60000 ms) and batch interval of 1s (1000 ms)? This jstack dump was taken after running my spark driver program for 2 mins, and there are about 1000 threads. Thanks!

Re: why spark driver program is creating so many threads? How can I limit this number?

2016-10-31 Thread kant kodali
really is how do I tune such that my number of threads is optimal while satisfying the window interval of 1 minute (60000 ms) and batch interval of 1s (1000 ms)? This jstack dump was taken after running my spark driver program for 2 mins, and there are about 1000 threads.

Re: why spark driver program is creating so many threads? How can I limit this number?

2016-10-31 Thread Shixiong(Ryan) Zhu
that it doesn't spawn any other threads. It only calls the MapToPair, ReduceByKey, forEachRDD, and Collect functions. public class NSQReceiver extends Receiver {

Re: why spark driver program is creating so many threads? How can I limit this number?

2016-10-31 Thread kant kodali
public class NSQReceiver extends Receiver { private String topic=""; public NSQReceiver(String topic) {

Re: why spark driver program is creating so many threads? How can I limit this number?

2016-10-31 Thread Shixiong(Ryan) Zhu
super(StorageLevel.MEMORY_AND_DISK_2()); this.topic = topic; } @Override public void onStart() {

Re: why spark driver program is creating so many threads? How can I limit this number?

2016-10-31 Thread kant kodali
this.topic = topic; } @Override public void onStart() { new Thread() { @Override public void run() { receive();

Re: why spark driver program is creating so many threads? How can I limit this number?

2016-10-31 Thread Shixiong(Ryan) Zhu
new Thread() { @Override public void run() { receive(); } }.start(); } }

Re: why spark driver program is creating so many threads? How can I limit this number?

2016-10-31 Thread Sean Owen
this.topic = topic; } @Override public void onStart() { new Thread() { @Override public void run() { receive(); } }.start(); } } Environment inf

Re: why spark driver program is creating so many threads? How can I limit this number?

2016-10-31 Thread kant kodali
}.start(); } } Environment info: Java 8, Scala 2.11.8, Spark 2.0.0. More than happy to share

Re: why spark driver program is creating so many threads? How can I limit this number?

2016-10-31 Thread Shixiong(Ryan) Zhu
Environment info: Java 8, Scala 2.11.8, Spark 2.0.0. More than happy to share any other info you may need. On Mon, Oct 31, 2016 at 11:05 AM, Jakob Odersky wrote: how

Re: why spark driver program is creating so many threads? How can I limit this number?

2016-10-31 Thread kant kodali
}.start(); } } Environment info: Java 8, Scala 2.11.8, Spark 2.0.0. More than happy to share any other info you may need. On Mon, Oct 31, 2016 at 11:05 AM, Jakob Odersky wrote:

Re: why spark driver program is creating so many threads? How can I limit this number?

2016-10-31 Thread kant kodali
receive(); } }.start(); } } Environment info: Java 8, Scala 2.11.8, Spark 2.0.0. More than happy to share any other info you may need. On Mon, Oct 31, 2016 at 11:05 AM, Jakob Odersky wrote: how do I tell my spark driver program to not create so many? This may de

Re: why spark driver program is creating so many threads? How can I limit this number?

2016-10-31 Thread Jakob Odersky
how do I tell my spark driver program to not create so many? This may depend on your driver program. Do you spawn any threads in it? Could you share some more information on the driver program, your Spark version, and your environment? It would greatly help others to help you. On Mon, Oct 31, 2

Re: why spark driver program is creating so many threads? How can I limit this number?

2016-10-31 Thread kant kodali
unlimited; max user processes (-u) 120242; virtual memory (kbytes, -v) unlimited; file locks (-x) unlimited. So at this point I do understand that I am running out of memory due to the allocation of threads, so my biggest question is how do I tell my spark driver progra

Re: why spark driver program is creating so many threads? How can I limit this number?

2016-10-31 Thread Sean Owen
ps -elfT | grep "spark-driver-program.jar" | wc -l. The result is around 32K. Why does it create so many threads and how can I limit this?

why spark driver program is creating so many threads? How can I limit this number?

2016-10-31 Thread kant kodali
When I do ps -elfT | grep "spark-driver-program.jar" | wc -l, the result is around 32K. Why does it create so many threads and how can I limit this?
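
Note that ps -elfT prints one line per thread, so that pipeline counts threads, not processes. To watch the count from inside the driver JVM itself, a small self-contained sketch:

import java.lang.management.ManagementFactory

object ThreadWatcher {
  def main(args: Array[String]): Unit = {
    val mx = ManagementFactory.getThreadMXBean
    // A live count that only ever grows points at leaked threads,
    // e.g. a new thread spawned per batch that is never stopped.
    for (_ <- 1 to 10) {
      println(s"live=${mx.getThreadCount} peak=${mx.getPeakThreadCount}")
      Thread.sleep(5000)
    }
  }
}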

Re: Getting the IP address of Spark Driver in yarn-cluster mode

2016-10-25 Thread Masood Krohy
'Could not connect to server on %s' % nodes[amHost]. Masood Krohy, Ph.D., Data Scientist, Intact Lab-R&D, Intact Financial Corporation. From: Steve Loughran; To: Masood Krohy; Cc: user@spark.apache.org; Date: 2016-10-24 17:09; Subj

Re: Getting the IP address of Spark Driver in yarn-cluster mode

2016-10-24 Thread Steve Loughran
On 24 Oct 2016, at 19:34, Masood Krohy wrote: Hi everyone, Is there a way to set the IP address/hostname that the Spark Driver is going to be running on when launching a program through spark-submit in yarn-cluster mode (PySpark 1.6.0)? I do not

Getting the IP address of Spark Driver in yarn-cluster mode

2016-10-24 Thread Masood Krohy
Hi everyone, Is there a way to set the IP address/hostname that the Spark Driver is going to be running on when launching a program through spark-submit in yarn-cluster mode (PySpark 1.6.0)? I do not see an option for this. If not, is there a way to get this IP address after the Spark app has
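
Choosing the host up front is not supported in yarn-cluster mode (YARN places the ApplicationMaster), but once the app is up, the driver can read back the address it was given; a sketch of one approach:

import org.apache.spark.{SparkConf, SparkContext}

object DriverAddress {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("driver-address"))
    // Populated by Spark at startup with whichever node YARN chose.
    val host = sc.getConf.get("spark.driver.host")
    val port = sc.getConf.get("spark.driver.port")
    println(s"Driver is listening on $host:$port")
    sc.stop()
  }
}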

Spark driver memory breakdown

2016-08-26 Thread Mich Talebzadeh
Hi, I always underestimated the significance of setting spark.driver.memory. According to the documentation, it is the amount of memory to use for the driver process, i.e. where the SparkContext is initialized (e.g. 1g, 2g). I was running my application using Spark Standalone, so the argument about local mode
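
One caveat worth illustrating: spark.driver.memory sizes the driver JVM's heap, so it must reach the JVM before the driver starts (spark-submit --driver-memory or spark-defaults.conf); the value actually in force can be sanity-checked at runtime.

object DriverHeapCheck {
  def main(args: Array[String]): Unit = {
    // Setting spark.driver.memory on SparkConf after the JVM is already up
    // has no effect in client mode; this reads what the JVM really got.
    val maxHeapGb = Runtime.getRuntime.maxMemory / (1024.0 * 1024 * 1024)
    println(f"Driver max heap: $maxHeapGb%.2f GB")
  }
}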

Spark driver memory keeps growing

2016-08-08 Thread Pierre Villard
Hi, I'm running a job on Spark 1.5.2 and I get an OutOfMemoryError on broadcast variable access. The thing is, I am not sure I understand why the broadcast keeps growing and why it does so at this place in the code. Basically, I have a large input file, each line having a key. I group my lines by key to h
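
When broadcasts are rebuilt across iterations, explicitly releasing each one keeps them from accumulating on the driver and executors; a minimal sketch with made-up data, assuming a Spark version where Broadcast.destroy() is available:

import org.apache.spark.{SparkConf, SparkContext}

object BroadcastHygiene {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("broadcast-hygiene"))
    val data = sc.parallelize(1 to 1000)

    for (i <- 1 to 5) {
      val lookup = sc.broadcast((1 to 100).map(k => k -> k * i).toMap)
      val hits = data.filter(x => lookup.value.contains(x % 100 + 1)).count()
      println(s"iteration $i: $hits")
      // Drop this iteration's broadcast blocks everywhere before the next one.
      lookup.destroy()
    }
    sc.stop()
  }
}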

Re: Spark driver getting out of memory

2016-07-24 Thread Raghava Mutharaju
Saurav, We have the same issue. Our application runs fine on 32 nodes with 4 cores each and 256 partitions but gives an OOM on the driver when run on 64 nodes with 512 partitions. Did you get to know the reason behind this behavior or the relation between number of partitions and driver RAM usage?

Re: Spark driver getting out of memory

2016-07-20 Thread RK Aduri
Cache defaults to MEMORY_ONLY. Can you try different storage levels, i.e., MEMORY_ONLY_SER or even DISK_ONLY? You may want to use persist() instead of cache(). Or there is an experimental storage level OFF_HEAP which might also help. On Tue, Jul 19, 2016 at 11:08 PM, Saurav Sinha wrote: H
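
A sketch of that suggestion: MEMORY_ONLY_SER stores serialized bytes (more CPU, less heap), and DISK_ONLY avoids the heap entirely. The input path is illustrative.

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.storage.StorageLevel

object PersistLevels {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("persist-levels"))
    val rdd = sc.textFile("hdfs:///data/input") // illustrative path

    // cache() is shorthand for persist(StorageLevel.MEMORY_ONLY);
    // the _SER level keeps compact serialized bytes instead of Java objects.
    rdd.persist(StorageLevel.MEMORY_ONLY_SER)
    // Alternative when memory is the bottleneck (pick one level per RDD):
    // rdd.persist(StorageLevel.DISK_ONLY)
    println(rdd.count())
    sc.stop()
  }
}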

Re: Spark driver getting out of memory

2016-07-19 Thread Saurav Sinha
Hi, I have set driver memory to 10 GB and the job ran with intermediate failures which were recovered by spark. But I still want to know: if the number of partitions increases, does driver RAM need to be increased, and what is the ratio of number of partitions to RAM? @RK: I am using cache on the RDD. Is this the reason for high RAM utilization

Re: Spark driver getting out of memory

2016-07-19 Thread RK Aduri
Just want to see if this helps. Are you doing heavy collects and persisting them? If so, you might want to parallelize that collection by converting it to an RDD. Thanks, RK. On Tue, Jul 19, 2016 at 12:09 AM, Saurav Sinha wrote: Hi Mich, 1. In what mode are you running the spark stan

Re: Spark driver getting out of memory

2016-07-19 Thread Saurav Sinha
Hi Mich, 1. In what mode are you running: spark standalone, yarn-client, yarn-cluster, etc.? Ans: spark standalone. 2. You have 4 nodes with each executor having 10G. How many actual executors do you see in the UI (port 4040 by default)? Ans: There are 4 executors, on which I am using 8 cores

Re: Spark driver getting out of memory

2016-07-18 Thread Mich Talebzadeh
Can you please clarify: 1. In what mode are you running: spark standalone, yarn-client, yarn-cluster, etc.? 2. You have 4 nodes with each executor having 10G. How many actual executors do you see in the UI (port 4040 by default)? 3. What is master memory? Are you referring to driver memo

Re: Spark driver getting out of memory

2016-07-18 Thread Saurav Sinha
I have set --driver-memory 5g. I need to understand whether, as the number of partitions increases, driver memory needs to be increased, and what the best ratio of number of partitions to driver memory is. On Mon, Jul 18, 2016 at 4:07 PM, Zhiliang Zhu wrote: try to set --driver-memory xg, x as large as can be set

Re: Spark driver getting out of memory

2016-07-18 Thread Zhiliang Zhu
Try to set --driver-memory xg, x as large as can be set. On Monday, July 18, 2016 6:31 PM, Saurav Sinha wrote: Hi, I am running a spark job. Master memory - 5G, executor memory 10G (running on 4 nodes). My job is getting killed as the number of partitions increases to 20K. 16/07/18 14:53:13 I

Spark driver getting out of memory

2016-07-18 Thread Saurav Sinha
Hi, I am running a spark job. Master memory - 5G, executor memory 10G (running on 4 nodes). My job is getting killed as the number of partitions increases to 20K. 16/07/18 14:53:13 INFO DAGScheduler: Got job 17 (foreachPartition at WriteToKafka.java:45) with 13524 output partitions (allowLocal=false) 16/07/18
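
Each of those 13524 output partitions costs driver-side bookkeeping (task metadata, scheduler events), so besides raising --driver-memory, one mitigation is shrinking the partition count before the final write; a sketch with an illustrative path and a stand-in for the thread's WriteToKafka step:

import org.apache.spark.{SparkConf, SparkContext}

object CoalesceBeforeWrite {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("coalesce-before-write"))
    val wide = sc.textFile("hdfs:///data/wide-input") // illustrative path

    // coalesce(n) narrows to n partitions without a shuffle, cutting the
    // number of tasks (and per-task driver bookkeeping) in the final stage.
    wide.coalesce(512).foreachPartition { records =>
      records.foreach(record => ()) // stand-in for the per-partition Kafka write
    }
    sc.stop()
  }
}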

Logs of spark driver in yarn-client mode.

2016-07-06 Thread Egor Pahomov
Hi, I have the following issue: I have zeppelin, which is set up in yarn-client mode. A notebook has been in the Running state for a long period of time with 0% done, and I do not see even an accepted application in yarn. To be able to understand what's going on, I need the logs of the spark driver, which is trying to conne

Re: Spark driver assigning splits to incorrect workers

2016-07-04 Thread Raajen Patel
Hi Ted, Perhaps this might help? Thanks for your response. I am trying to access/read binary files stored over a series of servers. Line used to build RDD: val BIN_pairRDD: RDD[(BIN_Key, BIN_Value)] = spark.newAPIHadoopFile("not.used", classOf[BIN_InputFormat], classOf[BIN_Key], classOf[BIN_Valu

Re: Spark driver assigning splits to incorrect workers

2016-07-01 Thread Ted Yu
se I can check, or change, to force the driver to send these tasks to the right workers? Thanks!

Spark driver assigning splits to incorrect workers

2016-07-01 Thread Raajen
can check, or change, to force the driver to send these tasks to the right workers? Thanks! View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Spark-driver-assigning-splits-to-incorrect-workers-tp27261.html

Re: Set the node the spark driver will be started

2016-06-30 Thread Felix Massem
opening of any attached files and the unauthorized forwarding of this e-mail are not permitted. On 28.06.2016 at 17:04 Mich Talebzadeh wrote: Hi F

Re: Set the node the spark driver will be started

2016-06-29 Thread Bryan Cutler
any attached files immediately. The unauthorized copying, use or opening of any attached files and the unauthorized forwarding of this e-mail are not permitted. On 28.06.2016 at 17:04 Mich Talebzadeh wrote:

Re: Set the node the spark driver will be started

2016-06-29 Thread Mich Talebzadeh
view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw http://talebzadehmich.wordpress.com Disclaimer: Use it at your own risk. Any and all responsibility for

Re: Set the node the spark driver will be started

2016-06-29 Thread Felix Massem
HTH, Dr Mich Talebzadeh. LinkedIn: https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw

Re: Set the node the spark driver will be started

2016-06-29 Thread Felix Massem

Re: Set the node the spark driver will be started

2016-06-28 Thread Mich Talebzadeh

Re: Set the node the spark driver will be started

2016-06-28 Thread Felix Massem
On 28 June 2016 at 15:27, adaman79 wrote: Hey guys,

Re: Set the node the spark driver will be started

2016-06-28 Thread Mich Talebzadeh
On 28 June 2016 at 15:27, adaman79 wrote: Hey guys, I have a problem with memory because over 90% of the time my spark driver will be started on one of my nine spark nodes. So now I am looking for the poss

Set the node the spark driver will be started

2016-06-28 Thread adaman79
Hey guys, I have a problem with memory because over 90% of the time my spark driver will be started on one of my nine spark nodes. So now I am looking for a way to define the node the spark driver will be started on when using spark-submit, or to set it somewhere in the code. Is this possible

RE: How to add a custom jar file to the Spark driver?

2016-03-09 Thread Gerhard Fiedler
/create-cluster.html) doesn’t have a similar argument. Gerhard From: Sonal Goyal [mailto:sonalgoy...@gmail.com] Sent: Wed, Mar 09, 2016 04:28 To: Wang, Daoyuan Cc: Gerhard Fiedler; user@spark.apache.org Subject: Re: How to add a custom jar file to the Spark driver? Hi Gerhard, I just stumbled upon

Re: How to add a custom jar file to the Spark driver?

2016-03-09 Thread Sonal Goyal
March 09, 2016 5:41 AM; To: user@spark.apache.org; Subject: How to add a custom jar file to the Spark driver? We're running Spark 1.6.0 on EMR, in YARN client mode. We run Python code, but we want to add a custom jar file to the driver. When runni

RE: How to add a custom jar file to the Spark driver?

2016-03-08 Thread Wang, Daoyuan
updated SparkConf to instantiate your SparkContext. Thanks, Daoyuan From: Gerhard Fiedler [mailto:gfied...@algebraixdata.com] Sent: Wednesday, March 09, 2016 5:41 AM To: user@spark.apache.org Subject: How to add a custom jar file to the Spark driver? We're running Spark 1.6.0 on EMR, in YARN c

How to add a custom jar file to the Spark driver?

2016-03-08 Thread Gerhard Fiedler
We're running Spark 1.6.0 on EMR, in YARN client mode. We run Python code, but we want to add a custom jar file to the driver. When running on a local one-node standalone cluster, we just use spark.driver.extraClassPath and everything works: spark-submit --conf spark.driver.extraClassPath=/path
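
The asymmetry this thread runs into, sketched below: in yarn-client mode the driver JVM exists before SparkConf is read, so spark.driver.extraClassPath only works when passed at submit time, while the executor-side key can still be set in code. Paths are illustrative.

import org.apache.spark.{SparkConf, SparkContext}

object ExtraClassPathDemo {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("extra-classpath-demo")
      // Effective: executors launch after this conf is built.
      .set("spark.executor.extraClassPath", "/opt/custom/custom.jar") // illustrative path
    // NOT effective for a client-mode driver, whose JVM already started.
    // Pass it at submit time instead:
    //   spark-submit --conf spark.driver.extraClassPath=/opt/custom/custom.jar ...
    //   (or equivalently --driver-class-path /opt/custom/custom.jar)
    val sc = new SparkContext(conf)
    println(sc.applicationId)
    sc.stop()
  }
}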

Re: spark driver in docker

2016-03-05 Thread Timothy Chen
what ports need to be exposed. With mesos we had a lot of problems with container networking, but yes, --net=host is a shortcut. Tamas. On 4 March 2016 at 22:37, yanlin wang wrote: We would like to run multiple spark d

Re: spark driver in docker

2016-03-05 Thread Mailing List
exposed. With mesos we had a lot of problems with container networking, but yes, --net=host is a shortcut. Tamas. On 4 March 2016 at 22:37, yanlin wang wrote: We would like to run multiple spark driver in docker container. Any sugg

Re: spark driver in docker

2016-03-05 Thread Tamas Szuromi
multiple spark drivers in docker containers. Any suggestion for the port exposure and network settings for docker so the driver is reachable by the worker nodes? --net=host is the last thing we want to do. Thx, Yanlin

spark driver in docker

2016-03-04 Thread yanlin wang
We would like to run multiple spark drivers in docker containers. Any suggestion for the port exposure and network settings for docker so the driver is reachable by the worker nodes? --net=host is the last thing we want to do. Thx, Yanlin
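
One alternative to --net=host is pinning the driver's ports and advertising an address the workers can reach; a sketch under the assumption of Spark 2.1+ (which added spark.driver.bindAddress), with illustrative addresses and ports:

import org.apache.spark.{SparkConf, SparkContext}

object DockerizedDriver {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("dockerized-driver")
      .set("spark.driver.bindAddress", "0.0.0.0")    // listen inside the container
      .set("spark.driver.host", "10.0.0.5")          // illustrative: docker host IP reachable by workers
      .set("spark.driver.port", "7078")              // fixed so it can be published
      .set("spark.driver.blockManager.port", "7079") // fixed for the same reason
    // e.g. docker run -p 7078:7078 -p 7079:7079 ... publishes both ports.
    val sc = new SparkContext(conf)
    sc.parallelize(1 to 10).count()
    sc.stop()
  }
}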

Re: Error getting response from spark driver rest APIs : java.lang.IncompatibleClassChangeError: Implementing class

2015-12-26 Thread Hokam Singh Chauhan
com.sun.jersey.api.core.ScanningResourceConfig.init(ScanningResourceConfig.java:79) at com.sun.jersey.api.core.PackagesResourceConfig.init(PackagesResourceConfig.java:104) at com.sun.jersey.api.core.PackagesResourceConfig.<init>(PackagesResourceConfig.java:78)

Error getting response from spark driver rest APIs : java.lang.IncompatibleClassChangeError: Implementing class

2015-12-16 Thread ihavethepotential
Please help. Thanks in advance. Regards, Rakesh. View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Error-getting-response-from-spark-driver-rest-APIs-java-lang-IncompatibleClassChangeError-Implementis-tp25724.html
