[Spark on k8s] An issue with k8s resource creation order

2024-05-29 Thread Tao Yang
Hi, team! I have a Spark on k8s issue which I posted at https://stackoverflow.com/questions/78537132/spark-on-k8s-resource-creation-order. Need help!

Re: Log file location in Spark on K8s

2023-10-09 Thread Prashant Sharma
Hi Sanket, Driver and executor logs are written to stdout by default; this can be configured using the SPARK_HOME/conf/log4j.properties file. That file, along with the entire SPARK_HOME/conf directory, is automatically propagated to all driver and executor containers and mounted as a volume. Thanks On Mon, 9 Oct, 2023, 5:37
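To make the pointer above concrete, here is a minimal `log4j.properties` sketch for `SPARK_HOME/conf` that sends everything to stdout (assumes Spark ≤3.2, which uses log4j 1.x; Spark 3.3+ reads `log4j2.properties` instead):

```properties
# Route all driver/executor logging to stdout so a node-level collector
# (e.g. a fluent-bit or filebeat DaemonSet) can pick it up
log4j.rootCategory=INFO, console
log4j.appender.console=org.apache.log4j.ConsoleAppender
log4j.appender.console.target=System.out
log4j.appender.console.layout=org.apache.log4j.PatternLayout
log4j.appender.console.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c{1}: %m%n
```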

Log file location in Spark on K8s

2023-10-09 Thread Agrawal, Sanket
Hi All, We are trying to ship the Spark logs using fluent-bit. We validated that fluent-bit is able to move logs of all other pods except the driver/executor pods. It would be great if someone could guide us on where to look for Spark logs in Spark on Kubernetes with client/cluster mode deplo

Two new tickets for Spark on K8s

2023-08-26 Thread Mich Talebzadeh
Hi, @Holden Karau recently created two Jiras that deal with two items of interest, namely: 1. Improve Spark Driver Launch Time (SPARK-44950) 2. Improve Spark Dynamic Allocation (SPARK-44951)

Re: The performance difference when running Apache Spark on K8s and traditional server

2023-07-27 Thread Mich Talebzadeh
Google Dataproc which I mentioned. With regard to your questions: Q1: What are the causes and reasons for Spark on K8s to be slower than Serverful? --> It should be noted that Spark on Kubernetes is a work in progress and, as of now, there is future work outstanding. It is not at parity with Spark

The performance difference when running Apache Spark on K8s and traditional server

2023-07-27 Thread Trường Trần Phan An
Hi all, I am learning about the performance difference of Spark when performing a JOIN on Serverless (K8s) and Serverful (traditional server) environments. Through experiments, Spark on K8s tends to run slower than Serverful. Through understanding the architecture, I know that Spark runs

Re: spark on k8s daemonset collect log

2023-03-14 Thread Cheng Pan
Filebeat supports multiline matching; here is an example [1]. BTW, I'm working on an External Log Service integration [2]; it may be useful in your case, feel free to review/leave comments. [1] https://www.elastic.co/guide/en/beats/filebeat/current/multiline-examples.html#multiline [2] https://github

spark on k8s daemonset collect log

2023-03-14 Thread 404
Hi all, Spark runs on k8s and uses a filebeat DaemonSet to collect logs and write them to Elasticsearch. The docker logs are in JSON format, and each line is a JSON string. How can we merge multi-line exceptions?
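Independent of filebeat's built-in multiline support, the merging logic itself is small. A minimal Python sketch, assuming each docker json-file line carries the text in a `log` field and that a new record starts with a Spark-style `yy/MM/dd HH:mm:ss` timestamp (both assumptions; adjust the pattern to your log format):

```python
import json
import re

# Assumption: a new logical record starts with a timestamp like "21/03/15 07:40:19".
# Stack-trace lines ("at ...", "Caused by: ...", indented lines) will not match
# and are therefore folded into the preceding record.
NEW_RECORD = re.compile(r"^\d{2}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}")

def merge_multiline(docker_json_lines):
    """Merge docker json-file log lines into logical multi-line records."""
    records = []
    for raw in docker_json_lines:
        line = json.loads(raw)["log"]
        if records and not NEW_RECORD.match(line):
            records[-1] += line  # continuation of the previous record
        else:
            records.append(line)  # start of a new record
    return records
```

This mirrors what filebeat's `multiline.pattern`/`negate`/`match` settings do declaratively; the regex is the only format-specific piece.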

RE: [EXTERNAL] Re: Spark on K8s - repeating annoying exception

2022-05-15 Thread Shay Elbaz
Hi Martin, Thanks for the help :) I tried setting those keys to a high value but the error persists every 90 seconds. Shay From: Martin Grigorov Sent: Friday, May 13, 2022 4:15 PM To: Shay Elbaz Cc: user@spark.apache.org Subject: [EXTERNAL] Re: Spark on K8s - repeating annoying exception

Re: Spark on K8s - repeating annoying exception

2022-05-13 Thread Martin Grigorov
Hi, On Mon, May 9, 2022 at 5:57 PM Shay Elbaz wrote: > Hi all, > > I apologize for reposting this from Stack Overflow, but it got very little attention and no comments. > > I'm using a Spark 3.2.1 image that was built from the official distribution via `docker-image-tool.sh`, on Kubern

Spark on K8s - repeating annoying exception

2022-05-09 Thread Shay Elbaz
Hi all, I apologize for reposting this from Stack Overflow, but it got very little attention and no comments. I'm using a Spark 3.2.1 image that was built from the official distribution via `docker-image-tool.sh`, on a Kubernetes 1.18 cluster. Everything works fine, except for this error message on

Re: Spark on K8s , some applications ended ungracefully

2022-03-31 Thread Martin Grigorov
Hi, On Thu, Mar 31, 2022 at 4:18 PM Pralabh Kumar wrote: > Hi Spark Team > > Some of my Spark applications on K8s ended with the below error. These applications, though completed successfully (as per the SparkListenerApplicationEnd event at the end of the event log), > still have event files with .inp

Spark on K8s , some applications ended ungracefully

2022-03-31 Thread Pralabh Kumar
Hi Spark Team Some of my Spark applications on K8s ended with the below error. These applications, though completed successfully (as per the SparkListenerApplicationEnd event at the end of the event log), still have event files with a .inprogress suffix. This causes the applications to be shown as in-progress in SHS.

Spark on k8s issues with s3a committer dependencies or config?

2022-03-19 Thread Prasad Paravatha
Hi all, I am trying out Spark 3.2.1 on k8s using Hadoop 3.3.1. Running into issues with writing to an s3 bucket using TemporaryAWSCredentialsProvider https://hadoop.apache.org/docs/stable/hadoop-aws/tools/hadoop-aws/index.html#Using_Session_Credentials_with_TemporaryAWSCredentialsProvider While readi

Skip single integration test case in Spark on K8s

2022-03-16 Thread Pralabh Kumar
Hi Spark team I am running Spark kubernetes integration test suite on cloud. build/mvn install \ -f pom.xml \ -pl resource-managers/kubernetes/integration-tests -am -Pscala-2.12 -Phadoop-3.1.1 -Phive -Phive-thriftserver -Pyarn -Pkubernetes -Pkubernetes-integration-tests \ -Djava.version=8 \

Re: Spark on K8s : property similar to yarn.max.application.attempt

2022-02-04 Thread Mich Talebzadeh
Pralabh Kumar wrote: > Hi Spark Team > > I am running Spark on K8s and looking for a > property/mechanism similar to yarn.max.application.attempt. I know this > is not really a Spark question, but I thought someone might have faced a > similar issue. > > Basically I want if

Spark on K8s : property similar to yarn.max.application.attempt

2022-02-04 Thread Pralabh Kumar
Hi Spark Team I am running Spark on K8s and looking for a property/mechanism similar to yarn.max.application.attempt. I know this is not really a Spark question, but I thought someone might have faced a similar issue. Basically I want that if my driver pod fails, it should be retried on a different

Re: Spark on k8s : spark 3.0.1 spark.kubernetes.executor.deleteOnTermination issue

2022-01-18 Thread Pralabh Kumar
Does the property spark.kubernetes.executor.deleteOnTermination check whether the executor being deleted has shuffle data or not? On Tue, 18 Jan 2022, 11:20 Pralabh Kumar, wrote: > Hi spark team > > We have the cluster-wide property spark.kubernetes.executor.deleteOnTermination > set to true. > Duri

Spark on k8s : spark 3.0.1 spark.kubernetes.executor.deleteOnTermination issue

2022-01-17 Thread Pralabh Kumar
Hi spark team We have the cluster-wide property spark.kubernetes.executor.deleteOnTermination set to true. During a long-running job, some of the executors that held shuffle data got deleted. Because of this, in the subsequent stage we get a lot of shuffle fetch failed exceptions. Please let me know

Spark on k8s

2021-10-13 Thread Mich Talebzadeh
I have made some observations on running Spark on Kubernetes (AKA k8s). It works on the basis of the "one-container-per-Pod" model, meaning that for each node of the cluster you will have one node running the driver and each remaining no

Re: Benchmarks on Spark running on Yarn versus Spark on K8s

2021-07-23 Thread Mich Talebzadeh

Re: Benchmarks on Spark running on Yarn versus Spark on K8s

2021-07-23 Thread Julien Laurenceau

Re: Benchmarks on Spark running on Yarn versus Spark on K8s

2021-07-06 Thread Mich Talebzadeh
On Mon, 5 Jul 2021 at 20:27, Madaditya .Maddy wrote: > I came across an article that benchmarked spark on k8s vs yarn by Datamechanics. > Link : https://

Re: Benchmarks on Spark running on Yarn versus Spark on K8s

2021-07-05 Thread Mich Talebzadeh
> Original message > On 5 July 2021, 21:27, Madaditya .Maddy wrote: > I came across an article that benchmarked spark on k8s vs yarn by Datamechanics. > Link : https://www.datamechanics.co/blog-post/apache-spark

Re: Benchmarks on Spark running on Yarn versus Spark on K8s

2021-07-05 Thread Christian Pfarr
Original message: On 5 July 2021, 21:27, Madaditya .Maddy wrote: > I came across an article that benchmarked spark on k8s vs yarn by Datamechanics. > Link : https://www.datamechanics.co/blog-post/apache-spark-performance-benchmarks-show-k

Re: Benchmarks on Spark running on Yarn versus Spark on K8s

2021-07-05 Thread Mich Talebzadeh
On Mon, 5 Jul 2021 at 20:27, Madaditya .Maddy wrote: > I came across an article that benchmarked s

Re: Benchmarks on Spark running on Yarn versus Spark on K8s

2021-07-05 Thread Madaditya .Maddy
I came across an article that benchmarked spark on k8s vs yarn by Datamechanics. Link : https://www.datamechanics.co/blog-post/apache-spark-performance-benchmarks-show-kubernetes-has-caught-up-with-yarn -Regards Aditya On Mon, Jul 5, 2021, 23:49 Mich Talebzadeh wrote: > Thanks Yuri. Those

Re: Benchmarks on Spark running on Yarn versus Spark on K8s

2021-07-05 Thread Mich Talebzadeh
Thanks Yuri. Those are very valid points. Let me clarify my point. Let us assume that we will be using Yarn versus K8s for the same job. Spark-submit will use Yarn in the first instance and will then switch to using k8s for the same task. 1. Have there been such benchmarks? 2. When should I

Re: Benchmarks on Spark running on Yarn versus Spark on K8s

2021-07-05 Thread Yuri Oleynikov (‫יורי אולייניקוב‬‎)
Not a big expert on Spark, but I don't really understand how you are going to compare, and what? Reading/writing to and from HDFS? How is that related to YARN and k8s? These are resource managers (YARN = Yet Another Resource Negotiator): what and how much to allocate, and when (CPU, RAM). Local Disk

Benchmarks on Spark running on Yarn versus Spark on K8s

2021-07-05 Thread Mich Talebzadeh
I was curious to know if there are benchmarks around on comparison between Spark on Yarn compared to Kubernetes. This question arose because traditionally in Google Cloud we have been using Spark on Dataproc clusters. Dataproc provides Spark, Hadoop plus other

Re: spark on k8s driver pod exception

2021-03-15 Thread Attila Zsolt Piros
Sure, that is expected; see the "How it works" section of the "Running Spark on Kubernetes" page, quote: When the application completes, the executor pods terminate and are cleaned up, but the driver pod persists logs and

Re: spark on k8s driver pod exception

2021-03-15 Thread 040840219
When the driver pod throws an exception, is the driver pod still running? kubectl logs wordcount-e3141c7834d3dd68-driver 21/03/15 07:40:19 DEBUG Analyzer$ResolveReferences: Resolving 'value1 to 'value1 Exception in thread "main" org.apache.spark.sql.AnalysisException: cannot resolve '`value1`' given

Re: spark on k8s driver pod exception

2021-03-11 Thread Attila Zsolt Piros
> but the spark-submit log still running Set the "spark.kubernetes.submission.waitAppCompletion" config to false to change that. As the doc says: "spark.kubernetes.submission.waitAppCompletion" : In cluster mode, whether to wait for the application to finish before exiting the launcher process.
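For reference, a sketch of how that config is passed at submit time (the master URL, image, and jar path are placeholders):

```shell
spark-submit \
  --master k8s://https://<api-server>:6443 \
  --deploy-mode cluster \
  --conf spark.kubernetes.submission.waitAppCompletion=false \
  local:///opt/spark/examples/jars/spark-examples.jar
```

With this set to false, spark-submit returns as soon as the driver pod is created; the application's fate is then checked with `kubectl logs` / `kubectl get pods` rather than from the launcher process.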

Re: spark on k8s driver pod exception

2021-03-11 Thread Attila Zsolt Piros
For getting the logs please read Accessing Logs part of the *Running Spark on Kubernetes* page. For stopping and generic management of the spark application please read the Spark Application Management

spark on k8s driver pod exception

2021-03-11 Thread yxl040840219
When running the code on k8s, the driver pod throws an AnalysisException, but spark-submit keeps running. How can I get the exception and stop the pods? val spark = SparkSession.builder().getOrCreate() import spark.implicits._ val df = (0 until 10).toDF("id").selectExpr("id %

Re: Error while running Spark on K8s

2021-01-04 Thread Prashant Sharma
Can you please paste the full exception trace, and mention spark and k8s version? On Mon, Jan 4, 2021 at 6:19 PM Sachit Murarka wrote: > Hi Prashant > > Thanks for the response! > > I created the service account with the permissions and following is the > command: > > spark-submit --deploy-mode

Re: Error while running Spark on K8s

2021-01-04 Thread Sachit Murarka
Hi Prashant Thanks for the response! I created the service account with the permissions and following is the command: spark-submit --deploy-mode cluster --master k8s://http://ip:port --name "sachit" --conf spark.kubernetes.pyspark.pythonVersion=3 --conf spark.kubernetes.namespace=spark-test -

Re: Error while running Spark on K8s

2021-01-04 Thread Prashant Sharma
Hi Sachit, Can you give more details on how did you run? i.e. spark submit command. My guess is, a service account with sufficient privilege is not provided. Please see: http://spark.apache.org/docs/latest/running-on-kubernetes.html#rbac Thanks, On Mon, Jan 4, 2021 at 5:27 PM Sachit Murarka wro

Error while running Spark on K8s

2021-01-04 Thread Sachit Murarka
Hi All, I am getting the below error when I am trying to run the spark job on Kubernetes, I am running it in cluster mode. Exception in thread "main" io.fabric8.kubernetes.client.KubernetesClientException: Operation: [create] for kind: [Pod] with name: [null] in namespace: [spark-test] failed. at

Re: No matter how many instances and cores configured for spark on k8s, only one executor is reading file

2020-12-21 Thread 沈俊
Hi, finally I found a configuration parameter: spark.default.parallelism. Changing this parameter changes the number of executors running in parallel, although the log file still says "first 15 tasks" etc. Anyway, my problem is solved.

Re: No matter how many instances and cores configured for spark on k8s, only one executor is reading file

2020-12-21 Thread Sean Owen
Pass more partitions to the second argument of parallelize()? On Mon, Dec 21, 2020 at 7:39 AM 沈俊 wrote: > Hi > > I am now trying to use spark to do tcpdump pcap file analysis. The first > step is to read the file and parse the content to dataframe according to > analysis requirements. > > I've
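Sean's suggestion is that `parallelize(data, numSlices)` controls how many partitions, and hence how many concurrent tasks, process the data. The slicing arithmetic is roughly the following pure-Python sketch (an illustration of the idea, not Spark's actual implementation); the same shape applies to splitting a large pcap file into byte ranges for parallel parsing:

```python
def split_ranges(total_size, num_partitions):
    """Split [0, total_size) into num_partitions contiguous, near-equal ranges.

    The first (total_size % num_partitions) ranges get one extra element,
    so sizes differ by at most one.
    """
    base, extra = divmod(total_size, num_partitions)
    ranges, start = [], 0
    for i in range(num_partitions):
        end = start + base + (1 if i < extra else 0)
        ranges.append((start, end))
        start = end
    return ranges
```

With only one partition, one executor does all the work no matter how many executor instances are configured; raising the partition count (or `spark.default.parallelism`, as found above) is what lets the cluster fan out.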

No matter how many instances and cores configured for spark on k8s, only one executor is reading file

2020-12-21 Thread 沈俊
Hi, I am now trying to use Spark to do tcpdump pcap file analysis. The first step is to read the file and parse the content into a dataframe according to the analysis requirements. I've made a public folder for all executors so that they can access it directly like a local file system. Here is the

Need suggestions for Spark on K8S: RPC Encryption

2020-11-05 Thread Xuan Gong
Hello, Spark experts: I am trying to figure out how to encrypt traffic when using Spark on k8s. From the Spark security doc, I learned how to do RPC encryption between the Spark driver and executors. But I do not understand how to do it between the Spark driver and the K8s API Server. (and/ma
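For the driver-to-executor leg, the standard settings are sketched below (the secret value is a placeholder; recent Spark versions can auto-generate a per-app secret on k8s, so whether it must be set manually depends on your version). Driver-to-API-server traffic is a separate concern handled by the Kubernetes client's TLS, not by these settings.

```properties
spark.authenticate              true
spark.network.crypto.enabled    true
spark.authenticate.secret       <shared-secret>
```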

Re: spark on k8s - can driver and executor have separate checkpoint location?

2020-05-16 Thread Ali Gouta
Hello, I am wondering: if you do so, then all your executor pods should run on the same Kubernetes worker node, since you mount a single volume with a ReadWriteOnce policy. By design this seems not good, I assume. You may need a kind of ReadWriteMany policy associated with the volume. The
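A hypothetical PVC sketch of what the reply above suggests: an access mode that lets the driver and executors on different nodes mount the same checkpoint volume (the name, storage class, and size are placeholders; the storage class must actually support RWX, e.g. an NFS-backed one):

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: spark-checkpoint-pvc
spec:
  accessModes:
    - ReadWriteMany            # driver and executors on different nodes can all mount it
  storageClassName: nfs-client # placeholder: any RWX-capable storage class
  resources:
    requests:
      storage: 10Gi
```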

spark on k8s - can driver and executor have separate checkpoint location?

2020-05-15 Thread wzhan
Hi guys, I'm running spark applications on kubernetes. According to spark documentation https://spark.apache.org/docs/latest/streaming-programming-guide.html#checkpointing Spark needs distributed file system to store its checkpoint data so that in case of failure, it can recover from checkpoint di

RE: How to use spark-on-k8s pod template?

2019-11-12 Thread sora
Hi, I am using Spark 2.4.1 now. I can run Spark on k8s normally, but I want to apply some k8s features (e.g. pod tolerations) to pods via a pod template. Thanks. -- From: David Mitchell Sent: Saturday, 9 November 2019, 00:18 To: sora Cc: user Subject

Re: How to use spark-on-k8s pod template?

2019-11-08 Thread David Mitchell
ances=5 \ --conf spark.kubernetes.container.image= \ local:///path/to/examples.jar On Tue, Nov 5, 2019 at 6:37 AM sora wrote: > Hi all, > I am looking for the usage about the spark-on-k8s pod template. > I want to set some toleration rules for the driver and executor

How to use spark-on-k8s pod template?

2019-11-05 Thread sora
Hi all, I am looking for usage info about the spark-on-k8s pod template. I want to set some toleration rules for the driver and executor pods. I tried setting --conf spark.kubernetes.driver.podTemplateFile=/spark-pod-template.yaml but it didn't work. The driver pod started without the toleration
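A minimal template of that shape is sketched below (the toleration key/value/effect are placeholders). Note that pod template support (`spark.kubernetes.driver.podTemplateFile` and the executor counterpart) was only added in Spark 3.0, so on earlier versions such as 2.4.x the conf is silently ignored; that would explain the toleration not appearing.

```yaml
# driver-pod-template.yaml (key/value/effect are placeholders)
apiVersion: v1
kind: Pod
spec:
  tolerations:
    - key: "dedicated"
      operator: "Equal"
      value: "spark"
      effect: "NoSchedule"
```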

Spark on k8s: Mount config map in executor

2019-08-26 Thread Steven Stetzler
Hello everyone, I am wondering if there is a way to mount a Kubernetes ConfigMap into a directory in a Spark executor on Kubernetes. Poking around the docs, the only volume mounting options I can find are for a PVC, a directory on the host machine, and an empty volume. I am trying to pass in confi
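One workaround, assuming a Spark version with pod template support (3.0+), is to declare the ConfigMap volume in a template rather than through the `spark.kubernetes.*` volume confs, which as noted only cover PVC, hostPath, and emptyDir. A sketch; the ConfigMap name and mount path are placeholders, and the container name is assumed to match what Spark gives the executor container:

```yaml
# executor-pod-template.yaml
apiVersion: v1
kind: Pod
spec:
  containers:
    - name: spark-kubernetes-executor  # assumed: the container name Spark assigns to executors
      volumeMounts:
        - name: app-config
          mountPath: /etc/app-config
  volumes:
    - name: app-config
      configMap:
        name: my-app-configmap         # placeholder ConfigMap name
```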

Re: Spark on K8S - --packages not working for cluster mode?

2019-06-06 Thread pacuna
Great! Thanks a lot. Best, Pablo.

Re: Spark on K8S - --packages not working for cluster mode?

2019-06-06 Thread Stavros Kontopoulos
Hi, This has been fixed here: https://github.com/apache/spark/pull/23546. Will be available with Spark 3.0.0 Best, Stavros On Wed, Jun 5, 2019 at 11:18 PM pacuna wrote: > I'm trying to run a sample code that reads a file from s3 so I need the aws > sdk and aws hadoop dependencies. > If I assem

Spark on K8S - --packages not working for cluster mode?

2019-06-05 Thread pacuna
I'm trying to run a sample code that reads a file from s3 so I need the aws sdk and aws hadoop dependencies. If I assemble these deps into the main jar everything works fine. But when I try using --packages, the deps are not seen by the pods. This is my submit command: spark-submit --master k8s:

Re: Spark on k8s - map persistentStorage for data spilling

2019-03-01 Thread Tomasz Krol
wonder if your organization can consider modifying your Kubernetes setup to make your emptyDir volumes larger and faster? -Matt Cheah

Re: Spark on k8s - map persistentStorage for data spilling

2019-03-01 Thread Matt Cheah
Hi Matt, Thanks for coming back to me. Yeah, that doesn't work. Basically in th

Re: Spark on k8s - map persistentStorage for data spilling

2019-03-01 Thread Tomasz Krol
give that a try and let us know if that moves the spills as expected? -Matt Cheah

Re: Spark on k8s - map persistentStorage for data spilling

2019-02-28 Thread Matt Cheah
Hey Guys, I hope someone will be able to help me, as I've been stuck with this for a while :) Basically I am running some jobs on kubernetes as per the documentation https://spark.apache.org/docs/latest/running-on-

Spark on k8s - map persistentStorage for data spilling

2019-02-27 Thread Tomasz Krol
Hey Guys, I hope someone will be able to help me, as I've been stuck with this for a while :) Basically I am running some jobs on kubernetes as per the documentation https://spark.apache.org/docs/latest/running-on-kubernetes.html All works fine; however, if I run queries on a bigger data volume, then the jobs fa
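In later Spark versions (3.1+), spill/shuffle space can be redirected onto a mounted volume: a volume whose name starts with `spark-local-dir-` is used by Spark for local storage instead of emptyDir. A sketch of the relevant submit confs (the mount path and size are placeholders; `OnDemand` asks Spark to create a PVC per executor):

```properties
spark.kubernetes.executor.volumes.persistentVolumeClaim.spark-local-dir-1.mount.path=/data/spark-local
spark.kubernetes.executor.volumes.persistentVolumeClaim.spark-local-dir-1.options.claimName=OnDemand
spark.kubernetes.executor.volumes.persistentVolumeClaim.spark-local-dir-1.options.sizeLimit=100Gi
```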

[Spark on K8s] Scaling experiences sharing

2018-11-09 Thread Li Gao
Hi Spark Community, I am reaching out to see if there are current large-scale production or pre-production deployments of Spark on k8s for batch and micro-batch jobs. Large scale means running 100s of thousands of Spark jobs daily and 1000s of concurrent Spark jobs on a single k8s cluster and 10s of

Re: Spark on K8s resource staging server timeout

2018-03-29 Thread Jenna Hoole
official release we should probably create a mechanism that's more resilient. Using a single HTTP server isn't ideal; would ideally like something that's highly available, replicated, etc. -Matt Cheah

Re: Spark on K8s resource staging server timeout

2018-03-29 Thread Matt Cheah
I added overkill high timeouts to the OkHttpClient.Builder() in RetrofitClientFactory.scala and I don't seem to be timing out anymore. val okHttpClientBuilder = new OkHtt

Re: Spark on K8s resource staging server timeout

2018-03-29 Thread Jenna Hoole
CONDS) .writeTimeout(120, TimeUnit.SECONDS) .readTimeout(120, TimeUnit.SECONDS) -Jenna On Tue, Mar 27, 2018 at 10:48 AM, Jenna Hoole wrote: > So I'm running into an issue with my resource staging server that's > producing a stacktrace like Issue 342 > <https://github.co

Spark on K8s resource staging server timeout

2018-03-27 Thread Jenna Hoole
So I'm running into an issue with my resource staging server that's producing a stacktrace like Issue 342 <https://github.com/apache-spark-on-k8s/spark/issues/342>, but I don't think for the same reasons. What's happening is that every time after I start up a resource sta

Re: Spark on K8s - using files fetched by init-container?

2018-02-27 Thread Felix Cheung
Yes, you were pointing to HDFS on a loopback address...

Re: Spark on K8s - using files fetched by init-container?

2018-02-26 Thread Jenna Hoole
Oh, duh. I completely forgot that file:// is a prefix I can use. Up and running now :) Thank you so much! Jenna On Mon, Feb 26, 2018 at 1:00 PM, Yinan Li wrote: > OK, it looks like you will need to use > `file:///var/spark-data/spark-files/flights.csv` > instead. The 'file://' scheme must be e

Re: Spark on K8s - using files fetched by init-container?

2018-02-26 Thread Yinan Li
The files specified through --files are localized by the init-container to /var/spark-data/spark-files by default. So in your case, the file should be located at /var/spark-data/spark-files/flights.csv locally in the container. On Mon, Feb 26, 2018 at 10:51 AM, Jenna Hoole wrote: > This is proba
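Putting the answer above together, the pattern looks like the following sketch (paths and the elided flags are illustrative): ship the file with `--files`, then reference the init-container's localized copy with an explicit `file://` scheme:

```shell
spark-submit \
  --files hdfs://<namenode>:8020/data/flights.csv \
  ... \
  examples/src/main/r/data-manipulation.R \
  file:///var/spark-data/spark-files/flights.csv
```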

Spark on K8s - using files fetched by init-container?

2018-02-26 Thread Jenna Hoole
This is probably stupid user error, but I can't for the life of me figure out how to access the files that are staged by the init-container. I'm trying to run the SparkR example data-manipulation.R, which requires the path to its datafile. I supply the hdfs location via --files and then the full hd

Re: Spark on K8s with Romana

2018-02-12 Thread Yinan Li
We actually moved away from using the driver pod IP because of https://github.com/apache-spark-on-k8s/spark/issues/482. The current way this works is that the driver url is constructed based on the value of "spark.driver.host" that is set to the DNS name of the headless driver serv

Spark on K8s with Romana

2018-02-12 Thread Jenna Hoole
So, we've run into something interesting. In our case we've got some proprietary networking HW which is very feature-limited in the TCP/IP space, so with Romana, executors can't seem to find the driver using the hostname lookup method it's attempting. Is there any way to make it use IP? Thanks,