Where do the executors get my app jar from?

2020-08-13 Thread James Yu
Hi, When I spark-submit a Spark app with my app jar located in S3, obviously the Driver will download the jar from the S3 location. What is not clear to me is: where do the Executors get the jar from? From the same S3 location, or somehow from the Driver, or do they not need the jar? Thanks i

Re: Where do the executors get my app jar from?

2020-08-14 Thread James Yu
Henoc, OK. That is for YARN with HDFS. What happens when Kubernetes is the resource manager and there is no HDFS? James From: Henoc Sent: Thursday, August 13, 2020 10:45 PM To: James Yu Cc: user ; russell.spit...@gmail.com Subject: Re: Where do the executors

Poor performance caused by coalesce to 1

2021-02-03 Thread James Yu
Hi Team, We are running into a performance issue and are seeking your suggestions on how to improve it: We have a particular dataset which we aggregate from other datasets and would like to write out to one single file (because it is small enough). We found that after a series of transformations

Re: Poor performance caused by coalesce to 1

2021-02-03 Thread James Yu
rito Sent: Wednesday, February 3, 2021 11:05 AM To: James Yu ; user Subject: Re: Poor performance caused by coalesce to 1 Coalesce is reducing the parallelization of your last stage, in your case to 1 task. So, it’s natural it will give poor performance especially with large data. If you absol

Re: Performance Problems Migrating to S3A Committers

2021-08-05 Thread James Yu
See this ticket https://issues.apache.org/jira/browse/HADOOP-17201. It may help your team. From: Johnny Burns Sent: Tuesday, June 22, 2021 3:41 PM To: user@spark.apache.org Cc: data-orchestration-team Subject: Performance Problems Migrating to S3A Committers H

start-history-server.sh doesn't survive system reboot. Recommendation?

2021-12-07 Thread James Yu
Hi Users, We found that the history server launched with the "start-history-server.sh" command does not survive a system reboot. Any recommendations for keeping it up even after a reboot? Thanks, James

Re: start-history-server.sh doesn't survive system reboot. Recommendation?

2021-12-08 Thread James Yu
Sent: Tuesday, December 7, 2021 1:29 PM To: James Yu Cc: user @spark Subject: Re: start-history-server.sh doesn't survive system reboot. Recommendation? The scripts just launch the processes. To make any process restart on system restart, you would need to set it up as a system service

Re: start-history-server.sh doesn't survive system reboot. Recommendation?

2021-12-08 Thread James Yu
On Wed, 8 Dec 2021 at 19:45, James Yu <ja...@ispot.tv> wrote: Just thought about another possibility which is to containerize the history server and run the container with proper restart policy. This may be the approach we will

Re: Log4j 1.2.17 spark CVE

2021-12-13 Thread James Yu
Question: Spark uses log4j 1.2.17. If my application jar contains log4j 2.x and is submitted to the Spark cluster, which version of log4j actually gets used during the Spark session? From: Sean Owen Sent: Monday, December 13, 2021 8:25 AM To: Jörn Franke Cc: P

[k8s] Fail to expose custom port on executor container specified in my executor pod template

2023-06-26 Thread James Yu
Hi Team, I have had no luck trying to expose port 5005 (for remote debugging) on my executor container using the following pod template and Spark configuration: s3a://mybucket/pod-template-executor-debug.yaml apiVersion:

Re: Spark Connect, Master, and Workers

2023-09-01 Thread James Yu
Can I simply understand Spark Connect this way: The client process is now the Spark driver? From: Brian Huynh Sent: Thursday, August 10, 2023 10:15 PM To: Kezhi Xiong Cc: user@spark.apache.org Subject: Re: Spark Connect, Master, and Workers Hi Kezhi, Yes, you

Spark driver thread

2020-03-05 Thread James Yu
Hi, Does a Spark driver always run single-threaded? If yes, does that mean asking for more than one vCPU for the driver is wasteful? Thanks, James

Re: Spark driver thread

2020-03-06 Thread James Yu
Pol, thanks for your reply. Actually I am running Spark apps in CLUSTER mode. Is what you said still applicable in cluster mode? Thanks in advance for your further clarification. From: Pol Santamaria Sent: Friday, March 6, 2020 12:59 AM To: James Yu Cc: user