If you are running Spark on YARN, the spark-submit utility will download
the jar from S3 and copy it to HDFS for the YARN distributed cache. The
driver shares this location with the YARN NodeManagers via the
ContainerLaunchContext. The NodeManagers localize the jar and place it on
the container classpath before they launch the executor container.
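
For a concrete picture of that handoff, here is a minimal sketch of how a
jar already staged on HDFS gets described as a YARN LocalResource and
attached to the ContainerLaunchContext. This is not Spark's actual YARN
Client code, just the raw YARN API it builds on; the staging path and the
"app.jar" resource name are made up.

    import org.apache.hadoop.conf.Configuration
    import org.apache.hadoop.fs.{FileSystem, Path}
    import org.apache.hadoop.yarn.api.records._
    import org.apache.hadoop.yarn.util.{ConverterUtils, Records}

    val conf    = new Configuration()
    val fs      = FileSystem.get(conf)
    // Hypothetical staging location; Spark's YARN client chooses its own.
    val jarPath = new Path("hdfs:///user/someone/.sparkStaging/app_0001/app.jar")
    val status  = fs.getFileStatus(jarPath)

    // Describe the jar as a LocalResource. The NodeManager checks the size
    // and timestamp to validate its cached copy before reusing it.
    val jarRsrc = Records.newRecord(classOf[LocalResource])
    jarRsrc.setResource(ConverterUtils.getYarnUrlFromPath(jarPath))
    jarRsrc.setSize(status.getLen)
    jarRsrc.setTimestamp(status.getModificationTime)
    jarRsrc.setType(LocalResourceType.FILE)
    jarRsrc.setVisibility(LocalResourceVisibility.APPLICATION)

    // Attach it to the container launch context. Each NodeManager that
    // starts an executor container localizes it as "app.jar" in the
    // container's working directory, which is on the container classpath.
    val ctx = Records.newRecord(classOf[ContainerLaunchContext])
    ctx.setLocalResources(java.util.Collections.singletonMap("app.jar", jarRsrc))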

Henoc

On Fri, Aug 14, 2020, 6:19 AM Russell Spitzer <russell.spit...@gmail.com>
wrote:

> Looking back at the code
>
> All --jars arguments and such run through:
>
>
> https://github.com/apache/spark/blob/7f275ee5978e00ac514e25f5ef1d4e3331f8031b/core/src/main/scala/org/apache/spark/SparkContext.scala#L493-L500
>
> which calls:
>
>
> https://github.com/apache/spark/blob/7f275ee5978e00ac514e25f5ef1d4e3331f8031b/core/src/main/scala/org/apache/spark/SparkContext.scala#L1842
>
> which places local jars on the driver-hosted file server and leaves
> remote jars as-is, with their path passed along for the executors to
> access them directly.
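>
> In other words the dispatch is just on the URI scheme. A rough sketch of
> the idea in isolation (resolveJarUri and fileServerUri are made-up names,
> and the real addJar handles more cases, e.g. the local: scheme):
>
>     // Local jars are published on the driver's file server; remote URIs
>     // pass through untouched for the executors to fetch directly.
>     def resolveJarUri(path: String, fileServerUri: String): String = {
>       val uri = new java.net.URI(path)
>       uri.getScheme match {
>         case null | "file" =>
>           s"$fileServerUri/jars/${new java.io.File(uri.getPath).getName}"
>         case _ => path   // hdfs://, s3a://, http://, ...
>       }
>     }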
>
> On Thu, Aug 13, 2020 at 11:01 PM Russell Spitzer <
> russell.spit...@gmail.com> wrote:
>
>> The driver hosts a file server which the executors download the jar from.
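>>
>> On the executor side it is the standard dependency fetch: each executor
>> downloads any jar it has not seen yet into its work directory and adds
>> it to its classloader. A simplified sketch (plain HTTP here; Spark's own
>> fetch also speaks its spark:// file-server protocol and Hadoop
>> filesystems, and MutableClassLoader stands in for Spark's internal
>> MutableURLClassLoader):
>>
>>     import java.io.File
>>     import java.net.{URI, URL, URLClassLoader}
>>     import java.nio.file.{Files, StandardCopyOption}
>>
>>     // addURL is protected on URLClassLoader, so expose it.
>>     class MutableClassLoader(parent: ClassLoader)
>>         extends URLClassLoader(Array.empty[URL], parent) {
>>       def addJar(url: URL): Unit = addURL(url)
>>     }
>>
>>     // Download a jar into the executor work dir, then make its classes
>>     // visible to subsequently launched tasks.
>>     def addNewJar(jarUri: String, workDir: File,
>>                   loader: MutableClassLoader): Unit = {
>>       val name  = new File(new URI(jarUri).getPath).getName
>>       val local = new File(workDir, name)
>>       val in    = new URL(jarUri).openStream()
>>       try Files.copy(in, local.toPath, StandardCopyOption.REPLACE_EXISTING)
>>       finally in.close()
>>       loader.addJar(local.toURI.toURL)
>>     }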
>>
>> On Thu, Aug 13, 2020, 5:33 PM James Yu <ja...@ispot.tv> wrote:
>>
>>> Hi,
>>>
>>> When I spark-submit a Spark app with my app jar located in S3, obviously
>>> the Driver will download the jar from the S3 location.  What is not clear
>>> to me is: where do the Executors get the jar from?  From the same S3
>>> location, somehow from the Driver, or do they not need the jar at all?
>>>
>>> Thanks in advance for explanation.
>>>
>>> James
>>>
>>
