It doesn't hurt to have a bug tracking it, in case anyone else has time to look at it before I do.
On Mon, Jun 20, 2016 at 1:20 PM, Jonathan Kelly <jonathaka...@gmail.com> wrote:
> Thanks for the confirmation! Shall I cut a JIRA issue?
>
> On Mon, Jun 20, 2016 at 10:42 AM Marcelo Vanzin <van...@cloudera.com> wrote:
>>
>> I just tried this locally and can see the wrong behavior you mention.
>> I'm running a somewhat old build of 2.0, but I'll take a look.
>>
>> On Mon, Jun 20, 2016 at 7:04 AM, Jonathan Kelly <jonathaka...@gmail.com> wrote:
>> > Does anybody have any thoughts on this?
>> >
>> > On Fri, Jun 17, 2016 at 6:36 PM Jonathan Kelly <jonathaka...@gmail.com> wrote:
>> >>
>> >> I'm trying to debug a problem in Spark 2.0.0-SNAPSHOT (commit
>> >> bdf5fe4143e5a1a393d97d0030e76d35791ee248) where Spark's log4j.properties
>> >> is not getting picked up in the executor classpath (and driver classpath
>> >> for yarn-cluster mode), so Hadoop's log4j.properties file is taking
>> >> precedence in the YARN containers.
>> >>
>> >> Spark's log4j.properties file is correctly being bundled into the
>> >> __spark_conf__.zip file and getting added to the DistributedCache, but it
>> >> is not in the classpath of the executor, as evidenced by the following
>> >> command, which I ran in spark-shell:
>> >>
>> >> scala> sc.parallelize(Seq(1)).map(_ =>
>> >>   getClass().getResource("/log4j.properties")).first
>> >> res3: java.net.URL = file:/etc/hadoop/conf.empty/log4j.properties
>> >>
>> >> I then ran the following in spark-shell to verify the classpath of the
>> >> executors:
>> >>
>> >> scala> sc.parallelize(Seq(1)).map(_ =>
>> >>   System.getProperty("java.class.path")).flatMap(_.split(':')).filter(e =>
>> >>   !e.endsWith(".jar") && !e.endsWith("*")).collect.foreach(println)
>> >> ...
>> >> /mnt/yarn/usercache/hadoop/appcache/application_1466208403287_0003/container_1466208403287_0003_01_000003
>> >> /mnt/yarn/usercache/hadoop/appcache/application_1466208403287_0003/container_1466208403287_0003_01_000003/__spark_conf__
>> >> /etc/hadoop/conf
>> >> ...
>> >>
>> >> So the JVM has this nonexistent __spark_conf__ directory in the classpath
>> >> when it should really be __spark_conf__.zip (which is actually a symlink
>> >> to a directory, despite the .zip filename).
>> >>
>> >> % sudo ls -l /mnt/yarn/usercache/hadoop/appcache/application_1466208403287_0003/container_1466208403287_0003_01_000003
>> >> total 20
>> >> -rw-r--r-- 1 yarn yarn   88 Jun 18 01:26 container_tokens
>> >> -rwx------ 1 yarn yarn  594 Jun 18 01:26 default_container_executor_session.sh
>> >> -rwx------ 1 yarn yarn  648 Jun 18 01:26 default_container_executor.sh
>> >> -rwx------ 1 yarn yarn 4419 Jun 18 01:26 launch_container.sh
>> >> lrwxrwxrwx 1 yarn yarn   59 Jun 18 01:26 __spark_conf__.zip ->
>> >>   /mnt1/yarn/usercache/hadoop/filecache/17/__spark_conf__.zip
>> >> lrwxrwxrwx 1 yarn yarn   77 Jun 18 01:26 __spark_libs__ ->
>> >>   /mnt/yarn/usercache/hadoop/filecache/16/__spark_libs__4490748779530764463.zip
>> >> drwx--x--- 2 yarn yarn   46 Jun 18 01:26 tmp
>> >>
>> >> Does anybody know why this is happening? Is this a bug in Spark, or is it
>> >> the JVM doing this (possibly because the extension is .zip)?
>> >>
>> >> Thanks,
>> >> Jonathan
>>
>>
>> --
>> Marcelo

--
Marcelo
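For anyone trying to reproduce this, here is a minimal spark-shell sketch that combines the two checks from the thread into one step: it lists the non-jar entries on an executor's classpath and reports whether each one actually exists on disk. It assumes an active SparkContext named sc on a YARN cluster and a Unix ':' path separator; with the behavior described above, the container's __spark_conf__ entry should come back as missing, while __spark_conf__.zip is the name that was actually localized.

// Sketch only: for each non-jar classpath entry seen by an executor,
// print the entry and whether that path exists in the container directory.
sc.parallelize(Seq(1), 1).map { _ =>
  System.getProperty("java.class.path")
    .split(':')
    .filter(e => !e.endsWith(".jar") && !e.endsWith("*"))
    .map(e => s"$e exists=${new java.io.File(e).exists}")
}.first.foreach(println)

On an affected container this should show the .../__spark_conf__ entry with exists=false, matching the dangling classpath entry discussed above.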