Hello,

I'm currently exploring DCOS for the Spark Notebook, and while looking at
the Spark configuration I found something interesting that actually
converges with what we've discovered:
https://github.com/mesosphere/universe/blob/master/repo/packages/S/spark/0/marathon.json

So the logging works fine here because the spark package uses spark-class,
which is able to configure the log4j file. But the interesting part is
that the `uris` parameter is filled in with a downloadable path to the
log4j file!
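
For illustration, the relevant piece of such a Marathon app definition
looks roughly like this (the URL below is just a placeholder, not the one
in the actual package):

  "uris": [
    "http://example.com/conf/log4j.properties"
  ]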

However, that's not possible when creating the Spark context ourselves and
relying on the Mesos scheduler backend only, unless spark.executor.uri
(or another property) can take more than one downloadable path.
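
For context, this is roughly how we create the context today (the master
address and paths below are made up), and there is only the single
spark.executor.uri slot for a downloadable archive:

  import org.apache.spark.{SparkConf, SparkContext}

  // Minimal sketch: a SparkContext against a Mesos master, with the one
  // executor URI that Mesos fetches into the sandbox.
  val conf = new SparkConf()
    .setMaster("mesos://zk://master:2181/mesos")   // hypothetical master
    .setAppName("notebook")
    .set("spark.executor.uri",
         "hdfs:///dist/spark-1.3.1-bin-hadoop2.4.tgz")  // hypothetical path
  val sc = new SparkContext(conf)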

my 2¢

andy

On Fri, May 29, 2015 at 5:09 PM Gerard Maas <gerard.m...@gmail.com> wrote:

> Hi Tim,
>
> Thanks for the info. We (Andy Petrella and I) have been diving a
> bit deeper into this log config:
>
> The log line I was referring to is this one (sorry, I provided the others
> just for context)
>
> *Using Spark's default log4j profile:
> org/apache/spark/log4j-defaults.properties*
>
> That line comes from Logging.scala [1], where a default config is loaded if
> none is found on the classpath upon the startup of the Spark Mesos executor
> in the Mesos sandbox. At that point in time, none of the
> application-specific resources have been shipped yet as the executor JVM is
> just starting up.   To load a custom configuration file we should have it
> already on the sandbox before the executor JVM starts and add it to the
> classpath on the startup command. Is that correct?
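>
> Paraphrasing the relevant bit of [1] from memory (a sketch, not the exact
> source), the fallback looks roughly like this:
>
>   import org.apache.log4j.{LogManager, PropertyConfigurator}
>
>   val log4jInitialized = LogManager.getRootLogger.getAllAppenders.hasMoreElements
>   if (!log4jInitialized) {
>     // no appenders configured yet: load Spark's bundled defaults from the assembly
>     val defaultLogProps = "org/apache/spark/log4j-defaults.properties"
>     Option(getClass.getClassLoader.getResource(defaultLogProps)) match {
>       case Some(url) =>
>         PropertyConfigurator.configure(url)
>         System.err.println(s"Using Spark's default log4j profile: $defaultLogProps")
>       case None =>
>         System.err.println(s"Spark was unable to load $defaultLogProps")
>     }
>   }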
>
> For the classpath customization, it looks like it should be possible to
> pass a -Dlog4j.configuration property by using
> 'spark.executor.extraClassPath', which will be picked up at [2] and
> should be added to the command that starts the executor JVM, but the
> resource must be already on the host before we can do that. Therefore we
> also need some means of 'shipping' the log4j.configuration file to the
> allocated executor.
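>
> Concretely, we had something like this in mind (the values are made up,
> and using spark.executor.extraJavaOptions to pass the -D flag is our
> assumption, not something we've verified on Mesos yet):
>
>   import org.apache.spark.SparkConf
>
>   val conf = new SparkConf()
>     // directory that must already exist on the slave before the executor starts
>     .set("spark.executor.extraClassPath", "/etc/spark/conf")
>     // assumption: pass the log4j override as a JVM system property
>     .set("spark.executor.extraJavaOptions",
>          "-Dlog4j.configuration=file:/etc/spark/conf/log4j.properties")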
>
> This all boils down to your statement on the need to ship extra files
> to the sandbox. Bottom line: it's currently not possible to specify a
> log config file for your Mesos executor, and our executor logs grow by
> several GB/day.
>
> The only workaround I found so far is to open up the Spark assembly,
> replace the log4j-defaults.properties and pack it up again. That would
> work, although it's kind of rudimentary as we use the same assembly for
> many jobs. Accessing the log4j API programmatically should probably also
> work (I haven't tried that yet).
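>
> For the programmatic route, something as simple as this at the start of
> the job code might do (untested, plain log4j 1.x API; it would have to
> run inside the executor JVMs, not just on the driver, to affect the
> executor logs):
>
>   import org.apache.log4j.{Level, Logger}
>
>   // raise the log threshold from INFO to WARN for the noisiest packages
>   Logger.getRootLogger.setLevel(Level.WARN)
>   Logger.getLogger("org.apache.spark").setLevel(Level.WARN)
>   Logger.getLogger("org.apache.mesos").setLevel(Level.WARN)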
>
> Should we open a JIRA for this functionality?
>
> -kr, Gerard.
>
>
>
>
> [1]
> https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/Logging.scala#L128
> [2]
> https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/scheduler/cluster/mesos/MesosSchedulerBackend.scala#L77
>
> On Thu, May 28, 2015 at 7:50 PM, Tim Chen <t...@mesosphere.io> wrote:
>
>>
>> ---------- Forwarded message ----------
>> From: Tim Chen <t...@mesosphere.io>
>> Date: Thu, May 28, 2015 at 10:49 AM
>> Subject: Re: [Streaming] Configure executor logging on Mesos
>> To: Gerard Maas <gerard.m...@gmail.com>
>>
>>
>> Hi Gerard,
>>
>> The log line you referred to is not Spark logging but Mesos' own logging,
>> which uses glog.
>>
>> Our own executor logs should contain only a handful of lines, though.
>>
>> Most of the log lines you'll see are from Spark, and they can be controlled
>> by specifying a log4j.properties to be downloaded with your Mesos task.
>> Alternatively, if you are downloading the Spark executor via
>> spark.executor.uri, you can include log4j.properties in that tarball.
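>>
>> For example, re-packing the executor archive along these lines before
>> uploading it (file names and version are just placeholders):
>>
>>   tar xzf spark-1.3.1-bin-hadoop2.4.tgz
>>   cp my-log4j.properties spark-1.3.1-bin-hadoop2.4/conf/log4j.properties
>>   tar czf spark-1.3.1-custom.tgz spark-1.3.1-bin-hadoop2.4
>>   # then point spark.executor.uri at the re-packed spark-1.3.1-custom.tgz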
>>
>> I think we probably need some more configuration options for the Spark
>> scheduler to pick up extra files to be downloaded into the sandbox.
>>
>> Tim
>>
>>
>>
>>
>>
>> On Thu, May 28, 2015 at 6:46 AM, Gerard Maas <gerard.m...@gmail.com>
>> wrote:
>>
>>> Hi,
>>>
>>> I'm trying to control the verbosity of the logs on the Mesos executors,
>>> with no luck so far. The default behaviour is an INFO-level dump to stderr
>>> with unbounded growth that gets too big at some point.
>>>
>>> I noticed that when the executor is instantiated, it locates a default
>>> log configuration in the spark assembly:
>>>
>>> I0528 13:36:22.958067 26890 exec.cpp:206] Executor registered on slave
>>> 20150528-063307-780930314-5050-8152-S5
>>> Spark assembly has been built with Hive, including Datanucleus jars on
>>> classpath
>>> Using Spark's default log4j profile:
>>> org/apache/spark/log4j-defaults.properties
>>>
>>> So, nothing I provide in my job jar files (I also tried
>>> spark.executor.extraClassPath=log4j.properties) takes effect in the
>>> executor's configuration.
>>>
>>> How should I configure the log on the executors?
>>>
>>> thanks, Gerard.
>>>
>>
>>
>>
>
