Hi Tim,

Thanks for the info. We (Andy Petrella and I) have been diving a bit deeper
into this log config:

The log line I was referring to is this one (sorry, I provided the others
just for context):

*Using Spark's default log4j profile:
org/apache/spark/log4j-defaults.properties*

That line comes from Logging.scala [1], where a default config is loaded if
none is found on the classpath when the Spark Mesos executor starts up in
the Mesos sandbox. At that point in time, none of the application-specific
resources have been shipped yet, as the executor JVM is just starting up. To
load a custom configuration file, we would need to have it in the sandbox
before the executor JVM starts and add it to the classpath in the startup
command. Is that correct?
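
For reference, the fallback logic in Logging.scala looks roughly like this
(a paraphrase of [1], not verbatim): Spark only applies its bundled defaults
when log4j has no appenders configured at the moment the executor JVM
initializes logging:

    import org.apache.log4j.{LogManager, PropertyConfigurator}

    // Paraphrase of the check in Logging.scala [1]: if no appender has been
    // configured yet, fall back to the defaults bundled in the assembly.
    val log4jInitialized = LogManager.getRootLogger.getAllAppenders.hasMoreElements
    if (!log4jInitialized) {
      val defaultLogProps = "org/apache/spark/log4j-defaults.properties"
      Option(getClass.getClassLoader.getResource(defaultLogProps)).foreach { url =>
        PropertyConfigurator.configure(url)
        System.err.println(s"Using Spark's default log4j profile: $defaultLogProps")
      }
    }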

For the classpath customization, it looks like it should be possible to pass
a -Dlog4j.configuration property by using 'spark.executor.extraClassPath',
which is picked up at [2] and added to the command that starts the executor
JVM; however, the resource must already be on the host before we can do
that. Therefore we also need some means of 'shipping' the
log4j.configuration file to the allocated executor.
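
For illustration, this is roughly what we are trying to end up with,
expressed as a SparkConf (untested on Mesos; /etc/spark/conf is a
hypothetical path that would already have to exist on every slave, which is
exactly the shipping problem):

    import org.apache.spark.SparkConf

    // Sketch only: assumes /etc/spark/conf/log4j.properties is already
    // present on every Mesos slave before the executor JVM starts.
    val conf = new SparkConf()
      // put the directory holding log4j.properties on the executor classpath
      .set("spark.executor.extraClassPath", "/etc/spark/conf")
      // or point log4j at the file explicitly through a JVM system property
      .set("spark.executor.extraJavaOptions",
           "-Dlog4j.configuration=file:/etc/spark/conf/log4j.properties")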

This all boils down to your statement on the need to ship extra files to the
sandbox. Bottom line: it's currently not possible to specify a custom log
config file for a Mesos executor, so its log grows unbounded (ours grows by
several GB/day).

The only workaround I've found so far is to open up the Spark assembly,
replace log4j-defaults.properties, and pack it up again. That would work,
although it's rather crude, as we use the same assembly for many jobs.
Accessing the log4j API programmatically should probably also work (I
haven't tried that yet).
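
If it helps, this is the kind of programmatic configuration I had in mind
(untested; the file path, size cap and backup count are made-up values). It
would sidestep the file-shipping problem by building the log4j config in
code, with a size-capped rolling appender given our growth problem:

    import java.util.Properties
    import org.apache.log4j.PropertyConfigurator

    // Untested sketch: configure log4j 1.x in code instead of shipping a file.
    // Path, MaxFileSize and MaxBackupIndex below are illustrative assumptions.
    val props = new Properties()
    props.setProperty("log4j.rootLogger", "WARN, rolling")
    props.setProperty("log4j.appender.rolling", "org.apache.log4j.RollingFileAppender")
    props.setProperty("log4j.appender.rolling.File", "/tmp/spark-executor.log")
    props.setProperty("log4j.appender.rolling.MaxFileSize", "100MB")
    props.setProperty("log4j.appender.rolling.MaxBackupIndex", "5")
    props.setProperty("log4j.appender.rolling.layout", "org.apache.log4j.PatternLayout")
    props.setProperty("log4j.appender.rolling.layout.ConversionPattern",
                      "%d{yy/MM/dd HH:mm:ss} %p %c: %m%n")
    PropertyConfigurator.configure(props)

The catch, I suppose, is that this would have to run before Spark
initializes its own logging, otherwise the bundled defaults are applied
first (configure() should still override them afterwards, though).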

Should we open a JIRA for this functionality?

-kr, Gerard.

[1]
https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/Logging.scala#L128
[2]
https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/scheduler/cluster/mesos/MesosSchedulerBackend.scala#L77

On Thu, May 28, 2015 at 7:50 PM, Tim Chen <t...@mesosphere.io> wrote:

>
> ---------- Forwarded message ----------
> From: Tim Chen <t...@mesosphere.io>
> Date: Thu, May 28, 2015 at 10:49 AM
> Subject: Re: [Streaming] Configure executor logging on Mesos
> To: Gerard Maas <gerard.m...@gmail.com>
>
>
> Hi Gerard,
>
> The log line you referred to is not Spark logging but Mesos' own logging,
> which uses glog.
>
> Our own executor logs should only contain a few lines, though.
>
> Most of the log lines you'll see are from Spark, and they can be controlled
> by specifying a log4j.properties file to be downloaded with your Mesos
> task. Alternatively, if you are downloading the Spark executor via
> spark.executor.uri, you can include log4j.properties in that tarball.
>
> I think we probably need some additional configuration options for the
> Spark scheduler to pick up extra files to be downloaded into the sandbox.
>
> Tim
>
>
>
>
>
> On Thu, May 28, 2015 at 6:46 AM, Gerard Maas <gerard.m...@gmail.com>
> wrote:
>
>> Hi,
>>
>> I'm trying to control the verbosity of the logs on the Mesos executors,
>> with no luck so far. The default behaviour is an INFO-level dump to
>> stderr that grows unbounded and gets too big at some point.
>>
>> I noticed that when the executor is instantiated, it locates a default
>> log configuration in the spark assembly:
>>
>> I0528 13:36:22.958067 26890 exec.cpp:206] Executor registered on slave
>> 20150528-063307-780930314-5050-8152-S5
>> Spark assembly has been built with Hive, including Datanucleus jars on
>> classpath
>> Using Spark's default log4j profile:
>> org/apache/spark/log4j-defaults.properties
>>
>> So, nothing I provide in my job jar files (I also tried
>> spark.executor.extraClassPath=log4j.properties) takes effect in the
>> executor's configuration.
>>
>> How should I configure the log on the executors?
>>
>> thanks, Gerard.
>>
>
>
>
