Hi Tim,

Thanks for the info. We (Andy Petrella and I) have been diving a bit deeper into this log config:

The log line I was referring to is this one (sorry, I provided the others just for context):

*Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties*

That line comes from Logging.scala [1], where a default config is loaded if none is found on the classpath upon startup of the Spark Mesos executor in the Mesos sandbox. At that point in time, none of the application-specific resources have been shipped yet, as the executor JVM is just starting up. To load a custom configuration file, we would need to have it in the sandbox before the executor JVM starts and add it to the classpath in the startup command. Is that correct?
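For reference, this is roughly what that fallback looks like (a paraphrased sketch of the log4j 1.2 path, reconstructed from [1] rather than quoted verbatim; the object name is ours): if the root logger has no appenders configured yet, Spark loads its bundled defaults from the classpath.

    import org.apache.log4j.{LogManager, PropertyConfigurator}

    object DefaultLog4jFallback {
      // If nothing has configured log4j yet, fall back to Spark's bundled defaults.
      def initializeIfNecessary(): Unit = {
        val log4jInitialized = LogManager.getRootLogger.getAllAppenders.hasMoreElements
        if (!log4jInitialized) {
          val defaultLogProps = "org/apache/spark/log4j-defaults.properties"
          Option(getClass.getClassLoader.getResource(defaultLogProps)).foreach { url =>
            PropertyConfigurator.configure(url)
            System.err.println(s"Using Spark's default log4j profile: $defaultLogProps")
          }
        }
      }
    }

So by the time our files land in the sandbox, the executor has already taken this branch.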
For the classpath customization, it looks like it should be possible to pass a -Dlog4j.configuration property by using 'spark.executor.extraClassPath', which is picked up at [2] and should be added to the command that starts the executor JVM. However, the resource must already be on the host before we can do that, so we also need some means of 'shipping' the log4j.configuration file to the allocated executor. This all boils down to your statement on the need to ship extra files to the sandbox.

Bottom line: it's currently not possible to specify a log config file for your Mesos executor (and ours grows several GB/day). The only workaround I found so far is to open up the Spark assembly, replace the log4j-defaults.properties, and pack it up again. That would work, although it's kind of rudimentary, as we use the same assembly for many jobs. Accessing the log4j API programmatically should probably also work (I didn't try that yet).
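For the programmatic route, I was thinking of something along these lines (an untested sketch, assuming the log4j 1.2 API that Spark bundles; the resource name 'custom-log4j.properties' and the object name are hypothetical). It would need to run on each executor after the JVM is up, e.g. invoked from the job code:

    import org.apache.log4j.{Level, LogManager, PropertyConfigurator}

    object ExecutorLogSetup {
      def tameLogging(): Unit = {
        // Prefer a properties file shipped inside the job jar, if present...
        Option(getClass.getResource("/custom-log4j.properties")) match {
          case Some(url) => PropertyConfigurator.configure(url)
          // ...otherwise just dial the root logger down to WARN.
          case None => LogManager.getRootLogger.setLevel(Level.WARN)
        }
      }
    }

That would at least cap the verbosity after startup, although the first lines emitted before it runs would still use the default profile.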
Should we open a JIRA for this functionality?

-kr, Gerard.

[1] https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/Logging.scala#L128
[2] https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/scheduler/cluster/mesos/MesosSchedulerBackend.scala#L77

On Thu, May 28, 2015 at 7:50 PM, Tim Chen <t...@mesosphere.io> wrote:
>
> ---------- Forwarded message ----------
> From: Tim Chen <t...@mesosphere.io>
> Date: Thu, May 28, 2015 at 10:49 AM
> Subject: Re: [Streaming] Configure executor logging on Mesos
> To: Gerard Maas <gerard.m...@gmail.com>
>
> Hi Gerard,
>
> The log line you referred to is not Spark logging but Mesos' own logging,
> which uses glog.
>
> Our own executor logs should only contain very few lines though.
>
> Most of the log lines you'll see are from Spark, and they can be controlled
> by specifying a log4j.properties to be downloaded with your Mesos task.
> Alternatively, if you are downloading the Spark executor via spark.executor.uri,
> you can include log4j.properties in that tarball.
>
> I think we probably need some more configuration options for the Spark
> scheduler to pick up extra files to be downloaded into the sandbox.
>
> Tim
>
> On Thu, May 28, 2015 at 6:46 AM, Gerard Maas <gerard.m...@gmail.com> wrote:
>
>> Hi,
>>
>> I'm trying to control the verbosity of the logs on the Mesos executors,
>> with no luck so far. The default behaviour is an INFO-level dump to stderr
>> with unbounded growth that gets too big at some point.
>>
>> I noticed that when the executor is instantiated, it locates a default
>> log configuration in the Spark assembly:
>>
>> I0528 13:36:22.958067 26890 exec.cpp:206] Executor registered on slave
>> 20150528-063307-780930314-5050-8152-S5
>> Spark assembly has been built with Hive, including Datanucleus jars on
>> classpath
>> Using Spark's default log4j profile:
>> org/apache/spark/log4j-defaults.properties
>>
>> So, nothing I provide in my job jar files (I also tried
>> spark.executor.extraClassPath=log4j.properties) takes effect in the
>> executor's configuration.
>>
>> How should I configure the logs on the executors?
>>
>> thanks, Gerard.