The reason it didn't work for you is that the function you pass to
someRdd.map runs on the worker/executor side, not in your driver program.
That means you need to be careful not to accidentally close over objects
instantiated in your driver program, like the log object in your sample
code above. If you read up on the concept of "closures", you'll see
exactly why it didn't work for you in the first place.

The usual solution to this kind of problem is to instantiate the objects
you want to use in your map function from within the map function itself.
You can define a factory object from which to create your log object; a
rough sketch of one way to do that is below.
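
For example (untested, and the LazyLogging trait name is just something I
made up for illustration), you can keep the Logger out of the serialized
closure by making it a @transient lazy val, so each executor JVM creates
its own instance the first time it is used:

import org.apache.log4j.Logger
import org.apache.spark.{SparkConf, SparkContext}

// @transient: the Logger is never serialized along with the task closure.
// lazy: each JVM (driver or executor) builds its own Logger on first use.
trait LazyLogging extends Serializable {
  @transient lazy val log: Logger = Logger.getLogger(getClass.getName)
}

object LogTest extends LazyLogging {
  def doTest(): Unit = {
    val conf = new SparkConf().setMaster("local[4]").setAppName("LogTest")
    val spark = new SparkContext(conf)

    val someRdd = spark.parallelize(List(1, 2, 3))
    val incremented = someRdd.map { element =>
      log.info(s"$element will be processed")  // runs on the executor
      element + 1
    }
    incremented.collect()
    spark.stop()
  }
}

This way nothing non-serializable is captured by the closure, and you don't
have to fall back on rdd.foreachPartition just to get a logger.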

On Mon, May 25, 2015 at 11:05 PM, Spico Florin <spicoflo...@gmail.com>
wrote:

> Hello!
>   I would like to use the logging mechanism provided by log4j, but I'm
> getting:
> Exception in thread "main" org.apache.spark.SparkException: Task not
> serializable -> Caused by: java.io.NotSerializableException:
> org.apache.log4j.Logger
>
> The code I'm using (and the problem) resembles the one described here:
> http://stackoverflow.com/questions/29208844/apache-spark-logging-within-scala,
> meaning:
>
> val log = Logger.getLogger(getClass.getName)
>
>   def doTest() {
>    val conf = new SparkConf().setMaster("local[4]").setAppName("LogTest")
>    val spark = new SparkContext(conf)
>
>    val someRdd = spark.parallelize(List(1, 2, 3))
>    someRdd.map {
>      element =>
>        log.info(s"$element will be processed")
>        element + 1
>     }
> I'm posting the same problem here because the one on Stack Overflow didn't
> get any answer.
> In this case, can you please tell us what is the best way to use logging?
> Is there any solution that does not use rdd.foreachPartition?
>
> I look forward to your answers.
> Regards,
> Florin
