The reason it didn't work for you is that the function you register with someRdd.map runs on the worker/executor side, not in your driver program. You therefore need to be careful not to accidentally close over objects instantiated in your driver program, like the log object in your sample code above. You can read up on the concept of "closures" to fully understand why it didn't work in the first place.
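To make the closure point concrete, here is a minimal sketch (the class and value names are illustrative, not taken from your code) of what actually gets captured: because log is a field of the enclosing class, referring to it inside map really means this.log, so the whole enclosing instance, Logger included, is dragged into the task closure that Spark tries to serialize.

    import org.apache.log4j.Logger
    import org.apache.spark.rdd.RDD

    class LogTest {
      // non-serializable field living on the driver
      val log = Logger.getLogger(getClass.getName)

      def doTest(someRdd: RDD[Int]): RDD[Int] =
        someRdd.map { element =>
          // `log` is really `this.log`, so the closure captures the whole
          // LogTest instance; Spark rejects it because Logger is not serializable
          log.info(s"$element will be processed")
          element + 1
        }
    }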
The usual solution to this type of problem is to instantiate the objects you want to use in your map functions from within the map functions themselves. You can define a factory object from which you create your log object; a minimal sketch follows after the quoted message below.

On Mon, May 25, 2015 at 11:05 PM, Spico Florin <spicoflo...@gmail.com> wrote:

> Hello!
> I would like to use the logging mechanism provided by log4j, but I'm getting
> Exception in thread "main" org.apache.spark.SparkException: Task not serializable
> Caused by: java.io.NotSerializableException: org.apache.log4j.Logger
>
> The code (and the problem) that I'm using resembles the one used here:
> http://stackoverflow.com/questions/29208844/apache-spark-logging-within-scala,
> meaning:
>
> val log = Logger.getLogger(getClass.getName)
>
> def doTest() {
>   val conf = new SparkConf().setMaster("local[4]").setAppName("LogTest")
>   val spark = new SparkContext(conf)
>
>   val someRdd = spark.parallelize(List(1, 2, 3))
>   someRdd.map {
>     element =>
>       log.info(s"$element will be processed")
>       element + 1
>   }
> }
>
> I'm posting the same problem because the one from Stack Overflow didn't get any answer.
> In this case, can you please tell us what is the best way to use logging?
> Is there any solution that does not use rdd.foreachPartition?
>
> I look forward to your answers.
> Regards,
> Florin
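Here is a minimal sketch of the factory-object idea, assuming log4j 1.x as in your example (LogHolder and LogTestApp are illustrative names, not part of any Spark API). A Scala object is not serialized as part of the task closure; each executor JVM initializes its own Logger on first use, so nothing non-serializable has to travel from the driver.

    import org.apache.log4j.Logger
    import org.apache.spark.{SparkConf, SparkContext}

    // Factory/holder object: created independently in each JVM, never serialized
    object LogHolder {
      lazy val log: Logger = Logger.getLogger("LogTest")
    }

    object LogTestApp {
      def main(args: Array[String]): Unit = {
        val conf = new SparkConf().setMaster("local[4]").setAppName("LogTest")
        val spark = new SparkContext(conf)

        val someRdd = spark.parallelize(List(1, 2, 3))
        val result = someRdd.map { element =>
          // The logger is obtained through the object rather than closed over
          // from the driver, so the closure stays serializable
          LogHolder.log.info(s"$element will be processed")
          element + 1
        }.collect()

        println(result.mkString(", "))
        spark.stop()
      }
    }

With this pattern the log lines are written to the executors' logs (or the driver's console when running with local[*]), not to the driver's log in a cluster deployment, which is worth keeping in mind when you go looking for the output.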