What should I do if I want to log something as part of a task? This is what I tried. To set up a logger, I followed the advice here: http://py4j.sourceforge.net/faq.html#how-to-turn-logging-on-off

    import logging

    logger = logging.getLogger("py4j")
    logger.setLevel(logging.INFO)
    logger.addHandler(logging.StreamHandler())

This works fine when I call it from my driver (i.e. pyspark):

    logger.info("this works fine")

But I want to try logging within a distributed task, so I did this:

    def logTestMap(a):
        logger.info("test")
        return a

    myrdd.map(logTestMap).count()

and got:

    PicklingError: Can't pickle 'lock' object

So it's trying to serialize my function and can't, because of a lock object used in the logger, presumably for thread safety.

But then... how would I do it? Or is this just a really bad idea?

Thanks,
Diana
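
To illustrate the diagnosis above, here is a minimal sketch that runs without Spark. It uses the standard pickle module as a stand-in for Spark's serializer (an assumption; Spark's own pickler may behave differently), and shows that a logging handler, which holds a lock, cannot be pickled. The function log_test_map is a hypothetical variant of logTestMap that looks the logger up by name inside its body, so it does not refer to the driver-side logger variable. Whether that is enough to make the Spark job work is not something this sketch verifies.

```python
import logging
import pickle

# A logging handler owns a threading lock, which pickle refuses to serialize.
handler = logging.StreamHandler()
try:
    pickle.dumps(handler)
    handler_pickles = True
except Exception as exc:  # the exact exception type varies by Python version
    handler_pickles = False
    pickle_error = exc


def log_test_map(a):
    # Hypothetical variant of logTestMap: the logger is fetched by name
    # at call time instead of being referenced from the enclosing scope.
    task_logger = logging.getLogger("py4j")
    task_logger.info("test")
    return a
```

In the job this would presumably be called as myrdd.map(log_test_map).count(). Any handler or level configuration done on the driver would likely not apply inside the worker processes, so the message may end up in the worker logs, or nowhere, rather than on the driver console.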