A JIRA was opened on this exact topic a few days ago, SPARK-25236 <https://issues.apache.org/jira/browse/SPARK-25236>, after seeing yet another case of print(_, file=sys.stderr) in a recent review. I agree that we should include logging for PySpark workers.
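For illustration, here's a minimal sketch of what configurable worker-side logging could look like, using only Python's standard logging module. The logger name and the environment variable are assumptions for the example, not an existing PySpark API:

    import logging
    import os

    # Hypothetical knob: read the desired level from an environment
    # variable (PySpark has no such setting today; illustration only).
    level = os.environ.get("PYSPARK_WORKER_LOG_LEVEL", "WARNING")

    logging.basicConfig(
        format="%(asctime)s %(levelname)s %(name)s: %(message)s",
        level=getattr(logging, level, logging.WARNING),
    )
    log = logging.getLogger("pyspark.worker")

    # Debug output stays in the code and is switched on by configuration,
    # instead of print(..., file=sys.stderr) calls that have to be ripped
    # out before submitting a patch.
    log.debug("processing partition %d", 42)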
On Mon, Aug 27, 2018 at 1:29 PM, Imran Rashid <iras...@cloudera.com.invalid> wrote:

> Another question on pyspark code -- how come there is no logging at all?
> Does python logging have an unreasonable overhead, or is it impossible to
> configure or something?
>
> I'm really surprised nobody has ever wanted to be able to turn on some
> debug or trace logging in pyspark by just configuring a logging level.
>
> For me, I wanted this during debugging while developing -- I'd work on
> some part of the code and drop in a bunch of print statements. Then I'd
> rip those out when I think I'm ready to submit a patch. But then I realize
> I forgot some case, then more debugging -- oh, gotta add those print
> statements in again ...
>
> Does somebody just need to set up the configuration properly, or is there a
> bigger reason to avoid logging in python?
>
> thanks,
> Imran