+1 to defaulting to text logs!

Regards,
Mridul
On Fri, Nov 22, 2024 at 6:21 PM Gengliang Wang <ltn...@gmail.com> wrote:

> Hi all,
>
> Earlier this year, we introduced JSON logging as the default in Spark
> with the aim of enhancing log structure and facilitating better analysis.
> While this change was made with the best intentions, we've collectively
> observed some practical challenges that impact usability.
>
> *Key Observations:*
>
>    1. *Human Readability*
>       - *Cumbersome Formatting*: The JSON format, with its quotes and
>       braces, has proven less readable for direct log inspection.
>       - *Limitations of Pretty-Printing*: As noted in the Log4j
>       documentation
>       <https://logging.apache.org/log4j/2.x/manual/json-template-layout.html>,
>       pretty-printing JSON logs isn't feasible due to performance concerns.
>       - *Difficult Interpretation*: Elements like logical plans and stack
>       traces are rendered as single-line strings with embedded newline (\n)
>       characters, making quick interpretation challenging.
>       An example of a side-by-side plan comparison after setting
>       spark.sql.planChangeLog.level=info:
>       [image: image.png]
>    2. *Lack of Log Centralization Tools*
>       - Although we can programmatically analyze logs using
>       spark.read.schema(SPARK_LOG_SCHEMA).json("path/to/logs"), there is
>       currently a lack of open-source tools to easily centralize and
>       manage these logs across Drivers, Executors, Masters, and Workers.
>       This limits the practical benefits we hoped to achieve with JSON
>       logging.
>    3. *Consistency and Timing*
>       - Since Spark 4.0 has yet to be released, we have an opportunity
>       to maintain consistency with previous versions by reverting to
>       plain text logs as the default. This doesn't close the door on
>       structured logging; we can revisit this decision in future releases
>       as the ecosystem matures and more supportive tools become available.
>
> Given these considerations, I support Wenchen's proposal to switch back
> to plain text logs by default in Spark 4.0. Our goal is to provide the
> best possible experience for our users, and adjusting our approach based
> on real-world feedback is a part of that process.
>
> I'm looking forward to hearing your thoughts and discussing how we can
> continue to improve our logging practices.
>
> Best regards,
>
> Gengliang Wang
>
> On Fri, Nov 22, 2024 at 3:32 PM bo yang <bobyan...@gmail.com> wrote:
>
>> +1 for defaulting to plain text logging. It is good for the simple usage
>> scenario and will also be more friendly to first-time Spark users.
>>
>> And different companies may already have built some tooling to process
>> Spark logs. Using plain text by default will let those existing tools
>> continue to work.
>>
>> On Friday, November 22, 2024, serge rielau.com <se...@rielau.com> wrote:
>>
>>> It doesn't have to be very easy. It just has to be easier than
>>> maintaining two infrastructures forever.
>>> If we can't easily parse the JSON log to emit the existing text
>>> content, I'd say we have a bigger problem.
>>>
>>> On Nov 22, 2024 at 2:17 PM -0800, Jungtaek Lim <
>>> kabhwan.opensou...@gmail.com>, wrote:
>>>
>>> I'm not sure it is very easy to provide a reader (I meant, viewer); it
>>> would mostly be not a reader but a post-processor that converts the
>>> JSON-formatted log to a plain text log. Only after that would users get
>>> the "same" UI/UX they had when dealing with log files in Spark 3.x.
>>> For people who do not really need to structure the log and just want
>>> to go with their own way of reading it (I'm a lover of grep), a
>>> JSON-formatted log by default is a regression in UI/UX.
>>>
>>> The JSON-formatted log is definitely useful, but also definitely not
>>> human friendly. It is mostly only useful for users who have constructed
>>> an ecosystem around Spark that never requires humans to read the log as
>>> JSON. I'm not quite sure whether we can/want to force users to build
>>> such an ecosystem to use Spark; for me, it's a lot easier for users to
>>> have both options and turn on the config when they need it.
>>>
>>> +1 on Wenchen's proposal.
>>>
>>> On Sat, Nov 23, 2024 at 12:36 AM serge rielau.com <se...@rielau.com>
>>> wrote:
>>>
>>>> Shouldn't we differentiate between the logging and the reading of the
>>>> log?
>>>> The problem appears to be in the presentation layer.
>>>> We could provide a basic log reader, instead of supporting two
>>>> different ways to log long-term.
>>>>
>>>> On Nov 22, 2024, at 6:37 AM, Martin Grund <mar...@databricks.com.INVALID>
>>>> wrote:
>>>>
>>>> I'm generally supportive of this direction. However, I'm wondering if
>>>> we can be more deliberate about when to use it. For example, for the
>>>> common scenarios that you mention as "light" usage, we should switch
>>>> to plain text logging.
>>>>
>>>> IMO, this would cover the cases where a user simply runs the pyspark
>>>> or spark-shell scripts. For these use cases, most users will probably
>>>> prefer plain text logging. Maybe we should even go one step further
>>>> and have some default console filters that use color output for these
>>>> interactive use cases? And make it more readable in general?
>>>>
>>>> For the regular spark-submit-based job submissions, I would actually
>>>> say that the benefits outweigh the potential complexity.
>>>>
>>>> WDYT?
>>>>
>>>> On Fri, Nov 22, 2024 at 3:26 PM Wenchen Fan <cloud0...@gmail.com>
>>>> wrote:
>>>>
>>>>> Hi all,
>>>>>
>>>>> I'm writing this email to propose switching back to the previous
>>>>> plain text logs by default, for the following reasons:
>>>>>
>>>>>    - The JSON log is not very human-readable. It's more verbose than
>>>>>    plain text, and new lines become `\n`, making query plan tree
>>>>>    strings and error stack traces very hard to read.
>>>>>    - Structured Logging is not available out of the box. Users must
>>>>>    set up a log pipeline to collect the JSON log files on drivers and
>>>>>    executors first. Turning it on by default doesn't provide much
>>>>>    value.
>>>>>
>>>>> Some examples of the hard-to-read JSON log:
>>>>> [image: image.png]
>>>>> [image: image.png]
>>>>>
>>>>> For the good of Spark engine developers and light Spark users, I
>>>>> think the previous plain text log is a better choice. We can add a
>>>>> doc page to introduce how to use Structured Logging: turn on the
>>>>> config, collect the JSON log files, and run queries.
>>>>>
>>>>> Please let me know if you share the same feelings or have different
>>>>> opinions.
>>>>>
>>>>> Thanks,
>>>>> Wenchen
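
For readers following up on Gengliang's point about programmatic analysis
(and Wenchen's "turn on the config, collect the JSON log files, and run
queries"), here is a minimal sketch of what that querying looks like in
spark-shell. It assumes the Scala import path
org.apache.spark.util.LogUtils.SPARK_LOG_SCHEMA and the field names (level,
logger) shown in the structured-logging examples from the Spark 4.0
previews; the log path is a placeholder, and structured logging must be
enabled (spark.log.structuredLogging.enabled) for the files to be JSON in
the first place.

```scala
// Minimal sketch, not a drop-in recipe: querying Spark's JSON logs with the
// DataFrame API, as mentioned in the thread. Field names follow the
// structured-logging examples and should be verified with logs.printSchema().
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.col
import org.apache.spark.util.LogUtils.SPARK_LOG_SCHEMA

val spark = SparkSession.builder().appName("log-analysis").getOrCreate()

// Each line of a structured log file is a single JSON record, so the plain
// JSON reader plus the published schema is enough.
val logs = spark.read.schema(SPARK_LOG_SCHEMA).json("path/to/logs")

// Example query: count warnings and errors per logger.
logs.filter(col("level").isin("WARN", "ERROR"))
  .groupBy(col("logger"), col("level"))
  .count()
  .orderBy(col("count").desc)
  .show(truncate = false)
```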
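
On serge's and Jungtaek's exchange about a basic reader/post-processor: once
the logs are loaded with the schema as above, converting them back to
grep-friendly text is largely a projection. The sketch below is illustrative
only; it reuses the assumed field names (ts, level, logger, msg), drops
exception stack traces and MDC context (which is exactly where a real viewer
would need more care), and uses placeholder paths.

```scala
// Rough sketch of the "post-processor" idea: flatten JSON log records back
// into classic-looking plain text lines. Reuses the spark session and the
// assumed schema/field names from the previous snippet.
import org.apache.spark.sql.functions.{col, concat_ws}
import org.apache.spark.util.LogUtils.SPARK_LOG_SCHEMA

val jsonLogs = spark.read.schema(SPARK_LOG_SCHEMA).json("path/to/logs")

// Cast the timestamp explicitly and join the fields into one line per record.
val asText = jsonLogs.select(
  concat_ws(" ",
    col("ts").cast("string"),
    col("level"),
    col("logger"),
    col("msg")
  ).as("line")
)

// Write plain text files alongside the originals.
asText.write.mode("overwrite").text("path/to/logs-as-text")
```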