+1

On 2024/11/23 02:50:36 Wenchen Fan wrote:
> Hi Martin,
>
> Yea, we should be more deliberate about when to use Structured Logging. Let
> me start with when people prefer plain text logs:
> - Spark engine developers like us. When running tests, the logs are printed
>   in the console, and a plain text log is more human-readable.
> - Spark users who prefer to read the logs manually due to the lack of infra
>   support.
> - Spark users who already have decent log infra based on the plain text
>   logs.
>
> In general, I think Structured Logging should be used when users want to
> build an infra to consume logs by machine, or when they want to switch their
> existing infra to use JSON logs. Both need non-trivial work, and turning
> Structured Logging on by default won't provide them much value, but it hurts
> UX for people who still prefer plain text logs.
>
> On Sat, Nov 23, 2024 at 9:09 AM Mridul Muralidharan <mri...@gmail.com>
> wrote:
>
> > +1 to defaulting to text logs!
> >
> > Regards,
> > Mridul
> >
> > On Fri, Nov 22, 2024 at 6:21 PM Gengliang Wang <ltn...@gmail.com> wrote:
> >
> >> Hi all,
> >>
> >> Earlier this year, we introduced JSON logging as the default in Spark
> >> with the aim of enhancing log structure and facilitating better analysis.
> >> While this change was made with the best intentions, we've collectively
> >> observed some practical challenges that impact usability.
> >>
> >> *Key Observations:*
> >>
> >> 1. *Human Readability*
> >>    - *Cumbersome Formatting*: The JSON format, with its quotes and
> >>      braces, has proven less readable for direct log inspection.
> >>    - *Limitations of Pretty-Printing*: As noted in the Log4j documentation
> >>      <https://logging.apache.org/log4j/2.x/manual/json-template-layout.html>,
> >>      pretty-printing JSON logs isn't feasible due to performance concerns.
> >>    - *Difficult Interpretation*: Elements like logical plans and stack
> >>      traces are rendered as single-line strings with embedded newline (\n)
> >>      characters, making quick interpretation challenging. An example of a
> >>      side-by-side plan comparison after setting
> >>      spark.sql.planChangeLog.level=info:
> >>      [image: image.png]
> >> 2. *Lack of Log Centralization Tools*
> >>    - Although we can programmatically analyze logs using
> >>      spark.read.schema(SPARK_LOG_SCHEMA).json("path/to/logs"), there is
> >>      currently a lack of open-source tools to easily centralize and manage
> >>      these logs across Drivers, Executors, Masters, and Workers. This
> >>      limits the practical benefits we hoped to achieve with JSON logging.
> >> 3. *Consistency and Timing*
> >>    - Since Spark 4.0 has yet to be released, we have an opportunity to
> >>      maintain consistency with previous versions by reverting to plain
> >>      text logs as the default. This doesn't close the door on structured
> >>      logging; we can revisit this decision in future releases as the
> >>      ecosystem matures and more supportive tools become available.
> >>
> >> Given these considerations, I support Wenchen's proposal to switch back
> >> to plain text logs by default in Spark 4.0. Our goal is to provide the
> >> best possible experience for our users, and adjusting our approach based
> >> on real-world feedback is a part of that process.
> >>
> >> I'm looking forward to hearing your thoughts and discussing how we can
> >> continue to improve our logging practices.
> >>
> >> Best regards,
> >>
> >> Gengliang Wang
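To make Gengliang's second observation concrete: once the JSON files have
been collected somewhere readable, querying them is short. A minimal Scala
sketch, assuming the SPARK_LOG_SCHEMA helper documented for Structured
Logging and a hypothetical log path; the ts/level/msg field names mirror
Spark's default JSON log layout and should be treated as assumptions:

    // In spark-shell, `spark` and its implicits are already in scope.
    import org.apache.spark.util.LogUtils.SPARK_LOG_SCHEMA
    import spark.implicits._

    // Load the collected driver/executor JSON log files as a DataFrame.
    val logs = spark.read.schema(SPARK_LOG_SCHEMA).json("path/to/logs")

    // Example query: the most recent ERROR messages first.
    logs.filter($"level" === "ERROR")
      .orderBy($"ts".desc)
      .select("ts", "msg")
      .show(truncate = false)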
> >> On Fri, Nov 22, 2024 at 3:32 PM bo yang <bobyan...@gmail.com> wrote:
> >>
> >>> +1 for using plain text logging by default. It is good for simple usage
> >>> scenarios and will also be friendlier to first-time Spark users.
> >>>
> >>> And different companies may already have built some tooling to process
> >>> Spark logs. Using plain text by default will let those existing tools
> >>> continue to work.
> >>>
> >>> On Friday, November 22, 2024, serge rielau.com <se...@rielau.com> wrote:
> >>>
> >>>> It doesn't have to be very easy. It just has to be easier than
> >>>> maintaining two infrastructures forever.
> >>>> If we can't easily parse the JSON log to emit the existing text
> >>>> content, I'd say we have a bigger problem.
> >>>>
> >>>> On Nov 22, 2024 at 2:17 PM -0800, Jungtaek Lim <
> >>>> kabhwan.opensou...@gmail.com>, wrote:
> >>>>
> >>>> I'm not sure it is very easy to provide a reader (I meant, a viewer);
> >>>> it would mostly be not a reader but a post-processor that converts the
> >>>> JSON-formatted log to a plain text log. Only after that would users get
> >>>> the "same" UI/UX they had when dealing with log files in Spark 3.x. For
> >>>> people who do not really need to structure the log and just want to go
> >>>> with their own way of reading it (I'm a lover of grep), a JSON-formatted
> >>>> log by default is a regression of UI/UX.
> >>>>
> >>>> The JSON-formatted log is definitely useful, but also definitely not
> >>>> human-friendly. It is mostly only useful if users have constructed an
> >>>> ecosystem around Spark which never requires humans to read the log as
> >>>> JSON. I'm not quite sure whether we can/want to force users to build
> >>>> such an ecosystem to use Spark; for me, it's a lot easier for users to
> >>>> have both options and turn on the config when they need it.
> >>>>
> >>>> +1 on Wenchen's proposal.
> >>>>
> >>>> On Sat, Nov 23, 2024 at 12:36 AM serge rielau.com <se...@rielau.com>
> >>>> wrote:
> >>>>
> >>>>> Shouldn't we differentiate between the logging and the reading of the
> >>>>> log? The problem appears to be in the presentation layer.
> >>>>> We could provide a basic log reader, instead of supporting two
> >>>>> different ways to log long-term.
> >>>>>
> >>>>> On Nov 22, 2024, at 6:37 AM, Martin Grund
> >>>>> <mar...@databricks.com.INVALID> wrote:
> >>>>>
> >>>>> I'm generally supportive of this direction. However, I'm wondering if
> >>>>> we can be more deliberate about when to use it. For example, for the
> >>>>> common scenarios that you mention as "light" usage, we should switch
> >>>>> to plain text logging.
> >>>>>
> >>>>> IMO, this would cover the cases where a user simply runs the pyspark
> >>>>> or spark-shell scripts. For these use cases, most users will probably
> >>>>> prefer plain text logging. Maybe we should even go one step further and
> >>>>> have some default console filters that use color output for these
> >>>>> interactive use cases? And make it more readable in general?
> >>>>>
> >>>>> For the regular spark-submit-based job submissions, I would actually
> >>>>> say that the benefits outweigh the potential complexity.
> >>>>>
> >>>>> WDYT?
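The exchange above between Serge and Jungtaek suggests the "presentation
layer" fix is a small post-processor. A rough sketch of that idea (not an
existing tool) using Jackson, which is already on Spark's classpath; the
ts/level/logger/msg/exception field names follow Spark's default JSON log
template and are assumptions here:

    import com.fasterxml.jackson.databind.ObjectMapper
    import scala.io.Source

    // Stream JSON log lines back into a Spark-3.x-style plain text layout.
    object JsonLogToText {
      private val mapper = new ObjectMapper()

      def main(args: Array[String]): Unit = {
        for (line <- Source.fromFile(args(0)).getLines() if line.trim.nonEmpty) {
          val n = mapper.readTree(line)
          // JSON parsing turns the embedded \n escapes back into real newlines.
          println(s"${n.path("ts").asText("")} ${n.path("level").asText("")} " +
            s"${n.path("logger").asText("")}: ${n.path("msg").asText("")}")
          val stack = n.path("exception").path("stacktrace")
          if (stack.isTextual) println(stack.asText()) // field shape is an assumption
        }
      }
    }

With something like this, grep-style workflows keep working against the JSON
files without maintaining two logging configurations forever.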
> >>>>> On Fri, Nov 22, 2024 at 3:26 PM Wenchen Fan <cloud0...@gmail.com>
> >>>>> wrote:
> >>>>>
> >>>>>> Hi all,
> >>>>>>
> >>>>>> I'm writing this email to propose switching back to the previous
> >>>>>> plain text logs by default, for the following reasons:
> >>>>>>
> >>>>>> - The JSON log is not very human-readable. It's more verbose than
> >>>>>>   plain text, and new lines become `\n`, making query plan tree
> >>>>>>   strings and error stack traces very hard to read.
> >>>>>> - Structured Logging is not available out of the box. Users must
> >>>>>>   set up a log pipeline to collect the JSON log files on drivers and
> >>>>>>   executors first. Turning it on by default doesn't provide much
> >>>>>>   value.
> >>>>>>
> >>>>>> Some examples of the hard-to-read JSON log:
> >>>>>> [image: image.png]
> >>>>>> [image: image.png]
> >>>>>>
> >>>>>> For the good of Spark engine developers and light Spark users, I
> >>>>>> think the previous plain text log is a better choice. We can add a
> >>>>>> doc page to introduce how to use Structured Logging: turn on the
> >>>>>> config, collect JSON log files, and run queries.
> >>>>>>
> >>>>>> Please let me know if you share the same feelings or have different
> >>>>>> opinions.
> >>>>>>
> >>>>>> Thanks,
> >>>>>> Wenchen
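For the doc page Wenchen mentions, the "turn on the config" step would
presumably be the flag introduced with Structured Logging. A sketch of
opting back in per application, assuming the spark.log.structuredLogging.enabled
conf name from that work; the application class and jar are hypothetical:

    # Plain text stays the default; opt in to JSON logs for one app.
    spark-submit \
      --conf spark.log.structuredLogging.enabled=true \
      --class com.example.MyApp myapp.jar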
--------------------------------------------------------------------- To unsubscribe e-mail: dev-unsubscr...@spark.apache.org