Re: [DISCUSS] Use plain text logs by default

Yang Jie Tue, 10 Dec 2024 19:49:11 -0800

+1

On 2024/12/11 02:34:02 Kent Yao wrote:
> +1
> 
> On 2024/11/23 02:50:36 Wenchen Fan wrote:
> > Hi Martin,
> > 
> > Yea, we should be more deliberate about when to use Structured Logging. Let
> > me start with when people prefer plain text logs:
> > - Spark engine developers like us. When running tests, the logs are printed
> > in the console and plain text log is more human-readable.
> > - Spark users who prefer to read the logs manually due to the lack of infra
> > support.
> > - Spark users who already have decent log infra based on the plain text
> > logs.
> > 
> > In general, I think Structured Logging should be used when users want to
> > build an infra to consume logs by machine, or they want to switch their
> > existing infra to use JSON logs. Both need non-trivial work, so turning on
> > Structured Logging by default won't provide them much value, but it hurts
> > UX for people who still prefer plain text logs.
> > 
> > On Sat, Nov 23, 2024 at 9:09 AM Mridul Muralidharan <[email protected]>
> > wrote:
> > 
> > > +1 to defaulting to text logs !
> > >
> > > Regards,
> > > Mridul
> > >
> > > On Fri, Nov 22, 2024 at 6:21 PM Gengliang Wang <[email protected]> wrote:
> > >
> > >> Hi all,
> > >>
> > >> Earlier this year, we introduced JSON logging as the default in Spark
> > >> with the aim of enhancing log structure and facilitating better analysis.
> > >> While this change was made with the best intentions, we've collectively
> > >> observed some practical challenges that impact usability.
> > >>
> > >> *Key Observations:*
> > >>
> > >>    1.
> > >>
> > >>    *Human Readability*
> > >>    - *Cumbersome Formatting*: The JSON format, with its quotes and
> > >>       braces, has proven less readable for direct log inspection.
> > >>       - *Limitations of Pretty-Printing*: As noted in the Log4j
> > >>       documentation
> > >>       
> > >> <https://logging.apache.org/log4j/2.x/manual/json-template-layout.html>,
> > >>       pretty-printing JSON logs isn't feasible due to performance 
> > >> concerns.
> > >>       - *Difficult Interpretation*: Elements like logical plans and
> > >>       stack traces are rendered as single-line strings with embedded 
> > >> newline (
> > >>       \n) characters, making quick interpretation challenging.
> > >>       An example of a side-by-side plan comparison after setting
> > >>       spark.sql.planChangeLog.level=info:
> > >>       [image: image.png]
> > >>       2.
> > >>
> > >>    *Lack of Log Centralization Tools*
> > >>    - Although we can programmatically analyze logs using
> > >>       spark.read.schema(SPARK_LOG_SCHEMA).json("path/to/logs"), there is
> > >>       currently a lack of open-source tools to easily centralize and 
> > >> manage these
> > >>       logs across Drivers, Executors, Masters, and Workers. This limits 
> > >> the
> > >>       practical benefits we hoped to achieve with JSON logging.
> > >>    3.
> > >>
> > >>    *Consistency and Timing*
> > >>    - Since Spark 4.0 has yet to be released, we have an opportunity to
> > >>       maintain consistency with previous versions by reverting to plain 
> > >> text logs
> > >>       as the default. This doesn't close the door on structured logging; 
> > >> we can
> > >>       revisit this decision in future releases as the ecosystem matures 
> > >> and more
> > >>       supportive tools become available.
> > >>
> > > >> Given these considerations, I support Wenchen's proposal to switch back
> > > >> to plain text logs by default in Spark 4.0. Our goal is to provide the
> > > >> best possible experience for our users, and adjusting our approach based
> > > >> on real-world feedback is a part of that process.
> > >>
> > >> I'm looking forward to hearing your thoughts and discussing how we can
> > >> continue to improve our logging practices.
> > >>
> > >> Best regards,
> > >>
> > >> Gengliang Wang
> > >>
> > >> On Fri, Nov 22, 2024 at 3:32 PM bo yang <[email protected]> wrote:
> > >>
> > > >>> +1 for using plain text logging by default. It is good for simple usage
> > > >>> scenarios and will also be friendlier to first-time Spark users.
> > > >>>
> > > >>> Different companies may also have already built tooling to process
> > > >>> Spark logs. Using plain text by default will let those existing tools
> > > >>> continue to work.
> > >>>
> > >>>
> > >>> On Friday, November 22, 2024, serge rielau.com <[email protected]> wrote:
> > >>>
> > > >>>> It doesn’t have to be very easy. It just has to be easier than
> > > >>>> maintaining two infrastructures forever.
> > > >>>> If we can’t easily parse the JSON log to emit the existing text
> > > >>>> content, I’d say we have a bigger problem.
> > >>>>
> > >>>> On Nov 22, 2024 at 2:17 PM -0800, Jungtaek Lim <
> > >>>> [email protected]>, wrote:
> > >>>>
> > > >>>> I'm not sure it is very easy to provide a reader (I meant, viewer); it
> > > >>>> would be mostly not a reader but a post-processor which will convert
> > > >>>> JSON formatted log to plain text log. And after that users would get
> > > >>>> the "same" UI/UX when dealing with log files in Spark 3.x. For people
> > > >>>> who do not really need to structure the log and just want to go with
> > > >>>> their way of reading the log (I'm a lover of grep), JSON formatted log
> > > >>>> by default is a regression of UI/UX.
> > >>>>
> > > >>>> JSON formatted log is definitely useful, but it is also definitely not
> > > >>>> human friendly. It is mostly only useful if users have constructed an
> > > >>>> ecosystem around Spark which never requires humans to read the log as
> > > >>>> JSON. I'm not quite sure whether we can/want to force users to build
> > > >>>> that ecosystem in order to use Spark; for me, it's a lot easier for
> > > >>>> users to have both options and turn on the config when they need it.
> > >>>>
> > >>>> +1 on Wenchen's proposal.
> > >>>>
> > >>>> On Sat, Nov 23, 2024 at 12:36 AM serge rielau.com <[email protected]>
> > >>>> wrote:
> > >>>>
> > > >>>>> Shouldn’t we differentiate between the logging and the reading of the
> > > >>>>> log?
> > > >>>>> The problem appears to be in the presentation layer.
> > > >>>>> We could provide a basic log reader instead of supporting two
> > > >>>>> different ways to log long term.
> > >>>>>
> > >>>>>
> > >>>>> On Nov 22, 2024, at 6:37 AM, Martin Grund
> > >>>>> <[email protected]> wrote:
> > >>>>>
> > > >>>>> I'm generally supportive of this direction. However, I'm wondering if
> > > >>>>> we can be more deliberate about when to use it. For example, for the
> > > >>>>> common scenarios that you mention as "light" usage, we should switch
> > > >>>>> to plain text logging.
> > >>>>>
> > > >>>>> IMO, this would cover the cases where a user simply runs the pyspark
> > > >>>>> or spark-shell scripts. For these use cases, most users will probably
> > > >>>>> prefer plain text logging. Maybe we should even go one step further
> > > >>>>> and have some default console filters that use color output for these
> > > >>>>> interactive use cases? And make it more readable in general?
> > >>>>>
> > >>>>> For the regular spark-submit-based job submissions, I would actually
> > >>>>> say that the benefits outweigh the potential complexity.
> > >>>>>
> > >>>>> WDYT?
> > >>>>>
> > >>>>> On Fri, Nov 22, 2024 at 3:26 PM Wenchen Fan <[email protected]>
> > >>>>> wrote:
> > >>>>>
> > >>>>>> Hi all,
> > >>>>>>
> > >>>>>> I'm writing this email to propose switching back to the previous
> > >>>>>> plain text logs by default, for the following reasons:
> > >>>>>>
> > > >>>>>>    - The JSON log is not very human-readable. It's more verbose than
> > > >>>>>>    plain text, and new lines become `\n`, making query plan tree
> > > >>>>>>    strings and error stack traces very hard to read.
> > > >>>>>>    - Structured Logging is not available out of the box. Users must
> > > >>>>>>    set up a log pipeline to collect the JSON log files on drivers and
> > > >>>>>>    executors first. Turning it on by default doesn't provide much
> > > >>>>>>    value.
> > >>>>>>
> > >>>>>> Some examples of the hard-to-read JSON log:
> > >>>>>> [image: image.png]
> > >>>>>> [image: image.png]
> > >>>>>>
> > > >>>>>> For the good of Spark engine developers and light Spark users, I
> > > >>>>>> think the previous plain text log is a better choice. We can add a
> > > >>>>>> doc page to introduce how to use Structured Logging: turn on the
> > > >>>>>> config, collect JSON log files, and run queries.
> > >>>>>>
> > >>>>>> Please let me know if you share the same feelings or have different
> > >>>>>> opinions.
> > >>>>>>
> > >>>>>> Thanks,
> > >>>>>> Wenchen
> > >>>>>>
> > >>>>>
> > >>>>>
> > 
> 
> ---------------------------------------------------------------------
> To unsubscribe e-mail: [email protected]
> 
>
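
For reference, the JSON-to-plain-text post-processor discussed in the quoted thread could look roughly like the sketch below. It is not part of Spark. The field names (ts, level, logger, msg, exception.class, exception.msg, exception.stackTrace) are assumptions about the JSON layout, and the function name is hypothetical. The main point is that decoding the JSON turns the embedded \n escapes back into real line breaks, so plans and stack traces become readable again.

```python
import json


def render_json_log_line(line: str) -> str:
    """Render one JSON log record as plain text.

    Lines that are not JSON objects (for example, output that is already
    plain text) are passed through unchanged, minus the trailing newline.
    """
    try:
        record = json.loads(line)
    except json.JSONDecodeError:
        return line.rstrip("\n")
    if not isinstance(record, dict):
        return line.rstrip("\n")

    # Assumed field names; adjust them to the actual log template in use.
    header = " ".join(
        str(record[key]) for key in ("ts", "level", "logger") if key in record
    )
    # json.loads has already turned "\n" escapes into real newlines here.
    msg = str(record.get("msg", ""))
    text = f"{header}: {msg}" if header else msg

    # Assumed shape: the exception is an object whose stack trace is a string.
    exc = record.get("exception")
    if isinstance(exc, dict):
        text += f"\n{exc.get('class', '')}: {exc.get('msg', '')}"
        trace = exc.get("stackTrace")
        if isinstance(trace, str) and trace:
            text += "\n" + trace
    return text
```

Applying the function to each line of a log file and printing the result gives output that can be piped to grep or less as before.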


