+1. Text logs are much more human-readable.

On Wed, Dec 11, 2024 at 12:12 PM Igor Dvorzhak <i...@google.com.invalid> wrote:
> +1
>
> On Tue, Dec 10, 2024 at 7:48 PM Yang Jie <yangji...@apache.org> wrote:
>
>> +1
>>
>> On 2024/12/11 02:34:02 Kent Yao wrote:
>> > +1
>> >
>> > On 2024/11/23 02:50:36 Wenchen Fan wrote:
>> > > Hi Martin,
>> > >
>> > > Yea, we should be more deliberate about when to use Structured
>> > > Logging. Let me start with when people prefer plain text logs:
>> > > - Spark engine developers like us. When running tests, the logs are
>> > >   printed to the console, and plain text logs are more human-readable.
>> > > - Spark users who prefer to read the logs manually due to the lack of
>> > >   infra support.
>> > > - Spark users who already have decent log infra based on the plain
>> > >   text logs.
>> > >
>> > > In general, I think Structured Logging should be used when users want
>> > > to build an infra that consumes logs by machine, or when they want to
>> > > switch their existing infra to JSON logs. Both need non-trivial work,
>> > > and turning on Structured Logging by default won't provide them much
>> > > value, but it hurts UX for people who still prefer plain text logs.
>> > >
>> > > On Sat, Nov 23, 2024 at 9:09 AM Mridul Muralidharan <mri...@gmail.com>
>> > > wrote:
>> > >
>> > > > +1 to defaulting to text logs!
>> > > >
>> > > > Regards,
>> > > > Mridul
>> > > >
>> > > > On Fri, Nov 22, 2024 at 6:21 PM Gengliang Wang <ltn...@gmail.com>
>> > > > wrote:
>> > > >
>> > > >> Hi all,
>> > > >>
>> > > >> Earlier this year, we introduced JSON logging as the default in
>> > > >> Spark with the aim of enhancing log structure and facilitating
>> > > >> better analysis. While this change was made with the best
>> > > >> intentions, we've collectively observed some practical challenges
>> > > >> that impact usability.
>> > > >>
>> > > >> *Key Observations:*
>> > > >>
>> > > >> 1. *Human Readability*
>> > > >>    - *Cumbersome Formatting*: The JSON format, with its quotes and
>> > > >>      braces, has proven less readable for direct log inspection.
>> > > >>    - *Limitations of Pretty-Printing*: As noted in the Log4j
>> > > >>      documentation
>> > > >>      <https://logging.apache.org/log4j/2.x/manual/json-template-layout.html>,
>> > > >>      pretty-printing JSON logs isn't feasible due to performance
>> > > >>      concerns.
>> > > >>    - *Difficult Interpretation*: Elements like logical plans and
>> > > >>      stack traces are rendered as single-line strings with embedded
>> > > >>      newline (\n) characters, making quick interpretation
>> > > >>      challenging. An example of a side-by-side plan comparison
>> > > >>      after setting spark.sql.planChangeLog.level=info:
>> > > >>      [image: image.png]
>> > > >> 2. *Lack of Log Centralization Tools*
>> > > >>    - Although we can programmatically analyze logs using
>> > > >>      spark.read.schema(SPARK_LOG_SCHEMA).json("path/to/logs"),
>> > > >>      there is currently a lack of open-source tools to easily
>> > > >>      centralize and manage these logs across Drivers, Executors,
>> > > >>      Masters, and Workers. This limits the practical benefits we
>> > > >>      hoped to achieve with JSON logging.
>> > > >> 3. *Consistency and Timing*
>> > > >>    - Since Spark 4.0 has yet to be released, we have an opportunity
>> > > >>      to maintain consistency with previous versions by reverting to
>> > > >>      plain text logs as the default. This doesn't close the door on
>> > > >>      structured logging; we can revisit this decision in future
>> > > >>      releases as the ecosystem matures and more supportive tools
>> > > >>      become available.
>> > > >>
>> > > >> Given these considerations, I support Wenchen's proposal to switch
>> > > >> back to plain text logs by default in Spark 4.0. Our goal is to
>> > > >> provide the best possible experience for our users, and adjusting
>> > > >> our approach based on real-world feedback is a part of that
>> > > >> process.
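The programmatic analysis mentioned in point 2 relies on each record being one JSON object per line (JSON Lines). A minimal sketch of that consumption pattern, using plain Python instead of spark.read so it stays self-contained; the field names (ts, level, msg, exception) are illustrative assumptions for this sketch, not Spark's exact SPARK_LOG_SCHEMA:

```python
import io
import json

# Two illustrative JSON Lines log records. The field names here are
# assumptions for the sketch, not Spark's actual log schema.
raw = io.StringIO(
    '{"ts": "2024-11-22T14:00:00Z", "level": "INFO", "msg": "Started task 0.0"}\n'
    '{"ts": "2024-11-22T14:00:01Z", "level": "ERROR", "msg": "Task failed",'
    ' "exception": {"class": "java.io.IOException", "msg": "disk full"}}\n'
)

# One JSON object per line means filtering is a simple parse-and-select --
# this is the machine-readability argument for structured logs.
records = [json.loads(line) for line in raw]
errors = [r for r in records if r["level"] == "ERROR"]
print(len(errors), errors[0]["exception"]["class"])  # → 1 java.io.IOException
```

The same selection with Spark itself would be a filter over the DataFrame returned by spark.read.schema(SPARK_LOG_SCHEMA).json(...).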
>> > > >>
>> > > >> I'm looking forward to hearing your thoughts and discussing how we
>> > > >> can continue to improve our logging practices.
>> > > >>
>> > > >> Best regards,
>> > > >>
>> > > >> Gengliang Wang
>> > > >>
>> > > >> On Fri, Nov 22, 2024 at 3:32 PM bo yang <bobyan...@gmail.com>
>> > > >> wrote:
>> > > >>
>> > > >>> +1 for using plain text logging by default. It is good for simple
>> > > >>> usage scenarios and will also be friendlier to first-time Spark
>> > > >>> users.
>> > > >>>
>> > > >>> And different companies may already have built tooling to process
>> > > >>> Spark logs. Using plain text by default will let those existing
>> > > >>> tools continue to work.
>> > > >>>
>> > > >>> On Friday, November 22, 2024, serge rielau.com <se...@rielau.com>
>> > > >>> wrote:
>> > > >>>
>> > > >>>> It doesn't have to be very easy. It just has to be easier than
>> > > >>>> maintaining two infrastructures forever.
>> > > >>>> If we can't easily parse the JSON log to emit the existing text
>> > > >>>> content, I'd say we have a bigger problem.
>> > > >>>>
>> > > >>>> On Nov 22, 2024 at 2:17 PM -0800, Jungtaek Lim
>> > > >>>> <kabhwan.opensou...@gmail.com> wrote:
>> > > >>>>
>> > > >>>> I'm not sure it is very easy to provide a reader (I mean, a
>> > > >>>> viewer); it would mostly be not a reader but a post-processor
>> > > >>>> that converts the JSON-formatted log to a plain text log. Only
>> > > >>>> after that would users get the "same" UI/UX they had with log
>> > > >>>> files in Spark 3.x. For people who do not really need structured
>> > > >>>> logs and just want to keep their way of reading the log (I'm a
>> > > >>>> lover of grep), a JSON-formatted log by default is a UI/UX
>> > > >>>> regression.
>> > > >>>>
>> > > >>>> A JSON-formatted log is definitely useful, but also definitely
>> > > >>>> not human-friendly.
>> > > >>>> It is mostly only useful if users have constructed an ecosystem
>> > > >>>> around Spark that never requires humans to read the log as JSON.
>> > > >>>> I'm not quite sure whether we can/want to force users to build
>> > > >>>> that ecosystem to use Spark; for me, it's a lot easier for users
>> > > >>>> to have both options and turn on the config when they need it.
>> > > >>>>
>> > > >>>> +1 on Wenchen's proposal.
>> > > >>>>
>> > > >>>> On Sat, Nov 23, 2024 at 12:36 AM serge rielau.com
>> > > >>>> <se...@rielau.com> wrote:
>> > > >>>>
>> > > >>>>> Shouldn't we differentiate between the logging and the reading
>> > > >>>>> of the log?
>> > > >>>>> The problem appears to be in the presentation layer.
>> > > >>>>> We could provide a basic log reader, instead of supporting two
>> > > >>>>> different ways to log long term.
>> > > >>>>>
>> > > >>>>> On Nov 22, 2024, at 6:37 AM, Martin Grund
>> > > >>>>> <mar...@databricks.com.INVALID> wrote:
>> > > >>>>>
>> > > >>>>> I'm generally supportive of this direction. However, I'm
>> > > >>>>> wondering if we can be more deliberate about when to use it.
>> > > >>>>> For example, for the common scenarios that you mention as
>> > > >>>>> "light" usage, we should switch to plain text logging.
>> > > >>>>>
>> > > >>>>> IMO, this would cover the cases where a user simply runs the
>> > > >>>>> pyspark or spark-shell scripts. For these use cases, most users
>> > > >>>>> will probably prefer plain text logging. Maybe we should even
>> > > >>>>> go one step further and have some default console filters that
>> > > >>>>> use color output for these interactive use cases? And make it
>> > > >>>>> more readable in general?
>> > > >>>>>
>> > > >>>>> For regular spark-submit-based job submissions, I would
>> > > >>>>> actually say that the benefits outweigh the potential
>> > > >>>>> complexity.
>> > > >>>>>
>> > > >>>>> WDYT?
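Jungtaek's grep point, and the `\n` complaint raised earlier in the thread, can be seen in miniature: serializing a multi-line message into a JSON record collapses it onto one line with escaped newlines. A small illustrative Python sketch (the record fields are made up for the example, not Spark's actual layout):

```python
import json

# A multi-line logical-plan string, as a plain text log would print it.
plan = "Project [id]\n+- Filter (id > 1)\n   +- Relation [id]"
print(plan)  # three readable, grep-able lines

# The same message inside a JSON log record: one line, newlines escaped.
record = json.dumps({"level": "INFO", "msg": plan})
print(record)

# The structure survives for machines, but humans now see "\n" literals.
assert "\\n" in record
assert json.loads(record)["msg"] == plan
```

This is also why pretty-printing does not fully help: even an indented record still carries the plan as a single escaped string value.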
>> > > >>>>>
>> > > >>>>> On Fri, Nov 22, 2024 at 3:26 PM Wenchen Fan
>> > > >>>>> <cloud0...@gmail.com> wrote:
>> > > >>>>>
>> > > >>>>>> Hi all,
>> > > >>>>>>
>> > > >>>>>> I'm writing this email to propose switching back to the
>> > > >>>>>> previous plain text logs by default, for the following
>> > > >>>>>> reasons:
>> > > >>>>>>
>> > > >>>>>> - The JSON log is not very human-readable. It's more verbose
>> > > >>>>>>   than plain text, and new lines become `\n`, making query
>> > > >>>>>>   plan tree strings and error stack traces very hard to read.
>> > > >>>>>> - Structured Logging is not available out of the box. Users
>> > > >>>>>>   must first set up a log pipeline to collect the JSON log
>> > > >>>>>>   files on drivers and executors. Turning it on by default
>> > > >>>>>>   doesn't provide much value.
>> > > >>>>>>
>> > > >>>>>> Some examples of the hard-to-read JSON log:
>> > > >>>>>> [image: image.png]
>> > > >>>>>> [image: image.png]
>> > > >>>>>>
>> > > >>>>>> For the good of Spark engine developers and light Spark
>> > > >>>>>> users, I think the previous plain text log is a better
>> > > >>>>>> choice. We can add a doc page to introduce how to use
>> > > >>>>>> Structured Logging: turn on the config, collect JSON log
>> > > >>>>>> files, and run queries.
>> > > >>>>>>
>> > > >>>>>> Please let me know if you share the same feelings or have
>> > > >>>>>> different opinions.
>> > > >>>>>>
>> > > >>>>>> Thanks,
>> > > >>>>>> Wenchen

---------------------------------------------------------------------
To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
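For reference, the opt-in Wenchen describes ("turn on the config") would look roughly like the following. This is a sketch only: the property name spark.log.structuredLogging.enabled is assumed from the Spark 4.0 structured-logging work and should be verified against the released documentation, and the application class and jar are placeholders.

```shell
# Sketch: opting back in to JSON (structured) logs for one application.
# spark.log.structuredLogging.enabled is an assumed config key -- verify
# the exact name against the Spark 4.0 docs before relying on it.
# com.example.MyApp and my-app.jar are placeholders, not real artifacts.
spark-submit \
  --conf spark.log.structuredLogging.enabled=true \
  --class com.example.MyApp \
  my-app.jar
```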