+1

On Tue, Dec 10, 2024 at 7:48 PM Yang Jie <yangji...@apache.org> wrote:

> +1
>
> On 2024/12/11 02:34:02 Kent Yao wrote:
> > +1
> >
> > On 2024/11/23 02:50:36 Wenchen Fan wrote:
> > > Hi Martin,
> > >
> > > Yea, we should be more deliberate about when to use Structured Logging.
> > > Let me start with when people prefer plain text logs:
> > > - Spark engine developers like us. When running tests, the logs are
> > > printed in the console, and plain text logs are more human-readable.
> > > - Spark users who prefer to read the logs manually due to the lack of
> > > infra support.
> > > - Spark users who already have decent log infra based on the plain text
> > > logs.
> > >
> > > In general, I think Structured Logging should be used when users want to
> > > build an infra to consume logs by machine, or when they want to switch
> > > their existing infra to use JSON logs. Both need non-trivial work, and
> > > turning Structured Logging on by default won't provide them much value,
> > > but it hurts UX for people who still prefer plain text logs.
> > >
> > > On Sat, Nov 23, 2024 at 9:09 AM Mridul Muralidharan <mri...@gmail.com>
> > > wrote:
> > >
> > > > +1 to defaulting to text logs!
> > > >
> > > > Regards,
> > > > Mridul
> > > >
> > > > On Fri, Nov 22, 2024 at 6:21 PM Gengliang Wang <ltn...@gmail.com> wrote:
> > > >
> > > >> Hi all,
> > > >>
> > > >> Earlier this year, we introduced JSON logging as the default in Spark
> > > >> with the aim of enhancing log structure and facilitating better
> > > >> analysis. While this change was made with the best intentions, we've
> > > >> collectively observed some practical challenges that impact usability.
> > > >>
> > > >> *Key Observations:*
> > > >>
> > > >>    1. *Human Readability*
> > > >>       - *Cumbersome Formatting*: The JSON format, with its quotes and
> > > >>         braces, has proven less readable for direct log inspection.
> > > >>       - *Limitations of Pretty-Printing*: As noted in the Log4j
> > > >>         documentation
> > > >>         <https://logging.apache.org/log4j/2.x/manual/json-template-layout.html>,
> > > >>         pretty-printing JSON logs isn't feasible due to performance
> > > >>         concerns.
> > > >>       - *Difficult Interpretation*: Elements like logical plans and stack
> > > >>         traces are rendered as single-line strings with embedded newline
> > > >>         (\n) characters, making quick interpretation challenging. An
> > > >>         example of a side-by-side plan comparison after setting
> > > >>         spark.sql.planChangeLog.level=info:
> > > >>         [image: image.png]
> > > >>    2. *Lack of Log Centralization Tools*
> > > >>       - Although we can programmatically analyze logs using
> > > >>         spark.read.schema(SPARK_LOG_SCHEMA).json("path/to/logs") (see the
> > > >>         sketch right after this list), there is currently a lack of
> > > >>         open-source tools to easily centralize and manage these logs
> > > >>         across Drivers, Executors, Masters, and Workers. This limits the
> > > >>         practical benefits we hoped to achieve with JSON logging.
> > > >>    3. *Consistency and Timing*
> > > >>       - Since Spark 4.0 has yet to be released, we have an opportunity to
> > > >>         maintain consistency with previous versions by reverting to plain
> > > >>         text logs as the default. This doesn't close the door on
> > > >>         structured logging; we can revisit this decision in future
> > > >>         releases as the ecosystem matures and more supportive tools
> > > >>         become available.
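> > > >>
> > > >> To make item 2 concrete, here is a minimal sketch of that kind of ad-hoc
> > > >> analysis, run from spark-shell. The LogUtils import path, the directory
> > > >> layout, and the level/logger field names are placeholders, since there is
> > > >> no standard collection tool today:
> > > >>
> > > >> import org.apache.spark.util.LogUtils.SPARK_LOG_SCHEMA
> > > >> import org.apache.spark.sql.functions.desc
> > > >>
> > > >> // Read whatever JSON log files were copied from the driver and executors.
> > > >> val driverLogs   = spark.read.schema(SPARK_LOG_SCHEMA).json("/logs/driver/*.json")
> > > >> val executorLogs = spark.read.schema(SPARK_LOG_SCHEMA).json("/logs/executors/*/*.json")
> > > >>
> > > >> // Quick overview: warnings and errors per logger across the whole app.
> > > >> driverLogs.unionByName(executorLogs)
> > > >>   .where("level IN ('WARN', 'ERROR')")
> > > >>   .groupBy("logger", "level")
> > > >>   .count()
> > > >>   .orderBy(desc("count"))
> > > >>   .show(truncate = false)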
> > > >>
> > > >> Given these considerations, I support Wenchen's proposal to switch back
> > > >> to plain text logs by default in Spark 4.0. Our goal is to provide the
> > > >> best possible experience for our users, and adjusting our approach based
> > > >> on real-world feedback is a part of that process.
> > > >>
> > > >> I'm looking forward to hearing your thoughts and discussing how we can
> > > >> continue to improve our logging practices.
> > > >>
> > > >> Best regards,
> > > >>
> > > >> Gengliang Wang
> > > >>
> > > >> On Fri, Nov 22, 2024 at 3:32 PM bo yang <bobyan...@gmail.com> wrote:
> > > >>
> > > >>> +1 for defaulting to plain text logging. It is good for simple usage
> > > >>> scenarios, and it will also be more friendly to first-time Spark users.
> > > >>>
> > > >>> And different companies may have already built some tooling to process
> > > >>> Spark logs. Using plain text by default will let those existing tools
> > > >>> continue to work.
> > > >>>
> > > >>>
> > > >>> On Friday, November 22, 2024, serge rielau.com <se...@rielau.com> wrote:
> > > >>>
> > > >>>> It doesn't have to be very easy. It just has to be easier than
> > > >>>> maintaining two infrastructures forever.
> > > >>>> If we can't easily parse the JSON log to emit the existing text
> > > >>>> content, I'd say we have a bigger problem.
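> > > >>>>
> > > >>>> Roughly, the kind of one-off post-processor I mean (just a sketch, run
> > > >>>> in spark-shell; the LogUtils import path and the ts/level/logger/msg
> > > >>>> field names are assumptions about the JSON layout, not a finished tool):
> > > >>>>
> > > >>>> import org.apache.spark.util.LogUtils.SPARK_LOG_SCHEMA
> > > >>>> import org.apache.spark.sql.functions.{col, concat_ws, date_format}
> > > >>>>
> > > >>>> // Replay one JSON log file as classic plain-text lines, oldest first.
> > > >>>> spark.read.schema(SPARK_LOG_SCHEMA).json("/path/to/app.json.log")
> > > >>>>   .orderBy("ts")
> > > >>>>   .select(concat_ws(" ",
> > > >>>>     date_format(col("ts"), "yy/MM/dd HH:mm:ss"),
> > > >>>>     col("level"), col("logger"), col("msg")))
> > > >>>>   .collect()
> > > >>>>   .foreach(row => println(row.getString(0)))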
> > > >>>>
> > > >>>> On Nov 22, 2024 at 2:17 PM -0800, Jungtaek Lim <
> > > >>>> kabhwan.opensou...@gmail.com>, wrote:
> > > >>>>
> > > >>>> I'm not sure it is very easy to provide a reader (I meant, viewer); it
> > > >>>> would mostly be not a reader but a post-processor which converts the
> > > >>>> JSON-formatted log to a plain text log. Only after that would users get
> > > >>>> the "same" UI/UX they had when dealing with log files in Spark 3.x. For
> > > >>>> people who do not really need to structure the log and just want to go
> > > >>>> with their own way of reading the log (I'm a lover of grep), a
> > > >>>> JSON-formatted log by default is a regression of UI/UX.
> > > >>>>
> > > >>>> The JSON-formatted log is definitely useful, but definitely not
> > > >>>> human-friendly. It is mostly only useful if users have constructed an
> > > >>>> ecosystem around Spark which never requires humans to read the log as
> > > >>>> JSON. I'm not quite sure whether we can/want to force users to build
> > > >>>> that ecosystem to use Spark; for me, it's a lot easier for users to
> > > >>>> have both options and turn on the config when they need it.
> > > >>>>
> > > >>>> +1 on Wenchen's proposal.
> > > >>>>
> > > >>>> On Sat, Nov 23, 2024 at 12:36 AM serge rielau.com <se...@rielau.com>
> > > >>>> wrote:
> > > >>>>
> > > >>>>> Shouldn't we differentiate between the logging and the reading of the
> > > >>>>> log?
> > > >>>>> The problem appears to be in the presentation layer.
> > > >>>>> We could provide a basic log reader, instead of supporting two
> > > >>>>> different ways to log long-term.
> > > >>>>>
> > > >>>>>
> > > >>>>> On Nov 22, 2024, at 6:37 AM, Martin Grund
> > > >>>>> <mar...@databricks.com.INVALID> wrote:
> > > >>>>>
> > > >>>>> I'm generally supportive of this direction. However, I'm wondering if
> > > >>>>> we can be more deliberate about when to use it. For example, for the
> > > >>>>> common scenarios that you mention as "light" usage, we should switch to
> > > >>>>> plain text logging.
> > > >>>>>
> > > >>>>> IMO, this would cover the cases where a user simply runs the pyspark
> > > >>>>> or spark-shell scripts. For these use cases, most users will probably
> > > >>>>> prefer plain text logging. Maybe we should even go one step further and
> > > >>>>> have some default console filters that use color output for these
> > > >>>>> interactive use cases? And make it more readable in general?
> > > >>>>>
> > > >>>>> For the regular spark-submit-based job submissions, I would actually
> > > >>>>> say that the benefits outweigh the potential complexity.
> > > >>>>>
> > > >>>>> WDYT?
> > > >>>>>
> > > >>>>> On Fri, Nov 22, 2024 at 3:26 PM Wenchen Fan <cloud0...@gmail.com>
> > > >>>>> wrote:
> > > >>>>>
> > > >>>>>> Hi all,
> > > >>>>>>
> > > >>>>>> I'm writing this email to propose switching back to the previous
> > > >>>>>> plain text logs by default, for the following reasons:
> > > >>>>>>
> > > >>>>>>    - The JSON log is not very human-readable. It's more verbose than
> > > >>>>>>    plain text, and new lines become `\n`, making query plan tree
> > > >>>>>>    strings and error stack traces very hard to read.
> > > >>>>>>    - Structured Logging is not available out of the box. Users must
> > > >>>>>>    set up a log pipeline to collect the JSON log files on drivers and
> > > >>>>>>    executors first. Turning it on by default doesn't provide much
> > > >>>>>>    value.
> > > >>>>>>
> > > >>>>>> Some examples of the hard-to-read JSON log:
> > > >>>>>> [image: image.png]
> > > >>>>>> [image: image.png]
> > > >>>>>>
> > > >>>>>> For the good of Spark engine developers and light Spark users, I
> > > >>>>>> think the previous plain text log is a better choice. We can add a
> > > >>>>>> doc page to introduce how to use Structured Logging: turn on the
> > > >>>>>> config, collect JSON log files, and run queries.
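> > > >>>>>>
> > > >>>>>> A rough sketch of what that doc page could walk through (the conf
> > > >>>>>> name and import path below are assumptions about how the structured
> > > >>>>>> logging pieces end up being exposed, not settled docs):
> > > >>>>>>
> > > >>>>>> // 1. Opt in explicitly once plain text is the default, e.g.:
> > > >>>>>> //      spark-submit --conf spark.log.structuredLogging.enabled=true ...
> > > >>>>>> // 2. Collect the JSON log files from drivers and executors into one place.
> > > >>>>>> // 3. Query them from spark-shell; the escaped `\n` comes back as real
> > > >>>>>> //    newlines after parsing, so plans and stack traces print readably.
> > > >>>>>> import org.apache.spark.util.LogUtils.SPARK_LOG_SCHEMA
> > > >>>>>> val logs = spark.read.schema(SPARK_LOG_SCHEMA).json("/path/to/collected/logs")
> > > >>>>>> logs.where("level = 'ERROR'").select("msg")
> > > >>>>>>   .collect().foreach(r => println(r.getString(0)))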
> > > >>>>>>
> > > >>>>>> Please let me know if you share the same feelings or have different
> > > >>>>>> opinions.
> > > >>>>>>
> > > >>>>>> Thanks,
> > > >>>>>> Wenchen
> > > >>>>>>
> > > >>>>>
> > > >>>>>
