I'm generally supportive of this direction. However, I wonder if we can
be more deliberate about when to use it. For the common scenarios you
describe as "light" usage, for example, we should switch back to
plain-text logging (see the config sketch just below).
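For concreteness, here is a minimal sketch of what the plain-text default
could look like. I'm assuming the Spark 4.0 flag name
spark.log.structuredLogging.enabled and the stock PatternLayout from
conf/log4j2.properties.template; treat the exact names as illustrative:

    # conf/spark-defaults.conf -- switch structured (JSON) logging off
    spark.log.structuredLogging.enabled  false

    # conf/log4j2.properties -- plain-text console output, roughly what
    # the bundled log4j2.properties.template produces
    rootLogger.level = info
    rootLogger.appenderRef.stdout.ref = console
    appender.console.type = Console
    appender.console.name = console
    appender.console.target = SYSTEM_ERR
    appender.console.layout.type = PatternLayout
    appender.console.layout.pattern = %d{yy/MM/dd HH:mm:ss} %p %c{1}: %m%n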
IMO, this would cover the cases where a user simply runs the pyspark or
spark-shell scripts. For those interactive use cases, most users will
probably prefer plain-text logging. Maybe we should even go one step
further and ship default console filters that use colored output for
these interactive sessions, to make the output more readable in general?
(A rough log4j2 sketch follows the quoted message below.) For regular
spark-submit-based job submissions, on the other hand, I would say the
benefits of structured logging outweigh the potential complexity. WDYT?

On Fri, Nov 22, 2024 at 3:26 PM Wenchen Fan <cloud0...@gmail.com> wrote:

> Hi all,
>
> I'm writing this email to propose switching back to the previous plain
> text logs by default, for the following reasons:
>
>    - The JSON log is not very human-readable. It is more verbose than
>    plain text, and newlines become `\n`, which makes query plan tree
>    strings and error stack traces very hard to read.
>    - Structured Logging is not available out of the box. Users must
>    first set up a log pipeline to collect the JSON log files from
>    drivers and executors. Turning it on by default doesn't provide
>    much value.
>
> Some examples of the hard-to-read JSON log:
> [image: image.png]
> [image: image.png]
>
> For the good of Spark engine developers and light Spark users, I think
> the previous plain text log is a better choice. We can add a doc page
> introducing how to use Structured Logging: turn on the config, collect
> the JSON log files, and run queries over them.
>
> Please let me know if you share the same feelings or have different
> opinions.
>
> Thanks,
> Wenchen
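P.S. Here is the rough log4j2 sketch I mentioned above for a colored
interactive console. %highlight is a built-in log4j2 pattern converter;
the specific style map, and wiring it up as the shell default, are just
illustrative:

    # conf/log4j2.properties -- hypothetical colored layout for
    # pyspark / spark-shell sessions
    appender.console.layout.type = PatternLayout
    appender.console.layout.pattern = %highlight{%d{HH:mm:ss} %p %c{1}: %m%n}{FATAL=red blink, ERROR=red, WARN=yellow, INFO=green, DEBUG=cyan, TRACE=blue}

log4j2 also has a noConsoleNoAnsi option, so redirected output could
stay free of ANSI escape codes.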