It doesn’t have to be very easy. It just has to be easier than maintaining two 
infrastructures forever.
If we can’t easily parse the JSON log to emit the existing text content, I’d 
say we have a bigger problem.

On Nov 22, 2024 at 2:17 PM -0800, Jungtaek Lim <kabhwan.opensou...@gmail.com>, 
wrote:
I'm not sure it is very easy to provide a reader (I meant, viewer); it would 
mostly be not a reader but a post-processor that converts the JSON formatted 
log to a plain text log. Only after that would users get the "same" UI/UX they 
had with log files in Spark 3.x. For people who do not really need structured 
logs and just want to read the log their own way (I'm a lover of grep), JSON 
formatted log by default is a UI/UX regression.
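The "post-processor" idea above could be sketched roughly like this. To be clear, this is a hypothetical sketch, not anything shipped with Spark: the field names (ts, level, logger, msg, exception) are assumptions about what a structured log record might contain, not a confirmed schema.

```python
import json

def json_log_to_text(lines):
    """Convert JSON-formatted log lines to a Spark 3.x-style plain text layout.

    Assumes each line is one JSON record with ts/level/logger/msg fields
    (an assumption about the schema, not a documented contract).
    """
    out = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            rec = json.loads(line)
        except json.JSONDecodeError:
            # Pass through anything that isn't JSON unchanged.
            out.append(line)
            continue
        text = "{} {} {}: {}".format(
            rec.get("ts", ""), rec.get("level", ""),
            rec.get("logger", ""), rec.get("msg", ""))
        exc = rec.get("exception")
        if exc:
            # Emit the stacktrace on its own lines, as in the old plain text log.
            text += "\n" + str(exc)
        out.append(text)
    return "\n".join(out)
```

With something like this, users who prefer grep could pipe collected JSON logs through it and keep their existing workflow.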

JSON formatted log is definitely useful, but also definitely not human 
friendly. It is mostly only useful if users have built an ecosystem around 
Spark that never requires humans to read the log as JSON. I'm not quite sure 
whether we can/want to force users to build such an ecosystem to use Spark; 
for me, it's a lot easier for users to have both options and turn on the 
config when they need it.

+1 on Wenchen's proposal.

On Sat, Nov 23, 2024 at 12:36 AM serge rielau.com <se...@rielau.com> wrote:
Shouldn’t we differentiate between the logging and the reading of the log?
The problem appears to be in the presentation layer.
We could provide a basic log reader, instead of supporting two different ways 
to log long-term.


On Nov 22, 2024, at 6:37 AM, Martin Grund <mar...@databricks.com.INVALID> wrote:

I'm generally supportive of this direction. However, I'm wondering if we can be 
more deliberate about when to use it. For example, for the common scenarios 
that you mention as "light" usage, we should switch to plain text logging.

IMO, this would cover the cases where a user simply runs the pyspark or 
spark-shell scripts. For these use cases, most users will probably prefer plain 
text logging. Maybe we should even go one step further and add default console 
filters that use color output for these interactive use cases, and make the 
output more readable in general?

For the regular spark-submit-based job submissions, I would actually say that 
the benefits outweigh the potential complexity.

WDYT?

On Fri, Nov 22, 2024 at 3:26 PM Wenchen Fan <cloud0...@gmail.com> wrote:
Hi all,

I'm writing this email to propose switching back to the previous plain text 
logs by default, for the following reasons:

  *   The JSON log is not very human-readable. It's more verbose than plain 
text, and new lines become `\n`, making query plan tree strings and error 
stacktraces very hard to read.
  *   Structured Logging is not available out of the box. Users must set up a 
log pipeline to collect the JSON log files on drivers and executors first. 
Turning it on by default doesn't provide much value.

Some examples of the hard-to-read JSON log:
[image.png]
[image.png]

For the good of Spark engine developers and light Spark users, I think the 
previous plain text log is a better choice. We can add a doc page to introduce 
how to use Structured Logging: turn on the config, collect JSON log files, and 
run queries.
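The doc-page workflow described above (turn on the config, collect the JSON log files, run queries) might look roughly like the sketch below. The config name is my understanding of the Spark 4.x structured-logging switch and should be verified against the release docs; the query step is shown with the Python stdlib so it runs without a Spark session, but with Spark you would point spark.read.json at the collected files instead.

```python
# Step 1 (at submit time; shown here only as a comment, not runnable code):
#   spark-submit --conf spark.log.structuredLogging.enabled=true ...
#
# Step 2: collect the driver/executor JSON log files into one place.
#
# Step 3: run queries over the collected records. Sketched with the stdlib;
# the "level" field name is an assumption about the structured log schema.
import json

def count_by_level(log_lines):
    """Count log records per level in a collected JSON log."""
    counts = {}
    for line in log_lines:
        line = line.strip()
        if not line:
            continue
        rec = json.loads(line)
        level = rec.get("level", "UNKNOWN")
        counts[level] = counts.get(level, 0) + 1
    return counts
```

A page like this would let users who opt in get value from the JSON format immediately, while everyone else keeps the plain text default.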

Please let me know if you share the same feelings or have different opinions.

Thanks,
Wenchen
