+1. Text logs are much more human-readable.

On Wed, Dec 11, 2024 at 12:12 PM Igor Dvorzhak <i...@google.com.invalid> wrote:
> +1
>
> On Tue, Dec 10, 2024 at 7:48 PM Yang Jie <yangji...@apache.org> wrote:
>
>> +1
>>
>> On 2024/12/11 02:34:02 Kent Yao wrote:
>> > +1
>> >
>> > On 2024/11/23 02:50:36 Wenchen Fan wrote:
>> > > Hi Martin,
>> > >
>> > > Yea, we should be more deliberate about when to use Structured
>> > > Logging. Let me start with when people prefer plain text logs:
>> > > - Spark engine developers like us. When running tests, the logs are
>> > >   printed to the console, and plain text logs are more human-readable.
>> > > - Spark users who prefer to read the logs manually due to the lack of
>> > >   infra support.
>> > > - Spark users who already have decent log infra based on the plain
>> > >   text logs.
>> > >
>> > > In general, I think Structured Logging should be used when users want
>> > > to build an infra that consumes logs by machine, or when they want to
>> > > switch their existing infra to JSON logs. Both need non-trivial work,
>> > > and turning on Structured Logging by default won't provide them much
>> > > value, but it hurts UX for people who still prefer plain text logs.
>> > >
>> > > On Sat, Nov 23, 2024 at 9:09 AM Mridul Muralidharan <mri...@gmail.com>
>> > > wrote:
>> > >
>> > > > +1 to defaulting to text logs!
>> > > >
>> > > > Regards,
>> > > > Mridul
>> > > >
>> > > > On Fri, Nov 22, 2024 at 6:21 PM Gengliang Wang <ltn...@gmail.com>
>> > > > wrote:
>> > > >
>> > > >> Hi all,
>> > > >>
>> > > >> Earlier this year, we introduced JSON logging as the default in
>> > > >> Spark with the aim of enhancing log structure and facilitating
>> > > >> better analysis. While this change was made with the best
>> > > >> intentions, we've collectively observed some practical challenges
>> > > >> that impact usability.
>> > > >>
>> > > >> *Key Observations:*
>> > > >>
>> > > >> 1. *Human Readability*
>> > > >>    - *Cumbersome Formatting*: The JSON format, with its quotes and
>> > > >>      braces, has proven less readable for direct log inspection.
>> > > >>    - *Limitations of Pretty-Printing*: As noted in the Log4j
>> > > >>      documentation
>> > > >>      <https://logging.apache.org/log4j/2.x/manual/json-template-layout.html>,
>> > > >>      pretty-printing JSON logs isn't feasible due to performance
>> > > >>      concerns.
>> > > >>    - *Difficult Interpretation*: Elements like logical plans and
>> > > >>      stack traces are rendered as single-line strings with embedded
>> > > >>      newline (\n) characters, making quick interpretation
>> > > >>      challenging. An example of a side-by-side plan comparison
>> > > >>      after setting spark.sql.planChangeLog.level=info:
>> > > >>      [image: image.png]
>> > > >> 2. *Lack of Log Centralization Tools*
>> > > >>    - Although we can programmatically analyze logs using
>> > > >>      spark.read.schema(SPARK_LOG_SCHEMA).json("path/to/logs"),
>> > > >>      there is currently a lack of open-source tools to easily
>> > > >>      centralize and manage these logs across Drivers, Executors,
>> > > >>      Masters, and Workers. This limits the practical benefits we
>> > > >>      hoped to achieve with JSON logging.
>> > > >> 3. *Consistency and Timing*
>> > > >>    - Since Spark 4.0 has yet to be released, we have an opportunity
>> > > >>      to maintain consistency with previous versions by reverting to
>> > > >>      plain text logs as the default. This doesn't close the door on
>> > > >>      structured logging; we can revisit this decision in future
>> > > >>      releases as the ecosystem matures and more supportive tools
>> > > >>      become available.
>> > > >>
>> > > >> Given these considerations, I support Wenchen's proposal to switch
>> > > >> back to plain text logs by default in Spark 4.0. Our goal is to
>> > > >> provide the best possible experience for our users, and adjusting
>> > > >> our approach based on real-world feedback is a part of that
>> > > >> process.
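The programmatic analysis mentioned in point 2 relies on each record being one JSON object per line (JSON Lines). A minimal sketch of that consumption pattern, using plain Python instead of spark.read so it stays self-contained; the field names (ts, level, msg, exception) are illustrative assumptions for this sketch, not Spark's exact SPARK_LOG_SCHEMA:

```python
import io
import json

# Two illustrative JSON Lines log records. The field names here are
# assumptions for the sketch, not Spark's actual log schema.
raw = io.StringIO(
    '{"ts": "2024-11-22T14:00:00Z", "level": "INFO", "msg": "Started task 0.0"}\n'
    '{"ts": "2024-11-22T14:00:01Z", "level": "ERROR", "msg": "Task failed",'
    ' "exception": {"class": "java.io.IOException", "msg": "disk full"}}\n'
)

# One JSON object per line means filtering is a simple parse-and-select --
# this is the machine-readability argument for structured logs.
records = [json.loads(line) for line in raw]
errors = [r for r in records if r["level"] == "ERROR"]
print(len(errors), errors[0]["exception"]["class"])  # → 1 java.io.IOException
```

The same selection with Spark itself would be a filter over the DataFrame returned by spark.read.schema(SPARK_LOG_SCHEMA).json(...).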
>> > > >>
>> > > >> I'm looking forward to hearing your thoughts and discussing how we
>> > > >> can continue to improve our logging practices.
>> > > >>
>> > > >> Best regards,
>> > > >>
>> > > >> Gengliang Wang
>> > > >>
>> > > >> On Fri, Nov 22, 2024 at 3:32 PM bo yang <bobyan...@gmail.com>
>> > > >> wrote:
>> > > >>
>> > > >>> +1 for using plain text logging by default. It is good for simple
>> > > >>> usage scenarios and will also be friendlier to first-time Spark
>> > > >>> users.
>> > > >>>
>> > > >>> And different companies may already have built tooling to process
>> > > >>> Spark logs. Using plain text by default will let those existing
>> > > >>> tools continue to work.
>> > > >>>
>> > > >>> On Friday, November 22, 2024, serge rielau.com <se...@rielau.com>
>> > > >>> wrote:
>> > > >>>
>> > > >>>> It doesn't have to be very easy. It just has to be easier than
>> > > >>>> maintaining two infrastructures forever.
>> > > >>>> If we can't easily parse the JSON log to emit the existing text
>> > > >>>> content, I'd say we have a bigger problem.
>> > > >>>>
>> > > >>>> On Nov 22, 2024 at 2:17 PM -0800, Jungtaek Lim
>> > > >>>> <kabhwan.opensou...@gmail.com> wrote:
>> > > >>>>
>> > > >>>> I'm not sure it is very easy to provide a reader (I mean, a
>> > > >>>> viewer); it would mostly be not a reader but a post-processor
>> > > >>>> that converts the JSON-formatted log to a plain text log. Only
>> > > >>>> after that would users get the "same" UI/UX they had with log
>> > > >>>> files in Spark 3.x. For people who do not really need structured
>> > > >>>> logs and just want to keep their way of reading the log (I'm a
>> > > >>>> lover of grep), a JSON-formatted log by default is a UI/UX
>> > > >>>> regression.
>> > > >>>>
>> > > >>>> A JSON-formatted log is definitely useful, but also definitely
>> > > >>>> not human-friendly.
>> > > >>>> It is mostly only useful if users have constructed an ecosystem
>> > > >>>> around Spark that never requires humans to read the log as JSON.
>> > > >>>> I'm not quite sure whether we can/want to force users to build
>> > > >>>> that ecosystem to use Spark; for me, it's a lot easier for users
>> > > >>>> to have both options and turn on the config when they need it.
>> > > >>>>
>> > > >>>> +1 on Wenchen's proposal.
>> > > >>>>
>> > > >>>> On Sat, Nov 23, 2024 at 12:36 AM serge rielau.com
>> > > >>>> <se...@rielau.com> wrote:
>> > > >>>>
>> > > >>>>> Shouldn't we differentiate between the logging and the reading
>> > > >>>>> of the log?
>> > > >>>>> The problem appears to be in the presentation layer.
>> > > >>>>> We could provide a basic log reader, instead of supporting two
>> > > >>>>> different ways to log long term.
>> > > >>>>>
>> > > >>>>> On Nov 22, 2024, at 6:37 AM, Martin Grund
>> > > >>>>> <mar...@databricks.com.INVALID> wrote:
>> > > >>>>>
>> > > >>>>> I'm generally supportive of this direction. However, I'm
>> > > >>>>> wondering if we can be more deliberate about when to use it.
>> > > >>>>> For example, for the common scenarios that you mention as
>> > > >>>>> "light" usage, we should switch to plain text logging.
>> > > >>>>>
>> > > >>>>> IMO, this would cover the cases where a user simply runs the
>> > > >>>>> pyspark or spark-shell scripts. For these use cases, most users
>> > > >>>>> will probably prefer plain text logging. Maybe we should even
>> > > >>>>> go one step further and have some default console filters that
>> > > >>>>> use color output for these interactive use cases? And make it
>> > > >>>>> more readable in general?
>> > > >>>>>
>> > > >>>>> For regular spark-submit-based job submissions, I would
>> > > >>>>> actually say that the benefits outweigh the potential
>> > > >>>>> complexity.
>> > > >>>>>
>> > > >>>>> WDYT?
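Jungtaek's grep point, and the `\n` complaint raised earlier in the thread, can be seen in miniature: serializing a multi-line message into a JSON record collapses it onto one line with escaped newlines. A small illustrative Python sketch (the record fields are made up for the example, not Spark's actual layout):

```python
import json

# A multi-line logical-plan string, as a plain text log would print it.
plan = "Project [id]\n+- Filter (id > 1)\n   +- Relation [id]"
print(plan)  # three readable, grep-able lines

# The same message inside a JSON log record: one line, newlines escaped.
record = json.dumps({"level": "INFO", "msg": plan})
print(record)

# The structure survives for machines, but humans now see "\n" literals.
assert "\\n" in record
assert json.loads(record)["msg"] == plan
```

This is also why pretty-printing does not fully help: even an indented record still carries the plan as a single escaped string value.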
>> > > >>>>>
>> > > >>>>> On Fri, Nov 22, 2024 at 3:26 PM Wenchen Fan
>> > > >>>>> <cloud0...@gmail.com> wrote:
>> > > >>>>>
>> > > >>>>>> Hi all,
>> > > >>>>>>
>> > > >>>>>> I'm writing this email to propose switching back to the
>> > > >>>>>> previous plain text logs by default, for the following
>> > > >>>>>> reasons:
>> > > >>>>>>
>> > > >>>>>> - The JSON log is not very human-readable. It's more verbose
>> > > >>>>>>   than plain text, and new lines become `\n`, making query
>> > > >>>>>>   plan tree strings and error stack traces very hard to read.
>> > > >>>>>> - Structured Logging is not available out of the box. Users
>> > > >>>>>>   must first set up a log pipeline to collect the JSON log
>> > > >>>>>>   files on drivers and executors. Turning it on by default
>> > > >>>>>>   doesn't provide much value.
>> > > >>>>>>
>> > > >>>>>> Some examples of the hard-to-read JSON log:
>> > > >>>>>> [image: image.png]
>> > > >>>>>> [image: image.png]
>> > > >>>>>>
>> > > >>>>>> For the good of Spark engine developers and light Spark
>> > > >>>>>> users, I think the previous plain text log is a better
>> > > >>>>>> choice. We can add a doc page to introduce how to use
>> > > >>>>>> Structured Logging: turn on the config, collect JSON log
>> > > >>>>>> files, and run queries.
>> > > >>>>>>
>> > > >>>>>> Please let me know if you share the same feelings or have
>> > > >>>>>> different opinions.
>> > > >>>>>>
>> > > >>>>>> Thanks,
>> > > >>>>>> Wenchen

---------------------------------------------------------------------
To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
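For reference, the opt-in Wenchen describes ("turn on the config") would look roughly like the following. This is a sketch only: the property name spark.log.structuredLogging.enabled is assumed from the Spark 4.0 structured-logging work and should be verified against the released documentation, and the application class and jar are placeholders.

```shell
# Sketch: opting back in to JSON (structured) logs for one application.
# spark.log.structuredLogging.enabled is an assumed config key -- verify
# the exact name against the Spark 4.0 docs before relying on it.
# com.example.MyApp and my-app.jar are placeholders, not real artifacts.
spark-submit \
  --conf spark.log.structuredLogging.enabled=true \
  --class com.example.MyApp \
  my-app.jar
```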