+1 to defaulting to text logs!

Regards,
Mridul
On Fri, Nov 22, 2024 at 6:21 PM Gengliang Wang <ltn...@gmail.com> wrote:

> Hi all,
>
> Earlier this year, we introduced JSON logging as the default in Spark
> with the aim of enhancing log structure and facilitating better analysis.
> While this change was made with the best intentions, we've collectively
> observed some practical challenges that impact usability.
>
> *Key Observations:*
>
>    1. *Human Readability*
>       - *Cumbersome Formatting*: The JSON format, with its quotes and
>       braces, has proven less readable for direct log inspection.
>       - *Limitations of Pretty-Printing*: As noted in the Log4j
>       documentation
>       <https://logging.apache.org/log4j/2.x/manual/json-template-layout.html>,
>       pretty-printing JSON logs isn't feasible due to performance concerns.
>       - *Difficult Interpretation*: Elements like logical plans and stack
>       traces are rendered as single-line strings with embedded newline (\n)
>       characters, making quick interpretation challenging.
>       An example of a side-by-side plan comparison after setting
>       spark.sql.planChangeLog.level=info:
>       [image: image.png]
>    2. *Lack of Log Centralization Tools*
>       - Although we can programmatically analyze logs using
>       spark.read.schema(SPARK_LOG_SCHEMA).json("path/to/logs"), there is
>       currently a lack of open-source tools to easily centralize and
>       manage these logs across Drivers, Executors, Masters, and Workers.
>       This limits the practical benefits we hoped to achieve with JSON
>       logging.
>    3. *Consistency and Timing*
>       - Since Spark 4.0 has yet to be released, we have an opportunity
>       to maintain consistency with previous versions by reverting to
>       plain text logs as the default. This doesn't close the door on
>       structured logging; we can revisit this decision in future releases
>       as the ecosystem matures and more supportive tools become available.
>
> Given these considerations, I support Wenchen's proposal to switch back
> to plain text logs by default in Spark 4.0. Our goal is to provide the
> best possible experience for our users, and adjusting our approach based
> on real-world feedback is a part of that process.
>
> I'm looking forward to hearing your thoughts and discussing how we can
> continue to improve our logging practices.
>
> Best regards,
>
> Gengliang Wang
>
> On Fri, Nov 22, 2024 at 3:32 PM bo yang <bobyan...@gmail.com> wrote:
>
>> +1 for defaulting to plain text logging. It is good for the simple usage
>> scenario and will also be more friendly to first-time Spark users.
>>
>> And different companies may already have built some tooling to process
>> Spark logs. Using plain text by default will let those existing tools
>> continue to work.
>>
>> On Friday, November 22, 2024, serge rielau.com <se...@rielau.com> wrote:
>>
>>> It doesn't have to be very easy. It just has to be easier than
>>> maintaining two infrastructures forever.
>>> If we can't easily parse the JSON log to emit the existing text
>>> content, I'd say we have a bigger problem.
>>>
>>> On Nov 22, 2024 at 2:17 PM -0800, Jungtaek Lim <
>>> kabhwan.opensou...@gmail.com>, wrote:
>>>
>>> I'm not sure it is very easy to provide a reader (I meant, viewer); it
>>> would mostly be not a reader but a post-processor that converts the
>>> JSON-formatted log to a plain text log. Only after that would users get
>>> the "same" UI/UX they had when dealing with log files in Spark 3.x.
>>> For people who do not really need to structure the log and just want
>>> to go with their own way of reading it (I'm a lover of grep), a
>>> JSON-formatted log by default is a regression in UI/UX.
>>>
>>> The JSON-formatted log is definitely useful, but also definitely not
>>> human friendly. It is mostly only useful for users who have constructed
>>> an ecosystem around Spark that never requires humans to read the log as
>>> JSON. I'm not quite sure whether we can/want to force users to build
>>> such an ecosystem to use Spark; for me, it's a lot easier for users to
>>> have both options and turn on the config when they need it.
>>>
>>> +1 on Wenchen's proposal.
>>>
>>> On Sat, Nov 23, 2024 at 12:36 AM serge rielau.com <se...@rielau.com>
>>> wrote:
>>>
>>>> Shouldn't we differentiate between the logging and the reading of the
>>>> log?
>>>> The problem appears to be in the presentation layer.
>>>> We could provide a basic log reader, instead of supporting two
>>>> different ways to log long-term.
>>>>
>>>> On Nov 22, 2024, at 6:37 AM, Martin Grund <mar...@databricks.com.INVALID>
>>>> wrote:
>>>>
>>>> I'm generally supportive of this direction. However, I'm wondering if
>>>> we can be more deliberate about when to use it. For example, for the
>>>> common scenarios that you mention as "light" usage, we should switch
>>>> to plain text logging.
>>>>
>>>> IMO, this would cover the cases where a user simply runs the pyspark
>>>> or spark-shell scripts. For these use cases, most users will probably
>>>> prefer plain text logging. Maybe we should even go one step further
>>>> and have some default console filters that use color output for these
>>>> interactive use cases? And make it more readable in general?
>>>>
>>>> For the regular spark-submit-based job submissions, I would actually
>>>> say that the benefits outweigh the potential complexity.
>>>>
>>>> WDYT?
>>>>
>>>> On Fri, Nov 22, 2024 at 3:26 PM Wenchen Fan <cloud0...@gmail.com>
>>>> wrote:
>>>>
>>>>> Hi all,
>>>>>
>>>>> I'm writing this email to propose switching back to the previous
>>>>> plain text logs by default, for the following reasons:
>>>>>
>>>>>    - The JSON log is not very human-readable. It's more verbose than
>>>>>    plain text, and new lines become `\n`, making query plan tree
>>>>>    strings and error stack traces very hard to read.
>>>>>    - Structured Logging is not available out of the box. Users must
>>>>>    set up a log pipeline to collect the JSON log files on drivers and
>>>>>    executors first. Turning it on by default doesn't provide much
>>>>>    value.
>>>>>
>>>>> Some examples of the hard-to-read JSON log:
>>>>> [image: image.png]
>>>>> [image: image.png]
>>>>>
>>>>> For the good of Spark engine developers and light Spark users, I
>>>>> think the previous plain text log is a better choice. We can add a
>>>>> doc page to introduce how to use Structured Logging: turn on the
>>>>> config, collect the JSON log files, and run queries.
>>>>>
>>>>> Please let me know if you share the same feelings or have different
>>>>> opinions.
>>>>>
>>>>> Thanks,
>>>>> Wenchen
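
For readers following up on Gengliang's point about programmatic analysis
(and Wenchen's "turn on the config, collect the JSON log files, and run
queries"), here is a minimal sketch of what that querying looks like in
spark-shell. It assumes the Scala import path
org.apache.spark.util.LogUtils.SPARK_LOG_SCHEMA and the field names (level,
logger) shown in the structured-logging examples from the Spark 4.0
previews; the log path is a placeholder, and structured logging must be
enabled (spark.log.structuredLogging.enabled) for the files to be JSON in
the first place.

```scala
// Minimal sketch, not a drop-in recipe: querying Spark's JSON logs with the
// DataFrame API, as mentioned in the thread. Field names follow the
// structured-logging examples and should be verified with logs.printSchema().
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.col
import org.apache.spark.util.LogUtils.SPARK_LOG_SCHEMA

val spark = SparkSession.builder().appName("log-analysis").getOrCreate()

// Each line of a structured log file is a single JSON record, so the plain
// JSON reader plus the published schema is enough.
val logs = spark.read.schema(SPARK_LOG_SCHEMA).json("path/to/logs")

// Example query: count warnings and errors per logger.
logs.filter(col("level").isin("WARN", "ERROR"))
  .groupBy(col("logger"), col("level"))
  .count()
  .orderBy(col("count").desc)
  .show(truncate = false)
```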
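
On serge's and Jungtaek's exchange about a basic reader/post-processor: once
the logs are loaded with the schema as above, converting them back to
grep-friendly text is largely a projection. The sketch below is illustrative
only; it reuses the assumed field names (ts, level, logger, msg), drops
exception stack traces and MDC context (which is exactly where a real viewer
would need more care), and uses placeholder paths.

```scala
// Rough sketch of the "post-processor" idea: flatten JSON log records back
// into classic-looking plain text lines. Reuses the spark session and the
// assumed schema/field names from the previous snippet.
import org.apache.spark.sql.functions.{col, concat_ws}
import org.apache.spark.util.LogUtils.SPARK_LOG_SCHEMA

val jsonLogs = spark.read.schema(SPARK_LOG_SCHEMA).json("path/to/logs")

// Cast the timestamp explicitly and join the fields into one line per record.
val asText = jsonLogs.select(
  concat_ws(" ",
    col("ts").cast("string"),
    col("level"),
    col("logger"),
    col("msg")
  ).as("line")
)

// Write plain text files alongside the originals.
asText.write.mode("overwrite").text("path/to/logs-as-text")
```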