[ https://issues.apache.org/jira/browse/SPARK-47240 ]
Dongjoon Hyun updated SPARK-47240:
----------------------------------
    Labels: pull-request-available releasenotes  (was: pull-request-available)

> SPIP: Structured Logging Framework for Apache Spark
> ---------------------------------------------------
>
>                 Key: SPARK-47240
>                 URL: https://issues.apache.org/jira/browse/SPARK-47240
>             Project: Spark
>          Issue Type: New Feature
>          Components: Spark Core
>    Affects Versions: 4.0.0
>            Reporter: Gengliang Wang
>            Assignee: Apache Spark
>            Priority: Major
>              Labels: pull-request-available, releasenotes
>
> This proposal aims to enhance Apache Spark's logging system by implementing structured logging. The transition changes the default log file format from plain text to JSON, making the logs easier to consume and analyze. The new logs will include crucial identifiers such as worker, executor, query, job, stage, and task IDs, making them more informative and facilitating easier search and analysis.
>
> h2. Current Logging Format
>
> The current Spark log format is plain text, which can be challenging to parse and analyze efficiently. An example of the current log format:
> {code:java}
> 23/11/29 17:53:44 ERROR BlockManagerMasterEndpoint: Fail to know the executor 289 is alive or not.
> org.apache.spark.SparkException: Exception thrown in awaitResult:
> <stacktrace…>
> Caused by: org.apache.spark.rpc.RpcEndpointNotFoundException: ..
> {code}
>
> h2. Proposed Structured Logging Format
>
> The proposed change structures the logs as JSON, organizing the log information into easily identifiable fields. The new structured log format would look like this:
> {code:java}
> {
>   "ts": "23/11/29 17:53:44",
>   "level": "ERROR",
>   "msg": "Fail to know the executor 289 is alive or not",
>   "context": {
>     "executor_id": "289"
>   },
>   "exception": {
>     "class": "org.apache.spark.SparkException",
>     "msg": "Exception thrown in awaitResult",
>     "stackTrace": "..."
>   },
>   "source": "BlockManagerMasterEndpoint"
> }
> {code}
> This format will let users upload driver/executor/master/worker log files and query them directly with Spark SQL for more effective problem-solving and analysis, such as tracking executor losses or identifying faulty tasks:
> {code:java}
> spark.read.json("hdfs://hdfs_host/logs").createOrReplaceTempView("logs")
>
> /* To get all the executor-lost logs */
> SELECT * FROM logs WHERE contains(msg, 'Lost executor');
>
> /* To get all the distributed logs about executor 289 */
> SELECT * FROM logs WHERE context.executor_id = '289';
>
> /* To get all the errors on host 100.116.29.4 */
> SELECT * FROM logs WHERE context.host = '100.116.29.4' AND level = 'ERROR';
> {code}
>
> SPIP doc: [https://docs.google.com/document/d/1rATVGmFLNVLmtxSpWrEceYm7d-ocgu8ofhryVs4g3XU/edit?usp=sharing]

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
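To illustrate why the structured format is easier to analyze than plain text, here is a minimal sketch in plain Python (no Spark required) that parses records in the proposed JSON format and applies the same kind of filter as the Spark SQL queries in the issue. The sample records are hypothetical; only the field names (`ts`, `level`, `msg`, `context`, `source`) come from the example above.

```python
import json

# Two hypothetical log lines in the proposed structured (JSON-per-line) format.
log_lines = [
    '{"ts":"23/11/29 17:53:44","level":"ERROR",'
    '"msg":"Fail to know the executor 289 is alive or not",'
    '"context":{"executor_id":"289"},'
    '"source":"BlockManagerMasterEndpoint"}',
    '{"ts":"23/11/29 17:53:45","level":"INFO",'
    '"msg":"Registered executor 290",'
    '"context":{"executor_id":"290"},'
    '"source":"BlockManagerMasterEndpoint"}',
]

# Each line parses into a record with named fields -- no regexes needed,
# unlike the plain-text format.
records = [json.loads(line) for line in log_lines]

# Filter the records the same way the Spark SQL queries do,
# e.g. "all ERROR logs about executor 289".
errors_for_289 = [
    r for r in records
    if r["level"] == "ERROR" and r["context"].get("executor_id") == "289"
]
print([r["msg"] for r in errors_for_289])
```

With plain-text logs, the same filter would require a hand-written pattern per message shape; with the structured format it is a lookup on stable field names.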