[GitHub] [hudi] robertrichter opened a new issue, #6397: [SUPPORT]

GitBox Mon, 15 Aug 2022 03:30:58 -0700


robertrichter opened a new issue, #6397:
URL: https://github.com/apache/hudi/issues/6397


   **To Reproduce**
   
   Steps to reproduce the behavior:
   
   1. After the Hudi write process has finished, examine the SQL tab in the Spark History Server web UI.
   
   **Expected behavior**
   
   To analyse performance issues in complex Spark jobs (multiple joins, 
aggregates, intermediate DataFrames, etc.), the Spark History Server provides 
very useful information. In particular, the SQL tab gives a good overview of a 
complex transformation, with useful metrics such as the number of partitions 
and input/output records. This information is displayed for every target file 
format (Parquet, ORC, etc.) except Hudi: for a Hudi write, the SQL tab only 
shows the physical plan in text form. It would be great to see the graphical 
DAG with its metrics for Hudi writes as well.
(https://spark.apache.org/docs/3.1.1/web-ui.html#sql-tab)
   
   **Environment Description**
   
   * Hudi version : 0.10.0
   
   * Spark version : 3.1.1
   
   * Hive version : 3.1.3
   
   * Hadoop version : 3.1.1
   
   * Storage (HDFS/S3/GCS..) : HDFS
   
   * Running on Docker? (yes/no) : no (YARN on Cloudera CDP 7.1.7)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]