prashantwason commented on issue #1289: [HUDI-92] Provide reasonable names for 
Spark DAG stages in Hudi.
URL: https://github.com/apache/incubator-hudi/pull/1289#issuecomment-579944539
 
 
   A DAG stage name and description can be set using the 
JavaSparkContext.setJobDescription(...) method. The same name/description 
applies to every stage launched from the same thread until it is either 
updated (another call to setJobDescription) or deleted (clearJobGroup).
   
   In this PR, I am using the ClassName as the stage name and a textual 
description derived from the method logic. HUDI classes have very descriptive 
names so this works well.
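   
   A minimal sketch of the manual pattern, kept free of any Spark dependency 
so it runs on its own. `StageLabels` is a hypothetical stand-in for the 
thread-scoped name/description that Spark keeps, and `ExampleIndexLookup` and 
its description strings are invented for illustration; none of them are taken 
from the PR.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for Spark's thread-scoped job name/description.
final class StageLabels {
    private static final ThreadLocal<String[]> CURRENT = new ThreadLocal<>();

    static void set(String name, String description) {
        CURRENT.set(new String[] {name, description});
    }

    static void clear() {
        CURRENT.remove();
    }

    static String label() {
        String[] current = CURRENT.get();
        return current == null ? "(unnamed)" : current[0] + ": " + current[1];
    }
}

// Hypothetical Hudi-like class; the class name doubles as the stage name.
final class ExampleIndexLookup {
    final List<String> stagesSeen = new ArrayList<>();

    void tagRecords() {
        StageLabels.set(getClass().getSimpleName(), "Looking up index entries");
        runStage();
        runStage(); // same thread, no update: the label is reused

        StageLabels.set(getClass().getSimpleName(), "Tagging records with locations");
        runStage();

        StageLabels.clear(); // later stages are no longer labelled
        runStage();
    }

    // Stands in for a Spark action; records the label a stage would show.
    private void runStage() {
        stagesSeen.add(StageLabels.label());
    }
}
```

   With a real JavaSparkContext, the `StageLabels.set(...)` lines would become 
calls on the context itself. One option, assuming a context variable `jsc`, is 
`jsc.setJobGroup(getClass().getSimpleName(), "...")`, which accepts a group id 
and a description in one call.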
   
   There are two ways this may be done:
   1. Manually (this PR) by adding code to set the name/description before any 
DAG stages are started. 
   2. Using Java AOP to automatically find code locations matching some pattern 
and augment them with the call to setJobDescription. 
   
   To use the AOP approach, we can create a separate AspectJ file containing 
the Pointcuts (code locations to augment) and Advices (code to insert). A 
separate AspectJ compiler, run as part of the build, then modifies the class 
bytecode to add the Advices. 
   
   Pros of AOP approach:
   1. Does not require any change in current code
   2. Also covers future code automatically
   3. Easy to undo (just don't run the AspectJ compiler as part of build)
   4. Can be extended to more use-cases like automating Metrics.
   
   Cons of AOP approach:
   1. Requires AspectJ and its compiler to be integrated with the HUDI build 
chain
   2. The Advice cannot be dynamic. Hence we cannot provide descriptions to the 
DAG stages (we can still use the class name as the DAG stage name). 
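   
   The second con can be illustrated without AspectJ. The sketch below swaps 
in a plain JDK dynamic proxy as the interception mechanism, so it is not what 
an AspectJ aspect would look like. `WriteAction`, `ExampleUpsertAction` and 
`NamingAdvice` are hypothetical names. The point is that generic advice only 
sees the intercepted class and method, so it can derive a name but has no 
access to a description written for that specific call site.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Proxy;
import java.util.ArrayList;
import java.util.List;

// Hypothetical interface for a method that would start DAG stages.
interface WriteAction {
    int execute(int records);
}

// Hypothetical implementation; returns the number of records it was given.
final class ExampleUpsertAction implements WriteAction {
    @Override
    public int execute(int records) {
        return records;
    }
}

// Generic "advice": the same code runs before every intercepted call.
final class NamingAdvice {
    static final List<String> NAMES = new ArrayList<>();

    static WriteAction wrap(WriteAction target) {
        InvocationHandler handler = (proxy, method, args) -> {
            // Only the class and method names are known here.
            NAMES.add(target.getClass().getSimpleName() + "." + method.getName());
            return method.invoke(target, args);
        };
        return (WriteAction) Proxy.newProxyInstance(
                WriteAction.class.getClassLoader(),
                new Class<?>[] {WriteAction.class},
                handler);
    }
}
```

   Calling `NamingAdvice.wrap(new ExampleUpsertAction()).execute(5)` records 
the name `ExampleUpsertAction.execute` and nothing more descriptive than that.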
   
   Since the code has a manageable number of places where a DAG is created, I 
prefer the simpler manual approach. It also ends up documenting the code.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


With regards,
Apache Git Services
