Xuefu Zhang created HIVE-8840:
---------------------------------

             Summary: Print prettier Spark work graph after HIVE-8793 [Spark Branch]
                 Key: HIVE-8840
                 URL: https://issues.apache.org/jira/browse/HIVE-8840
             Project: Hive
          Issue Type: Improvement
          Components: Spark
            Reporter: Xuefu Zhang

Because of HIVE-8793, the work graph for Spark may be modified by SplitSparkWorkResolver.

Original graph:
{code}
Spark Edges:
Reducer 2 <- Map 1 (SORT, 1)
Reducer 3 <- Reducer 2 (GROUP, 1)
Reducer 4 <- Reducer 2 (GROUP, 1)
{code}

New graph:
{code}
Spark Edges:
Reducer 3 <- Reducer 5 (GROUP, 1)
Reducer 4 <- Reducer 6 (GROUP, 1)
Reducer 5 <- Map 1 (SORT, 1)
Reducer 6 <- Map 1 (SORT, 1)
{code}
where Reducer 2 was split into Reducer 5 and Reducer 6.

Two types of ordering can be considered for printing the edges:

1. Topological order
{code}
Spark Edges:
Reducer 5 <- Map 1 (SORT, 1)
Reducer 6 <- Map 1 (SORT, 1)
Reducer 3 <- Reducer 5 (GROUP, 1)
Reducer 4 <- Reducer 6 (GROUP, 1)
{code}

2. DFS
{code}
Spark Edges:
Reducer 5 <- Map 1 (SORT, 1)
Reducer 3 <- Reducer 5 (GROUP, 1)
Reducer 6 <- Map 1 (SORT, 1)
Reducer 4 <- Reducer 6 (GROUP, 1)
{code}

Both seem better than the current output, though topological order seems more suitable for a graph. Please feel free to create a patch on trunk if needed.
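
For illustration only, the two orderings can be sketched in a few lines of Python. This is not Hive code: the edge tuples, the function names (build, work_key, format_line, topological_lines, dfs_lines) and the tie-breaking rule (by work name and number) are all assumptions made for the sketch. The topological variant is a queue-based (Kahn-style) traversal from the root works; the DFS variant is a pre-order walk from the same roots.

```python
from collections import defaultdict, deque

# Edges of the "New graph" example as (child, parent, edge type, parallelism).
# This tuple layout is an assumption for the sketch, not Hive's SparkWork model.
EDGES = [
    ("Reducer 3", "Reducer 5", "GROUP", 1),
    ("Reducer 4", "Reducer 6", "GROUP", 1),
    ("Reducer 5", "Map 1", "SORT", 1),
    ("Reducer 6", "Map 1", "SORT", 1),
]


def work_key(name):
    # Sort "Reducer 10" after "Reducer 2" by comparing the numeric suffix.
    prefix, number = name.rsplit(" ", 1)
    return prefix, int(number)


def build(edges):
    # Index the edge list by child and by parent, and find the root works.
    parents = defaultdict(list)
    children = defaultdict(list)
    nodes = set()
    for child, parent, kind, num in edges:
        parents[child].append((parent, kind, num))
        children[parent].append(child)
        nodes.update((child, parent))
    roots = sorted((n for n in nodes if n not in parents), key=work_key)
    return parents, children, roots


def format_line(node, parents):
    # One output line per work that has at least one incoming edge.
    sources = ", ".join(f"{p} ({kind}, {num})" for p, kind, num in parents[node])
    return f"{node} <- {sources}"


def topological_lines(edges):
    # Kahn-style traversal: a work is emitted only after all of its parents.
    parents, children, roots = build(edges)
    remaining = {node: len(ps) for node, ps in parents.items()}
    queue = deque(roots)
    lines = []
    while queue:
        node = queue.popleft()
        if node in parents:
            lines.append(format_line(node, parents))
        for child in sorted(children.get(node, []), key=work_key):
            remaining[child] -= 1
            if remaining[child] == 0:
                queue.append(child)
    return lines


def dfs_lines(edges):
    # Pre-order depth-first walk from the roots; each work is emitted once.
    parents, children, roots = build(edges)
    seen = set()
    lines = []

    def visit(node):
        if node in seen:
            return
        seen.add(node)
        if node in parents:
            lines.append(format_line(node, parents))
        for child in sorted(children.get(node, []), key=work_key):
            visit(child)

    for root in roots:
        visit(root)
    return lines


if __name__ == "__main__":
    print("Spark Edges (topological):")
    print("\n".join(topological_lines(EDGES)))
    print("Spark Edges (DFS):")
    print("\n".join(dfs_lines(EDGES)))
```

On the example edges, the two functions produce the orderings listed under options 1 and 2 above. One difference worth noting: in the DFS sketch a work with several parents is printed when it is first reached, which may be before some of its parents have been printed, whereas the topological sketch always prints a work after all of its parents.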