[ https://issues.apache.org/jira/browse/HIVE-2453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13107024#comment-13107024 ]
Ning Zhang commented on HIVE-2453:
----------------------------------

Kevin, I guess we cross-posted on the review board and here.

As you have noticed, there is really not much difference in the resulting operator tree between sortby and orderby, except that the latter requires 1 reducer. However, they are different from the syntax point of view. So 2 queries may have different syntax but the same plan, or 2 queries may have very similar syntax but different execution plans (e.g., CommonJoin can be converted to MapJoin at execution time). So I think for this task we should focus on tagging the syntax tree rather than the physical execution plan tree. We probably should examine the operator tree and tag it at one pre-exec hook.

BTW, we may also need to capture "distribute by", which just distributes the key-value pairs based on keys without sorting at the reducer. This is also one indicator for the analyses that the job needs a reduce phase.

> Need a way to categorize queries in hooks for improved logging
> --------------------------------------------------------------
>
> Key: HIVE-2453
> URL: https://issues.apache.org/jira/browse/HIVE-2453
> Project: Hive
> Issue Type: Improvement
> Reporter: Kevin Wilfong
> Assignee: Kevin Wilfong
> Attachments: HIVE-2453.1.patch.txt
>
>
> We need a way to categorize queries, such as whether or not they include a
> join clause, a group by clause, etc., in the hooks. This will allow for
> better performance logging.
> Currently the only way I can find is to go through the operators in the
> tasks, but which operators are used for the different types of queries may
> change over time.
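
As a rough illustration of the syntax-tree tagging suggested in the comment, here is a minimal sketch in Python rather than Hive's own Java. Everything in it is an assumption for illustration: the `Node` class, the `TOKEN_TAGS` table, and the `categorize` function are hypothetical, and the token names are only modeled on Hive's parser tokens. It is not the patch attached to the issue and not Hive's hook API.

```python
from dataclasses import dataclass, field

# Hypothetical mapping from syntax-tree token names to category tags.
# The names are modeled on Hive's parser tokens; the table itself is
# illustrative and not part of Hive.
TOKEN_TAGS = {
    "TOK_JOIN": "join",
    "TOK_GROUPBY": "group_by",
    "TOK_ORDERBY": "order_by",
    "TOK_SORTBY": "sort_by",
    "TOK_DISTRIBUTEBY": "distribute_by",
}


@dataclass
class Node:
    """Toy stand-in for a syntax-tree node (hypothetical)."""
    token: str
    children: list = field(default_factory=list)


def categorize(root):
    """Return the set of category tags found anywhere in the tree."""
    tags = set()
    stack = [root]
    while stack:
        node = stack.pop()
        tag = TOKEN_TAGS.get(node.token)
        if tag is not None:
            tags.add(tag)
        # Visit subqueries and nested clauses as well.
        stack.extend(node.children)
    return tags


# Toy tree for: SELECT ... FROM a JOIN b ... DISTRIBUTE BY k SORT BY k
query = Node("TOK_QUERY", [
    Node("TOK_FROM", [
        Node("TOK_JOIN", [Node("TOK_TABREF"), Node("TOK_TABREF")]),
    ]),
    Node("TOK_INSERT", [
        Node("TOK_SELECT"),
        Node("TOK_DISTRIBUTEBY"),
        Node("TOK_SORTBY"),
    ]),
])

print(sorted(categorize(query)))
```

Because the tags come from the syntax rather than from the operators, a join is tagged the same way whether it later runs as a CommonJoin or a MapJoin, and sortby and orderby stay distinct even when their operator trees look alike.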