[ https://issues.apache.org/jira/browse/HIVE-25448?focusedWorklogId=731641&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-731641 ]
ASF GitHub Bot logged work on HIVE-25448:
-----------------------------------------

                Author: ASF GitHub Bot
            Created on: 23/Feb/22 15:28
            Start Date: 23/Feb/22 15:28
    Worklog Time Spent: 10m
      Work Description: dengzhhu653 edited a comment on pull request #2585:
URL: https://github.com/apache/hive/pull/2585#issuecomment-1048902780

   I found something interesting: when I run explain on `select col1, count(distinct col2) from partition_distinct_skew group by col1;` on the master branch, the output is as follows:
   ```
   Vertices:
     Map 1
         Map Operator Tree:
             TableScan
               alias: partition_distinct_skew
               Statistics: Num rows: 3 Data size: 510 Basic stats: COMPLETE Column stats: COMPLETE
               Select Operator
                 expressions: col1 (type: string), col2 (type: string)
                 outputColumnNames: col1, col2
                 Statistics: Num rows: 3 Data size: 510 Basic stats: COMPLETE Column stats: COMPLETE
                 Group By Operator
                   keys: col1 (type: string), col2 (type: string)
                   minReductionHashAggr: 0.4
                   mode: hash
                   outputColumnNames: _col0, _col1
                   Statistics: Num rows: 2 Data size: 340 Basic stats: COMPLETE Column stats: COMPLETE
                   Reduce Output Operator
                     key expressions: _col0 (type: string), _col1 (type: string)
                     null sort order: zz
                     sort order: ++
                     Map-reduce partition columns: rand() (type: double)
                     Statistics: Num rows: 2 Data size: 340 Basic stats: COMPLETE Column stats: COMPLETE
   ```
   The partition column is **rand()** in this case. It seems something has already been done to improve the skew case, though I have not been able to find where that happens.

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
-------------------

    Worklog Id:     (was: 731641)
    Time Spent: 2h 40m  (was: 2.5h)

> Invalid partition columns when skew with distinct
> -------------------------------------------------
>
>                 Key: HIVE-25448
>                 URL: https://issues.apache.org/jira/browse/HIVE-25448
>             Project: Hive
>          Issue Type: Bug
>          Components: Logical Optimizer
>            Reporter: Zhihua Deng
>            Assignee: Zhihua Deng
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 2h 40m
>  Remaining Estimate: 0h
>
> When hive.groupby.skewindata is enabled, we spray by the grouping key and
> distinct key if distinct is present in the first reduce sink operator.

--
This message was sent by Atlassian Jira
(v8.20.1#820001)
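
Editorial note: the issue description says that, with hive.groupby.skewindata enabled and a distinct present, rows are sprayed by the grouping key plus the distinct key. One way to see why the choice of partition columns matters is the toy model below. It is a sketch under stated assumptions and not Hive's implementation: the function `two_stage_count_distinct`, the sample rows, and the round-robin stand-in for rand() are all hypothetical and chosen only for illustration. The assumed model is that each first-stage reducer de-duplicates the (col1, col2) pairs it receives and a second stage sums the per-reducer counts.

```python
from collections import defaultdict


def two_stage_count_distinct(rows, num_reducers, partition):
    """Toy model of: select col1, count(distinct col2) ... group by col1.

    Stage 1: each row is sent to reducer partition(index, row) % num_reducers,
             and every reducer de-duplicates the (col1, col2) pairs it sees.
    Stage 2: the per-reducer pair counts are summed per col1.
    """
    reducers = [set() for _ in range(num_reducers)]
    for index, row in enumerate(rows):
        reducers[partition(index, row) % num_reducers].add(row)

    totals = defaultdict(int)
    for pairs in reducers:
        for col1, _col2 in pairs:
            totals[col1] += 1
    return dict(totals)


# Hypothetical sample data: ("a", "x") appears twice.
rows = [("a", "x"), ("a", "x"), ("a", "y"), ("b", "x")]


def by_keys(index, row):
    # Spray by grouping key + distinct key: equal pairs share a reducer.
    return hash(row)


def round_robin(index, row):
    # Deterministic stand-in for rand(): ignores the row contents entirely.
    return index


correct = two_stage_count_distinct(rows, 2, by_keys)
skewed = two_stage_count_distinct(rows, 2, round_robin)
print(correct)
print(skewed)
```

In this model, spraying by (col1, col2) keeps duplicate pairs on one reducer, so the summed counts match the true distinct counts. The round-robin assignment splits the two ("a", "x") rows across reducers, and "a" is counted three times instead of two. Whether the rand() partitioning seen in the plan above leads to the same effect in Hive depends on operators not shown in the excerpt.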