[ https://issues.apache.org/jira/browse/HIVE-3403?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13575262#comment-13575262 ]
Ashutosh Chauhan commented on HIVE-3403:
----------------------------------------

I concur with Mark's comments above. I don't agree that erring on the side of more configs is a good idea. For example, the if-else ladder after this patch will look like the following:

{code}
if user specifies map join hint && hive.optimize.bucketmapjoin is true, then a map-join may be converted to BMJ.
if user specifies map join hint && hive.optimize.bucketmapjoin is true && hive.optimize.sortedmerge is true, then a map-join may be converted to SMBJ.
if user doesn't specify map join hint && hive.optimize.bucketmapjoin && hive.optimize.sortedmerge is true && hive.optimize.auto.convert.sortmerge.join is true, then a regular join may be converted to SMBJ.
... and then there is hive.auto.convert.join, hive.auto.convert.join.noconditionaltask and many others...
{code}

Instead of simplifying the life of the user, which I believe is the original goal of this jira, we are making it more complicated by introducing even more configs that the user needs to understand. Btw, I am not even 100% sure that I got the above settings right. Further, the fact that the default value for every optimization is false means the user ends up in the worst of both worlds, where none of the optimizations kicks in and the query runs slowly.

To improve on the current state, my suggestions are the following:

a) Let's get rid of hints altogether, i.e., we never construct a logical plan with a MapJoin/SMBJoin/BJoin operator but always with a regular join operator. Then, in the optimization phase, we convert the regular join to the most optimal join implementation depending on the sorting/bucketing properties and sizes of the tables. This will simplify the codebase, since we always see a regular join in our operator tree in the logical phase, thus eliminating the need to handle the MapJoin operator at the logical level. It also simplifies the interaction of hints and configs, such as the scenario where the user provides a hint but the config is off.
b) We should compress all these different configs into a smaller number of configs.

c) We should set the default value to true for all these configs.

Namit, do you think it's possible to do this, or do you see any problem with this plan?

> user should not specify mapjoin to perform sort-merge bucketed join
> -------------------------------------------------------------------
>
>                 Key: HIVE-3403
>                 URL: https://issues.apache.org/jira/browse/HIVE-3403
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Namit Jain
>            Assignee: Namit Jain
>         Attachments: hive.3403.10.patch, hive.3403.11.patch,
> hive.3403.12.patch, hive.3403.13.patch, hive.3403.14.patch,
> hive.3403.15.patch, hive.3403.16.patch, hive.3403.17.patch,
> hive.3403.18.patch, hive.3403.19.patch, hive.3403.1.patch,
> hive.3403.21.patch, hive.3403.22.patch, hive.3403.23.patch,
> hive.3403.24.patch, hive.3403.25.patch, hive.3403.26.patch,
> hive.3403.2.patch, hive.3403.3.patch, hive.3403.4.patch, hive.3403.5.patch,
> hive.3403.6.patch, hive.3403.7.patch, hive.3403.8.patch, hive.3403.9.patch
>
>
> Currently, in order to perform a sort merge bucketed join, the user needs
> to set hive.optimize.bucketmapjoin.sortedmerge to true, and also specify the
> mapjoin hint.
> The user should not specify any hints.
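
For readers trying to follow the hint/config ladder in the comment, it can be restated as a small decision function. This is only a sketch of the ladder as the comment words it, and the comment itself says the settings may not be exactly right. It is not Hive's optimizer code. The `JoinSettings` class, its field names, and `candidate_join` are hypothetical names made up for illustration, and each boolean stands in for one of the conditions listed in the ladder.

```python
from dataclasses import dataclass


@dataclass
class JoinSettings:
    """Hypothetical stand-in for the hint and configs named in the comment.

    Every field defaults to False, mirroring the comment's point that
    every optimization is off by default.
    """
    mapjoin_hint: bool = False    # user wrote a map join hint in the query
    bucketmapjoin: bool = False   # the bucket map join config is true
    sortedmerge: bool = False     # the sorted-merge config is true
    auto_sortmerge: bool = False  # the auto-convert sort-merge config is true


def candidate_join(s: JoinSettings) -> str:
    """Return the most specialized join the ladder would permit.

    The comment says a join "may be" converted, so this is an upper
    bound on what could be chosen, not a guarantee.
    """
    if s.mapjoin_hint:
        if s.bucketmapjoin and s.sortedmerge:
            return "SMBJ"
        if s.bucketmapjoin:
            return "BMJ"
        return "MAPJOIN"
    if s.bucketmapjoin and s.sortedmerge and s.auto_sortmerge:
        return "SMBJ"
    return "JOIN"


if __name__ == "__main__":
    # With all defaults, no optimization applies.
    print(candidate_join(JoinSettings()))
    # Without a hint, three configs must all be on to reach SMBJ.
    print(candidate_join(JoinSettings(bucketmapjoin=True, sortedmerge=True)))
    print(candidate_join(JoinSettings(bucketmapjoin=True, sortedmerge=True,
                                      auto_sortmerge=True)))
```

Even in this reduced form, four booleans give sixteen combinations that collapse to four outcomes, which is the complexity the comment is objecting to.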