[ https://issues.apache.org/jira/browse/HIVE-8111?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14136178#comment-14136178 ]
Sergey Shelukhin commented on HIVE-8111:
----------------------------------------

[~ashutoshc] [~jpullokkaran] FYI. I've tried options 1 and 2 and ran into problems; for now I'm exploring 5 and 3. Tell me if you have any input.

The biggest problem is a decimal becoming NULL because of an incorrect type. For example, for

    SELECT key * value FROM DECIMAL_UDF

"expressions: (key * value) (type: decimal(31,10))" becomes "expressions: (key * CAST( value AS decimal(31,10))) (type: decimal(38,20))", and 1524157875171467887.5019052100 becomes NULL because it has more than 18 digits in the integer part, which is all that decimal(38,20) leaves room for.

Incorrect casts can also change the result type of an expression, which I assume can make insert/create queries have undesirable results; I'm not sure about other possible effects.

> CBO trunk merge: duplicated casts for arithmetic expressions in Hive and CBO
> ----------------------------------------------------------------------------
>
>                 Key: HIVE-8111
>                 URL: https://issues.apache.org/jira/browse/HIVE-8111
>             Project: Hive
>          Issue Type: Sub-task
>          Components: CBO
>            Reporter: Sergey Shelukhin
>            Assignee: Sergey Shelukhin
>
> Original test failure: looks like the column type changes to different decimals
> in most cases. In one case it causes the integer part to be too big to fit,
> so the result apparently becomes null.
> What happens is that CBO adds casts to arithmetic expressions to make them
> type compatible; these casts become part of the new AST, and then Hive adds
> casts on top of these casts. This (the first part) also causes lots of out file
> changes, in addition to incorrect decimal width and sometimes nulls when the
> width is larger than allowed in Hive. It's not clear how best to fix it so far.
> Option one - don't add those casts for numeric ops - cannot be done if the
> numeric op is part of a compare, for which CBO needs correct types.
> Option two - unwrap casts when determining type in Hive - hard or impossible
> to tell apart CBO-added casts and user casts.
> Option three - don't change types in Hive if CBO has run - seems hacky and
> hard to ensure it's applied everywhere.
> Option four - map all expressions precisely between the two trees and remove
> the casts again after optimization - will be pretty difficult.
> Option five - somehow mark those casts. Not sure how yet.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
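
To make the decimal example in the comment concrete, here is a minimal Python sketch of why the duplicated cast turns the product into NULL. It is a simplified model, not Hive's actual code. It assumes that DECIMAL_UDF has columns `key decimal(20,10)` and `value int` (with int treated as decimal(10,0)), and that multiplying decimal(p1,s1) by decimal(p2,s2) yields precision p1+p2+1 and scale s1+s2, with precision capped at 38. Those assumptions were chosen because they reproduce the two types quoted above. The helper names `multiply_type`, `integer_digits` and `fits` are hypothetical.

```python
from decimal import Decimal

HIVE_MAX_PRECISION = 38  # assumed maximum decimal precision


def multiply_type(p1, s1, p2, s2):
    """Simplified result type of decimal(p1,s1) * decimal(p2,s2).

    Assumption: precision = p1 + p2 + 1 and scale = s1 + s2, with the
    precision capped at 38. Hive's real adjustment logic may differ.
    """
    precision = min(p1 + p2 + 1, HIVE_MAX_PRECISION)
    scale = min(s1 + s2, HIVE_MAX_PRECISION)
    return precision, scale


def integer_digits(value):
    """Number of digits to the left of the decimal point."""
    _sign, digits, exponent = value.as_tuple()
    return max(len(digits) + exponent, 0)


def fits(value, precision, scale):
    """True if the integer part of value fits in decimal(precision, scale)."""
    return integer_digits(value) <= precision - scale


product = Decimal("1524157875171467887.5019052100")

# Without the extra cast: decimal(20,10) * decimal(10,0)
plain = multiply_type(20, 10, 10, 0)
# With the CBO-added cast: decimal(20,10) * decimal(31,10)
with_cast = multiply_type(20, 10, 31, 10)

print(plain, fits(product, *plain))
print(with_cast, fits(product, *with_cast))
```

Under these assumptions the product has 19 integer digits. decimal(31,10) leaves room for 21, while decimal(38,20) leaves room for only 18, which is the case where the value becomes NULL.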