Quanlong Huang created HIVE-22711: ------------------------------------- Summary: yearValue of UDF mask() should not start from 1900 Key: HIVE-22711 URL: https://issues.apache.org/jira/browse/HIVE-22711 Project: Hive Issue Type: Bug Reporter: Quanlong Huang
Here's the description of the UDF mask(): {code:java} masks the given value Examples: mask(ccn) mask(ccn, 'X', 'x', '0') mask(ccn, 'x', 'x', 'x') Arguments: mask(value, upperChar, lowerChar, digitChar, otherChar, numberChar, dayValue, monthValue, yearValue) value - value to mask. Supported types: TINYINT, SMALLINT, INT, BIGINT, STRING, VARCHAR, CHAR, DATE upperChar - character to replace upper-case characters with. Specify -1 to retain original character. Default value: 'X' lowerChar - character to replace lower-case characters with. Specify -1 to retain original character. Default value: 'x' digitChar - character to replace digit characters with. Specify -1 to retain original character. Default value: 'n' otherChar - character to replace all other characters with. Specify -1 to retain original character. Default value: -1 numberChar - character to replace digits in a number with. Valid values: 0-9. Default value: '1' dayValue - value to replace day field in a date with. Specify -1 to retain original value. Valid values: 1-31. Default value: 1 monthValue - value to replace month field in a date with. Specify -1 to retain original value. Valid values: 0-11. Default value: 0 yearValue - value to replace year field in a date with. Specify -1 to retain original value. Default value: 0 {code} Although it says 'yearValue' is the value to replace year field in a DATE with, it actually counts start at 1900. E.g. yearValue = 0 means masking the year field to 1900, yearValue=2000 means masking the year field to 3900, yearValue=-2 means masking the year field to 1988. Here are some query examples: {code:sql} beeline> select mask(cast('2019-02-03' as date), -1, -1, -1, -1, -1, -1, -1, 0); 1900-02-03 beeline> select mask(cast('2019-02-03' as date), -1, -1, -1, -1, -1, -1, -1, 2000); 3900-02-03 beeline> select mask(cast('2019-02-03' as date), -1, -1, -1, -1, -1, -1, -1, -2); 1898-02-03 beeline> select mask(cast('2019-02-03' as date), -1, -1, -1, -1, -1, -1, -1, -100); 1800-02-03 {code} The drawback of this behavior is that we can't mask year field to be 1899, since -1 already means retaining the original value. It'd be better to change the behavior to be intuitive that simply masking year filed to yearValue. And only accept yearValue from 0 to 9999. Still use -1 to retain original value. -- This message was sent by Atlassian Jira (v8.3.4#803005)