Quanlong Huang created HIVE-22711:
-------------------------------------

             Summary: yearValue of UDF mask() should not start from 1900
                 Key: HIVE-22711
                 URL: https://issues.apache.org/jira/browse/HIVE-22711
             Project: Hive
          Issue Type: Bug
            Reporter: Quanlong Huang


Here's the description of the UDF mask():
{code:java}
masks the given value  
Examples:                  
   mask(ccn)               
   mask(ccn, 'X', 'x', '0')
   mask(ccn, 'x', 'x', 'x')
 Arguments:                
   mask(value, upperChar, lowerChar, digitChar, otherChar, numberChar, 
dayValue, monthValue, yearValue) 
     value      - value to mask. Supported types: TINYINT, SMALLINT, INT, 
BIGINT, STRING, VARCHAR, CHAR, DATE 
     upperChar  - character to replace upper-case characters with. Specify -1 
to retain original character. Default value: 'X' 
     lowerChar  - character to replace lower-case characters with. Specify -1 
to retain original character. Default value: 'x' 
     digitChar  - character to replace digit characters with. Specify -1 to 
retain original character. Default value: 'n' 
     otherChar  - character to replace all other characters with. Specify -1 to 
retain original character. Default value: -1 
     numberChar - character to replace digits in a number with. Valid values: 
0-9. Default value: '1' 
     dayValue   - value to replace day field in a date with.  Specify -1 to 
retain original value. Valid values: 1-31. Default value: 1 
     monthValue - value to replace month field in a date with. Specify -1 to 
retain original value. Valid values: 0-11. Default value: 0 
     yearValue  - value to replace year field in a date with. Specify -1 to 
retain original value. Default value: 0 {code}
Although it says 'yearValue' is the value to replace year field in a DATE with, 
it actually counts start at 1900. E.g. yearValue = 0 means masking the year 
field to 1900, yearValue=2000 means masking the year field to 3900, 
yearValue=-2 means masking the year field to 1988.

Here are some query examples:
{code:sql}
beeline> select mask(cast('2019-02-03' as date), -1, -1, -1, -1, -1, -1, -1, 0);
1900-02-03
beeline> select mask(cast('2019-02-03' as date), -1, -1, -1, -1, -1, -1, -1, 
2000);
3900-02-03
beeline> select mask(cast('2019-02-03' as date), -1, -1, -1, -1, -1, -1, -1, 
-2);
1898-02-03
beeline> select mask(cast('2019-02-03' as date), -1, -1, -1, -1, -1, -1, -1, 
-100);
1800-02-03
{code}
The drawback of this behavior is that we can't mask year field to be 1899, 
since -1 already means retaining the original value.

It'd be better to change the behavior to be intuitive that simply masking year 
filed to yearValue. And only accept yearValue from 0 to 9999. Still use -1 to 
retain original value. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

Reply via email to