Manoj Durisheti created HIVE-17416:
--------------------------------------
Summary: Hive Distinct changes column value
Key: HIVE-17416
URL: https://issues.apache.org/jira/browse/HIVE-17416
Project: Hive
Issue Type: Bug
Components: Hive
Affects Versions: 1.2.1
Reporter: Manoj Durisheti
Environment: Hive 1.2.1000.2.6.1.0-129
Adding DISTINCT to the query below should only deduplicate the result rows. Instead, it also changes a column value: r_field_name comes back as e_2300 instead of e_2300a.
*Query without DISTINCT:*
{code:sql}
select
  REGEXP_EXTRACT(UPPER(field_name), '([A-Z]_[0-9]*[A-Z]?)\\??.*', 1) r_field_name,
  REGEXP_EXTRACT(UPPER(field_name), '([A-Z]_[0-9]*[a-z]?)\\??.*', 1) w_field_name
from alpha.table_name
where datestamp = 20170805
  and field_name = 'https://www.abcd.com/details/123-main-st-abcde-xx-84004-5434484-e_2300a'
;
{code}
Result:
{noformat}
e_2300a e_2300
e_2300a e_2300
e_2300a e_2300
e_2300a e_2300
e_2300a e_2300
{noformat}
*Query with DISTINCT:*
{code:sql}
select distinct
  REGEXP_EXTRACT(UPPER(field_name), '([A-Z]_[0-9]*[A-Z]?)\\??.*', 1) r_field_name,
  REGEXP_EXTRACT(UPPER(field_name), '([A-Z]_[0-9]*[a-z]?)\\??.*', 1) w_field_name
from alpha.table_name
where datestamp = 20170805
  and field_name = 'https://www.abcd.com/details/123-main-st-abcde-xx-84004-5434484-e_2300a'
;
{code}
Result:
{noformat}
e_2300 e_2300
{noformat}
*Expected result with DISTINCT:*
{noformat}
e_2300a e_2300
{noformat}
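As a cross-check of the expected result, here is a small Python sketch. It assumes that Python's {{re}} module stands in for Hive's Java regex engine, which treats these particular patterns the same way. The helpers {{regexp_extract}} and {{project}} are hypothetical. {{regexp_extract}} only approximates Hive's REGEXP_EXTRACT by returning the requested group of the first match found, or an empty string when nothing matches. On the upper-cased value, the first pattern keeps the trailing letter. The second pattern cannot match that letter with {{[a-z]?}}, so the two columns genuinely differ. Deduplicating five identical rows should therefore leave one row with both values intact. Because of UPPER, the sketch produces upper-case captures, while the report shows them in lower case.

```python
import re

# Value taken from the report's WHERE clause.
FIELD_NAME = "https://www.abcd.com/details/123-main-st-abcde-xx-84004-5434484-e_2300a"

# The two patterns as the regex engine sees them, after the Hive string
# literal '\\?' has been unescaped to the regex '\?' (optional literal '?').
R_PATTERN = r"([A-Z]_[0-9]*[A-Z]?)\??.*"
W_PATTERN = r"([A-Z]_[0-9]*[a-z]?)\??.*"


def regexp_extract(subject: str, pattern: str, index: int) -> str:
    """Hypothetical approximation of REGEXP_EXTRACT: group `index` of the
    first match found anywhere in `subject`, or '' when there is no match."""
    m = re.search(pattern, subject)
    return m.group(index) if m else ""


def project(field_name: str) -> tuple[str, str]:
    """Compute (r_field_name, w_field_name) for one input row."""
    upper = field_name.upper()
    return (regexp_extract(upper, R_PATTERN, 1),
            regexp_extract(upper, W_PATTERN, 1))


# Five identical input rows, as returned by the query without DISTINCT.
rows = [project(FIELD_NAME) for _ in range(5)]

# DISTINCT should only drop duplicate rows; it must not rewrite a column.
distinct_rows = sorted(set(rows))

print(rows[0])        # ('E_2300A', 'E_2300')
print(distinct_rows)  # [('E_2300A', 'E_2300')]
```

Apart from letter case, the single deduplicated row matches the expected result above, not the one returned with DISTINCT.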
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)