[ https://issues.apache.org/jira/browse/HIVE-4891?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Thejas M Nair resolved HIVE-4891.
---------------------------------

    Resolution: Cannot Reproduce

This could not be reproduced with a more recent Hive. Marking it as Cannot Reproduce.
Fengdong, please let us know if you feel that anything is missing from the steps followed by Harish, or if you are able to reproduce the issue with the Hive 0.12 branch or trunk.

> Distinct includes duplicate records
> -----------------------------------
>
>                 Key: HIVE-4891
>                 URL: https://issues.apache.org/jira/browse/HIVE-4891
>             Project: Hive
>          Issue Type: Bug
>          Components: File Formats, HiveServer2, Query Processor
>    Affects Versions: 0.10.0
>            Reporter: Fengdong Yu
>            Priority: Blocker
>             Fix For: 0.12.0
>
>
> I have two partitions: one is a SequenceFile, the other is an RCFile, but
> they contain the same data (only the file format differs).
> I have the following SQL:
> {code}
> select distinct uid from test where (dt = '20130718' or dt = '20130718_1') and
> cur_url like '%cq.aa.com%';
> {code}
> dt = '20130718' is a SequenceFile (the default input format, specified when
> the table was created).
>
> dt = '20130718_1' is an RCFile.
> {code}
> ALTER TABLE test ADD IF NOT EXISTS PARTITION (dt='20130718_1') LOCATION
> '/user/test/test-data';
> ALTER TABLE test PARTITION (dt='20130718_1') SET FILEFORMAT RCFILE;
> {code}
> But there are duplicate records in the result.
> If the two partitions have the same input format, there are no duplicate
> records.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira