[ https://issues.apache.org/jira/browse/HIVE-4891?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13773683#comment-13773683 ]
Harish Butani commented on HIVE-4891:
-------------------------------------

I am not able to reproduce this. I tried the scenario below and got correct results. Can you provide details on how to reproduce it?

{noformat}
CREATE TABLE part(
    p_partkey INT,
    p_name STRING,
    p_mfgr STRING,
    p_brand STRING,
    p_type STRING,
    p_size INT,
    p_container STRING,
    p_retailprice DOUBLE,
    p_comment STRING
);

LOAD DATA LOCAL INPATH '.../hive/data/files/part_tiny.txt' overwrite into table part;

CREATE TABLE part_partitioned(
    p_partkey INT,
    p_name STRING,
    p_mfgr STRING,
    p_brand STRING,
    p_type STRING,
    p_size INT,
    p_container STRING,
    p_retailprice DOUBLE,
    p_comment STRING
)
PARTITIONED BY (ds string);

alter table part_partitioned add partition(ds='2010');
alter table part_partitioned partition(ds='2010') set fileformat sequencefile;
-- have to do this, otherwise the partition's format gets set to textinputformat
alter table part_partitioned set fileformat sequencefile;
INSERT OVERWRITE TABLE part_partitioned PARTITION (ds='2010') SELECT * from part;

alter table part_partitioned add partition(ds='2011');
alter table part_partitioned partition(ds='2011') set fileformat rcfile;
-- have to do this, otherwise the partition's format gets set to textinputformat
alter table part_partitioned set fileformat rcfile;
INSERT OVERWRITE TABLE part_partitioned PARTITION (ds='2011') SELECT * from part;

-- tried these; both give the right results
select distinct p_mfgr from part_partitioned where (ds='2010' or ds='2011') and p_size < 10;
select distinct p_name from part_partitioned where (ds='2010' or ds='2011');
{noformat}

> Distinct includes duplicate records
> -----------------------------------
>
>              Key: HIVE-4891
>              URL: https://issues.apache.org/jira/browse/HIVE-4891
>          Project: Hive
>       Issue Type: Bug
>       Components: File Formats, HiveServer2, Query Processor
> Affects Versions: 0.10.0
>         Reporter: Fengdong Yu
>         Priority: Blocker
>          Fix For: 0.12.0
>
>
> I have two partitions: one is a sequence file and the other is an RCFile, but they hold the same
> data (only the file format differs).
> I have the following SQL:
> {code}
> select distinct uid from test where (dt ='20130718' or dt ='20130718_1') and cur_url like '%cq.aa.com%';
> {code}
> dt ='20130718' is a sequence file (the default input format, specified when the table was created).
>
> dt ='20130718_1' is an RCFile:
> {code}
> ALTER TABLE test ADD IF NOT EXISTS PARTITION (dt='20130718_1') LOCATION '/user/test/test-data';
> ALTER TABLE test PARTITION(dt='20130718_1') SET FILEFORMAT RCFILE;
> {code}
> But there are duplicate records in the result.
> If the two partitions have the same input format, there are no duplicate records.
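One way to supply the requested reproduction details is to confirm each partition's storage format and to check directly whether the DISTINCT output repeats a value. A minimal HiveQL sketch, assuming the reporter's table `test` with columns `uid`, `cur_url` and partition column `dt` as described above; the subquery alias `d` and the column name `cnt` are illustrative only, and this has not been run against the reporter's data:

{code:sql}
-- Confirm each partition's storage format (check the InputFormat line in the output).
DESCRIBE FORMATTED test PARTITION (dt='20130718');
DESCRIBE FORMATTED test PARTITION (dt='20130718_1');

-- Count how often each uid appears in the DISTINCT output.
-- Any row returned here means DISTINCT emitted the same uid more than once.
SELECT d.uid, COUNT(*) AS cnt
FROM (
  SELECT DISTINCT uid
  FROM test
  WHERE (dt = '20130718' OR dt = '20130718_1')
    AND cur_url LIKE '%cq.aa.com%'
) d
GROUP BY d.uid
HAVING COUNT(*) > 1;
{code}

If the DESCRIBE output shows a SequenceFile input format for one partition and an RCFile input format for the other, and the second query returns rows, that matches the reported symptom.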