[ https://issues.apache.org/jira/browse/HIVE-3935?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Aditya Kishore updated HIVE-3935:
---------------------------------

    Attachment: HIVE-3935-0.9.patch

Attaching a proposed patch: if the partition descriptor is empty, use the table's InputFormat class as the InputFormat class for the split.

> New line character in output when sequence file is used for storage and table is empty
> --------------------------------------------------------------------------------------
>
>                 Key: HIVE-3935
>                 URL: https://issues.apache.org/jira/browse/HIVE-3935
>             Project: Hive
>          Issue Type: Bug
>          Components: Query Processor
>    Affects Versions: 0.9.0, 0.10.0
>         Environment: Centos 6.3
>            Reporter: Doodle gum
>         Attachments: HIVE-3935-0.9.patch
>
>
> When a "select distinct" command is issued on an empty table that uses sequence file for storage, an extra new-line character (0x0a) is present in the result set even though the table has no data. This output is not consistent with the result of the same command on Hive 0.7.1 and can cause workflows to fail due to a wrong record count.
>
> Execution on Hive 0.9 and 0.10:
>
> hive> create table hoge2(col1 string, col2 string) partitioned by (p_part string) stored as sequencefile;
> hive> describe hoge2;
> OK
> col1    string
> col2    string
> p_part  string
> Time taken: 0.24 seconds
> hive> select distinct p_part from hoge2;
> Total MapReduce jobs = 1
> Launching Job 1 out of 1
> Number of reduce tasks not specified. Estimated from input data size: 1
> In order to change the average load for a reducer (in bytes):
>   set hive.exec.reducers.bytes.per.reducer=<number>
> In order to limit the maximum number of reducers:
>   set hive.exec.reducers.max=<number>
> In order to set a constant number of reducers:
>   set mapred.reduce.tasks=<number>
> Starting Job = job_201301230112_0001, Tracking URL = http://testcluster2-1:50030/jobdetails.jsp?jobid=job_201301230112_0001
> Kill Command = /opt/mapr/hadoop/hadoop-0.20.2/bin/../bin/hadoop job -Dmapred.job.tracker=maprfs:/// -kill job_201301230112_0001
> Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 1
> 2013-01-23 02:50:16,843 Stage-1 map = 0%, reduce = 0%
> 2013-01-23 02:50:26,897 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 1.13 sec
> 2013-01-23 02:50:27,905 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 1.13 sec
> 2013-01-23 02:50:28,911 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 1.13 sec
> 2013-01-23 02:50:29,919 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 1.13 sec
> 2013-01-23 02:50:30,925 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 1.13 sec
> 2013-01-23 02:50:31,933 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 1.13 sec
> 2013-01-23 02:50:32,939 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 1.13 sec
> 2013-01-23 02:50:33,945 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 1.8 sec
> MapReduce Total cumulative CPU time: 1 seconds 800 msec
> Ended Job = job_201301230112_0001
> MapReduce Jobs Launched:
> Job 0: Map: 1  Reduce: 1  Cumulative CPU: 1.8 sec  MAPRFS Read: 327  MAPRFS Write: 71  SUCCESS
> Total MapReduce CPU Time Spent: 1 seconds 800 msec
> OK
> Time taken: 21.94 seconds
>
> Result on Hive 0.7.1:
>
> hive> select count(distinct p_part) from hoge3;
> Total MapReduce jobs = 1
> Launching Job 1 out of 1
> Number of reduce tasks determined at compile time: 1
> In order to change the average load for a reducer (in bytes):
>   set hive.exec.reducers.bytes.per.reducer=<number>
> In order to limit the maximum number of reducers:
>   set hive.exec.reducers.max=<number>
> In order to set a constant number of reducers:
>   set mapred.reduce.tasks=<number>
> Starting Job = job_201210261659_0019, Tracking URL = http://testcluster1-1:50030/jobdetails.jsp?jobid=job_201210261659_0019
> Kill Command = /opt/mapr/hadoop/hadoop-0.20.2/bin/../bin/hadoop job -Dmapred.job.tracker=maprfs:/// -kill job_201210261659_0019
> 2013-01-23 21:42:01,787 Stage-1 map = 0%, reduce = 0%
> 2013-01-23 21:42:07,815 Stage-1 map = 100%, reduce = 0%
> 2013-01-23 21:42:12,835 Stage-1 map = 100%, reduce = 100%
> Ended Job = job_201210261659_0019
> OK
> 0
> Time taken: 16.637 seconds
>
> The underlying Hadoop version for Hive 0.9 is Hadoop 1.0.3, and for Hive 0.7 it is 0.20.203.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators.
For more information on JIRA, see: http://www.atlassian.com/software/jira
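The comment above proposes falling back to the table's InputFormat class when a split's partition descriptor is empty. The actual change is in the attached patch (HIVE-3935-0.9.patch); the sketch below only illustrates that fallback idea with hypothetical names (`SplitInputFormatResolver`, plain strings standing in for Hive's PartitionDesc/TableDesc), not Hive's real split-generation code:

```java
// Hypothetical sketch of the fallback described in the comment, NOT the
// actual HIVE-3935 patch: if the partition descriptor carries no
// InputFormat class, fall back to the table's InputFormat class.
class SplitInputFormatResolver {

    // partitionInputFormat / tableInputFormat stand in for the class names
    // held by Hive's PartitionDesc and TableDesc respectively.
    static String resolveInputFormat(String partitionInputFormat,
                                     String tableInputFormat) {
        // Empty partition descriptor: use the table's InputFormat class
        // as the InputFormat class for the split.
        if (partitionInputFormat == null || partitionInputFormat.isEmpty()) {
            return tableInputFormat;
        }
        return partitionInputFormat;
    }

    public static void main(String[] args) {
        // Empty partition (as with the empty sequencefile table above):
        // the table-level InputFormat wins.
        System.out.println(resolveInputFormat(null,
            "org.apache.hadoop.mapred.SequenceFileInputFormat"));
        // prints org.apache.hadoop.mapred.SequenceFileInputFormat
    }
}
```

The point of the fallback is that an empty partitioned table has no partition descriptors at all, so without it the split can end up with a default (text-oriented) InputFormat that emits the spurious 0x0a byte.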