[ https://issues.apache.org/jira/browse/HIVE-3935?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Abhinav Chawade updated HIVE-3935:
----------------------------------
Summary: New line character in output when sequence file is used for
storage and table is empty (was: Extra new line character in output when
sequence file is used for storage of a table)
> New line character in output when sequence file is used for storage and table
> is empty
> --------------------------------------------------------------------------------------
>
> Key: HIVE-3935
> URL: https://issues.apache.org/jira/browse/HIVE-3935
> Project: Hive
> Issue Type: Bug
> Components: Query Processor
> Affects Versions: 0.9.0, 0.10.0
> Environment: Centos 6.3
> Reporter: Abhinav Chawade
>
> When a "select distinct" command is issued on an empty table that uses sequence
> file storage, an extra newline character (0x0a) appears in the result set even
> though the table has no data. This output is inconsistent with the result of the
> same command on Hive 0.7.1 and can cause workflows to fail because of a wrong
> record count.
> Execution on Hive 0.9 and 0.10:
> hive> create table hoge2(col1 string,col2 string) partitioned by (p_part string) stored as sequencefile;
> hive> describe hoge2;
> OK
> col1 string
> col2 string
> p_part string
> Time taken: 0.24 seconds
> hive> select distinct p_part from hoge2;
> Total MapReduce jobs = 1
> Launching Job 1 out of 1
> Number of reduce tasks not specified. Estimated from input data size: 1
> In order to change the average load for a reducer (in bytes):
> set hive.exec.reducers.bytes.per.reducer=<number>
> In order to limit the maximum number of reducers:
> set hive.exec.reducers.max=<number>
> In order to set a constant number of reducers:
> set mapred.reduce.tasks=<number>
> Starting Job = job_201301230112_0001, Tracking URL = http://testcluster2-1:50030/jobdetails.jsp?jobid=job_201301230112_0001
> Kill Command = /opt/mapr/hadoop/hadoop-0.20.2/bin/../bin/hadoop job -Dmapred.job.tracker=maprfs:/// -kill job_201301230112_0001
> Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 1
> 2013-01-23 02:50:16,843 Stage-1 map = 0%, reduce = 0%
> 2013-01-23 02:50:26,897 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 1.13 sec
> 2013-01-23 02:50:27,905 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 1.13 sec
> 2013-01-23 02:50:28,911 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 1.13 sec
> 2013-01-23 02:50:29,919 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 1.13 sec
> 2013-01-23 02:50:30,925 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 1.13 sec
> 2013-01-23 02:50:31,933 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 1.13 sec
> 2013-01-23 02:50:32,939 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 1.13 sec
> 2013-01-23 02:50:33,945 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 1.8 sec
> MapReduce Total cumulative CPU time: 1 seconds 800 msec
> Ended Job = job_201301230112_0001
> MapReduce Jobs Launched:
> Job 0: Map: 1 Reduce: 1 Cumulative CPU: 1.8 sec MAPRFS Read: 327 MAPRFS Write: 71 SUCCESS
> Total MapReduce CPU Time Spent: 1 seconds 800 msec
> OK
> Time taken: 21.94 seconds
> Result on Hive 0.7.1:
> hive> select count(distinct p_part) from hoge3;
> Total MapReduce jobs = 1
> Launching Job 1 out of 1
> Number of reduce tasks determined at compile time: 1
> In order to change the average load for a reducer (in bytes):
> set hive.exec.reducers.bytes.per.reducer=<number>
> In order to limit the maximum number of reducers:
> set hive.exec.reducers.max=<number>
> In order to set a constant number of reducers:
> set mapred.reduce.tasks=<number>
> Starting Job = job_201210261659_0019, Tracking URL = http://testcluster1-1:50030/jobdetails.jsp?jobid=job_201210261659_0019
> Kill Command = /opt/mapr/hadoop/hadoop-0.20.2/bin/../bin/hadoop job -Dmapred.job.tracker=maprfs:/// -kill job_201210261659_0019
> 2013-01-23 21:42:01,787 Stage-1 map = 0%, reduce = 0%
> 2013-01-23 21:42:07,815 Stage-1 map = 100%, reduce = 0%
> 2013-01-23 21:42:12,835 Stage-1 map = 100%, reduce = 100%
> Ended Job = job_201210261659_0019
> OK
> 0
> Time taken: 16.637 seconds
> The underlying Hadoop version is 1.0.3 for Hive 0.9 and 0.20.203 for Hive 0.7.
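The description notes that the stray newline can make downstream workflows report a wrong record count. As a rough illustration (a sketch, not part of the original report), the Python below assumes a workflow captures the Hive CLI output as bytes and counts records by counting newlines, the way `wc -l` does. The function names and the sample partition values are hypothetical. For an empty table whose output is a single 0x0a byte, the newline count reports 1 record, while a count that skips blank lines reports 0.

```python
# Hypothetical helpers for counting records in captured Hive CLI output.
# Assumption: the output is one record per line, newline-terminated.

def naive_record_count(output: bytes) -> int:
    """Count newline characters, as `wc -l` does."""
    return output.count(b"\n")


def record_count_ignoring_blank_lines(output: bytes) -> int:
    """Count only lines that contain at least one byte.

    Caveat: a row whose only value is an empty string also prints as a
    blank line, so this function undercounts such rows.
    """
    return sum(1 for line in output.split(b"\n") if line)


if __name__ == "__main__":
    # The behaviour described in the report for an empty table on Hive 0.9/0.10.
    empty_result = b"\n"
    # Hypothetical output for a table with two distinct partition values.
    two_rows = b"2013-01-01\n2013-01-02\n"

    print(naive_record_count(empty_result), record_count_ignoring_blank_lines(empty_result))
    print(naive_record_count(two_rows), record_count_ignoring_blank_lines(two_rows))
```

Skipping blank lines is only a workaround on the consumer side; it does not change what Hive emits, and it trades the false positive on empty tables for a possible undercount when rows contain empty strings.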
--
This message is automatically generated by JIRA.