[ https://issues.apache.org/jira/browse/HIVE-3935?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Aditya Kishore updated HIVE-3935:
---------------------------------

    Attachment: HIVE-3935-0.9.patch

Attaching a proposed patch: if the partition descriptor is empty, use the table's InputFormat class as the InputFormat class for the split.

> New line character in output when sequence file is used for storage and table is empty
> --------------------------------------------------------------------------------------
>
>                 Key: HIVE-3935
>                 URL: https://issues.apache.org/jira/browse/HIVE-3935
>             Project: Hive
>          Issue Type: Bug
>          Components: Query Processor
>    Affects Versions: 0.9.0, 0.10.0
>         Environment: Centos 6.3
>            Reporter: Doodle gum
>         Attachments: HIVE-3935-0.9.patch
>
>
> When a "select distinct" command is issued on an empty table that uses sequence file for storage, an extra new-line character (0x0a) is present in the result set even though the table has no data. This output is not consistent with the result of the same command on Hive 0.7.1 and can cause workflows to fail due to a wrong record count.
>
> Execution on Hive 0.9 and 0.10:
>
> hive> create table hoge2(col1 string, col2 string) partitioned by (p_part string) stored as sequencefile;
> hive> describe hoge2;
> OK
> col1    string
> col2    string
> p_part  string
> Time taken: 0.24 seconds
> hive> select distinct p_part from hoge2;
> Total MapReduce jobs = 1
> Launching Job 1 out of 1
> Number of reduce tasks not specified. Estimated from input data size: 1
> In order to change the average load for a reducer (in bytes):
>   set hive.exec.reducers.bytes.per.reducer=<number>
> In order to limit the maximum number of reducers:
>   set hive.exec.reducers.max=<number>
> In order to set a constant number of reducers:
>   set mapred.reduce.tasks=<number>
> Starting Job = job_201301230112_0001, Tracking URL = http://testcluster2-1:50030/jobdetails.jsp?jobid=job_201301230112_0001
> Kill Command = /opt/mapr/hadoop/hadoop-0.20.2/bin/../bin/hadoop job -Dmapred.job.tracker=maprfs:/// -kill job_201301230112_0001
> Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 1
> 2013-01-23 02:50:16,843 Stage-1 map = 0%, reduce = 0%
> 2013-01-23 02:50:26,897 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 1.13 sec
> 2013-01-23 02:50:27,905 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 1.13 sec
> 2013-01-23 02:50:28,911 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 1.13 sec
> 2013-01-23 02:50:29,919 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 1.13 sec
> 2013-01-23 02:50:30,925 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 1.13 sec
> 2013-01-23 02:50:31,933 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 1.13 sec
> 2013-01-23 02:50:32,939 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 1.13 sec
> 2013-01-23 02:50:33,945 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 1.8 sec
> MapReduce Total cumulative CPU time: 1 seconds 800 msec
> Ended Job = job_201301230112_0001
> MapReduce Jobs Launched:
> Job 0: Map: 1  Reduce: 1  Cumulative CPU: 1.8 sec  MAPRFS Read: 327  MAPRFS Write: 71  SUCCESS
> Total MapReduce CPU Time Spent: 1 seconds 800 msec
> OK
> Time taken: 21.94 seconds
>
> Result on Hive 0.7.1:
>
> hive> select count(distinct p_part) from hoge3;
> Total MapReduce jobs = 1
> Launching Job 1 out of 1
> Number of reduce tasks determined at compile time: 1
> In order to change the average load for a reducer (in bytes):
>   set hive.exec.reducers.bytes.per.reducer=<number>
> In order to limit the maximum number of reducers:
>   set hive.exec.reducers.max=<number>
> In order to set a constant number of reducers:
>   set mapred.reduce.tasks=<number>
> Starting Job = job_201210261659_0019, Tracking URL = http://testcluster1-1:50030/jobdetails.jsp?jobid=job_201210261659_0019
> Kill Command = /opt/mapr/hadoop/hadoop-0.20.2/bin/../bin/hadoop job -Dmapred.job.tracker=maprfs:/// -kill job_201210261659_0019
> 2013-01-23 21:42:01,787 Stage-1 map = 0%, reduce = 0%
> 2013-01-23 21:42:07,815 Stage-1 map = 100%, reduce = 0%
> 2013-01-23 21:42:12,835 Stage-1 map = 100%, reduce = 100%
> Ended Job = job_201210261659_0019
> OK
> 0
> Time taken: 16.637 seconds
>
> The underlying Hadoop version for Hive 0.9 is Hadoop 1.0.3, and for Hive 0.7 it is 0.20.203.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators.
For more information on JIRA, see: http://www.atlassian.com/software/jira
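The comment above proposes falling back to the table's InputFormat class when a split's partition descriptor is empty. The actual change is in the attached patch (HIVE-3935-0.9.patch); the sketch below only illustrates that fallback idea with hypothetical names (`SplitInputFormatResolver`, plain strings standing in for Hive's PartitionDesc/TableDesc), not Hive's real split-generation code:

```java
// Hypothetical sketch of the fallback described in the comment, NOT the
// actual HIVE-3935 patch: if the partition descriptor carries no
// InputFormat class, fall back to the table's InputFormat class.
class SplitInputFormatResolver {

    // partitionInputFormat / tableInputFormat stand in for the class names
    // held by Hive's PartitionDesc and TableDesc respectively.
    static String resolveInputFormat(String partitionInputFormat,
                                     String tableInputFormat) {
        // Empty partition descriptor: use the table's InputFormat class
        // as the InputFormat class for the split.
        if (partitionInputFormat == null || partitionInputFormat.isEmpty()) {
            return tableInputFormat;
        }
        return partitionInputFormat;
    }

    public static void main(String[] args) {
        // Empty partition (as with the empty sequencefile table above):
        // the table-level InputFormat wins.
        System.out.println(resolveInputFormat(null,
            "org.apache.hadoop.mapred.SequenceFileInputFormat"));
        // prints org.apache.hadoop.mapred.SequenceFileInputFormat
    }
}
```

The point of the fallback is that an empty partitioned table has no partition descriptors at all, so without it the split can end up with a default (text-oriented) InputFormat that emits the spurious 0x0a byte.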