[ 
https://issues.apache.org/jira/browse/HIVE-6958?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13977699#comment-13977699
 ] 

Prasanth J commented on HIVE-6958:
----------------------------------

This failure is related to the behaviour of UNION. INSERT 
queries with UNION ALL create sub-directories under the table/partition 
directory. For example:
{code}
insert overwrite table outputTbl1
SELECT *
FROM (
  SELECT key, count(1) as values from inputTbl1 group by key
  UNION ALL
  SELECT key, count(1) as values from inputTbl1 group by key
) a;
{code}

For the above query, the warehouse/outputTbl1 directory will have 2 
sub-directories, one for each SELECT query, such as 
warehouse/outputTbl1/15/ and warehouse/outputTbl1/16/. Here 15 and 16 are 
operator identifiers: 
https://github.com/apache/hive/blob/trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/unionproc/UnionProcFactory.java#L223

This special case (directories under the table directory) happens only for 
union insert. In all other cases, files sit directly underneath the table 
directory for unpartitioned tables. But the metastore utils for updating the 
fast stats are not aware of this directory structure (they expect files 
directly underneath the table directory). 
Warehouse.getFileStatusesForUnpartitionedTable() recurses only one level under 
the table directory for an unpartitioned table: 
https://github.com/apache/hive/blob/trunk/metastore/src/java/org/apache/hadoop/hive/metastore/Warehouse.java#L540
For union insert, if only 1 level is recursed you get only the folder sizes 
and not the actual file sizes. Folder sizes differ between OSes. It looks like 
the original diff was generated on Mac OS X and the new diff on CentOS. Both 
diffs are *wrong* as they report folder sizes as opposed to file sizes. 
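
To illustrate why a one-level listing gives platform-dependent numbers, here is 
a minimal sketch on a local filesystem. It is not Hive or HDFS code: the helper 
names, the file name 000000_0 and the byte counts are assumptions made up for 
the example. Only the outputTbl1/15 and outputTbl1/16 layout comes from the 
description above.

```python
import os
import tempfile


def build_union_layout(root):
    """Create outputTbl1/15/000000_0 (40 bytes) and outputTbl1/16/000000_0 (60 bytes).

    The file name and sizes are hypothetical; only the two operator-id
    sub-directories mirror the layout described in the comment.
    """
    table_dir = os.path.join(root, "outputTbl1")
    for sub_dir, size in (("15", 40), ("16", 60)):
        os.makedirs(os.path.join(table_dir, sub_dir))
        with open(os.path.join(table_dir, sub_dir, "000000_0"), "wb") as f:
            f.write(b"x" * size)
    return table_dir


def one_level_total(table_dir):
    """Sum the reported size of each direct child of table_dir.

    For the union layout the children are directories, so this adds up
    directory entry sizes, which depend on the OS and filesystem.
    """
    return sum(
        os.stat(os.path.join(table_dir, name)).st_size
        for name in os.listdir(table_dir)
    )


def recursive_file_total(table_dir):
    """Sum the sizes of regular files at any depth under table_dir."""
    total = 0
    for dir_path, _dir_names, file_names in os.walk(table_dir):
        for name in file_names:
            total += os.path.getsize(os.path.join(dir_path, name))
    return total
```

On this layout recursive_file_total() returns the 100 bytes actually written, 
while one_level_total() returns whatever the local filesystem reports for two 
directory entries, which is the kind of value that changed between the two 
diffs.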

1) One way to fix this is to change the recurse level to a value greater than 
1. 
2) Another way would be to fix UNION to create files instead of directories. To 
resolve filename conflicts it can append the operator id to the filename.
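
Both options can be sketched roughly as follows, again on a local filesystem 
rather than in Hive. Everything here is hypothetical: list_files(), 
flattened_name(), the max_depth parameter, the 000000_0 file name and the 
"<filename>_<operator id>" naming scheme are assumptions for illustration, not 
existing Hive APIs or conventions. Unlike the metastore code, this sketch skips 
directories instead of returning them.

```python
import os
import tempfile


def list_files(path, max_depth):
    """Return regular files found at most max_depth levels below path.

    max_depth=1 sees only files directly under path (option 1 with the
    current recurse level); max_depth=2 also sees files one directory down.
    """
    files = []
    for entry in os.scandir(path):
        if entry.is_file():
            files.append(entry.path)
        elif entry.is_dir() and max_depth > 1:
            files.extend(list_files(entry.path, max_depth - 1))
    return files


def flattened_name(file_name, operator_id):
    """Option 2 sketch: make a file name unique by appending the operator id."""
    return "{}_{}".format(file_name, operator_id)


def make_union_layout(root):
    """Build a hypothetical outputTbl1/15 and outputTbl1/16 layout for testing."""
    table_dir = os.path.join(root, "outputTbl1")
    for sub_dir, size in (("15", 40), ("16", 60)):
        os.makedirs(os.path.join(table_dir, sub_dir))
        with open(os.path.join(table_dir, sub_dir, "000000_0"), "wb") as f:
            f.write(b"x" * size)
    return table_dir
```

With the union layout, list_files(table_dir, 1) finds no files while 
list_files(table_dir, 2) finds both, so summing sizes over the deeper listing 
gives the real total. With option 2 the two outputs would instead sit directly 
under the table directory under distinct names such as 
flattened_name("000000_0", "15") and flattened_name("000000_0", "16").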

[~ashutoshc]/[~jdere] do you guys have any thoughts about this?

> update union_remove_*, other tests for hadoop-2
> -----------------------------------------------
>
>                 Key: HIVE-6958
>                 URL: https://issues.apache.org/jira/browse/HIVE-6958
>             Project: Hive
>          Issue Type: Bug
>          Components: Tests
>            Reporter: Jason Dere
>            Assignee: Jason Dere
>         Attachments: HIVE-6958.1.patch
>
>
> Update q.out files to match totalSize for Linux platform.



--
This message was sent by Atlassian JIRA
(v6.2#6252)
