[ https://issues.apache.org/jira/browse/HIVE-6958?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13977699#comment-13977699 ]
Prasanth J commented on HIVE-6958:
----------------------------------

This failure is related to the behaviour of UNION. INSERT queries with UNION ALL create sub-directories under the table/partition directory. For example:

{code}
insert overwrite table outputTbl1
SELECT * FROM (
  SELECT key, count(1) as values from inputTbl1 group by key
  UNION ALL
  SELECT key, count(1) as values from inputTbl1 group by key
) a;
{code}

For the above query, the warehouse/outputTbl1 directory will have 2 sub-directories, one for each SELECT query, such as warehouse/outputTbl1/15/ and warehouse/outputTbl1/16/. Here 15 and 16 are operator identifiers: https://github.com/apache/hive/blob/trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/unionproc/UnionProcFactory.java#L223

This special case (directories under the table directory) happens only for union insert. In all other cases, unpartitioned tables have files directly underneath the table directory. But the metastore utils for updating the fast stats are not aware of this directory structure; they expect files directly underneath the table directory. Warehouse.getFileStatusesForUnpartitionedTable() recurses only one level under the table directory for an unpartitioned table: https://github.com/apache/hive/blob/trunk/metastore/src/java/org/apache/hadoop/hive/metastore/Warehouse.java#L540

For union insert, if only 1 level is recursed you get only the folder sizes and not the actual file sizes. Folder sizes differ between OSes. It looks like the original diff was generated on Mac OS X and the new diff on CentOS. Both diffs are *wrong*, as they report folder sizes as opposed to file sizes.

1) One way to fix this is to change the recurse level to a value greater than 1.
2) Another way would be to fix UNION to create files instead of directories. To resolve filename conflicts, it can append the operator id to the filename.

[~ashutoshc]/[~jdere] do you guys have any thoughts about this?
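To make the difference between a one-level listing and a recursive one concrete, here is a minimal sketch. It is not the Hive code: it uses java.nio.file on the local filesystem rather than the Hadoop FileSystem API, and the class name UnionDirSizeSketch, its methods, and the file name 000000_0 are all hypothetical, chosen only for illustration. It builds a table directory with the 15/ and 16/ sub-directories described above and computes the size both ways.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.stream.Stream;

// Hypothetical illustration only; not part of Hive.
class UnionDirSizeSketch {

    // Reported length of a single entry. For a directory the value is
    // platform-specific, which is why the one-level total is not portable.
    static long sizeOf(Path p) {
        try {
            return Files.size(p);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // One-level listing: sums the reported length of every direct child,
    // whether that child is a file or a directory.
    static long oneLevelSize(Path tableDir) {
        try (Stream<Path> children = Files.list(tableDir)) {
            return children.mapToLong(UnionDirSizeSketch::sizeOf).sum();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Recursive listing: sums only regular files, at any depth.
    static long recursiveFileSize(Path tableDir) {
        try (Stream<Path> all = Files.walk(tableDir)) {
            return all.filter(Files::isRegularFile)
                      .mapToLong(UnionDirSizeSketch::sizeOf)
                      .sum();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Builds <tmp>/outputTbl1*/15/000000_0 (10 bytes) and
    // <tmp>/outputTbl1*/16/000000_0 (20 bytes), then returns
    // { one-level size, recursive file size }.
    // The temporary directory is not cleaned up in this sketch.
    static long[] demo() {
        try {
            Path table = Files.createTempDirectory("outputTbl1");
            Path d15 = Files.createDirectory(table.resolve("15"));
            Path d16 = Files.createDirectory(table.resolve("16"));
            Files.write(d15.resolve("000000_0"), new byte[10]);
            Files.write(d16.resolve("000000_0"), new byte[20]);
            return new long[] { oneLevelSize(table), recursiveFileSize(table) };
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        long[] sizes = demo();
        System.out.println("one-level (directory entries): " + sizes[0]);
        System.out.println("recursive (actual files):      " + sizes[1]);
    }
}
```

The recursive total is the sum of the two file lengths regardless of platform, while the one-level total depends on what the OS reports for a directory entry, which is consistent with the q.out differences between Mac OS X and CentOS described above.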
> update union_remove_*, other tests for hadoop-2
> -----------------------------------------------
>
>                 Key: HIVE-6958
>                 URL: https://issues.apache.org/jira/browse/HIVE-6958
>             Project: Hive
>          Issue Type: Bug
>          Components: Tests
>            Reporter: Jason Dere
>            Assignee: Jason Dere
>         Attachments: HIVE-6958.1.patch
>
>
> Update q.out files to match totalSize for Linux platform.

--
This message was sent by Atlassian JIRA (v6.2#6252)