Chao Sun created HIVE-17257:
-------------------------------

             Summary: Hive should merge empty files
                 Key: HIVE-17257
                 URL: https://issues.apache.org/jira/browse/HIVE-17257
             Project: Hive
          Issue Type: Bug
            Reporter: Chao Sun
            Assignee: Chao Sun


Currently, if the file-merging option is turned on and the destination directory
contains a large number of empty files, Hive will not trigger the merge task:
{code}
  private long getMergeSize(FileSystem inpFs, Path dirPath, long avgSize) {
    AverageSize averageSize = getAverageSize(inpFs, dirPath);
    if (averageSize.getTotalSize() <= 0) {
      return -1;
    }

    if (averageSize.getNumFiles() <= 1) {
      return -1;
    }

    if (averageSize.getTotalSize()/averageSize.getNumFiles() < avgSize) {
      return averageSize.getTotalSize();
    }
    return -1;
  }
{code}

This logic doesn't seem right: when every file is empty the total size is 0, so
the first check returns -1 and no merge is scheduled. It seems better to combine
these empty files into one.
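
One possible adjustment, shown only as a sketch: treat a negative total size as
the "nothing to merge" case, so that a directory holding several zero-byte files
still qualifies. The MergeSizeSketch class and its nested AverageSize below are
hypothetical stand-ins written for this example, not the actual Hive classes,
and the FileSystem/Path lookup is replaced by a pre-computed AverageSize so the
snippet runs on its own. It also assumes that callers interpret only -1 as
"skip the merge", which would need to be confirmed against the real call sites.

```java
// Sketch only: MergeSizeSketch and AverageSize are hypothetical stand-ins,
// not the real Hive classes.
final class MergeSizeSketch {

  // Minimal holder modelling the two getters used in the snippet above.
  static final class AverageSize {
    private final long totalSize;
    private final int numFiles;

    AverageSize(long totalSize, int numFiles) {
      this.totalSize = totalSize;
      this.numFiles = numFiles;
    }

    long getTotalSize() { return totalSize; }
    int getNumFiles() { return numFiles; }
  }

  // Returns the total size to merge, or -1 when no merge should be triggered.
  static long getMergeSize(AverageSize averageSize, long avgSize) {
    // Assumption: only a negative total means the size could not be computed.
    // A total of 0 (all files empty) is allowed through.
    if (averageSize.getTotalSize() < 0) {
      return -1;
    }

    // A single file (or none) has nothing to be merged with.
    if (averageSize.getNumFiles() <= 1) {
      return -1;
    }

    // Merge when the average file size is below the threshold.
    if (averageSize.getTotalSize() / averageSize.getNumFiles() < avgSize) {
      return averageSize.getTotalSize();
    }
    return -1;
  }

  public static void main(String[] args) {
    long avgSize = 16_000_000L; // illustrative threshold

    // 100 empty files: merge is now triggered, reported size is 0.
    System.out.println(getMergeSize(new AverageSize(0L, 100), avgSize));
    // A single empty file: still no merge.
    System.out.println(getMergeSize(new AverageSize(0L, 1), avgSize));
    // Two large files above the threshold: still no merge.
    System.out.println(getMergeSize(new AverageSize(64_000_000L, 2), avgSize));
  }
}
```

With this change the merge size for a directory of empty files is 0 rather than
-1, so any caller that checks "size > 0" instead of "size != -1" would need the
same adjustment.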



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
