[ https://issues.apache.org/jira/browse/HIVE-14864?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15536870#comment-15536870 ]
Vihang Karajgaonkar commented on HIVE-14864:
--------------------------------------------

Unfortunately, the documentation of ContentSummary.getLength() only says that it "returns the length" :)

The implementation of getContentSummary() in FileSystem.java, shown here, indicates that getLength() actually returns the sum of the lengths of all the files within that directory.

{noformat}
public ContentSummary getContentSummary(Path f) throws IOException {
  FileStatus status = getFileStatus(f);
  if (status.isFile()) {
    // f is a file
    return new ContentSummary(status.getLen(), 1, 0);
  }
  // f is a directory
  long[] summary = {0, 0, 1};
  for(FileStatus s : listStatus(f)) {
    ContentSummary c = s.isDirectory() ? getContentSummary(s.getPath()) :
                                       new ContentSummary(s.getLen(), 1, 0);
    summary[0] += c.getLength();
    summary[1] += c.getFileCount();
    summary[2] += c.getDirectoryCount();
  }
  return new ContentSummary(summary[0], summary[1], summary[2]);
}
{noformat}

These are the relevant constructors for ContentSummary.

{noformat}
/** Constructor */
public ContentSummary(long length, long fileCount, long directoryCount) {
  this(length, fileCount, directoryCount, -1L, length, -1L);
}

public ContentSummary(
    long length, long fileCount, long directoryCount, long quota,
    long spaceConsumed, long spaceQuota) {
  this.length = length;
  this.fileCount = fileCount;
  this.directoryCount = directoryCount;
  this.quota = quota;
  this.spaceConsumed = spaceConsumed;
  this.spaceQuota = spaceQuota;
}
{noformat}

> Distcp is not called from MoveTask when src is a directory
> ----------------------------------------------------------
>
>                 Key: HIVE-14864
>                 URL: https://issues.apache.org/jira/browse/HIVE-14864
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Vihang Karajgaonkar
>            Assignee: Vihang Karajgaonkar
>
> In FileUtils.java the following code does not get executed even when the src
> directory size is greater than HIVE_EXEC_COPYFILE_MAXSIZE, because
> srcFS.getFileStatus(src).getLen() returns 0 when src is a directory. We
> should use srcFS.getContentSummary(src).getLength() instead.
> {noformat}
>     /* Run distcp if source file/dir is too big */
>     if (srcFS.getUri().getScheme().equals("hdfs") &&
>         srcFS.getFileStatus(src).getLen() > conf.getLongVar(HiveConf.ConfVars.HIVE_EXEC_COPYFILE_MAXSIZE)) {
>       LOG.info("Source is " + srcFS.getFileStatus(src).getLen() + " bytes. (MAX: " + conf.getLongVar(HiveConf.ConfVars.HIVE_EXEC_COPYFILE_MAXSIZE) + ")");
>       LOG.info("Launch distributed copy (distcp) job.");
>       HiveConfUtil.updateJobCredentialProviders(conf);
>       copied = shims.runDistCp(src, dst, conf);
>       if (copied && deleteSource) {
>         srcFS.delete(src, true);
>       }
>     }
> {noformat}

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
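
To make the difference between the two size checks concrete, here is a minimal, self-contained sketch. It does not use the Hadoop classes: Node, totalLength() and exceedsMaxSize() are hypothetical stand-ins, and the model simply assumes (as the issue reports) that a directory's own length is 0. totalLength() follows the same recursion as the getContentSummary() implementation quoted in the comment.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of a file tree. These are hypothetical classes, not Hadoop's. */
public class CopySizeSketch {

    /** A node is either a file with a byte length or a directory with children. */
    static final class Node {
        final boolean isFile;
        final long len; // assumption: directories report 0, as described in the issue
        final List<Node> children = new ArrayList<>();

        private Node(boolean isFile, long len) {
            this.isFile = isFile;
            this.len = len;
        }

        static Node file(long len) {
            return new Node(true, len);
        }

        static Node dir(Node... kids) {
            Node d = new Node(false, 0L);
            for (Node k : kids) {
                d.children.add(k);
            }
            return d;
        }
    }

    /** Sums the lengths of all files under the node, recursing into directories. */
    static long totalLength(Node n) {
        if (n.isFile) {
            return n.len;
        }
        long sum = 0;
        for (Node c : n.children) {
            sum += totalLength(c);
        }
        return sum;
    }

    /** The check as the issue proposes it: compare the recursive total to the limit. */
    static boolean exceedsMaxSize(Node src, long maxSize) {
        return totalLength(src) > maxSize;
    }

    public static void main(String[] args) {
        // A directory holding a 40-byte file and a subdirectory with a 70-byte file.
        Node src = Node.dir(Node.file(40), Node.dir(Node.file(70)));
        long maxSize = 100; // stand-in for HIVE_EXEC_COPYFILE_MAXSIZE

        System.out.println(src.len > maxSize);            // current check on the directory's own length
        System.out.println(exceedsMaxSize(src, maxSize)); // proposed check on the recursive total
    }
}
```

In this model the first check is false for any directory, whatever it contains, while the second becomes true once the files under it add up to more than the limit.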