Mithun Radhakrishnan created HIVE-17802:
-------------------------------------------
             Summary: Remove unnecessary calls to FileSystem.setOwner() from FileOutputCommitterContainer
                 Key: HIVE-17802
                 URL: https://issues.apache.org/jira/browse/HIVE-17802
             Project: Hive
          Issue Type: Bug
          Components: HCatalog
    Affects Versions: 2.2.0, 3.0.0
            Reporter: Mithun Radhakrishnan
            Assignee: Chris Drome


For large Pig/HCat queries that produce a large number of partitions/directories/files, we have seen cases where the HDFS NameNode groaned under the weight of {{FileSystem.setOwner()}} calls, originating from the commit-step. This was the result of the following code in FileOutputCommitterContainer:

{code:java}
private void applyGroupAndPerms(FileSystem fs, Path dir, FsPermission permission,
    List<AclEntry> acls, String group, boolean recursive)
    throws IOException {
  ...
  if (recursive) {
    for (FileStatus fileStatus : fs.listStatus(dir)) {
      if (fileStatus.isDir()) {
        applyGroupAndPerms(fs, fileStatus.getPath(), permission, acls, group, true);
      } else {
        fs.setPermission(fileStatus.getPath(), permission);
        chown(fs, fileStatus.getPath(), group);
      }
    }
  }
}

private void chown(FileSystem fs, Path file, String group) throws IOException {
  try {
    fs.setOwner(file, null, group);
  } catch (AccessControlException ignore) {
    // Some users have wrong table group, ignore it.
    LOG.warn("Failed to change group of partition directories/files: " + file, ignore);
  }
}
{code}

One call per file/directory is far too many. We have a patch that reduces the namenode pressure.
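One possible way to cut the number of calls, sketched here under assumptions and not necessarily what the patch does: the listing already reports each entry's current group, so the group change can be skipped for any entry that already has the target group. In the sketch, {{Entry}}, {{setGroup}} and {{applyGroupIfNeeded}} are hypothetical stand-ins (for Hadoop's {{FileStatus}} and {{FileSystem.setOwner()}}), and the paths are made up, so the example runs without Hadoop on the classpath.

```java
import java.util.ArrayList;
import java.util.List;

class GroupChangeSketch {

    /** Hypothetical stand-in for Hadoop's FileStatus: a path and its current group. */
    static final class Entry {
        final String path;
        String group;

        Entry(String path, String group) {
            this.path = path;
            this.group = group;
        }
    }

    /** Stand-in for fs.setOwner(path, null, group); on HDFS each call is a NameNode RPC. */
    static void setGroup(Entry entry, String group) {
        entry.group = group;
    }

    /**
     * Changes the group only where it differs from the target.
     * Returns the number of simulated setOwner() calls issued.
     */
    static int applyGroupIfNeeded(List<Entry> entries, String group) {
        int calls = 0;
        for (Entry entry : entries) {
            if (!group.equals(entry.group)) {
                setGroup(entry, group);
                calls++;
            }
        }
        return calls;
    }

    public static void main(String[] args) {
        // Made-up paths: two entries already carry the target group, one does not.
        List<Entry> entries = new ArrayList<>();
        entries.add(new Entry("/warehouse/t/p=1/part-00000", "hive"));
        entries.add(new Entry("/warehouse/t/p=1/part-00001", "users"));
        entries.add(new Entry("/warehouse/t/p=2/part-00000", "hive"));

        int calls = applyGroupIfNeeded(entries, "hive");
        // Prints: setOwner calls: 1 of 3 entries
        System.out.println("setOwner calls: " + calls + " of " + entries.size() + " entries");
    }
}
```

In the real committer the same check would presumably compare {{fileStatus.getGroup()}} against {{group}} before calling {{chown()}}; whether that is sufficient depends on how often freshly written files already inherit the table's group.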