> On July 3, 2013, 1:46 a.m., Ashutosh Chauhan wrote: > > ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java, line 2128 > > <https://reviews.apache.org/r/12050/diff/2/?file=314926#file314926line2128> > > > > I had quite a discussion on this with Rohini on HIVE-2936 on how to do > > these fs ops in a performant and robust way. Feel free to follow that.
This piece of code is rewritten to: {code} try { //if destf is an existing directory, rename just move the src file/dir under destf and //destf itself will be the parent dir after the rename if (fs.getFileStatus(destf).isDir()) { destfp = destf; LOG.debug("Renaming:" + destf.toString() + " is an existing directory; " + srcf.toString() + " will move under it."); } } catch (FileNotFoundException ignore) { } {code} So it will be optimized to save 1 call if the destf exists > On July 3, 2013, 1:46 a.m., Ashutosh Chauhan wrote: > > ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java, line 2151 > > <https://reviews.apache.org/r/12050/diff/2/?file=314926#file314926line2151> > > > > Why not use api instead of FsShell? Does filesystem doesn't offer an > > api for recursively doing chmod and chgrp? FileSystem offers the APIs for changing permission and owner (group), but not recursively. An alternate implementation I thought of initially is like following: {code} try { FileStatus destParentfstatus = fs.getFileStatus(destfp); String destfpGroup = destParentfstatus.getGroup(); FsPermission destfpPermission = destParentfstatus.getPermission(); FileStatus[] fstatusList = fs.listStatus(destf); for (FileStatus fstatus : fstatusList) { fs.setOwner(fstatus.getPath(), null, destfpGroup); fs.setPermission(fstatus.getPath(), destfpPermission); } } catch (FileNotFoundException fnfe) { //it should not be thrown } catch (IOException e) { throw new HiveException("Unable to set permissions of " + destf + " to those of its parent " + destf.getParent(), e); } {code} I chose to use the FsShell since the code with commands (-chgrp, -chmod) looked to me more intuitive. As for which way is more performant, I am not sure but I was also not able to tell any gain in the alternate approach. Any suggestion? > On July 3, 2013, 1:46 a.m., Ashutosh Chauhan wrote: > > ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java, line 2265 > > <https://reviews.apache.org/r/12050/diff/2/?file=314926#file314926line2265> > > > > I think following way to write this is more robust: > > if(inheritPerms) { > > fs.mkdirs(destfp, destfp.getParent().getPerms); > > } else { > > fs.mkdirs(destfp); > > } It was following the Rohini's comments on HIVE-2936: {code} It is not safe to use fs.mkdirs(path, permission) because the dfs umask is applied on that permission in DFSClient which is not desired. We have been bitten by wrong permission issues because of using that API. It is always safer to do mkdirs() and then do a setPermission() if you are dealing with HDFS. {code} So I did: {code} boolean success = fs.mkdirs(destfp); if (inheritPerms && success) { fs.setPermission(destfp, fs.getFileStatus(destfp.getParent()).getPermission()); } {code} Not sure, since these APIs documentation is quite limited. - Chaoyu ----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/12050/#review22698 ----------------------------------------------------------- On July 2, 2013, 4:39 p.m., Chaoyu Tang wrote: > > ----------------------------------------------------------- > This is an automatically generated e-mail. To reply, visit: > https://reviews.apache.org/r/12050/ > ----------------------------------------------------------- > > (Updated July 2, 2013, 4:39 p.m.) > > > Review request for hive. > > > Bugs: HIVE- and HIVE-3756 > https://issues.apache.org/jira/browse/HIVE- > https://issues.apache.org/jira/browse/HIVE-3756 > > > Repository: hive-git > > > Description > ------- > > Problems: > 1. When doing load data or insert overwrite to a table, the data files under > database/table directory could not inherit their parent's permissions (i.e. > group) as described in HIVE-3756. > 2. Beside the group issue, the read/write permission mode is also not > inherited > 3. Same problem affects the partition files (see HIVE-3094) > > Cause: > The task results (from load data or insert overwrite) are initially stored in > scratchdir and then loaded under warehouse table directory. FileSystem.rename > is used in this step (e.g. LoadTable/LoadPartition) to move the dirs/files > but it preserves their permissions (including group and mode) which are > determined by scratchdir permission or umask. If the scratchdir has different > permissions from those of warehouse table directories, the problem occurs. > > Solution: > After the FileSystem.rename is called, changing all renamed (moved) > files/dirs to their destination parents' permissions if needed (say if > hive.warehouse.subdir.inherit.perms is true). Here I introduced a new method > renameFile doing both rename and permission. It replaces the > FileSystem.rename used in LoadTable/LoadPartition. I do not replace rename > used to move files/dirs under same scratchdir in the middle of task > processing. It looks to me not necessary since they are temp files and also > probably access protected by top scratchdir mode 700 (HIVE-4487). > > > Diffs > ----- > > ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java 17daaa1 > > Diff: https://reviews.apache.org/r/12050/diff/ > > > Testing > ------- > > The following cases tested that all created subdirs/files inherit their > parents' permission mode and group in : 1). create database; 2). create > table; 3). load data; 4) insert overwrite; 5) partitions. > {code} > hive> dfs -ls -d /user/tester1/hive; > > drwxrwx--- - tester1 testgroup123 0 2013-06-22 13:20 > /user/tester1/hive > > hive> create database tester1 COMMENT 'Database for user tester1' LOCATION > '/user/tester1/hive/tester1.db'; > hive> dfs -ls -R /user/tester1/hive; > > drwxrwx--- - tester1 testgroup123 0 2013-06-22 13:21 > /user/tester1/hive/tester1.db > > hive> use tester1; > hive> create table tester1.tst1(col1 int, col2 string) ROW FORMAT DELIMITED > FIELDS TERMINATED BY ',' STORED AS TEXTFILE; > hive> dfs -ls -R /user/tester1/hive; > > drwxrwx--- - tester1 testgroup123 0 2013-06-22 13:22 > /user/tester1/hive/tester1.db > drwxrwx--- - tester1 testgroup123 0 2013-06-22 13:22 > /user/tester1/hive/tester1.db/tst1 > > hive> load data local inpath '/home/tester1/tst1.input' into table tst1; > > hive> dfs -ls -R /user/tester1/hive; > drwxrwx--- - tester1 testgroup123 0 2013-06-22 13:22 > /user/tester1/hive/tester1.db > drwxrwx--- - tester1 testgroup123 0 2013-06-22 13:23 > /user/tester1/hive/tester1.db/tst1 > -rw-rw---- 3 tester1 testgroup123 168 2013-06-22 13:23 > /user/tester1/hive/tester1.db/tst1/tst1.input > > hive> create table tester1.tst2(col1 int, col2 string) ROW FORMAT DELIMITED > FIELDS TERMINATED BY ',' STORED AS SEQUENCEFILE; > hive> dfs -ls -R /user/tester1/hive; > > drwxrwx--- - tester1 testgroup123 0 2013-06-22 13:24 > /user/tester1/hive/tester1.db > drwxrwx--- - tester1 testgroup123 0 2013-06-22 13:23 > /user/tester1/hive/tester1.db/tst1 > -rw-rw---- 3 tester1 testgroup123 168 2013-06-22 13:23 > /user/tester1/hive/tester1.db/tst1/tst1.input > drwxrwx--- - tester1 testgroup123 0 2013-06-22 13:24 > /user/tester1/hive/tester1.db/tst2 > > hive> insert overwrite table tst2 select * from tst1; > hive> dfs -ls -R /user/tester1/hive; > drwxrwx--- - tester1 testgroup123 0 2013-06-22 13:25 > /user/tester1/hive/tester1.db > drwxrwx--- - tester1 testgroup123 0 2013-06-22 13:23 > /user/tester1/hive/tester1.db/tst1 > -rw-rw---- 3 tester1 testgroup123 168 2013-06-22 13:23 > /user/tester1/hive/tester1.db/tst1/tst1.input > drwxrwx--- - tester1 testgroup123 0 2013-06-22 13:25 > /user/tester1/hive/tester1.db/tst2 > -rw-rw---- 3 tester1 testgroup123 291 2013-06-22 13:25 > /user/tester1/hive/tester1.db/tst2/000000_0 > > hive> create table tester1.tst3(col2 string) PARTITIONED BY (col1 int) ROW > FORMAT DELIMITED FIELDS TERMINATED BY ',' STORED AS TEXTFILE; > hive> dfs -ls -R /user/tester1/hive; > > drwxrwx--- - tester1 testgroup123 0 2013-06-22 13:27 > /user/tester1/hive/tester1.db > drwxrwx--- - tester1 testgroup123 0 2013-06-22 13:23 > /user/tester1/hive/tester1.db/tst1 > -rw-rw---- 3 tester1 testgroup123 168 2013-06-22 13:23 > /user/tester1/hive/tester1.db/tst1/tst1.input > drwxrwx--- - tester1 testgroup123 0 2013-06-22 13:25 > /user/tester1/hive/tester1.db/tst2 > -rw-rw---- 3 tester1 testgroup123 291 2013-06-22 13:25 > /user/tester1/hive/tester1.db/tst2/000000_0 > drwxrwx--- - tester1 testgroup123 0 2013-06-22 13:27 > /user/tester1/hive/tester1.db/tst3 > > hive> set hive.exec.dynamic.partition.mode=nonstrict; > > hive> insert overwrite table tester1.tst3 partition (col1) select t1.col2, > t1.col1 from tst1 t1; > hive> dfs -ls -R /user/tester1/hive; > > drwxrwx--- - tester1 testgroup123 0 2013-06-22 13:27 > /user/tester1/hive/tester1.db > drwxrwx--- - tester1 testgroup123 0 2013-06-22 13:23 > /user/tester1/hive/tester1.db/tst1 > -rw-rw---- 3 tester1 testgroup123 168 2013-06-22 13:23 > /user/tester1/hive/tester1.db/tst1/tst1.input > drwxrwx--- - tester1 testgroup123 0 2013-06-22 13:25 > /user/tester1/hive/tester1.db/tst2 > -rw-rw---- 3 tester1 testgroup123 291 2013-06-22 13:25 > /user/tester1/hive/tester1.db/tst2/000000_0 > drwxrwx--- - tester1 testgroup123 0 2013-06-22 13:28 > /user/tester1/hive/tester1.db/tst3 > drwxrwx--- - tester1 testgroup123 0 2013-06-22 13:28 > /user/tester1/hive/tester1.db/tst3/col1=1111 > -rw-rw---- 3 tester1 testgroup123 51 2013-06-22 13:28 > /user/tester1/hive/tester1.db/tst3/col1=1111/000000_0 > drwxrwx--- - tester1 testgroup123 0 2013-06-22 13:28 > /user/tester1/hive/tester1.db/tst3/col1=2222 > -rw-rw---- 3 tester1 testgroup123 51 2013-06-22 13:28 > /user/tester1/hive/tester1.db/tst3/col1=2222/000000_0 > drwxrwx--- - tester1 testgroup123 0 2013-06-22 13:28 > /user/tester1/hive/tester1.db/tst3/col1=3333 > -rw-rw---- 3 tester1 testgroup123 51 2013-06-22 13:28 > /user/tester1/hive/tester1.db/tst3/col1=3333/000000_0 > {code} > > > Thanks, > > Chaoyu Tang > >