> On Aug. 9, 2016, 9:49 p.m., Ashutosh Chauhan wrote: > > ql/src/java/org/apache/hadoop/hive/ql/optimizer/GenMapRedUtils.java, lines > > 1807-1814 > > <https://reviews.apache.org/r/50359/diff/7/?file=1469107#file1469107line1807> > > > > Why not use newly added Context::getTempDirForPath(Path path) here.
Yeah, sorry. This is a little confusing. The thing is that 'tmpDir' is based on 'dest' (tmpDir = baseCtx.getExternalTmpPath(dest)) where 'dest' is an HDFS temporary directory (not S3). This is the directory causing the .hive-staging to be created on S3 at the end, when HDFS temp dir was copied to S3 (INSERT OVERWRITE). I found out that FileSinkDesc has a 'getDestPath' that returns you the S3 path. So, the condition is if the 'getDestPath' is on S3, then use 'getMRTmpPath', or continue using the temporary path based on 'dest' (HDFS temp path). That part of the code was a little confusing regarding the names 'dest', 'getDestPath', 'getFinalDirName'. I was trying to understand this code, but I could not figure out the idea behind 'getFinalDirnName', and 'getDestPath'; so I ended up writing that condition. Also, the comments that were already there mentioned that the temp file should be in the same filesystem as the destination (in case of non-blobstore directories). > On Aug. 9, 2016, 9:49 p.m., Ashutosh Chauhan wrote: > > ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java, lines > > 7020-7024 > > <https://reviews.apache.org/r/50359/diff/7/?file=1469108#file1469108line7020> > > > > Why not use newly introduced tx.getTempDirForPath(dest_path); here? This part was causing 72 tests failing due to the different scratch directory name. Also I wasn't sure why the stats temp was on the same location as 'queryTmpdir', so I added the condition too incase it has issues with encrypted zones. I like your line best, but I wasn't sure about it, and I ended up doing this condition. I can do the 'ctx.getTempDirForPath' better. What do you think? > On Aug. 9, 2016, 9:49 p.m., Ashutosh Chauhan wrote: > > ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java, line 6763 > > <https://reviews.apache.org/r/50359/diff/7/?file=1469108#file1469108line6763> > > > > surprised that we weren't using getExternalTmpPathRelTo() here, did we > > miss this when we introduced this method for encrypt support work? Mmm, i'm surprised too. Maybe we missed it. - Sergio ----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/50359/#review145269 ----------------------------------------------------------- On Aug. 9, 2016, 7:53 p.m., Sergio Pena wrote: > > ----------------------------------------------------------- > This is an automatically generated e-mail. To reply, visit: > https://reviews.apache.org/r/50359/ > ----------------------------------------------------------- > > (Updated Aug. 9, 2016, 7:53 p.m.) > > > Review request for hive. > > > Bugs: HIVE-14270 > https://issues.apache.org/jira/browse/HIVE-14270 > > > Repository: hive-git > > > Description > ------- > > This patch will create a temporary directory for Hive intermediate data on > HDFS when S3 tables are used. > > > Diffs > ----- > > common/src/java/org/apache/hadoop/hive/common/BlobStorageUtils.java > PRE-CREATION > common/src/java/org/apache/hadoop/hive/conf/HiveConf.java > 9f5f619359701b948f57d599a5bdc2ecbdff280a > common/src/test/org/apache/hadoop/hive/common/TestBlobStorageUtils.java > PRE-CREATION > ql/src/java/org/apache/hadoop/hive/ql/Context.java > 89893eba9fd2316b9a393f06edefa837bb815faf > ql/src/java/org/apache/hadoop/hive/ql/optimizer/GenMapRedUtils.java > 5bd78862e1064d7f64a5d764571015a8df1101e8 > ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java > a01a7bdbfec962b6617e98091cdb1325c5b0e84f > ql/src/test/org/apache/hadoop/hive/ql/exec/TestContext.java PRE-CREATION > > Diff: https://reviews.apache.org/r/50359/diff/ > > > Testing > ------- > > NO PATCH > ** NON-PARTITIONED TABLE > > - create table dummy (id int); > 3.651s > - insert into table s3dummy values (1); > 39.231s > - insert overwrite table s3dummy values (1); > 42.569s > - insert overwrite directory 's3a://spena-bucket/dirs/s3dummy' select * from > dummy; 30.136s > > EXTERNAL TABLE > > - create table s3dummy_ext like s3dummy location > 's3a://spena-bucket/user/hive/warehouse/s3dummy'; 9.297s > - insert into table s3dummy_ext values (1); > 45.855s > > WITH PATCH > > ** NON-PARTITIONED TABLE > - create table s3dummy (id int) location > 's3a://spena-bucket/user/hive/warehouse/s3dummy'; 3.945s > - insert into table s3dummy values (1); > 15.025s > - insert overwrite table s3dummy values (1); > 25.149s > - insert overwrite directory 's3a://spena-bucket/dirs/s3dummy' select * from > dummy; 19.158s > - from dummy insert overwrite table s3dummy select *; > 25.469s > - from dummy insert into table s3dummy select *; > 14.501s > > ** EXTERNAL TABLE > - create table s3dummy_ext like s3dummy location > 's3a://spena-bucket/user/hive/warehouse/s3dummy'; 4.827s > - insert into table s3dummy_ext values (1); > 16.070s > > ** PARTITIONED TABLE > - create table s3dummypart (id int) partitioned by (part int) > location 's3a://spena-bucket/user/hive/warehouse/s3dummypart'; > 3.176s > - alter table s3dummypart add partition (part=1); > 3.229s > - alter table s3dummypart add partition (part=2); > 3.124s > - insert into table s3dummypart partition (part=1) values (1); > 14.876s > - insert overwrite table s3dummypart partition (part=1) values (1); > 27.594s > - insert overwrite directory 's3a://spena-bucket/dirs/s3dummypart' select * > from dummypart; 22.298s > - from dummypart insert overwrite table s3dummypart partition (part=1) select > id; 29.001s > - from dummypart insert into table s3dummypart partition (part=1) select id; > 14.869s > > ** DYNAMIC PARTITIONS > - insert into table s3dummypart partition (part) select id, 1 from dummypart; > 15.185s > - insert into table s3dummypart partition (part) select id, 1 from dummypart; > 18.820s > > > Thanks, > > Sergio Pena > >