----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/65130/#review195503 -----------------------------------------------------------
- Why the need for the file name change even for non-bucketed tables? - With this patch, what happens to existing tables (bucketed and non-bucketed) which contain filenames that do not match the format used here? - How are external tables handled, where the user is responsible for managing the files in the table directories? For both bucketed and non-bucketed tables. ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java Lines 3951 (patched) <https://reviews.apache.org/r/65130/#comment274704> Add a comment about the assumptions/actions taken here - it looks like bucketed files are assumed to be in the correct name format? Is there any validation that the And all non-bucketed files are renamed to 000000_0? I wonder if this will ever cause a problem with tons of files with 000000_0, 000000_0_copy_1, 000000_0_copy_2 etc. Though I guess this is currently what happens when INSERT INTO TABLE occurs. ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java Lines 4034 (patched) <https://reviews.apache.org/r/65130/#comment274697> This check looks ugly, but if you have to do it then maybe do what is done in SemanticAnalyzer.analyzeCreateTable() - .startsWith(HiveConf.getVar(conf, HiveConf.ConfVars.STAGINGDIR)) Also, you are invoking toString() on an array (srcs) - use srcs[0].getPath().getName(). Also please add a comment here to describe why this extra check is here. - Jason Dere On Jan. 16, 2018, 9:31 a.m., Deepak Jaiswal wrote: > > ----------------------------------------------------------- > This is an automatically generated e-mail. To reply, visit: > https://reviews.apache.org/r/65130/ > ----------------------------------------------------------- > > (Updated Jan. 16, 2018, 9:31 a.m.) > > > Review request for hive, Eugene Koifman and Jason Dere. > > > Bugs: HIVE-18350 > https://issues.apache.org/jira/browse/HIVE-18350 > > > Repository: hive-git > > > Description > ------- > > Made changes for both bucketed and non-bucketed tables. > Added a positive test for non-bucketed table which renames the loaded file. > Added couple of negative tests for bucketed table which reject a load with > inconsistent file name. > > > Diffs > ----- > > ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java 1a2b3c1f6c > ql/src/java/org/apache/hadoop/hive/ql/parse/LoadSemanticAnalyzer.java > 4535c3edc2 > ql/src/test/org/apache/hadoop/hive/ql/metadata/TestHiveCopyFiles.java > cc1d8574b0 > ql/src/test/queries/clientnegative/load_data_bucketed_1.q PRE-CREATION > ql/src/test/queries/clientnegative/load_data_bucketed_2.q PRE-CREATION > ql/src/test/queries/clientpositive/load_data_rename.q PRE-CREATION > ql/src/test/queries/clientpositive/smb_mapjoin_7.q 4a6afb0496 > ql/src/test/results/clientnegative/load_data_bucketed_1.q.out PRE-CREATION > ql/src/test/results/clientnegative/load_data_bucketed_2.q.out PRE-CREATION > ql/src/test/results/clientpositive/beeline/smb_mapjoin_7.q.out 7a6f8c53a5 > ql/src/test/results/clientpositive/llap/load_data_rename.q.out PRE-CREATION > ql/src/test/results/clientpositive/smb_mapjoin_7.q.out b71c5b87c1 > ql/src/test/results/clientpositive/spark/smb_mapjoin_7.q.out ac49c02913 > > > Diff: https://reviews.apache.org/r/65130/diff/3/ > > > Testing > ------- > > > Thanks, > > Deepak Jaiswal > >