----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/69367/#review212198 -----------------------------------------------------------
common/src/java/org/apache/hadoop/hive/conf/HiveConf.java Lines 2702 (patched) <https://reviews.apache.org/r/69367/#comment297881> I think this needs a better description itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/TestAcidOnTez.java Lines 848 (patched) <https://reviews.apache.org/r/69367/#comment297879> Why is this needed? Shouldn't the compiler set this? itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/txn/compactor/TestCrudCompactorOnTez.java Lines 132 (patched) <https://reviews.apache.org/r/69367/#comment297882> what is orc.rows.between.memory.checks'='1 for? itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/txn/compactor/TestCrudCompactorOnTez.java Lines 196 (patched) <https://reviews.apache.org/r/69367/#comment297884> I don't understand the logic here. Since major compaction was done above, there should only be base/bucket0 and base/bucket1 so there is nothing for this query to group. Also, I would think SPLIT_GROUPING_MODE should be "query" here... if it's not, where is it set? itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/txn/compactor/TestCrudCompactorOnTez.java Lines 255 (patched) <https://reviews.apache.org/r/69367/#comment297885> nit: this could just do ShowCompactions to see if anything got queued up itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/txn/compactor/TestCrudCompactorOnTez.java Lines 309 (patched) <https://reviews.apache.org/r/69367/#comment297886> How does (3,3,x) end up in bucket0? with bucketing_version=1 it should be (val mod num_buckets)=bucketId. itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/txn/compactor/TestCrudCompactorOnTez.java Lines 312 (patched) <https://reviews.apache.org/r/69367/#comment297887> And smimilarly, (4,4) is in bucket1... itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/txn/compactor/TestCrudCompactorOnTez.java Lines 315 (patched) <https://reviews.apache.org/r/69367/#comment297888> since you just ran a major compaction, there is only 1 file per bucket so would split grouper do anything? would there be > 1 split? itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/txn/compactor/TestCrudCompactorOnTez.java Lines 325 (patched) <https://reviews.apache.org/r/69367/#comment297889> It would be useful to add a test of a table w/o buckets. Ideally one that has > 1 reducer during Insert so that there is > 1 output file. I think there is some propety to specify number of reducers... not sure if Tez respects it. itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/txn/compactor/TestCrudCompactorOnTez.java Lines 328 (patched) <https://reviews.apache.org/r/69367/#comment297883> should this be set in setUp()? Alternatively, should conf be cloned? seems error prone as it modifies state outside the method ql/src/java/org/apache/hadoop/hive/ql/exec/tez/HiveSplitGenerator.java Lines 189 (patched) <https://reviews.apache.org/r/69367/#comment297890> What is this for? It seems fragile since it forces some behavior on all tests. Do any newly added tests rely on this? ql/src/java/org/apache/hadoop/hive/ql/exec/tez/SplitGrouper.java Line 168 (original), 172 (patched) <https://reviews.apache.org/r/69367/#comment297900> nit: are empty param decls needed? ql/src/java/org/apache/hadoop/hive/ql/exec/tez/SplitGrouper.java Lines 202 (patched) <https://reviews.apache.org/r/69367/#comment297905> It should be rowidoffset or splitstart. For 'original' splits (w/o acid meta cols in the file) SyntheticBucketProperties should always be there and so rowIdOffset is there. For 'native' acid files, OrcSplit doesn't have the 1st rowid in the split so splitStart is used to sort. ql/src/java/org/apache/hadoop/hive/ql/exec/tez/SplitGrouper.java Lines 203 (patched) <https://reviews.apache.org/r/69367/#comment297906> Would be useful to describe what that invariant is. ql/src/java/org/apache/hadoop/hive/ql/exec/tez/SplitGrouper.java Lines 241 (patched) <https://reviews.apache.org/r/69367/#comment297907> This is important to add ql/src/java/org/apache/hadoop/hive/ql/exec/tez/SplitGrouper.java Lines 281 (patched) <https://reviews.apache.org/r/69367/#comment297908> what is this TODO for? ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcRawRecordMerger.java Line 1233 (original) <https://reviews.apache.org/r/69367/#comment297896> Seems that now the class level JavaDoc is out of sync ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcRecordUpdater.java Lines 637 (patched) <https://reviews.apache.org/r/69367/#comment297891> What throws the IAE? Above I see if (!reader.hasMetadataValue(OrcRecordUpdater.ACID_KEY_INDEX_NAME)) { shouldn't it bail out there if there is no index? ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcRecordUpdater.java Lines 638 (patched) <https://reviews.apache.org/r/69367/#comment297892> is there a followup Jira for this? ql/src/java/org/apache/hadoop/hive/ql/parse/DDLSemanticAnalyzer.java Lines 2201 (patched) <https://reviews.apache.org/r/69367/#comment297893> it would be helpful to add COMPACTOR_CRUD_QUERY_BASED property name to the error msg ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/CompactorMR.java Lines 248 (patched) <https://reviews.apache.org/r/69367/#comment297919> What does this do for MM table? ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/CompactorMR.java Lines 371 (patched) <https://reviews.apache.org/r/69367/#comment297922> should 'conf' be cloned? will this affect 'conf' for something else? ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/CompactorMR.java Lines 537 (patched) <https://reviews.apache.org/r/69367/#comment297923> why does it need "0+ ..." ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/Initiator.java Lines 101 (patched) <https://reviews.apache.org/r/69367/#comment297895> Useful to include table/part name in the msg ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFValidateAcidSortOrder.java Lines 17 (patched) <https://reviews.apache.org/r/69367/#comment297912> ROW__ID.bucket_column - you mean bucketId? ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFValidateAcidSortOrder.java Lines 57 (patched) <https://reviews.apache.org/r/69367/#comment297915> This doesn't compare statemetId anywhere but it should. I think the easiest is to compare bucketProperty or you could extract statemetId from it and do it explicitly ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFValidateAcidSortOrder.java Lines 60 (patched) <https://reviews.apache.org/r/69367/#comment297917> I don't think equals makes sense ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFValidateAcidSortOrder.java Lines 61 (patched) <https://reviews.apache.org/r/69367/#comment297918> it maybe useful to include both ROW__IDs in the message. ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFValidateAcidSortOrder.java Lines 74 (patched) <https://reviews.apache.org/r/69367/#comment297916> nit: make class and fields final to make sure compareTo is inlined? ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFValidateAcidSortOrder.java Lines 91 (patched) <https://reviews.apache.org/r/69367/#comment297914> when is it ok for 2 consecutive ROW_IDs to be equal? - Eugene Koifman On Jan. 21, 2019, 11:04 p.m., Vaibhav Gumashta wrote: > > ----------------------------------------------------------- > This is an automatically generated e-mail. To reply, visit: > https://reviews.apache.org/r/69367/ > ----------------------------------------------------------- > > (Updated Jan. 21, 2019, 11:04 p.m.) > > > Review request for hive and Eugene Koifman. > > > Bugs: HIVE-20699 > https://issues.apache.org/jira/browse/HIVE-20699 > > > Repository: hive-git > > > Description > ------- > > https://jira.apache.org/jira/browse/HIVE-20699 > > > Diffs > ----- > > common/src/java/org/apache/hadoop/hive/conf/HiveConf.java b213609f39 > itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/TestAcidOnTez.java > d6a41919bf > > itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/txn/compactor/TestCrudCompactorOnTez.java > PRE-CREATION > ql/src/java/org/apache/hadoop/hive/ql/exec/FunctionRegistry.java bbe7fb0697 > ql/src/java/org/apache/hadoop/hive/ql/exec/tez/HiveSplitGenerator.java > 15c14c9be5 > ql/src/java/org/apache/hadoop/hive/ql/exec/tez/SplitGrouper.java 7f8bd229a6 > ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcRawRecordMerger.java > fbb931cbcd > ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcRecordUpdater.java > 6d4578e7a0 > ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcSplit.java 4d55592b63 > ql/src/java/org/apache/hadoop/hive/ql/parse/DDLSemanticAnalyzer.java > 0e5b3e5473 > ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/CompactorMR.java > dc05e1990e > ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/Initiator.java > a0df82cb20 > > ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFValidateAcidSortOrder.java > PRE-CREATION > ql/src/test/results/clientpositive/show_functions.q.out 0fdcbda66f > > > Diff: https://reviews.apache.org/r/69367/diff/7/ > > > Testing > ------- > > > Thanks, > > Vaibhav Gumashta > >