-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/67126/#review203111
-----------------------------------------------------------

ql/src/java/org/apache/hadoop/hive/ql/optimizer/GenMapRedUtils.java
Line 1910 (original), 1910 (patched)
<https://reviews.apache.org/r/67126/#comment285189>

    This change fixes the tez.merge.files case; the only problem in that case is that the second file sink is not gathering stats.

    My own opinion: gathering statistics has no real overhead (it writes a file)... I think that enabling it only here and there mostly just adds complexity.


ql/src/test/results/clientpositive/llap/union_stats.q.out
Lines 427 (patched)
<https://reviews.apache.org/r/67126/#comment285190>

    FS_7 is present 2 times in this plan.

    Operator ids are reused multiple times in queries like:

    from (select * from src union all select * from src)s
    insert overwrite table t1 select *
    insert overwrite table t2 select *;

    If I understand correctly, the file sink ids are reused in every union branch to do the output.

    HIVE-19237 should fix this, and probably also remove the indexInTezUnion setters/etc.


- Zoltan Haindrich


On May 15, 2018, 1:07 p.m., Zoltan Haindrich wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/67126/
> -----------------------------------------------------------
> 
> (Updated May 15, 2018, 1:07 p.m.)
> 
> 
> Review request for hive, Ashutosh Chauhan and Sergey Shelukhin.
> 
> 
> Bugs: HIVE-19326
>     https://issues.apache.org/jira/browse/HIVE-19326
> 
> 
> Repository: hive-git
> 
> 
> Description
> -------
> 
> In queries like: INSERT ... SELECT ... UNION ALL SELECT ...
> the stats are only collected for the first select.
> 
> Two issues are fixed, both of which led to the same outcome:
> 
> * the stats collectors overwrote each other's results, because the filename depended only on the resulting table name
> * in the tez.merge.files case, the second task was not set to collect statistics
> 
> 
> Diffs
> -----
> 
>   itests/src/test/resources/testconfiguration.properties 13c08de3c57fdd7fcd4181814bb8e547c699b9f1 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/FileSinkOperator.java 01a5b4c9c328cb034a613a1539cea2584e122fb4 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/Operator.java 108bb57c4189b720e672eb6f09b1ef05f78448c2 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/TableScanOperator.java 07991811f92bc7accae2fde23244f424bdd64c6b 
>   ql/src/java/org/apache/hadoop/hive/ql/optimizer/GenMapRedUtils.java 605bb09caba3b60ad8d51e50dace70757ab80188 
>   ql/src/java/org/apache/hadoop/hive/ql/stats/StatsCollectionContext.java 5c3328c63e8ca37e284e1dc1cdbee5969e185a80 
>   ql/src/java/org/apache/hadoop/hive/ql/stats/fs/FSStatsPublisher.java 902b37f7874dd5b1afaf8c8bb1259c6f0ddf817f 
>   ql/src/test/queries/clientpositive/union_fast_stats.q d69bef3ac083d5a06acda9f47e5d2c1cbe2dfb69 
>   ql/src/test/queries/clientpositive/union_rowcounts.q PRE-CREATION 
>   ql/src/test/queries/clientpositive/union_stats.q 0e91c23fea475ec95a7fa67433707cb290b277a2 
>   ql/src/test/results/clientpositive/llap/multiMapJoin1.q.out f8adcd4ba24f2122f7e7e20770e24a71cfb01a7e 
>   ql/src/test/results/clientpositive/llap/union_fast_stats.q.out 4ca5f47a850ba62290c0845eb11a8d0a32780526 
>   ql/src/test/results/clientpositive/llap/union_rowcounts.q.out PRE-CREATION 
>   ql/src/test/results/clientpositive/llap/union_stats.q.out 5a088f40f5fea41d04a5b6cb6c12bf852a22f097 
>   ql/src/test/results/clientpositive/union_stats.q.out 8bd3f44b6e276d6636082673084d66bba3b5c0d3 
> 
> 
> Diff: https://reviews.apache.org/r/67126/diff/1/
> 
> 
> Testing
> -------
> 
> 
> Thanks,
> 
> Zoltan Haindrich
> 
>