[ https://issues.apache.org/jira/browse/HIVE-15527?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15817998#comment-15817998 ]
Hive QA commented on HIVE-15527:
--------------------------------

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12846750/HIVE-15527.7.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 19 failed/errored test(s), 10766 tests executed
*Failed tests:*
{noformat}
TestDerbyConnector - did not produce a TEST-*.xml file (likely timed out) (batchId=233)
TestSparkCliDriver - did not produce a TEST-*.xml file (likely timed out) (batchId=101)
    [limit_pushdown2.q,skewjoin_noskew.q,leftsemijoin_mr.q,bucket3.q,skewjoinopt13.q,bucketmapjoin9.q,auto_join15.q,ptf.q,join22.q,vectorized_nested_mapjoin.q,sample4.q,union18.q,multi_insert_gby.q,join33.q,join_cond_pushdown_unqual2.q]
TestSparkCliDriver - did not produce a TEST-*.xml file (likely timed out) (batchId=108)
    [union_remove_1.q,ppd_outer_join2.q,date_udf.q,groupby1_noskew.q,join20.q,smb_mapjoin_13.q,groupby_rollup1.q,temp_table_gb1.q,vector_string_concat.q,smb_mapjoin_6.q,metadata_only_queries.q,auto_sortmerge_join_12.q,groupby_bigdata.q,groupby3_map_multi_distinct.q,innerjoin.q]
TestSparkCliDriver - did not produce a TEST-*.xml file (likely timed out) (batchId=114)
    [escape_distributeby1.q,join9.q,groupby2.q,groupby4_map.q,udf_max.q,vectorization_pushdown.q,cbo_gby_empty.q,join_cond_pushdown_unqual3.q,vectorization_short_regress.q,join8.q,sample10.q,cross_product_check_1.q,auto_join_stats.q,input_part2.q,groupby_multi_single_reducer3.q]
TestSparkCliDriver - did not produce a TEST-*.xml file (likely timed out) (batchId=115)
    [groupby_map_ppr_multi_distinct.q,vectorization_13.q,mapjoin_mapjoin.q,union2.q,join41.q,groupby8_map.q,cbo_subq_not_in.q,identity_project_remove_skip.q,stats5.q,groupby8_map_skew.q,nullgroup2.q,mapjoin_subquery.q,bucket2.q,smb_mapjoin_1.q,union_remove_8.q]
TestSparkCliDriver - did not produce a TEST-*.xml file (likely timed out) (batchId=117)
    [timestamp_lazy.q,union29.q,runtime_skewjoin_mapjoin_spark.q,auto_join22.q,union8.q,groupby5_map.q,dynamic_rdd_cache.q,auto_join29.q,groupby6.q,merge1.q,mapjoin_distinct.q,vector_decimal_mapjoin.q,sample5.q,multi_insert_move_tasks_share_dependencies.q,join_array.q]
TestSparkCliDriver - did not produce a TEST-*.xml file (likely timed out) (batchId=119)
    [groupby4_noskew.q,groupby3_map_skew.q,join_cond_pushdown_2.q,union19.q,union24.q,union_remove_5.q,groupby7_noskew_multi_single_reducer.q,vectorization_1.q,index_auto_self_join.q,auto_smb_mapjoin_14.q,script_env_var2.q,pcr.q,auto_join_filters.q,join0.q,join37.q]
TestSparkCliDriver - did not produce a TEST-*.xml file (likely timed out) (batchId=120)
    [stats12.q,groupby4.q,union_top_level.q,stats2.q,groupby10.q,mapjoin_filter_on_outerjoin.q,auto_sortmerge_join_4.q,limit_partition_metadataonly.q,load_dyn_part4.q,union3.q,groupby_multi_single_reducer.q,smb_mapjoin_14.q,groupby3_noskew_multi_distinct.q,stats18.q,union_remove_21.q]
TestSparkCliDriver - did not produce a TEST-*.xml file (likely timed out) (batchId=123)
    [groupby_complex_types.q,multigroupby_singlemr.q,union11.q,groupby7.q,join5.q,bucketmapjoin_negative2.q,vectorization_div0.q,union_script.q,add_part_multiple.q,limit_pushdown.q,union_remove_17.q,uniquejoin.q,metadata_only_queries_with_filters.q,union25.q,load_dyn_part13.q]
TestSparkCliDriver - did not produce a TEST-*.xml file (likely timed out) (batchId=126)
    [smb_mapjoin_15.q,script_pipe.q,auto_join24.q,filter_join_breaktask.q,bucket4.q,ppd_multi_insert.q,skewjoinopt20.q,join_thrift.q,multi_insert_gby3.q,groupby8.q,join_map_ppr.q,auto_sortmerge_join_8.q,escape_clusterby1.q,groupby_multi_insert_common_distinct.q,join6.q]
TestSparkCliDriver - did not produce a TEST-*.xml file (likely timed out) (batchId=131)
    [join2.q,join36.q,avro_joins_native.q,join18.q,smb_mapjoin_10.q,temp_table.q,union_remove_13.q,auto_sortmerge_join_5.q,groupby5_noskew.q,auto_join0.q,vectorization_17.q,auto_join_stats2.q,skewjoin_union_remove_1.q,union16.q,join_literals.q]
TestSparkCliDriver - did not produce a TEST-*.xml file (likely timed out) (batchId=94)
    [bucketmapjoin4.q,bucket_map_join_spark4.q,union21.q,groupby2_noskew.q,timestamp_2.q,date_join1.q,mergejoins.q,smb_mapjoin_11.q,auto_sortmerge_join_3.q,mapjoin_test_outer.q,vectorization_9.q,merge2.q,groupby6_noskew.q,auto_join_without_localtask.q,multi_join_union.q]
TestSparkCliDriver - did not produce a TEST-*.xml file (likely timed out) (batchId=98)
    [groupby_map_ppr.q,nullgroup4_multi_distinct.q,join_rc.q,union14.q,smb_mapjoin_12.q,vector_cast_constant.q,union_remove_4.q,auto_join11.q,load_dyn_part7.q,udaf_collect_set.q,vectorization_12.q,groupby_sort_skew_1.q,groupby_sort_skew_1_23.q,smb_mapjoin_25.q,skewjoinopt12.q]
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[case_sensitivity] (batchId=61)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[input_testxpath] (batchId=28)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[udf_coalesce] (batchId=75)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[schema_evol_text_vec_part] (batchId=148)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_varchar_simple] (batchId=151)
org.apache.hadoop.hive.cli.TestSparkNegativeCliDriver.org.apache.hadoop.hive.cli.TestSparkNegativeCliDriver (batchId=228)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/2880/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/2880/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-2880/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 19 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12846750 - PreCommit-HIVE-Build

> Memory usage is unbound in SortByShuffler for Spark
> ---------------------------------------------------
>
>                 Key: HIVE-15527
>                 URL: https://issues.apache.org/jira/browse/HIVE-15527
>             Project: Hive
>          Issue Type: Improvement
>          Components: Spark
>    Affects Versions: 1.1.0
>            Reporter: Xuefu Zhang
>            Assignee: Chao Sun
>         Attachments: HIVE-15527.0.patch, HIVE-15527.0.patch,
> HIVE-15527.1.patch, HIVE-15527.2.patch, HIVE-15527.3.patch,
> HIVE-15527.4.patch, HIVE-15527.5.patch, HIVE-15527.6.patch,
> HIVE-15527.7.patch, HIVE-15527.patch
>
>
> In SortByShuffler.java, an ArrayList is used to back the iterator for values
> that have the same key in the shuffled result produced by the Spark
> transformation sortByKey. A large key group can therefore exhaust memory.
> {code}
>     @Override
>     public Tuple2<HiveKey, Iterable<BytesWritable>> next() {
>       // TODO: implement this by accumulating rows with the same key into a list.
>       // Note that this list needs to be improved to prevent excessive memory usage, but this
>       // can be done in later phase.
>       while (it.hasNext()) {
>         Tuple2<HiveKey, BytesWritable> pair = it.next();
>         if (curKey != null && !curKey.equals(pair._1())) {
>           HiveKey key = curKey;
>           List<BytesWritable> values = curValues;
>           curKey = pair._1();
>           curValues = new ArrayList<BytesWritable>();
>           curValues.add(pair._2());
>           return new Tuple2<HiveKey, Iterable<BytesWritable>>(key, values);
>         }
>         curKey = pair._1();
>         curValues.add(pair._2());
>       }
>       if (curKey == null) {
>         throw new NoSuchElementException();
>       }
>       // if we get here, this should be the last element we have
>       HiveKey key = curKey;
>       curKey = null;
>       return new Tuple2<HiveKey, Iterable<BytesWritable>>(key, curValues);
>     }
> {code}
> Since the output from sortByKey is already sorted on key, it's possible to
> back the value iterable with the same input iterator.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
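
The idea in the issue description, backing each key's value iterable with the sorted input iterator instead of an ArrayList, could be sketched roughly as follows. This is only an illustration under stated assumptions and not the code from any attached patch: the class name LazyGroupingIterator is hypothetical, and java.util.Map.Entry stands in for Tuple2<HiveKey, BytesWritable> so the sketch compiles without Spark or Hive on the classpath.

```java
import java.util.AbstractMap.SimpleImmutableEntry;
import java.util.Iterator;
import java.util.Map;
import java.util.NoSuchElementException;
import java.util.Objects;

/**
 * Hypothetical sketch: groups a key-sorted stream of (key, value) pairs lazily.
 * Only one look-ahead pair is held in memory, never a whole key group.
 */
class LazyGroupingIterator<K, V> implements Iterator<Map.Entry<K, Iterable<V>>> {
    private final Iterator<Map.Entry<K, V>> input; // assumed to be sorted by key
    private Map.Entry<K, V> lookahead;             // next unconsumed pair, null at end of input
    private K currentKey;                          // key of the group handed out most recently
    private boolean started;                       // true once a group has been handed out

    LazyGroupingIterator(Iterator<Map.Entry<K, V>> input) {
        this.input = input;
        advance();
    }

    private void advance() {
        lookahead = input.hasNext() ? input.next() : null;
    }

    /** Drops values of the current group that the caller did not read. */
    private void skipRestOfCurrentGroup() {
        while (started && lookahead != null
                && Objects.equals(lookahead.getKey(), currentKey)) {
            advance();
        }
    }

    @Override
    public boolean hasNext() {
        // Asking for the next group discards the unread rest of the current one.
        skipRestOfCurrentGroup();
        return lookahead != null;
    }

    @Override
    public Map.Entry<K, Iterable<V>> next() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        final K key = lookahead.getKey();
        currentKey = key;
        started = true;

        // Values are pulled from the shared input iterator on demand.
        final Iterator<V> values = new Iterator<V>() {
            @Override
            public boolean hasNext() {
                return lookahead != null && Objects.equals(lookahead.getKey(), key);
            }

            @Override
            public V next() {
                if (!hasNext()) {
                    throw new NoSuchElementException();
                }
                V value = lookahead.getValue();
                advance();
                return value;
            }
        };

        // Single-pass iterable: every call to iterator() returns the same iterator.
        Iterable<V> iterable = () -> values;
        return new SimpleImmutableEntry<>(key, iterable);
    }
}
```

The trade-off in this sketch is that each value iterable can be traversed only once, and calling hasNext() or next() on the outer iterator invalidates the previous group's values. Whether the consumers in Hive tolerate those restrictions is not stated in this message.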