[ https://issues.apache.org/jira/browse/HIVE-26828?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Alessandro Solimando updated HIVE-26828:
----------------------------------------
Description:
_hybridgrace_hashjoin_2.q_ test was disabled because it was transiently failing with an OOM (from the [flaky test output|http://ci.hive.apache.org/blue/organizations/jenkins/hive-flaky-check/detail/hive-flaky-check/597/tests/], quoted here in case it disappears):
{quote}
< Status: Failed
< Vertex failed, vertexName=Map 2, vertexId=vertex_#ID#, diagnostics=[Vertex vertex_#ID# [Map 2] killed/failed due to:ROOT_INPUT_INIT_FAILURE, Vertex Input: z1 initializer failed, vertex=vertex_#ID# [Map 2], java.lang.RuntimeException: Failed to load plan: hdfs://localhost:45033/home/jenkins/agent/workspace/hive-flaky-check/itests/qtest/target/tmp/scratchdir/jenkins/88f705a8-2d67-4d0a-92fd-d9617faf4e46/hive_2022-12-08_02-25-15_569_4666093830564098399-1/jenkins/_tez_scratch_dir/5b786380-b362-45e0-ac10-0f835ef1d8d7/map.xml
< #### A masked pattern was here ####
< Caused by: org.apache.hive.com.esotericsoftware.kryo.KryoException: java.lang.OutOfMemoryError: GC overhead limit exceeded
< Serialization trace:
< childOperators (org.apache.hadoop.hive.ql.exec.vector.VectorFilterOperator)
< childOperators (org.apache.hadoop.hive.ql.exec.TableScanOperator)
< aliasToWork (org.apache.hadoop.hive.ql.plan.MapWork)
< #### A masked pattern was here ####
< Caused by: java.lang.OutOfMemoryError: GC overhead limit exceeded
< #### A masked pattern was here ####
< ]
< [Masked Vertex killed due to OTHER_VERTEX_FAILURE]
< [Masked Vertex killed due to OTHER_VERTEX_FAILURE]
< [Masked Vertex killed due to OTHER_VERTEX_FAILURE]
< [Masked Vertex killed due to OTHER_VERTEX_FAILURE]
< [Masked Vertex killed due to OTHER_VERTEX_FAILURE]
< DAG did not succeed due to VERTEX_FAILURE. failedVertices:1 killedVertices:5
< FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.tez.TezTask. Vertex failed, vertexName=Map 2, vertexId=vertex_#ID#, diagnostics=[Vertex vertex_#ID# [Map 2] killed/failed due to:ROOT_INPUT_INIT_FAILURE, Vertex Input: z1 initializer failed, vertex=vertex_#ID# [Map 2], java.lang.RuntimeException: Failed to load plan: hdfs://localhost:45033/home/jenkins/agent/workspace/hive-flaky-check/itests/qtest/target/tmp/scratchdir/jenkins/88f705a8-2d67-4d0a-92fd-d9617faf4e46/hive_2022-12-08_02-25-15_569_4666093830564098399-1/jenkins/_tez_scratch_dir/5b786380-b362-45e0-ac10-0f835ef1d8d7/map.xml
< #### A masked pattern was here ####
< Caused by: org.apache.hive.com.esotericsoftware.kryo.KryoException: java.lang.OutOfMemoryError: GC overhead limit exceeded
< Serialization trace:
< childOperators (org.apache.hadoop.hive.ql.exec.vector.VectorFilterOperator)
< childOperators (org.apache.hadoop.hive.ql.exec.TableScanOperator)
< aliasToWork (org.apache.hadoop.hive.ql.plan.MapWork)
< #### A masked pattern was here ####
< Caused by: java.lang.OutOfMemoryError: GC overhead limit exceeded
< #### A masked pattern was here ####
< ][Masked Vertex killed due to OTHER_VERTEX_FAILURE][Masked Vertex killed due to OTHER_VERTEX_FAILURE][Masked Vertex killed due to OTHER_VERTEX_FAILURE][Masked Vertex killed due to OTHER_VERTEX_FAILURE][Masked Vertex killed due to OTHER_VERTEX_FAILURE]DAG did not succeed due to VERTEX_FAILURE. failedVertices:1 killedVertices:5
< PREHOOK: query: SELECT COUNT( * )
< FROM src1 x
< JOIN srcpart z1 ON (x.key = z1.key)
< JOIN src y1 ON (x.key = y1.key)
< JOIN srcpart z2 ON (x.value = z2.value)
< JOIN src y2 ON (x.value = y2.value)
< WHERE z1.key < 'zzzzzzzz' AND z2.key < 'zzzzzzzzzz'
< AND y1.value < 'zzzzzzzz' AND y2.value < 'zzzzzzzzzz'
< PREHOOK: type: QUERY
< PREHOOK: Input: default@src
< PREHOOK: Input: default@src1
< PREHOOK: Input: default@srcpart
< PREHOOK: Input: default@srcpart@ds=2008-04-08/hr=11
< PREHOOK: Input: default@srcpart@ds=2008-04-08/hr=12
< PREHOOK: Input: default@srcpart@ds=2008-04-09/hr=11
< PREHOOK: Input: default@srcpart@ds=2008-04-09/hr=12
< PREHOOK: Output: hdfs://### HDFS PATH ###
{quote}
The aim of this ticket is to investigate the issue, fix it, and re-enable the test.
The problem seems to lie in the deserialization of the computed Tez DAG plan.
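The stack trace shows the root input initializer dying while Kryo deserializes the MapWork plan (map.xml) from the scratch directory, with the operator tree (childOperators / aliasToWork) on the serialization trace; Hive relocates Kryo under org.apache.hive.com.esotericsoftware, which is why that prefix appears in the exception. As a rough, self-contained illustration of that failure mode, and not Hive's actual code, the sketch below round-trips a large operator-like graph through plain Kryo: the class names, graph size, and heap setting are invented for the example. With a sufficiently small heap (e.g. -Xmx64m) the read side fails with an OutOfMemoryError, reported either as "GC overhead limit exceeded" or "Java heap space" depending on the collector.
{code:java}
// Standalone sketch, NOT Hive code: Op stands in for a plan operator with
// childOperators links (cf. the Serialization trace above). Sizes and the
// suggested -Xmx are arbitrary, chosen only to provoke the OOM.
import com.esotericsoftware.kryo.Kryo;
import com.esotericsoftware.kryo.io.Input;
import com.esotericsoftware.kryo.io.Output;

import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.util.ArrayList;
import java.util.List;

public class PlanDeserializationOomSketch {

  /** Hypothetical stand-in for an operator node in the plan graph. */
  public static class Op {
    public String desc;
    public List<Op> childOperators = new ArrayList<>();
  }

  public static void main(String[] args) {
    Kryo kryo = new Kryo();
    // Accept unregistered classes so the generic graph round-trips without setup.
    kryo.setRegistrationRequired(false);

    // Build a wide operator graph: one root with many children.
    Op root = new Op();
    root.desc = "TS";
    for (int i = 0; i < 200_000; i++) {
      Op child = new Op();
      child.desc = "FIL_" + i;
      root.childOperators.add(child);
    }

    // Serialize the graph (loosely, the equivalent of writing map.xml).
    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    try (Output out = new Output(bytes)) {
      kryo.writeObject(out, root);
    }

    // Deserialize it again, the step that fails in the qtest with
    // "Failed to load plan: ... map.xml". With a small heap the OOM tends to
    // surface here, because the serialized bytes, the original graph, and the
    // fresh copy are all live at the same time.
    try (Input in = new Input(new ByteArrayInputStream(bytes.toByteArray()))) {
      Op copy = kryo.readObject(in, Op.class);
      System.out.println("Deserialized root " + copy.desc + " with "
          + copy.childOperators.size() + " children");
    }
  }
}
{code}
Whether the right fix is to shrink whatever inflates the serialized plan for this five-way join or simply to give the qtest JVM more headroom is exactly the investigation this ticket asks for.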
was:
_hybridgrace_hashjoin_2.q_ test was disabled because it was failing with OOM transiently (from [flaky_test output|http://ci.hive.apache.org/blue/organizations/jenkins/hive-flaky-check/detail/hive-flaky-check/597/tests/], in case it disappears):
{code:java}
< Status: Failed
< Vertex failed, vertexName=Map 2, vertexId=vertex_#ID#, diagnostics=[Vertex {code}
{code:java}
The aim of this ticket is to investigate the issue, fix it and re-enable the test.{code}

> Fix OOM for hybridgrace_hashjoin_2.q
> ------------------------------------
>
> Key: HIVE-26828
> URL: https://issues.apache.org/jira/browse/HIVE-26828
> Project: Hive
> Issue Type: Bug
> Components: Test, Tez
> Affects Versions: 4.0.0-alpha-2
> Reporter: Alessandro Solimando
> Priority: Major
>

--
This message was sent by Atlassian Jira
(v8.20.10#820010)