[ 
https://issues.apache.org/jira/browse/HIVE-26828?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alessandro Solimando updated HIVE-26828:
----------------------------------------
    Description: 
_hybridgrace_hashjoin_2.q_ test was disabled because it was failing with OOM 
transiently (from [flaky_test 
output|http://ci.hive.apache.org/blue/organizations/jenkins/hive-flaky-check/detail/hive-flaky-check/597/tests/],
 in case it disappears):
{quote}< Status: Failed
< Vertex failed, vertexName=Map 2, vertexId=vertex_#ID#, diagnostics=[Vertex 
vertex_#ID# [Map 2] killed/failed due to:ROOT_INPUT_INIT_FAILURE, Vertex Input: 
z1 initializer failed, vertex=vertex_#ID# [Map 2], java.lang.RuntimeException: 
Failed to load plan: 
hdfs://localhost:45033/home/jenkins/agent/workspace/hive-flaky-check/itests/qtest/target/tmp/scratchdir/jenkins/88f705a8-2d67-4d0a-92fd-d9617faf4e46/hive_2022-12-08_02-25-15_569_4666093830564098399-1/jenkins/_tez_scratch_dir/5b786380-b362-45e0-ac10-0f835ef1d8d7/map.xml
< #### A masked pattern was here ####
< Caused by: org.apache.hive.com.esotericsoftware.kryo.KryoException: 
java.lang.OutOfMemoryError: GC overhead limit exceeded
< Serialization trace:
< childOperators (org.apache.hadoop.hive.ql.exec.vector.VectorFilterOperator)
< childOperators (org.apache.hadoop.hive.ql.exec.TableScanOperator)
< aliasToWork (org.apache.hadoop.hive.ql.plan.MapWork)
< #### A masked pattern was here ####
< Caused by: java.lang.OutOfMemoryError: GC overhead limit exceeded
< #### A masked pattern was here ####
< ]
< [Masked Vertex killed due to OTHER_VERTEX_FAILURE]
< [Masked Vertex killed due to OTHER_VERTEX_FAILURE]
< [Masked Vertex killed due to OTHER_VERTEX_FAILURE]
< [Masked Vertex killed due to OTHER_VERTEX_FAILURE]
< [Masked Vertex killed due to OTHER_VERTEX_FAILURE]
< DAG did not succeed due to VERTEX_FAILURE. failedVertices:1 killedVertices:5
< FAILED: Execution Error, return code 2 from 
org.apache.hadoop.hive.ql.exec.tez.TezTask. Vertex failed, vertexName=Map 2, 
vertexId=vertex_#ID#, diagnostics=[Vertex vertex_#ID# [Map 2] killed/failed due 
to:ROOT_INPUT_INIT_FAILURE, Vertex Input: z1 initializer failed, 
vertex=vertex_#ID# [Map 2], java.lang.RuntimeException: Failed to load plan: 
hdfs://localhost:45033/home/jenkins/agent/workspace/hive-flaky-check/itests/qtest/target/tmp/scratchdir/jenkins/88f705a8-2d67-4d0a-92fd-d9617faf4e46/hive_2022-12-08_02-25-15_569_4666093830564098399-1/jenkins/_tez_scratch_dir/5b786380-b362-45e0-ac10-0f835ef1d8d7/map.xml
< #### A masked pattern was here ####
< Caused by: org.apache.hive.com.esotericsoftware.kryo.KryoException: 
java.lang.OutOfMemoryError: GC overhead limit exceeded
< Serialization trace:
< childOperators (org.apache.hadoop.hive.ql.exec.vector.VectorFilterOperator)
< childOperators (org.apache.hadoop.hive.ql.exec.TableScanOperator)
< aliasToWork (org.apache.hadoop.hive.ql.plan.MapWork)
< #### A masked pattern was here ####
< Caused by: java.lang.OutOfMemoryError: GC overhead limit exceeded
< #### A masked pattern was here ####
< ][Masked Vertex killed due to OTHER_VERTEX_FAILURE][Masked Vertex killed due 
to OTHER_VERTEX_FAILURE][Masked Vertex killed due to 
OTHER_VERTEX_FAILURE][Masked Vertex killed due to OTHER_VERTEX_FAILURE][Masked 
Vertex killed due to OTHER_VERTEX_FAILURE]DAG did not succeed due to 
VERTEX_FAILURE. failedVertices:1 killedVertices:5
< PREHOOK: query: SELECT COUNT( * )
< FROM src1 x
< JOIN srcpart z1 ON (x.key = z1.key)
< JOIN src y1 ON (x.key = y1.key)
< JOIN srcpart z2 ON (x.value = z2.value)
< JOIN src y2 ON (x.value = y2.value)
< WHERE z1.key < 'zzzzzzzz' AND z2.key < 'zzzzzzzzzz'
< AND y1.value < 'zzzzzzzz' AND y2.value < 'zzzzzzzzzz'
< PREHOOK: type: QUERY
< PREHOOK: Input: default@src
< PREHOOK: Input: default@src1
< PREHOOK: Input: default@srcpart
< PREHOOK: Input: default@srcpart@ds=2008-04-08/hr=11
< PREHOOK: Input: default@srcpart@ds=2008-04-08/hr=12
< PREHOOK: Input: default@srcpart@ds=2008-04-09/hr=11
< PREHOOK: Input: default@srcpart@ds=2008-04-09/hr=12
< PREHOOK: Output: hdfs://### HDFS PATH ###
{quote}
The aim of this ticket is to investigate the issue, fix it and re-enable the 
test.

The problem seems to lie in the deserialization of the computed tez dag plan.

  was:
_hybridgrace_hashjoin_2.q_ test was disabled because it was failing with OOM 
transiently (from [flaky_test 
output|http://ci.hive.apache.org/blue/organizations/jenkins/hive-flaky-check/detail/hive-flaky-check/597/tests/],
 in case it disappears):
{code:java}
< Status: Failed< Vertex failed, vertexName=Map 2, vertexId=vertex_#ID#, 
diagnostics=[Vertex {code}
{code:java}




The aim of this ticket is to investigate the issue, fix it and re-enable the 
test.{code}


> Fix OOM for hybridgrace_hashjoin_2.q
> ------------------------------------
>
>                 Key: HIVE-26828
>                 URL: https://issues.apache.org/jira/browse/HIVE-26828
>             Project: Hive
>          Issue Type: Bug
>          Components: Test, Tez
>    Affects Versions: 4.0.0-alpha-2
>            Reporter: Alessandro Solimando
>            Priority: Major
>
> _hybridgrace_hashjoin_2.q_ test was disabled because it was failing with OOM 
> transiently (from [flaky_test 
> output|http://ci.hive.apache.org/blue/organizations/jenkins/hive-flaky-check/detail/hive-flaky-check/597/tests/],
>  in case it disappears):
> {quote}< Status: Failed
> < Vertex failed, vertexName=Map 2, vertexId=vertex_#ID#, diagnostics=[Vertex 
> vertex_#ID# [Map 2] killed/failed due to:ROOT_INPUT_INIT_FAILURE, Vertex 
> Input: z1 initializer failed, vertex=vertex_#ID# [Map 2], 
> java.lang.RuntimeException: Failed to load plan: 
> hdfs://localhost:45033/home/jenkins/agent/workspace/hive-flaky-check/itests/qtest/target/tmp/scratchdir/jenkins/88f705a8-2d67-4d0a-92fd-d9617faf4e46/hive_2022-12-08_02-25-15_569_4666093830564098399-1/jenkins/_tez_scratch_dir/5b786380-b362-45e0-ac10-0f835ef1d8d7/map.xml
> < #### A masked pattern was here ####
> < Caused by: org.apache.hive.com.esotericsoftware.kryo.KryoException: 
> java.lang.OutOfMemoryError: GC overhead limit exceeded
> < Serialization trace:
> < childOperators (org.apache.hadoop.hive.ql.exec.vector.VectorFilterOperator)
> < childOperators (org.apache.hadoop.hive.ql.exec.TableScanOperator)
> < aliasToWork (org.apache.hadoop.hive.ql.plan.MapWork)
> < #### A masked pattern was here ####
> < Caused by: java.lang.OutOfMemoryError: GC overhead limit exceeded
> < #### A masked pattern was here ####
> < ]
> < [Masked Vertex killed due to OTHER_VERTEX_FAILURE]
> < [Masked Vertex killed due to OTHER_VERTEX_FAILURE]
> < [Masked Vertex killed due to OTHER_VERTEX_FAILURE]
> < [Masked Vertex killed due to OTHER_VERTEX_FAILURE]
> < [Masked Vertex killed due to OTHER_VERTEX_FAILURE]
> < DAG did not succeed due to VERTEX_FAILURE. failedVertices:1 killedVertices:5
> < FAILED: Execution Error, return code 2 from 
> org.apache.hadoop.hive.ql.exec.tez.TezTask. Vertex failed, vertexName=Map 2, 
> vertexId=vertex_#ID#, diagnostics=[Vertex vertex_#ID# [Map 2] killed/failed 
> due to:ROOT_INPUT_INIT_FAILURE, Vertex Input: z1 initializer failed, 
> vertex=vertex_#ID# [Map 2], java.lang.RuntimeException: Failed to load plan: 
> hdfs://localhost:45033/home/jenkins/agent/workspace/hive-flaky-check/itests/qtest/target/tmp/scratchdir/jenkins/88f705a8-2d67-4d0a-92fd-d9617faf4e46/hive_2022-12-08_02-25-15_569_4666093830564098399-1/jenkins/_tez_scratch_dir/5b786380-b362-45e0-ac10-0f835ef1d8d7/map.xml
> < #### A masked pattern was here ####
> < Caused by: org.apache.hive.com.esotericsoftware.kryo.KryoException: 
> java.lang.OutOfMemoryError: GC overhead limit exceeded
> < Serialization trace:
> < childOperators (org.apache.hadoop.hive.ql.exec.vector.VectorFilterOperator)
> < childOperators (org.apache.hadoop.hive.ql.exec.TableScanOperator)
> < aliasToWork (org.apache.hadoop.hive.ql.plan.MapWork)
> < #### A masked pattern was here ####
> < Caused by: java.lang.OutOfMemoryError: GC overhead limit exceeded
> < #### A masked pattern was here ####
> < ][Masked Vertex killed due to OTHER_VERTEX_FAILURE][Masked Vertex killed 
> due to OTHER_VERTEX_FAILURE][Masked Vertex killed due to 
> OTHER_VERTEX_FAILURE][Masked Vertex killed due to 
> OTHER_VERTEX_FAILURE][Masked Vertex killed due to OTHER_VERTEX_FAILURE]DAG 
> did not succeed due to VERTEX_FAILURE. failedVertices:1 killedVertices:5
> < PREHOOK: query: SELECT COUNT( * )
> < FROM src1 x
> < JOIN srcpart z1 ON (x.key = z1.key)
> < JOIN src y1 ON (x.key = y1.key)
> < JOIN srcpart z2 ON (x.value = z2.value)
> < JOIN src y2 ON (x.value = y2.value)
> < WHERE z1.key < 'zzzzzzzz' AND z2.key < 'zzzzzzzzzz'
> < AND y1.value < 'zzzzzzzz' AND y2.value < 'zzzzzzzzzz'
> < PREHOOK: type: QUERY
> < PREHOOK: Input: default@src
> < PREHOOK: Input: default@src1
> < PREHOOK: Input: default@srcpart
> < PREHOOK: Input: default@srcpart@ds=2008-04-08/hr=11
> < PREHOOK: Input: default@srcpart@ds=2008-04-08/hr=12
> < PREHOOK: Input: default@srcpart@ds=2008-04-09/hr=11
> < PREHOOK: Input: default@srcpart@ds=2008-04-09/hr=12
> < PREHOOK: Output: hdfs://### HDFS PATH ###
> {quote}
> The aim of this ticket is to investigate the issue, fix it and re-enable the 
> test.
> The problem seems to lie in the deserialization of the computed tez dag plan.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

Reply via email to