Ahmad Humayun created FLINK-38446:
-------------------------------------
Summary: Pyflink explain() crashes with Index Out Of Bounds
Key: FLINK-38446
URL: https://issues.apache.org/jira/browse/FLINK-38446
Project: Flink
Issue Type: Bug
Components: Table SQL / Planner
Affects Versions: 2.0.0
Environment: apache-flink 2.0.0
python 3.10.12
Reporter: Ahmad Humayun
The following code raises an {{IndexOutOfBoundsException}} when {{explain()}} is called.
{code:python}
import pandas as pd

from pyflink.common import Configuration
from pyflink.table import DataTypes, EnvironmentSettings, TableEnvironment
from pyflink.table.expressions import call, col
from pyflink.table.udf import udaf


def custom_aggregation(values: pd.Series):
    return values.product()


custom_udf_agg = udaf(custom_aggregation, result_type=DataTypes.BIGINT(),
                      func_type="pandas")

settings = (
    EnvironmentSettings.new_instance()
    .in_batch_mode()
    .with_configuration(Configuration())
    .build()
)
table_env = TableEnvironment.create(settings)
table_env.create_temporary_function("custom_udf_agg", custom_udf_agg)

# Create the source table
data = [(101,), (102,), (101,), (103,)]
source_table = table_env.from_elements(data, schema=["A"])

t = source_table.group_by(col('A')).select(col('A').avg.alias('A'))
t = t.distinct()
t = t.group_by(col('A')).select(col('A').max.alias('A'))
t = t.group_by(col('A')).select(call('custom_udf_agg', col('A')).alias('A'))
print(t.explain())
{code}
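For what it's worth, the UDAF body itself seems fine: run directly on a plain pandas Series (no Flink involved), it just returns the product of the values.

```python
import pandas as pd


def custom_aggregation(values: pd.Series):
    # Product of all values in the group, e.g. [2, 3, 4] -> 24
    return values.product()


print(custom_aggregation(pd.Series([2, 3, 4])))  # -> 24
```

So the failure appears to be in plan translation, not in the function being aggregated.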
Here is the error trace:
{code:java}
py4j.protocol.Py4JJavaError: An error occurred while calling o85.explain.
: java.lang.IndexOutOfBoundsException: Index 1 out of bounds for length 1
	at java.base/jdk.internal.util.Preconditions.outOfBounds(Preconditions.java:64)
	at java.base/jdk.internal.util.Preconditions.outOfBoundsCheckIndex(Preconditions.java:70)
	at java.base/jdk.internal.util.Preconditions.checkIndex(Preconditions.java:266)
	at java.base/java.util.Objects.checkIndex(Objects.java:361)
	at java.base/java.util.ArrayList.get(ArrayList.java:427)
	at org.apache.flink.table.planner.plan.nodes.exec.batch.BatchExecExchange.lambda$createHashPartitioner$2(BatchExecExchange.java:241)
	at java.base/java.util.stream.IntPipeline$1$1.accept(IntPipeline.java:180)
	at java.base/java.util.Spliterators$IntArraySpliterator.forEachRemaining(Spliterators.java:1076)
	at java.base/java.util.Spliterator$OfInt.forEachRemaining(Spliterator.java:711)
	at java.base/java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:509)
	at java.base/java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:499)
	at java.base/java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:575)
	at java.base/java.util.stream.AbstractPipeline.evaluateToArrayNode(AbstractPipeline.java:260)
	at java.base/java.util.stream.ReferencePipeline.toArray(ReferencePipeline.java:616)
	at org.apache.flink.table.planner.plan.nodes.exec.batch.BatchExecExchange.createHashPartitioner(BatchExecExchange.java:242)
	at org.apache.flink.table.planner.plan.nodes.exec.batch.BatchExecExchange.translateToPlanInternal(BatchExecExchange.java:209)
	at org.apache.flink.table.planner.plan.nodes.exec.ExecNodeBase.translateToPlan(ExecNodeBase.java:168)
	at org.apache.flink.table.planner.plan.nodes.exec.ExecEdge.translateToPlan(ExecEdge.java:262)
	at org.apache.flink.table.planner.plan.nodes.exec.batch.BatchExecSort.translateToPlanInternal(BatchExecSort.java:110)
	at org.apache.flink.table.planner.plan.nodes.exec.ExecNodeBase.translateToPlan(ExecNodeBase.java:168)
	at org.apache.flink.table.planner.plan.nodes.exec.ExecEdge.translateToPlan(ExecEdge.java:262)
	at org.apache.flink.table.planner.plan.nodes.exec.batch.BatchExecExchange.translateToPlanInternal(BatchExecExchange.java:161)
	at org.apache.flink.table.planner.plan.nodes.exec.ExecNodeBase.translateToPlan(ExecNodeBase.java:168)
	at org.apache.flink.table.planner.plan.nodes.exec.ExecEdge.translateToPlan(ExecEdge.java:262)
	at org.apache.flink.table.planner.plan.nodes.exec.common.CommonExecCalc.translateToPlanInternal(CommonExecCalc.java:94)
	at org.apache.flink.table.planner.plan.nodes.exec.ExecNodeBase.translateToPlan(ExecNodeBase.java:168)
	at org.apache.flink.table.planner.plan.nodes.exec.ExecEdge.translateToPlan(ExecEdge.java:262)
	at org.apache.flink.table.planner.plan.nodes.exec.batch.BatchExecExchange.translateToPlanInternal(BatchExecExchange.java:161)
	at org.apache.flink.table.planner.plan.nodes.exec.ExecNodeBase.translateToPlan(ExecNodeBase.java:168)
	at org.apache.flink.table.planner.plan.nodes.exec.ExecEdge.translateToPlan(ExecEdge.java:262)
	at org.apache.flink.table.planner.plan.nodes.exec.batch.BatchExecPythonGroupAggregate.translateToPlanInternal(BatchExecPythonGroupAggregate.java:93)
	at org.apache.flink.table.planner.plan.nodes.exec.ExecNodeBase.translateToPlan(ExecNodeBase.java:168)
	at org.apache.flink.table.planner.plan.nodes.exec.ExecEdge.translateToPlan(ExecEdge.java:262)
	at org.apache.flink.table.planner.plan.nodes.exec.common.CommonExecCalc.translateToPlanInternal(CommonExecCalc.java:94)
	at org.apache.flink.table.planner.plan.nodes.exec.ExecNodeBase.translateToPlan(ExecNodeBase.java:168)
	at org.apache.flink.table.planner.delegation.BatchPlanner.$anonfun$translateToPlan$1(BatchPlanner.scala:95)
	at scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:286)
	at scala.collection.Iterator.foreach(Iterator.scala:943)
	at scala.collection.Iterator.foreach$(Iterator.scala:943)
	at scala.collection.AbstractIterator.foreach(Iterator.scala:1431)
	at scala.collection.IterableLike.foreach(IterableLike.scala:74)
	at scala.collection.IterableLike.foreach$(IterableLike.scala:73)
	at scala.collection.AbstractIterable.foreach(Iterable.scala:56)
	at scala.collection.TraversableLike.map(TraversableLike.scala:286)
	at scala.collection.TraversableLike.map$(TraversableLike.scala:279)
	at scala.collection.AbstractTraversable.map(Traversable.scala:108)
	at org.apache.flink.table.planner.delegation.BatchPlanner.translateToPlan(BatchPlanner.scala:94)
	at org.apache.flink.table.planner.delegation.PlannerBase.getExplainGraphs(PlannerBase.scala:627)
	at org.apache.flink.table.planner.delegation.BatchPlanner.explain(BatchPlanner.scala:149)
	at org.apache.flink.table.planner.delegation.BatchPlanner.explain(BatchPlanner.scala:49)
	at org.apache.flink.table.api.internal.TableEnvironmentImpl.explainInternal(TableEnvironmentImpl.java:753)
	at org.apache.flink.table.api.internal.TableImpl.explain(TableImpl.java:482)
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:77)
	at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.base/java.lang.reflect.Method.invoke(Method.java:569)
	at org.apache.flink.api.python.shaded.py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
	at org.apache.flink.api.python.shaded.py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:374)
	at org.apache.flink.api.python.shaded.py4j.Gateway.invoke(Gateway.java:282)
	at org.apache.flink.api.python.shaded.py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
	at org.apache.flink.api.python.shaded.py4j.commands.CallCommand.execute(CallCommand.java:79)
	at org.apache.flink.api.python.shaded.py4j.GatewayConnection.run(GatewayConnection.java:238)
	at java.base/java.lang.Thread.run(Thread.java:840)
{code}
I believe this is a bug in the planner: the trace shows {{BatchExecExchange.createHashPartitioner}} reading index 1 of a one-element list while translating the plan. If so, can someone please explain why this occurs, and where exactly the planner goes wrong?
--
This message was sent by Atlassian Jira
(v8.20.10#820010)