Ahmad Humayun created FLINK-38446:
-------------------------------------

             Summary: Pyflink explain() crashes with Index Out Of Bounds
                 Key: FLINK-38446
                 URL: https://issues.apache.org/jira/browse/FLINK-38446
             Project: Flink
          Issue Type: Bug
          Components: Table SQL / Planner
    Affects Versions: 2.0.0
         Environment: apache-flink 2.0.0
python 3.10.12
            Reporter: Ahmad Humayun


The following code causes an Index Out Of Bounds exception when explain is 
called.

 
{code:java}
from pyflink.table import *
from pyflink.table.expressions import *
from pyflink.table.udf import udf
from pyflink.table.types import DataTypes
from pyflink.common import Configuration
from pyflink.table.udf import AggregateFunction, udaf
from pyflink.table import DataTypes
import pandas as pd
 
def custom_aggregation(values: pd.Series):
   return values.product()
 
custom_udf_agg = udaf(custom_aggregation, result_type=DataTypes.BIGINT(), 
func_type="pandas")
 
cfg = Configuration()
 
settings = (
  EnvironmentSettings.new_instance()
  .in_batch_mode()
  .with_configuration(cfg)
  .build()
)
 
table_env = TableEnvironment.create(settings)
table_env.create_temporary_function("custom_udf_agg", custom_udf_agg)
 
data = [
   (101,),
   (102,),
   (101,),
   (103,)
]
 
schema = ["A"]
 
# Create the source table
source_table = table_env.from_elements(
   data,
   schema=schema
)
 
t = source_table.group_by(col('A')).select(col('A').avg.alias('A'))
t = t.distinct()
t = t.group_by(col('A')).select(col('A').max.alias('A'))
t = t.group_by(col('A')).select(call('custom_udf_agg', col('A')).alias('A'))
print(t.explain())
{code}

Here is the error trace:



 
{code:java}
py4j.protocol.Py4JJavaError: An error occurred while calling o85.explain.
: java.lang.IndexOutOfBoundsException: Index 1 out of bounds for length 1
        at 
java.base/jdk.internal.util.Preconditions.outOfBounds(Preconditions.java:64)
        at 
java.base/jdk.internal.util.Preconditions.outOfBoundsCheckIndex(Preconditions.java:70)
        at 
java.base/jdk.internal.util.Preconditions.checkIndex(Preconditions.java:266)
        at java.base/java.util.Objects.checkIndex(Objects.java:361)
        at java.base/java.util.ArrayList.get(ArrayList.java:427)
        at 
org.apache.flink.table.planner.plan.nodes.exec.batch.BatchExecExchange.lambda$createHashPartitioner$2(BatchExecExchange.java:241)
        at 
java.base/java.util.stream.IntPipeline$1$1.accept(IntPipeline.java:180)
        at 
java.base/java.util.Spliterators$IntArraySpliterator.forEachRemaining(Spliterators.java:1076)
        at 
java.base/java.util.Spliterator$OfInt.forEachRemaining(Spliterator.java:711)
        at 
java.base/java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:509)
        at 
java.base/java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:499)
        at 
java.base/java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:575)
        at 
java.base/java.util.stream.AbstractPipeline.evaluateToArrayNode(AbstractPipeline.java:260)
        at 
java.base/java.util.stream.ReferencePipeline.toArray(ReferencePipeline.java:616)
        at 
org.apache.flink.table.planner.plan.nodes.exec.batch.BatchExecExchange.createHashPartitioner(BatchExecExchange.java:242)
        at 
org.apache.flink.table.planner.plan.nodes.exec.batch.BatchExecExchange.translateToPlanInternal(BatchExecExchange.java:209)
        at 
org.apache.flink.table.planner.plan.nodes.exec.ExecNodeBase.translateToPlan(ExecNodeBase.java:168)
        at 
org.apache.flink.table.planner.plan.nodes.exec.ExecEdge.translateToPlan(ExecEdge.java:262)
        at 
org.apache.flink.table.planner.plan.nodes.exec.batch.BatchExecSort.translateToPlanInternal(BatchExecSort.java:110)
        at 
org.apache.flink.table.planner.plan.nodes.exec.ExecNodeBase.translateToPlan(ExecNodeBase.java:168)
        at 
org.apache.flink.table.planner.plan.nodes.exec.ExecEdge.translateToPlan(ExecEdge.java:262)
        at 
org.apache.flink.table.planner.plan.nodes.exec.batch.BatchExecExchange.translateToPlanInternal(BatchExecExchange.java:161)
        at 
org.apache.flink.table.planner.plan.nodes.exec.ExecNodeBase.translateToPlan(ExecNodeBase.java:168)
        at 
org.apache.flink.table.planner.plan.nodes.exec.ExecEdge.translateToPlan(ExecEdge.java:262)
        at 
org.apache.flink.table.planner.plan.nodes.exec.common.CommonExecCalc.translateToPlanInternal(CommonExecCalc.java:94)
        at 
org.apache.flink.table.planner.plan.nodes.exec.ExecNodeBase.translateToPlan(ExecNodeBase.java:168)
        at 
org.apache.flink.table.planner.plan.nodes.exec.ExecEdge.translateToPlan(ExecEdge.java:262)
        at 
org.apache.flink.table.planner.plan.nodes.exec.batch.BatchExecExchange.translateToPlanInternal(BatchExecExchange.java:161)
        at 
org.apache.flink.table.planner.plan.nodes.exec.ExecNodeBase.translateToPlan(ExecNodeBase.java:168)
        at 
org.apache.flink.table.planner.plan.nodes.exec.ExecEdge.translateToPlan(ExecEdge.java:262)
        at 
org.apache.flink.table.planner.plan.nodes.exec.batch.BatchExecPythonGroupAggregate.translateToPlanInternal(BatchExecPythonGroupAggregate.java:93)
        at 
org.apache.flink.table.planner.plan.nodes.exec.ExecNodeBase.translateToPlan(ExecNodeBase.java:168)
        at 
org.apache.flink.table.planner.plan.nodes.exec.ExecEdge.translateToPlan(ExecEdge.java:262)
        at 
org.apache.flink.table.planner.plan.nodes.exec.common.CommonExecCalc.translateToPlanInternal(CommonExecCalc.java:94)
        at 
org.apache.flink.table.planner.plan.nodes.exec.ExecNodeBase.translateToPlan(ExecNodeBase.java:168)
        at 
org.apache.flink.table.planner.delegation.BatchPlanner.$anonfun$translateToPlan$1(BatchPlanner.scala:95)
        at 
scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:286)
        at scala.collection.Iterator.foreach(Iterator.scala:943)
        at scala.collection.Iterator.foreach$(Iterator.scala:943)
        at scala.collection.AbstractIterator.foreach(Iterator.scala:1431)
        at scala.collection.IterableLike.foreach(IterableLike.scala:74)
        at scala.collection.IterableLike.foreach$(IterableLike.scala:73)
        at scala.collection.AbstractIterable.foreach(Iterable.scala:56)
        at scala.collection.TraversableLike.map(TraversableLike.scala:286)
        at scala.collection.TraversableLike.map$(TraversableLike.scala:279)
        at scala.collection.AbstractTraversable.map(Traversable.scala:108)
        at 
org.apache.flink.table.planner.delegation.BatchPlanner.translateToPlan(BatchPlanner.scala:94)
        at 
org.apache.flink.table.planner.delegation.PlannerBase.getExplainGraphs(PlannerBase.scala:627)
        at 
org.apache.flink.table.planner.delegation.BatchPlanner.explain(BatchPlanner.scala:149)
        at 
org.apache.flink.table.planner.delegation.BatchPlanner.explain(BatchPlanner.scala:49)
        at 
org.apache.flink.table.api.internal.TableEnvironmentImpl.explainInternal(TableEnvironmentImpl.java:753)
        at 
org.apache.flink.table.api.internal.TableImpl.explain(TableImpl.java:482)
        at 
java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at 
java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:77)
        at 
java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.base/java.lang.reflect.Method.invoke(Method.java:569)
        at 
org.apache.flink.api.python.shaded.py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
        at 
org.apache.flink.api.python.shaded.py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:374)
        at 
org.apache.flink.api.python.shaded.py4j.Gateway.invoke(Gateway.java:282)
        at 
org.apache.flink.api.python.shaded.py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
        at 
org.apache.flink.api.python.shaded.py4j.commands.CallCommand.execute(CallCommand.java:79)
        at 
org.apache.flink.api.python.shaded.py4j.GatewayConnection.run(GatewayConnection.java:238)
        at java.base/java.lang.Thread.run(Thread.java:840)
{code}

I believe this is a bug in the planner. If so, can someone please explain why 
this occurs? Where exactly is the planner messing up?

 

 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

Reply via email to