Mika Naylor created FLINK-37616:
-----------------------------------

             Summary: PyFlink incorrectly unpickles Row fields
                 Key: FLINK-37616
                 URL: https://issues.apache.org/jira/browse/FLINK-37616
             Project: Flink
          Issue Type: Bug
          Components: API / Python
            Reporter: Mika Naylor
            Assignee: Mika Naylor


The problem occurs when you call {{TableEnvironment.from_elements}} and one of 
the fields in the row contains {{Row}} values, for example where one of the 
values you pass in is:
{code:python}
[
    Row("pyflink1A", "pyflink2A", "pyflink3A"),
    Row("pyflink1B", "pyflink2B", "pyflink3B"),
    Row("pyflink1C", "pyflink2C", "pyflink3C"),
],{code}

where the schema for the field is:
{code:python}
DataTypes.ARRAY(
    DataTypes.ROW(
        [
            DataTypes.FIELD("a", DataTypes.STRING()),
            DataTypes.FIELD("b", DataTypes.STRING()),
            DataTypes.FIELD("c", DataTypes.STRING()),
        ]
    )
),{code}

When you call {{execute().collect()}} on the table, the array is returned as:
{code:python}
[
    <Row(['pyflink1a', 'pyflink2a', 'pyflink3a'])>,
    <Row(['pyflink1b', 'pyflink2b', 'pyflink3b'])>,
    <Row(['pyflink1c', 'pyflink2c', 'pyflink3c'])>
]{code}

Instead of each {{Row}} having 3 values, the collected row has only 1 value, 
which is a list of the actual values in the row. The input and output rows 
are no longer equal, because their internal {{_values}} collections differ: 
one is a list of strings and the other is a list containing a list of strings. 
{{len()}} of the source {{Row}} correctly returns 3, but the collected row 
incorrectly reports a {{len()}} of 1.
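
The symptom can be illustrated without a Flink cluster. The sketch below uses a hypothetical, stripped-down stand-in for {{Row}} (it is not the PyFlink class, only positional values plus {{len()}}, equality and a repr). It assumes the mismatch comes from the field list being handed to the constructor as a single positional argument instead of being unpacked; that is an assumption for illustration, not a confirmed diagnosis of the unpickling code.

```python
class Row:
    """Hypothetical stand-in for pyflink.common.Row (positional values only)."""

    def __init__(self, *values):
        # Each positional argument becomes one field of the row.
        self._values = list(values)

    def __len__(self):
        return len(self._values)

    def __eq__(self, other):
        return isinstance(other, Row) and self._values == other._values

    def __repr__(self):
        return "<Row(%s)>" % ", ".join(repr(v) for v in self._values)


fields = ["pyflink1A", "pyflink2A", "pyflink3A"]

# The row as the user builds it: three separate fields.
source = Row("pyflink1A", "pyflink2A", "pyflink3A")

# Assumed failure mode: the whole field list is passed as ONE argument,
# so the row ends up with a single field that is itself a list.
collected = Row(fields)

# Unpacking the list restores the three-field row.
rebuilt = Row(*fields)

print(len(source), len(collected))  # 3 1
print(source == collected)          # False
print(source == rebuilt)            # True
print(collected)                    # <Row(['pyflink1A', 'pyflink2A', 'pyflink3A'])>
```

Under that assumption, the stand-in shows the same pattern as the report: a length of 1 instead of 3, a failed equality check, and a repr that wraps the values in a list.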



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
