Hi Jane,
thanks for the thorough investigation and extensive summaries. I'm sorry
that I ignored this discussion for too long, but I would like to help in
shaping a sustainable long-term solution.
I fear that changing:
- RowType#copy()
- RowType's constructor
- FieldsDataType#nullable()
will not solve all transitive issues.
We should approach the problem from a different perspective. From my
point of view:
- DataType and LogicalType are just type declarations.
- RelDataType is similarly just a type declaration. Please correct me if
I'm wrong but RelDataType itself also allows `ROW<f0 INT NOT NULL> NOT
NULL`. It's the factory or optimizer that performs necessary changes.
- It's up to the framework (i.e. planner or Table API) to decide what to
do with these declarations.
Let's take a Java class:
class MyPojo {
    int i;
}
MyPojo can be nullable, but i cannot. This is the reason why we decided
to introduce the current behavior. Complex structs are usually generated
from the Table API or from the catalog (e.g. when mapping to a schema
registry or some other external system). Changing the methods above could
lead to other downstream inconsistencies.
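To make the MyPojo point concrete, here is a small, self-contained Java
sketch (plain Java, not Flink code; the class and method names are just
for illustration) showing that a nullable reference to a struct with a
non-nullable field is a perfectly coherent combination:

```java
// Plain-Java illustration (not Flink code): the reference to the struct
// may be null, while the primitive field inside it can never be null --
// the analogue of a nullable ROW whose field is INT NOT NULL.
class MyPojo {
    int i; // primitive: non-nullable by construction
}

public class NullabilityExample {
    // Returns a nullable reference to a struct with a non-nullable field.
    static MyPojo maybePojo(boolean present) {
        return present ? new MyPojo() : null;
    }

    public static void main(String[] args) {
        MyPojo p = maybePojo(true);
        if (p != null) {
            // Once the outer reference is non-null, the inner field is
            // guaranteed to hold an int, never null.
            System.out.println("i = " + p.i);
        }
    }
}
```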
I can't provide a better solution right now; I need to do more research
on this topic. But we should definitely avoid another breaking change
similar to [1], where the data type system was touched and other projects
were affected.
How about we work together on this topic and create a FLIP for this? We
need more examples in a unified document. Currently, the proposal is
split across multiple Flink and Calcite JIRA issues and a ML discussion.
Regards,
Timo
[1] https://issues.apache.org/jira/browse/FLINK-33523
On 26.12.23 04:47, Jane Chan wrote:
Thanks Shengkai and Xuyang.
@Shengkai
I have one question: Is the influence only limited to the RowType? Does the
Map or Array type have the same problems?
I think the issue is exclusive to RowType. You may want to review
CALCITE-2464[1] for more details.
[1] https://issues.apache.org/jira/browse/CALCITE-2464
@Xuyang
Is it possible to consider introducing a deprecated option that allows
users to fall back to the previous behavior (fallback by default), and
then officially deprecate it in Flink 2.0?
If I understand correctly, 2.0 allows breaking changes in order to remove
historical baggage. Therefore, if we want to fix this issue before 2.0,
we could introduce a fallback option in the two most recent versions
(1.19 and 1.20). However, from version 2.0 onwards, since we no longer
promise backward compatibility, introducing a fallback option might be
unnecessary. What do you think?
BTW, this JIRA issue, FLINK-33217[1], is caused by Flink SQL not handling
the nullable attribute of the Row type in the way Calcite expects.
However, fixing it will also have a relatively large impact. We may also
need to check the corresponding code on the SQL side.
Yes, this is another issue caused by the row type nullability handling.
I've mentioned this JIRA ticket in the reference link to the previous
reply.
Best,
Jane
On Mon, Dec 25, 2023 at 1:42 PM Xuyang <xyzhong...@163.com> wrote:
Hi, Jane, thanks for driving this.
IMO, it is important to keep consistent semantics between the Table API
and SQL, not only for maintainability but also for user experience.
However, the impact of this modification on users is a bit large. Is it
possible to consider introducing a deprecated option that allows users to
fall back to the previous behavior (fallback by default), and then
officially deprecate it in Flink 2.0?
BTW, this JIRA issue, FLINK-33217[1], is caused by Flink SQL not handling
the nullable attribute of the Row type in the way Calcite expects.
However, fixing it will also have a relatively large impact. We may also
need to check the corresponding code on the SQL side.
[1] https://issues.apache.org/jira/browse/FLINK-33217
--
Best!
Xuyang
On 2023-12-25 10:16:28, "Shengkai Fang" <fskm...@gmail.com> wrote:
Thanks for Jane and Sergey's proposal!
+1 to correct the Table API behavior.
I have one question: Is the influence only limited to the RowType? Does
the Map or Array type have the same problems?
Best,
Shengkai
[DISCUSS][FLINK-31830] Align the Nullability Handling of ROW between SQL
and TableAPI
Jane Chan <qingyue....@gmail.com> wrote on Fri, Dec 22, 2023 at 17:40:
Dear devs,
Several issues [1][2][3] have been identified regarding the inconsistent
treatment of ROW type nullability between SQL and TableAPI. However,
addressing these discrepancies might necessitate updates to the public
API.
Therefore, I'm initiating this discussion to engage the community in
forging a unified approach to resolve these challenges.
To summarize, SQL prohibits ROW types such as ROW<a INT NOT NULL, b
STRING>, which is implicitly rewritten to ROW<a INT, b STRING> by
Calcite[4]. In contrast, TableAPI permits such types, resulting in
inconsistency.
For a comprehensive issue breakdown, please refer to the comments on [1].
According to CALCITE-2464[4], ROW<f0 INT NOT NULL> is not a valid type.
As a result, the behavior of TableAPI is incorrect and needs to be made
consistent with SQL, which means the implementation of the following
public APIs needs to change.
- RowType#copy(boolean nullable) should also mark the inner fields as
nullable if nullable is true.
- RowType's constructor should also check nullability.
- FieldsDataType#nullable() should also mark the inner fields as
nullable.
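A minimal sketch of what such propagation could look like, using a
self-contained toy model rather than Flink's real RowType (all class
names here are hypothetical):

```java
import java.util.List;
import java.util.stream.Collectors;

// Toy stand-ins for Flink's RowType/RowField; names are hypothetical.
record ToyField(String name, boolean nullable) {}

record ToyRowType(boolean nullable, List<ToyField> fields) {
    // Sketch of the proposed copy(boolean) behavior: making the row
    // nullable also marks every inner field as nullable, mirroring
    // Calcite's rewrite of ROW<f0 INT NOT NULL> to ROW<f0 INT>.
    ToyRowType copy(boolean nullable) {
        List<ToyField> newFields = nullable
                ? fields.stream()
                        .map(f -> new ToyField(f.name(), true))
                        .collect(Collectors.toList())
                : fields;
        return new ToyRowType(nullable, newFields);
    }
}
```

Under this sketch, copy(true) on a row declared as ROW<f0 INT NOT NULL>
yields ROW<f0 INT>, matching the SQL-side behavior.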
In addition to the necessary changes in the implementation of the
aforementioned API, the semantics of the following API have also been
impacted.
- `DataTypes.ROW(DataTypes.FIELD("f0",
DataTypes.INT().notNull())).notNull()` can no longer create a type like
`ROW<INT NOT NULL> NOT NULL`.
- Idempotence of the chained calls `notNull().nullable().notNull()` for
`Row` cannot be maintained.
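The idempotence loss can be demonstrated with a self-contained toy model
(hypothetical names, not Flink's actual API): once nullable() propagates
into the fields, the per-field flags are erased and a later notNull()
cannot restore them.

```java
import java.util.List;
import java.util.stream.Collectors;

// Toy model (hypothetical names, not Flink's API) showing why chained
// notNull().nullable().notNull() is no longer a round trip.
record Field2(String name, boolean nullable) {}

record Row2(boolean nullable, List<Field2> fields) {
    Row2 asNullable() {
        // Propagates nullability into the fields, discarding their flags.
        return new Row2(true, fields.stream()
                .map(f -> new Field2(f.name(), true))
                .collect(Collectors.toList()));
    }
    Row2 notNull() {
        // Only flips the outer flag; the lost field flags stay lost.
        return new Row2(false, fields);
    }
}

public class IdempotenceDemo {
    public static void main(String[] args) {
        Row2 original = new Row2(false, List.of(new Field2("f0", false)));
        Row2 roundTrip = original.notNull().asNullable().notNull();
        // ROW<f0 INT NOT NULL> NOT NULL comes back as ROW<f0 INT> NOT NULL.
        System.out.println("f0 nullable before: " + original.fields().get(0).nullable());
        System.out.println("f0 nullable after:  " + roundTrip.fields().get(0).nullable());
    }
}
```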
Sergey and I have engaged in a discussion regarding the solution [1].
I'm interested in gathering additional perspectives on the fix.
Look forward to your ideas!
Best,
Jane
[1] https://issues.apache.org/jira/browse/FLINK-31830
[2] https://issues.apache.org/jira/browse/FLINK-31829
[3] https://issues.apache.org/jira/browse/FLINK-33217
[4] https://issues.apache.org/jira/browse/CALCITE-2464