Hi Jane,
thanks for the thorough investigation and extensive summaries. I'm sorry
that I ignored this discussion for too long, but I would like to help in
shaping a sustainable long-term solution.
I fear that changing:
- RowType#copy()
- RowType's constructor
- FieldsDataType#nullable()
will not solve all transitive issues.
We should approach the problem from a different perspective. From my
point of view:
- DataType and LogicalType are just type declarations.
- RelDataType is similarly just a type declaration. Please correct me if
I'm wrong but RelDataType itself also allows `ROW<f0 INT NOT NULL> NOT
NULL`. It's the factory or optimizer that performs necessary changes.
- It's up to the framework (i.e. planner or Table API) to decide what to
do with these declarations.
Let's take a Java class:
class MyPojo {
    int i;
}
MyPojo can be nullable, but i cannot. This is the reason why we decided
to introduce the current behavior. Complex structs are usually generated
from the Table API or from the catalog (e.g. when mapping to a schema
registry or some other external system). Changing the methods above could
lead to other downstream inconsistencies.
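To make the MyPojo point concrete, here is a small, self-contained Java
sketch (plain Java, not Flink code; the class and method names are just
for illustration) showing that a nullable reference to a struct with a
non-nullable field is a perfectly coherent combination:

```java
// Plain-Java illustration (not Flink code): the reference to the struct
// may be null, while the primitive field inside it can never be null --
// the analogue of a nullable ROW whose field is INT NOT NULL.
class MyPojo {
    int i; // primitive: non-nullable by construction
}

public class NullabilityExample {
    // Returns a nullable reference to a struct with a non-nullable field.
    static MyPojo maybePojo(boolean present) {
        return present ? new MyPojo() : null;
    }

    public static void main(String[] args) {
        MyPojo p = maybePojo(true);
        if (p != null) {
            // Once the outer reference is non-null, the inner field is
            // guaranteed to hold an int, never null.
            System.out.println("i = " + p.i);
        }
    }
}
```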
I can't provide a better solution right now; I need to do more research
on this topic. But we should definitely avoid another breaking change
similar to [1], where the data type system was touched and other projects
were affected.
How about we work together on this topic and create a FLIP for this? We
need more examples in a unified document. Currently, the proposal is
split across multiple Flink and Calcite JIRA issues and a ML discussion.
Regards,
Timo
[1] https://issues.apache.org/jira/browse/FLINK-33523
On 26.12.23 04:47, Jane Chan wrote:
Thanks Shengkai and Xuyang.
@Shengkai
I have one question: Is the influence only limited to the RowType? Does the
Map or Array type have the same problems?
I think the issue is exclusive to RowType. You may want to review
CALCITE-2464[1] for more details.
[1] https://issues.apache.org/jira/browse/CALCITE-2464
@Xuyang
Is it possible to consider introducing a deprecated option that allows
users to fall back to the previous behavior (fallback by default), and
then officially deprecate it in Flink 2.0?
If I understand correctly, 2.0 allows breaking changes in order to remove
historical baggage. Therefore, if we want to fix this issue before 2.0,
we could introduce a fallback option in the two most recent versions
(1.19 and 1.20). However, from version 2.0 onwards, since we no longer
promise backward compatibility, introducing a fallback option might be
unnecessary. What do you think?
BTW, this JIRA issue, FLINK-33217[1], is caused by Flink SQL not handling
the nullable attribute of the Row type in the way Calcite expects.
However, fixing it will also have a relatively large impact. We may also
need to check the corresponding code on the SQL side.
Yes, this is another issue caused by the row type nullability handling.
I've mentioned this JIRA ticket in the reference link to the previous
reply.
Best,
Jane
On Mon, Dec 25, 2023 at 1:42 PM Xuyang <xyzhong...@163.com> wrote:
Hi, Jane, thanks for driving this.
IMO, it is important to keep consistent semantics between the Table API
and SQL, not only for maintainability but also for user experience.
However, the impact of this modification on users is a bit large. Is it
possible to consider introducing a deprecated option that allows users to
fall back to the previous behavior (fallback by default), and then
officially deprecate it in Flink 2.0?
BTW, this JIRA issue, FLINK-33217[1], is caused by Flink SQL not handling
the nullable attribute of the Row type in the way Calcite expects.
However, fixing it will also have a relatively large impact. We may also
need to check the corresponding code on the SQL side.
[1] https://issues.apache.org/jira/browse/FLINK-33217
--
Best!
Xuyang
On 2023-12-25 10:16:28, "Shengkai Fang" <fskm...@gmail.com> wrote:
Thanks for Jane and Sergey's proposal!
+1 to correct the Table API behavior.
I have one question: Is the influence only limited to the RowType? Does
the Map or Array type have the same problems?
Best,
Shengkai
[DISCUSS][FLINK-31830] Align the Nullability Handling of ROW between SQL
and TableAPI
Jane Chan <qingyue....@gmail.com> wrote on Fri, Dec 22, 2023 at 17:40:
Dear devs,
Several issues [1][2][3] have been identified regarding the inconsistent
treatment of ROW type nullability between SQL and TableAPI. However,
addressing these discrepancies might necessitate updates to the public
API.
Therefore, I'm initiating this discussion to engage the community in
forging a unified approach to resolve these challenges.
To summarize, SQL prohibits ROW types such as ROW<a INT NOT NULL, b
STRING>, which is implicitly rewritten to ROW<a INT, b STRING> by
Calcite[4]. In contrast, TableAPI permits such types, resulting in
inconsistency.
For a comprehensive issue breakdown, please refer to the comments on [1].
According to CALCITE-2464[4], ROW<f0 INT NOT NULL> is not a valid type.
As a result, the behavior of TableAPI is incorrect and needs to be made
consistent with SQL, which means the implementation of the following
public APIs needs to change.
- RowType#copy(boolean nullable) should also mark the inner fields as
nullable if nullable is true.
- RowType's constructor should also check nullability.
- FieldsDataType#nullable() should also mark the inner fields as
nullable.
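A minimal sketch of what such propagation could look like, using a
self-contained toy model rather than Flink's real RowType (all class
names here are hypothetical):

```java
import java.util.List;
import java.util.stream.Collectors;

// Toy stand-ins for Flink's RowType/RowField; names are hypothetical.
record ToyField(String name, boolean nullable) {}

record ToyRowType(boolean nullable, List<ToyField> fields) {
    // Sketch of the proposed copy(boolean) behavior: making the row
    // nullable also marks every inner field as nullable, mirroring
    // Calcite's rewrite of ROW<f0 INT NOT NULL> to ROW<f0 INT>.
    ToyRowType copy(boolean nullable) {
        List<ToyField> newFields = nullable
                ? fields.stream()
                        .map(f -> new ToyField(f.name(), true))
                        .collect(Collectors.toList())
                : fields;
        return new ToyRowType(nullable, newFields);
    }
}
```

Under this sketch, copy(true) on a row declared as ROW<f0 INT NOT NULL>
yields ROW<f0 INT>, matching the SQL-side behavior.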
In addition to the necessary changes in the implementation of the
aforementioned API, the semantics of the following API have also been
impacted.
- `DataTypes.ROW(DataTypes.FIELD("f0",
DataTypes.INT().notNull())).notNull()` can no longer create a type like
`ROW<INT NOT NULL> NOT NULL`.
- Idempotence of the chained calls `notNull().nullable().notNull()` for
`Row` cannot be maintained.
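The idempotence loss can be demonstrated with a self-contained toy model
(hypothetical names, not Flink's actual API): once nullable() propagates
into the fields, the per-field flags are erased and a later notNull()
cannot restore them.

```java
import java.util.List;
import java.util.stream.Collectors;

// Toy model (hypothetical names, not Flink's API) showing why chained
// notNull().nullable().notNull() is no longer a round trip.
record Field2(String name, boolean nullable) {}

record Row2(boolean nullable, List<Field2> fields) {
    Row2 asNullable() {
        // Propagates nullability into the fields, discarding their flags.
        return new Row2(true, fields.stream()
                .map(f -> new Field2(f.name(), true))
                .collect(Collectors.toList()));
    }
    Row2 notNull() {
        // Only flips the outer flag; the lost field flags stay lost.
        return new Row2(false, fields);
    }
}

public class IdempotenceDemo {
    public static void main(String[] args) {
        Row2 original = new Row2(false, List.of(new Field2("f0", false)));
        Row2 roundTrip = original.notNull().asNullable().notNull();
        // ROW<f0 INT NOT NULL> NOT NULL comes back as ROW<f0 INT> NOT NULL.
        System.out.println("f0 nullable before: " + original.fields().get(0).nullable());
        System.out.println("f0 nullable after:  " + roundTrip.fields().get(0).nullable());
    }
}
```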
Sergey and I have engaged in a discussion regarding the solution [1].
I'm interested in gathering additional perspectives on the fix.
Look forward to your ideas!
Best,
Jane
[1] https://issues.apache.org/jira/browse/FLINK-31830
[2] https://issues.apache.org/jira/browse/FLINK-31829
[3] https://issues.apache.org/jira/browse/FLINK-33217
[4] https://issues.apache.org/jira/browse/CALCITE-2464