Hi Timo, Thanks for your valuable feedback.
How about we work together on this topic and create a FLIP for this? We > need more examples in a unified document. Currently, the proposal is split > across multiple Flink and Calcite JIRA issues and a ML discussion. That sounds like a great idea. A FLIP would provide a more precise and better-organized document, and let's fix it together. Towards several points you mentioned, here are my cents RelDataType is similarly just a type declaration. Please correct me if > I'm wrong but RelDataType itself also allows `ROW<f0 INT NOT NULL> NOT > NULL`. > In the context of RelDataType, `ROW<f0 INT NOT NULL> NOT NULL` is a legitimate type specification. However, I presume that the type you intend to refer to is `ROW<f0 INT NOT NULL>`. Theoretically speaking, the answer is no and yes. **NO** It would not be able to create a type like `RecordType(INTEGER NOT NULL f0by using Calcite fluent API[1]. If the record nullability is true, then the inner field's nullability is set to true implicitly. **YES** It's conceptually viable to create a type like `RecordType(INTEGER NOT NULL f0)` by directly calling the constructor of RelRecordType. Nevertheless, the JavaDoc constructor summary[2] emphasizes that ctor#(StructKind, List<RelDataTypeField>, boolean) should only be called from a factory method. The following code snippet demonstrates the difference at the API level. @Test void testRelRecordType() { // create an inner type INT NOT NULL RelDataType innerFieldType = new BasicSqlType(RelDataTypeSystem.DEFAULT, SqlTypeName.INTEGER); RelDataTypeField f0 = new RelDataTypeFieldImpl("f0", 0, innerFieldType); // Directly call RelRecordType ctor to specify the record level nullability // However, in practice, users are not recommended to do so RelDataType r1 = new RelRecordType(StructKind.FULLY_QUALIFIED, Collections.singletonList(f0), true); // This will print r1: RecordType(INTEGER NOT NULL f0) System.out.println("r1: " + r1.getFullTypeString()); // Use Calcite fluent API RelDataTypeFactory calciteTypeFactory = new SqlTypeFactoryImpl(RelDataTypeSystem.DEFAULT); RelDataType r2 = new RelDataTypeFactory.Builder(calciteTypeFactory) .add(f0) .nullable(false) // field nullability will be overridden by record nullability .nullableRecord(true) .build(); // This will print r2: RecordType(INTEGER f0) System.out.println("r2: " + r2.getFullTypeString()); // NOTE: replace type factory with flinkTypeFactory also get RecordType(INTEGER f0) FlinkTypeFactory flinkTypeFactory = ((PlannerBase) (((TableEnvironmentImpl) tableEnv).getPlanner())).getTypeFactory(); } It's the factory or optimizer that performs necessary changes. > - It's up to the framework (i.e. planner or Table API) to decide what to > do with these declarations. > You're correct; theoretically, the responsibility for type declaration resides with the optimizer and framework. However, Flink allows users to create types like `ROW<f0 INT NOT NULL>` through the public API, like `DataTypes.ROW(DataTypes.FIELD("f0", DataTypes.INT.notNull())).nullable()`. In contrast, Calcite restricts such actions(suppose they follow the best practice and use fluent API). Perhaps we can take a reference from Calcite's RelDataTypeFactory.Builder to align the behavior of table API to SQL. > MyPojo can be nullable, but i cannot. This is the reason why we decided > to introduce the current behavior. Complex structs are usually generated > from Table API or from the catalog (e.g. when mapping to schema registry > or some other external system). It could lead to other downstream > inconsistencies if we change the method above. > Correct me if I'm mistaken, but if `MyPojo myPojo = null`, we cannot conclude that `myPojo.i` is not null because an NPE will be thrown. And we can only safely say `myPojo.i` should not be null only if `myPojo != null` is true. Similarly, in SQL, null or NULL is a special marker used to indicate that a data value does not exist in the database. If the record itself is absent(nullable), then declaring the inner field as not null(present) may not make much sense. [1] https://github.com/apache/calcite/blob/8d9b27f1ace7f975407920cb88806715b1f0ef82/core/src/main/java/org/apache/calcite/rel/type/RelDataTypeFactory.java#L615 [2] https://calcite.apache.org/javadocAggregate/org/apache/calcite/rel/type/RelRecordType.html#%3Cinit%3E(org.apache.calcite.rel.type.StructKind,java.util.List,boolean) Best, Jane On Tue, Jan 2, 2024 at 6:26 PM Timo Walther <twal...@apache.org> wrote: > Hi Jane, > > thanks for the heavy investigation and extensive summaries. I'm sorry > that I ignored this discussion for too long but would like to help in > shaping a sustainable long-term solution. > > I fear that changing: > - RowType#copy() > - RowType's constructor > - FieldsDataType#nullable() > will not solve all transitive issues. > > We should approach the problem from a different perspective. In my point > of view: > - DataType and LogicalType are just type declarations. > - RelDataType is similarly just a type declaration. Please correct me if > I'm wrong but RelDataType itself also allows `ROW<f0 INT NOT NULL> NOT > NULL`. It's the factory or optimizer that performs necessary changes. > - It's up to the framework (i.e. planner or Table API) to decide what to > do with these declarations. > > Let's take a Java class: > > class MyPojo { > int i; > } > > MyPojo can be nullable, but i cannot. This is the reason why we decided > to introduce the current behavior. Complex structs are usually generated > from Table API or from the catalog (e.g. when mapping to schema registry > or some other external system). It could lead to other downstream > inconsistencies if we change the method above. > > I can't provide a better solution right now, I need more research on > this topic. But we should definitely avoid another breaking change > similar to [1] where the data type system was touched and other projects > were affected. > > How about we work together on this topic and create a FLIP for this? We > need more examples in a unified document. Currently, the proposal is > split across multiple Flink and Calcite JIRA issues and a ML discussion. > > Regards, > Timo > > > [1] https://issues.apache.org/jira/browse/FLINK-33523 > > > On 26.12.23 04:47, Jane Chan wrote: > > Thanks Shengkai and Xuyang. > > > > @Shengkai > > > > I have one question: Is the influence only limited to the RowType? Does > the > >> Map or Array type have the same problems? > >> > > > > I think the issue is exclusive to RowType. You may want to review > > CALCITE-2464[1] for more details. > > > > [1] https://issues.apache.org/jira/browse/CALCITE-2464 > > > > @Xuyang > > > > Is it possible to consider introducing a deprecated option to allow users > >> to fall back to the previous version (default fallback), and then > >> officially deprecate it in Flink 2.0? > >> > > > > If I understand correctly, 2.0 allows breaking changes to remove > historical > > baggage in this release. Therefore, if we want to fix this issue before > > 2.0, we could introduce a fallback option in the two most recent versions > > (1.19 and 1.20). However, from version 2.0 onwards, since we no longer > > promise backward compatibility, introducing a fallback option might be > > unnecessary. What do you think? > > > > BTW, this jira FLINK-33217[1] is caused by that Flink SQL does not handle > >> the nullable attribute of the Row type in the way Calcite expected. > >> However, fixing them will also cause a relatively large impact. We may > also > >> need to check the code part in SQL. > >> > > > > Yes, this is another issue caused by the row type nullability handling. > > I've mentioned this JIRA ticket in the reference link to the previous > > reply. > > > > Best, > > Jane > > > > On Mon, Dec 25, 2023 at 1:42 PM Xuyang <xyzhong...@163.com> wrote: > > > >> Hi, Jane, thanks for driving this. > >> > >> > >> IMO, it is important to keep same consistent semantics between table api > >> and sql, not only for maintenance, but also for user experience. But for > >> users, the impact of this modification is a bit large. Is it possible to > >> consider introducing a deprecated option to allow users to fall back to > the > >> previous version (default fallback), and then officially deprecate it in > >> Flink 2.0? > >> > >> > >> BTW, this jira FLINK-33217[1] is caused by that Flink SQL does not > handle > >> the nullable attribute of the Row type in the way Calcite expected. > >> However, fixing them will also cause a relatively large impact. We may > also > >> need to check the code part in SQL. > >> > >> > >> [1] https://issues.apache.org/jira/browse/FLINK-33217 > >> > >> > >> > >> > >> -- > >> > >> Best! > >> Xuyang > >> > >> > >> > >> > >> > >> 在 2023-12-25 10:16:28,"Shengkai Fang" <fskm...@gmail.com> 写道: > >>> Thanks for Jane and Sergey's proposal! > >>> > >>> +1 to correct the Table API behavior. > >>> > >>> I have one question: Is the influence only limited to the RowType? Does > >> the > >>> Map or Array type have the same problems? > >>> > >>> Best, > >>> Shengkai > >>> [DISCUSS][FLINK-31830] Align the Nullability Handling of ROW between > SQL > >>> and TableA > >>> > >>> Jane Chan <qingyue....@gmail.com> 于2023年12月22日周五 17:40写道: > >>> > >>>> Dear devs, > >>>> > >>>> Several issues [1][2][3] have been identified regarding the > inconsistent > >>>> treatment of ROW type nullability between SQL and TableAPI. However, > >>>> addressing these discrepancies might necessitate updates to the public > >> API. > >>>> Therefore, I'm initiating this discussion to engage the community in > >>>> forging a unified approach to resolve these challenges. > >>>> > >>>> To summarize, SQL prohibits ROW types such as ROW<a INT NOT NULL, b > >>>> STRING>, which is implicitly rewritten to ROW<a INT, b STRING> by > >>>> Calcite[4]. In contrast, TableAPI permits such types, resulting in > >>>> inconsistency. > >>>> [image: image.png] > >>>> For a comprehensive issue breakdown, please refer to the comment of > [1]. > >>>> > >>>> According to CALCITE-2464[4], ROW<f0 INT NOT NULL> is not a valid > type. > >> As > >>>> a result, the behavior of TableAPI is incorrect and needs to be > >> consistent > >>>> with SQL, which means the implantation for the following public API > >> needs > >>>> to be changed. > >>>> > >>>> - RowType#copy(boolean nullable) should also set the inner fields > to > >>>> null if nullable is true. > >>>> - RowType's constructor should also check nullability. > >>>> - FieldsDataType#nullable() should also set the inner fields to > null. > >>>> > >>>> In addition to the necessary changes in the implementation of the > >>>> aforementioned API, the semantics of the following API have also been > >>>> impacted. > >>>> > >>>> - `DataTypes.ROW(DataTypes.FIELD("f0", > >>>> DataTypes.INT().notNull())).notNull()` cannot create a type like > >> `ROW<INT > >>>> NOT NULL>NOT NULL`. > >>>> - Idempotence for chained calls `notNull().nullable().notNull()` > for > >>>> `Row` cannot be maintained. > >>>> > >>>> Sergey and I have engaged in a discussion regarding the solution [1]. > >> I'm > >>>> interested in gathering additional perspectives on the fix. > >>>> > >>>> Look forward to your ideas! > >>>> > >>>> Best, > >>>> Jane > >>>> > >>>> > >>>> [1] https://issues.apache.org/jira/browse/FLINK-31830 > >>>> [2] https://issues.apache.org/jira/browse/FLINK-31829 > >>>> [3] https://issues.apache.org/jira/browse/FLINK-33217 > >>>> [4] https://issues.apache.org/jira/browse/CALCITE-2464 > >>>> > >> > > > >