Re: [DISCUSS][FLINK-31830] Align the Nullability Handling of ROW between SQL and TableAPI

Dawid Wysakowicz Tue, 09 Jan 2024 00:48:56 -0800

Hey all,
First of all, sorry I have not read the entire thread. I just wanted to
make sure you take this one case into consideration.


As far as I know, we map java classes to SQL ROWs? E.g. it is possible to
have a POJO as a parameter to a UDF.
*class MyUDF {*
*  eval(MyPojo a)*
*}*




*class MyPojo { int f0; Integer f1;}*
I remember the problem with how nullability of a ROW's fields is handled
was mostly because in calcite its only goal is to support IS NULL function.

If we try to map a ROW to a POJO with the proposed strategy, that all
fields are nullable if the outer ROW is nullable we cannot represent the
above POJO.

The POJO's f0 field should be NOT NULL which is telling it is represented
as int . With the new proposed logic, all fields of a POJO would need to be
boxed, because all would be NULLABLE

Hope you can still incorporate this example into your design.
Best,
Dawid

On Thu, 4 Jan 2024 at 03:15, Jane Chan <[email protected]> wrote:

> Hi Timo,
>
> Thanks for your valuable feedback.
>
> How about we work together on this topic and create a FLIP for this? We
> > need more examples in a unified document. Currently, the proposal is
> split
> > across multiple Flink and Calcite JIRA issues and a ML discussion.
>
>
> That sounds like a great idea. A FLIP would provide a more precise and
> better-organized document, and let's fix it together.
>
> Towards several points you mentioned, here are my cents
>
> RelDataType is similarly just a type declaration. Please correct me if
> > I'm wrong but RelDataType itself also allows `ROW<f0 INT NOT NULL> NOT
> > NULL`.
> >
>
> In the context of RelDataType, `ROW<f0 INT NOT NULL> NOT NULL` is a
> legitimate type specification. However, I presume that the type you intend
> to refer to is `ROW<f0 INT NOT NULL>`.
>
> Theoretically speaking, the answer is no and yes.
> **NO** It would not be able to create a type like `RecordType(INTEGER NOT
> NULL f0by using Calcite fluent API[1]. If the record nullability is true,
> then the inner field's nullability is set to true implicitly.
> **YES** It's conceptually viable to create a type like `RecordType(INTEGER
> NOT NULL f0)` by directly calling the constructor of RelRecordType.
> Nevertheless, the JavaDoc constructor summary[2] emphasizes that
> ctor#(StructKind, List<RelDataTypeField>, boolean) should only be called
> from a factory method.
>
> The following code snippet demonstrates the difference at the API level.
>
> @Test
> void testRelRecordType() {
>   // create an inner type INT NOT NULL
>   RelDataType innerFieldType =
>           new BasicSqlType(RelDataTypeSystem.DEFAULT, SqlTypeName.INTEGER);
>   RelDataTypeField f0 = new RelDataTypeFieldImpl("f0", 0, innerFieldType);
>
>   // Directly call RelRecordType ctor to specify the record level
> nullability
>
> // However, in practice, users are not recommended to do so
>   RelDataType r1 =
>           new RelRecordType(StructKind.FULLY_QUALIFIED,
> Collections.singletonList(f0), true);
>   // This will print r1: RecordType(INTEGER NOT NULL f0)
>   System.out.println("r1: " + r1.getFullTypeString());
>
>   // Use Calcite fluent API
>   RelDataTypeFactory calciteTypeFactory = new
> SqlTypeFactoryImpl(RelDataTypeSystem.DEFAULT);
>
>   RelDataType r2 =
>           new RelDataTypeFactory.Builder(calciteTypeFactory)
>                   .add(f0)
>                   .nullable(false) // field nullability will be overridden
> by record nullability
>                   .nullableRecord(true)
>                   .build();
>   // This will print r2: RecordType(INTEGER f0)
>   System.out.println("r2: " + r2.getFullTypeString());
>
>   // NOTE: replace type factory with flinkTypeFactory also get
> RecordType(INTEGER f0)
>   FlinkTypeFactory flinkTypeFactory =
>           ((PlannerBase) (((TableEnvironmentImpl)
> tableEnv).getPlanner())).getTypeFactory();
> }
>
>
> It's the factory or optimizer that performs necessary changes.
> > - It's up to the framework (i.e. planner or Table API) to decide what to
> > do with these declarations.
> >
>
> You're correct; theoretically, the responsibility for type declaration
> resides with the optimizer and framework. However, Flink allows users to
> create types like `ROW<f0 INT NOT NULL>` through the public API, like
> `DataTypes.ROW(DataTypes.FIELD("f0", DataTypes.INT.notNull())).nullable()`.
> In contrast, Calcite restricts such actions(suppose they follow the best
> practice and use fluent API). Perhaps we can take a reference from
> Calcite's RelDataTypeFactory.Builder to align the behavior of table API to
> SQL.
>
>
> > MyPojo can be nullable, but i cannot. This is the reason why we decided
> > to introduce the current behavior. Complex structs are usually generated
> > from Table API or from the catalog (e.g. when mapping to schema registry
> > or some other external system). It could lead to other downstream
> > inconsistencies if we change the method above.
> >
>
> Correct me if I'm mistaken, but if `MyPojo myPojo = null`, we cannot
> conclude that `myPojo.i` is not null because an NPE will be thrown. And we
> can only safely say `myPojo.i` should not be null only if `myPojo != null`
> is true. Similarly, in SQL, null or NULL is a special marker used to
> indicate that a data value does not exist in the database. If the record
> itself is absent(nullable), then declaring the inner field as not
> null(present) may not make much sense.
>
> [1]
>
> https://github.com/apache/calcite/blob/8d9b27f1ace7f975407920cb88806715b1f0ef82/core/src/main/java/org/apache/calcite/rel/type/RelDataTypeFactory.java#L615
> [2]
>
> https://calcite.apache.org/javadocAggregate/org/apache/calcite/rel/type/RelRecordType.html#%3Cinit%3E(org.apache.calcite.rel.type.StructKind,java.util.List,boolean)
>
> Best,
> Jane
>
>
> On Tue, Jan 2, 2024 at 6:26 PM Timo Walther <[email protected]> wrote:
>
> > Hi Jane,
> >
> > thanks for the heavy investigation and extensive summaries. I'm sorry
> > that I ignored this discussion for too long but would like to help in
> > shaping a sustainable long-term solution.
> >
> > I fear that changing:
> > - RowType#copy()
> > - RowType's constructor
> > - FieldsDataType#nullable()
> > will not solve all transitive issues.
> >
> > We should approach the problem from a different perspective. In my point
> > of view:
> > - DataType and LogicalType are just type declarations.
> > - RelDataType is similarly just a type declaration. Please correct me if
> > I'm wrong but RelDataType itself also allows `ROW<f0 INT NOT NULL> NOT
> > NULL`. It's the factory or optimizer that performs necessary changes.
> > - It's up to the framework (i.e. planner or Table API) to decide what to
> > do with these declarations.
> >
> > Let's take a Java class:
> >
> > class MyPojo {
> >    int i;
> > }
> >
> > MyPojo can be nullable, but i cannot. This is the reason why we decided
> > to introduce the current behavior. Complex structs are usually generated
> > from Table API or from the catalog (e.g. when mapping to schema registry
> > or some other external system). It could lead to other downstream
> > inconsistencies if we change the method above.
> >
> > I can't provide a better solution right now, I need more research on
> > this topic. But we should definitely avoid another breaking change
> > similar to [1] where the data type system was touched and other projects
> > were affected.
> >
> > How about we work together on this topic and create a FLIP for this? We
> > need more examples in a unified document. Currently, the proposal is
> > split across multiple Flink and Calcite JIRA issues and a ML discussion.
> >
> > Regards,
> > Timo
> >
> >
> > [1] https://issues.apache.org/jira/browse/FLINK-33523
> >
> >
> > On 26.12.23 04:47, Jane Chan wrote:
> > > Thanks Shengkai and Xuyang.
> > >
> > > @Shengkai
> > >
> > > I have one question: Is the influence only limited to the RowType? Does
> > the
> > >> Map or Array type have the same problems?
> > >>
> > >
> > > I think the issue is exclusive to RowType. You may want to review
> > > CALCITE-2464[1] for more details.
> > >
> > > [1] https://issues.apache.org/jira/browse/CALCITE-2464
> > >
> > > @Xuyang
> > >
> > > Is it possible to consider introducing a deprecated option to allow
> users
> > >> to fall back to the previous version (default fallback), and then
> > >> officially deprecate it in Flink 2.0?
> > >>
> > >
> > > If I understand correctly, 2.0 allows breaking changes to remove
> > historical
> > > baggage in this release. Therefore, if we want to fix this issue before
> > > 2.0, we could introduce a fallback option in the two most recent
> versions
> > > (1.19 and 1.20). However, from version 2.0 onwards, since we no longer
> > > promise backward compatibility, introducing a fallback option might be
> > > unnecessary. What do you think?
> > >
> > > BTW, this jira FLINK-33217[1] is caused by that Flink SQL does not
> handle
> > >> the nullable attribute of the Row type in the way Calcite expected.
> > >> However, fixing them will also cause a relatively large impact. We may
> > also
> > >> need to check the code part in SQL.
> > >>
> > >
> > > Yes, this is another issue caused by the row type nullability handling.
> > > I've mentioned this JIRA ticket in the reference link to the previous
> > > reply.
> > >
> > > Best,
> > > Jane
> > >
> > > On Mon, Dec 25, 2023 at 1:42 PM Xuyang <[email protected]> wrote:
> > >
> > >> Hi, Jane, thanks for driving this.
> > >>
> > >>
> > >> IMO, it is important to keep same consistent semantics between table
> api
> > >> and sql, not only for maintenance, but also for user experience. But
> for
> > >> users, the impact of this modification is a bit large. Is it possible
> to
> > >> consider introducing a deprecated option to allow users to fall back
> to
> > the
> > >> previous version (default fallback), and then officially deprecate it
> in
> > >> Flink 2.0?
> > >>
> > >>
> > >> BTW, this jira FLINK-33217[1] is caused by that Flink SQL does not
> > handle
> > >> the nullable attribute of the Row type in the way Calcite expected.
> > >> However, fixing them will also cause a relatively large impact. We may
> > also
> > >> need to check the code part in SQL.
> > >>
> > >>
> > >> [1] https://issues.apache.org/jira/browse/FLINK-33217
> > >>
> > >>
> > >>
> > >>
> > >> --
> > >>
> > >>      Best！
> > >>      Xuyang
> > >>
> > >>
> > >>
> > >>
> > >>
> > >> 在 2023-12-25 10:16:28，"Shengkai Fang" <[email protected]> 写道：
> > >>> Thanks for Jane and Sergey's proposal!
> > >>>
> > >>> +1 to correct the Table API behavior.
> > >>>
> > >>> I have one question: Is the influence only limited to the RowType?
> Does
> > >> the
> > >>> Map or Array type have the same problems?
> > >>>
> > >>> Best,
> > >>> Shengkai
> > >>> [DISCUSS][FLINK-31830] Align the Nullability Handling of ROW between
> > SQL
> > >>> and TableA
> > >>>
> > >>> Jane Chan <[email protected]> 于2023年12月22日周五 17:40写道：
> > >>>
> > >>>> Dear devs,
> > >>>>
> > >>>> Several issues [1][2][3] have been identified regarding the
> > inconsistent
> > >>>> treatment of ROW type nullability between SQL and TableAPI. However,
> > >>>> addressing these discrepancies might necessitate updates to the
> public
> > >> API.
> > >>>> Therefore, I'm initiating this discussion to engage the community in
> > >>>> forging a unified approach to resolve these challenges.
> > >>>>
> > >>>> To summarize, SQL prohibits ROW types such as ROW<a INT NOT NULL, b
> > >>>> STRING>, which is implicitly rewritten to ROW<a INT, b STRING> by
> > >>>> Calcite[4]. In contrast, TableAPI permits such types, resulting in
> > >>>> inconsistency.
> > >>>> [image: image.png]
> > >>>> For a comprehensive issue breakdown, please refer to the comment of
> > [1].
> > >>>>
> > >>>> According to CALCITE-2464[4], ROW<f0 INT NOT NULL> is not a valid
> > type.
> > >> As
> > >>>> a result, the behavior of TableAPI is incorrect and needs to be
> > >> consistent
> > >>>> with SQL, which means the implantation for the following public API
> > >> needs
> > >>>> to be changed.
> > >>>>
> > >>>>     - RowType#copy(boolean nullable) should also set the inner
> fields
> > to
> > >>>>     null if nullable is true.
> > >>>>     - RowType's constructor should also check nullability.
> > >>>>     - FieldsDataType#nullable() should also set the inner fields to
> > null.
> > >>>>
> > >>>> In addition to the necessary changes in the implementation of the
> > >>>> aforementioned API, the semantics of the following API have also
> been
> > >>>> impacted.
> > >>>>
> > >>>>     - `DataTypes.ROW(DataTypes.FIELD("f0",
> > >>>>     DataTypes.INT().notNull())).notNull()` cannot create a type like
> > >> `ROW<INT
> > >>>>     NOT NULL>NOT NULL`.
> > >>>>     - Idempotence for chained calls `notNull().nullable().notNull()`
> > for
> > >>>>     `Row` cannot be maintained.
> > >>>>
> > >>>> Sergey and I have engaged in a discussion regarding the solution
> [1].
> > >> I'm
> > >>>> interested in gathering additional perspectives on the fix.
> > >>>>
> > >>>> Look forward to your ideas!
> > >>>>
> > >>>> Best,
> > >>>> Jane
> > >>>>
> > >>>>
> > >>>> [1] https://issues.apache.org/jira/browse/FLINK-31830
> > >>>> [2] https://issues.apache.org/jira/browse/FLINK-31829
> > >>>> [3] https://issues.apache.org/jira/browse/FLINK-33217
> > >>>> [4] https://issues.apache.org/jira/browse/CALCITE-2464
> > >>>>
> > >>
> > >
> >
> >
>

Re: [DISCUSS][FLINK-31830] Align the Nullability Handling of ROW between SQL and TableAPI

Reply via email to