Hi Timo,

thanks for addressing my points. I'm not set on using STRUCT et al. but
wanted to point out the alternatives.

Regarding the attached class name, I share Hao's confusion. I wonder
whether structured types shouldn't be anonymous by default, in the sense
that initially we don't attach a class name to them. As you pointed out,
the name has no real semantics in SQL and we can't validate it.
Another thing to consider is that if one user creates a table through some
means and another user wants to consume it, the second user may not have
access to the class as is. But that user could easily create a compatible
class on their own.

Consequently, I'm thinking about getting rid of the type altogether. Only
at the edges would we convert to the user types, when users actually
access the ROW:
* Any Table API access that wants to collect results (in your last example,
what does t.execute().collect() return? How does that work in the
multi-user setup sketched above? Wouldn't it be easier if the consumer
explicitly gave us the POJO type that it expects?)
* Any DataStream conversion
* Any UDF

However, at that point, why do we actually need anything beyond ROW?
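
To make the "conversion only at the edges" idea concrete, here is a minimal
sketch in plain Java. It deliberately does not use any Flink API: the class
EdgeConversionSketch, the helper toPojo, and the POJO User are all
hypothetical names, and the row is modeled as a simple name-to-value map.
The only point is that the consumer hands over the class it expects, so no
class name has to travel with the type.

```java
import java.lang.reflect.Field;
import java.util.LinkedHashMap;
import java.util.Map;

public class EdgeConversionSketch {

    // Hypothetical consumer-side POJO; not part of Flink or the FLIP.
    public static class User {
        public String name;
        public Integer age;
    }

    // Hypothetical helper: copies named row fields into an instance of the
    // class supplied by the consumer. Fails if the class is incompatible.
    public static <T> T toPojo(Map<String, Object> row, Class<T> target) {
        try {
            T instance = target.getDeclaredConstructor().newInstance();
            for (Map.Entry<String, Object> entry : row.entrySet()) {
                // getField only finds public fields of the target class.
                Field field = target.getField(entry.getKey());
                field.set(instance, entry.getValue());
            }
            return instance;
        } catch (ReflectiveOperationException e) {
            throw new IllegalArgumentException(
                    "Row is not compatible with " + target.getName(), e);
        }
    }

    public static void main(String[] args) {
        // Stand-in for a row with fields (name STRING, age INT).
        Map<String, Object> row = new LinkedHashMap<>();
        row.put("name", "Bob");
        row.put("age", 42);

        User user = toPojo(row, User.class);
        System.out.println(user.name + " " + user.age);
    }
}
```

In this sketch a second user without access to the original class can
define their own compatible User and pass it in; any mismatch surfaces at
the point of conversion rather than at table creation.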

Best,

Arvid

On Wed, Apr 23, 2025 at 8:52 AM Timo Walther <twal...@apache.org> wrote:

> Hi Hao,
>
> 1. Can `StructuredType` be nested?
>
> Yes this is supported.
>
> 2. What's the main reason the class won't be enforced in SQL?
>
> SQL should not care about classes. Within the SQL ecosystem, the SQL
> engine controls the data serialization and protocols. The SQL engine
> will not load the class. Classes are a concept of a JVM or Python API
> endpoint. This is also the reason why a SQL ARRAY<BIGINT> can be
> represented as List<Long>, long[], Long[]. The latter are only concepts
> in the target programming language and might look different in Python.
>
> Regards,
> Timo
>
>
> On 22.04.25 23:54, Hao Li wrote:
> > Hi Timo,
> >
> > Thanks for the FLIP. +1 with a few questions:
> >
> > 1. Can `StructuredType` be nested? e.g. `STRUCTURED<'com.example.User',
> > name STRING, age INT NOT NULL, address STRUCTURED<'com.example.address',
> > street STRING, zip STRING>>`
> >
> > 2. What's the main reason the class won't be enforced in SQL? Since
> > tables created in SQL can also be used in Table API, will it come as a
> > surprise if it's working in SQL and then failing in Table API? What if
> > `com.example.User` was not validated in SQL when creating the table,
> > then the class was created for something else with different fields,
> > and then in Table API it's not compatible?
> >
> > Hao
> >
> > On Tue, Apr 22, 2025 at 9:39 AM Timo Walther <twal...@apache.org> wrote:
> >
> >> Hi Arvid, Hi Sergey,
> >>
> >> thanks for your feedback. I updated the FLIP accordingly but let me
> >> answer your questions
> >> here as well:
> >>
> >>   > Are we going to enforce that the name is a valid class name? What is
> >>   > happening if it's not a correct name?
> >>   > What are the implications of using a class that is not in the
> >>   > classpath in Table API? It looks to me that the name is
> >>   > metadata-only until we try to access the objects directly in
> >>   > Table/DataStream API.
> >>
> >> Names are not enforced or validated. They are pure metadata as mentioned
> >> in Section 2.1. We fall back to Row as the conversion class if the name
> >> cannot be resolved in the current classpath. So when staying in the SQL
> >> ecosystem (i.e. not switching to Table API, DataStream API, or UDFs),
> >> the class does not need to be present.
> >>
> >>   > Should Expressions.objectOf(String, Object... kv); also have an
> >>   > overload where you can put in the StructuredType in case where
> >>   > the class is not in the CP?
> >>
> >> That makes a lot of sense. I added a DataTypes.STRUCTURED(String,
> >> Field...) method and an Expressions.objectOf(String, Object...).
> >>
> >>   > What is the expected outcome of supplying fewer keys than defined
> >>   > in the structured type? Are we going to make use of nullability
> >>   > here? If so, *_INSERT and *_REMOVE may have some use.
> >>
> >> Currently, we go with the most conservative approach, which means that
> >> all keys need to be present. Maybe we can reserve this feature for future
> >> work and make the logic more lenient.
> >>
> >>   > Talking about nullability: Is there some option to make the declared
> >>   > fields NOT NULL? If so, could you amend one example to show that?
> >>   > (Grammar? implies that it's not possible)
> >>
> >> NOT NULL is supported similar to ROW<i INT NOT NULL>. I adjusted one of
> >> the examples.
> >>
> >>   > One bigger concern is around the naming. For me, OBJECT is used for
> >>   > semi-structured types that are open. Your FLIP implies a closed
> >>   > design and that you want to add an open OBJECT later. I asked
> >>   > ChatGPT about other DB implementations and it seems like STRUCT is
> >>   > used more often (see below). So, I'd propose to call it STRUCT<...>,
> >>   > STRUCT_OF, structOf, UPDATE_STRUCT, and updateStruct respectively.
> >>
> >> Naming is hard. I was also torn between STRUCT, STRUCTURED, or OBJECT.
> >> In Flink, the ROW type is rather our STRUCT type, because it works fully
> >> position based. Structured types might be name-based in the future for
> >> better schema evolution, so they rather model an OBJECT type. This was
> >> my reason for choosing OBJECT_OF (typed to class name and fixed fields)
> >> vs. OBJECT (semi-structured without fixed fields). Snowflake also uses
> >> OBJECT(i INT) (for structured types) and OBJECT (for semi structured
> >> types).
> >>
> >> Also, both structured and semi-structured types can then share functions
> >> such as UPDATE_OBJECT().
> >>
> >> What do others think?
> >>
> >> Thanks,
> >> Timo
> >>
> >> On 22.04.25 12:08, Sergey Nuyanzin wrote:
> >>> Thanks for driving this Timo
> >>>
> >>> The FLIP seems reasonable to me
> >>>
> >>> I have one minor question/clarification
> >>> Do I understand correctly that after this FLIP we can execute
> >>> `typeof` against the result of `OBJECT_OF`,
> >>> for instance
> >>> SELECT typeof(OBJECT_OF(
> >>>     'com.example.User',
> >>>     'name', 'Bob',
> >>>     'age', 42
> >>> ));
> >>>
> >>> should return `STRUCTURED<'com.example.User', name STRING, age INT>`
> >>> ?
> >>>
> >>> On Tue, Apr 22, 2025 at 10:57 AM Timo Walther <twal...@apache.org> wrote:
> >>>>
> >>>> Hi everyone,
> >>>>
> >>>> I would like to ask again for feedback on this FLIP. It is a rather
> >>>> small change but with big impact on usability for structured data.
> >>>>
> >>>> Are there any objections? Otherwise I would like to continue with
> >>>> voting soon.
> >>>>
> >>>> Thanks,
> >>>> Timo
> >>>>
> >>>> On 10.04.25 07:54, Timo Walther wrote:
> >>>>> Hi everyone,
> >>>>>
> >>>>> I would like to start a discussion about FLIP-520: Simplify
> >>>>> StructuredType handling [1].
> >>>>>
> >>>>> Flink SQL already supports structured types in the engine,
> >>>>> serializers, UDFs, and connector interfaces. However, currently only
> >>>>> Table API is able to make use of them. While UDFs can take objects as
> >>>>> input and return types, it is actually quite inconvenient to use them
> >>>>> in transformations.
> >>>>>
> >>>>> This FLIP fixes some immediate blockers in the use of structured
> >>>>> types.
> >>>>>
> >>>>> Looking forward to feedback.
> >>>>>
> >>>>> Cheers,
> >>>>> Timo
> >>>>>
> >>>>>
> >>>>> [1] https://cwiki.apache.org/confluence/display/FLINK/FLIP-520%3A+Simplify+StructuredType+handling
> >>>>>
> >>>>
> >>>
> >>>
> >>
> >>
> >
>
>
