Re: [DISCUSS] FLIP-136: Improve interoperability between DataStream and Table API

Timo Walther Thu, 17 Sep 2020 01:48:43 -0700

Hi everyone,

thanks for all the feedback. I updated the FLIP again on Thursday tointegrate the feedback I got from Jingsong and Jark offline. Inparticular I updated the `Improve dealing with Row in DataStream API`section another time. We introduced static methods for Row that shouldmake the semantics clear to users:

// allows to use index-based setters and getters (equivalent to newRow(int))

// method exists for completeness
public static withPositions(int length);

// allows to use name-based setters and getters
public static withNames();

// allows to use both name-based and position-based setters and getters
public static withNamesAndPositions(Map<String, Integer> fieldNames);

In any case, non of the existing methods will be deprecated and onlyadditional functionality will be available through the methods above.


I started a voting thread on Friday. Please feel free to vote.

Regards,
Timo

On 10.09.20 10:21, Danny Chan wrote:

Thanks for driving this Timo, +1 for voting ~

Best,
Danny Chan
在 2020年9月10日 +0800 PM3:54，Timo Walther <[email protected]>，写道：

Thanks everyone for this healthy discussion. I updated the FLIP with the
outcome. I think the result is one of the last core API refactoring and
users will be happy to have a consistent changelog support. Thanks for
all the contributions.

If there are no objections, I would continue with a voting.

What do you think?

Regards,
Timo


On 09.09.20 14:31, Danny Chan wrote:

Thanks, i'm fine with that.


Timo Walther <[email protected]> 于2020年9月9日周三 下午7:02写道：

I agree with Jark. It reduces confusion.

The DataStream API doesn't know changelog processing at all. A
DataStream of Row can be used with both `fromDataStream` and
`fromChangelogStream`. But only the latter API will interpret it as a
changelog something.

And as I mentioned before, the `toChangelogStream` must work with Row
otherwise users are confused due to duplicate records with a missing
changeflag.

I will update the FLIP-136 a last time. I hope we can then continue to a
vote.

Regards,
Timo


On 09.09.20 10:50, Danny Chan wrote:

I think it would bring in much confusion by a different API name just

because the DataStream generic type is different.

If there are ChangelogMode that only works for Row, can we have a type

check there ?


Switch to a new API name does not really solve the problem well, people

still need to declare the ChangelogMode explicitly, and there are some
confusions:


• Should DataStream of Row type always use #fromChangelogStream ?
• Does fromChangelogStream works for only INSERT ChangelogMode ?


Best,
Danny Chan
在 2020年9月9日 +0800 PM4:21，Timo Walther <[email protected]>，写道：

I had this in the inital design, but Jark had concerns at least for the
`toChangelogStream(ChangelogMode)` (see earlier discussion).

`fromDataStream(dataStream, schema, changelogMode)` would be possible.

But in this case I would vote for a symmetric API. If we keep
toChangelogStream we should also have a fromChangelogStream.

And if we unify `toChangelogStream` and `toDataStream`, retractions
cannot be represented for non-Rows and users will experience duplicate
records with a missing changeflag.

Regards,
Timo


On 09.09.20 09:31, Danny Chan wrote:

“But I think the planner needs to
know whether the input is insert-only or not.”

Does fromDataStream(dataStream, schema, changelogMode)

solve your concerns ? People can pass around whatever ChangelogMode

they like as an optional param.

By default: fromDataStream(dataStream, schema), the ChangelogMode is

INSERT.


Best,
Danny Chan
在 2020年9月9日 +0800 PM2:53，[email protected]，写道：


But I think the planner needs to
know whether the input is insert-only or not.

Re: [DISCUSS] FLIP-136: Improve interoperability between DataStream and Table API

Reply via email to