Hi Timo, Sorry for the late reply. I think it would be great if we can make `withNamesAndPositions` internal visible. This reduces the complexity of the public API. It's hard to come up with a perfect solution. So let's move on this FLIP. I don't have other concerns.
Best, Jark On Fri, 18 Sep 2020 at 22:14, Timo Walther <twal...@apache.org> wrote: > Hi Jark, > > the fieldNames map is not intended for users. I would also be fine to > make it a default scope constructor and access it with some internal > utility class next to the Row class. The fieldNames map must only be > used by serializers and converters. A user has no benefit in using it. > > For the creation of new rows (without reusing, which only advanced users > usually do), I don't see a benefit of having: > > final Row reuse = new Row(Arrays.asList("myField", "myOtherField")) > reuse.setField("myField", 12); > reuse.setField("myOtherField", "This is a test"); > > The purpose of Row.withName() is too create a Row easily and readable > without declaring 50+ column names or dealing with indices in this range. > > Personally, I would like to make Row an interface and have concrete row > implementations for different purposes but this would break existing > programs too much. > > What do you think? > > Regards, > Timo > > > On 18.09.20 11:04, Jark Wu wrote: > > Personally I think the fieldNames Map is confusing and not handy. > > I just have an idea but not sure what you think. > > What about adding a new constructor with List field names, this enables > all > > name-based setter/getters. > > Regarding to List -> Map cost for every record, we can suggest users to > > reuse the Row in the task. > > > > new Row(int arity) > > new Row(List<String> fieldNames) > > > > final Row reuse = new Row(Arrays.asList("myField", "myOtherField")) > > reuse.setField("myField", 12); > > reuse.setField("myOtherField", "This is a test"); > > > > My point is that, if we can have a handy constructor for named Row, we > may > > not need to distinguish the named-only or positionAndNamed mode. > > This can avoid (fast-fail) the potential problem when setting an invalid > > field. > > > > We can also come up with a new class for the field names which will > > construct the Map and be shared among all Row instances. > > > > What do you think? > > > > Best, > > Jark > > > > On Thu, 17 Sep 2020 at 16:48, Timo Walther <twal...@apache.org> wrote: > > > >> Hi everyone, > >> > >> thanks for all the feedback. I updated the FLIP again on Thursday to > >> integrate the feedback I got from Jingsong and Jark offline. In > >> particular I updated the `Improve dealing with Row in DataStream API` > >> section another time. We introduced static methods for Row that should > >> make the semantics clear to users: > >> > >> // allows to use index-based setters and getters (equivalent to new > >> Row(int)) > >> // method exists for completeness > >> public static withPositions(int length); > >> > >> // allows to use name-based setters and getters > >> public static withNames(); > >> > >> // allows to use both name-based and position-based setters and getters > >> public static withNamesAndPositions(Map<String, Integer> fieldNames); > >> > >> In any case, non of the existing methods will be deprecated and only > >> additional functionality will be available through the methods above. > >> > >> I started a voting thread on Friday. Please feel free to vote. > >> > >> Regards, > >> Timo > >> > >> On 10.09.20 10:21, Danny Chan wrote: > >>> Thanks for driving this Timo, +1 for voting ~ > >>> > >>> Best, > >>> Danny Chan > >>> 在 2020年9月10日 +0800 PM3:54,Timo Walther <twal...@apache.org>,写道: > >>>> Thanks everyone for this healthy discussion. I updated the FLIP with > the > >>>> outcome. I think the result is one of the last core API refactoring > and > >>>> users will be happy to have a consistent changelog support. Thanks for > >>>> all the contributions. > >>>> > >>>> If there are no objections, I would continue with a voting. > >>>> > >>>> What do you think? > >>>> > >>>> Regards, > >>>> Timo > >>>> > >>>> > >>>> On 09.09.20 14:31, Danny Chan wrote: > >>>>> Thanks, i'm fine with that. > >>>>> > >>>>> > >>>>> Timo Walther <twal...@apache.org> 于2020年9月9日周三 下午7:02写道: > >>>>> > >>>>>> I agree with Jark. It reduces confusion. > >>>>>> > >>>>>> The DataStream API doesn't know changelog processing at all. A > >>>>>> DataStream of Row can be used with both `fromDataStream` and > >>>>>> `fromChangelogStream`. But only the latter API will interpret it as > a > >>>>>> changelog something. > >>>>>> > >>>>>> And as I mentioned before, the `toChangelogStream` must work with > Row > >>>>>> otherwise users are confused due to duplicate records with a missing > >>>>>> changeflag. > >>>>>> > >>>>>> I will update the FLIP-136 a last time. I hope we can then continue > >> to a > >>>>>> vote. > >>>>>> > >>>>>> Regards, > >>>>>> Timo > >>>>>> > >>>>>> > >>>>>> On 09.09.20 10:50, Danny Chan wrote: > >>>>>>> I think it would bring in much confusion by a different API name > just > >>>>>> because the DataStream generic type is different. > >>>>>>> If there are ChangelogMode that only works for Row, can we have a > >> type > >>>>>> check there ? > >>>>>>> > >>>>>>> Switch to a new API name does not really solve the problem well, > >> people > >>>>>> still need to declare the ChangelogMode explicitly, and there are > some > >>>>>> confusions: > >>>>>>> > >>>>>>> • Should DataStream of Row type always use #fromChangelogStream ? > >>>>>>> • Does fromChangelogStream works for only INSERT ChangelogMode ? > >>>>>>> > >>>>>>> > >>>>>>> Best, > >>>>>>> Danny Chan > >>>>>>> 在 2020年9月9日 +0800 PM4:21,Timo Walther <twal...@apache.org>,写道: > >>>>>>>> I had this in the inital design, but Jark had concerns at least > for > >> the > >>>>>>>> `toChangelogStream(ChangelogMode)` (see earlier discussion). > >>>>>>>> > >>>>>>>> `fromDataStream(dataStream, schema, changelogMode)` would be > >> possible. > >>>>>>>> > >>>>>>>> But in this case I would vote for a symmetric API. If we keep > >>>>>>>> toChangelogStream we should also have a fromChangelogStream. > >>>>>>>> > >>>>>>>> And if we unify `toChangelogStream` and `toDataStream`, > retractions > >>>>>>>> cannot be represented for non-Rows and users will experience > >> duplicate > >>>>>>>> records with a missing changeflag. > >>>>>>>> > >>>>>>>> Regards, > >>>>>>>> Timo > >>>>>>>> > >>>>>>>> > >>>>>>>> On 09.09.20 09:31, Danny Chan wrote: > >>>>>>>>> “But I think the planner needs to > >>>>>>>>> know whether the input is insert-only or not.” > >>>>>>>>> > >>>>>>>>> Does fromDataStream(dataStream, schema, changelogMode) > >>>>>>>>> > >>>>>>>>> solve your concerns ? People can pass around whatever > ChangelogMode > >>>>>> they like as an optional param. > >>>>>>>>> By default: fromDataStream(dataStream, schema), the ChangelogMode > >> is > >>>>>> INSERT. > >>>>>>>>> > >>>>>>>>> Best, > >>>>>>>>> Danny Chan > >>>>>>>>> 在 2020年9月9日 +0800 PM2:53,dev@flink.apache.org,写道: > >>>>>>>>>> > >>>>>>>>>> But I think the planner needs to > >>>>>>>>>> know whether the input is insert-only or not. > >>>>>>>>> > >>>>>>>> > >>>>>>> > >>>>>> > >>>>>> > >>>>> > >>>> > >>> > >> > >> > > > >