@Aljoscha I have not thought about DataStream#project. I see your argument and I do agree with you it makes sense to eventually remove them. They are not deprecated yet though. Moreover I think we could include deprecating it as part of the efforts of FLIP-131/FLIP-134. I will add the argument there.
@Konstantin I do share your sentiment that the contract was broken when they became ineffective. That's why I mentioned the fact they are ineffective ;) Let me give a brief summary how I understand the discussion so far. I will start the effort of removing the non-breaking APIs: * StreamExecutionEnvironment#setStateBackend(AbstractStateBackend) (deprecated in 1.5) (ALREADY REMOVED) * RuntimeContext#getAllAccumulators (deprecated in 0.10) (ALREADY REMOVED) * ExecutionConfig#set/getCodeAnalysisMode (deprecated in 1.9) * ExecutionConfig#isLatencyTrackingEnabled (deprecated in 1.7) * StreamExecutionEnvironment#setNumberOfExecutionRetries/getNumberOfExecutionRetries (not the equivalent in the ExecutionConfig * ExecutionConfig#disable/enableSysoutLogging (deprecated in 1.10) (ineffective, thus already broke the guarantees) * ExecutionConfig#set/isFailTaskOnCheckpointError (deprecated in 1.9) (ineffective, thus already broke the guarantees) * DataStream#writeAsCsv,writeAsText (deprecated in 1.10) I will start a Vote thread for removing: * DataStream#split * XxxDataStream#fold as there was quite a bit of positive feedback for doing so, but there were also concerns. For now (probably until 2.0), I'll leave the methods: * StreamExecutionEnvironment#readFile,readFileStream(...),socketTextStream(...),socketTextStream(...) * methods in XxxDataStream that use field names and indices for addressing projections/groupings * ExecutionConfig#setNumberOfExecutionRetries/getNumberOfExecutionRetries BTW you can track the effort in FLINK-19033[1] Best, Dawid [1] https://issues.apache.org/jira/browse/FLINK-19033 On 27/08/2020 10:45, Aljoscha Krettek wrote: > Did you consider DataStream.project() yet? In general I think we > should remove most of the relational-ish methods from DataStream. More > candidates in this set of methods would be the tuple index/expression > methods for aggregations like min/max etc... > > Aljoscha > > On 25.08.20 20:52, Konstantin Knauf wrote: >> I would argue that the guarantees of @Public methods that became >> ineffective were broken when they became ineffective (and were >> deprecated). >> >> - ExecutionConfig#disable/enableSysoutLogging (deprecated in 1.10) >> - ExecutionConfig#set/isFailTaskOnCheckpointError (deprecated in >> 1.9) >> >> Removing these methods seems like the better of two evils to me as it >> shows >> users that they have been using no-ops for some time. >> >> On Thu, Aug 20, 2020 at 10:50 AM Stephan Ewen <se...@apache.org> wrote: >> >>> We have removed some public methods in the past, after a careful >>> deprecation period, if they were not well working any more. >>> >>> The sentiment I got from users is that careful cleanup is in fact >>> appreciated, otherwise things get confusing over time (the deprecated >>> methods cause noise in the API). >>> Still, we need to be very careful here. >>> >>> I would suggest to >>> - start with the non-public breaking methods >>> - remove fold() (very long deprecated) >>> - remove split() buggy >>> >>> Removing the env.socketStream() and env.fileStream() methods would >>> probably be good, too. They are very long deprecated and they don't >>> work >>> well (with checkpoints) and the sources are the first thing a user >>> needs to >>> understand when starting with Flink, so removing noise there is super >>> helpful. >>> >>> >>> On Thu, Aug 20, 2020 at 8:53 AM Dawid Wysakowicz >>> <dwysakow...@apache.org> >>> wrote: >>> >>>> Hey Till, >>>> >>>> You've got a good point here. Removing some of the methods would cause >>>> breaking the stability guarantees. I do understand if we decide not to >>>> remove them for that reason, let me explain though why I am >>>> thinking it >>>> might make sense to remove them already. First of all I am a bit >>>> afraid it >>>> might take a long time before we arrive at the 2.0 version. We have >>>> not >>>> ever discussed that in the community. At the same time a lot of the >>>> methods >>>> already don't work or are buggy, and we do not fix them any more. >>>> >>>> Methods which removing would not break the Public guarantees: >>>> >>>> - ExecutionConfig#set/getCodeAnalysisMode (deprecated in 1.9) >>>> - RuntimeContext#getAllAccumulators (deprecated in 0.10) >>>> - ExecutionConfig#isLatencyTrackingEnabled (deprecated in 1.7) >>>> - >>>> StreamExecutionEnvironment#setNumberOfExecutionRetries/getNumberOfExecutionRetries >>>> (not the equivalent in the ExecutionConfig) >>>> - StreamExecutionEnvironment#setStateBackend(AbstractStateBackend) >>>> (deprecated in 1.5) >>>> >>>> Methods which removing would break the Public guarantees: >>>> >>>> which have no effect: >>>> >>>> - ExecutionConfig#disable/enableSysoutLogging (deprecated in 1.10) >>>> - ExecutionConfig#set/isFailTaskOnCheckpointError (deprecated >>>> in 1.9) >>>> >>>> which are buggy or discouraged and thus we do not support fixing them: >>>> >>>> - DataStream#split (deprecated in 1.8) >>>> - DataStream#fold and all related classes and methods such as >>>> FoldFunction, FoldingState, FoldingStateDescriptor ... >>>> (deprecated in >>>> 1.3/1.4) >>>> >>>> The methods like: >>>> >>>> - >>>> StreamExecutionEnvironment#readFile,readFileStream(...),socketTextStream(...),socketTextStream(...), >>>> >>>> - methods in (Connected)DataStream that specify keys as either >>>> indices or field names >>>> - >>>> >>>> ExecutionConfig#setNumberOfExecutionRetries/getNumberOfExecutionRetries >>>> >>>> >>>> should be working just fine and I feel the least eager to remove >>>> those. >>>> >>>> I'd suggest I will open PRs for removing the methods that will not >>>> cause >>>> breakage of the Public guarantees as the general feedback was rather >>>> positive. For the rest I do understand the resentment to do so and >>>> will not >>>> do it in the 1.x branch. Still I think it is valuable to have the >>>> discussion. >>>> >>>> Best, >>>> >>>> Dawid >>>> >>>> >>>> On 18/08/2020 09:26, Till Rohrmann wrote: >>>> >>>> Having looked at the proposed set of methods to remove I've noticed >>>> that >>>> some of them are actually annotated with @Public. According to our >>>> stability guarantees, only major releases (1.0, 2.0, etc.) can >>>> break APIs >>>> with this annotation. Hence, I believe that we cannot simply remove >>>> them >>>> unless the community decides to change the stability guarantees we >>>> give or >>>> by making the next release a major release (Flink 2.0). >>>> >>>> Cheers, >>>> Till >>>> >>>> On Tue, Aug 18, 2020 at 5:57 AM Yun Gao <yungao...@aliyun.com> wrote: >>>> >>>>> +1 for removing the methods that are deprecated for a while & have >>>>> alternative methods. >>>>> >>>>> One specific thing is that if we remove the DataStream#split, do we >>>>> consider enabling side-output in more operators in the future ? >>>>> Currently >>>>> it should be only available in ProcessFunctions, but not available >>>>> to other >>>>> commonly used UDF like Source or AsyncFunction[1]. >>>>> >>>>> One temporary solution occurs to me is to add a ProcessFunction after >>>>> the operators want to use side-output. But I think the solution is >>>>> not very >>>>> direct to come up with and if it really works we might add it to the >>>>> document of side-output. >>>>> >>>>> [1] https://issues.apache.org/jira/browse/FLINK-7954 >>>>> >>>>> Best, >>>>> Yun >>>>> >>>>> ------------------Original Mail ------------------ >>>>> *Sender:*Kostas Kloudas <kklou...@gmail.com> >>>>> *Send Date:*Tue Aug 18 03:52:44 2020 >>>>> *Recipients:*Dawid Wysakowicz <dwysakow...@apache.org> >>>>> *CC:*dev <dev@flink.apache.org>, user <u...@flink.apache.org> >>>>> *Subject:*Re: [DISCUSS] Removing deprecated methods from >>>>> DataStream API >>>>> >>>>>> +1 for removing them. >>>>>> >>>>>> >>>>>> >>>>>> From a quick look, most of them (not all) have been deprecated a >>>>>> long >>>>>> time ago. >>>>>> >>>>>> >>>>>> >>>>>> Cheers, >>>>>> >>>>>> Kostas >>>>>> >>>>>> >>>>>> >>>>>> On Mon, Aug 17, 2020 at 9:37 PM Dawid Wysakowicz wrote: >>>>>> >>>>>>> >>>>>> >>>>>>> @David Yes, my idea was to remove any use of fold method and all >>>>>> related classes including WindowedStream#fold >>>>>> >>>>>>> >>>>>> >>>>>>> @Klou Good idea to also remove the deprecated >>>>>>> enableCheckpointing() & >>>>>> StreamExecutionEnvironment#readFile and alike. I did another pass >>>>>> over some >>>>>> of the classes and thought we could also drop: >>>>>> >>>>>>> >>>>>> >>>>>>> ExecutionConfig#set/getCodeAnalysisMode >>>>>> >>>>>>> ExecutionConfig#disable/enableSysoutLogging >>>>>> >>>>>>> ExecutionConfig#set/isFailTaskOnCheckpointError >>>>>> >>>>>>> ExecutionConfig#isLatencyTrackingEnabled >>>>>> >>>>>>> >>>>>> >>>>>>> As for the `forceCheckpointing` I am not fully convinced to >>>>>>> doing it. >>>>>> As far as I know iterations still do not participate in >>>>>> checkpointing >>>>>> correctly. Therefore it still might make sense to force it. In >>>>>> other words >>>>>> there is no real alternative to that method. Unless we only >>>>>> remove the >>>>>> methods from StreamExecutionEnvironment and redirect to the >>>>>> setter in >>>>>> CheckpointConfig. WDYT? >>>>>> >>>>>>> >>>>>> >>>>>>> An updated list of methods I suggest to remove: >>>>>> >>>>>>> >>>>>> >>>>>>> ExecutionConfig#set/getCodeAnalysisMode (deprecated in 1.9) >>>>>> >>>>>>> ExecutionConfig#disable/enableSysoutLogging (deprecated in 1.10) >>>>>> >>>>>>> ExecutionConfig#set/isFailTaskOnCheckpointError (deprecated in 1.9) >>>>>> >>>>>>> ExecutionConfig#isLatencyTrackingEnabled (deprecated in 1.7) >>>>>> >>>>>>> ExecutionConfig#(get)/setNumberOfExecutionRetries() (deprecated in >>>>>> 1.1?) >>>>>> >>>>>>> >>>>>> StreamExecutionEnvironment#readFile,readFileStream(...),socketTextStream(...),socketTextStream(...) >>>>>> >>>>>> (deprecated in 1.2) >>>>>> >>>>>>> RuntimeContext#getAllAccumulators (deprecated in 0.10) >>>>>> >>>>>>> DataStream#fold and all related classes and methods such as >>>>>> FoldFunction, FoldingState, FoldingStateDescriptor ... >>>>>> (deprecated in >>>>>> 1.3/1.4) >>>>>> >>>>>>> StreamExecutionEnvironment#setStateBackend(AbstractStateBackend) >>>>>> (deprecated in 1.5) >>>>>> >>>>>>> DataStream#split (deprecated in 1.8) >>>>>> >>>>>>> Methods in (Connected)DataStream that specify keys as either >>>>>>> indices >>>>>> or field names such as DataStream#keyBy, DataStream#partitionCustom, >>>>>> ConnectedStream#keyBy, .... (deprecated in 1.11) >>>>>> >>>>>>> >>>>>> >>>>>>> Bear in mind that majority of the options listed above in >>>>>> ExecutionConfig take no effect. They were left there purely to >>>>>> satisfy the >>>>>> binary compatibility. Personally I don't see any benefit of >>>>>> leaving a >>>>>> method and silently dropping the underlying feature. The only >>>>>> configuration >>>>>> that is respected is setting the number of execution retries. >>>>>> >>>>>>> >>>>>> >>>>>>> I also wanted to make it explicit that most of the changes above >>>>>> would result in a binary incompatibility and require additional >>>>>> exclusions >>>>>> in the japicmp. Those are: >>>>>> >>>>>>> >>>>>> >>>>>>> ExecutionConfig#set/getCodeAnalysisMode (deprecated in 1.9) >>>>>> >>>>>>> ExecutionConfig#disable/enableSysoutLogging (deprecated in 1.10) >>>>>> >>>>>>> ExecutionConfig#set/isFailTaskOnCheckpointError (deprecated in 1.9) >>>>>> >>>>>>> ExecutionConfig#isLatencyTrackingEnabled (deprecated in 1.7) >>>>>> >>>>>>> ExecutionConfig#(get)/setNumberOfExecutionRetries() (deprecated in >>>>>> 1.1?) >>>>>> >>>>>>> DataStream#fold and all related classes and methods such as >>>>>> FoldFunction, FoldingState, FoldingStateDescriptor ... >>>>>> (deprecated in >>>>>> 1.3/1.4) >>>>>> >>>>>>> DataStream#split (deprecated in 1.8) >>>>>> >>>>>>> Methods in (Connected)DataStream that specify keys as either >>>>>>> indices >>>>>> or field names such as DataStream#keyBy, DataStream#partitionCustom, >>>>>> ConnectedStream#keyBy, .... (deprecated in 1.11) >>>>>> >>>>>>> >>>>>> StreamExecutionEnvironment#readFile,readFileStream(...),socketTextStream(...),socketTextStream(...) >>>>>> >>>>>> (deprecated in 1.2) >>>>>> >>>>>>> >>>>>> >>>>>>> Looking forward to more opinions on the issue. >>>>>> >>>>>>> >>>>>> >>>>>>> Best, >>>>>> >>>>>>> >>>>>> >>>>>>> Dawid >>>>>> >>>>>>> >>>>>> >>>>>>> >>>>>> >>>>>>> On 17/08/2020 12:49, Kostas Kloudas wrote: >>>>>> >>>>>>> >>>>>> >>>>>>> Thanks a lot for starting this Dawid, >>>>>> >>>>>>> >>>>>> >>>>>>> Big +1 for the proposed clean-up, and I would also add the >>>>>>> deprecated >>>>>> >>>>>>> methods of the StreamExecutionEnvironment like: >>>>>> >>>>>>> >>>>>> >>>>>>> enableCheckpointing(long interval, CheckpointingMode mode, boolean >>>>>> force) >>>>>> >>>>>>> enableCheckpointing() >>>>>> >>>>>>> isForceCheckpointing() >>>>>> >>>>>>> >>>>>> >>>>>>> readFile(FileInputFormat inputFormat,String >>>>>> >>>>>>> filePath,FileProcessingMode watchType,long interval, FilePathFilter >>>>>> >>>>>>> filter) >>>>>> >>>>>>> readFileStream(...) >>>>>> >>>>>>> >>>>>> >>>>>>> socketTextStream(String hostname, int port, char delimiter, long >>>>>> maxRetry) >>>>>> >>>>>>> socketTextStream(String hostname, int port, char delimiter) >>>>>> >>>>>>> >>>>>> >>>>>>> There are more, like the (get)/setNumberOfExecutionRetries() >>>>>>> that were >>>>>> >>>>>>> deprecated long ago, but I have not investigated to see if they are >>>>>> >>>>>>> actually easy to remove. >>>>>> >>>>>>> >>>>>> >>>>>>> Cheers, >>>>>> >>>>>>> Kostas >>>>>> >>>>>>> >>>>>> >>>>>>> On Mon, Aug 17, 2020 at 10:53 AM Dawid Wysakowicz >>>>>> >>>>>>> wrote: >>>>>> >>>>>>> >>>>>> >>>>>>> Hi devs and users, >>>>>> >>>>>>> >>>>>> >>>>>>> I wanted to ask you what do you think about removing some of the >>>>>> deprecated APIs around the DataStream API. >>>>>> >>>>>>> >>>>>> >>>>>>> The APIs I have in mind are: >>>>>> >>>>>>> >>>>>> >>>>>>> RuntimeContext#getAllAccumulators (deprecated in 0.10) >>>>>> >>>>>>> DataStream#fold and all related classes and methods such as >>>>>> FoldFunction, FoldingState, FoldingStateDescriptor ... >>>>>> (deprecated in >>>>>> 1.3/1.4) >>>>>> >>>>>>> StreamExecutionEnvironment#setStateBackend(AbstractStateBackend) >>>>>> (deprecated in 1.5) >>>>>> >>>>>>> DataStream#split (deprecated in 1.8) >>>>>> >>>>>>> Methods in (Connected)DataStream that specify keys as either >>>>>>> indices >>>>>> or field names such as DataStream#keyBy, DataStream#partitionCustom, >>>>>> ConnectedStream#keyBy, .... (deprecated in 1.11) >>>>>> >>>>>>> >>>>>> >>>>>>> I think the first three should be straightforward. They are long >>>>>> deprecated. The getAccumulators method is not used very often in my >>>>>> opinion. The same applies to the DataStream#fold which >>>>>> additionally is not >>>>>> very performant. Lastly the setStateBackend has an alternative >>>>>> with a class >>>>>> from the AbstractStateBackend hierarchy, therefore it will be >>>>>> still code >>>>>> compatible. Moreover if we remove the >>>>>> #setStateBackend(AbstractStateBackend) we will get rid off >>>>>> warnings users >>>>>> have right now when setting a statebackend as the correct method >>>>>> cannot be >>>>>> used without an explicit casting. >>>>>> >>>>>>> >>>>>> >>>>>>> As for the DataStream#split I know there were some objections >>>>>>> against >>>>>> removing the #split method in the past. I still believe the >>>>>> output tags can >>>>>> replace the split method already. >>>>>> >>>>>>> >>>>>> >>>>>>> The only problem in the last set of methods I propose to remove is >>>>>> that they were deprecated only in the last release and those >>>>>> method were >>>>>> only partially deprecated. Moreover some of the methods were not >>>>>> deprecated >>>>>> in ConnectedStreams. Nevertheless I'd still be inclined to remove >>>>>> the >>>>>> methods in this release. >>>>>> >>>>>>> >>>>>> >>>>>>> Let me know what do you think about it. >>>>>> >>>>>>> >>>>>> >>>>>>> Best, >>>>>> >>>>>>> >>>>>> >>>>>>> Dawid >>>>> >>>>> >> >
signature.asc
Description: OpenPGP digital signature