I would argue that the guarantees of @Public methods that became ineffective were broken when they became ineffective (and were deprecated).
- ExecutionConfig#disable/enableSysoutLogging (deprecated in 1.10) - ExecutionConfig#set/isFailTaskOnCheckpointError (deprecated in 1.9) Removing these methods seems like the better of two evils to me as it shows users that they have been using no-ops for some time. On Thu, Aug 20, 2020 at 10:50 AM Stephan Ewen <se...@apache.org> wrote: > We have removed some public methods in the past, after a careful > deprecation period, if they were not well working any more. > > The sentiment I got from users is that careful cleanup is in fact > appreciated, otherwise things get confusing over time (the deprecated > methods cause noise in the API). > Still, we need to be very careful here. > > I would suggest to > - start with the non-public breaking methods > - remove fold() (very long deprecated) > - remove split() buggy > > Removing the env.socketStream() and env.fileStream() methods would > probably be good, too. They are very long deprecated and they don't work > well (with checkpoints) and the sources are the first thing a user needs to > understand when starting with Flink, so removing noise there is super > helpful. > > > On Thu, Aug 20, 2020 at 8:53 AM Dawid Wysakowicz <dwysakow...@apache.org> > wrote: > >> Hey Till, >> >> You've got a good point here. Removing some of the methods would cause >> breaking the stability guarantees. I do understand if we decide not to >> remove them for that reason, let me explain though why I am thinking it >> might make sense to remove them already. First of all I am a bit afraid it >> might take a long time before we arrive at the 2.0 version. We have not >> ever discussed that in the community. At the same time a lot of the methods >> already don't work or are buggy, and we do not fix them any more. >> >> Methods which removing would not break the Public guarantees: >> >> - ExecutionConfig#set/getCodeAnalysisMode (deprecated in 1.9) >> - RuntimeContext#getAllAccumulators (deprecated in 0.10) >> - ExecutionConfig#isLatencyTrackingEnabled (deprecated in 1.7) >> - >> StreamExecutionEnvironment#setNumberOfExecutionRetries/getNumberOfExecutionRetries >> (not the equivalent in the ExecutionConfig) >> - StreamExecutionEnvironment#setStateBackend(AbstractStateBackend) >> (deprecated in 1.5) >> >> Methods which removing would break the Public guarantees: >> >> which have no effect: >> >> - ExecutionConfig#disable/enableSysoutLogging (deprecated in 1.10) >> - ExecutionConfig#set/isFailTaskOnCheckpointError (deprecated in 1.9) >> >> which are buggy or discouraged and thus we do not support fixing them: >> >> - DataStream#split (deprecated in 1.8) >> - DataStream#fold and all related classes and methods such as >> FoldFunction, FoldingState, FoldingStateDescriptor ... (deprecated in >> 1.3/1.4) >> >> The methods like: >> >> - >> StreamExecutionEnvironment#readFile,readFileStream(...),socketTextStream(...),socketTextStream(...), >> >> - methods in (Connected)DataStream that specify keys as either >> indices or field names >> - >> ExecutionConfig#setNumberOfExecutionRetries/getNumberOfExecutionRetries >> >> should be working just fine and I feel the least eager to remove those. >> >> I'd suggest I will open PRs for removing the methods that will not cause >> breakage of the Public guarantees as the general feedback was rather >> positive. For the rest I do understand the resentment to do so and will not >> do it in the 1.x branch. Still I think it is valuable to have the >> discussion. >> >> Best, >> >> Dawid >> >> >> On 18/08/2020 09:26, Till Rohrmann wrote: >> >> Having looked at the proposed set of methods to remove I've noticed that >> some of them are actually annotated with @Public. According to our >> stability guarantees, only major releases (1.0, 2.0, etc.) can break APIs >> with this annotation. Hence, I believe that we cannot simply remove them >> unless the community decides to change the stability guarantees we give or >> by making the next release a major release (Flink 2.0). >> >> Cheers, >> Till >> >> On Tue, Aug 18, 2020 at 5:57 AM Yun Gao <yungao...@aliyun.com> wrote: >> >>> +1 for removing the methods that are deprecated for a while & have >>> alternative methods. >>> >>> One specific thing is that if we remove the DataStream#split, do we >>> consider enabling side-output in more operators in the future ? Currently >>> it should be only available in ProcessFunctions, but not available to other >>> commonly used UDF like Source or AsyncFunction[1]. >>> >>> One temporary solution occurs to me is to add a ProcessFunction after >>> the operators want to use side-output. But I think the solution is not very >>> direct to come up with and if it really works we might add it to the >>> document of side-output. >>> >>> [1] https://issues.apache.org/jira/browse/FLINK-7954 >>> >>> Best, >>> Yun >>> >>> ------------------Original Mail ------------------ >>> *Sender:*Kostas Kloudas <kklou...@gmail.com> >>> *Send Date:*Tue Aug 18 03:52:44 2020 >>> *Recipients:*Dawid Wysakowicz <dwysakow...@apache.org> >>> *CC:*dev <d...@flink.apache.org>, user <user@flink.apache.org> >>> *Subject:*Re: [DISCUSS] Removing deprecated methods from DataStream API >>> >>>> +1 for removing them. >>>> >>>> >>>> >>>> From a quick look, most of them (not all) have been deprecated a long >>>> time ago. >>>> >>>> >>>> >>>> Cheers, >>>> >>>> Kostas >>>> >>>> >>>> >>>> On Mon, Aug 17, 2020 at 9:37 PM Dawid Wysakowicz wrote: >>>> >>>> > >>>> >>>> > @David Yes, my idea was to remove any use of fold method and all >>>> related classes including WindowedStream#fold >>>> >>>> > >>>> >>>> > @Klou Good idea to also remove the deprecated enableCheckpointing() & >>>> StreamExecutionEnvironment#readFile and alike. I did another pass over some >>>> of the classes and thought we could also drop: >>>> >>>> > >>>> >>>> > ExecutionConfig#set/getCodeAnalysisMode >>>> >>>> > ExecutionConfig#disable/enableSysoutLogging >>>> >>>> > ExecutionConfig#set/isFailTaskOnCheckpointError >>>> >>>> > ExecutionConfig#isLatencyTrackingEnabled >>>> >>>> > >>>> >>>> > As for the `forceCheckpointing` I am not fully convinced to doing it. >>>> As far as I know iterations still do not participate in checkpointing >>>> correctly. Therefore it still might make sense to force it. In other words >>>> there is no real alternative to that method. Unless we only remove the >>>> methods from StreamExecutionEnvironment and redirect to the setter in >>>> CheckpointConfig. WDYT? >>>> >>>> > >>>> >>>> > An updated list of methods I suggest to remove: >>>> >>>> > >>>> >>>> > ExecutionConfig#set/getCodeAnalysisMode (deprecated in 1.9) >>>> >>>> > ExecutionConfig#disable/enableSysoutLogging (deprecated in 1.10) >>>> >>>> > ExecutionConfig#set/isFailTaskOnCheckpointError (deprecated in 1.9) >>>> >>>> > ExecutionConfig#isLatencyTrackingEnabled (deprecated in 1.7) >>>> >>>> > ExecutionConfig#(get)/setNumberOfExecutionRetries() (deprecated in >>>> 1.1?) >>>> >>>> > >>>> StreamExecutionEnvironment#readFile,readFileStream(...),socketTextStream(...),socketTextStream(...) >>>> (deprecated in 1.2) >>>> >>>> > RuntimeContext#getAllAccumulators (deprecated in 0.10) >>>> >>>> > DataStream#fold and all related classes and methods such as >>>> FoldFunction, FoldingState, FoldingStateDescriptor ... (deprecated in >>>> 1.3/1.4) >>>> >>>> > StreamExecutionEnvironment#setStateBackend(AbstractStateBackend) >>>> (deprecated in 1.5) >>>> >>>> > DataStream#split (deprecated in 1.8) >>>> >>>> > Methods in (Connected)DataStream that specify keys as either indices >>>> or field names such as DataStream#keyBy, DataStream#partitionCustom, >>>> ConnectedStream#keyBy, .... (deprecated in 1.11) >>>> >>>> > >>>> >>>> > Bear in mind that majority of the options listed above in >>>> ExecutionConfig take no effect. They were left there purely to satisfy the >>>> binary compatibility. Personally I don't see any benefit of leaving a >>>> method and silently dropping the underlying feature. The only configuration >>>> that is respected is setting the number of execution retries. >>>> >>>> > >>>> >>>> > I also wanted to make it explicit that most of the changes above >>>> would result in a binary incompatibility and require additional exclusions >>>> in the japicmp. Those are: >>>> >>>> > >>>> >>>> > ExecutionConfig#set/getCodeAnalysisMode (deprecated in 1.9) >>>> >>>> > ExecutionConfig#disable/enableSysoutLogging (deprecated in 1.10) >>>> >>>> > ExecutionConfig#set/isFailTaskOnCheckpointError (deprecated in 1.9) >>>> >>>> > ExecutionConfig#isLatencyTrackingEnabled (deprecated in 1.7) >>>> >>>> > ExecutionConfig#(get)/setNumberOfExecutionRetries() (deprecated in >>>> 1.1?) >>>> >>>> > DataStream#fold and all related classes and methods such as >>>> FoldFunction, FoldingState, FoldingStateDescriptor ... (deprecated in >>>> 1.3/1.4) >>>> >>>> > DataStream#split (deprecated in 1.8) >>>> >>>> > Methods in (Connected)DataStream that specify keys as either indices >>>> or field names such as DataStream#keyBy, DataStream#partitionCustom, >>>> ConnectedStream#keyBy, .... (deprecated in 1.11) >>>> >>>> > >>>> StreamExecutionEnvironment#readFile,readFileStream(...),socketTextStream(...),socketTextStream(...) >>>> (deprecated in 1.2) >>>> >>>> > >>>> >>>> > Looking forward to more opinions on the issue. >>>> >>>> > >>>> >>>> > Best, >>>> >>>> > >>>> >>>> > Dawid >>>> >>>> > >>>> >>>> > >>>> >>>> > On 17/08/2020 12:49, Kostas Kloudas wrote: >>>> >>>> > >>>> >>>> > Thanks a lot for starting this Dawid, >>>> >>>> > >>>> >>>> > Big +1 for the proposed clean-up, and I would also add the deprecated >>>> >>>> > methods of the StreamExecutionEnvironment like: >>>> >>>> > >>>> >>>> > enableCheckpointing(long interval, CheckpointingMode mode, boolean >>>> force) >>>> >>>> > enableCheckpointing() >>>> >>>> > isForceCheckpointing() >>>> >>>> > >>>> >>>> > readFile(FileInputFormat inputFormat,String >>>> >>>> > filePath,FileProcessingMode watchType,long interval, FilePathFilter >>>> >>>> > filter) >>>> >>>> > readFileStream(...) >>>> >>>> > >>>> >>>> > socketTextStream(String hostname, int port, char delimiter, long >>>> maxRetry) >>>> >>>> > socketTextStream(String hostname, int port, char delimiter) >>>> >>>> > >>>> >>>> > There are more, like the (get)/setNumberOfExecutionRetries() that were >>>> >>>> > deprecated long ago, but I have not investigated to see if they are >>>> >>>> > actually easy to remove. >>>> >>>> > >>>> >>>> > Cheers, >>>> >>>> > Kostas >>>> >>>> > >>>> >>>> > On Mon, Aug 17, 2020 at 10:53 AM Dawid Wysakowicz >>>> >>>> > wrote: >>>> >>>> > >>>> >>>> > Hi devs and users, >>>> >>>> > >>>> >>>> > I wanted to ask you what do you think about removing some of the >>>> deprecated APIs around the DataStream API. >>>> >>>> > >>>> >>>> > The APIs I have in mind are: >>>> >>>> > >>>> >>>> > RuntimeContext#getAllAccumulators (deprecated in 0.10) >>>> >>>> > DataStream#fold and all related classes and methods such as >>>> FoldFunction, FoldingState, FoldingStateDescriptor ... (deprecated in >>>> 1.3/1.4) >>>> >>>> > StreamExecutionEnvironment#setStateBackend(AbstractStateBackend) >>>> (deprecated in 1.5) >>>> >>>> > DataStream#split (deprecated in 1.8) >>>> >>>> > Methods in (Connected)DataStream that specify keys as either indices >>>> or field names such as DataStream#keyBy, DataStream#partitionCustom, >>>> ConnectedStream#keyBy, .... (deprecated in 1.11) >>>> >>>> > >>>> >>>> > I think the first three should be straightforward. They are long >>>> deprecated. The getAccumulators method is not used very often in my >>>> opinion. The same applies to the DataStream#fold which additionally is not >>>> very performant. Lastly the setStateBackend has an alternative with a class >>>> from the AbstractStateBackend hierarchy, therefore it will be still code >>>> compatible. Moreover if we remove the >>>> #setStateBackend(AbstractStateBackend) we will get rid off warnings users >>>> have right now when setting a statebackend as the correct method cannot be >>>> used without an explicit casting. >>>> >>>> > >>>> >>>> > As for the DataStream#split I know there were some objections against >>>> removing the #split method in the past. I still believe the output tags can >>>> replace the split method already. >>>> >>>> > >>>> >>>> > The only problem in the last set of methods I propose to remove is >>>> that they were deprecated only in the last release and those method were >>>> only partially deprecated. Moreover some of the methods were not deprecated >>>> in ConnectedStreams. Nevertheless I'd still be inclined to remove the >>>> methods in this release. >>>> >>>> > >>>> >>>> > Let me know what do you think about it. >>>> >>>> > >>>> >>>> > Best, >>>> >>>> > >>>> >>>> > Dawid >>> >>> -- Konstantin Knauf https://twitter.com/snntrable https://github.com/knaufk