Having looked at the proposed set of methods to remove I've noticed that some of them are actually annotated with @Public. According to our stability guarantees, only major releases (1.0, 2.0, etc.) can break APIs with this annotation. Hence, I believe that we cannot simply remove them unless the community decides to change the stability guarantees we give or by making the next release a major release (Flink 2.0).
Cheers, Till On Tue, Aug 18, 2020 at 5:57 AM Yun Gao <yungao...@aliyun.com> wrote: > +1 for removing the methods that are deprecated for a while & have > alternative methods. > > One specific thing is that if we remove the DataStream#split, do we > consider enabling side-output in more operators in the future ? Currently > it should be only available in ProcessFunctions, but not available to other > commonly used UDF like Source or AsyncFunction[1]. > > One temporary solution occurs to me is to add a ProcessFunction after the > operators want to use side-output. But I think the solution is not very > direct to come up with and if it really works we might add it to the > document of side-output. > > [1] https://issues.apache.org/jira/browse/FLINK-7954 > > Best, > Yun > > ------------------Original Mail ------------------ > *Sender:*Kostas Kloudas <kklou...@gmail.com> > *Send Date:*Tue Aug 18 03:52:44 2020 > *Recipients:*Dawid Wysakowicz <dwysakow...@apache.org> > *CC:*dev <d...@flink.apache.org>, user <user@flink.apache.org> > *Subject:*Re: [DISCUSS] Removing deprecated methods from DataStream API > >> +1 for removing them. >> >> >> >> From a quick look, most of them (not all) have been deprecated a long >> time ago. >> >> >> >> Cheers, >> >> Kostas >> >> >> >> On Mon, Aug 17, 2020 at 9:37 PM Dawid Wysakowicz wrote: >> >> > >> >> > @David Yes, my idea was to remove any use of fold method and all >> related classes including WindowedStream#fold >> >> > >> >> > @Klou Good idea to also remove the deprecated enableCheckpointing() & >> StreamExecutionEnvironment#readFile and alike. I did another pass over some >> of the classes and thought we could also drop: >> >> > >> >> > ExecutionConfig#set/getCodeAnalysisMode >> >> > ExecutionConfig#disable/enableSysoutLogging >> >> > ExecutionConfig#set/isFailTaskOnCheckpointError >> >> > ExecutionConfig#isLatencyTrackingEnabled >> >> > >> >> > As for the `forceCheckpointing` I am not fully convinced to doing it. >> As far as I know iterations still do not participate in checkpointing >> correctly. Therefore it still might make sense to force it. In other words >> there is no real alternative to that method. Unless we only remove the >> methods from StreamExecutionEnvironment and redirect to the setter in >> CheckpointConfig. WDYT? >> >> > >> >> > An updated list of methods I suggest to remove: >> >> > >> >> > ExecutionConfig#set/getCodeAnalysisMode (deprecated in 1.9) >> >> > ExecutionConfig#disable/enableSysoutLogging (deprecated in 1.10) >> >> > ExecutionConfig#set/isFailTaskOnCheckpointError (deprecated in 1.9) >> >> > ExecutionConfig#isLatencyTrackingEnabled (deprecated in 1.7) >> >> > ExecutionConfig#(get)/setNumberOfExecutionRetries() (deprecated in 1.1?) >> >> > >> StreamExecutionEnvironment#readFile,readFileStream(...),socketTextStream(...),socketTextStream(...) >> (deprecated in 1.2) >> >> > RuntimeContext#getAllAccumulators (deprecated in 0.10) >> >> > DataStream#fold and all related classes and methods such as >> FoldFunction, FoldingState, FoldingStateDescriptor ... (deprecated in >> 1.3/1.4) >> >> > StreamExecutionEnvironment#setStateBackend(AbstractStateBackend) >> (deprecated in 1.5) >> >> > DataStream#split (deprecated in 1.8) >> >> > Methods in (Connected)DataStream that specify keys as either indices or >> field names such as DataStream#keyBy, DataStream#partitionCustom, >> ConnectedStream#keyBy, .... (deprecated in 1.11) >> >> > >> >> > Bear in mind that majority of the options listed above in >> ExecutionConfig take no effect. They were left there purely to satisfy the >> binary compatibility. Personally I don't see any benefit of leaving a >> method and silently dropping the underlying feature. The only configuration >> that is respected is setting the number of execution retries. >> >> > >> >> > I also wanted to make it explicit that most of the changes above would >> result in a binary incompatibility and require additional exclusions in the >> japicmp. Those are: >> >> > >> >> > ExecutionConfig#set/getCodeAnalysisMode (deprecated in 1.9) >> >> > ExecutionConfig#disable/enableSysoutLogging (deprecated in 1.10) >> >> > ExecutionConfig#set/isFailTaskOnCheckpointError (deprecated in 1.9) >> >> > ExecutionConfig#isLatencyTrackingEnabled (deprecated in 1.7) >> >> > ExecutionConfig#(get)/setNumberOfExecutionRetries() (deprecated in 1.1?) >> >> > DataStream#fold and all related classes and methods such as >> FoldFunction, FoldingState, FoldingStateDescriptor ... (deprecated in >> 1.3/1.4) >> >> > DataStream#split (deprecated in 1.8) >> >> > Methods in (Connected)DataStream that specify keys as either indices or >> field names such as DataStream#keyBy, DataStream#partitionCustom, >> ConnectedStream#keyBy, .... (deprecated in 1.11) >> >> > >> StreamExecutionEnvironment#readFile,readFileStream(...),socketTextStream(...),socketTextStream(...) >> (deprecated in 1.2) >> >> > >> >> > Looking forward to more opinions on the issue. >> >> > >> >> > Best, >> >> > >> >> > Dawid >> >> > >> >> > >> >> > On 17/08/2020 12:49, Kostas Kloudas wrote: >> >> > >> >> > Thanks a lot for starting this Dawid, >> >> > >> >> > Big +1 for the proposed clean-up, and I would also add the deprecated >> >> > methods of the StreamExecutionEnvironment like: >> >> > >> >> > enableCheckpointing(long interval, CheckpointingMode mode, boolean >> force) >> >> > enableCheckpointing() >> >> > isForceCheckpointing() >> >> > >> >> > readFile(FileInputFormat inputFormat,String >> >> > filePath,FileProcessingMode watchType,long interval, FilePathFilter >> >> > filter) >> >> > readFileStream(...) >> >> > >> >> > socketTextStream(String hostname, int port, char delimiter, long >> maxRetry) >> >> > socketTextStream(String hostname, int port, char delimiter) >> >> > >> >> > There are more, like the (get)/setNumberOfExecutionRetries() that were >> >> > deprecated long ago, but I have not investigated to see if they are >> >> > actually easy to remove. >> >> > >> >> > Cheers, >> >> > Kostas >> >> > >> >> > On Mon, Aug 17, 2020 at 10:53 AM Dawid Wysakowicz >> >> > wrote: >> >> > >> >> > Hi devs and users, >> >> > >> >> > I wanted to ask you what do you think about removing some of the >> deprecated APIs around the DataStream API. >> >> > >> >> > The APIs I have in mind are: >> >> > >> >> > RuntimeContext#getAllAccumulators (deprecated in 0.10) >> >> > DataStream#fold and all related classes and methods such as >> FoldFunction, FoldingState, FoldingStateDescriptor ... (deprecated in >> 1.3/1.4) >> >> > StreamExecutionEnvironment#setStateBackend(AbstractStateBackend) >> (deprecated in 1.5) >> >> > DataStream#split (deprecated in 1.8) >> >> > Methods in (Connected)DataStream that specify keys as either indices or >> field names such as DataStream#keyBy, DataStream#partitionCustom, >> ConnectedStream#keyBy, .... (deprecated in 1.11) >> >> > >> >> > I think the first three should be straightforward. They are long >> deprecated. The getAccumulators method is not used very often in my >> opinion. The same applies to the DataStream#fold which additionally is not >> very performant. Lastly the setStateBackend has an alternative with a class >> from the AbstractStateBackend hierarchy, therefore it will be still code >> compatible. Moreover if we remove the >> #setStateBackend(AbstractStateBackend) we will get rid off warnings users >> have right now when setting a statebackend as the correct method cannot be >> used without an explicit casting. >> >> > >> >> > As for the DataStream#split I know there were some objections against >> removing the #split method in the past. I still believe the output tags can >> replace the split method already. >> >> > >> >> > The only problem in the last set of methods I propose to remove is that >> they were deprecated only in the last release and those method were only >> partially deprecated. Moreover some of the methods were not deprecated in >> ConnectedStreams. Nevertheless I'd still be inclined to remove the methods >> in this release. >> >> > >> >> > Let me know what do you think about it. >> >> > >> >> > Best, >> >> > >> >> > Dawid > >