+1 for removing the methods that are deprecated for a while & have alternative methods.
One specific thing is that if we remove the DataStream#split, do we consider enabling side-output in more operators in the future ? Currently it should be only available in ProcessFunctions, but not available to other commonly used UDF like Source or AsyncFunction[1]. One temporary solution occurs to me is to add a ProcessFunction after the operators want to use side-output. But I think the solution is not very direct to come up with and if it really works we might add it to the document of side-output. [1] https://issues.apache.org/jira/browse/FLINK-7954 Best, Yun ------------------Original Mail ------------------ Sender:Kostas Kloudas <kklou...@gmail.com> Send Date:Tue Aug 18 03:52:44 2020 Recipients:Dawid Wysakowicz <dwysakow...@apache.org> CC:dev <dev@flink.apache.org>, user <u...@flink.apache.org> Subject:Re: [DISCUSS] Removing deprecated methods from DataStream API +1 for removing them. From a quick look, most of them (not all) have been deprecated a long time ago. Cheers, Kostas On Mon, Aug 17, 2020 at 9:37 PM Dawid Wysakowicz wrote: > > @David Yes, my idea was to remove any use of fold method and all related > classes including WindowedStream#fold > > @Klou Good idea to also remove the deprecated enableCheckpointing() & > StreamExecutionEnvironment#readFile and alike. I did another pass over some > of the classes and thought we could also drop: > > ExecutionConfig#set/getCodeAnalysisMode > ExecutionConfig#disable/enableSysoutLogging > ExecutionConfig#set/isFailTaskOnCheckpointError > ExecutionConfig#isLatencyTrackingEnabled > > As for the `forceCheckpointing` I am not fully convinced to doing it. As far > as I know iterations still do not participate in checkpointing correctly. > Therefore it still might make sense to force it. In other words there is no > real alternative to that method. Unless we only remove the methods from > StreamExecutionEnvironment and redirect to the setter in CheckpointConfig. > WDYT? > > An updated list of methods I suggest to remove: > > ExecutionConfig#set/getCodeAnalysisMode (deprecated in 1.9) > ExecutionConfig#disable/enableSysoutLogging (deprecated in 1.10) > ExecutionConfig#set/isFailTaskOnCheckpointError (deprecated in 1.9) > ExecutionConfig#isLatencyTrackingEnabled (deprecated in 1.7) > ExecutionConfig#(get)/setNumberOfExecutionRetries() (deprecated in 1.1?) > StreamExecutionEnvironment#readFile,readFileStream(...),socketTextStream(...),socketTextStream(...) > (deprecated in 1.2) > RuntimeContext#getAllAccumulators (deprecated in 0.10) > DataStream#fold and all related classes and methods such as FoldFunction, > FoldingState, FoldingStateDescriptor ... (deprecated in 1.3/1.4) > StreamExecutionEnvironment#setStateBackend(AbstractStateBackend) (deprecated > in 1.5) > DataStream#split (deprecated in 1.8) > Methods in (Connected)DataStream that specify keys as either indices or field > names such as DataStream#keyBy, DataStream#partitionCustom, > ConnectedStream#keyBy, .... (deprecated in 1.11) > > Bear in mind that majority of the options listed above in ExecutionConfig > take no effect. They were left there purely to satisfy the binary > compatibility. Personally I don't see any benefit of leaving a method and > silently dropping the underlying feature. The only configuration that is > respected is setting the number of execution retries. > > I also wanted to make it explicit that most of the changes above would result > in a binary incompatibility and require additional exclusions in the japicmp. > Those are: > > ExecutionConfig#set/getCodeAnalysisMode (deprecated in 1.9) > ExecutionConfig#disable/enableSysoutLogging (deprecated in 1.10) > ExecutionConfig#set/isFailTaskOnCheckpointError (deprecated in 1.9) > ExecutionConfig#isLatencyTrackingEnabled (deprecated in 1.7) > ExecutionConfig#(get)/setNumberOfExecutionRetries() (deprecated in 1.1?) > DataStream#fold and all related classes and methods such as FoldFunction, > FoldingState, FoldingStateDescriptor ... (deprecated in 1.3/1.4) > DataStream#split (deprecated in 1.8) > Methods in (Connected)DataStream that specify keys as either indices or field > names such as DataStream#keyBy, DataStream#partitionCustom, > ConnectedStream#keyBy, .... (deprecated in 1.11) > StreamExecutionEnvironment#readFile,readFileStream(...),socketTextStream(...),socketTextStream(...) > (deprecated in 1.2) > > Looking forward to more opinions on the issue. > > Best, > > Dawid > > > On 17/08/2020 12:49, Kostas Kloudas wrote: > > Thanks a lot for starting this Dawid, > > Big +1 for the proposed clean-up, and I would also add the deprecated > methods of the StreamExecutionEnvironment like: > > enableCheckpointing(long interval, CheckpointingMode mode, boolean force) > enableCheckpointing() > isForceCheckpointing() > > readFile(FileInputFormat inputFormat,String > filePath,FileProcessingMode watchType,long interval, FilePathFilter > filter) > readFileStream(...) > > socketTextStream(String hostname, int port, char delimiter, long maxRetry) > socketTextStream(String hostname, int port, char delimiter) > > There are more, like the (get)/setNumberOfExecutionRetries() that were > deprecated long ago, but I have not investigated to see if they are > actually easy to remove. > > Cheers, > Kostas > > On Mon, Aug 17, 2020 at 10:53 AM Dawid Wysakowicz > wrote: > > Hi devs and users, > > I wanted to ask you what do you think about removing some of the deprecated > APIs around the DataStream API. > > The APIs I have in mind are: > > RuntimeContext#getAllAccumulators (deprecated in 0.10) > DataStream#fold and all related classes and methods such as FoldFunction, > FoldingState, FoldingStateDescriptor ... (deprecated in 1.3/1.4) > StreamExecutionEnvironment#setStateBackend(AbstractStateBackend) (deprecated > in 1.5) > DataStream#split (deprecated in 1.8) > Methods in (Connected)DataStream that specify keys as either indices or field > names such as DataStream#keyBy, DataStream#partitionCustom, > ConnectedStream#keyBy, .... (deprecated in 1.11) > > I think the first three should be straightforward. They are long deprecated. > The getAccumulators method is not used very often in my opinion. The same > applies to the DataStream#fold which additionally is not very performant. > Lastly the setStateBackend has an alternative with a class from the > AbstractStateBackend hierarchy, therefore it will be still code compatible. > Moreover if we remove the #setStateBackend(AbstractStateBackend) we will get > rid off warnings users have right now when setting a statebackend as the > correct method cannot be used without an explicit casting. > > As for the DataStream#split I know there were some objections against > removing the #split method in the past. I still believe the output tags can > replace the split method already. > > The only problem in the last set of methods I propose to remove is that they > were deprecated only in the last release and those method were only partially > deprecated. Moreover some of the methods were not deprecated in > ConnectedStreams. Nevertheless I'd still be inclined to remove the methods in > this release. > > Let me know what do you think about it. > > Best, > > Dawid