Hey Till, You've got a good point here. Removing some of the methods would cause breaking the stability guarantees. I do understand if we decide not to remove them for that reason, let me explain though why I am thinking it might make sense to remove them already. First of all I am a bit afraid it might take a long time before we arrive at the 2.0 version. We have not ever discussed that in the community. At the same time a lot of the methods already don't work or are buggy, and we do not fix them any more.
Methods which removing would not break the Public guarantees: * ExecutionConfig#set/getCodeAnalysisMode (deprecated in 1.9) * RuntimeContext#getAllAccumulators (deprecated in 0.10) * ExecutionConfig#isLatencyTrackingEnabled (deprecated in 1.7) * StreamExecutionEnvironment#setNumberOfExecutionRetries/getNumberOfExecutionRetries (not the equivalent in the ExecutionConfig) * StreamExecutionEnvironment#setStateBackend(AbstractStateBackend) (deprecated in 1.5) Methods which removing would break the Public guarantees: which have no effect: * ExecutionConfig#disable/enableSysoutLogging (deprecated in 1.10) * ExecutionConfig#set/isFailTaskOnCheckpointError (deprecated in 1.9) which are buggy or discouraged and thus we do not support fixing them: * DataStream#split (deprecated in 1.8) * DataStream#fold and all related classes and methods such as FoldFunction, FoldingState, FoldingStateDescriptor ... (deprecated in 1.3/1.4) The methods like: * StreamExecutionEnvironment#readFile,readFileStream(...),socketTextStream(...),socketTextStream(...), * methods in (Connected)DataStream that specify keys as either indices or field names * ExecutionConfig#setNumberOfExecutionRetries/getNumberOfExecutionRetries should be working just fine and I feel the least eager to remove those. I'd suggest I will open PRs for removing the methods that will not cause breakage of the Public guarantees as the general feedback was rather positive. For the rest I do understand the resentment to do so and will not do it in the 1.x branch. Still I think it is valuable to have the discussion. Best, Dawid On 18/08/2020 09:26, Till Rohrmann wrote: > Having looked at the proposed set of methods to remove I've noticed > that some of them are actually annotated with @Public. According to > our stability guarantees, only major releases (1.0, 2.0, etc.) can > break APIs with this annotation. Hence, I believe that we cannot > simply remove them unless the community decides to change the > stability guarantees we give or by making the next release a major > release (Flink 2.0). > > Cheers, > Till > > On Tue, Aug 18, 2020 at 5:57 AM Yun Gao <yungao...@aliyun.com > <mailto:yungao...@aliyun.com>> wrote: > > +1 for removing the methods that are deprecated for a while & have > alternative methods. > > One specific thing is that if we remove the DataStream#split, do > we consider enabling side-output in more operators in the future ? > Currently it should be only available in ProcessFunctions, but not > available to other commonly used UDF like Source or AsyncFunction[1]. > > One temporary solution occurs to me is to add a ProcessFunction > after the operators want to use side-output. But I think the > solution is not very direct to come up with and if it really works > we might add it to the document of side-output. > > [1] https://issues.apache.org/jira/browse/FLINK-7954 > > Best, > Yun > > ------------------Original Mail ------------------ > *Sender:*Kostas Kloudas <kklou...@gmail.com > <mailto:kklou...@gmail.com>> > *Send Date:*Tue Aug 18 03:52:44 2020 > *Recipients:*Dawid Wysakowicz <dwysakow...@apache.org > <mailto:dwysakow...@apache.org>> > *CC:*dev <d...@flink.apache.org <mailto:d...@flink.apache.org>>, > user <user@flink.apache.org <mailto:user@flink.apache.org>> > *Subject:*Re: [DISCUSS] Removing deprecated methods from > DataStream API > > +1 for removing them. > > > > From a quick look, most of them (not all) have been > deprecated a long time ago. > > > > Cheers, > > Kostas > > > > On Mon, Aug 17, 2020 at 9:37 PM Dawid Wysakowicz wrote: > > > > > > @David Yes, my idea was to remove any use of fold method > and all related classes including WindowedStream#fold > > > > > > @Klou Good idea to also remove the deprecated > enableCheckpointing() & > StreamExecutionEnvironment#readFile and alike. I did > another pass over some of the classes and thought we could > also drop: > > > > > > ExecutionConfig#set/getCodeAnalysisMode > > > ExecutionConfig#disable/enableSysoutLogging > > > ExecutionConfig#set/isFailTaskOnCheckpointError > > > ExecutionConfig#isLatencyTrackingEnabled > > > > > > As for the `forceCheckpointing` I am not fully convinced > to doing it. As far as I know iterations still do not > participate in checkpointing correctly. Therefore it still > might make sense to force it. In other words there is no > real alternative to that method. Unless we only remove the > methods from StreamExecutionEnvironment and redirect to > the setter in CheckpointConfig. WDYT? > > > > > > An updated list of methods I suggest to remove: > > > > > > ExecutionConfig#set/getCodeAnalysisMode (deprecated in 1.9) > > > ExecutionConfig#disable/enableSysoutLogging (deprecated > in 1.10) > > > ExecutionConfig#set/isFailTaskOnCheckpointError > (deprecated in 1.9) > > > ExecutionConfig#isLatencyTrackingEnabled (deprecated in 1.7) > > > ExecutionConfig#(get)/setNumberOfExecutionRetries() > (deprecated in 1.1?) > > > > > StreamExecutionEnvironment#readFile,readFileStream(...),socketTextStream(...),socketTextStream(...) > (deprecated in 1.2) > > > RuntimeContext#getAllAccumulators (deprecated in 0.10) > > > DataStream#fold and all related classes and methods such > as FoldFunction, FoldingState, FoldingStateDescriptor ... > (deprecated in 1.3/1.4) > > > > StreamExecutionEnvironment#setStateBackend(AbstractStateBackend) > (deprecated in 1.5) > > > DataStream#split (deprecated in 1.8) > > > Methods in (Connected)DataStream that specify keys as > either indices or field names such as DataStream#keyBy, > DataStream#partitionCustom, ConnectedStream#keyBy, .... > (deprecated in 1.11) > > > > > > Bear in mind that majority of the options listed above > in ExecutionConfig take no effect. They were left there > purely to satisfy the binary compatibility. Personally I > don't see any benefit of leaving a method and silently > dropping the underlying feature. The only configuration > that is respected is setting the number of execution retries. > > > > > > I also wanted to make it explicit that most of the > changes above would result in a binary incompatibility and > require additional exclusions in the japicmp. Those are: > > > > > > ExecutionConfig#set/getCodeAnalysisMode (deprecated in 1.9) > > > ExecutionConfig#disable/enableSysoutLogging (deprecated > in 1.10) > > > ExecutionConfig#set/isFailTaskOnCheckpointError > (deprecated in 1.9) > > > ExecutionConfig#isLatencyTrackingEnabled (deprecated in 1.7) > > > ExecutionConfig#(get)/setNumberOfExecutionRetries() > (deprecated in 1.1?) > > > DataStream#fold and all related classes and methods such > as FoldFunction, FoldingState, FoldingStateDescriptor ... > (deprecated in 1.3/1.4) > > > DataStream#split (deprecated in 1.8) > > > Methods in (Connected)DataStream that specify keys as > either indices or field names such as DataStream#keyBy, > DataStream#partitionCustom, ConnectedStream#keyBy, .... > (deprecated in 1.11) > > > > > StreamExecutionEnvironment#readFile,readFileStream(...),socketTextStream(...),socketTextStream(...) > (deprecated in 1.2) > > > > > > Looking forward to more opinions on the issue. > > > > > > Best, > > > > > > Dawid > > > > > > > > > On 17/08/2020 12:49, Kostas Kloudas wrote: > > > > > > Thanks a lot for starting this Dawid, > > > > > > Big +1 for the proposed clean-up, and I would also add > the deprecated > > > methods of the StreamExecutionEnvironment like: > > > > > > enableCheckpointing(long interval, CheckpointingMode > mode, boolean force) > > > enableCheckpointing() > > > isForceCheckpointing() > > > > > > readFile(FileInputFormat inputFormat,String > > > filePath,FileProcessingMode watchType,long interval, > FilePathFilter > > > filter) > > > readFileStream(...) > > > > > > socketTextStream(String hostname, int port, char > delimiter, long maxRetry) > > > socketTextStream(String hostname, int port, char delimiter) > > > > > > There are more, like the > (get)/setNumberOfExecutionRetries() that were > > > deprecated long ago, but I have not investigated to see > if they are > > > actually easy to remove. > > > > > > Cheers, > > > Kostas > > > > > > On Mon, Aug 17, 2020 at 10:53 AM Dawid Wysakowicz > > > wrote: > > > > > > Hi devs and users, > > > > > > I wanted to ask you what do you think about removing > some of the deprecated APIs around the DataStream API. > > > > > > The APIs I have in mind are: > > > > > > RuntimeContext#getAllAccumulators (deprecated in 0.10) > > > DataStream#fold and all related classes and methods such > as FoldFunction, FoldingState, FoldingStateDescriptor ... > (deprecated in 1.3/1.4) > > > > StreamExecutionEnvironment#setStateBackend(AbstractStateBackend) > (deprecated in 1.5) > > > DataStream#split (deprecated in 1.8) > > > Methods in (Connected)DataStream that specify keys as > either indices or field names such as DataStream#keyBy, > DataStream#partitionCustom, ConnectedStream#keyBy, .... > (deprecated in 1.11) > > > > > > I think the first three should be straightforward. They > are long deprecated. The getAccumulators method is not > used very often in my opinion. The same applies to the > DataStream#fold which additionally is not very performant. > Lastly the setStateBackend has an alternative with a class > from the AbstractStateBackend hierarchy, therefore it will > be still code compatible. Moreover if we remove the > #setStateBackend(AbstractStateBackend) we will get rid off > warnings users have right now when setting a statebackend > as the correct method cannot be used without an explicit > casting. > > > > > > As for the DataStream#split I know there were some > objections against removing the #split method in the past. > I still believe the output tags can replace the split > method already. > > > > > > The only problem in the last set of methods I propose to > remove is that they were deprecated only in the last > release and those method were only partially deprecated. > Moreover some of the methods were not deprecated in > ConnectedStreams. Nevertheless I'd still be inclined to > remove the methods in this release. > > > > > > Let me know what do you think about it. > > > > > > Best, > > > > > > Dawid >
signature.asc
Description: OpenPGP digital signature