@Aljoscha I have not thought about DataStream#project. I see your
argument and I do agree with you it makes sense to eventually remove
them. They are not deprecated yet though. Moreover I think we could
include deprecating it as part of the efforts of FLIP-131/FLIP-134. I
will add the argument there.

@Konstantin I do share your sentiment that the contract was broken when
they became ineffective. That's why I mentioned the fact they are
ineffective ;)

Let me give a brief summary how I understand the discussion so far.

I will start the effort of removing the non-breaking APIs:

  * StreamExecutionEnvironment#setStateBackend(AbstractStateBackend)
    (deprecated in 1.5) (ALREADY REMOVED)
  * RuntimeContext#getAllAccumulators (deprecated in 0.10) (ALREADY
    REMOVED)
  * ExecutionConfig#set/getCodeAnalysisMode (deprecated in 1.9)
  * ExecutionConfig#isLatencyTrackingEnabled (deprecated in 1.7)
  * 
StreamExecutionEnvironment#setNumberOfExecutionRetries/getNumberOfExecutionRetries
    (not the equivalent in the ExecutionConfig
  * ExecutionConfig#disable/enableSysoutLogging (deprecated in 1.10)
    (ineffective, thus already broke the guarantees)
  * ExecutionConfig#set/isFailTaskOnCheckpointError (deprecated in 1.9)
    (ineffective, thus already broke the guarantees)
  * DataStream#writeAsCsv,writeAsText (deprecated in 1.10)

I will start a Vote thread for removing:

  * DataStream#split
  * XxxDataStream#fold

as there was quite a bit of positive feedback for doing so, but there
were also concerns.

For now (probably until 2.0), I'll leave the methods:

  * 
StreamExecutionEnvironment#readFile,readFileStream(...),socketTextStream(...),socketTextStream(...)
  * methods in XxxDataStream that use field names and indices for
    addressing projections/groupings
  * ExecutionConfig#setNumberOfExecutionRetries/getNumberOfExecutionRetries

BTW you can track the effort in FLINK-19033[1]

Best,

Dawid

[1] https://issues.apache.org/jira/browse/FLINK-19033


On 27/08/2020 10:45, Aljoscha Krettek wrote:
> Did you consider DataStream.project() yet? In general I think we
> should remove most of the relational-ish methods from DataStream. More
> candidates in this set of methods would be the tuple index/expression
> methods for aggregations like min/max etc...
>
> Aljoscha
>
> On 25.08.20 20:52, Konstantin Knauf wrote:
>> I would argue that the guarantees of @Public methods that became
>> ineffective were broken when they became ineffective (and were
>> deprecated).
>>
>>     - ExecutionConfig#disable/enableSysoutLogging (deprecated in 1.10)
>>     - ExecutionConfig#set/isFailTaskOnCheckpointError (deprecated in
>> 1.9)
>>
>> Removing these methods seems like the better of two evils to me as it
>> shows
>> users that they have been using no-ops for some time.
>>
>> On Thu, Aug 20, 2020 at 10:50 AM Stephan Ewen <se...@apache.org> wrote:
>>
>>> We have removed some public methods in the past, after a careful
>>> deprecation period, if they were not well working any more.
>>>
>>> The sentiment I got from users is that careful cleanup is in fact
>>> appreciated, otherwise things get confusing over time (the deprecated
>>> methods cause noise in the API).
>>> Still, we need to be very careful here.
>>>
>>> I would suggest to
>>>    - start with the non-public breaking methods
>>>    - remove fold() (very long deprecated)
>>>    - remove split() buggy
>>>
>>> Removing the env.socketStream() and env.fileStream() methods would
>>> probably be good, too. They are very long deprecated and they don't
>>> work
>>> well (with checkpoints) and the sources are the first thing a user
>>> needs to
>>> understand when starting with Flink, so removing noise there is super
>>> helpful.
>>>
>>>
>>> On Thu, Aug 20, 2020 at 8:53 AM Dawid Wysakowicz
>>> <dwysakow...@apache.org>
>>> wrote:
>>>
>>>> Hey Till,
>>>>
>>>> You've got a good point here. Removing some of the methods would cause
>>>> breaking the stability guarantees. I do understand if we decide not to
>>>> remove them for that reason, let me explain though why I am
>>>> thinking it
>>>> might make sense to remove them already. First of all I am a bit
>>>> afraid it
>>>> might take a long time before we arrive at the 2.0 version. We have
>>>> not
>>>> ever discussed that in the community. At the same time a lot of the
>>>> methods
>>>> already don't work or are buggy, and we do not fix them any more.
>>>>
>>>> Methods which removing would not break the Public guarantees:
>>>>
>>>>     - ExecutionConfig#set/getCodeAnalysisMode (deprecated in 1.9)
>>>>     - RuntimeContext#getAllAccumulators (deprecated in 0.10)
>>>>     - ExecutionConfig#isLatencyTrackingEnabled (deprecated in 1.7)
>>>>     -
>>>> StreamExecutionEnvironment#setNumberOfExecutionRetries/getNumberOfExecutionRetries
>>>>     (not the equivalent in the ExecutionConfig)
>>>>     - StreamExecutionEnvironment#setStateBackend(AbstractStateBackend)
>>>>     (deprecated in 1.5)
>>>>
>>>> Methods which removing would break the Public guarantees:
>>>>
>>>> which have no effect:
>>>>
>>>>     - ExecutionConfig#disable/enableSysoutLogging (deprecated in 1.10)
>>>>     - ExecutionConfig#set/isFailTaskOnCheckpointError (deprecated
>>>> in 1.9)
>>>>
>>>> which are buggy or discouraged and thus we do not support fixing them:
>>>>
>>>>     - DataStream#split (deprecated in 1.8)
>>>>     - DataStream#fold and all related classes and methods such as
>>>>     FoldFunction, FoldingState, FoldingStateDescriptor ...
>>>> (deprecated in
>>>>     1.3/1.4)
>>>>
>>>> The methods like:
>>>>
>>>>     -
>>>> StreamExecutionEnvironment#readFile,readFileStream(...),socketTextStream(...),socketTextStream(...),
>>>>
>>>>     - methods in (Connected)DataStream that specify keys as either
>>>>     indices or field names
>>>>     -
>>>>    
>>>> ExecutionConfig#setNumberOfExecutionRetries/getNumberOfExecutionRetries
>>>>
>>>>
>>>> should be working just fine and I feel the least eager to remove
>>>> those.
>>>>
>>>> I'd suggest I will open PRs for removing the methods that will not
>>>> cause
>>>> breakage of the Public guarantees as the general feedback was rather
>>>> positive. For the rest I do understand the resentment to do so and
>>>> will not
>>>> do it in the 1.x branch. Still I think it is valuable to have the
>>>> discussion.
>>>>
>>>> Best,
>>>>
>>>> Dawid
>>>>
>>>>
>>>> On 18/08/2020 09:26, Till Rohrmann wrote:
>>>>
>>>> Having looked at the proposed set of methods to remove I've noticed
>>>> that
>>>> some of them are actually annotated with @Public. According to our
>>>> stability guarantees, only major releases (1.0, 2.0, etc.) can
>>>> break APIs
>>>> with this annotation. Hence, I believe that we cannot simply remove
>>>> them
>>>> unless the community decides to change the stability guarantees we
>>>> give or
>>>> by making the next release a major release (Flink 2.0).
>>>>
>>>> Cheers,
>>>> Till
>>>>
>>>> On Tue, Aug 18, 2020 at 5:57 AM Yun Gao <yungao...@aliyun.com> wrote:
>>>>
>>>>> +1 for removing the methods that are deprecated for a while & have
>>>>> alternative methods.
>>>>>
>>>>> One specific thing is that if we remove the DataStream#split, do we
>>>>> consider enabling side-output in more operators in the future ?
>>>>> Currently
>>>>> it should be only available in ProcessFunctions, but not available
>>>>> to other
>>>>> commonly used UDF like Source or AsyncFunction[1].
>>>>>
>>>>> One temporary solution occurs to me is to add a ProcessFunction after
>>>>> the operators want to use side-output. But I think the solution is
>>>>> not very
>>>>> direct to come up with and if it really works we might add it to the
>>>>> document of side-output.
>>>>>
>>>>> [1] https://issues.apache.org/jira/browse/FLINK-7954
>>>>>
>>>>> Best,
>>>>>   Yun
>>>>>
>>>>> ------------------Original Mail ------------------
>>>>> *Sender:*Kostas Kloudas <kklou...@gmail.com>
>>>>> *Send Date:*Tue Aug 18 03:52:44 2020
>>>>> *Recipients:*Dawid Wysakowicz <dwysakow...@apache.org>
>>>>> *CC:*dev <dev@flink.apache.org>, user <u...@flink.apache.org>
>>>>> *Subject:*Re: [DISCUSS] Removing deprecated methods from
>>>>> DataStream API
>>>>>
>>>>>> +1 for removing them.
>>>>>>
>>>>>>
>>>>>>
>>>>>>  From a quick look, most of them (not all) have been deprecated a
>>>>>> long
>>>>>> time ago.
>>>>>>
>>>>>>
>>>>>>
>>>>>> Cheers,
>>>>>>
>>>>>> Kostas
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Mon, Aug 17, 2020 at 9:37 PM Dawid Wysakowicz wrote:
>>>>>>
>>>>>>>
>>>>>>
>>>>>>> @David Yes, my idea was to remove any use of fold method and all
>>>>>> related classes including WindowedStream#fold
>>>>>>
>>>>>>>
>>>>>>
>>>>>>> @Klou Good idea to also remove the deprecated
>>>>>>> enableCheckpointing() &
>>>>>> StreamExecutionEnvironment#readFile and alike. I did another pass
>>>>>> over some
>>>>>> of the classes and thought we could also drop:
>>>>>>
>>>>>>>
>>>>>>
>>>>>>> ExecutionConfig#set/getCodeAnalysisMode
>>>>>>
>>>>>>> ExecutionConfig#disable/enableSysoutLogging
>>>>>>
>>>>>>> ExecutionConfig#set/isFailTaskOnCheckpointError
>>>>>>
>>>>>>> ExecutionConfig#isLatencyTrackingEnabled
>>>>>>
>>>>>>>
>>>>>>
>>>>>>> As for the `forceCheckpointing` I am not fully convinced to
>>>>>>> doing it.
>>>>>> As far as I know iterations still do not participate in
>>>>>> checkpointing
>>>>>> correctly. Therefore it still might make sense to force it. In
>>>>>> other words
>>>>>> there is no real alternative to that method. Unless we only
>>>>>> remove the
>>>>>> methods from StreamExecutionEnvironment and redirect to the
>>>>>> setter in
>>>>>> CheckpointConfig. WDYT?
>>>>>>
>>>>>>>
>>>>>>
>>>>>>> An updated list of methods I suggest to remove:
>>>>>>
>>>>>>>
>>>>>>
>>>>>>> ExecutionConfig#set/getCodeAnalysisMode (deprecated in 1.9)
>>>>>>
>>>>>>> ExecutionConfig#disable/enableSysoutLogging (deprecated in 1.10)
>>>>>>
>>>>>>> ExecutionConfig#set/isFailTaskOnCheckpointError (deprecated in 1.9)
>>>>>>
>>>>>>> ExecutionConfig#isLatencyTrackingEnabled (deprecated in 1.7)
>>>>>>
>>>>>>> ExecutionConfig#(get)/setNumberOfExecutionRetries() (deprecated in
>>>>>> 1.1?)
>>>>>>
>>>>>>>
>>>>>> StreamExecutionEnvironment#readFile,readFileStream(...),socketTextStream(...),socketTextStream(...)
>>>>>>
>>>>>> (deprecated in 1.2)
>>>>>>
>>>>>>> RuntimeContext#getAllAccumulators (deprecated in 0.10)
>>>>>>
>>>>>>> DataStream#fold and all related classes and methods such as
>>>>>> FoldFunction, FoldingState, FoldingStateDescriptor ...
>>>>>> (deprecated in
>>>>>> 1.3/1.4)
>>>>>>
>>>>>>> StreamExecutionEnvironment#setStateBackend(AbstractStateBackend)
>>>>>> (deprecated in 1.5)
>>>>>>
>>>>>>> DataStream#split (deprecated in 1.8)
>>>>>>
>>>>>>> Methods in (Connected)DataStream that specify keys as either
>>>>>>> indices
>>>>>> or field names such as DataStream#keyBy, DataStream#partitionCustom,
>>>>>> ConnectedStream#keyBy, .... (deprecated in 1.11)
>>>>>>
>>>>>>>
>>>>>>
>>>>>>> Bear in mind that majority of the options listed above in
>>>>>> ExecutionConfig take no effect. They were left there purely to
>>>>>> satisfy the
>>>>>> binary compatibility. Personally I don't see any benefit of
>>>>>> leaving a
>>>>>> method and silently dropping the underlying feature. The only
>>>>>> configuration
>>>>>> that is respected is setting the number of execution retries.
>>>>>>
>>>>>>>
>>>>>>
>>>>>>> I also wanted to make it explicit that most of the changes above
>>>>>> would result in a binary incompatibility and require additional
>>>>>> exclusions
>>>>>> in the japicmp. Those are:
>>>>>>
>>>>>>>
>>>>>>
>>>>>>> ExecutionConfig#set/getCodeAnalysisMode (deprecated in 1.9)
>>>>>>
>>>>>>> ExecutionConfig#disable/enableSysoutLogging (deprecated in 1.10)
>>>>>>
>>>>>>> ExecutionConfig#set/isFailTaskOnCheckpointError (deprecated in 1.9)
>>>>>>
>>>>>>> ExecutionConfig#isLatencyTrackingEnabled (deprecated in 1.7)
>>>>>>
>>>>>>> ExecutionConfig#(get)/setNumberOfExecutionRetries() (deprecated in
>>>>>> 1.1?)
>>>>>>
>>>>>>> DataStream#fold and all related classes and methods such as
>>>>>> FoldFunction, FoldingState, FoldingStateDescriptor ...
>>>>>> (deprecated in
>>>>>> 1.3/1.4)
>>>>>>
>>>>>>> DataStream#split (deprecated in 1.8)
>>>>>>
>>>>>>> Methods in (Connected)DataStream that specify keys as either
>>>>>>> indices
>>>>>> or field names such as DataStream#keyBy, DataStream#partitionCustom,
>>>>>> ConnectedStream#keyBy, .... (deprecated in 1.11)
>>>>>>
>>>>>>>
>>>>>> StreamExecutionEnvironment#readFile,readFileStream(...),socketTextStream(...),socketTextStream(...)
>>>>>>
>>>>>> (deprecated in 1.2)
>>>>>>
>>>>>>>
>>>>>>
>>>>>>> Looking forward to more opinions on the issue.
>>>>>>
>>>>>>>
>>>>>>
>>>>>>> Best,
>>>>>>
>>>>>>>
>>>>>>
>>>>>>> Dawid
>>>>>>
>>>>>>>
>>>>>>
>>>>>>>
>>>>>>
>>>>>>> On 17/08/2020 12:49, Kostas Kloudas wrote:
>>>>>>
>>>>>>>
>>>>>>
>>>>>>> Thanks a lot for starting this Dawid,
>>>>>>
>>>>>>>
>>>>>>
>>>>>>> Big +1 for the proposed clean-up, and I would also add the
>>>>>>> deprecated
>>>>>>
>>>>>>> methods of the StreamExecutionEnvironment like:
>>>>>>
>>>>>>>
>>>>>>
>>>>>>> enableCheckpointing(long interval, CheckpointingMode mode, boolean
>>>>>> force)
>>>>>>
>>>>>>> enableCheckpointing()
>>>>>>
>>>>>>> isForceCheckpointing()
>>>>>>
>>>>>>>
>>>>>>
>>>>>>> readFile(FileInputFormat inputFormat,String
>>>>>>
>>>>>>> filePath,FileProcessingMode watchType,long interval, FilePathFilter
>>>>>>
>>>>>>> filter)
>>>>>>
>>>>>>> readFileStream(...)
>>>>>>
>>>>>>>
>>>>>>
>>>>>>> socketTextStream(String hostname, int port, char delimiter, long
>>>>>> maxRetry)
>>>>>>
>>>>>>> socketTextStream(String hostname, int port, char delimiter)
>>>>>>
>>>>>>>
>>>>>>
>>>>>>> There are more, like the (get)/setNumberOfExecutionRetries()
>>>>>>> that were
>>>>>>
>>>>>>> deprecated long ago, but I have not investigated to see if they are
>>>>>>
>>>>>>> actually easy to remove.
>>>>>>
>>>>>>>
>>>>>>
>>>>>>> Cheers,
>>>>>>
>>>>>>> Kostas
>>>>>>
>>>>>>>
>>>>>>
>>>>>>> On Mon, Aug 17, 2020 at 10:53 AM Dawid Wysakowicz
>>>>>>
>>>>>>> wrote:
>>>>>>
>>>>>>>
>>>>>>
>>>>>>> Hi devs and users,
>>>>>>
>>>>>>>
>>>>>>
>>>>>>> I wanted to ask you what do you think about removing some of the
>>>>>> deprecated APIs around the DataStream API.
>>>>>>
>>>>>>>
>>>>>>
>>>>>>> The APIs I have in mind are:
>>>>>>
>>>>>>>
>>>>>>
>>>>>>> RuntimeContext#getAllAccumulators (deprecated in 0.10)
>>>>>>
>>>>>>> DataStream#fold and all related classes and methods such as
>>>>>> FoldFunction, FoldingState, FoldingStateDescriptor ...
>>>>>> (deprecated in
>>>>>> 1.3/1.4)
>>>>>>
>>>>>>> StreamExecutionEnvironment#setStateBackend(AbstractStateBackend)
>>>>>> (deprecated in 1.5)
>>>>>>
>>>>>>> DataStream#split (deprecated in 1.8)
>>>>>>
>>>>>>> Methods in (Connected)DataStream that specify keys as either
>>>>>>> indices
>>>>>> or field names such as DataStream#keyBy, DataStream#partitionCustom,
>>>>>> ConnectedStream#keyBy, .... (deprecated in 1.11)
>>>>>>
>>>>>>>
>>>>>>
>>>>>>> I think the first three should be straightforward. They are long
>>>>>> deprecated. The getAccumulators method is not used very often in my
>>>>>> opinion. The same applies to the DataStream#fold which
>>>>>> additionally is not
>>>>>> very performant. Lastly the setStateBackend has an alternative
>>>>>> with a class
>>>>>> from the AbstractStateBackend hierarchy, therefore it will be
>>>>>> still code
>>>>>> compatible. Moreover if we remove the
>>>>>> #setStateBackend(AbstractStateBackend) we will get rid off
>>>>>> warnings users
>>>>>> have right now when setting a statebackend as the correct method
>>>>>> cannot be
>>>>>> used without an explicit casting.
>>>>>>
>>>>>>>
>>>>>>
>>>>>>> As for the DataStream#split I know there were some objections
>>>>>>> against
>>>>>> removing the #split method in the past. I still believe the
>>>>>> output tags can
>>>>>> replace the split method already.
>>>>>>
>>>>>>>
>>>>>>
>>>>>>> The only problem in the last set of methods I propose to remove is
>>>>>> that they were deprecated only in the last release and those
>>>>>> method were
>>>>>> only partially deprecated. Moreover some of the methods were not
>>>>>> deprecated
>>>>>> in ConnectedStreams. Nevertheless I'd still be inclined to remove
>>>>>> the
>>>>>> methods in this release.
>>>>>>
>>>>>>>
>>>>>>
>>>>>>> Let me know what do you think about it.
>>>>>>
>>>>>>>
>>>>>>
>>>>>>> Best,
>>>>>>
>>>>>>>
>>>>>>
>>>>>>> Dawid
>>>>>
>>>>>
>>
>

Attachment: signature.asc
Description: OpenPGP digital signature

Reply via email to