Hi,

these APIs don't expose the underlying source directly, so I don't think we need to worry about deprecating them as well. There's also nothing inherently wrong with using a deprecated API internally, though even just for the experience of using our own new APIs I would personally say that they should be migrated to the new Source API. It's hard to reason that users must migrate to a new API if we don't do it internally as well.


Best
Ingo

On 09.06.22 15:41, Lijie Wang wrote:
Hi Martijn,

I don't mean it's a blocker. It's just for information. And I'm also +1
for this.

To put it another way: should we migrate `#readFile(...)` to the new
API, or provide a similar "readxxx" method based on the new Source API?

And if we don't migrate it, does that mean `#readFile(...)` should also
be marked as deprecated?

Best,
Lijie

On Thu, Jun 9, 2022 at 21:03, Martijn Visser <martijnvis...@apache.org> wrote:

Hi Lijie,

I don't see any problem with deprecating those methods at this moment, as
long as we don't remove them until the replacements are available. Besides
that, are we sure there are no replacements already, especially with the
new FileSource?

Best regards,

Martijn

On Thu, Jun 9, 2022 at 14:23, Lijie Wang <wangdachui9...@gmail.com> wrote:

Hi all,

FYI, currently, some commonly used methods in StreamExecutionEnvironment
are still based on the old SourceFunction (and there is no alternative):
`StreamExecutionEnvironment#readFile(...)`
`StreamExecutionEnvironment#readTextFile(...)`

I think these should be migrated to the new Source API before we
deprecate SourceFunction.

Best,
Lijie

On Thu, Jun 9, 2022 at 16:05, Martijn Visser <martijnvis...@apache.org> wrote:

Hi all,

I think implicitly we've already considered the SourceFunction and
SinkFunction as deprecated. They are even marked as such on the Flink
roadmap [1], which also shows that connectors using these interfaces
are approaching end-of-life. The fact that we're actively migrating
connectors from Source/SinkFunction to FLIP-27/FLIP-143 (plus add-on
FLIPs) shows that we've already determined that target.

With regards to the motivation of FLIP-27, I think reading up on the
original discussion thread [2] is also worthwhile to see more context.
FLIP-27 was also very important as it brought a unified connector which
can support both streaming and batch (with batch being considered a
special case of streaming in Flink's vision).

So +1 to deprecate SourceFunction. I would also argue that we should
already mark the SinkFunction as deprecated to avoid having this
discussion again in a couple of months.

Best regards,

Martijn

[1] https://flink.apache.org/roadmap.html
[2] https://lists.apache.org/thread/334co89dbhc8qpr9nvmz8t1gp4sz2c8y

On Thu, Jun 9, 2022 at 09:48, Jing Ge <j...@ververica.com> wrote:

Hi,

I am very happy to see opinions from different perspectives. That will
help us understand the problem better. Thanks all for the informative
discussion.

Let's see the big picture and check the following facts together:

1. FLIP-27 was intended to solve some technical issues that are very
difficult to solve with SourceFunction [1]. When we say "SourceFunction
is easy", well, it depends. If we take a look at the implementation of
the Kafka connector, we will see how complicated it is to build a
serious connector for production with the old SourceFunction. To every
problem there is a solution, and to every solution there is a problem.
The fact is that there is no perfect solution, only a feasible one. If
we try to solve complicated problems, we have to expose some
complexity. Compared with connectors for POCs, demos, and training (no
offense), I would give higher priority to solving issues for connectors
like the Kafka connector that are widely used in production. I think
that should be one reason why FLIP-27 was designed and why the new
source API went public.

2. FLIP-27 and its implementation were introduced roughly at the end of
2019 and went public on 19.04.2021, which means Flink has provided two
different public/graduated source solutions for more than one year. On
the day that the new source API went public, there should have been a
consensus in the community that we should start the migration. The old
SourceFunction interface, in the ideal case, should have been
deprecated on that day; otherwise we should not have graduated the new
source API, to avoid confusing (connector) developers [2].

3. It is true that the new source API is hard to understand and even
hard to implement for simple cases. Thanks for the feedback. That is
something we need to improve. The current design and implementation
could be considered the low-level API. The next step is to create a
high-level API to reduce some unnecessary complexity for those simple
cases. But, IMHO, this should not be a prerequisite that postpones the
deprecation of the old SourceFunction APIs.

4. As long as the old SourceFunction is not marked as deprecated,
developers will continue asking which one should be used. Let's take a
concrete example. If a new connector is developed now and the developer
asks on the ML for a suggestion on the choice between the old and new
source API, which one should we suggest? I think it should be the new
Source API. If a brand-new connector has been developed with the old
SourceFunction API before asking for consensus in the community, and
the developer wants to merge it to master, should we allow it? If the
answers to all these questions point to the new Source API, the old
SourceFunction is de facto already deprecated; it just has not been
marked as @deprecated, which confuses developers even more.

Best regards,
Jing

[1] https://cwiki.apache.org/confluence/display/FLINK/FLIP-27%3A+Refactor+Source+Interface
[2] https://lists.apache.org/thread/7okp4y46n3o3rx5mn0t3qobrof8zxwqs

On Wed, Jun 8, 2022 at 2:21 AM Alexander Fedulov <alexan...@ververica.com> wrote:

Hey Austin,

Since we are getting deeper into the implementation details of the
DataGeneratorSource and it is not the main topic of this thread, I
propose to move our discussion to where it belongs: [DISCUSS] FLIP-238
[1]. Could you please briefly formulate your requirements to make it
easier for the others to follow? I am happy to continue this
conversation there.

[1] https://lists.apache.org/thread/7gjxto1rmkpff4kl54j8nlg5db2rqhkt

Best,
Alexander Fedulov

On Tue, Jun 7, 2022 at 6:14 PM Austin Cawley-Edwards <austin.caw...@gmail.com> wrote:

> @Austin, in the FLIP I mentioned above [1], the user is expected to
> pass a MapFunction<Long, OUT> to the generator. I wonder if you could
> have your external client and polling logic wrapped in a custom
> MapFunction implementation class? Would that answer your needs or do
> you have some more sophisticated scenario in mind?

At first glance, the FLIP looks good for this case with regard to the
map function, but it leaves out 1) the ability to control polling
intervals and 2) the ability to produce an unknown number of records,
in terms of both per-poll and overall boundedness. Do you think
something like this could be built from the same pieces?
I'm also wondering what handles threading: is that on the user, or is
that part of the DataGeneratorSource?

Best,
Austin

On Tue, Jun 7, 2022 at 9:34 AM Alexander Fedulov <alexan...@ververica.com> wrote:

Hi everyone,

Thanks for all the input and a lively discussion. It seems that there
is a consensus that due to the inherent complexity of FLIP-27 sources
we should provide more user-facing utilities to bridge the gap between
the existing SourceFunction-based functionality and the new APIs.

To start addressing this I picked the issue that David raised and many
upvoted. Here is a proposal for the new DataGeneratorSource: FLIP-238
[1]. Please take a look, I am going to open a separate discussion
thread on it shortly.

Jing also raised some great points regarding the interfaces and
subclasses. It seems to me that what might actually help is some sort
of a "soft deprecation" concept and annotation. It could be used in
places where we do not have an alternative implementation yet, but we
clearly want to indicate that continuing to build on top of these
interfaces is discouraged. The area of impact of deprecating all
SourceFunction subclasses is rather big, and we can expect it to take a
while. The hope would be that if in the meantime someone finds
themselves using one of such old APIs, the "soft deprecation"
annotation will be a clear indication and encouragement to work on
introducing an alternative FLIP-27-based implementation instead.
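
A minimal sketch of what such a "soft deprecation" annotation might
look like in plain Java. The names `SoftDeprecated`, `LegacySource`,
and `SoftDeprecationDemo` are hypothetical and are not part of Flink;
the sketch only illustrates marking an interface as discouraged while
recording whether a replacement exists yet.

```java
import java.lang.annotation.Documented;
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.util.function.Consumer;

/** Hypothetical marker: discouraged for new code, but not formally deprecated. */
@Documented
@Retention(RetentionPolicy.RUNTIME)
@Target({ElementType.TYPE, ElementType.METHOD})
@interface SoftDeprecated {
    /** Why building on the annotated element is discouraged. */
    String reason();

    /** Name of the replacement; empty while none exists yet. */
    String replacement() default "";
}

/** Stand-in for an old-style source interface (not Flink's SourceFunction). */
@SoftDeprecated(reason = "Superseded by the FLIP-27 Source API")
interface LegacySource<T> {
    void run(Consumer<T> out);
}

final class SoftDeprecationDemo {
    /** Builds a human-readable hint from the annotation, if present. */
    static String describe(Class<?> type) {
        SoftDeprecated marker = type.getAnnotation(SoftDeprecated.class);
        if (marker == null) {
            return type.getSimpleName() + ": no soft deprecation";
        }
        String hint = marker.replacement().isEmpty()
                ? "no replacement yet"
                : "use " + marker.replacement();
        return type.getSimpleName() + ": discouraged ("
                + marker.reason() + "; " + hint + ")";
    }

    public static void main(String[] args) {
        System.out.println(describe(LegacySource.class));
    }
}
```

Unlike `@Deprecated`, an annotation of this kind would not trigger
compiler warnings by itself; tooling or documentation would have to
pick it up.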

@Austin, in the FLIP I mentioned above [1], the user is expected to
pass a MapFunction<Long, OUT> to the generator. I wonder if you could
have your external client and polling logic wrapped in a custom
MapFunction implementation class? Would that answer your needs or do
you have some more sophisticated scenario in mind?
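
To make the idea concrete, here is a rough sketch in plain Java under
stated assumptions: `IndexMapper` is a local stand-in for a map
function from Long to OUT, and `PollingMapper`, `PollingMapperDemo`,
and the supplier-based client are hypothetical names for illustration.
None of this is the FLIP-238 API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

/** Local stand-in for a map function from a sequence index to a record. */
interface IndexMapper<OUT> {
    OUT map(Long index);
}

/** Hypothetical wrapper: ignores the index and polls a client once per call. */
final class PollingMapper<OUT> implements IndexMapper<OUT> {
    private final Supplier<OUT> client;     // hypothetical external client
    private final long pollIntervalMillis;  // crude pacing between polls

    PollingMapper(Supplier<OUT> client, long pollIntervalMillis) {
        this.client = client;
        this.pollIntervalMillis = pollIntervalMillis;
    }

    @Override
    public OUT map(Long index) {
        if (pollIntervalMillis > 0) {
            try {
                Thread.sleep(pollIntervalMillis);
            } catch (InterruptedException e) {
                // Restore the interrupt flag and stop polling.
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while polling", e);
            }
        }
        return client.get();
    }
}

final class PollingMapperDemo {
    /** Calls the mapper once per index, as a sequence-driven generator might. */
    static <OUT> List<OUT> generate(IndexMapper<OUT> mapper, long count) {
        List<OUT> records = new ArrayList<>();
        for (long i = 0; i < count; i++) {
            records.add(mapper.map(i));
        }
        return records;
    }

    public static void main(String[] args) {
        int[] calls = {0};
        PollingMapper<String> mapper =
                new PollingMapper<>(() -> "record-" + calls[0]++, 0L);
        System.out.println(generate(mapper, 3));
    }
}
```

Note that this shape yields exactly one record per index and leaves
pacing to the function itself.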

[1] https://cwiki.apache.org/confluence/x/9Av1D
Best,
Alexander Fedulov

On Mon, Jun 6, 2022 at 7:08 PM Austin Cawley-Edwards <austin.caw...@gmail.com> wrote:

Thanks for the nice discussion all.

I was recently trying to implement a very simple polling source and
would've loved a higher-level base to work from. I'm wondering if in
addition to the data generator use cases, it would be good to support a
simple non-parallel polling abstraction to make it easier to, for
instance, start prototyping with data in existing APIs without adding a
Kafka or such in the middle.

Best,
Austin

On Mon, Jun 6, 2022 at 10:02 AM tison <wander4...@gmail.com> wrote:

Well, it's a bit off-topic. As for deprecating SourceFunction as the
FLIP-27 series of work moves ahead, +1 from my side. It's a significant
step towards the unification of batch and streaming :)

Best,
tison.


On Mon, Jun 6, 2022 at 21:54, tison <wander4...@gmail.com> wrote:

The starting point of the version bump and removal question is that
downstream projects may have a tough time adapting to new interfaces
while Flink stays in the 1.x series, where users may expect upgrades to
be an easy task. From my experience, it's really challenging to
maintain compatibility between multiple versions of Flink when
significant changes are made within the same 1.x version series; users
may not be aware that it's almost a major version bump.

Best,
tison.


On Mon, Jun 6, 2022 at 21:51, tison <wander4...@gmail.com> wrote:

One question from my side:

As SourceFunction is a @Public interface, we cannot remove it before
doing a major version bump (Flink 2.0).

Of course it's not a blocker to make such a deprecation and let the new
interface step in. My question is whether we have a plan to finally
remove the deprecated interfaces, or whether we postpone that until
there is a clear plan for Flink 2.0?

Best,
tison.


On Mon, Jun 6, 2022 at 21:35, David Anderson <dander...@apache.org> wrote:


> David, can you elaborate why you need watermark generation in the
> source for your data generators?


The training exercises should strive to provide examples of best
practices. If the exercises and their solutions use

env.fromSource(source, WatermarkStrategy.noWatermarks(), "name-of-source")
   .map(...)
   .assignTimestampsAndWatermarks(...)

this will help establish this anti-pattern as the normal way of doing
things.

Most new Flink users are using a KafkaSource with a noWatermarks
strategy and a SimpleStringSchema, followed by a map that does the real
deserialization, followed by the real watermarking -- because they
aren't seeing examples that teach how these interfaces are meant to be
used.

When we redo the sources used in training exercises, I want to avoid
these pitfalls.

David

On Mon, Jun 6, 2022 at 9:12 AM Konstantin Knauf <kna...@apache.org> wrote:

Hi everyone,

very interesting thread. The proposal for deprecation seems to have
sparked a very important discussion. Do we know what users struggle
with specifically?

Speaking for myself, when I upgraded flink-faker to the new Source API,
an unbounded version of the NumberSequenceSource would have been all I
needed, but that's just the data generator use case. I think that one
could be solved quite easily. David, can you elaborate why you need
watermark generation in the source for your data generators?

Cheers,

Konstantin





On Sun, Jun 5, 2022 at 17:48, Piotr Nowojski <pnowoj...@apache.org> wrote:

Also +1 to what David has written. But it doesn't mean we should be
waiting indefinitely to deprecate SourceFunction.

Best,
Piotrek

On Sun, Jun 5, 2022 at 16:46, Jark Wu <imj...@gmail.com> wrote:

+1 to David's point.

Usually, when we deprecate some interfaces, we should point users to
use the recommended alternatives.
However, implementing the new Source interface for some simple
scenarios is too challenging and complex.
We also found it isn't easy to push the internal connector to upgrade
to the new Source because
"FLIP-27 are hard to understand, while SourceFunction is easy".

+1 to make implementing a simple Source easier before deprecating
SourceFunction.

Best,
Jark


On Sun, 5 Jun 2022 at 07:29, Jingsong Lee <lzljs3620...@apache.org> wrote:

+1 to David and Ingo.

Before deprecating and removing SourceFunction, we should have some
easier APIs to wrap the new Source; the cost of writing a new Source is
too high now.



On Sun, Jun 5, 2022 at 05:32, Ingo Bürk <airbla...@apache.org> wrote:

I +1 everything David said. The new Source API raised the complexity
significantly. It's great to have such a rich, powerful API that can do
everything, but in the process we lost the ability to onboard people to
the APIs.


Best
Ingo

On 04.06.22 21:21, David Anderson wrote:
I'm in favor of this, but I think we need to make it easier to
implement data generators and test sources. As things stand in 1.15,
unless you can be satisfied with using a NumberSequenceSource followed
by a map, things get quite complicated. I looked into reworking the
data generators used in the training exercises, and got discouraged by
the amount of work involved. (The sources used in the training want to
be unbounded, and need watermarking in the sources, which means that
using NumberSequenceSource isn't an option.)

I think the proposed deprecation will be better received if it can be
accompanied by something that makes implementing a simple Source easier
than it is now. People are continuing to implement new SourceFunctions
because the interfaces defined by FLIP-27 are hard to understand, while
SourceFunction is easy. Alex, I believe you were looking into
implementing an easier-to-use building block that could be used in
situations like this. Can we get something like that in place first?

David

On Fri, Jun 3, 2022 at 4:52 PM Jing Ge <j...@ververica.com> wrote:

Hi,

Thanks Alex for driving this!

+1. To give Flink developers, especially connector developers, a clear
signal that the new Source API is recommended according to FLIP-27, we
should mark them as deprecated.

There are some open questions to discuss:

1. Do we need to mark all subinterfaces/subclasses as deprecated, e.g.
FromElementsFunction? There are many. What are the replacements?
2. Do we need to mark all subclasses that have a replacement as
deprecated? E.g. ExternallyInducedSource, whose replacement class, if I
am not mistaken, is ExternallyInducedSourceReader, which is
@Experimental.
3. Do we need to mark all related test utility classes as deprecated?

I think it might make sense to create an umbrella ticket to cover all
of these with the following process:

1. Mark SourceFunction as deprecated asap.
2. Mark subinterfaces and subclasses as deprecated if there are
graduated replacements. A good example is that KafkaSource replaced
KafkaConsumer, which has been marked as deprecated.
3. Do not mark subinterfaces and subclasses as deprecated if the
replacement classes are still experimental; check if it is time to
graduate them. After graduation, go to step 2. It might take a while
for graduation.
4. Do not mark subinterfaces and subclasses as deprecated if the
replacement classes are experimental and are too young to graduate. We
have to wait. But in this case we could create new tickets under the
umbrella ticket.
5. Do not mark subinterfaces and subclasses as deprecated if there is
no replacement at all. We have to create new tickets and wait until the
new implementation has been done and graduated. It will take a longer
time, roughly 1.5 years.
6. For test classes, we could follow the same rule. But I think for
some cases, we could consider doing the replacement directly without
going through the deprecation phase.
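
Steps 2 to 5 above amount to a decision based on the status of the
replacement class. A small sketch of that triage in Java, assuming
hypothetical names (`ReplacementStatus`, `TriageAction`,
`DeprecationTriage`) that are not part of Flink:

```java
/** Status of the FLIP-27-based replacement for a given subclass. */
enum ReplacementStatus {
    GRADUATED,               // replacement exists and is no longer experimental
    EXPERIMENTAL_READY,      // experimental, but mature enough to graduate
    EXPERIMENTAL_TOO_YOUNG,  // experimental and not ready to graduate
    NONE                     // no replacement at all
}

/** What to do with the old subclass or subinterface. */
enum TriageAction {
    DEPRECATE_NOW,            // step 2
    GRADUATE_THEN_DEPRECATE,  // step 3: graduate, then go back to step 2
    WAIT_AND_TRACK,           // step 4: wait, track under the umbrella ticket
    CREATE_TICKET_AND_WAIT    // step 5: implement a replacement first
}

final class DeprecationTriage {
    /** Maps the replacement status to the action described in steps 2 to 5. */
    static TriageAction decide(ReplacementStatus status) {
        switch (status) {
            case GRADUATED:
                return TriageAction.DEPRECATE_NOW;
            case EXPERIMENTAL_READY:
                return TriageAction.GRADUATE_THEN_DEPRECATE;
            case EXPERIMENTAL_TOO_YOUNG:
                return TriageAction.WAIT_AND_TRACK;
            default:
                return TriageAction.CREATE_TICKET_AND_WAIT;
        }
    }

    public static void main(String[] args) {
        for (ReplacementStatus status : ReplacementStatus.values()) {
            System.out.println(status + " -> " + decide(status));
        }
    }
}
```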

When we look back on all of these, we can see that it is a big epic
(even bigger than an epic). It needs someone to drive it and keep focus
on it continuously, with support from the community, and push the
development towards the new Source API of FLIP-27.

If we could have consensus for this, Alex and I could create the
umbrella ticket to kick it off.

Best regards,
Jing


On Fri, Jun 3, 2022 at 3:54 PM Alexander Fedulov <alexan...@ververica.com> wrote:

Hi everyone,

I would like to start the discussion about marking SourceFunction-based
interfaces as deprecated. With the FLIP-27 APIs becoming the new
standard, the old ones have to be eventually phased out. Although this
state is well known within the community and no new connectors based on
the old interfaces can be accepted into the project, the footprint of
SourceFunction in the user code still keeps growing (primarily for data
generators and test utilities). I believe it is best to mark
SourceFunction as deprecated as soon as possible. What do you think?

Best,
Alexander Fedulov









--
https://twitter.com/snntrable
https://github.com/knaufk












