Hi Yijie,

Thanks for the interest in contributing the Pulsar connector.

In general, I think having a well-supported Pulsar connector is a
valuable addition to Flink, so I am happy to shepherd this effort.
Meanwhile, I would also like to provide some context on recent efforts
around the Flink connector ecosystem.

The current way Flink maintains its connectors has hit a scalability limit.
With more and more connectors coming into the Flink repo, we are facing a
few problems such as long build and testing times. To address this, we
have attempted the following:
1. Split out the connectors into a separate repository. This is temporarily
on hold because of a potential solution to shorten the build time.
2. Encourage connectors to stay as ecosystem projects while Flink tries
to provide good support for functionality and compatibility tests. Robert
has driven the creation of a Flink Ecosystem project website, and it is
going through a final approval process.

Given the above efforts, it would be great to first see if we can have the
Pulsar connector as an ecosystem project with great support. It would be
good to hear how the Flink Pulsar connector is currently tested, to see if
we can learn something about maintaining it as an ecosystem project with
good quality and test coverage. If the quality as an ecosystem project is
hard to guarantee, we may as well adopt it into the main repo.

BTW, another ongoing effort is FLIP-27, where we are making changes to the
Flink source connector architecture and interface. This change will likely
land in 1.10. Therefore, timing-wise, if we are going to have the Pulsar
connector in the main repo, I am wondering if we should hold off a little
and let the Pulsar connector adapt to the new interface, to avoid work that
would shortly be deprecated?

Thanks,

Jiangjie (Becket) Qin

On Wed, Sep 4, 2019 at 4:32 PM Chesnay Schepler <ches...@apache.org> wrote:

> I'm quite worried that we may end up repeating history.
>
> There were already 2 attempts at contributing a pulsar connector, both
> of which failed because no committer was getting involved, despite the
> contributor opening a dedicated discussion thread about the contribution
> beforehand and getting several +1's from committers.
>
> We should really make sure that if we welcome/approve such a
> contribution it will actually get the attention it deserves.
>
> As such, I'm inclined to recommend maintaining the connector outside of
> Flink. We could link to it from the documentation to give it more exposure.
> With the upcoming page for sharing artifacts among the community (what's
> the state of that anyway?), this may be a better option.
>
> On 04/09/2019 10:16, Till Rohrmann wrote:
> > Hi everyone,
> >
> > thanks a lot for starting this discussion Yijie. I think the Pulsar
> > connector would be a very valuable addition since Pulsar becomes more and
> > more popular and it would further expand Flink's interoperability. Also
> > from a project perspective it makes sense for me to place the connector in
> > the downstream project.
> >
> > My main concern/question is how can the Flink community maintain the
> > connector? We have seen in the past that connectors are some of the most
> > actively developed components because they need to be kept in sync with the
> > external system and with Flink. Given that the Pulsar community is willing
> > to help with maintaining, improving and evolving the connector, I'm
> > optimistic that we can achieve this. Hence, +1 for contributing it back to
> > Flink.
> >
> > Cheers,
> > Till
> >
> >
> >
> > On Wed, Sep 4, 2019 at 2:03 AM Sijie Guo <guosi...@gmail.com> wrote:
> >
> >> Hi Yun,
> >>
> >> Since I was the main driver behind FLINK-9641 and FLINK-9168, let me try to
> >> add more context on this.
> >>
> >> FLINK-9641 and FLINK-9168 were created to bring Pulsar in as a source and
> >> sink for Flink. The integration was done with Flink 1.6.0. We sent out pull
> >> requests about a year ago and we ended up maintaining those connectors in
> >> Pulsar for Pulsar users to use Flink to process event streams in Pulsar.
> >> (See https://github.com/apache/pulsar/tree/master/pulsar-flink). The Flink
> >> 1.6 integration is pretty simple and there are no schema considerations.
> >>
> >> In the past year, we have made a lot of changes in Pulsar and made
> >> Pulsar schema a first-class citizen in Pulsar. We also integrated with
> >> other computing engines for processing Pulsar event streams with Pulsar
> >> schema.
> >>
> >> It led us to rethink how best to integrate with Flink. We then
> >> reimplemented the pulsar-flink connectors from the ground up with schema
> >> support, and made the Table API and Catalog API first-class citizens in
> >> the integration. As a result, in the new pulsar-flink implementation,
> >> you can register Pulsar as a Flink catalog and query / process the event
> >> streams using Flink SQL.
> >>
> >> This is an example of how to use Pulsar as a Flink catalog:
> >>
> >> https://github.com/streamnative/pulsar-flink/blob/3eeddec5625fc7dddc3f8a3ec69f72e1614ca9c9/README.md#use-pulsar-catalog
> >>
> >> Yijie has also written a blog post explaining why we re-implemented the
> >> Flink connector with Flink 1.9 and what changes we made in the new
> >> connector:
> >>
> >> https://medium.com/streamnative/use-apache-pulsar-as-streaming-table-with-8-lines-of-code-39033a93947f
> >>
> >> We believe Pulsar is not just a simple data sink or source for Flink. It
> >> can actually be a fully integrated streaming data storage for Flink in many
> >> areas (sink, source, schema/catalog and state). The combination of Flink
> >> and Pulsar can create a great streaming warehouse architecture for
> >> streaming-first, unified data processing. Since we are talking about
> >> contributing the Pulsar integration to Flink here, we are also dedicated to
> >> maintaining, improving and evolving the integration with Flink to help the
> >> users who use both Flink and Pulsar.
> >>
> >> Hope this gives you a bit more background about the pulsar flink
> >> integration. Let me know your thoughts.
> >>
> >> Thanks,
> >> Sijie
> >>
> >>
> >> On Tue, Sep 3, 2019 at 11:54 AM Yun Tang <myas...@live.com> wrote:
> >>
> >>> Hi Yijie
> >>>
> >>> I can see that Pulsar has become more and more popular recently, and I am
> >>> very glad to see more people willing to contribute to the Flink ecosystem.
> >>>
> >>> Before any further discussion, would you please explain the relationship
> >>> between this thread and the existing JIRAs for the Pulsar source [1] and
> >>> sink [2] connectors? Will the contribution contain part of those PRs, or
> >>> is it a totally different implementation?
> >>>
> >>> [1] https://issues.apache.org/jira/browse/FLINK-9641
> >>> [2] https://issues.apache.org/jira/browse/FLINK-9168
> >>>
> >>> Best
> >>> Yun Tang
> >>> ________________________________
> >>> From: Yijie Shen <henry.yijies...@gmail.com>
> >>> Sent: Tuesday, September 3, 2019 13:57
> >>> To: dev@flink.apache.org <dev@flink.apache.org>
> >>> Subject: [DISCUSS] Contribute Pulsar Flink connector back to Flink
> >>>
> >>> Dear Flink Community!
> >>>
> >>> I would like to open the discussion of contributing Pulsar Flink
> >>> connector [0] back to Flink.
> >>>
> >>> ## A brief introduction to Apache Pulsar
> >>>
> >>> Apache Pulsar[1] is a multi-tenant, high-performance distributed
> >>> pub-sub messaging system. Pulsar includes multiple features such as
> >>> native support for multiple clusters in a Pulsar instance, with
> >>> seamless geo-replication of messages across clusters, very low publish
> >>> and end-to-end latency, seamless scalability to over a million topics,
> >>> and guaranteed message delivery with persistent message storage
> >>> provided by Apache BookKeeper. Nowadays, Pulsar has been adopted by
> >>> more and more companies[2].
> >>>
> >>> ## The status of Pulsar Flink connector
> >>>
> >>> The Pulsar Flink connector we are planning to contribute is built upon
> >>> Flink 1.9.0 and Pulsar 2.4.0. The main features are:
> >>> - Pulsar as a streaming source with exactly-once guarantee.
> >>> - Sink streaming results to Pulsar with at-least-once semantics. (We
> >>> would update this to exactly-once as well when Pulsar gets all
> >>> transaction features ready in its 2.5.0 version)
> >>> - Built upon Flink's new Table API type system (FLIP-37[3]), and can
> >>> automatically (de)serialize messages with the help of Pulsar schema.
> >>> - Integrates with Flink's new Catalog API (FLIP-30[4]), which enables the
> >>> use of Pulsar topics as tables in Table API as well as SQL client.
> >>>
> >>> ## Reference
> >>> [0] https://github.com/streamnative/pulsar-flink
> >>> [1] https://pulsar.apache.org/
> >>> [2] https://pulsar.apache.org/en/powered-by/
> >>> [3] https://cwiki.apache.org/confluence/display/FLINK/FLIP-37%3A+Rework+of+the+Table+API+Type+System
> >>> [4] https://cwiki.apache.org/confluence/display/FLINK/FLIP-30%3A+Unified+Catalog+APIs
> >>>
> >>> Best,
> >>> Yijie Shen
> >>>
>
>
