Hi everyone,

thanks a lot for starting this discussion Yijie. I think the Pulsar
connector would be a very valuable addition since Pulsar becomes more and
more popular and it would further expand Flink's interoperability. Also
from a project perspective it makes sense for me to place the connector in
the downstream project.

My main concern/question is how can the Flink community maintain the
connector? We have seen in the past that connectors are some of the most
actively developed components because they need to be kept in sync with the
external system and with Flink. Given that the Pulsar community is willing
to help with maintaining, improving and evolving the connector, I'm
optimistic that we can achieve this. Hence, +1 for contributing it back to
Flink.

Cheers,
Till



On Wed, Sep 4, 2019 at 2:03 AM Sijie Guo <guosi...@gmail.com> wrote:

> Hi Yun,
>
> Since I was the main driver behind FLINK-9641 and FLINK-9168, let me try to
> add more context on this.
>
> FLINK-9641 and FLINK-9168 was created for bringing Pulsar as source and
> sink for Flink. The integration was done with Flink 1.6.0. We sent out pull
> requests about a year ago and we ended up maintaining those connectors in
> Pulsar for Pulsar users to use Flink to process event streams in Pulsar.
> (See https://github.com/apache/pulsar/tree/master/pulsar-flink). The Flink
> 1.6 integration is pretty simple and there is no schema considerations.
>
> In the past year, we have made a lot of changes in Pulsar and brought
> Pulsar schema as the first-class citizen in Pulsar. We also integrated with
> other computing engines for processing Pulsar event streams with Pulsar
> schema.
>
> It led us to rethink how to integrate with Flink in the best way. Then we
> reimplement the pulsar-flink connectors from the ground up with schema and
> bring table API and catalog API as the first-class citizen in the
> integration. With that being said, in the new pulsar-flink implementation,
> you can register pulsar as a flink catalog and query / process the event
> streams using Flink SQL.
>
> This is an example about how to use Pulsar as a Flink catalog:
>
> https://github.com/streamnative/pulsar-flink/blob/3eeddec5625fc7dddc3f8a3ec69f72e1614ca9c9/README.md#use-pulsar-catalog
>
> Yijie has also written a blog post explaining why we re-implement the flink
> connector with Flink 1.9 and what are the changes we made in the new
> connector:
>
> https://medium.com/streamnative/use-apache-pulsar-as-streaming-table-with-8-lines-of-code-39033a93947f
>
> We believe Pulsar is not just a simple data sink or source for Flink. It
> actually can be a fully integrated streaming data storage for Flink in many
> areas (sink, source, schema/catalog and state). The combination of Flink
> and Pulsar can create a great streaming warehouse architecture for
> streaming-first, unified data processing. Since we are talking to
> contribute Pulsar integration to Flink here, we are also dedicated to
> maintain, improve and evolve the integration with Flink to help the users
> who use both Flink and Pulsar.
>
> Hope this give you a bit more background about the pulsar flink
> integration. Let me know what are your thoughts.
>
> Thanks,
> Sijie
>
>
> On Tue, Sep 3, 2019 at 11:54 AM Yun Tang <myas...@live.com> wrote:
>
> > Hi Yijie
> >
> > I can see that Pulsar becomes more and more popular recently and very
> glad
> > to see more people willing to contribute to Flink ecosystem.
> >
> > Before any further discussion, would you please give some explanation of
> > the relationship between this thread to current existing JIRAs of pulsar
> > source [1] and sink [2] connector? Will the contribution contains part of
> > those PRs or totally different implementation?
> >
> > [1] https://issues.apache.org/jira/browse/FLINK-9641
> > [2] https://issues.apache.org/jira/browse/FLINK-9168
> >
> > Best
> > Yun Tang
> > ________________________________
> > From: Yijie Shen <henry.yijies...@gmail.com>
> > Sent: Tuesday, September 3, 2019 13:57
> > To: dev@flink.apache.org <dev@flink.apache.org>
> > Subject: [DISCUSS] Contribute Pulsar Flink connector back to Flink
> >
> > Dear Flink Community!
> >
> > I would like to open the discussion of contributing Pulsar Flink
> > connector [0] back to Flink.
> >
> > ## A brief introduction to Apache Pulsar
> >
> > Apache Pulsar[1] is a multi-tenant, high-performance distributed
> > pub-sub messaging system. Pulsar includes multiple features such as
> > native support for multiple clusters in a Pulsar instance, with
> > seamless geo-replication of messages across clusters, very low publish
> > and end-to-end latency, seamless scalability to over a million topics,
> > and guaranteed message delivery with persistent message storage
> > provided by Apache BookKeeper. Nowadays, Pulsar has been adopted by
> > more and more companies[2].
> >
> > ## The status of Pulsar Flink connector
> >
> > The Pulsar Flink connector we are planning to contribute is built upon
> > Flink 1.9.0 and Pulsar 2.4.0. The main features are:
> > - Pulsar as a streaming source with exactly-once guarantee.
> > - Sink streaming results to Pulsar with at-least-once semantics. (We
> > would update this to exactly-once as well when Pulsar gets all
> > transaction features ready in its 2.5.0 version)
> > - Build upon Flink new Table API Type system (FLIP-37[3]), and can
> > automatically (de)serialize messages with the help of Pulsar schema.
> > - Integrate with Flink new Catalog API (FLIP-30[4]), which enables the
> > use of Pulsar topics as tables in Table API as well as SQL client.
> >
> > ## Reference
> > [0] https://github.com/streamnative/pulsar-flink
> > [1] https://pulsar.apache.org/
> > [2] https://pulsar.apache.org/en/powered-by/
> > [3]
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-37%3A+Rework+of+the+Table+API+Type+System
> > [4]
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-30%3A+Unified+Catalog+APIs
> >
> >
> > Best,
> > Yijie Shen
> >
>

Reply via email to