I'm quite worried that we may end up repeating history.

There were already 2 attempts at contributing a pulsar connector, both of which failed because no committer was getting involved, despite the contributor opening a dedicated discussion thread about the contribution beforehand and getting several +1's from committers.

We should really make sure that if we welcome/approve such a contribution it will actually get the attention it deserves.

As such, I'm inclined to recommend maintaining the connector outside of Flink. We could link to it from the documentation to give it more exposure. With the upcoming page for sharing artifacts among the community (what's the state of that anyway?), this may be a better option.

On 04/09/2019 10:16, Till Rohrmann wrote:
Hi everyone,

thanks a lot for starting this discussion Yijie. I think the Pulsar
connector would be a very valuable addition since Pulsar becomes more and
more popular and it would further expand Flink's interoperability. Also
from a project perspective it makes sense for me to place the connector in
the downstream project.

My main concern/question is how can the Flink community maintain the
connector? We have seen in the past that connectors are some of the most
actively developed components because they need to be kept in sync with the
external system and with Flink. Given that the Pulsar community is willing
to help with maintaining, improving and evolving the connector, I'm
optimistic that we can achieve this. Hence, +1 for contributing it back to
Flink.

Cheers,
Till



On Wed, Sep 4, 2019 at 2:03 AM Sijie Guo <guosi...@gmail.com> wrote:

Hi Yun,

Since I was the main driver behind FLINK-9641 and FLINK-9168, let me try to
add more context on this.

FLINK-9641 and FLINK-9168 was created for bringing Pulsar as source and
sink for Flink. The integration was done with Flink 1.6.0. We sent out pull
requests about a year ago and we ended up maintaining those connectors in
Pulsar for Pulsar users to use Flink to process event streams in Pulsar.
(See https://github.com/apache/pulsar/tree/master/pulsar-flink). The Flink
1.6 integration is pretty simple and there is no schema considerations.

In the past year, we have made a lot of changes in Pulsar and brought
Pulsar schema as the first-class citizen in Pulsar. We also integrated with
other computing engines for processing Pulsar event streams with Pulsar
schema.

It led us to rethink how to integrate with Flink in the best way. Then we
reimplement the pulsar-flink connectors from the ground up with schema and
bring table API and catalog API as the first-class citizen in the
integration. With that being said, in the new pulsar-flink implementation,
you can register pulsar as a flink catalog and query / process the event
streams using Flink SQL.

This is an example about how to use Pulsar as a Flink catalog:

https://github.com/streamnative/pulsar-flink/blob/3eeddec5625fc7dddc3f8a3ec69f72e1614ca9c9/README.md#use-pulsar-catalog

Yijie has also written a blog post explaining why we re-implement the flink
connector with Flink 1.9 and what are the changes we made in the new
connector:

https://medium.com/streamnative/use-apache-pulsar-as-streaming-table-with-8-lines-of-code-39033a93947f

We believe Pulsar is not just a simple data sink or source for Flink. It
actually can be a fully integrated streaming data storage for Flink in many
areas (sink, source, schema/catalog and state). The combination of Flink
and Pulsar can create a great streaming warehouse architecture for
streaming-first, unified data processing. Since we are talking to
contribute Pulsar integration to Flink here, we are also dedicated to
maintain, improve and evolve the integration with Flink to help the users
who use both Flink and Pulsar.

Hope this give you a bit more background about the pulsar flink
integration. Let me know what are your thoughts.

Thanks,
Sijie


On Tue, Sep 3, 2019 at 11:54 AM Yun Tang <myas...@live.com> wrote:

Hi Yijie

I can see that Pulsar becomes more and more popular recently and very
glad
to see more people willing to contribute to Flink ecosystem.

Before any further discussion, would you please give some explanation of
the relationship between this thread to current existing JIRAs of pulsar
source [1] and sink [2] connector? Will the contribution contains part of
those PRs or totally different implementation?

[1] https://issues.apache.org/jira/browse/FLINK-9641
[2] https://issues.apache.org/jira/browse/FLINK-9168

Best
Yun Tang
________________________________
From: Yijie Shen <henry.yijies...@gmail.com>
Sent: Tuesday, September 3, 2019 13:57
To: dev@flink.apache.org <dev@flink.apache.org>
Subject: [DISCUSS] Contribute Pulsar Flink connector back to Flink

Dear Flink Community!

I would like to open the discussion of contributing Pulsar Flink
connector [0] back to Flink.

## A brief introduction to Apache Pulsar

Apache Pulsar[1] is a multi-tenant, high-performance distributed
pub-sub messaging system. Pulsar includes multiple features such as
native support for multiple clusters in a Pulsar instance, with
seamless geo-replication of messages across clusters, very low publish
and end-to-end latency, seamless scalability to over a million topics,
and guaranteed message delivery with persistent message storage
provided by Apache BookKeeper. Nowadays, Pulsar has been adopted by
more and more companies[2].

## The status of Pulsar Flink connector

The Pulsar Flink connector we are planning to contribute is built upon
Flink 1.9.0 and Pulsar 2.4.0. The main features are:
- Pulsar as a streaming source with exactly-once guarantee.
- Sink streaming results to Pulsar with at-least-once semantics. (We
would update this to exactly-once as well when Pulsar gets all
transaction features ready in its 2.5.0 version)
- Build upon Flink new Table API Type system (FLIP-37[3]), and can
automatically (de)serialize messages with the help of Pulsar schema.
- Integrate with Flink new Catalog API (FLIP-30[4]), which enables the
use of Pulsar topics as tables in Table API as well as SQL client.

## Reference
[0] https://github.com/streamnative/pulsar-flink
[1] https://pulsar.apache.org/
[2] https://pulsar.apache.org/en/powered-by/
[3]

https://cwiki.apache.org/confluence/display/FLINK/FLIP-37%3A+Rework+of+the+Table+API+Type+System
[4]

https://cwiki.apache.org/confluence/display/FLINK/FLIP-30%3A+Unified+Catalog+APIs

Best,
Yijie Shen


Reply via email to