Hi, I think having a Pulsar connector in Flink can be a good mutual benefit to both communities.
Another perspective is that Pulsar connector is the 1st streaming connector that integrates with Flink's metadata management system and Catalog APIs. It'll be cool to see how the integration turns out and whether we need to improve Flink Catalog stack, which are currently in Beta, to cater to streaming source/sink. Thus I'm in favor of merging Pulsar connector into Flink 1.10. I'd suggest to submit smaller sized PRs, e.g. maybe one for basic source/sink functionalities and another for schema and catalog integration, just to make them easier to review. It doesn't seem to hurt to wait for FLIP-27. But I don't think FLIP-27 should be a blocker in cases where it cannot make its way into 1.10 or doesn't leave reasonable amount of time for committers to review or for Pulsar connector to fully adapt to new interfaces. Bowen On Thu, Sep 5, 2019 at 3:21 AM Becket Qin <becket....@gmail.com> wrote: > Hi Till, > > You are right. It all depends on when the new source interface is going to > be ready. Personally I think it would be there in about a month or so. But > I could be too optimistic. It would also be good to hear what do Aljoscha > and Stephan think as they are also involved in FLIP-27. > > In general I think we should have Pulsar connector in Flink 1.10, > preferably with the new source interface. We can also check it in right now > with old source interface, but I suspect few users will use it before the > next official release. Therefore, it seems reasonable to wait a little bit > to see whether we can jump to the new source interface. As long as we make > sure Flink 1.10 has it, waiting a little bit doesn't seem to hurt much. > > Thanks, > > Jiangjie (Becket) Qin > > On Thu, Sep 5, 2019 at 3:59 PM Till Rohrmann <trohrm...@apache.org> wrote: > > > Hi everyone, > > > > I'm wondering what the problem would be if we committed the Pulsar > > connector before the new source interface is ready. If I understood it > > correctly, then we need to support the old source interface anyway for > the > > existing connectors. By checking it in early I could see the benefit that > > our users could start using the connector earlier. Moreover, it would > > prevent that the Pulsar integration is being delayed in case that the > > source interface should be delayed. The only downside I see is the extra > > review effort and potential fixes which might be irrelevant for the new > > source interface implementation. I guess it mainly depends on how certain > > we are when the new source interface will be ready. > > > > Cheers, > > Till > > > > On Thu, Sep 5, 2019 at 8:56 AM Becket Qin <becket....@gmail.com> wrote: > > > > > Hi Sijie and Yijie, > > > > > > Thanks for sharing your thoughts. > > > > > > Just want to have some update on FLIP-27. Although the FLIP wiki and > > > discussion thread has been quiet for some time, a few committer / > > > contributors in Flink community were actually prototyping the entire > > thing. > > > We have made some good progress there but want to update the FLIP wiki > > > after the entire thing is verified to work in case there are some last > > > minute surprise in the implementation. I don't have an exact ETA yet, > > but I > > > guess it is going to be within a month or so. > > > > > > I am happy to review the current Flink Pulsar connector and see if it > > would > > > fit in FLIP-27. It would be good to avoid the case that we checked in > the > > > Pulsar connector with some review efforts and shortly after that the > new > > > Source interface is ready. > > > > > > Thanks, > > > > > > Jiangjie (Becket) Qin > > > > > > On Thu, Sep 5, 2019 at 8:39 AM Yijie Shen <henry.yijies...@gmail.com> > > > wrote: > > > > > > > Thanks for all the feedback and suggestions! > > > > > > > > As Sijie said, the goal of the connector has always been to provide > > > > users with the latest features of both systems as soon as possible. > We > > > > propose to contribute the connector to Flink and hope to get more > > > > suggestions and feedback from Flink experts to ensure the high > quality > > > > of the connector. > > > > > > > > For FLIP-27, we noticed its existence at the beginning of reworking > > > > the connector implementation based on Flink 1.9; we also wanted to > > > > build a connector that supports both batch and stream computing based > > > > on it. > > > > However, it has been inactive for some time, so we decided to provide > > > > a connector with most of the new features, such as the new type > system > > > > and the new catalog API first. We will pay attention to the progress > > > > of FLIP-27 continually and incorporate it with the connector as soon > > > > as possible. > > > > > > > > Regarding the test status of the connector, we are following the > other > > > > connectors' test in Flink repository and aimed to provide throughout > > > > tests as we could. We are also happy to hear suggestions and > > > > supervision from the Flink community to improve the stability and > > > > performance of the connector continuously. > > > > > > > > Best, > > > > Yijie > > > > > > > > On Thu, Sep 5, 2019 at 5:59 AM Sijie Guo <guosi...@gmail.com> wrote: > > > > > > > > > > Thanks everyone for the comments and feedback. > > > > > > > > > > It seems to me that the main question here is about - "how can the > > > Flink > > > > > community maintain the connector?". > > > > > > > > > > Here are two thoughts from myself. > > > > > > > > > > 1) I think how and where to host this integration is kind of less > > > > important > > > > > here. I believe there can be many ways to achieve it. > > > > > As part of the contribution, what we are looking for here is how > > these > > > > two > > > > > communities can build the collaboration relationship on developing > > > > > the integration between Pulsar and Flink. Even we can try our best > to > > > > catch > > > > > up all the updates in Flink community. We are still > > > > > facing the fact that we have less experiences in Flink than folks > in > > > > Flink > > > > > community. In order to make sure we maintain and deliver > > > > > a high-quality pulsar-flink integration to the users who use both > > > > > technologies, we need some help from the experts from Flink > > community. > > > > > > > > > > 2) We have been following FLIP-27 for a while. Originally we were > > > > thinking > > > > > of contributing the connectors back after integrating with the > > > > > new API introduced in FLIP-27. But we decided to initiate the > > > > conversation > > > > > as early as possible. Because we believe there are more benefits > > doing > > > > > it now rather than later. As part of contribution, it can help > Flink > > > > > community understand more about Pulsar and the potential > integration > > > > points. > > > > > Also we can also help Flink community verify the new connector API > as > > > > well > > > > > as other new API (e.g. catalog API). > > > > > > > > > > Thanks, > > > > > Sijie > > > > > > > > > > On Wed, Sep 4, 2019 at 5:24 AM Becket Qin <becket....@gmail.com> > > > wrote: > > > > > > > > > > > Hi Yijie, > > > > > > > > > > > > Thanks for the interest in contributing the Pulsar connector. > > > > > > > > > > > > In general, I think having Pulsar connector with strong support > is > > a > > > > > > valuable addition to Flink. So I am happy the shepherd this > effort. > > > > > > Meanwhile, I would also like to provide some context and recent > > > > efforts on > > > > > > the Flink connectors ecosystem. > > > > > > > > > > > > The current way Flink maintains its connector has hit the > > scalability > > > > bar. > > > > > > With more and more connectors coming into Flink repo, we are > > facing a > > > > few > > > > > > problems such as long build and testing time. To address this > > > problem, > > > > we > > > > > > have attempted to do the following: > > > > > > 1. Split out the connectors into a separate repository. This is > > > > temporarily > > > > > > on hold due to potential solution to shorten the build time. > > > > > > 2. Encourage the connectors to stay as ecosystem project while > > Flink > > > > tries > > > > > > to provide good support for functionality and compatibility > tests. > > > > Robert > > > > > > has driven to create a Flink Ecosystem project website and it is > > > going > > > > > > through some final approval process. > > > > > > > > > > > > Given the above efforts, it would be great to first see if we can > > > have > > > > > > Pulsar connector as an ecosystem project with great support. It > > would > > > > be > > > > > > good to hear how the Flink Pulsar connector is tested currently > to > > > see > > > > if > > > > > > we can learn something to maintain it as an ecosystem project > with > > > good > > > > > > quality and test coverage. If the quality as an ecosystem project > > is > > > > hard > > > > > > to guarantee, we may as well adopt it into the main repo. > > > > > > > > > > > > BTW, another ongoing effort is FLIP-27 where we are making > changes > > to > > > > the > > > > > > Flink source connector architecture and interface. This change > will > > > > likely > > > > > > land in 1.10. Therefore timing wise, if we are going to have the > > > Pulsar > > > > > > connector in main repo, I am wondering if we should hold a little > > bit > > > > and > > > > > > let the Pulsar connector adapt to the new interface to avoid > > shortly > > > > > > deprecated work? > > > > > > > > > > > > Thanks, > > > > > > > > > > > > Jiangjie (Becket) Qin > > > > > > > > > > > > On Wed, Sep 4, 2019 at 4:32 PM Chesnay Schepler < > > ches...@apache.org> > > > > > > wrote: > > > > > > > > > > > > > I'm quite worried that we may end up repeating history. > > > > > > > > > > > > > > There were already 2 attempts at contributing a pulsar > connector, > > > > both > > > > > > > of which failed because no committer was getting involved, > > despite > > > > the > > > > > > > contributor opening a dedicated discussion thread about the > > > > contribution > > > > > > > beforehand and getting several +1's from committers. > > > > > > > > > > > > > > We should really make sure that if we welcome/approve such a > > > > > > > contribution it will actually get the attention it deserves. > > > > > > > > > > > > > > As such, I'm inclined to recommend maintaining the connector > > > outside > > > > of > > > > > > > Flink. We could link to it from the documentation to give it > more > > > > > > exposure. > > > > > > > With the upcoming page for sharing artifacts among the > community > > > > (what's > > > > > > > the state of that anyway?), this may be a better option. > > > > > > > > > > > > > > On 04/09/2019 10:16, Till Rohrmann wrote: > > > > > > > > Hi everyone, > > > > > > > > > > > > > > > > thanks a lot for starting this discussion Yijie. I think the > > > Pulsar > > > > > > > > connector would be a very valuable addition since Pulsar > > becomes > > > > more > > > > > > and > > > > > > > > more popular and it would further expand Flink's > > > interoperability. > > > > Also > > > > > > > > from a project perspective it makes sense for me to place the > > > > connector > > > > > > > in > > > > > > > > the downstream project. > > > > > > > > > > > > > > > > My main concern/question is how can the Flink community > > maintain > > > > the > > > > > > > > connector? We have seen in the past that connectors are some > of > > > the > > > > > > most > > > > > > > > actively developed components because they need to be kept in > > > sync > > > > with > > > > > > > the > > > > > > > > external system and with Flink. Given that the Pulsar > community > > > is > > > > > > > willing > > > > > > > > to help with maintaining, improving and evolving the > connector, > > > I'm > > > > > > > > optimistic that we can achieve this. Hence, +1 for > contributing > > > it > > > > back > > > > > > > to > > > > > > > > Flink. > > > > > > > > > > > > > > > > Cheers, > > > > > > > > Till > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > On Wed, Sep 4, 2019 at 2:03 AM Sijie Guo <guosi...@gmail.com > > > > > > wrote: > > > > > > > > > > > > > > > >> Hi Yun, > > > > > > > >> > > > > > > > >> Since I was the main driver behind FLINK-9641 and > FLINK-9168, > > > let > > > > me > > > > > > > try to > > > > > > > >> add more context on this. > > > > > > > >> > > > > > > > >> FLINK-9641 and FLINK-9168 was created for bringing Pulsar as > > > > source > > > > > > and > > > > > > > >> sink for Flink. The integration was done with Flink 1.6.0. > We > > > > sent out > > > > > > > pull > > > > > > > >> requests about a year ago and we ended up maintaining those > > > > connectors > > > > > > > in > > > > > > > >> Pulsar for Pulsar users to use Flink to process event > streams > > in > > > > > > Pulsar. > > > > > > > >> (See > > https://github.com/apache/pulsar/tree/master/pulsar-flink > > > ). > > > > The > > > > > > > Flink > > > > > > > >> 1.6 integration is pretty simple and there is no schema > > > > > > considerations. > > > > > > > >> > > > > > > > >> In the past year, we have made a lot of changes in Pulsar > and > > > > brought > > > > > > > >> Pulsar schema as the first-class citizen in Pulsar. We also > > > > integrated > > > > > > > with > > > > > > > >> other computing engines for processing Pulsar event streams > > with > > > > > > Pulsar > > > > > > > >> schema. > > > > > > > >> > > > > > > > >> It led us to rethink how to integrate with Flink in the best > > > way. > > > > Then > > > > > > > we > > > > > > > >> reimplement the pulsar-flink connectors from the ground up > > with > > > > schema > > > > > > > and > > > > > > > >> bring table API and catalog API as the first-class citizen > in > > > the > > > > > > > >> integration. With that being said, in the new pulsar-flink > > > > > > > implementation, > > > > > > > >> you can register pulsar as a flink catalog and query / > process > > > the > > > > > > event > > > > > > > >> streams using Flink SQL. > > > > > > > >> > > > > > > > >> This is an example about how to use Pulsar as a Flink > catalog: > > > > > > > >> > > > > > > > >> > > > > > > > > > > > > > > > > > > > > > > > https://github.com/streamnative/pulsar-flink/blob/3eeddec5625fc7dddc3f8a3ec69f72e1614ca9c9/README.md#use-pulsar-catalog > > > > > > > >> > > > > > > > >> Yijie has also written a blog post explaining why we > > > re-implement > > > > the > > > > > > > flink > > > > > > > >> connector with Flink 1.9 and what are the changes we made in > > the > > > > new > > > > > > > >> connector: > > > > > > > >> > > > > > > > >> > > > > > > > > > > > > > > > > > > > > > > > https://medium.com/streamnative/use-apache-pulsar-as-streaming-table-with-8-lines-of-code-39033a93947f > > > > > > > >> > > > > > > > >> We believe Pulsar is not just a simple data sink or source > for > > > > Flink. > > > > > > It > > > > > > > >> actually can be a fully integrated streaming data storage > for > > > > Flink in > > > > > > > many > > > > > > > >> areas (sink, source, schema/catalog and state). The > > combination > > > of > > > > > > Flink > > > > > > > >> and Pulsar can create a great streaming warehouse > architecture > > > for > > > > > > > >> streaming-first, unified data processing. Since we are > talking > > > to > > > > > > > >> contribute Pulsar integration to Flink here, we are also > > > > dedicated to > > > > > > > >> maintain, improve and evolve the integration with Flink to > > help > > > > the > > > > > > > users > > > > > > > >> who use both Flink and Pulsar. > > > > > > > >> > > > > > > > >> Hope this give you a bit more background about the pulsar > > flink > > > > > > > >> integration. Let me know what are your thoughts. > > > > > > > >> > > > > > > > >> Thanks, > > > > > > > >> Sijie > > > > > > > >> > > > > > > > >> > > > > > > > >> On Tue, Sep 3, 2019 at 11:54 AM Yun Tang <myas...@live.com> > > > > wrote: > > > > > > > >> > > > > > > > >>> Hi Yijie > > > > > > > >>> > > > > > > > >>> I can see that Pulsar becomes more and more popular > recently > > > and > > > > very > > > > > > > >> glad > > > > > > > >>> to see more people willing to contribute to Flink > ecosystem. > > > > > > > >>> > > > > > > > >>> Before any further discussion, would you please give some > > > > explanation > > > > > > > of > > > > > > > >>> the relationship between this thread to current existing > > JIRAs > > > of > > > > > > > pulsar > > > > > > > >>> source [1] and sink [2] connector? Will the contribution > > > contains > > > > > > part > > > > > > > of > > > > > > > >>> those PRs or totally different implementation? > > > > > > > >>> > > > > > > > >>> [1] https://issues.apache.org/jira/browse/FLINK-9641 > > > > > > > >>> [2] https://issues.apache.org/jira/browse/FLINK-9168 > > > > > > > >>> > > > > > > > >>> Best > > > > > > > >>> Yun Tang > > > > > > > >>> ________________________________ > > > > > > > >>> From: Yijie Shen <henry.yijies...@gmail.com> > > > > > > > >>> Sent: Tuesday, September 3, 2019 13:57 > > > > > > > >>> To: dev@flink.apache.org <dev@flink.apache.org> > > > > > > > >>> Subject: [DISCUSS] Contribute Pulsar Flink connector back > to > > > > Flink > > > > > > > >>> > > > > > > > >>> Dear Flink Community! > > > > > > > >>> > > > > > > > >>> I would like to open the discussion of contributing Pulsar > > > Flink > > > > > > > >>> connector [0] back to Flink. > > > > > > > >>> > > > > > > > >>> ## A brief introduction to Apache Pulsar > > > > > > > >>> > > > > > > > >>> Apache Pulsar[1] is a multi-tenant, high-performance > > > distributed > > > > > > > >>> pub-sub messaging system. Pulsar includes multiple features > > > such > > > > as > > > > > > > >>> native support for multiple clusters in a Pulsar instance, > > with > > > > > > > >>> seamless geo-replication of messages across clusters, very > > low > > > > > > publish > > > > > > > >>> and end-to-end latency, seamless scalability to over a > > million > > > > > > topics, > > > > > > > >>> and guaranteed message delivery with persistent message > > storage > > > > > > > >>> provided by Apache BookKeeper. Nowadays, Pulsar has been > > > adopted > > > > by > > > > > > > >>> more and more companies[2]. > > > > > > > >>> > > > > > > > >>> ## The status of Pulsar Flink connector > > > > > > > >>> > > > > > > > >>> The Pulsar Flink connector we are planning to contribute is > > > built > > > > > > upon > > > > > > > >>> Flink 1.9.0 and Pulsar 2.4.0. The main features are: > > > > > > > >>> - Pulsar as a streaming source with exactly-once guarantee. > > > > > > > >>> - Sink streaming results to Pulsar with at-least-once > > > semantics. > > > > (We > > > > > > > >>> would update this to exactly-once as well when Pulsar gets > > all > > > > > > > >>> transaction features ready in its 2.5.0 version) > > > > > > > >>> - Build upon Flink new Table API Type system (FLIP-37[3]), > > and > > > > can > > > > > > > >>> automatically (de)serialize messages with the help of > Pulsar > > > > schema. > > > > > > > >>> - Integrate with Flink new Catalog API (FLIP-30[4]), which > > > > enables > > > > > > the > > > > > > > >>> use of Pulsar topics as tables in Table API as well as SQL > > > > client. > > > > > > > >>> > > > > > > > >>> ## Reference > > > > > > > >>> [0] https://github.com/streamnative/pulsar-flink > > > > > > > >>> [1] https://pulsar.apache.org/ > > > > > > > >>> [2] https://pulsar.apache.org/en/powered-by/ > > > > > > > >>> [3] > > > > > > > >>> > > > > > > > >> > > > > > > > > > > > > > > > > > > > > > > > https://cwiki.apache.org/confluence/display/FLINK/FLIP-37%3A+Rework+of+the+Table+API+Type+System > > > > > > > >>> [4] > > > > > > > >>> > > > > > > > >> > > > > > > > > > > > > > > > > > > > > > > > https://cwiki.apache.org/confluence/display/FLINK/FLIP-30%3A+Unified+Catalog+APIs > > > > > > > >>> > > > > > > > >>> Best, > > > > > > > >>> Yijie Shen > > > > > > > >>> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > >