Hi Till,

You are right. It all depends on when the new source interface is going to
be ready. Personally I think it would be there in about a month or so. But
I could be too optimistic. It would also be good to hear what do Aljoscha
and Stephan think as they are also involved in FLIP-27.

In general I think we should have Pulsar connector in Flink 1.10,
preferably with the new source interface. We can also check it in right now
with old source interface, but I suspect few users will use it before the
next official release. Therefore, it seems reasonable to wait a little bit
to see whether we can jump to the new source interface. As long as we make
sure Flink 1.10 has it, waiting a little bit doesn't seem to hurt much.

Thanks,

Jiangjie (Becket) Qin

On Thu, Sep 5, 2019 at 3:59 PM Till Rohrmann <trohrm...@apache.org> wrote:

> Hi everyone,
>
> I'm wondering what the problem would be if we committed the Pulsar
> connector before the new source interface is ready. If I understood it
> correctly, then we need to support the old source interface anyway for the
> existing connectors. By checking it in early I could see the benefit that
> our users could start using the connector earlier. Moreover, it would
> prevent that the Pulsar integration is being delayed in case that the
> source interface should be delayed. The only downside I see is the extra
> review effort and potential fixes which might be irrelevant for the new
> source interface implementation. I guess it mainly depends on how certain
> we are when the new source interface will be ready.
>
> Cheers,
> Till
>
> On Thu, Sep 5, 2019 at 8:56 AM Becket Qin <becket....@gmail.com> wrote:
>
> > Hi Sijie and Yijie,
> >
> > Thanks for sharing your thoughts.
> >
> > Just want to have some update on FLIP-27. Although the FLIP wiki and
> > discussion thread has been quiet for some time, a few committer /
> > contributors in Flink community were actually prototyping the entire
> thing.
> > We have made some good progress there but want to update the FLIP wiki
> > after the entire thing is verified to work in case there are some last
> > minute surprise in the implementation. I don't have an exact ETA yet,
> but I
> > guess it is going to be within a month or so.
> >
> > I am happy to review the current Flink Pulsar connector and see if it
> would
> > fit in FLIP-27. It would be good to avoid the case that we checked in the
> > Pulsar connector with some review efforts and shortly after that the new
> > Source interface is ready.
> >
> > Thanks,
> >
> > Jiangjie (Becket) Qin
> >
> > On Thu, Sep 5, 2019 at 8:39 AM Yijie Shen <henry.yijies...@gmail.com>
> > wrote:
> >
> > > Thanks for all the feedback and suggestions!
> > >
> > > As Sijie said, the goal of the connector has always been to provide
> > > users with the latest features of both systems as soon as possible. We
> > > propose to contribute the connector to Flink and hope to get more
> > > suggestions and feedback from Flink experts to ensure the high quality
> > > of the connector.
> > >
> > > For FLIP-27, we noticed its existence at the beginning of reworking
> > > the connector implementation based on Flink 1.9; we also wanted to
> > > build a connector that supports both batch and stream computing based
> > > on it.
> > > However, it has been inactive for some time, so we decided to provide
> > > a connector with most of the new features, such as the new type system
> > > and the new catalog API first. We will pay attention to the progress
> > > of FLIP-27 continually and incorporate it with the connector as soon
> > > as possible.
> > >
> > > Regarding the test status of the connector, we are following the other
> > > connectors' test in Flink repository and aimed to provide throughout
> > > tests as we could. We are also happy to hear suggestions and
> > > supervision from the Flink community to improve the stability and
> > > performance of the connector continuously.
> > >
> > > Best,
> > > Yijie
> > >
> > > On Thu, Sep 5, 2019 at 5:59 AM Sijie Guo <guosi...@gmail.com> wrote:
> > > >
> > > > Thanks everyone for the comments and feedback.
> > > >
> > > > It seems to me that the main question here is about - "how can the
> > Flink
> > > > community maintain the connector?".
> > > >
> > > > Here are two thoughts from myself.
> > > >
> > > > 1) I think how and where to host this integration is kind of less
> > > important
> > > > here. I believe there can be many ways to achieve it.
> > > > As part of the contribution, what we are looking for here is how
> these
> > > two
> > > > communities can build the collaboration relationship on developing
> > > > the integration between Pulsar and Flink. Even we can try our best to
> > > catch
> > > > up all the updates in Flink community. We are still
> > > > facing the fact that we have less experiences in Flink than folks in
> > > Flink
> > > > community. In order to make sure we maintain and deliver
> > > > a high-quality pulsar-flink integration to the users who use both
> > > > technologies, we need some help from the experts from Flink
> community.
> > > >
> > > > 2) We have been following FLIP-27 for a while. Originally we were
> > > thinking
> > > > of contributing the connectors back after integrating with the
> > > > new API introduced in FLIP-27. But we decided to initiate the
> > > conversation
> > > > as early as possible. Because we believe there are more benefits
> doing
> > > > it now rather than later. As part of contribution, it can help Flink
> > > > community understand more about Pulsar and the potential integration
> > > points.
> > > > Also we can also help Flink community verify the new connector API as
> > > well
> > > > as other new API (e.g. catalog API).
> > > >
> > > > Thanks,
> > > > Sijie
> > > >
> > > > On Wed, Sep 4, 2019 at 5:24 AM Becket Qin <becket....@gmail.com>
> > wrote:
> > > >
> > > > > Hi Yijie,
> > > > >
> > > > > Thanks for the interest in contributing the Pulsar connector.
> > > > >
> > > > > In general, I think having Pulsar connector with strong support is
> a
> > > > > valuable addition to Flink. So I am happy the shepherd this effort.
> > > > > Meanwhile, I would also like to provide some context and recent
> > > efforts on
> > > > > the Flink connectors ecosystem.
> > > > >
> > > > > The current way Flink maintains its connector has hit the
> scalability
> > > bar.
> > > > > With more and more connectors coming into Flink repo, we are
> facing a
> > > few
> > > > > problems such as long build and testing time. To address this
> > problem,
> > > we
> > > > > have attempted to do the following:
> > > > > 1. Split out the connectors into a separate repository. This is
> > > temporarily
> > > > > on hold due to potential solution to shorten the build time.
> > > > > 2. Encourage the connectors to stay as ecosystem project while
> Flink
> > > tries
> > > > > to provide good support for functionality and compatibility tests.
> > > Robert
> > > > > has driven to create a Flink Ecosystem project website and it is
> > going
> > > > > through some final approval process.
> > > > >
> > > > > Given the above efforts, it would be great to first see if we can
> > have
> > > > > Pulsar connector as an ecosystem project with great support. It
> would
> > > be
> > > > > good to hear how the Flink Pulsar connector is tested currently to
> > see
> > > if
> > > > > we can learn something to maintain it as an ecosystem project with
> > good
> > > > > quality and test coverage. If the quality as an ecosystem project
> is
> > > hard
> > > > > to guarantee, we may as well adopt it into the main repo.
> > > > >
> > > > > BTW, another ongoing effort is FLIP-27 where we are making changes
> to
> > > the
> > > > > Flink source connector architecture and interface. This change will
> > > likely
> > > > > land in 1.10. Therefore timing wise, if we are going to have the
> > Pulsar
> > > > > connector in main repo, I am wondering if we should hold a little
> bit
> > > and
> > > > > let the Pulsar connector adapt to the new interface to avoid
> shortly
> > > > > deprecated work?
> > > > >
> > > > > Thanks,
> > > > >
> > > > > Jiangjie (Becket) Qin
> > > > >
> > > > > On Wed, Sep 4, 2019 at 4:32 PM Chesnay Schepler <
> ches...@apache.org>
> > > > > wrote:
> > > > >
> > > > > > I'm quite worried that we may end up repeating history.
> > > > > >
> > > > > > There were already 2 attempts at contributing a pulsar connector,
> > > both
> > > > > > of which failed because no committer was getting involved,
> despite
> > > the
> > > > > > contributor opening a dedicated discussion thread about the
> > > contribution
> > > > > > beforehand and getting several +1's from committers.
> > > > > >
> > > > > > We should really make sure that if we welcome/approve such a
> > > > > > contribution it will actually get the attention it deserves.
> > > > > >
> > > > > > As such, I'm inclined to recommend maintaining the connector
> > outside
> > > of
> > > > > > Flink. We could link to it from the documentation to give it more
> > > > > exposure.
> > > > > > With the upcoming page for sharing artifacts among the community
> > > (what's
> > > > > > the state of that anyway?), this may be a better option.
> > > > > >
> > > > > > On 04/09/2019 10:16, Till Rohrmann wrote:
> > > > > > > Hi everyone,
> > > > > > >
> > > > > > > thanks a lot for starting this discussion Yijie. I think the
> > Pulsar
> > > > > > > connector would be a very valuable addition since Pulsar
> becomes
> > > more
> > > > > and
> > > > > > > more popular and it would further expand Flink's
> > interoperability.
> > > Also
> > > > > > > from a project perspective it makes sense for me to place the
> > > connector
> > > > > > in
> > > > > > > the downstream project.
> > > > > > >
> > > > > > > My main concern/question is how can the Flink community
> maintain
> > > the
> > > > > > > connector? We have seen in the past that connectors are some of
> > the
> > > > > most
> > > > > > > actively developed components because they need to be kept in
> > sync
> > > with
> > > > > > the
> > > > > > > external system and with Flink. Given that the Pulsar community
> > is
> > > > > > willing
> > > > > > > to help with maintaining, improving and evolving the connector,
> > I'm
> > > > > > > optimistic that we can achieve this. Hence, +1 for contributing
> > it
> > > back
> > > > > > to
> > > > > > > Flink.
> > > > > > >
> > > > > > > Cheers,
> > > > > > > Till
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > > On Wed, Sep 4, 2019 at 2:03 AM Sijie Guo <guosi...@gmail.com>
> > > wrote:
> > > > > > >
> > > > > > >> Hi Yun,
> > > > > > >>
> > > > > > >> Since I was the main driver behind FLINK-9641 and FLINK-9168,
> > let
> > > me
> > > > > > try to
> > > > > > >> add more context on this.
> > > > > > >>
> > > > > > >> FLINK-9641 and FLINK-9168 was created for bringing Pulsar as
> > > source
> > > > > and
> > > > > > >> sink for Flink. The integration was done with Flink 1.6.0. We
> > > sent out
> > > > > > pull
> > > > > > >> requests about a year ago and we ended up maintaining those
> > > connectors
> > > > > > in
> > > > > > >> Pulsar for Pulsar users to use Flink to process event streams
> in
> > > > > Pulsar.
> > > > > > >> (See
> https://github.com/apache/pulsar/tree/master/pulsar-flink
> > ).
> > > The
> > > > > > Flink
> > > > > > >> 1.6 integration is pretty simple and there is no schema
> > > > > considerations.
> > > > > > >>
> > > > > > >> In the past year, we have made a lot of changes in Pulsar and
> > > brought
> > > > > > >> Pulsar schema as the first-class citizen in Pulsar. We also
> > > integrated
> > > > > > with
> > > > > > >> other computing engines for processing Pulsar event streams
> with
> > > > > Pulsar
> > > > > > >> schema.
> > > > > > >>
> > > > > > >> It led us to rethink how to integrate with Flink in the best
> > way.
> > > Then
> > > > > > we
> > > > > > >> reimplement the pulsar-flink connectors from the ground up
> with
> > > schema
> > > > > > and
> > > > > > >> bring table API and catalog API as the first-class citizen in
> > the
> > > > > > >> integration. With that being said, in the new pulsar-flink
> > > > > > implementation,
> > > > > > >> you can register pulsar as a flink catalog and query / process
> > the
> > > > > event
> > > > > > >> streams using Flink SQL.
> > > > > > >>
> > > > > > >> This is an example about how to use Pulsar as a Flink catalog:
> > > > > > >>
> > > > > > >>
> > > > > >
> > > > >
> > >
> >
> https://github.com/streamnative/pulsar-flink/blob/3eeddec5625fc7dddc3f8a3ec69f72e1614ca9c9/README.md#use-pulsar-catalog
> > > > > > >>
> > > > > > >> Yijie has also written a blog post explaining why we
> > re-implement
> > > the
> > > > > > flink
> > > > > > >> connector with Flink 1.9 and what are the changes we made in
> the
> > > new
> > > > > > >> connector:
> > > > > > >>
> > > > > > >>
> > > > > >
> > > > >
> > >
> >
> https://medium.com/streamnative/use-apache-pulsar-as-streaming-table-with-8-lines-of-code-39033a93947f
> > > > > > >>
> > > > > > >> We believe Pulsar is not just a simple data sink or source for
> > > Flink.
> > > > > It
> > > > > > >> actually can be a fully integrated streaming data storage for
> > > Flink in
> > > > > > many
> > > > > > >> areas (sink, source, schema/catalog and state). The
> combination
> > of
> > > > > Flink
> > > > > > >> and Pulsar can create a great streaming warehouse architecture
> > for
> > > > > > >> streaming-first, unified data processing. Since we are talking
> > to
> > > > > > >> contribute Pulsar integration to Flink here, we are also
> > > dedicated to
> > > > > > >> maintain, improve and evolve the integration with Flink to
> help
> > > the
> > > > > > users
> > > > > > >> who use both Flink and Pulsar.
> > > > > > >>
> > > > > > >> Hope this give you a bit more background about the pulsar
> flink
> > > > > > >> integration. Let me know what are your thoughts.
> > > > > > >>
> > > > > > >> Thanks,
> > > > > > >> Sijie
> > > > > > >>
> > > > > > >>
> > > > > > >> On Tue, Sep 3, 2019 at 11:54 AM Yun Tang <myas...@live.com>
> > > wrote:
> > > > > > >>
> > > > > > >>> Hi Yijie
> > > > > > >>>
> > > > > > >>> I can see that Pulsar becomes more and more popular recently
> > and
> > > very
> > > > > > >> glad
> > > > > > >>> to see more people willing to contribute to Flink ecosystem.
> > > > > > >>>
> > > > > > >>> Before any further discussion, would you please give some
> > > explanation
> > > > > > of
> > > > > > >>> the relationship between this thread to current existing
> JIRAs
> > of
> > > > > > pulsar
> > > > > > >>> source [1] and sink [2] connector? Will the contribution
> > contains
> > > > > part
> > > > > > of
> > > > > > >>> those PRs or totally different implementation?
> > > > > > >>>
> > > > > > >>> [1] https://issues.apache.org/jira/browse/FLINK-9641
> > > > > > >>> [2] https://issues.apache.org/jira/browse/FLINK-9168
> > > > > > >>>
> > > > > > >>> Best
> > > > > > >>> Yun Tang
> > > > > > >>> ________________________________
> > > > > > >>> From: Yijie Shen <henry.yijies...@gmail.com>
> > > > > > >>> Sent: Tuesday, September 3, 2019 13:57
> > > > > > >>> To: dev@flink.apache.org <dev@flink.apache.org>
> > > > > > >>> Subject: [DISCUSS] Contribute Pulsar Flink connector back to
> > > Flink
> > > > > > >>>
> > > > > > >>> Dear Flink Community!
> > > > > > >>>
> > > > > > >>> I would like to open the discussion of contributing Pulsar
> > Flink
> > > > > > >>> connector [0] back to Flink.
> > > > > > >>>
> > > > > > >>> ## A brief introduction to Apache Pulsar
> > > > > > >>>
> > > > > > >>> Apache Pulsar[1] is a multi-tenant, high-performance
> > distributed
> > > > > > >>> pub-sub messaging system. Pulsar includes multiple features
> > such
> > > as
> > > > > > >>> native support for multiple clusters in a Pulsar instance,
> with
> > > > > > >>> seamless geo-replication of messages across clusters, very
> low
> > > > > publish
> > > > > > >>> and end-to-end latency, seamless scalability to over a
> million
> > > > > topics,
> > > > > > >>> and guaranteed message delivery with persistent message
> storage
> > > > > > >>> provided by Apache BookKeeper. Nowadays, Pulsar has been
> > adopted
> > > by
> > > > > > >>> more and more companies[2].
> > > > > > >>>
> > > > > > >>> ## The status of Pulsar Flink connector
> > > > > > >>>
> > > > > > >>> The Pulsar Flink connector we are planning to contribute is
> > built
> > > > > upon
> > > > > > >>> Flink 1.9.0 and Pulsar 2.4.0. The main features are:
> > > > > > >>> - Pulsar as a streaming source with exactly-once guarantee.
> > > > > > >>> - Sink streaming results to Pulsar with at-least-once
> > semantics.
> > > (We
> > > > > > >>> would update this to exactly-once as well when Pulsar gets
> all
> > > > > > >>> transaction features ready in its 2.5.0 version)
> > > > > > >>> - Build upon Flink new Table API Type system (FLIP-37[3]),
> and
> > > can
> > > > > > >>> automatically (de)serialize messages with the help of Pulsar
> > > schema.
> > > > > > >>> - Integrate with Flink new Catalog API (FLIP-30[4]), which
> > > enables
> > > > > the
> > > > > > >>> use of Pulsar topics as tables in Table API as well as SQL
> > > client.
> > > > > > >>>
> > > > > > >>> ## Reference
> > > > > > >>> [0] https://github.com/streamnative/pulsar-flink
> > > > > > >>> [1] https://pulsar.apache.org/
> > > > > > >>> [2] https://pulsar.apache.org/en/powered-by/
> > > > > > >>> [3]
> > > > > > >>>
> > > > > > >>
> > > > > >
> > > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-37%3A+Rework+of+the+Table+API+Type+System
> > > > > > >>> [4]
> > > > > > >>>
> > > > > > >>
> > > > > >
> > > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-30%3A+Unified+Catalog+APIs
> > > > > > >>>
> > > > > > >>> Best,
> > > > > > >>> Yijie Shen
> > > > > > >>>
> > > > > >
> > > > > >
> > > > >
> > >
> >
>

Reply via email to