Re: Apache Pinot Sink

Venkata Sanath Muppalla Wed, 06 Jan 2021 13:48:07 -0800

+1 As Yupeng mentioned, we at Uber are also looking into the Pinot Sink. It
would be great to collaborate on this proposal.


Thanks,
Sanath

On Wed, Jan 6, 2021 at 9:23 AM Yupeng Fu <yup...@uber.com.invalid> wrote:

> Hi Mats,
>
> Glad to see this interest! We at Uber are also working on a Pinot sink (for
> BATCH execution), with some help from the Pinot community on abstracting
> Pinot interfaces for segment writes and catalog retrieval. Perhaps we can
> collaborate on this proposal/POC.
>
> Cheers,
>
> Yupeng
>
>
>
> On Wed, Jan 6, 2021 at 6:12 AM Aljoscha Krettek <aljos...@apache.org>
> wrote:
>
> > That's good to hear. I wasn't sure because the explanation focused a lot
> > on checkpoints and the details of it while with the new Sink interface
> > implementers don't need to be concerned with those. And in fact, when
> > the Sink is used in BATCH execution mode there will be no checkpoints.
> >
> > Other than that, the implementation sketch makes sense to me. I think to
> > make further assessments you will probably have to work on a
> > proof-of-concept.
> >
> > Best,
> > Aljoscha
> >
> > On 2021/01/06 11:18, Poerschke, Mats wrote:
> > >Yes, we will use the latest sink interface.
> > >
> > >Best,
> > >Mats
> > >
> > >> On 6. Jan 2021, at 11:05, Aljoscha Krettek <aljos...@apache.org> wrote:
> > >>
> > >> It's great to see interest in this. Were you planning to use the new
> > >> Sink interface that we recently introduced? [1]
> > >>
> > >> Best,
> > >> Aljoscha
> > >>
> > >> [1] https://s.apache.org/FLIP-143
> > >>
> > >> On 2021/01/05 12:21, Poerschke, Mats wrote:
> > >>> Hi all,
> > >>>
> > >>> we want to contribute a sink connector for Apache Pinot. The following
> > >>> briefly describes the planned control flow. Please feel free to comment
> > >>> on any of its aspects.
> > >>>
> > >>> Background
> > >>> Apache Pinot is a large-scale real-time data ingestion engine working on
> > >>> data segments internally. The controller exposes an external API which
> > >>> allows posting new segments via a REST call. A segment posted this way
> > >>> must contain an ID (called the segment name).
> > >>>
> > >>> Control Flow
> > >>> The Flink sink will collect data tuples locally. When creating a
> > >>> checkpoint, all those tuples are grouped into one segment, which is then
> > >>> pushed to the Pinot controller. We will assign each pushed segment a
> > >>> unique, incrementing identifier.
> > >>> After receiving a success response from the Pinot controller, the
> > >>> latest segment name is persisted within the Flink checkpoint.
> > >>> In case we have to recover from a failure, the latest successfully
> > >>> pushed segment name can be reconstructed from the Flink checkpoint. At
> > >>> this point the system might be in an inconsistent state: the Pinot
> > >>> controller might have already stored a newer segment (whose name was,
> > >>> due to the failure, not persisted in the Flink checkpoint).
> > >>> This inconsistency is resolved with the next successful checkpoint
> > >>> creation. The segment pushed at that point will be assigned the same
> > >>> segment name as the inconsistent segment. Thus, Pinot replaces the old
> > >>> segment with the new one, which prevents introducing duplicate entries.
> > >>>
> > >>>
> > >>> Best regards
> > >>> Mats Pörschke
> > >>>
> > >
> >
>
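
For illustration, the control flow from the original proposal at the bottom of
this thread could be sketched roughly as follows. This is a minimal in-memory
simulation in Python, not Flink or Pinot code: FakePinotController,
PinotSinkSketch, run_demo and all method names are hypothetical, and it
assumes that records written since the last checkpoint are replayed after
recovery.

```python
class FakePinotController:
    """Hypothetical in-memory stand-in for the Pinot controller REST API."""

    def __init__(self):
        self.segments = {}  # segment name -> list of rows

    def upload_segment(self, name, rows):
        # Uploading under an existing name replaces the stored segment.
        self.segments[name] = list(rows)


class PinotSinkSketch:
    """Hypothetical sink: buffers rows and pushes one segment per checkpoint."""

    def __init__(self, controller, last_committed_id=-1):
        self.controller = controller
        self.buffer = []
        # Stands in for the state persisted in the Flink checkpoint.
        self.last_committed_id = last_committed_id

    def write(self, row):
        self.buffer.append(row)

    def checkpoint(self, fail_after_push=False):
        next_id = self.last_committed_id + 1
        self.controller.upload_segment(f"segment_{next_id}", self.buffer)
        if fail_after_push:
            # The segment reached Pinot, but its id never made it into the
            # checkpoint state.
            raise RuntimeError("simulated failure after push")
        self.last_committed_id = next_id
        self.buffer = []
        return self.last_committed_id


def run_demo():
    controller = FakePinotController()
    sink = PinotSinkSketch(controller)

    for row in ("a", "b"):
        sink.write(row)
    checkpointed_id = sink.checkpoint()  # segment_0 pushed and recorded

    for row in ("c", "d"):
        sink.write(row)
    try:
        sink.checkpoint(fail_after_push=True)  # segment_1 pushed, not recorded
    except RuntimeError:
        pass

    # Recovery: restore from the last checkpoint. Assumption: the source
    # replays "c" and "d", followed by a new row "e".
    sink = PinotSinkSketch(controller, last_committed_id=checkpointed_id)
    for row in ("c", "d", "e"):
        sink.write(row)
    sink.checkpoint()  # reuses the name segment_1, replacing the orphan

    return controller.segments
```

Calling run_demo() leaves the fake controller with the orphaned segment
overwritten under the same name rather than stored a second time, which is
the replace-by-name behaviour the proposal relies on.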
