Re: Apache Pinot Sink

Poerschke, Mats Tue, 05 Jan 2021 04:59:02 -0800

Just as a short addition: We plan to contribute the sink to Apache Bahir.

Best regards
Mats Pörschke


> On 5. Jan 2021, at 13:21, Poerschke, Mats 
> <mats.poersc...@student.hpi.uni-potsdam.de> wrote:
> 
> Hi all,
> 
> we want to contribute a sink connector for Apache Pinot. The following 
> briefly describes the planned control flow. Please feel free to comment on 
> any of its aspects.
> 
> Background
> Apache Pinot is a large-scale real-time data ingestion engine working on data 
> segments internally. The controller exposes an external API which allows 
> posting new segments via REST call. A thereby posted segment must contain an 
> id (called segment name).
> 
> Control Flow
> The Flink sink will collect data tuples locally. When creating a checkpoint, 
> all those tuples are grouped into one segment which is then pushed to the 
> Pinot controller. We will assign each pushed segment a unique incrementing 
> identifier.
> After receiving a success response from the Pinot controller, the latest 
> segment name is persisted within the Flink checkpoint.
> In case we have to recover from a failure, the latest successfully pushed 
> segment name can be reconstructed from the Flink checkpoint. At this point 
> the system might be in an inconsistent state. The Pinot controller might have 
> already stored a newer segment (which’s name was, due to the failure, not 
> persisted in the flink checkpoint).
> This inconsistency is resolved with the next successful checkpoint creation. 
> The there pushed segment will get the same segment name assigned as the 
> inconsistent segment. Thus, Pinot replaces the old with the new segment which 
> prevents introducing duplicate entries.
> 
> 
> Best regards
> Mats Pörschke
>

Re: Apache Pinot Sink

Reply via email to