Yes, we will use the latest sink interface.

Best,
Mats

> On 6. Jan 2021, at 11:05, Aljoscha Krettek <aljos...@apache.org> wrote:
> 
> It's great to see interest in this. Were you planning to use the new Sink 
> interface that we recently introduced? [1]
> 
> Best,
> Aljoscha
> 
> [1] https://s.apache.org/FLIP-143
> 
> On 2021/01/05 12:21, Poerschke, Mats wrote:
>> Hi all,
>> 
>> we want to contribute a sink connector for Apache Pinot. The following 
>> briefly describes the planned control flow. Please feel free to comment on 
>> any of its aspects.
>> 
>> Background
>> Apache Pinot is a large-scale real-time data ingestion engine working on 
>> data segments internally. The controller exposes an external API which 
>> allows posting new segments via REST call. Each segment posted this way must 
>> carry an ID (called the segment name).
>> 
>> Control Flow
>> The Flink sink will collect data tuples locally. When creating a checkpoint, 
>> all those tuples are grouped into one segment which is then pushed to the 
>> Pinot controller. We will assign each pushed segment a unique incrementing 
>> identifier.
>> After receiving a success response from the Pinot controller, the latest 
>> segment name is persisted within the Flink checkpoint.
>> In case we have to recover from a failure, the latest successfully pushed 
>> segment name can be reconstructed from the Flink checkpoint. At this point 
>> the system might be in an inconsistent state. The Pinot controller might 
>> have already stored a newer segment (whose name, due to the failure, was 
>> not persisted in the Flink checkpoint).
>> This inconsistency is resolved by the next successful checkpoint. The segment 
>> pushed at that point is assigned the same segment name as the inconsistent 
>> segment. Pinot therefore replaces the old segment with the new one, which 
>> prevents duplicate entries.
>> 
>> 
>> Best regards
>> Mats Pörschke
>> 
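
The control flow in the quoted proposal can be sketched as a small in-memory simulation. This is only an illustration under stated assumptions, not the planned connector: FakePinotController, PinotSinkSketch, the "segment-N" naming scheme and the fail_before_persist flag are all hypothetical, and no real Flink or Pinot API is used. It assumes that the source replays, after recovery, the tuples received since the last successful checkpoint.

```python
class FakePinotController:
    """In-memory stand-in for the Pinot controller (hypothetical, not the real API)."""

    def __init__(self):
        self.segments = {}  # segment name -> list of rows

    def push_segment(self, name, rows):
        # Pushing under an existing name replaces the stored segment.
        self.segments[name] = list(rows)
        return True  # stands in for a success response


class PinotSinkSketch:
    """Buffers rows and pushes them as one segment per checkpoint."""

    def __init__(self, controller, restored_segment_id=-1):
        self.controller = controller
        # Id of the last segment whose name made it into a checkpoint.
        self.last_segment_id = restored_segment_id
        self.buffer = []

    def write(self, row):
        self.buffer.append(row)

    def checkpoint(self, fail_before_persist=False):
        """Push the buffered rows, then return the state to persist."""
        next_id = self.last_segment_id + 1
        ok = self.controller.push_segment(f"segment-{next_id}", self.buffer)
        if not ok or fail_before_persist:
            # Simulates a failure after the push but before the checkpoint completes.
            raise RuntimeError("checkpoint failed after push")
        self.last_segment_id = next_id
        self.buffer = []
        return self.last_segment_id


controller = FakePinotController()
sink = PinotSinkSketch(controller)

# First checkpoint succeeds: segment-0 is pushed and its id is persisted.
sink.write("a")
sink.write("b")
state = sink.checkpoint()

# Second checkpoint fails after the push: segment-1 exists in Pinot,
# but its name was never persisted (the inconsistent state).
sink.write("c")
try:
    sink.checkpoint(fail_before_persist=True)
except RuntimeError:
    pass

# Recovery: restore from the last persisted state; the source replays "c".
recovered = PinotSinkSketch(controller, restored_segment_id=state)
recovered.write("c")
recovered.write("d")
state = recovered.checkpoint()  # reuses the name segment-1 and replaces it
```

After the last checkpoint the fake controller holds exactly two segments, and the orphaned segment-1 has been overwritten by the replacement, so "c" is stored only once.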
