Just as a short addition: we plan to contribute the sink to Apache Bahir.

Best regards
Mats Pörschke

> On 5. Jan 2021, at 13:21, Poerschke, Mats <mats.poersc...@student.hpi.uni-potsdam.de> wrote:
>
> Hi all,
>
> we want to contribute a sink connector for Apache Pinot. The following
> briefly describes the planned control flow. Please feel free to comment on
> any of its aspects.
>
> Background
> Apache Pinot is a large-scale real-time data ingestion engine that works on
> data segments internally. The controller exposes an external API that allows
> posting new segments via a REST call. Each segment posted this way must carry
> an ID (called the segment name).
>
> Control Flow
> The Flink sink will collect data tuples locally. When a checkpoint is
> created, all collected tuples are grouped into one segment, which is then
> pushed to the Pinot controller. We will assign each pushed segment a unique,
> incrementing identifier.
> After receiving a success response from the Pinot controller, the latest
> segment name is persisted within the Flink checkpoint.
> If we have to recover from a failure, the latest successfully pushed
> segment name can be reconstructed from the Flink checkpoint. At this point
> the system might be in an inconsistent state: the Pinot controller might have
> already stored a newer segment whose name was, due to the failure, not
> persisted in the Flink checkpoint.
> This inconsistency is resolved with the next successful checkpoint creation.
> The segment pushed at that point is assigned the same segment name as the
> inconsistent segment. Pinot therefore replaces the old segment with the new
> one, which prevents duplicate entries.
>
> Best regards
> Mats Pörschke
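
The control flow quoted above can be sketched as a small, self-contained Java simulation. This is not the connector itself and uses neither the Flink nor the Pinot API. `FakeController`, `Sink`, and the `segment_<id>` naming scheme are hypothetical names chosen for illustration. The sketch also assumes that, after recovery, the source replays the tuples that arrived after the last successful checkpoint, so the replacement segment contains them again.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class PinotSinkSketch {

    /** Hypothetical stand-in for the Pinot controller: posting under an existing name replaces that segment. */
    static final class FakeController {
        final Map<String, List<String>> segments = new HashMap<>();

        void postSegment(String segmentName, List<String> rows) {
            segments.put(segmentName, new ArrayList<>(rows));
        }

        int totalRows() {
            return segments.values().stream().mapToInt(List::size).sum();
        }
    }

    /** Hypothetical sink: buffers tuples locally and pushes one segment per checkpoint. */
    static final class Sink {
        private final FakeController controller;
        private final List<String> buffer = new ArrayList<>();
        private long lastPushedId;

        /** restoredId is the segment id read back from the last successful checkpoint (0 on a fresh start). */
        Sink(FakeController controller, long restoredId) {
            this.controller = controller;
            this.lastPushedId = restoredId;
        }

        void invoke(String tuple) {
            buffer.add(tuple);
        }

        /** Pushes all buffered tuples as one segment and returns the id to persist in the checkpoint. */
        long snapshot() {
            long nextId = lastPushedId + 1;
            controller.postSegment("segment_" + nextId, buffer);
            buffer.clear();
            lastPushedId = nextId;
            return lastPushedId;
        }
    }

    public static void main(String[] args) {
        FakeController controller = new FakeController();

        // Normal operation: the checkpoint completes and stores segment id 1.
        Sink sink = new Sink(controller, 0L);
        sink.invoke("a");
        sink.invoke("b");
        long checkpointedId = sink.snapshot();

        // Failure case: segment_2 reaches the controller, but the job fails
        // before the checkpoint is stored, so the returned id is lost.
        sink.invoke("c");
        sink.snapshot();

        // Recovery: restart from id 1; "c" is replayed (assumption) and "d" is new.
        Sink recovered = new Sink(controller, checkpointedId);
        recovered.invoke("c");
        recovered.invoke("d");
        recovered.snapshot(); // pushes segment_2 again, replacing the inconsistent one

        System.out.println("segments: " + controller.segments.keySet());
        System.out.println("total rows: " + controller.totalRows());
    }
}
```

Because the recovered sink derives the next segment name from the checkpointed id, the second push of `segment_2` overwrites the orphaned one instead of adding a third segment, so "c" is stored only once.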