On Mon, Jul 25, 2016 at 5:38 AM, Aparup Banerjee (apbanerj)
<apban...@cisco.com> wrote:
> We are building a stream processing system using Apache Beam on top of Flink
> via the Flink Runner. Our pipelines take Kafka streams as sources, and
> can write to multiple sinks. The system needs to be tenant aware. Tenants
> can share the same Kafka topic. Tenants can write their own pipelines. We are
> providing a small framework for writing pipelines (on top of Beam), so we have
> control over which data stream is available to pipeline developers. I am looking
> for strategies for the following:
>
> How can I partition / group the data so that pipeline developers don't
> need to care about tenancy, but data integrity is maintained?
> Ways in which I can assign compute (worker nodes, for example) to different
> jobs based on tenant configuration.

There is no built-in support for this in Flink, but King.com worked on
something similar using custom operators. You can check out the blog
post here: https://techblog.king.com/rbea-scalable-real-time-analytics-king/
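For the first question, the usual approach is to attach a tenant key to every element before the stream reaches user code, so downstream transforms only ever see elements grouped per tenant. Here is a minimal plain-Python sketch of that idea (not Beam API code; the record shape and the `tenant_id` field are assumptions about your Kafka message format):

```python
# Sketch: key each raw record by tenant, then group so that user-supplied
# pipeline logic only ever sees one tenant's data at a time.
# The dict record shape and "tenant_id" field are assumptions, not a
# Beam/Flink API -- in Beam you would do the equivalent with a keying
# transform followed by a GroupByKey.

def key_by_tenant(record):
    """Extract (tenant, payload) from a raw Kafka record (assumed dict)."""
    return (record["tenant_id"], record["payload"])

def partition_stream(records):
    """Group a batch of records into per-tenant streams."""
    streams = {}
    for record in records:
        tenant, payload = key_by_tenant(record)
        streams.setdefault(tenant, []).append(payload)
    return streams

# Two tenants sharing the same topic:
raw = [
    {"tenant_id": "t1", "payload": "a"},
    {"tenant_id": "t2", "payload": "b"},
    {"tenant_id": "t1", "payload": "c"},
]
print(partition_stream(raw))  # {'t1': ['a', 'c'], 't2': ['b']}
```

Since your framework sits between Kafka and the pipeline developer, you can apply the keying step before handing the stream over, which keeps tenancy invisible to their code.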

I'm pulling in Gyula (cc'd) who worked on the implementation at King...
