On Mon, Jul 25, 2016 at 5:38 AM, Aparup Banerjee (apbanerj) <apban...@cisco.com> wrote:
> We are building a stream processing system using Apache Beam on top of Flink
> via the Flink runner. Our pipelines take Kafka streams as sources and can
> write to multiple sinks. The system needs to be tenant-aware: tenants can
> share the same Kafka topic, and tenants can write their own pipelines. We are
> providing a small framework for writing pipelines (on top of Beam), so we
> control which data stream is available to the pipeline developer. I am looking
> for strategies for the following:
>
> 1. How can I partition/group the data so that pipeline developers don't need
>    to care about tenancy, but data integrity is maintained?
> 2. Ways to assign compute (e.g. worker nodes) to different jobs based on
>    tenant configuration.
There is no built-in support for this in Flink, but King.com worked on something similar using custom operators. You can check out the blog post here: https://techblog.king.com/rbea-scalable-real-time-analytics-king/

I'm pulling in Gyula (cc'd), who worked on the implementation at King...
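As a starting point for both questions, one common pattern is to key every record by its tenant ID as early as possible, so all downstream grouping is implicitly per-tenant, and to map tenants to compute pools with a stable hash over the tenant configuration. Here is a minimal, framework-independent sketch of that idea in plain Python (the record schema, the `tenant` field name, and the pool names are illustrative assumptions, not part of any Beam or Flink API):

```python
import hashlib

def tenant_key(record):
    """Turn a raw record into a (tenant_id, payload) pair.

    Keying by tenant up front means any downstream group-by is
    automatically scoped to a single tenant, so pipeline developers
    never mix data across tenants.  Assumes each record carries a
    'tenant' field (illustrative schema).
    """
    return record["tenant"], record["payload"]

def assign_pool(tenant_id, pools):
    """Deterministically map a tenant to one of the configured compute pools.

    A stable hash (not Python's randomized hash()) ensures the same
    tenant always lands on the same pool across restarts.
    """
    h = int(hashlib.sha256(tenant_id.encode("utf-8")).hexdigest(), 16)
    return pools[h % len(pools)]

# Example: two tenants sharing one topic, three worker pools.
records = [
    {"tenant": "acme", "payload": 1},
    {"tenant": "globex", "payload": 2},
    {"tenant": "acme", "payload": 3},
]
keyed = [tenant_key(r) for r in records]
pools = ["pool-a", "pool-b", "pool-c"]
assignments = {t: assign_pool(t, pools) for t, _ in keyed}
```

In Beam terms, `tenant_key` would be the first `ParDo` after the Kafka source inside your framework, and `assign_pool` would be evaluated at job-submission time from the tenant configuration; the key point is that tenancy is enforced by the framework before the developer's transforms ever see the data.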