We are building a stream processing system using Apache Beam on top of Flink via the Flink Runner. Our pipelines take Kafka streams as sources and can write to multiple sinks. The system needs to be tenant-aware: tenants can share the same Kafka topic, and tenants can write their own pipelines. We are providing a small framework for writing pipelines (on top of Beam), so we control what data stream is available to pipeline developers. I am looking for strategies for the following:
1. How can I partition/group the data so that pipeline developers don't need to care about tenancy, but data integrity is still maintained?
2. What are some ways to assign compute (e.g., worker nodes) to different jobs based on tenant configuration?

Thanks,
Aparup
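To make question 1 concrete, here is a rough sketch of the kind of keying our framework might apply before handing data to pipeline developers. This is plain Python with no Beam dependency, and all field names (`tenant_id`, `key`, `value`) are hypothetical; the idea is that the framework builds a compound key of (tenant id, business key), so any downstream grouping (e.g., Beam's GroupByKey) can never mix records from different tenants, even when they share a Kafka topic:

```python
from collections import defaultdict

def tenant_scoped_key(record):
    # Compound key: (tenant_id, business_key). The developer only sees the
    # business key; the framework injects tenant_id transparently.
    return (record["tenant_id"], record["key"]), record["value"]

def group_by_key(keyed_records):
    # Stand-in for a runner's GroupByKey, just to show the isolation property.
    groups = defaultdict(list)
    for key, value in keyed_records:
        groups[key].append(value)
    return dict(groups)

records = [
    {"tenant_id": "t1", "key": "orders", "value": 1},
    {"tenant_id": "t2", "key": "orders", "value": 2},
    {"tenant_id": "t1", "key": "orders", "value": 3},
]

groups = group_by_key(tenant_scoped_key(r) for r in records)
# ("t1", "orders") and ("t2", "orders") end up in separate groups,
# so t1's and t2's data never commingle in a grouped result.
```

Is compound keying like this a reasonable baseline, or are there better-established patterns for this in Beam?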