We are building a stream processing system using Apache Beam on top of Flink 
(via the Flink Runner). Our pipelines take Kafka streams as sources and can 
write to multiple sinks. The system needs to be tenant-aware: tenants can share 
the same Kafka topic, and tenants can write their own pipelines. We are 
providing a small framework for writing pipelines (on top of Beam), so we have 
control over what data stream is available to pipeline developers. I am looking 
for strategies for the following:


  1.  How can I partition/group the data so that pipeline developers don't 
need to care about tenancy, while data integrity is still maintained?
  2.  How can I assign compute (e.g. worker nodes) to different jobs based on 
tenant configuration?
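For (1), one common strategy is to have the framework key every element by tenant ID as soon as records leave the Kafka source, so any downstream grouping the pipeline developer writes is automatically tenant-scoped and records from different tenants can never be mixed. The sketch below is plain Python (not the Beam API) just to illustrate the keying idea; the `"tenant"` field on each record is an assumption about how tenancy is encoded in your shared topic (it could equally be a Kafka header or part of the message key):

```python
from collections import defaultdict

# Simulated records from one shared Kafka topic; the "tenant" field is an
# assumption about how tenancy is encoded in your records.
records = [
    {"tenant": "acme",   "key": "user1", "value": 10},
    {"tenant": "acme",   "key": "user1", "value": 5},
    {"tenant": "globex", "key": "user1", "value": 7},
]

def tenant_scoped_key(record):
    # Prefix the developer-visible key with the tenant ID, so any
    # group-by downstream is automatically partitioned per tenant.
    return (record["tenant"], record["key"])

groups = defaultdict(list)
for rec in records:
    groups[tenant_scoped_key(rec)].append(rec["value"])

# Two tenants using the same user-level key land in separate groups.
print(dict(groups))
```

In a Beam pipeline the same idea would be a framework-owned transform that maps each element to a composite `(tenant_id, developer_key)` key before handing the stream to pipeline code, so developers only ever see their own tenant's slice.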
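For (2), one approach is to submit a separate Flink job per tenant (or per tenant tier) and derive that job's resources, such as parallelism and slot sharing group, from tenant configuration at submission time. The sketch below shows the config-to-resources mapping only; the tier names and field names are hypothetical, and in practice the resulting values would feed into your Flink job settings (e.g. the runner's parallelism option):

```python
# Hypothetical tenant configuration; tiers and field names are assumptions.
TENANT_CONFIG = {
    "acme":   {"tier": "premium"},
    "globex": {"tier": "basic"},
}

# Map each tier to the compute it is entitled to. Slot sharing groups let
# Flink keep operators of different jobs on separate task slots.
TIER_RESOURCES = {
    "premium": {"parallelism": 8, "slot_sharing_group": "premium"},
    "basic":   {"parallelism": 2, "slot_sharing_group": "basic"},
}

def resources_for(tenant):
    # Resolve a tenant to its resource assignment at job-submission time.
    tier = TENANT_CONFIG[tenant]["tier"]
    return TIER_RESOURCES[tier]

print(resources_for("acme"))
print(resources_for("globex"))
```

The advantage of per-tenant jobs is isolation: a noisy tenant's backlog or failure doesn't stall other tenants, at the cost of more jobs to operate.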

Thanks,
Aparup
