Hi,

Well, what we are doing at King tries to solve a similar problem. It
would be great if you could read the blog post, as it goes into detail
about the actual implementation, but let me give a quick recap here:

We are building a stream processing system that data scientists and other
developers at King share, in a way that they can use it through a simple web
interface without knowing any operational details. The stream processing
system itself is one complex Flink job that both receives events and
executes the user scripts/jobs, which are written in a higher-level DSL.

The DSL is designed so that we can execute the operations in a fixed
streaming topology instead of having to dynamically deploy a new job for
every new script. Both scripts and events are sent through Kafka, which
makes our backend Flink job naturally multi-tenant. This is of course not
always appropriate, as there is no resource isolation between individual
scripts, but we can work around that by dedicating backend jobs to different
teams.
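
The fixed-topology idea above can be sketched in a few lines of plain
Python (no Flink or Kafka, and the `Message`/`Processor` names are
hypothetical, not King's actual API): one long-running processor consumes a
single stream carrying both control messages (script deployments) and data
messages (events), so new user scripts take effect without redeploying the
job, and state is kept per tenant.

```python
from dataclasses import dataclass, field

@dataclass
class Message:
    kind: str      # "script" (deploy a user script) or "event" (data)
    tenant: str    # tenant/team identifier
    payload: object

@dataclass
class Processor:
    # tenant -> list of registered script functions
    scripts: dict = field(default_factory=dict)
    outputs: list = field(default_factory=list)

    def process(self, msg: Message) -> None:
        if msg.kind == "script":
            # "Deploying" a script is just registering a function at runtime;
            # the topology itself never changes.
            self.scripts.setdefault(msg.tenant, []).append(msg.payload)
        else:
            # Run every script this tenant has registered on the event.
            for fn in self.scripts.get(msg.tenant, []):
                self.outputs.append((msg.tenant, fn(msg.payload)))

p = Processor()
p.process(Message("script", "team-a", lambda e: e["value"] * 2))
p.process(Message("event", "team-a", {"value": 21}))
p.process(Message("event", "team-b", {"value": 1}))  # no script registered
print(p.outputs)  # [('team-a', 42)]
```

In the real system the equivalent of `process` runs inside Flink operators
and the mixed message stream arrives via Kafka, which is what makes the
single backend job multi-tenant by construction.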

Let me know if this helps!
Gyula

Aparup Banerjee (apbanerj) <apban...@cisco.com> wrote (on Tue, 26 Jul
2016 at 14:50):

> Thanks.
>
> Hi Gyula, anything you can share on this?
>
> Aparup
>
>
>
>
> On 7/26/16, 4:38 AM, "Ufuk Celebi" <u...@apache.org> wrote:
>
> >On Mon, Jul 25, 2016 at 5:38 AM, Aparup Banerjee (apbanerj)
> ><apban...@cisco.com> wrote:
> >> We are building a stream processing system using Apache Beam on top of
> >> Flink using the Flink runner. Our pipelines take Kafka streams as
> >> sources, and can write to multiple sinks. The system needs to be
> >> tenant-aware. Tenants can share the same Kafka topic. Tenants can write
> >> their own pipelines. We are providing a small framework to write
> >> pipelines (on top of Beam), so we have control of what data stream is
> >> available to pipeline developers. I am looking for some strategies for
> >> the following:
> >>
> >> How can I partition / group the data in a way that pipeline developers
> >> don't need to care about tenancy, but data integrity is maintained?
> >> Ways in which I can assign compute (worker nodes, e.g.) to different
> >> jobs based on tenant configuration.
> >
> >There is no built-in support for this in Flink, but King.com worked on
> >something similar using custom operators. You can check out the blog
> >post here: https://techblog.king.com/rbea-scalable-real-time-analytics-king/
> >
> >I'm pulling in Gyula (cc'd) who worked on the implementation at King...
>