Re: run flink on edge vs hub

Eleanore Jin Mon, 18 May 2020 21:39:45 -0700

Hi Arvid,

Thanks for the suggestion! I will try it out and see how it works.


Best,
Eleanore

On Mon, May 18, 2020 at 8:04 AM Arvid Heise <ar...@ververica.com> wrote:

> Hi Eleanore,
>
> The question in general is what you mean by edge data centers, as
> the term is pretty fuzzy. Since Flink runs on Java, it's not suitable
> for embedded clusters as of now. That said, plenty of work has already been
> done to test that Flink runs on ARM clusters [1].
>
> If you just mean in general moving away from a monolithic hub cluster to
> smaller clusters, then this is easily done with Flink on the compute side.
> The question is rather how data storage should look in such an edge setting
> and how the interfaces look.
>
> From your example, it seems as if you want to use Flink as a reactive
> server, possibly easily scalable. If so, then yes it is possible with
> Flink, even though I'd say it's not the primary use case for Flink. In any
> case, synchronous requests will be a bit difficult/unnatural. I'd probably
> go for an async job pattern: Flink listens on some port for requests
> (socketTextStream [2]) that carry a job id, processes the data, and keeps
> the data in state keyed by job id. The client then uses the job id to fetch
> the job state through queryable state [3]. The responses eventually time
> out through TTL [4].
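
A minimal sketch of the pattern described above, written in plain Java with no Flink dependency so it can be run on its own. It only simulates the flow: a request gets a job id, the result is kept in a map keyed by that id (standing in for keyed state), a lookup by job id stands in for a queryable-state query, and entries expire after a TTL. The class and method names (AsyncJobSketch, submit, fetch) are hypothetical, the transformation runs inline rather than asynchronously, and the injected clock exists only to make the expiry easy to demonstrate.

```java
import java.util.Map;
import java.util.Optional;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.LongSupplier;
import java.util.function.UnaryOperator;

/** Flink-free simulation: results keyed by job id, fetched later, expired by TTL. */
class AsyncJobSketch {
    /** One stored result plus the time it was written. */
    private static final class Entry {
        final String result;
        final long storedAtMillis;

        Entry(String result, long storedAtMillis) {
            this.result = result;
            this.storedAtMillis = storedAtMillis;
        }
    }

    private final Map<String, Entry> stateByJobId = new ConcurrentHashMap<>();
    private final UnaryOperator<String> transformation;
    private final long ttlMillis;
    private final LongSupplier clock; // assumption: injected so expiry is testable

    AsyncJobSketch(UnaryOperator<String> transformation, long ttlMillis, LongSupplier clock) {
        this.transformation = transformation;
        this.ttlMillis = ttlMillis;
        this.clock = clock;
    }

    /** Generates a job id, processes the payload, and stores the result under that id. */
    String submit(String payload) {
        String jobId = UUID.randomUUID().toString();
        // Simplification: a real job would process this asynchronously.
        stateByJobId.put(jobId, new Entry(transformation.apply(payload), clock.getAsLong()));
        return jobId;
    }

    /** Looks up a result by job id; empty if the id is unknown or the entry has expired. */
    Optional<String> fetch(String jobId) {
        Entry entry = stateByJobId.get(jobId);
        if (entry == null) {
            return Optional.empty();
        }
        if (clock.getAsLong() - entry.storedAtMillis >= ttlMillis) {
            stateByJobId.remove(jobId); // drop the expired entry
            return Optional.empty();
        }
        return Optional.of(entry.result);
    }

    public static void main(String[] args) {
        long[] now = {0L}; // fake clock, advanced by hand
        AsyncJobSketch job = new AsyncJobSketch(String::toUpperCase, 1000L, () -> now[0]);
        String jobId = job.submit("hello");
        System.out.println(job.fetch(jobId)); // prints Optional[HELLO]
        now[0] = 1000L;                       // TTL has elapsed
        System.out.println(job.fetch(jobId)); // prints Optional.empty
    }
}
```
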
>
> Of course, you'd put a small proxy in front of that composite job
> (separate input/query port) that translates the queries from the client to
> the Flink job. The proxy would most likely also generate the job id and
> return it to the client. Ultimately, that proxy could offer a synchronous
> interface and poll for the result itself, but that makes the proxy suddenly
> quite heavy.
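
As an illustration of the synchronous variant mentioned above, here is a small plain-Java sketch of a proxy-side helper that polls a lookup until a result appears or a timeout passes. The names (SyncProxySketch, awaitResult) and the timeout and interval parameters are assumptions made for this example; the lookup is a generic Supplier standing in for whatever query the proxy would send to the job.

```java
import java.util.Optional;
import java.util.function.Supplier;

/** Sketch of the blocking part of a synchronous proxy: poll until a result is there. */
class SyncProxySketch {
    /**
     * Calls the lookup repeatedly until it yields a value or the timeout passes.
     * Returns empty on timeout or if the waiting thread is interrupted.
     */
    static <T> Optional<T> awaitResult(
            Supplier<Optional<T>> lookup, long timeoutMillis, long pollIntervalMillis) {
        long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
        while (true) {
            Optional<T> result = lookup.get();
            if (result.isPresent() || System.nanoTime() >= deadline) {
                return result;
            }
            try {
                Thread.sleep(pollIntervalMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt for callers
                return Optional.empty();
            }
        }
    }

    public static void main(String[] args) {
        int[] calls = {0};
        // Hypothetical lookup whose result becomes available on the third poll.
        Supplier<Optional<String>> lookup =
                () -> ++calls[0] < 3 ? Optional.<String>empty() : Optional.of("done");
        Optional<String> result = awaitResult(lookup, 1000L, 10L);
        System.out.println(result.orElse("timed out") + " after " + calls[0] + " polls");
        // prints: done after 3 polls
    }
}
```

Each waiting client occupies a thread for the duration of the poll loop, which is one concrete way the proxy becomes heavy.
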
>
> The proxy setup can be reused for different edge clusters, making it a
> one-time investment. Note that there are other software stacks for reactive
> servers that offer this functionality out of the box.
>
> [1]
> http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-ARM-support-for-Flink-td30298.html
> [2]
> https://ci.apache.org/projects/flink/flink-docs-release-1.10/dev/datastream_api.html#data-sources
> [3]
> https://ci.apache.org/projects/flink/flink-docs-stable/dev/stream/state/queryable_state.html
> [4]
> https://ci.apache.org/projects/flink/flink-docs-stable/dev/stream/state/state.html#state-time-to-live-ttl
>
> On Mon, May 18, 2020 at 4:39 AM Eleanore Jin <eleanore....@gmail.com>
> wrote:
>
>> Hi Community,
>>
>> Currently we are running Flink in 'hub' data centers, where data is
>> ingested into the platform via Kafka, and a Flink job reads from Kafka,
>> does the transformations, and publishes to another Kafka topic.
>>
>> I would also like to see if the same logic (read input message -> do
>> transformation -> return output message) can be applied on 'edge' data
>> centers.
>>
>> The requirement for running on 'edge' is to return the response
>> synchronously, like a synchronous HTTP-based request/response.
>>
>> Can you please provide some guidance/thoughts on this?
>>
>> Thanks a lot!
>> Eleanore
>>
>>
>
> --
>
> Arvid Heise | Senior Java Developer
