Hi Arvid,
Thank you for your detailed answer. After reading it, I realized that I
did not understand the difference between the micro-batch model and the
continuous (one-by-one) processing model well. I am familiar with the
micro-batch model but not with the continuous one, so I will look for
some documentation on it. Thank you again for your answer.
Regards,
Yuta
On 2020/11/02 1:07, Arvid Heise wrote:
Hi Yuta,
there are indeed a few important differences between Spark and Flink.
However, please also note that the different APIs behave differently on
the two systems, so it would be good if you could clarify what you are
doing; then I can go into more detail.
As a starting point, you can always check the architecture overview page
[1] of Flink.
Then keep in mind that Flink approaches scheduling from a streaming
perspective and Spark from a batch perspective. In Flink, most tasks are
long-running, with a few exceptions (the pure batch API = Spark's
default), whereas in Spark tasks are usually scheduled in waves, with a
few exceptions (continuous processing in Structured Streaming = Flink's
default).
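To make that contrast concrete, here is a minimal conceptual sketch (not Flink or Spark API; the function names are my own for illustration): a micro-batch engine buffers records and schedules a short-lived job per batch, while a continuous engine keeps long-running tasks that handle each record as it arrives.

```python
def micro_batch(records, batch_size, op):
    """Micro-batch model: group records into batches, then run one
    scheduled 'wave' of work per batch (Spark's default)."""
    out = []
    for i in range(0, len(records), batch_size):
        batch = records[i:i + batch_size]   # buffered until the batch fills
        out.extend(op(r) for r in batch)    # one short-lived job per batch
    return out

def continuous(records, op):
    """Continuous model: a long-running task applies `op` to every
    record immediately as it arrives (Flink's default)."""
    return [op(r) for r in records]

double = lambda x: x * 2
data = [1, 2, 3, 4, 5]

# Both produce the same results; the difference is per-record latency
# and how the work is scheduled on the cluster.
print(micro_batch(data, batch_size=2, op=double))  # [2, 4, 6, 8, 10]
print(continuous(data, op=double))                 # [2, 4, 6, 8, 10]
```

In practice the scheduling difference matters for latency (a record waits for its batch in the micro-batch model) and for resource allocation (wave scheduling vs. permanently deployed tasks), which is exactly where the two systems' architectures diverge.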
Note that there is also quite a bit in motion in both systems. In Flink,
we are trying to get rid of the old batch subsystem and fully integrate
it into streaming, so that the actual scheduling mode is determined more
dynamically for different parts of the application. Think of a job where
you need to do some batch preprocessing to build up a dictionary and
then use it to enrich streaming data. Over the next year, Flink should
become able to fully exploit the data properties of streaming and batch
tasks within the same application. Spark also seems to be working toward
supporting more complex applications in continuous processing mode (so
beyond the current embarrassingly parallel operations), for which they
may also need to revise their scheduling model.
[1]
https://ci.apache.org/projects/flink/flink-docs-release-1.11/concepts/flink-architecture.html
On Fri, Oct 30, 2020 at 10:05 AM Yuta Morisawa
<yu-moris...@kddi-research.jp> wrote:
Hello,
I am wondering whether Flink operators synchronize their execution
states the way Apache Spark does. In Apache Spark, the master decides
everything: for example, it schedules jobs and assigns tasks to
Executors so that each job is executed in a synchronized way. But Flink
looks different. It appears that each TaskManager is dedicated to
specific operators and executes its tasks asynchronously. Is this
understanding correct?
In short, I want to know how Flink assigns tasks to TaskManagers and
how it manages them, because I think this is important for performance
tuning. Could you point me to any detailed documentation?
Regards,
Yuta
--
Arvid Heise | Senior Java Developer
<https://www.ververica.com/>
Follow us @VervericaData
--
Join Flink Forward <https://flink-forward.org/> - The Apache Flink Conference
Stream Processing | Event Driven | Real Time
--
Ververica GmbH | Invalidenstrasse 115, 10115 Berlin, Germany
--
Ververica GmbH
Registered at Amtsgericht Charlottenburg: HRB 158244 B
Managing Directors: Timothy Alexander Steinert, Yip Park Tung Jason, Ji
(Toni) Cheng