Do you really need the large single table created in step 2? If not, what you typically do is have the CSV source feed the common transformations first. Then, depending on whether the 10 outputs have different processing paths or not, you either do a split() and give each branch its own processing based on some criteria, or you simply have the sink write each record to a separate table. At every step along the transformation path you have full control over whether it can be parallelized, and if there are no sequential constraints in your model you can quite easily keep all cores on all hosts busy.
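To make the shape of that concrete, here is a minimal sketch using the batch DataSet API (the Table API version looks similar). The file path, field layout, filter criteria and the parallelism of 40 are all made up for illustration; the point is just that the file is read once, the common work happens once, and each output is merely a branch ending in its own sink.

import org.apache.flink.api.common.typeinfo.Types;
import org.apache.flink.api.java.DataSet;
import org.apache.flink.api.java.ExecutionEnvironment;
import org.apache.flink.api.java.tuple.Tuple2;

public class CsvFanOutJob {

    public static void main(String[] args) throws Exception {
        ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
        env.setParallelism(40); // matches the 40 cores mentioned in the question

        // Read the CSV once and apply the common parsing/transformation here.
        DataSet<Tuple2<String, Long>> common = env
                .readTextFile("/path/to/input.csv")               // hypothetical path
                .map(line -> {
                    String[] fields = line.split(",");
                    return Tuple2.of(fields[0], Long.parseLong(fields[1]));
                })
                .returns(Types.TUPLE(Types.STRING, Types.LONG));  // lambdas need an explicit result type

        // Each "filtered table" is just another branch off the same DataSet.
        // All branches share the one source in the job graph, so the file is
        // not re-read once per sink within this single job.
        for (int i = 0; i < 10; i++) {
            final String category = "category-" + i;              // hypothetical filter criterion
            common.filter(t -> t.f0.equals(category))
                  .writeAsCsv("/path/to/output-" + i);            // one sink per branch
        }

        // Optionally persist the full, common result as yet another branch.
        common.writeAsCsv("/path/to/full-table");

        env.execute("csv fan-out");
    }
}

Because everything hangs off the same common DataSet inside one job, Flink plans it as a single DAG and runs each operator at the configured parallelism.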
Even if you need the step 2 table, I would still treat that as just another split() branch, one that ends in a Sink doing the storage. There is no need to read records from the file over and over again, nor to store them in the step 2 table first and read them back out. Don't ask *me* what happens in failure scenarios... I haven't figured that out myself yet.

HTH
Niclas

On Mon, Feb 19, 2018 at 3:11 AM, Darshan Singh <darshan.m...@gmail.com> wrote:

> Hi, I would like to understand the execution model.
>
> 1. I have a csv file which is, say, 10 GB.
> 2. I created a table from this file.
> 3. Now I have created filtered tables on this, say 10 of them.
> 4. Now I created a writeToSink for all these 10 filtered tables.
>
> My question is: will these 10 filtered tables be written in parallel
> (suppose I have 40 cores and set the parallelism to 40 as well)?
>
> The next question I have is whether the table I created from the csv file,
> which is common, will be persisted by Flink internally, or whether for each
> of the 10 filtered tables it will read the csv file, apply the filter and
> write to the sink.
>
> I think that for all 10 filtered tables it will read the csv again and
> again, so in this case it will be read 10 times. Is my understanding
> correct, or am I missing something?
>
> What if in step 2 I convert the table to a DataSet and back?
>
> Thanks

--
Niclas Hedhman, Software Developer
http://zest.apache.org - New Energy for Java