Do you really need the large single table created in step 2? If not, what you typically do is have the CSV source feed the common transformations first. Then, depending on whether the 10 outputs have different processing paths or not, you either do a split() and give each branch its own processing based on some criteria, or you simply have the sink write each record to a separate table. At every step along the transformation path you have full control over whether it can be parallelized, and if there are no sequential constraints in your model you can quite easily keep all cores on all hosts busy.
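To make the shape of that concrete, here is a minimal sketch using the batch DataSet API (the Table API version looks similar). The file path, field layout, filter criteria and the parallelism of 40 are all made up for illustration; the point is just that the file is read once, the common work happens once, and each output is merely a branch ending in its own sink.

import org.apache.flink.api.common.typeinfo.Types;
import org.apache.flink.api.java.DataSet;
import org.apache.flink.api.java.ExecutionEnvironment;
import org.apache.flink.api.java.tuple.Tuple2;

public class CsvFanOutJob {

    public static void main(String[] args) throws Exception {
        ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
        env.setParallelism(40); // matches the 40 cores mentioned in the question

        // Read the CSV once and apply the common parsing/transformation here.
        DataSet<Tuple2<String, Long>> common = env
                .readTextFile("/path/to/input.csv")               // hypothetical path
                .map(line -> {
                    String[] fields = line.split(",");
                    return Tuple2.of(fields[0], Long.parseLong(fields[1]));
                })
                .returns(Types.TUPLE(Types.STRING, Types.LONG));  // lambdas need an explicit result type

        // Each "filtered table" is just another branch off the same DataSet.
        // All branches share the one source in the job graph, so the file is
        // not re-read once per sink within this single job.
        for (int i = 0; i < 10; i++) {
            final String category = "category-" + i;              // hypothetical filter criterion
            common.filter(t -> t.f0.equals(category))
                  .writeAsCsv("/path/to/output-" + i);            // one sink per branch
        }

        // Optionally persist the full, common result as yet another branch.
        common.writeAsCsv("/path/to/full-table");

        env.execute("csv fan-out");
    }
}

Because everything hangs off the same common DataSet inside one job, Flink plans it as a single DAG and runs each operator at the configured parallelism.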
Even if you need the step 2 table, I would still treat that as just another split() branch, one that ends in a Sink doing the storage. There is no need to read records from the file over and over again, nor to store them in the step 2 table first and read them back out. Don't ask *me* what happens in failure scenarios... I haven't figured that out myself yet.

HTH
Niclas

On Mon, Feb 19, 2018 at 3:11 AM, Darshan Singh <darshan.m...@gmail.com> wrote:

> Hi, I would like to understand the execution model.
>
> 1. I have a csv file which is, say, 10 GB.
> 2. I created a table from this file.
> 3. Now I have created filtered tables on this, say 10 of them.
> 4. Now I created a writeToSink for all these 10 filtered tables.
>
> My question is: will these 10 filtered tables be written in parallel
> (suppose I have 40 cores and set the parallelism to 40 as well)?
>
> The next question I have is whether the table I created from the csv file,
> which is common, will be persisted by Flink internally, or whether for each
> of the 10 filtered tables it will read the csv file, apply the filter and
> write to the sink.
>
> I think that for all 10 filtered tables it will read the csv again and
> again, so in this case it will be read 10 times. Is my understanding
> correct, or am I missing something?
>
> What if in step 2 I convert the table to a DataSet and back?
>
> Thanks

--
Niclas Hedhman, Software Developer
http://zest.apache.org - New Energy for Java