Thanks for the reply. I am not really looking for an alternative; I am trying to understand what Flink does in this scenario. If 10 tasks are going in parallel, I am sure they will each be reading the csv, as there is no other way.
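Roughly what I have in mind (a minimal sketch against the 1.4-era batch Table API; the path, schema, and filter values are made up for illustration):

    // Sketch of the scenario: one csv source, ten filtered tables,
    // ten sinks. Path, schema, and filter values are hypothetical.
    import org.apache.flink.api.common.typeinfo.BasicTypeInfo;
    import org.apache.flink.api.java.ExecutionEnvironment;
    import org.apache.flink.table.api.Table;
    import org.apache.flink.table.api.TableEnvironment;
    import org.apache.flink.table.api.java.BatchTableEnvironment;
    import org.apache.flink.table.sinks.CsvTableSink;
    import org.apache.flink.table.sources.CsvTableSource;

    public class TenFilteredSinks {
      public static void main(String[] args) throws Exception {
        ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
        env.setParallelism(40);
        BatchTableEnvironment tEnv = TableEnvironment.getTableEnvironment(env);

        CsvTableSource source = CsvTableSource.builder()
            .path("/data/big.csv")
            .field("id", BasicTypeInfo.INT_TYPE_INFO)
            .field("category", BasicTypeInfo.STRING_TYPE_INFO)
            .build();
        tEnv.registerTableSource("big", source);
        Table big = tEnv.scan("big");

        // Ten filtered views of the same table, each with its own sink.
        for (int i = 0; i < 10; i++) {
          big.filter("category = 'c" + i + "'")
             .writeToSink(new CsvTableSink("/out/filtered-" + i, ","));
        }

        // One job is submitted; all 10 sink pipelines run inside it,
        // but each pipeline contains its own scan of the csv source.
        env.execute();
      }
    }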
Thanks

On Mon, Feb 19, 2018 at 12:48 AM, Niclas Hedhman <nic...@hedhman.org> wrote:
>
> Do you really need the large single table created in step 2?
>
> If not, what you typically do is that the Csv source first does the common
> transformations. Then, depending on whether the 10 outputs have different
> processing paths or not, you either do a split() to do individual
> processing depending on some criteria, or you just have the sink put each
> record in separate tables.
> You have full control, at each step along the transformation path, over
> whether it can be parallelized or not, and if there are no sequential
> constraints on your model, then you can fill all cores on all hosts quite
> easily.
>
> Even if you need the step 2 table, I would still just treat that as a
> split(), a branch ending in a Sink that does the storage there. No need to
> read records from the file over and over again, nor to store them first in
> the step 2 table and read them out again.
>
> Don't ask *me* about what happens in failure scenarios... I have myself
> not figured that out yet.
>
> HTH
> Niclas
>
> On Mon, Feb 19, 2018 at 3:11 AM, Darshan Singh <darshan.m...@gmail.com>
> wrote:
>
>> Hi, I would like to understand the execution model.
>>
>> 1. I have a csv file which is, say, 10 GB.
>> 2. I created a table from this file.
>>
>> 3. Now I have created filtered tables on this, say 10 of them.
>> 4. Now I created a writeToSink for all these 10 filtered tables.
>>
>> Now my question is: will these 10 filtered tables be written in
>> parallel (suppose I have 40 cores and set parallelism to, say, 40 as
>> well)?
>>
>> The next question I have is whether the table I created from the csv
>> file, which is common to all of them, will be persisted by Flink
>> internally, or whether for each of the 10 filtered tables it will read
>> the csv file, apply the filter, and write to the sink.
>>
>> I think that for all 10 filtered tables it will read the csv again and
>> again, in which case it will be read 10 times. Is my understanding
>> correct, or am I missing something?
>>
>> What if in step 2 I change the table to a dataset and back?
>>
>> Thanks
>>
>
>
>
> --
> Niclas Hedhman, Software Developer
> http://polygene.apache.org - New Energy for Java
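P.S. For reference, the branching Niclas describes might look roughly like
this in the DataSet API (again a sketch, not a tested job; the path, schema,
and categories are invented). The file appears as a single source in the job
graph, and the common part of the pipeline feeds every branch:

    import org.apache.flink.api.java.DataSet;
    import org.apache.flink.api.java.ExecutionEnvironment;
    import org.apache.flink.api.java.tuple.Tuple2;

    public class BranchedSinks {
      public static void main(String[] args) throws Exception {
        ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
        env.setParallelism(40);

        // Defined once; its output is consumed by all ten branches
        // instead of each branch rescanning the file.
        DataSet<Tuple2<Integer, String>> common = env
            .readCsvFile("/data/big.csv")
            .types(Integer.class, String.class);

        // Ten branches, each ending in its own sink.
        for (int i = 0; i < 10; i++) {
          final String category = "c" + i;
          common.filter(t -> category.equals(t.f1))
                .writeAsCsv("/out/branch-" + category);
        }

        env.execute("one source, ten sinks");
      }
    }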