Thanks for your quick replies, Nico and Kien. Since I am using Flink-1.3.0, I will try Nico's idea. I might bug you again for my future problems. 😊
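For reference, here is a minimal sketch of the non-blocking pattern Nico describes: start each download asynchronously and collect the resulting strings afterwards. It deliberately uses plain Java `CompletableFuture` rather than Flink's async I/O API, so it only illustrates the idea; in a Flink job the asynchronous part would live inside the async function described in the docs Nico linked. The names `AsyncCrawlSketch`, `crawlAll`, `fetch` and `maxConcurrent` are hypothetical, and the fetcher is a stand-in for a real HTTP download.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Function;

// Hypothetical helper, not part of Flink: it shows the non-blocking pattern only.
class AsyncCrawlSketch {

    // Starts one asynchronous download per URL on a bounded thread pool and
    // collects the results in input order. `fetch` is an assumed stand-in for
    // the real (blocking) HTTP download.
    static List<String> crawlAll(List<String> urls,
                                 Function<String, String> fetch,
                                 int maxConcurrent) {
        ExecutorService pool = Executors.newFixedThreadPool(maxConcurrent);
        try {
            List<CompletableFuture<String>> pending = new ArrayList<>();
            for (String url : urls) {
                pending.add(CompletableFuture
                        .supplyAsync(() -> fetch.apply(url), pool)
                        // A failed download becomes an error record instead of
                        // failing the whole batch.
                        .exceptionally(e -> "ERROR " + url));
            }
            // Wait for every download and gather the resulting strings.
            List<String> results = new ArrayList<>();
            for (CompletableFuture<String> f : pending) {
                results.add(f.join());
            }
            return results;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        // Fake fetcher so the sketch runs without network access.
        Function<String, String> fakeFetch = url -> url + " -> <html>...</html>";
        List<String> urls = Arrays.asList("http://example.org/a", "http://example.org/b");
        for (String line : crawlAll(urls, fakeFetch, 4)) {
            System.out.println(line);
        }
    }
}
```

The bounded pool caps how many downloads are in flight at once, which plays roughly the same role as a capacity limit on in-flight requests in an async operator. Writing the collected strings to a single text file would be a separate step after this one.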
Regards,

Eranga Heshan
*Undergraduate*
Computer Science & Engineering
University of Moratuwa
Mobile: +94 71 138 2686
Email: eranga....@gmail.com
<https://www.facebook.com/erangaheshan>
<https://twitter.com/erangaheshan>
<https://www.linkedin.com/in/erangaheshan>

On Mon, Aug 14, 2017 at 10:36 PM, Nico Kruber <n...@data-artisans.com> wrote:

> Hi Eranga and Kien,
> Flink supports asynchronous IO since version 1.2, see [1] for details.
>
> You basically pack your URL download into the asynchronous part and collect
> the resulting string for further processing in your pipeline.
>
> Nico
>
> [1] https://ci.apache.org/projects/flink/flink-docs-release-1.3/dev/stream/asyncio.html
>
> On Monday, 14 August 2017 17:50:47 CEST Kien Truong wrote:
> > Hi,
> >
> > While this task is quite trivial to do with the Flink DataSet API, using
> > readTextFile to read the input and a flatMap function to perform the
> > downloading, it might not be a good idea.
> >
> > The download process is I/O bound and will block the synchronous
> > flatMap function, so the throughput will not be very good.
> >
> > Until Flink supports asynchronous functions, I suggest you look elsewhere.
> >
> > An example with a master-workers architecture using Akka can be found here:
> >
> > https://github.com/typesafehub/activator-akka-distributed-workers
> >
> > Regards,
> >
> > Kien
> >
> > On 8/14/2017 10:09 AM, Eranga Heshan wrote:
> > > Hi all,
> > >
> > > I am fairly new to Flink. I have this project where I have a list of
> > > URLs (on one node) which need to be crawled in a distributed manner.
> > > Then, for each URL, I need the serialized crawled result to be written
> > > to a single text file.
> > >
> > > I want to know if there are similar projects which I can look into or
> > > an idea on how to implement this.
> > >
> > > Thanks & Regards,
> > >
> > > Eranga Heshan