Hi,

Admittedly, I did not suggest this because I thought it was not available for the batch API.
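That said, since the URL list is just a text file, the streaming API can read it as well, and then the async I/O that Nico linked applies directly. Below is a rough, untested sketch of that approach; fetch() and the file paths are placeholders, and the AsyncCollector-based signature follows the 1.3 docs, so details may differ in other versions:

import org.apache.flink.streaming.api.datastream.AsyncDataStream;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.functions.async.RichAsyncFunction;
import org.apache.flink.streaming.api.functions.async.collector.AsyncCollector;

import java.util.Collections;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.TimeUnit;

public class AsyncUrlCrawl {

    // Downloads each URL asynchronously and emits the page body.
    public static class DownloadFunction extends RichAsyncFunction<String, String> {
        @Override
        public void asyncInvoke(String url, AsyncCollector<String> collector) {
            CompletableFuture
                .supplyAsync(() -> fetch(url))  // download off the operator thread
                .thenAccept(body -> collector.collect(Collections.singleton(body)));
        }
    }

    private static String fetch(String url) {
        // placeholder for a real HTTP GET (e.g. an async HTTP client)
        return "<contents of " + url + ">";
    }

    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        DataStream<String> urls = env.readTextFile("hdfs:///path/to/urls.txt");

        // at most 100 downloads in flight, each with a 30 s timeout
        DataStream<String> pages = AsyncDataStream.unorderedWait(
                urls, new DownloadFunction(), 30, TimeUnit.SECONDS, 100);

        // parallelism 1 on the sink so the results land in a single file
        pages.writeAsText("hdfs:///path/to/output").setParallelism(1);

        env.execute("Async URL crawl");
    }
}

The capacity argument of unorderedWait caps the number of concurrent requests, and setParallelism(1) on the sink keeps the output in one file, which was the original requirement.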
Regards,
Kien

On Aug 15, 2017, at 00:06, Nico Kruber <n...@data-artisans.com> wrote:
>Hi Eranga and Kien,
>
>Flink supports asynchronous IO since version 1.2, see [1] for details.
>You basically pack your URL download into the asynchronous part and
>collect the resulting string for further processing in your pipeline.
>
>Nico
>
>[1] https://ci.apache.org/projects/flink/flink-docs-release-1.3/dev/stream/asyncio.html
>
>On Monday, 14 August 2017 17:50:47 CEST Kien Truong wrote:
>> Hi,
>>
>> While this task is quite trivial to do with the Flink DataSet API, using
>> readTextFile to read the input and a flatMap function to perform the
>> downloading, it might not be a good idea.
>>
>> The download process is I/O bound and will block the synchronous flatMap
>> function, so the throughput will not be very good.
>>
>> Until Flink supports asynchronous functions, I suggest you look elsewhere.
>> An example of a master-workers architecture using Akka can be found here:
>>
>> https://github.com/typesafehub/activator-akka-distributed-workers
>>
>> Regards,
>> Kien
>>
>> On 8/14/2017 10:09 AM, Eranga Heshan wrote:
>> > Hi all,
>> >
>> > I am fairly new to Flink. I have this project where I have a list of
>> > URLs (in one node) which need to be crawled in a distributed manner.
>> > Then, for each URL, I need the serialized crawled result to be written
>> > to a single text file.
>> >
>> > I want to know if there are similar projects which I can look into, or
>> > an idea on how to implement this.
>> >
>> > Thanks & Regards,
>> >
>> > Eranga Heshan
>> > Undergraduate
>> > Computer Science & Engineering
>> > University of Moratuwa
>> > Mobile: +94 71 138 2686
>> > Email: eranga....@gmail.com
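For completeness, the DataSet variant I described in the quoted message above (readTextFile plus a synchronous flatMap that performs the download) would look roughly like the sketch below. fetch() and the paths are again placeholders, and the blocking call inside flatMap is exactly the throughput problem mentioned earlier:

import org.apache.flink.api.common.functions.FlatMapFunction;
import org.apache.flink.api.java.DataSet;
import org.apache.flink.api.java.ExecutionEnvironment;
import org.apache.flink.util.Collector;

public class BlockingUrlCrawl {

    public static void main(String[] args) throws Exception {
        ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();

        DataSet<String> urls = env.readTextFile("hdfs:///path/to/urls.txt");

        DataSet<String> pages = urls.flatMap(new FlatMapFunction<String, String>() {
            @Override
            public void flatMap(String url, Collector<String> out) {
                // blocks the task slot until the download finishes
                out.collect(fetch(url));
            }
        });

        // single output file, as in the original question
        pages.writeAsText("hdfs:///path/to/output").setParallelism(1);

        env.execute("Blocking URL crawl");
    }

    private static String fetch(String url) {
        // placeholder for a real HTTP GET
        return "<contents of " + url + ">";
    }
}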