Thank you Aljoscha :-) I actually need it for a Kafka stream, so I use the DataStream API anyway.
Regards,

Eranga Heshan
Undergraduate
Computer Science & Engineering
University of Moratuwa
Mobile: +94 71 138 2686
Email: eranga....@gmail.com

On Fri, Aug 25, 2017 at 5:53 PM, Aljoscha Krettek <aljos...@apache.org> wrote:

> Hi,
>
> It is not available for the Batch API, you would have to use the
> DataStream API.
>
> Best,
> Aljoscha
>
> On 15. Aug 2017, at 01:16, Kien Truong <duckientru...@gmail.com> wrote:
>
> Hi,
>
> Admittedly, I did not suggest this because I thought it was not
> available for the batch API.
>
> Regards,
> Kien
>
> On Aug 15, 2017, at 00:06, Nico Kruber <n...@data-artisans.com> wrote:
>>
>> Hi Eranga and Kien,
>> Flink has supported asynchronous I/O since version 1.2, see [1] for details.
>>
>> You basically pack your URL download into the asynchronous part and collect
>> the resulting string for further processing in your pipeline.
>>
>> Nico
>>
>> [1] https://ci.apache.org/projects/flink/flink-docs-release-1.3/dev/stream/asyncio.html
>>
>> On Monday, 14 August 2017 17:50:47 CEST Kien Truong wrote:
>>
>>> Hi,
>>>
>>> While this task is quite trivial to do with the Flink DataSet API, using
>>> readTextFile to read the input and a flatMap function to perform the
>>> downloading, it might not be a good idea.
>>>
>>> The download process is I/O bound and will block the synchronous
>>> flatMap function, so the throughput will not be very good.
>>>
>>> Until Flink supports asynchronous functions, I suggest you look elsewhere.
>>> An example of a master-workers architecture using Akka can be found here:
>>> https://github.com/typesafehub/activator-akka-distributed-workers
>>>
>>> Regards,
>>> Kien
>>>
>>> On 8/14/2017 10:09 AM, Eranga Heshan wrote:
>>>
>>>> Hi all,
>>>>
>>>> I am fairly new to Flink. I have a project with a list of URLs (on one
>>>> node) which needs to be crawled in a distributed manner. Then, for each
>>>> URL, I need the serialized crawl result to be written to a single text
>>>> file.
>>>>
>>>> I would like to know if there are similar projects I can look into, or
>>>> ideas on how to implement this.
>>>>
>>>> Thanks & Regards,
>>>>
>>>> Eranga Heshan
>>>> Undergraduate
>>>> Computer Science & Engineering
>>>> University of Moratuwa
>>>> Mobile: +94 71 138 2686
>>>> Email: eranga....@gmail.com
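Nico's suggestion (pack the URL download into the asynchronous part) corresponds to Flink's async I/O API in [1]: you wrap the request in an `AsyncFunction` and apply it with `AsyncDataStream.unorderedWait`. The core non-blocking pattern that addresses Kien's throughput concern can be sketched without Flink on the classpath; everything below (`fetch`, the URLs, the simulated latency) is a hypothetical stand-in, not the real crawler:

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.stream.Collectors;

public class AsyncFetchSketch {

    // Hypothetical stand-in for a URL download; a real job would use an
    // asynchronous HTTP client and complete the future from its callback.
    static String fetch(String url) {
        try {
            Thread.sleep(100); // simulate network latency
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return "content-of:" + url;
    }

    // Issue all downloads concurrently and then wait for the results,
    // instead of blocking on one download per element the way a
    // synchronous flatMap would.
    static List<String> fetchAll(List<String> urls) {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        try {
            List<CompletableFuture<String>> futures = urls.stream()
                    .map(u -> CompletableFuture.supplyAsync(() -> fetch(u), pool))
                    .collect(Collectors.toList());
            return futures.stream()
                    .map(CompletableFuture::join)
                    .collect(Collectors.toList());
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        long start = System.nanoTime();
        List<String> results =
                fetchAll(List.of("http://a.example", "http://b.example", "http://c.example"));
        long ms = (System.nanoTime() - start) / 1_000_000;
        results.forEach(System.out::println);
        // The three 100 ms downloads overlap, so the total elapsed time is
        // close to one latency rather than three.
        System.out.println("overlapped=" + (ms < 250));
    }
}
```

In the Flink version, the body of `fetchAll` collapses to the `asyncInvoke` method of an `AsyncFunction`, and Flink itself manages the in-flight requests and backpressure.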
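For the "single text file" part of the original question, the usual Flink answer is to force the sink to parallelism 1 (e.g. `results.writeAsText(path).setParallelism(1)`), so one writer owns the file while the crawl itself stays parallel. A minimal, library-free sketch of that single-writer idea, with hypothetical names throughout:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;

public class SingleWriterSketch {

    // Hypothetical crawl: a real job would download and serialize the page.
    static String crawl(String url) {
        return "crawled:" + url;
    }

    // Many workers crawl in parallel; all results funnel through one queue
    // and are drained in a single place, so only one "writer" ever touches
    // the output -- the same guarantee a parallelism-1 sink gives in Flink.
    static List<String> crawlToSingleLog(List<String> urls) {
        BlockingQueue<String> queue = new LinkedBlockingQueue<>();
        ExecutorService workers = Executors.newFixedThreadPool(4);
        CountDownLatch done = new CountDownLatch(urls.size());
        for (String url : urls) {
            workers.submit(() -> {
                queue.add(crawl(url));
                done.countDown();
            });
        }
        try {
            done.await(); // wait for all crawlers to finish
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        workers.shutdown();
        List<String> lines = new ArrayList<>();
        queue.drainTo(lines); // the single writer: one consumer, one file
        return lines;
    }

    public static void main(String[] args) {
        List<String> lines =
                crawlToSingleLog(List.of("http://a.example", "http://b.example"));
        System.out.println(lines.size() + " results collected by one writer");
    }
}
```

The trade-off is the same as in Flink: the single writer is a throughput bottleneck for the write path, which is usually acceptable when the crawl, not the file append, dominates the cost.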