Thanks for your quick replies, Nico and Kien. Since I am using Flink 1.3.0,
I will try Nico's idea. I might bug you again with future problems. 😊

Regards,



Eranga Heshan
*Undergraduate*
Computer Science & Engineering
University of Moratuwa
Mobile: +94 71 138 2686
Email: eranga....@gmail.com
<https://www.facebook.com/erangaheshan>   <https://twitter.com/erangaheshan>
   <https://www.linkedin.com/in/erangaheshan>

On Mon, Aug 14, 2017 at 10:36 PM, Nico Kruber <n...@data-artisans.com>
wrote:

> Hi Eranga and Kien,
> Flink has supported asynchronous I/O since version 1.2, see [1] for details.
>
> You basically pack your URL download into the asynchronous part and collect
> the resulting string for further processing in your pipeline.
>
>
>
> Nico
>
>
> [1] https://ci.apache.org/projects/flink/flink-docs-release-1.3/dev/stream/asyncio.html
>
> On Monday, 14 August 2017 17:50:47 CEST Kien Truong wrote:
> > Hi,
> >
> > While this task is quite trivial to do with the Flink DataSet API, using
> > readTextFile to read the input and a flatMap function to perform the
> > downloading, it might not be a good idea.
> >
> > The download process is I/O bound and will block the synchronous
> > flatMap function, so the throughput will not be very good.
> >
> >
> > Until Flink supports asynchronous functions, I suggest you look elsewhere.
> >
> > An example of a master-worker architecture using Akka can be found here:
> >
> > https://github.com/typesafehub/activator-akka-distributed-workers
> >
> >
> > Regards,
> >
> > Kien
> >
> > On 8/14/2017 10:09 AM, Eranga Heshan wrote:
> > > Hi all,
> > >
> > > I am fairly new to Flink. I have a project where I have a list of
> > > URLs (on one node) which need to be crawled in a distributed manner.
> > > Then, for each URL, I need the serialized crawl result to be written
> > > to a single text file.
> > >
> > > I want to know if there are similar projects which I can look into or
> > > an idea on how to implement this.
> > >
> > > Thanks & Regards,
> > >
> > >
> > >
> > >
> > > Eranga Heshan
> > > /Undergraduate/
> > > Computer Science & Engineering
> > > University of Moratuwa
> > > Mobile: +94 71 138 2686
> > > Email: eranga....@gmail.com
> > > <https://www.facebook.com/erangaheshan>
> > > <https://twitter.com/erangaheshan>
> > > <https://www.linkedin.com/in/erangaheshan>
>
>
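
As a rough illustration of the pattern Nico describes (start the download
asynchronously, then collect the resulting string for further processing),
here is a minimal plain-Java sketch. It deliberately does not use Flink's API:
the class name AsyncFetchSketch, the methods fetchAsync and fetchAll, the pool
size of 4, and the fake downloader are all assumptions made for illustration.
In a Flink job the same idea would live inside an AsyncFunction, as covered
in [1].

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Function;

// Hypothetical sketch: none of these names come from Flink or from the thread.
class AsyncFetchSketch {

    // Submits one download to the pool and returns a future immediately,
    // so the caller does not wait while the download is in flight.
    static CompletableFuture<String> fetchAsync(String url,
                                                Function<String, String> downloader,
                                                ExecutorService pool) {
        return CompletableFuture.supplyAsync(() -> downloader.apply(url), pool);
    }

    // Issues every request first, then collects the resulting strings
    // in the same order as the input URLs.
    static List<String> fetchAll(List<String> urls, Function<String, String> downloader) {
        ExecutorService pool = Executors.newFixedThreadPool(4); // assumed pool size
        try {
            List<CompletableFuture<String>> pending = new ArrayList<>();
            for (String url : urls) {
                pending.add(fetchAsync(url, downloader, pool));
            }
            List<String> results = new ArrayList<>();
            for (CompletableFuture<String> future : pending) {
                results.add(future.join()); // waits only for this one result
            }
            return results;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        // Stand-in for a real HTTP call, so the sketch runs without network access.
        Function<String, String> fakeDownloader = url -> "body of " + url;
        List<String> pages = fetchAll(
                Arrays.asList("http://a.example", "http://b.example"), fakeDownloader);
        for (String page : pages) {
            System.out.println(page);
        }
    }
}
```

The downloader is passed in as a function so that a real crawler could swap in
an actual HTTP client call without touching the rest of the sketch.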
