Thank you Aljoscha :-) I actually need it for a Kafka stream, so I use
DataStream API anyway.

Regards,



Eranga Heshan
*Undergraduate*
Computer Science & Engineering
University of Moratuwa
Mobile: +94 71 138 2686
Email: eranga....@gmail.com
<https://www.facebook.com/erangaheshan>   <https://twitter.com/erangaheshan>
   <https://www.linkedin.com/in/erangaheshan>

On Fri, Aug 25, 2017 at 5:53 PM, Aljoscha Krettek <aljos...@apache.org>
wrote:

> Hi,
>
> It is not available for the Batch API, you would have to use the
> DataStream API.
>
> Best,
> Aljoscha
>
> On 15. Aug 2017, at 01:16, Kien Truong <duckientru...@gmail.com> wrote:
>
> Hi,
>
> Admittedly, I had not suggested this because I thought it was not
> available for the batch API.
>
> Regards,
> Kien
> On Aug 15, 2017, at 00:06, Nico Kruber <n...@data-artisans.com> wrote:
>>
>> Hi Eranga and Kien,
>> Flink supports asynchronous IO since version 1.2, see [1] for details.
>>
>> You basically pack your URL download into the asynchronous part and collect
>> the resulting string for further processing in your pipeline.
>>
>> Nico
>>
>> [1] https://ci.apache.org/projects/flink/flink-docs-release-1.3/dev/stream/asyncio.html
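The pattern Nico describes, packing each URL download into an asynchronous part and then collecting the resulting strings, can be sketched with plain Java futures. This is only a standard-library sketch of the idea, not Flink's actual async I/O API (see [1] for that); `fetch` and the class name are hypothetical stand-ins for a real HTTP download.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.stream.Collectors;

public class AsyncDownloadSketch {

    // Hypothetical stand-in for an HTTP download; a real job would
    // issue the request here, ideally via an asynchronous HTTP client.
    static String fetch(String url) {
        return "content-of-" + url;
    }

    public static void main(String[] args) {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        List<String> urls = List.of("a.example", "b.example");

        // Pack each download into the asynchronous part ...
        List<CompletableFuture<String>> futures = urls.stream()
                .map(u -> CompletableFuture.supplyAsync(() -> fetch(u), pool))
                .collect(Collectors.toList());

        // ... then collect the resulting strings for further processing,
        // joining in input order so results stay aligned with the URLs.
        for (CompletableFuture<String> f : futures) {
            System.out.println(f.join());
        }
        pool.shutdown();
    }
}
```

Flink's async I/O operator additionally handles timeouts, in-flight capacity, and ordered vs. unordered result emission, which this sketch ignores.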
>>
>> On Monday, 14 August 2017 17:50:47 CEST Kien Truong wrote:
>>
>>> Hi,
>>>
>>> While this task is quite trivial to do with the Flink DataSet API, using
>>> readTextFile to read the input and a flatMap function to perform the
>>> downloading, it might not be a good idea. The download process is I/O
>>> bound and will block the synchronous flatMap function, so the throughput
>>> will not be very good.
>>>
>>> Until Flink supports asynchronous functions, I suggest you look elsewhere.
>>> An example with a master-workers architecture using Akka can be found here:
>>> https://github.com/typesafehub/activator-akka-distributed-workers
>>>
>>> Regards,
>>> Kien
>>>
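Kien's throughput point can be illustrated outside Flink: a synchronous flatMap holds its operator thread for the full duration of each download, so at most one request is in flight per task, whereas a thread pool lets the waits overlap. A small standard-library sketch with simulated downloads (the class and method names are hypothetical):

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class ParallelDownloadDemo {

    // Runs `tasks` simulated downloads on a pool of the same size and
    // reports how many were in flight at the same time.
    static int maxConcurrentDownloads(int tasks) throws InterruptedException {
        AtomicInteger inFlight = new AtomicInteger();
        AtomicInteger maxInFlight = new AtomicInteger();
        CountDownLatch allStarted = new CountDownLatch(tasks);
        ExecutorService pool = Executors.newFixedThreadPool(tasks);
        for (int i = 0; i < tasks; i++) {
            pool.submit(() -> {
                int now = inFlight.incrementAndGet();
                maxInFlight.accumulateAndGet(now, Math::max);
                allStarted.countDown();
                try {
                    // Simulate waiting on the network: this "download" only
                    // completes once every task has started.
                    allStarted.await();
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
                inFlight.decrementAndGet();
            });
        }
        pool.shutdown();
        pool.awaitTermination(10, TimeUnit.SECONDS);
        return maxInFlight.get();
    }

    public static void main(String[] args) throws InterruptedException {
        // A synchronous flatMap would cap this at 1; the pool overlaps the waits.
        System.out.println("max in-flight downloads: " + maxConcurrentDownloads(4));
    }
}
```

With four pooled threads all four simulated downloads overlap; the single-threaded equivalent would never exceed one.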
>>>  On 8/14/2017 10:09 AM, Eranga Heshan wrote:
>>>>
>>>> Hi all,
>>>>
>>>> I am fairly new to Flink. I have this project where I have a list of
>>>> URLs (in one node) which need to be crawled distributedly. Then for
>>>> each URL, I need the serialized crawled result to be written to a
>>>> single text file.
>>>>
>>>> I want to know if there are similar projects which I can look into or
>>>> an idea on how to implement this.
>>>>
>>>> Thanks & Regards,
>>>>
>>>> Eranga Heshan
>>>> /Undergraduate/
>>>> Computer Science & Engineering
>>>> University of Moratuwa
>>>> Mobile: +94 71 138 2686
>>>> Email: eranga....@gmail.com
>>>> <https://www.facebook.com/erangaheshan>
>>>> <https://twitter.com/erangaheshan>
>>>> <https://www.linkedin.com/in/erangaheshan>
>>>>
>>>
>>
>
