Re: Reading from multiple input files with fewer task slots

Pieter Hameete Mon, 05 Oct 2015 03:42:49 -0700

Hi Stephan,

It concerns the DataSet API.


The program I'm running can be found at:
https://github.com/PHameete/dawn-flink/blob/development/src/main/scala/wis/dawnflink/performance/xmark/XMarkQuery11.scala
The custom input format is at:
https://github.com/PHameete/dawn-flink/blob/development/src/main/scala/wis/dawnflink/parsing/xml/XML2DawnInputFormat.java

Cheers!

2015-10-05 12:38 GMT+02:00 Stephan Ewen <se...@apache.org>:

> I assume this concerns the streaming API?
>
> Can you share your program and/or the custom input format code?
>
> On Mon, Oct 5, 2015 at 12:33 PM, Pieter Hameete <phame...@gmail.com>
> wrote:
>
>> Hello Flinkers!
>>
>> I run into some strange behavior when reading from a folder of input
>> files.
>>
>> When the number of input files in the folder exceeds the number of task
>> slots, I notice that the size of my datasets varies with each run. It seems
>> as if the transformations don't wait for all input files to be read.
>>
>> When I have at least as many task slots as there are files, there are no
>> problems.
>>
>> I'm using a custom input format. Could there be a problem with my custom
>> input format, and if so, what could I be forgetting?
>>
>> Kind regards and thank you for your time!
>>
>> Pieter
>>
>
>
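
One plausible cause of exactly this symptom, offered as a hedged sketch and not as a diagnosis of XML2DawnInputFormat: in the DataSet runtime, a parallel data source task reuses a single input format instance for every split it is assigned, calling open(split), then reachedEnd()/nextRecord() until the split is exhausted, then close(). If per-split state such as an end-of-input flag is only initialized at construction and not reset in open(), every split after the first one read by that instance can come back empty or truncated. That only happens when a task gets more than one split (more files than task slots), and because splits are handed out at runtime, the amount of lost data can change from run to run. The code below is a dependency-free Java simulation, not Flink code: SimpleFormat, BuggyFormat, FixedFormat, readAll and sampleFiles are hypothetical names, the method signatures are simplified, and the round-robin split assignment is a deterministic stand-in for Flink's runtime assignment.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

// Dependency-free simulation (NOT Flink code). SimpleFormat loosely mirrors the
// open / reachedEnd / nextRecord / close lifecycle of Flink's InputFormat,
// with simplified, hypothetical signatures.
public class SplitReuseDemo {

    // A "split" here is just the records of one input file.
    interface SimpleFormat {
        void open(List<String> split);
        boolean reachedEnd();
        String nextRecord();
        void close();
    }

    // Hypothetical buggy format: 'end' is initialized once and never reset in open().
    static class BuggyFormat implements SimpleFormat {
        private List<String> records;
        private int pos;
        private boolean end = false;

        public void open(List<String> split) {
            records = split;
            pos = 0;
            // BUG: 'end' keeps its value from the previous split, so once this
            // instance has finished one split, every later split looks empty.
        }

        public boolean reachedEnd() { return end; }

        public String nextRecord() {
            String record = records.get(pos++);
            if (pos >= records.size()) end = true;
            return record;
        }

        public void close() { }
    }

    // Hypothetical fixed format: ALL per-split state is reset in open().
    static class FixedFormat implements SimpleFormat {
        private List<String> records;
        private int pos;
        private boolean end;

        public void open(List<String> split) {
            records = split;
            pos = 0;
            end = split.isEmpty();
        }

        public boolean reachedEnd() { return end; }

        public String nextRecord() {
            String record = records.get(pos++);
            if (pos >= records.size()) end = true;
            return record;
        }

        public void close() { }
    }

    // Simulates 'slots' parallel source tasks, each owning ONE format instance
    // that reads all splits assigned to it, one after another.
    static int readAll(Supplier<SimpleFormat> factory, List<List<String>> splits, int slots) {
        List<SimpleFormat> tasks = new ArrayList<>();
        for (int i = 0; i < slots; i++) tasks.add(factory.get());
        int count = 0;
        for (int s = 0; s < splits.size(); s++) {
            SimpleFormat format = tasks.get(s % slots); // same instance reused for later splits
            format.open(splits.get(s));
            while (!format.reachedEnd()) {
                format.nextRecord();
                count++;
            }
            format.close();
        }
        return count;
    }

    // Builds 'files' hypothetical input files with 'perFile' records each.
    static List<List<String>> sampleFiles(int files, int perFile) {
        List<List<String>> result = new ArrayList<>();
        for (int f = 0; f < files; f++) {
            List<String> records = new ArrayList<>();
            for (int r = 0; r < perFile; r++) records.add("file" + f + "-rec" + r);
            result.add(records);
        }
        return result;
    }

    public static void main(String[] args) {
        List<List<String>> files = sampleFiles(4, 3);
        for (int slots : new int[] {4, 2}) {
            System.out.println("slots=" + slots
                    + " buggy=" + readAll(BuggyFormat::new, files, slots)
                    + " fixed=" + readAll(FixedFormat::new, files, slots));
        }
    }
}
```

With 4 files of 3 records each, main prints `slots=4 buggy=12 fixed=12` and then `slots=2 buggy=6 fixed=12`: the buggy format only loses records once a task has to read more than one file. Resetting every per-split field in open(), rather than relying on field initializers or the constructor, is what makes a format safe to reuse across splits.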

