Indeed, using the same parallelism corrected the output. Thank you!

On Thu, Aug 11, 2016 at 2:34 PM, Stephan Ewen <se...@apache.org> wrote:

> Hi!
>
> The source runs parallel (n tasks), but the sink has a parallelism of 1.
> The sink hence has to merge the parallel streams from the source, which
> happens based on arrival speed of the streams, i.e., its not deterministic.
> That's why you see the lines being mixed.
>
> Try running source and sink with the same parallelism, then no merge of
> streams needs to happen. You'll see then that per output file, the lines
> are correct.
>
> Stephan
>
>
> On Thu, Aug 11, 2016 at 2:29 PM, Yassin Marzouki <yassmar...@gmail.com>
> wrote:
>
>> Hi all,
>>
>> When I use out.collect() twice inside a faltMap, the output is sometimes
>> and randomly skewed. Take this example:
>>
>> final StreamExecutionEnvironment env = StreamExecutionEnvironment.cre
>> ateLocalEnvironment();
>>     env.generateSequence(1, 100000)
>>         .flatMap((Long t, Collector<String> out) -> {
>>             out.collect("line1");
>>             out.collect("line2");
>>         })
>>         .writeAsText("test",FileSystem.WriteMode.OVERWRITE).
>> setParallelism(1);
>> env.execute("Test");
>>
>> I expect the output to be
>> line1
>> line2
>> line1
>> line2
>> ...
>>
>> But some resulting lines (18 out of 200000) were:
>> line2
>> line2
>> and the same for line1.
>>
>> What could be the reason for this?
>>
>> Best,
>> Yassine
>>
>
>

Reply via email to