I was able to reproduce the problem Simone reported with the provided batch job. It is definitely a regression with respect to Flink 1.4.2. I suspect that it has something to do with spilling, because it is independent of FLIP-6 and only occurs if the input data size is large enough. Moreover, we made some changes to the SpillableSubpartition which might have introduced a problem with spilling.
Due to this regression I will cancel the current release candidate. Once we have fixed this problem, I will create a new RC.

Cheers,
Till

On Thu, Apr 5, 2018 at 4:56 PM, Till Rohrmann <trohrm...@apache.org> wrote:

> Hi Simone,
>
> thanks for testing the RC and reporting the issue. I will investigate the
> problem with the code you've provided.
>
> @Bowen, I will take a look at FLINK-8837.
>
> Cheers,
> Till
>
> On Thu, Apr 5, 2018 at 8:54 AM, Bowen Li <bowenl...@gmail.com> wrote:
>
>> Hi Till,
>>
>> FLINK-8837 <https://issues.apache.org/jira/browse/FLINK-8837> is marked
>> as a blocker for release 1.5.0. I've opened a PR for it, can you please
>> take a look at it?
>>
>> Bowen
>>
>> On Tue, Apr 3, 2018 at 8:45 AM, simone <simone.povosca...@gmail.com>
>> wrote:
>>
>> > On 03/04/2018 17:40, simone wrote:
>> >
>> >> I tried to run a simple batch job (it took about 5 minutes with flink
>> >> 1.3.1 on single machine), but it seems to run forever with flink
>> >> 1.5-SNAPSHOT (I stopped it after about 2-3 hours multiple times). If
>> >> you want to try to replicate this, here are the java classes:
>> >> https://github.com/xseris/Flink-test-union/tree/master/src/main/java/okkam/it/flink/flink150SNAPSHOT
>> >> (you first need to generate the test input csv [8 GB] through the
>> >> RandomCsvGenerator.java class).
>> >>
>> >> With flink 1.3.1 all works. Is it only my behavior or is there any
>> >> problem in the architecture introduced in 1.5?
>> >>
>> >> Thanks,
>> >> Simone.