-1 Changing the memory consumption between minor releases should not happen.

The good news: Robert ran a test with the latest 1.1 branch that
contains a fix for the changed RocksDB memory configuration and
reported stable behaviour.

@Flavio: I agree, but since we're already very late with this bugfix
release I would not like to wait for the PR to be merged. We can
include it in 1.1.5, which can follow very soon imo. I hope that's OK
for you.

On Fri, Dec 16, 2016 at 11:07 AM, Gyula Fóra <gyula.f...@gmail.com> wrote:
> @Robert
>
> - I am not sure the RocksDB problems are closely related to the version
> upgrade; I have been experiencing similar problems for months. This is
> usually not a huge problem on YARN, I think; it mostly hurts in standalone
> clusters.
> - Also, the YARN memory limits are tricky to configure nicely, as this
> depends a lot on how RocksDB handles native memory. It seems to grow quite
> a lot over time.
>
>
> On Fri, Dec 16, 2016 at 10:56 AM, Flavio Pompermaier <pomperma...@okkam.it>
> wrote:
>
>> I personally think it is quite important to also have a fix for
>> the ES connector (https://issues.apache.org/jira/browse/FLINK-5122).
>>
>> Best,
>> Flavio
>>
>> On Fri, Dec 16, 2016 at 10:43 AM, Robert Metzger <rmetz...@apache.org>
>> wrote:
>>
>> > I'm not sure if we can release the release candidate like this, because
>> > I'm running into two issues probably related to a recent RocksDB version
>> > upgrade.
>> >
>> > This is my list of points so far:
>> >
>> > - Checked the staging repository. Quickstarts and Hadoop 1 / 2 are okay.
>> > - Built a job against the staging repository.
>> > - Deployed the binaries on a kerberized HA YARN / HDFS setup. Ran the
>> > KMeans and WordCount batch jobs.
>> > - Executed a heavy, misbehaved streaming job for a few hours. While
>> > running that job, I found that:
>> >   - Not all checkpoint directories are cleaned up in HDFS (I use the
>> > async RocksDB state backend)
>> >   - Segfaults from RocksDB (8 segfaults in ~3 hrs, but they all
>> > happened in the final minutes)
>> >   - "beyond physical memory limits" container killings from YARN (I know
>> > we can configure this, I just wonder whether we should change the default
>> > value)
>> >   - The segfaults and memory limits caused the job to not run anymore in
>> > the end because it was in a constant retry loop.
>> >   - This is a non-blocking issue I found during the testing:
>> > https://issues.apache.org/jira/browse/FLINK-5345
>> >   - This is also a non-blocking issue for 1.1.4 (fixed for 1.2):
>> > https://issues.apache.org/jira/browse/FLINK-4631
>> >
>> >
>> > Let me know if we should release anyways or fix these issues first.
>> >
>> >
>> > On Tue, Dec 13, 2016 at 11:04 PM, Ufuk Celebi <u...@apache.org> wrote:
>> >
>> > > Dear Flink community,
>> > >
>> > > Please vote on releasing the following candidate as Apache Flink
>> > > version 1.1.4.
>> > >
>> > > The commit to be voted on:
>> > > 2cd6579 (http://git-wip-us.apache.org/repos/asf/flink/commit/2cd6579)
>> > >
>> > > Branch:
>> > > release-1.1.4-rc3
>> > > (https://git1-us-west.apache.org/repos/asf/flink/repo?p=flink.git;a=shortlog;h=refs/heads/release-1.1.4-rc3)
>> > >
>> > > The release artifacts to be voted on can be found at:
>> > > http://people.apache.org/~uce/flink-1.1.4-rc3/
>> > >
>> > > The release artifacts are signed with the key with fingerprint
>> > > 9D403309:
>> > > http://www.apache.org/dist/flink/KEYS
>> > >
>> > > The staging repository for this release can be found at:
>> > > https://repository.apache.org/content/repositories/orgapacheflink-1109
>> > >
>> > > -------------------------------------------------------------
>> > >
>> > > The voting time is at least three days and the vote passes if a
>> > > majority of at least three +1 PMC votes are cast. The vote ends at the
>> > > earliest on Friday, December 16th, 2016, at 11 PM (CET)/2 PM (PST).
>> > >
>> > > [ ] +1 Release this package as Apache Flink 1.1.4
>> > > [ ] -1 Do not release this package, because ...
>> > >
>> >
>>
