Hi Sammi,

RATIS-2149 <https://issues.apache.org/jira/browse/RATIS-2149> is a minor problem that has existed for a very long time. The issue reports that, when a server starts, it may start a leader election even if the server is not ready. The reporter said that they call addGroup on a new server S before calling setConf to add S to the group. If setConf fails (not sure why) and S has already started a leader election, S could get a NOT_IN_CONF reply and then shut down.
It seems that they might have called addGroup incorrectly. If we call addGroup with an empty group, the new server will start in the initializing state rather than the follower state; see [1]. Then, it won't start a leader election.

[1] https://github.com/apache/ratis/blob/3a51121adaf2145e4ec020f4c24858f9f03745d2/ratis-server/src/main/java/org/apache/ratis/server/impl/RaftServerImpl.java#L398

Tsz-Wo

On Fri, Oct 25, 2024 at 2:16 AM Sammi Chen <sammic...@apache.org> wrote:

> Hi Tsz-Wo,
>
> What's the impact of https://issues.apache.org/jira/browse/RATIS-2149?
> Will it cause OM HA leader auto election to fail in some circumstances?
>
> Thanks,
> Sammi
>
> On Wed, 23 Oct 2024 at 05:43, Tsz Wo Sze <szets...@gmail.com> wrote:
>
> > +1 for releasing Ozone 1.4.1 with Ratis 3.1.1.
> >
> > Tsz-Wo
> >
> > On Tue, Oct 22, 2024 at 1:10 PM Ethan Rose <er...@apache.org> wrote:
> >
> > > Hi, any updates on the current 1.4.1 progress? Ratis 3.1.1 should be in
> > > Ozone now that HDDS-11504 <https://issues.apache.org/jira/browse/HDDS-11504>
> > > is resolved. I see there’s discussion of doing a Ratis 3.1.2 to fix
> > > RATIS-2149 <https://issues.apache.org/jira/browse/RATIS-2149> and
> > > RATIS-2172 <https://issues.apache.org/jira/browse/RATIS-2172>, but our 1.4.1
> > > release has already been delayed for a while, so I think we should ship with
> > > Ratis 3.1.1 and do a 1.4.2 release with just the patch version of Ratis if
> > > necessary.
> > >
> > > I see some new fixes targeting the release like HDDS-11223
> > > <https://issues.apache.org/jira/browse/HDDS-11223> and HDDS-11136
> > > <https://issues.apache.org/jira/browse/HDDS-11136>, which is good. What is
> > > the overall status update? Are we ready for the next release candidate?
> > >
> > > Ethan
> > >
> > > On Wed, Aug 21, 2024 at 12:33 PM Tsz Wo Sze <szets...@gmail.com> wrote:
> > >
> > > > (2) Key put fails for large files (> 20GB) due to a memory leak in Ratis 3.1.0
> > > > ...
> > > >
> > > > Duong & Wei-chiu,
> > > >
> > > > Thanks for finding this problem!
> > > >
> > > > Agree that we should have a Ratis 3.1.1 release.
> > > > BTW, "Memory leak" usually means that memory was allocated but not
> > > > released; see https://en.wikipedia.org/wiki/Memory_leak . In this case, we
> > > > are not having such a problem. Our problem is unnecessarily using too much
> > > > memory.
> > > >
> > > > Tsz-Wo
> > > >
> > > > On Tue, Aug 20, 2024 at 6:20 PM Duong Nguyen <du...@cloudera.com.invalid> wrote:
> > > >
> > > > > I also filed https://issues.apache.org/jira/browse/RATIS-2141 to track
> > > > > the memory leak issue.
> > > > >
> > > > > Thanks,
> > > > > Duong
> > > > >
> > > > > On Tue, Aug 20, 2024 at 6:17 PM Duong Nguyen <du...@cloudera.com> wrote:
> > > > >
> > > > > > Hi all,
> > > > > >
> > > > > > I just started a thread to discuss releasing Ratis 3.1.1 with the fixes
> > > > > > of the mentioned issues.
> > > > > >
> > > > > > Duong
> > > > > >
> > > > > > On Tue, Aug 20, 2024 at 5:30 PM Uma Maheswara Rao Gangumalla <umaganguma...@gmail.com> wrote:
> > > > > >
> > > > > > > Hi Wei-Chiu,
> > > > > > >
> > > > > > > Thank you and Duong for the important update on RC1.
> > > > > > >
> > > > > > > @Duong would you be notifying this to Ratis community if they can make a
> > > > > > > quick release with just above 2 fixes?
> > > > > > >
> > > > > > > Regards,
> > > > > > > Uma
> > > > > > >
> > > > > > > On Tue, Aug 20, 2024 at 4:51 PM Wei-Chiu Chuang <weic...@apache.org> wrote:
> > > > > > >
> > > > > > > > Hi thanks for the effort,
> > > > > > > > We are testing the latest Ozone master and Ratis 3.1.0 internally, and
> > > > > > > > found a few critical issues.
> > > > > > > >
> > > > > > > > (1) RATIS-2132 <https://issues.apache.org/jira/browse/RATIS-2132> which
> > > > > > > > has about 10% performance regression penalty.
> > > > > > > > (2) Key put fails for large files (> 20GB) due to a memory leak in Ratis
> > > > > > > > 3.1.0: it was a half-done feature of RATIS-1931. DataNode could crash
> > > > > > > > due to out of memory.
> > > > > > > >
> > > > > > > > Both of them can only be fixed in Ratis.
> > > > > > > > I'd suggest to not use Ratis 3.1.0 in Ozone 1.4.1 release.
> > > > > > > >
> > > > > > > > If we can, I'd ask the Ratis community to release Ratis 3.1.1 with the
> > > > > > > > above two fixes.
> > > > > > > >
> > > > > > > > cc: @Duong Nguyen <du...@cloudera.com> who helped root cause the two
> > > > > > > > issues.
> > > > > > > >
> > > > > > > > On Tue, Aug 20, 2024 at 3:31 PM Siyao Meng <si...@apache.org> wrote:
> > > > > > > >
> > > > > > > > > +1 (binding)
> > > > > > > > >
> > > > > > > > > - Verified signatures
> > > > > > > > > - Verified checksums
> > > > > > > > > - Checked ./bin/ozone version output from binary tarball
> > > > > > > > > - Checked ./bin/ozone checknative output from binary tarball
> > > > > > > > >   - rocks_tools_native lib check is missing, filed HDDS-11347
> > > > > > > > >     <https://issues.apache.org/jira/browse/HDDS-11347>, non-blocking.
> > > > > > > > > - Checked source tarball content matched repo tag ozone-1.4.1-RC1
> > > > > > > > > - Built from source (without native libs support)
> > > > > > > > > - Verified compose/ozone Docker dev cluster boots up correctly with 3
> > > > > > > > >   Ozone datanodes.
> > > > > > > > > - Verified basic volume, bucket, key creation and deletion works in
> > > > > > > > >   Docker dev cluster.
> > > > > > > > >   - Volume recursive deletion prompt is incorrect, filed HDDS-11346
> > > > > > > > >     <https://issues.apache.org/jira/browse/HDDS-11346>, non-blocking.
> > > > > > > > >
> > > > > > > > > -Siyao
> > > > > > > > >
> > > > > > > > > On Aug 19, 2024 at 6:39:08 AM, Ayush Saxena <ayush...@gmail.com> wrote:
> > > > > > > > >
> > > > > > > > > > +1 (Binding), some minor stuff which we should fix in next release
> > > > > > > > > >
> > > > > > > > > > * Built from source
> > > > > > > > > > * Verified Checksums
> > > > > > > > > > * Verified Signatures
> > > > > > > > > > * All source files have apache header
> > > > > > > > > > * No code diff b/w the git tag & the contents of src tar
> > > > > > > > > >   (dependency-reduced-pom only in src tar, maybe that ain't required
> > > > > > > > > >   there)
> > > > > > > > > > * Verified the output of ozone version
> > > > > > > > > > * Ran some basic shell commands
> > > > > > > > > > * Checked the NOTICE file: The year is *wrong*, it says 2022, it
> > > > > > > > > >   should be 2024 [1], should correct in next release
> > > > > > > > > > * The NOTICE file inside the packaged Jars is *wrong*, It mentions
> > > > > > > > > >   *Apache Hadoop* & Copyright since 2006, it should be Apache Ozone,
> > > > > > > > > >   should fix in the next release.
> > > > > > > > > > It currently prints:
> > > > > > > > > > ```
> > > > > > > > > > Apache Hadoop
> > > > > > > > > > Copyright 2006 and onwards The Apache Software Foundation.
> > > > > > > > > > .
> > > > > > > > > > .
> > > > > > > > > > Hadoop Yarn Server Web Proxy uses the BouncyCastle Java
> > > > > > > > > > cryptography APIs written by the Legion of the Bouncy Castle Inc.
> > > > > > > > > >
> > > > > > > > > > ```
> > > > > > > > > > Can try something like to validate:
> > > > > > > > > > jar xf share/ozone/lib/ozone-client-1.4.1.jar META-INF/NOTICE.txt
> > > > > > > > > > cat META-INF/NOTICE.txt
> > > > > > > > > >
> > > > > > > > > > Thanx Xi Chen for driving the release, Good Luck!!!
> > > > > > > > > >
> > > > > > > > > > -Ayush
> > > > > > > > > >
> > > > > > > > > > [1] https://github.com/apache/ozone/blob/ozone-1.4.1-RC1/NOTICE.txt#L1-L2
> > > > > > > > > >
> > > > > > > > > > On Mon, 19 Aug 2024 at 11:20, Sammi Chen <sammic...@apache.org> wrote:
> > > > > > > > > >
> > > > > > > > > > +1 (binding)
> > > > > > > > > >
> > > > > > > > > > * Verified the signature and checksums
> > > > > > > > > > * Verified tag
> > > > > > > > > > * Build from source
> > > > > > > > > > * Run ozonesecure acceptance test
> > > > > > > > > > * Start a cluster using bin package
> > > > > > > > > > * Run freon rk command with data verification
> > > > > > > > > > * Verified information displayed on Recon UI, for both empty cluster
> > > > > > > > > >   and cluster with data
> > > > > > > > > >
> > > > > > > > > > Sammi
> > > > > > > > > >
> > > > > > > > > > On Fri, 16 Aug 2024 at 13:13, mrchenx <mrch...@126.com> wrote:
> > > > > > > > > >
> > > > > > > > > > > Dear Ozone Devs, As discussed in the last email, I am calling for a
> > > > > > > > > > > vote on Apache Ozone 1.4.1 RC1.
> > > > > > > > > > >
> > > > > > > > > > > We have released 1.4.0 on Jan 19th.
> > > > > > > > > > > Now there are 177 new commits already landed on the 1.4.1 branch,
> > > > > > > > > > > including a Ratis upgrade (to Ratis 3.1.0), some bug fixes,
> > > > > > > > > > > performance optimizations, and some necessary dependencies.
> > > > > > > > > > > I am calling for a vote on Apache Ozone 1.4.1 RC1.
> > > > > > > > > > >
> > > > > > > > > > > - The RC1 tag can be found on Github at:
> > > > > > > > > > >   - https://github.com/apache/ozone/releases/tag/ozone-1.4.1-RC1
> > > > > > > > > > > - 177 Jiras were cherry-picked for ozone-1.4.1
> > > > > > > > > > >   - https://issues.apache.org/jira/issues/?jql=project%20%3D%20HDDS%20AND%20fixVersion%20%3D%201.4.1
> > > > > > > > > > > - The source and binary tarballs can be found at:
> > > > > > > > > > >   - https://dist.apache.org/repos/dist/dev/ozone/1.4.1-rc1/
> > > > > > > > > > > - Maven artifacts are staged at:
> > > > > > > > > > >   - https://repository.apache.org/content/repositories/orgapacheozone-1024
> > > > > > > > > > > - The public key used to sign the artifacts can be found at:
> > > > > > > > > > >   - https://dist.apache.org/repos/dist/release/ozone/KEYS
> > > > > > > > > > > - The fingerprint of the key used to sign the artifacts is:
> > > > > > > > > > >   - 0D8C19F5514E2786007936F758C87003FF9A1A38
> > > > > > > > > > >
> > > > > > > > > > > The vote will run for 7 days, ending on Aug 23rd 2024 at 13:10 UTC+8.
> > > > > > > > > > >
> > > > > > > > > > > Thanks
> > > > > > > > > > >
> > > > > > > > > > > Xi Chen
> > > > > > > > > >
> > > > > > > > > > ---------------------------------------------------------------------
> > > > > > > > > > To unsubscribe, e-mail: dev-unsubscr...@ozone.apache.org
> > > > > > > > > > For additional commands, e-mail: dev-h...@ozone.apache.org