Hi Yongjun,

Automatic failover certainly needs to be fixed (see HDFS-14130 and HDFS-13182), along with all other outstanding issues. We plan to continue this work on trunk. The feature is usable now even without these fixes (see HDFS-14067). We would like to get it in so that people can have early access, and so that newly developed features are aware of this functionality. Let us know if you have other suggestions.

Thanks,
--Konstantin

On Wed, Dec 5, 2018 at 11:24 PM Yongjun Zhang <yzh...@cloudera.com> wrote:

> Great work guys.
>
> Wonder if we can elaborate on the impact of not having #2 fixed, and why
> #2 is not needed for the feature to be complete?
> 2. Need to fix automatic failover with ZKFC. Currently it does not
> know about ObserverNodes and tries to convert them to SBNs.
>
> Thanks.
> --Yongjun
>
>
> On Wed, Dec 5, 2018 at 5:27 PM Konstantin Shvachko <shv.had...@gmail.com>
> wrote:
>
>> Hi Hadoop developers,
>>
>> I would like to propose to merge to trunk the feature branch HDFS-12943
>> for Consistent Reads from Standby Node. The feature is intended to scale
>> read RPC workloads. On large clusters reads comprise 95% of all RPCs to
>> the NameNode. We should be able to accommodate higher overall RPC
>> workloads (up to 4x by some estimates) by adding multiple ObserverNodes.
>>
>> The main functionality has been implemented; see sub-tasks of HDFS-12943.
>> We followed up with the test plan. Testing was done on two independent
>> clusters (see HDFS-14058 and HDFS-14059) with security enabled.
>> We ran standard HDFS commands, MR jobs, admin commands including manual
>> failover.
>> We know of one cluster running this feature in production.
>>
>> There are a few outstanding issues:
>> 1. Need to provide proper documentation - a user guide for the new feature
>> 2. Need to fix automatic failover with ZKFC. Currently it does not
>> know about ObserverNodes and tries to convert them to SBNs.
>> 3. Scale testing and performance fine-tuning
>> 4. As testing progresses, we continue fixing non-critical bugs like
>> HDFS-14116.
>>
>> I attached a unified patch to the umbrella jira for the review and Jenkins
>> build.
>> Please vote on this thread. The vote will run for 7 days until Wed Dec 12.
>>
>> Thanks,
>> --Konstantin
>>