Thank you all.
- All tags for the subtasks in JIRA, such as Fix Version, Components, Labels, Target Version, and Flags, have been updated accordingly.
- The failed unit tests are unrelated to this PR.
- With four +1 votes received and no objections, I will proceed to initiate the official voting process.

On Mon, 6 Jan 2025 at 12:00, Xiaoqiao He <hexiaoq...@apache.org> wrote:

> Thanks all for your great work and the big step forward here.
>
> Some nit comments before checking in:
> a. Please check whether the failed unit tests are related to this PR at
> https://github.com/apache/hadoop/pull/6762.
> It is better to execute them and get a green result before checking in.
> b. Please mark the correct tags `fix version`, `Component/s`, `Labels` and
> `Flags` for the subtasks in JIRA.
> Some examples are [1][2].
> Good luck!
>
> Best Regards,
> - He Xiaoqiao
>
> [1] https://issues.apache.org/jira/browse/HDFS-13891
> [2] https://issues.apache.org/jira/browse/HDFS-17531
>
>
> On Thu, Jan 2, 2025 at 10:54 AM Zhanghaobo <hfutzhan...@163.com> wrote:
>
>> Thanks for your great work! +1 for merging the phase 1 code.
>>
>> My production clusters have been running the phase 1 code for several
>> months, and it looks good.
>>
>> Hope to push this feature forward.
>>
>> 张浩博
>> hfutzhan...@163.com
>>
>> ---- Replied Message ----
>> From: haiyang hu <haiyang87...@gmail.com>
>> Date: 12/31/2024 23:08
>> To: Ayush Saxena <ayush...@gmail.com>
>> Cc: Hui Fei <feihui.u...@gmail.com>, ZanderXu <zande...@apache.org>,
>> Hdfs-dev <hdfs-dev@hadoop.apache.org>, <priv...@hadoop.apache.org>,
>> Xiaoqiao He <hexiaoq...@apache.org>, slfan1989 <slfan1...@apache.org>,
>> <xuzq_zan...@163.com>
>> Subject: Re: Discussion about NameNode Fine-grained locking
>>
>> Thanks for your hard work and for pushing it forward.
>> It looks good. +1 for merging the phase 1 code; I hope we can work
>> together to promote this major HDFS optimization so that more companies
>> can benefit from it.
>>
>> Thanks everyone~
>>
>> On Tue, 31 Dec 2024 at 20:33, Ayush Saxena <ayush...@gmail.com> wrote:
>>
>> +1,
>> Thanx folks for your efforts on this! I didn't have time to review
>> everything thoroughly, but my initial pass suggests it looks good, or
>> at least is safe to merge.
>> If I find some spare time, I'll test it further and submit a ticket
>> if I encounter any issues.
>>
>> Good Luck!!!
>>
>> -Ayush
>>
>> On Tue, 31 Dec 2024 at 16:39, Hui Fei <feihui.u...@gmail.com> wrote:
>>
>> >> Thanks Zander for bringing up this discussion again and trying your
>> >> best to push it forward. It's been a long time since the last
>> >> discussion.
>> >> It's indeed time. +1 for merging the phase 1 code, based on the
>> >> following points:
>> >> - The phase 1 feature has been running at scale within companies for a
>> >> long time
>> >> - The long-term plan is clear, and it also addresses some questions
>> >> raised by the community
>> >> - Testing results on memory and performance are available for the
>> >> future features
>>
>> >> On Tue, 31 Dec 2024 at 15:36, ZanderXu <zande...@apache.org> wrote:
>>
>> >> Hi, everyone:
>>
>> >> Time to Merge FGL Phase I
>> >> The PR for FGL Phase I is ready for merging! Please take a moment to
>> >> review and cast your vote: https://github.com/apache/hadoop/pull/6762.
>>
>> >> FGL Phase I has been running successfully in production for over
>> >> six months at Shopee and BOSS Zhipin, with no reported performance or
>> >> stability issues. It's now the right time to merge it into the trunk
>> >> branch, allowing us to move forward with Phase II.
>>
>> >> The global lock remains the default lock mode, but users can enable FGL
>> >> by configuring
>> >> dfs.namenode.lock.model.provider.class=org.apache.hadoop.hdfs.server.namenode.fgl.FineGrainedFSNamesystemLock.
>>
>> >> If there are no objections within 7 days, I will propose an official
>> >> vote.
>>
>> >> Performance and Memory Usage of Phase I
>> >> Conclusions:
>> >> - Fine-grained locks do not lead to significant performance
>> >> improvements.
>> >> - Fine-grained locks do not result in additional memory consumption.
>> >> Reasons:
>> >> - BM operations heavily depend on FS operations: IBR and BR still
>> >> acquire the global lock (FSLock and BMLock).
>> >> - FS operations depend on BM operations: common operations (create,
>> >> addBlock, getBlockLocations) also acquire the global lock (FSLock and
>> >> BMLock).
>>
>> >> Phase II will bring significant performance improvements by decoupling
>> >> the FS and BM dependencies and replacing the global FSLock with a
>> >> fine-grained IIPLock.
>>
>> >> Addressing Common Questions
>> >> Thank you all for raising meaningful questions!
>> >> I have rewritten the design document to improve clarity:
>> >> https://docs.google.com/document/d/1DXkiVxef9wCmICjpZyIQO-yxsgwc4wnf2lTKQ3UXe30/edit?usp=sharing
>>
>> >> Below is a summary of frequently asked questions and answers:
>>
>> >> Question 1: How is the performance of LockPoolManager?
>> >> Performance report:
>> >> - Time to acquire a cached lock: 194 ns
>> >> - Time to acquire a non-cached lock: 1044 ns
>> >> - Time to release an in-use lock: 88 ns
>> >> - Time to release an unused lock: 112 ns
>> >> Overall performance:
>> >> - QPS: over 10 million
>> >> - Time to acquire the IIP lock for a path with depth 10:
>> >>   - Fully uncached: 10440 ns + 1120 ns (≈ 11 μs)
>> >>   - Fully cached: 1940 ns + 1120 ns (≈ 3 μs)
>> >> In global lock scenarios, lock wait times are typically in the
>> >> millisecond range. Therefore, the cost of acquiring and releasing
>> >> fine-grained locks can be ignored.
>>
>> >> Question 2: How much memory does the FGL consume?
>> >> Memory consumption:
>> >> A single LockResource contains a read-write lock and a counter,
>> >> totaling approximately 200 bytes:
>> >> - LockResource: 24 bytes
>> >> - ReentrantReadWriteLock: 150 bytes
>> >> - AtomicInteger: 16 bytes
>> >> Memory usage estimates:
>> >> - 10-level directory depth, 100 handlers:
>> >>   1,000 lock resources, approximately 200 KB
>> >> - 10-level directory depth, 1,000 handlers:
>> >>   10,000 lock resources, approximately 2 MB
>> >> - 1,000,000 lock resources: approximately 200 MB
>> >> Conclusion: Memory consumption is negligible.
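>> >> For anyone who wants to sanity-check the estimate above, here is a
>> >> minimal sketch of what such a lock resource could look like. The names
>> >> and layout are illustrative only, not the actual classes from the PR:
>>
>> >>     import java.util.concurrent.atomic.AtomicInteger;
>> >>     import java.util.concurrent.locks.ReentrantReadWriteLock;
>>
>> >>     // Hypothetical per-INode lock resource: one read-write lock plus a
>> >>     // reference counter so an unused entry can be recycled by the pool.
>> >>     // Rough footprint per instance (64-bit JVM, using the numbers above):
>> >>     //   wrapper ~24 B + ReentrantReadWriteLock ~150 B + AtomicInteger ~16 B
>> >>     //   ≈ 200 B, so 1,000,000 entries ≈ 200 MB.
>> >>     final class LockResourceSketch {
>> >>         private final ReentrantReadWriteLock rwLock = new ReentrantReadWriteLock();
>> >>         private final AtomicInteger refCount = new AtomicInteger(0);
>>
>> >>         ReentrantReadWriteLock lock() { return rwLock; }
>> >>         int retain() { return refCount.incrementAndGet(); }
>> >>         int release() { return refCount.decrementAndGet(); }
>> >>         boolean unused() { return refCount.get() == 0; }
>> >>     }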
>> >> Question 3: What happens if no lock is available in the
>> >> LockPoolManager?
>> >> If no LockResources are available, there are two solutions (see the
>> >> sketch after this question):
>> >> - Return a RetryException, prompting the client to retry later.
>> >> - Temporarily increase the lock entity limit, allocate more locks to
>> >> meet client requests, and use an asynchronous thread to recycle locks
>> >> periodically.
>> >> We can provide multiple LockPoolManager implementations for users to
>> >> choose from based on their production environments.
>>
>> >> Question 4: Regarding the IIPLock lock depth issue, can we consider
>> >> holding only the first 3 or 4 levels of directory locks?
>> >> This approach is not recommended for the following reasons:
>> >> - It cannot maximize concurrency.
>> >> - The savings in lock acquisition/release time and memory usage are
>> >> limited, yielding insignificant benefits.
>>
>> >> Question 5: How should attributes like StoragePolicy, ErasureCoding,
>> >> and ACL, which can be set on parent or ancestor directory nodes, be
>> >> handled?
>> >> ErasureCoding and ACL:
>> >> - When changing a node's attributes, hold the corresponding INode's
>> >> write lock.
>> >> - When using an ancestor node's attributes, hold the corresponding
>> >> INode's read lock.
>> >> StoragePolicy:
>> >> - More complex due to its impact on both directory tree operations and
>> >> Block operations.
>> >> - To improve performance, commonly used block-related operations (such
>> >> as BR/IBR) should not acquire the IIPLock.
>> >> Detailed design documentation:
>> >> https://docs.google.com/document/d/1DXkiVxef9wCmICjpZyIQO-yxsgwc4wnf2lTKQ3UXe30/edit?tab=t.0#heading=h.96lztsl4mwfk
>>
>> >> Question 6: How should FGL be implemented for the SNAPSHOT feature?
>> >> Since the Rename operation on a SNAPSHOT directory is supported,
>> >> holding only the write lock of the SNAPSHOT root directory cannot cover
>> >> the rename case, so the thread safety of SNAPSHOT-related operations
>> >> cannot be guaranteed that way.
>> >> It is recommended to use the global FS lock to ensure thread safety.
>> >> Detailed design documentation:
>> >> https://docs.google.com/document/d/1DXkiVxef9wCmICjpZyIQO-yxsgwc4wnf2lTKQ3UXe30/edit?tab=t.0#heading=h.sm36p6bfcpec
>>
>> >> Question 7: How should FGL be implemented for the Symlinks feature?
>> >> The target path of a symlink is a string, and the client performs a
>> >> second lookup against the target path, so the fine-grained lock project
>> >> requires no special handling.
>> >> For the createSymlink RPC, the FGL needs to acquire the IIPLocks for
>> >> both the target and link paths.
>>
>> >> Question 8: How should FGL be implemented for the reserved feature?
>> >> The Reserved feature has two usage modes:
>> >> - /.reserved/.inodes/${inode id}
>> >> - /.reserved/raw/${path}
>> >> INodeId mode: during the resolvePath phase, obtain the real IIPLock via
>> >> the INodeId.
>> >> Path mode: during the resolvePath phase, obtain the real IIPLock via
>> >> the path.
>> >> Detailed design documentation:
>> >> https://docs.google.com/document/d/1DXkiVxef9wCmICjpZyIQO-yxsgwc4wnf2lTKQ3UXe30/edit?tab=t.0#heading=h.h6rcpzkbpanf
>>
>> >> Question 9: Why is INodeFileLock used as the FGL for BlockInfo?
>> >> INodeFile and Block have mutual dependencies:
>> >> - INodeFile depends on Block for state and size.
>> >> - Block depends on INodeFile for state and storage policy.
>> >> Therefore, using INodeFileLock as the fine-grained lock for BlockInfo
>> >> is a reasonable choice.
>> >> Detailed design documentation:
>> >> https://docs.google.com/document/d/1DXkiVxef9wCmICjpZyIQO-yxsgwc4wnf2lTKQ3UXe30/edit?tab=t.0#heading=h.zesd6omuu3kr
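>> >> To illustrate the two options from Question 3, a hypothetical pool
>> >> could look like the sketch below. It reuses the LockResourceSketch
>> >> shape from the earlier sketch; all names are invented and the real
>> >> LockPoolManager in the PR may differ:
>>
>> >>     import java.util.concurrent.ConcurrentLinkedQueue;
>> >>     import java.util.concurrent.atomic.AtomicInteger;
>>
>> >>     // Hypothetical bounded lock pool: hand out lock resources up to a
>> >>     // soft limit; past the limit, either fail fast so the client retries
>> >>     // later (solution 1), or temporarily over-allocate and let a
>> >>     // background thread (omitted here) reclaim unused entries (solution 2).
>> >>     final class LockPoolSketch {
>> >>         static final class RetryException extends RuntimeException {}
>>
>> >>         private final int softLimit;
>> >>         private final boolean expandOnExhaustion;
>> >>         private final AtomicInteger allocated = new AtomicInteger();
>> >>         private final ConcurrentLinkedQueue<LockResourceSketch> free =
>> >>             new ConcurrentLinkedQueue<>();
>>
>> >>         LockPoolSketch(int softLimit, boolean expandOnExhaustion) {
>> >>             this.softLimit = softLimit;
>> >>             this.expandOnExhaustion = expandOnExhaustion;
>> >>         }
>>
>> >>         LockResourceSketch acquire() {
>> >>             LockResourceSketch r = free.poll();
>> >>             if (r != null) { r.retain(); return r; }
>> >>             if (allocated.get() >= softLimit && !expandOnExhaustion) {
>> >>                 throw new RetryException();  // solution 1: client retries
>> >>             }
>> >>             allocated.incrementAndGet();     // solution 2: over-allocate
>> >>             r = new LockResourceSketch();
>> >>             r.retain();
>> >>             return r;
>> >>         }
>>
>> >>         void release(LockResourceSketch r) {
>> >>             if (r.release() == 0) free.offer(r);  // recyclable once unused
>> >>         }
>> >>     }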
>> >> Seeking Community Feedback
>> >> Your questions and concerns are always welcome.
>> >> We can discuss them in detail on the Slack channel:
>> >> https://app.slack.com/client/T4S1WH2J3/C06UDTBQ2SH
>>
>> >> Let's work together to advance the Fine-Grained Lock project. I believe
>> >> this initiative will deliver significant performance improvements to
>> >> the HDFS community and help reinvigorate its activity.
>>
>> >> Wishing everyone a Happy New Year 2025!
>>
>> >> On Wed, 5 Jun 2024 at 16:17, ZanderXu <zande...@apache.org> wrote:
>>
>> >> I plan to hold a meeting on 2024-06-06 from 3:00 PM to 4:00 PM to share
>> >> the FGL's motivations and some concerns in detail, in Chinese.
>>
>> >> The doc is: NameNode Fine-Grained Locking Based On Directory Tree (II)
>> >> The meeting URL is: https://sea.zoom.us/j/94168001269
>> >> You are welcome to join this meeting.
>>
>> >> On Mon, 6 May 2024 at 23:57, Hui Fei <feihui.u...@gmail.com> wrote:
>>
>> >> BTW, there is a Slack channel hdfs-fgl for this feature. You can join
>> >> it to discuss more details.
>>
>> >> Is it necessary to hold a meeting to discuss this, so that we can push
>> >> it forward quickly? I agree with ZanderXu that it seems inefficient to
>> >> discuss details via the mailing list.
>>
>> >> On Mon, 6 May 2024 at 23:50, Hui Fei <feihui.u...@gmail.com> wrote:
>>
>> >> Thanks all.
>> >> It seems all concerns are related to stage 2. We can address these and
>> >> make them clearer before we start it.
>>
>> >> From development experience, I think it is reasonable to split a big
>> >> feature into several stages. Stage 1 is also independent; it can stand
>> >> alone as a minor feature that uses FS and BM locks instead of the
>> >> global lock.
>>
>> >> On Mon, 29 Apr 2024 at 15:17, ZanderXu <zande...@apache.org> wrote:
>>
>> Thanks @Ayush Saxena <ayush...@gmail.com> and @Xiaoqiao He
>> <hexiaoq...@apache.org> for your nice questions.
>>
>> Let me summarize your concerns and the corresponding solutions:
>>
>> *1. Questions about the Snapshot feature*
>> It's difficult to apply the FGL to the Snapshot feature, but we can
>> simply use the global FS write lock to make it thread safe.
>> So if we can identify whether a path involves the snapshot feature, we
>> can just use the global FS write lock to protect it.
>>
>> You can refer to HDFS-17479
>> <https://issues.apache.org/jira/browse/HDFS-17479> to see how to
>> identify it.
>>
>> Regarding the performance of operations related to the snapshot feature,
>> we can discuss it in two categories:
>> Read operations involving snapshots:
>> The FGL branch uses the global write lock to protect them, while the
>> GLOBAL branch uses the global read lock. It's hard to conclude which
>> version has better performance; it depends on the global lock
>> contention.
>>
>> Write operations involving snapshots:
>> Both the FGL and GLOBAL branches use the global write lock to protect
>> them, so performance again depends on the global lock contention.
>>
>> So I think if NameNode load is low, the GLOBAL branch will perform
>> better than FGL; if NameNode load is high, the FGL branch may perform
>> better than GLOBAL, which also depends on the ratio of read and write
>> operations on the SNAPSHOT feature.
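>> To make the fallback above concrete, here is a minimal sketch of the
>> dispatch between the global FS lock and the fine-grained path. The
>> identifiers are invented for illustration, and the snapshot check is a
>> stand-in for the real detection described in HDFS-17479:
>>
>>     import java.util.concurrent.locks.ReentrantReadWriteLock;
>>
>>     // Hypothetical dispatch: operations touching a snapshot-related path
>>     // fall back to the global FS write lock; everything else could take
>>     // per-component fine-grained locks instead.
>>     final class SnapshotAwareLocking {
>>         private final ReentrantReadWriteLock globalFsLock =
>>             new ReentrantReadWriteLock();
>>
>>         // Stand-in for the real detection logic (HDFS-17479); a real
>>         // check would also cover paths under snapshottable directories.
>>         private boolean involvesSnapshot(String path) {
>>             return path.contains("/.snapshot/");
>>         }
>>
>>         void runWriteOp(String path, Runnable op) {
>>             if (involvesSnapshot(path)) {
>>                 globalFsLock.writeLock().lock();  // conservative whole-FS lock
>>                 try { op.run(); }
>>                 finally { globalFsLock.writeLock().unlock(); }
>>             } else {
>>                 // Fine-grained path: acquire IIP locks component by
>>                 // component (omitted; see the design doc for the scheme).
>>                 op.run();
>>             }
>>         }
>>     }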
>> We can do some things to let end-users choose the branch that suits
>> their business better:
>> First, we need to make the lock mode selectable, so that end-users can
>> choose between FGL and GLOBAL.
>> Second, use the global write lock to make snapshot-related operations
>> thread safe, as I described in HDFS-17479.
>>
>> *2. Questions about the Symlinks feature*
>> If a symlink is related to a snapshot, we can refer to the snapshot
>> solution; if not, I think it easily fits the FGL.
>> Only createSymlink involves two paths; FGL just needs to lock them in a
>> consistent order to make this operation thread safe (a lock-ordering
>> sketch appears at the end of this message). For other operations, a
>> symlink is treated the same as any other normal iNode, right?
>>
>> If I missed any difficult points, please let me know.
>>
>> *3. Questions about Memory Usage of iNode locks*
>> There are many possible solutions to limit the memory usage of these
>> iNode locks, such as using a limited-capacity lock pool to cap the
>> maximum memory usage, or only holding iNode locks down to a fixed
>> directory depth, etc.
>>
>> We can abstract the LockManager first and then support different
>> implementations, so that we can limit the maximum memory usage of these
>> iNode locks.
>> FGL can acquire and release iNode locks through the LockManager.
>>
>> *4. Questions about Performance of acquiring and releasing iNode locks*
>> We can add some benchmarks for the LockManager to test the performance
>> of acquiring and releasing uncontended locks.
>>
>> *5. Questions about StoragePolicy, ECPolicy, ACL, Quota, etc.*
>> These policies may be set on an ancestor node and used by children
>> files. The set operations for these policies will be protected by the
>> directory tree, since these are all file-related operations. In addition
>> to Quota and StoragePolicy, the use of the other policies will also be
>> protected by the directory tree, such as ECPolicy and ACL.
>>
>> Quota is a little special since its update operations may not be
>> protected by the directory tree. We can assign a lock to each
>> QuotaFeature and use these locks to make update operations thread safe.
>> You can refer to HDFS-17473
>> <https://issues.apache.org/jira/browse/HDFS-17473> for detailed
>> information.
>>
>> StoragePolicy is a little special since it is used not only by
>> file-related operations but also by block-related operations.
>> ProcessExtraRedundancyBlock uses the storage policy to choose redundancy
>> replicas, and BlockReconstructionWork uses the storage policy to choose
>> target DNs. In order to maximize the performance improvement, BR and IBR
>> should only involve the iNodeFile to which the currently processed block
>> belongs. These redundancy blocks can be processed by the Redundancy
>> monitor while holding the directory tree locks. You can refer to
>> HDFS-17505 <https://issues.apache.org/jira/browse/HDFS-17505> for more
>> detailed information.
>>
>> *6. Performance of phase 1*
>> HDFS-17506 <https://issues.apache.org/jira/browse/HDFS-17506> is used to
>> do some performance testing for phase 1, and I will complete it later.
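>> As promised under point 2, here is a minimal sketch of deadlock-free
>> two-path locking for a createSymlink-style operation. The names are
>> invented for illustration; the real IIPLock acquisition in the PR is
>> more involved:
>>
>>     import java.util.Map;
>>     import java.util.concurrent.ConcurrentHashMap;
>>     import java.util.concurrent.locks.ReentrantReadWriteLock;
>>
>>     // Hypothetical two-path locking: acquire the write locks of the link
>>     // and target paths in a deterministic (lexicographic) order so that
>>     // two concurrent calls can never deadlock by locking the same pair in
>>     // opposite orders.
>>     final class TwoPathLockingSketch {
>>         private final Map<String, ReentrantReadWriteLock> locks =
>>             new ConcurrentHashMap<>();
>>
>>         private ReentrantReadWriteLock lockFor(String path) {
>>             return locks.computeIfAbsent(path, p -> new ReentrantReadWriteLock());
>>         }
>>
>>         void createSymlink(String target, String link, Runnable op) {
>>             String first = target.compareTo(link) <= 0 ? target : link;
>>             String second = target.compareTo(link) <= 0 ? link : target;
>>             lockFor(first).writeLock().lock();
>>             try {
>>                 lockFor(second).writeLock().lock();
>>                 try { op.run(); }
>>                 finally { lockFor(second).writeLock().unlock(); }
>>             } finally { lockFor(first).writeLock().unlock(); }
>>         }
>>     }
>>
>> Note that if target and link happen to be equal, the write lock is simply
>> taken reentrantly, so the sketch stays deadlock-free in that edge case too.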
>> Discussing solutions through mail is not efficient; you can create
>> sub-tasks under HDFS-17366
>> <https://issues.apache.org/jira/browse/HDFS-17366> to describe your
>> concerns, and I will try to give some answers.
>>
>> Thanks @Ayush Saxena <ayush...@gmail.com> and @Xiaoqiao He
>> <hexiaoq...@apache.org> again.
>>
>> On Mon, 29 Apr 2024 at 02:00, Ayush Saxena <ayush...@gmail.com> wrote:
>>
>> Thanx everyone for chasing this. Great to see some momentum around FGL;
>> that should be a great improvement.
>>
>> I have comments in two broad categories:
>> *About the process:*
>> I think in the above mails there are mentions that phase one is complete
>> in a feature branch & we are gonna merge that to trunk. If I am catching
>> it right, then you can't hit the merge button like that: to merge a
>> feature branch, you need to call for a vote specific to that branch & it
>> requires 3 binding votes to merge, unlike any other code change, which
>> requires 1. It is there in our Bylaws.
>>
>> So, do follow the process.
>>
>> *About the feature itself:* (a very quick look at the doc and the Jira,
>> so please take it with a grain of salt)
>> * The Google Drive link that you folks shared as part of the first mail:
>> I don't have access to it, so please open up the permissions for that
>> doc or share a new link.
>> * Chasing the design doc present on the Jira:
>> * I think we only have Phase-1 ready, so can you share some metrics just
>> for that? Perf improvements just with splitting the FS & BM locks.
>> * The memory implications of Phase-1? I don't think there should be any
>> major impact on memory in the case of just Phase-1.
>> * Regarding the snapshot stuff, you mentioned taking the lock on the
>> root itself? Does just taking the lock on the snapshot root rather than
>> the FS root work?
>> * Secondly, about the usage of Snapshots or Symlinks: I don't think we
>> should operate under the assumption that they aren't widely used; we
>> might just not know the folks who use them widely, or they are just
>> users, not the ones contributing. We can accept for now that in those
>> cases it isn't optimised and we just lock the entire FS space, which is
>> what happens even today, so no regressions there.
>> * Regarding memory usage: do you have some numbers on how much the
>> memory footprint increases?
>> * Under the Lock Pool: I think you are assuming there would be very few
>> inodes where a lock would be required at any given time, so there won't
>> be too much heap consumption? I think you are compromising on horizontal
>> scalability here. If your assumption doesn't hold true, then under heavy
>> read load from concurrent clients accessing different inodes, the
>> NameNode will start having memory trouble, and that would do more harm
>> than good. Anyway, the NameNode heap is a way bigger problem than
>> anything else, so we should be very careful about increasing the load
>> over there.
>> * For the locks on the inodes: do you plan to have locks for each inode?
>> Can we somehow limit that to the depth of the tree? Like currently we
>> take the lock on the root; have a config which makes us take the lock at
>> level 2 or 3 (configurable). That might fetch some perf benefits and can
>> be used to control the memory usage as well?
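>>   (For illustration, such depth-limited locking could look roughly like
>>   the following hypothetical sketch; nothing like this exists in the
>>   feature branch, it only shows the idea of locking down to a
>>   configurable level:
>>
>>       import java.util.ArrayList;
>>       import java.util.List;
>>
>>       // Hypothetical depth-limited locking: only take per-component locks
>>       // down to a configured depth; anything deeper is covered by the lock
>>       // of its level-N ancestor, bounding the number of locks in memory.
>>       final class DepthLimitedLockingSketch {
>>           private final int maxLockDepth;  // e.g. 2 or 3, configurable
>>
>>           DepthLimitedLockingSketch(int maxLockDepth) {
>>               this.maxLockDepth = maxLockDepth;
>>           }
>>
>>           // Returns the path prefixes that would actually be locked.
>>           List<String> lockTargets(String path) {
>>               String[] parts = path.split("/");
>>               List<String> targets = new ArrayList<>();
>>               StringBuilder prefix = new StringBuilder();
>>               int depth = 0;
>>               for (String p : parts) {
>>                   if (p.isEmpty()) continue;
>>                   prefix.append('/').append(p);
>>                   if (++depth > maxLockDepth) break;  // deeper inodes share
>>                                                       // the ancestor's lock
>>                   targets.add(prefix.toString());
>>               }
>>               return targets;
>>           }
>>       }
>>
>>   With maxLockDepth = 2, lockTargets("/a/b/c/d") returns [/a, /a/b];
>>   /a/b/c and below would be covered by /a/b's lock.)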
>> * What is the cost of creating these inode locks? If the lock isn't
>> already cached it would incur some cost? Do you have some numbers around
>> that? Say I disable caching altogether and then let a test load run;
>> what do the perf numbers look like in that case?
>> * I think we need to limit the size of the INodeLockPool; we can't let
>> it grow infinitely under heavy load, and we need some auto-throttling
>> mechanism for it.
>> * I didn't catch your Storage Policy problem. If I decode it right, the
>> problem is that the policy could be set on an ancestor node and the
>> children abide by it. If that is the case, then isn't that also the case
>> with ErasureCoding policies or even ACLs? Can you elaborate a bit on
>> that?
>>
>> Anyway, regarding Phase-1: if you share the perf numbers with proper
>> details plus the impact on memory (if any) for just phase 1, and if they
>> are good, then when you call for a branch merge vote for Phase-1 FGL you
>> have my vote; however, you'll need to sway the rest of the folks on your
>> own :-)
>>
>> Good Luck, Nice Work Guys!!!
>>
>> -Ayush
>>
>> On Sun, 28 Apr 2024 at 18:32, Xiaoqiao He <hexiaoq...@apache.org> wrote:
>>
>> Thanks ZanderXu and Hui Fei for your work on this feature. It will be a
>> very helpful improvement for the HDFS module in the next journey.
>>
>> 1. If we need any more review bandwidth, I would like to be involved to
>> help review if possible.
>> 2. The design document is still missing some detailed descriptions, such
>> as snapshot, symbolic link, and reserved paths, as mentioned above. I
>> think it will be helpful for newbies who want to get involved if all
>> corner cases are considered and described.
>> 3. From Slack, we plan to check into trunk at this phase. I am not sure
>> if it is the proper time; following the dev plan in the design document,
>> there are two steps left to finish this feature, right? If so, I think
>> we should postpone checking in until all plans are ready. Considering
>> that there have been many unfinished attempts at this feature in the
>> past, I think postponing the check-in would be the safe way; on the
>> other hand, it will involve more rebase cost if you keep a separate dev
>> branch. However, I think that is not a difficult thing for you.
>>
>> Good luck and look forward to making that happen soon!
>>
>> Best Regards,
>> - He Xiaoqiao
>>
>> On Fri, Apr 26, 2024 at 3:50 PM Hui Fei <feihui.u...@gmail.com> wrote:
>>
>> Thanks for the interest and advice on this.
>>
>> I'd just like to share some info here.
>>
>> ZanderXu leads this feature and has spent a lot of time on it. He is the
>> main developer in stage 1. Yuanboliu and Kokonguyen191 also took some
>> tasks. Other developers (slfan1989, haiyang1987, huangzhaobo99,
>> RocMarshal, kokonguyen191) helped review PRs. (Forgive me if I missed
>> someone.)
>>
>> Actually haiyang1987, Yuanboliu and Kokonguyen191 are also very familiar
>> with this feature. We discussed many details offline.
>>
>> More people interested in joining the development and review of stages 2
>> and 3 are welcome.
>>
>> On Fri, 26 Apr 2024 at 14:56, Zengqiang XU <xuzengqiang5...@gmail.com>
>> wrote:
>>
>> Thanks Shilun for your response:
>> 1. This is a big and very useful feature, so it really needs more
>> developers to get on board.
>> 2. This fine-grained lock has been implemented on internal branches and
>> has brought benefits to many companies, such as Meituan, Kuaishou,
>> ByteDance, etc. But it has not been contributed to the community for
>> various reasons: there is a big difference between the internal branches
>> and the community trunk branch, the internal branches may omit some
>> functions to keep FGL simple, and the contribution needs a lot of work
>> and will take a long time. This means that the solution has already been
>> proven in their prod environments. We have also run it in our prod
>> environment and gained benefits, and we are willing to spend a lot of
>> time contributing it to the community.
>> 3. Regarding the benchmark testing, we don't need to pay too much
>> attention to whether the performance is improved by 5, 10, or 20 times,
>> because too many factors affect it.
>> 4. As I described above, this solution is already being practiced by
>> many companies. Right now, we just need to think about how to implement
>> it with high quality and more comprehensively.
>> 5. I firmly believe that all problems can be solved as long as the
>> overall solution is right.
>> 6. I can spend a lot of time leading the promotion of this entire
>> feature, and I hope more people can join us in promoting it.
>> 7. You are always welcome to raise your concerns.
>>
>> Thanks again, Shilun; I hope you can help review the designs and PRs.
>>
>> On Fri, 26 Apr 2024 at 08:00, slfan1989 <slfan1...@apache.org> wrote:
>>
>> Thank you for your hard work! This is a very meaningful improvement, and
>> from the design document, we can see a significant increase in HDFS
>> read/write throughput.
>>
>> I am happy to see the progress made on HDFS-17384.
>>
>> However, I still have some concerns, which roughly involve the following
>> aspects:
>>
>> 1. While ZanderXu and Hui Fei have deep expertise in HDFS and are
>> familiar with the related development details, we still need more
>> community members to review the code to ensure that the relevant
>> upgrades meet expectations.
>>
>> 2. We need more details on the benchmarks to ensure that test results
>> can be reproduced and to allow more community members to participate in
>> the testing process.
>>
>> Looking forward to everything going smoothly in the future.
>>
>> Best Regards,
>> - Shilun Fan.
>>
>> On Wed, Apr 24, 2024 at 3:51 PM Xiaoqiao He <hexiaoq...@apache.org>
>> wrote:
>>
>> cc private@h.a.o.
>>
>> On Wed, Apr 24, 2024 at 3:35 PM ZanderXu <zande...@apache.org> wrote:
>>
>> Here is a summary of the first phase:
>> 1. There are no big changes in this phase.
>> 2. This phase just uses the FS lock and BM lock to replace the original
>> global lock.
>> 3. It helps improve performance, since some operations only need to hold
>> the FS lock or the BM lock instead of the global lock.
>> 4. This feature is turned off by default; you can enable it by setting
>> dfs.namenode.lock.model.provider.class to
>> org.apache.hadoop.hdfs.server.namenode.fgl.FineGrainedFSNamesystemLock.
>> 5. This phase is very important for the ongoing development of the
>> entire FGL.
>>
>> Here I would like to express my special thanks to @kokonguyen191 and
>> @yuanboliu for their contributions. You are also welcome to join us and
>> complete it together.
>>
>> On Wed, 24 Apr 2024 at 14:54, ZanderXu <zande...@apache.org> wrote:
>>
>> Hi everyone
>>
>> All subtasks of the first phase of the FGL have been completed, and I
>> plan to merge them into the trunk and start the second phase based on
>> the trunk.
>>
>> Here is the PR used to merge the first phase into trunk:
>> https://github.com/apache/hadoop/pull/6762
>> Here is the ticket: https://issues.apache.org/jira/browse/HDFS-17384
>>
>> I hope you can help review this PR when you are available and share some
>> ideas.
>>
>> HDFS-17385 <https://issues.apache.org/jira/browse/HDFS-17385> is used
>> for the second phase, and I have created some subtasks to describe
>> solutions for some problems, such as snapshot, getListing, and quota.
>> You are welcome to join us to complete it together.
>>
>> ---------- Forwarded message ---------
>> From: Zengqiang XU <zande...@apache.org>
>> Date: Fri, 2 Feb 2024 at 11:07
>> Subject: Discussion about NameNode Fine-grained locking
>> To: <hdfs-dev@hadoop.apache.org>
>> Cc: Zengqiang XU <xuzengqiang5...@gmail.com>
>>
>> Hi everyone
>>
>> I have started a discussion about NameNode Fine-grained Locking to
>> improve the performance of write operations in the NameNode.
>>
>> I started this discussion again for several main reasons:
>>
>> 1. We have implemented it and gained a nearly 7x performance improvement
>> in our prod environment.
>> 2. Many other companies have made similar improvements based on their
>> internal branches.
>> 3. This topic has been discussed for a long time, but still without any
>> results.
>>
>> I hope we can push this important improvement forward in the community
>> so that all end-users can enjoy it.
>>
>> I'd really appreciate it if you could join in and work with me to push
>> this feature forward.
>>
>> Thanks very much.
>>
>> Ticket: HDFS-17366 <https://issues.apache.org/jira/browse/HDFS-17366>
>> Design: NameNode Fine-grained locking based on directory tree
>> <https://docs.google.com/document/d/1X499gHxT0WSU1fj8uo4RuF3GqKxWkWXznXx4tspTBLY/edit?usp=sharing>
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
>> For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org