Thanks Zander for bringing this discussion up again and trying your best to push it forward. It has been a long time since the last discussion.
It’s indeed time. +1 for merging the phase 1 code, based on the following points:
- The phase 1 feature has been running at scale within companies for a long time.
- The long-term plan is clear and addresses the questions raised by the community.
- Testing results on memory and performance are available for the future features.

ZanderXu <zande...@apache.org> wrote on Tue, Dec 31, 2024 at 15:36:

> Hi, everyone:
>
> Time to Merge FGL Phase I
>
> The PR for *FGL Phase I* is ready for merging! Please take a moment to
> review and cast your vote: https://github.com/apache/hadoop/pull/6762.
>
> *FGL Phase I* has been running successfully in production for over six
> months at *Shopee* and *BOSS Zhipin*, with no reported performance or
> stability issues. It’s now the right time to merge it into the trunk
> branch, allowing us to move forward with Phase II.
>
> The global lock remains the default lock mode, but users can enable FGL by
> configuring
> dfs.namenode.lock.model.provider.class=org.apache.hadoop.hdfs.server.namenode.fgl.FineGrainedFSNamesystemLock
> .
>
> If there are no objections within 7 days, I will propose an official vote.
>
> Performance and Memory Usage of Phase I
>
> Conclusion:
>
> 1. Fine-grained locks do not lead to significant performance improvements.
> 2. Fine-grained locks do not result in additional memory consumption.
>
> Reasons:
>
> - *BM operations heavily depend on FS operations*: IBR and BR still
>   acquire the global lock (FSLock and BMLock).
> - *FS operations depend on BM operations*: Common operations (create,
>   addBlock, getBlockLocations) also acquire the global lock (FSLock and
>   BMLock).
>
> Phase II will bring significant performance improvements by decoupling the
> FS and BM dependencies and replacing the global FSLock with a fine-grained
> IIPLock.
>
> Addressing Common Questions
>
> Thank you all for raising meaningful questions!
>
> I have rewritten the design document to improve clarity:
> https://docs.google.com/document/d/1DXkiVxef9wCmICjpZyIQO-yxsgwc4wnf2lTKQ3UXe30/edit?usp=sharing
>
> Below is a summary of frequently asked questions and answers.
>
> *Question 1: How is the performance of LockPoolManager?*
>
> - *Performance Report*:
>   - Time to acquire a cached lock: 194 ns
>   - Time to acquire a non-cached lock: 1044 ns
>   - Time to release an in-use lock: 88 ns
>   - Time to release an unused lock: 112 ns
> - *Overall Performance*:
>   - *QPS*: Over 10 million
>   - Time to acquire the IIP lock for a path with depth 10:
>     - Fully uncached: 10440 ns + 1120 ns (≈ 11 μs)
>     - Fully cached: 1940 ns + 1120 ns (≈ 3 μs)
> - In *global lock scenarios*, lock wait times are typically in the
>   millisecond range. Therefore, the cost of acquiring and releasing
>   fine-grained locks can be ignored.
>
> *Question 2: How much memory does the FGL consume?*
>
> - *Memory Consumption*: A single LockResource contains a read-write lock
>   and a counter, totaling approximately 200 bytes:
>   - LockResource: 24 bytes
>   - ReentrantReadWriteLock: 150 bytes
>   - AtomicInteger: 16 bytes
> - *Memory Usage Estimates*:
>   - 10-level directory depth, 100 handlers: 1,000 lock resources,
>     approximately 200 KB
>   - 10-level directory depth, 1,000 handlers: 10,000 lock resources,
>     approximately 2 MB
>   - 1,000,000 lock resources: approximately 200 MB
>
> *Conclusion*: Memory consumption is negligible.
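> To make these estimates concrete, below is a minimal Java sketch of what a
> pooled LockResource could look like. The field layout (a
> ReentrantReadWriteLock plus a reference counter) mirrors the per-entry
> memory breakdown above, but the class shape and method names are
> illustrative assumptions, not the exact code in the PR:
>
>     import java.util.concurrent.atomic.AtomicInteger;
>     import java.util.concurrent.locks.ReentrantReadWriteLock;
>
>     /** One pool entry: ~24 B object + ~150 B RW lock + ~16 B counter. */
>     final class LockResource {
>       private final ReentrantReadWriteLock rwLock = new ReentrantReadWriteLock();
>       private final AtomicInteger refCount = new AtomicInteger();
>
>       ReentrantReadWriteLock rwLock() { return rwLock; }
>
>       /** Called by the pool manager before handing the entry to a handler. */
>       void retain() { refCount.incrementAndGet(); }
>
>       /** Returns true when no handler references this entry any more. */
>       boolean release() { return refCount.decrementAndGet() == 0; }
>     }
>
> At roughly 200 bytes per entry, 100 handlers each holding a 10-level path
> keep about 1,000 live entries (~200 KB), which matches the estimates above.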
> *Question 3: What happens if no lock is available in the LockPoolManager?*
>
> If there are no available LockResources, two solutions are possible:
>
> 1. Return a *RetryException*, prompting the client to retry later.
> 2. Temporarily increase the lock entity limit, allocate more locks to meet
>    client requests, and use an asynchronous thread to recycle locks
>    periodically.
>
> We can provide multiple LockPoolManager implementations for users to
> choose from based on their production environments.
>
> *Question 4: Regarding the IIPLock lock depth issue, can we consider
> holding only the first 3 or 4 levels of directory locks?*
>
> This approach is not recommended for the following reasons:
>
> 1. *It cannot maximize concurrency.*
> 2. *The savings in lock acquisition/release time and memory usage are
>    limited*, yielding insignificant benefits.
>
> *Question 5: How should attributes like StoragePolicy, ErasureCoding, and
> ACL, which can be set on parent or ancestor directory nodes, be handled?*
>
> - *ErasureCoding and ACL*:
>   - When changing a node's attributes, hold the corresponding INode's
>     write lock.
>   - When using an ancestor node's attributes, hold the corresponding
>     INode's read lock.
> - *StoragePolicy*:
>   - More complex due to its impact on both directory tree operations and
>     Block operations.
>   - To improve performance, commonly used block-related operations (such
>     as BR/IBR) should not acquire the IIPLock.
>   - Detailed design documentation:
>     https://docs.google.com/document/d/1DXkiVxef9wCmICjpZyIQO-yxsgwc4wnf2lTKQ3UXe30/edit?tab=t.0#heading=h.96lztsl4mwfk
>
> *Question 6: How should FGL be implemented for the SNAPSHOT feature?*
>
> - Since the Rename operation on a SNAPSHOT directory is supported, holding
>   only the write lock of the SNAPSHOT root directory cannot cover the
>   rename case, so the thread safety of SNAPSHOT-related operations cannot
>   be guaranteed.
> - It is recommended to use the *global FS lock* to ensure thread safety.
> - Detailed design documentation:
>   https://docs.google.com/document/d/1DXkiVxef9wCmICjpZyIQO-yxsgwc4wnf2lTKQ3UXe30/edit?tab=t.0#heading=h.sm36p6bfcpec
>
> *Question 7: How should FGL be implemented for the Symlinks feature?*
>
> - The target path of a Symlink is a string, and the client performs a
>   second forward access to the target path itself, so the fine-grained
>   lock project requires no special handling.
> - For the createSymlink RPC, the FGL needs to acquire the IIPLocks for
>   both the target and link paths.
>
> *Question 8: How should FGL be implemented for the reserved feature?*
>
> The Reserved feature has two usage modes:
>
> 1. /.reserved/iNodes/${inode id}
> 2. /.reserved/raw/${path}
>
> - *INodeId Mode*: During the resolvePath phase, obtain the real IIPLock
>   via the INodeId.
> - *Path Mode*: During the resolvePath phase, obtain the real IIPLock via
>   the path.
> - Detailed design documentation:
>   https://docs.google.com/document/d/1DXkiVxef9wCmICjpZyIQO-yxsgwc4wnf2lTKQ3UXe30/edit?tab=t.0#heading=h.h6rcpzkbpanf
>
> *Question 9: Why is INodeFileLock used as the FGL for BlockInfo?*
>
> INodeFile and Block have mutual dependencies:
>
> - *INodeFile depends on Block* for state and size.
> - *Block depends on INodeFile* for state and storage policy.
>
> Therefore, using INodeFileLock as the fine-grained lock for BlockInfo is a
> reasonable choice.
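> As an illustration of that choice, here is a short, hedged Java sketch of
> guarding a block-state update with the owning file's lock; the helper name
> and signature are assumptions made for illustration, not the actual Phase
> II API:
>
>     import java.util.concurrent.locks.ReentrantReadWriteLock;
>     import java.util.function.Supplier;
>
>     final class FileLockedBlockUpdater {
>       /**
>        * Runs a block-state mutation under the owning INodeFile's write
>        * lock, so readers of the file (size, completeness checks) never
>        * observe a half-applied block update.
>        */
>       static <T> T updateUnderFileLock(ReentrantReadWriteLock fileLock,
>                                        Supplier<T> blockUpdate) {
>         fileLock.writeLock().lock();
>         try {
>           return blockUpdate.get(); // e.g. commit a block or apply an IBR
>         } finally {
>           fileLock.writeLock().unlock();
>         }
>       }
>     }
>
> Because every block belongs to exactly one INodeFile, a single lock covers
> both dependency directions without a separate per-block lock.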
> Detailed design documentation:
> https://docs.google.com/document/d/1DXkiVxef9wCmICjpZyIQO-yxsgwc4wnf2lTKQ3UXe30/edit?tab=t.0#heading=h.zesd6omuu3kr
>
> Seeking Community Feedback
>
> Your questions and concerns are always welcome.
>
> We can discuss them in detail on the Slack channel:
> https://app.slack.com/client/T4S1WH2J3/C06UDTBQ2SH
>
> Let’s work together to advance the Fine-Grained Lock project. I believe
> this initiative will deliver significant performance improvements to the
> HDFS community and help reinvigorate its activity.
>
> Wishing everyone a Happy New Year 2025!
>
> On Wed, 5 Jun 2024 at 16:17, ZanderXu <zande...@apache.org> wrote:
>
>> I plan to hold a meeting on 2024-06-06 from 3:00 PM to 4:00 PM to share
>> the FGL's motivations and some concerns in detail, in Chinese.
>>
>> The doc is: NameNode Fine-Grained Locking Based On Directory Tree (II)
>> <https://docs.google.com/document/d/1QGLM67u6tWjj00gOWYqgxHqghb43g4dmH8QcUZtSrYE/edit?usp=sharing>
>>
>> The meeting URL is: https://sea.zoom.us/j/94168001269
>>
>> You are welcome to join this meeting.
>>
>> On Mon, 6 May 2024 at 23:57, Hui Fei <feihui.u...@gmail.com> wrote:
>>
>>> BTW, there is a Slack channel hdfs-fgl for this feature. You can join it
>>> and discuss more details there.
>>>
>>> Is it necessary to hold a meeting to discuss this, so that we can push
>>> it forward quickly? Agreed with ZanderXu, it seems inefficient to
>>> discuss details via the mailing list.
>>>
>>> Hui Fei <feihui.u...@gmail.com> wrote on Mon, May 6, 2024 at 23:50:
>>>
>>>> Thanks all.
>>>>
>>>> It seems all concerns are related to stage 2. We can address them and
>>>> make it clearer before we start it.
>>>>
>>>> From development experience, I think it is reasonable to split a big
>>>> feature into several stages. Stage 1 is also independent, and it can
>>>> stand on its own as a minor feature that uses the FS and BM locks
>>>> instead of the global lock.
>>>>
>>>> ZanderXu <zande...@apache.org> wrote on Mon, Apr 29, 2024 at 15:17:
>>>>
>>>>> Thanks @Ayush Saxena <ayush...@gmail.com> and @Xiaoqiao He
>>>>> <hexiaoq...@apache.org> for your nice questions.
>>>>>
>>>>> Let me summarize your concerns and the corresponding solutions:
>>>>>
>>>>> *1. Questions about the Snapshot feature*
>>>>> It's difficult to apply the FGL to the Snapshot feature, but we can
>>>>> simply use the global FS write lock to make it thread-safe.
>>>>> So if we can identify whether a path involves the snapshot feature,
>>>>> we can just use the global FS write lock to protect it.
>>>>>
>>>>> You can refer to HDFS-17479
>>>>> <https://issues.apache.org/jira/browse/HDFS-17479> for how to
>>>>> identify it.
>>>>>
>>>>> Regarding the performance of operations related to the snapshot
>>>>> feature, we can discuss it in two categories:
>>>>>
>>>>> Read operations involving snapshots:
>>>>> The FGL branch uses the global write lock to protect them; the GLOBAL
>>>>> branch uses the global read lock to protect them. It's hard to
>>>>> conclude which version has better performance; it depends on the
>>>>> global lock competition.
>>>>>
>>>>> Write operations involving snapshots:
>>>>> Both the FGL and GLOBAL branches use the global write lock to protect
>>>>> them. It's hard to conclude which version has better performance; it
>>>>> depends on the global lock competition too.
>>>>> So I think if the namenode load is low, the GLOBAL branch will have
>>>>> better performance than FGL; if the namenode load is high, the FGL
>>>>> branch may have better performance than GLOBAL, which also depends on
>>>>> the ratio of read and write operations on the SNAPSHOT feature.
>>>>>
>>>>> We can do some things to let end-users choose the branch that suits
>>>>> their business better:
>>>>> First, we need to make the lock mode selectable, so that end-users
>>>>> can choose between FGL and GLOBAL.
>>>>> Second, use the global write lock to make operations related to
>>>>> snapshots thread-safe, as I described in HDFS-17479.
>>>>>
>>>>> *2. Questions about the Symlinks feature*
>>>>> If a Symlink is related to a snapshot, we can refer to the solution
>>>>> for the snapshot; if a Symlink is not related to a snapshot, I think
>>>>> it's easy to fit into the FGL.
>>>>> Only createSymlink involves two paths; the FGL just needs to lock
>>>>> them in order to make this operation thread-safe. For other
>>>>> operations, it is the same as any other normal iNode, right?
>>>>>
>>>>> If I missed any difficult points, please let me know.
>>>>>
>>>>> *3. Questions about Memory Usage of iNode locks*
>>>>> I think there are many solutions for limiting the memory usage of
>>>>> these iNode locks, such as using a limited-capacity lock pool to
>>>>> bound the maximum memory usage, or holding iNode locks only for a
>>>>> fixed depth of directories, etc.
>>>>>
>>>>> We can abstract this LockManager first and then support
>>>>> implementations based on different ideas, so that we can limit the
>>>>> maximum memory usage of these iNode locks.
>>>>> FGL can acquire or lease iNode locks through the LockManager.
>>>>>
>>>>> *4. Questions about Performance of acquiring and releasing iNode
>>>>> locks*
>>>>> We can add some benchmarks for the LockManager to test the
>>>>> performance of acquiring and releasing uncontended locks.
>>>>>
>>>>> *5. Questions about StoragePolicy, ECPolicy, ACL, Quota, etc.*
>>>>> These policies may be set on an ancestor node and used by some child
>>>>> files. The set operations for these policies will be protected by the
>>>>> directory tree, since they are all file-related operations. Apart
>>>>> from Quota and StoragePolicy, the use of the other policies, such as
>>>>> ECPolicy and ACL, will also be protected by the directory tree.
>>>>>
>>>>> Quota is a little special, since its update operations may not be
>>>>> protected by the directory tree; we can assign a lock to each
>>>>> QuotaFeature and use these locks to make update operations
>>>>> thread-safe. You can refer to HDFS-17473
>>>>> <https://issues.apache.org/jira/browse/HDFS-17473> for some detailed
>>>>> information, and see the sketch below this point.
>>>>>
>>>>> StoragePolicy is a little special, since it is used not only by
>>>>> file-related operations but also by block-related operations.
>>>>> ProcessExtraRedundancyBlock uses the storage policy to choose
>>>>> redundant replicas, and BlockReconstructionWork uses the storage
>>>>> policy to choose target DNs. In order to maximize the performance
>>>>> improvement, BR and IBR should only involve the iNodeFile to which
>>>>> the currently processed block belongs. These redundant blocks can be
>>>>> processed by the Redundancy monitor while holding the directory tree
>>>>> locks. You can refer to HDFS-17505
>>>>> <https://issues.apache.org/jira/browse/HDFS-17505> for more detailed
>>>>> information.
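>>>>> As a hedged illustration of the per-QuotaFeature lock idea (the
>>>>> field and method names below are assumptions for illustration, not
>>>>> the code in HDFS-17473): each QuotaFeature could carry its own small
>>>>> lock so that usage updates stay thread-safe even when the directory
>>>>> tree locks are not held:
>>>>>
>>>>>     import java.util.concurrent.locks.ReentrantLock;
>>>>>
>>>>>     final class QuotaFeatureSketch {
>>>>>       private final ReentrantLock lock = new ReentrantLock();
>>>>>       private long namespaceUsed;
>>>>>       private long storagespaceUsed;
>>>>>
>>>>>       /** Applies a usage delta atomically, without tree locks. */
>>>>>       void addUsage(long nsDelta, long ssDelta) {
>>>>>         lock.lock();
>>>>>         try {
>>>>>           namespaceUsed += nsDelta;
>>>>>           storagespaceUsed += ssDelta;
>>>>>         } finally {
>>>>>           lock.unlock();
>>>>>         }
>>>>>       }
>>>>>     }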
>>>>> *6. Performance of phase 1*
>>>>> HDFS-17506 <https://issues.apache.org/jira/browse/HDFS-17506> is used
>>>>> to do some performance testing for phase 1, and I will complete it
>>>>> later.
>>>>>
>>>>> Discussing solutions through mail is not efficient; you can create
>>>>> sub-tasks under HDFS-17366
>>>>> <https://issues.apache.org/jira/browse/HDFS-17366> to describe your
>>>>> concerns, and I will try to give some answers.
>>>>>
>>>>> Thanks @Ayush Saxena <ayush...@gmail.com> and @Xiaoqiao He
>>>>> <hexiaoq...@apache.org> again.
>>>>>
>>>>> On Mon, 29 Apr 2024 at 02:00, Ayush Saxena <ayush...@gmail.com> wrote:
>>>>>
>>>>> > Thanx everyone for chasing this. Great to see some momentum around
>>>>> > FGL; that should be a great improvement.
>>>>> >
>>>>> > I have two broad categories of comments:
>>>>> >
>>>>> > ** About the process:*
>>>>> > I think in the above mails there are mentions that phase one is
>>>>> > complete in a feature branch & we are gonna merge that to trunk. If
>>>>> > I am catching it right, then you can't hit the merge button like
>>>>> > that. To merge a feature branch, you need to call for a Vote
>>>>> > specific to that branch & it requires 3 binding votes to merge,
>>>>> > unlike any other code change, which requires 1. It is there in our
>>>>> > Bylaws.
>>>>> >
>>>>> > So, do follow the process.
>>>>> >
>>>>> > ** About the feature itself:* (A very quick look at the doc and the
>>>>> > Jira, so please take it with a grain of salt)
>>>>> > * The Google Drive link that you folks shared as part of the first
>>>>> > mail: I don't have access to that. So, please open up the
>>>>> > permissions for that doc or share the new link.
>>>>> > * Chasing the design doc present on the Jira:
>>>>> > * I think we only have Phase-1 ready, so can you share some metrics
>>>>> > just for that? Perf improvements just with splitting the FS & BM
>>>>> > locks.
>>>>> > * The memory implications of Phase-1? I don't think there should be
>>>>> > any major impact on memory in the case of just Phase-1.
>>>>> > * Regarding the snapshot stuff, you mentioned taking a lock on the
>>>>> > root itself? Does just taking a lock on the snapshot root rather
>>>>> > than the FS root work?
>>>>> > * Secondly, about the usage of Snapshots or Symlinks: I don't think
>>>>> > we should operate under assumptions about whether they are widely
>>>>> > used or not; we might just not know the folks who use them, or they
>>>>> > are just users, not the ones contributing. We can just accept for
>>>>> > now that in those cases it isn't optimised and we just lock the
>>>>> > entire FS space, which it does even today, so no regressions there.
>>>>> > * Regarding memory usage: Do you have some numbers on how much the
>>>>> > memory footprint increases?
>>>>> > * Under the Lock Pool: I think you are assuming there would be very
>>>>> > few inodes where a lock would be required at any given time, so
>>>>> > there won't be too much heap consumption? I think you are
>>>>> > compromising on horizontal scalability here. If your assumption
>>>>> > doesn't hold true, under heavy read load by concurrent clients
>>>>> > accessing different inodes, the Namenode will start giving memory
>>>>> > troubles, and that would do more harm than good. Anyway, Namenode
>>>>> > heap is a way bigger problem than anything, so we should be very
>>>>> > careful increasing load over there.
>>>>> > * For the locks on the inodes: Do you plan to have locks for each
>>>>> > inode? Can we somehow limit that to the depth of the tree? Like,
>>>>> > currently we take a lock on the root; have a config which makes us
>>>>> > take locks at Level-2 or 3 (configurable). That might fetch some
>>>>> > perf benefits and can be used to control the memory usage as well?
>>>>> > * What is the cost of creating these inode locks? If the lock isn't
>>>>> > already cached, it would incur some cost? Do you have some numbers
>>>>> > around that? Say I disable caching altogether & then let a test
>>>>> > load run; what do the perf numbers look like in that case?
>>>>> > * I think we need to limit the size of the INodeLockPool; we can't
>>>>> > let it grow infinitely in case of heavy loads, and we need to have
>>>>> > some auto-throttling mechanism for it.
>>>>> > * I didn't catch your Storage Policy problem. If I decode it right,
>>>>> > the problem is that the policy could be set on an ancestor node &
>>>>> > the children abide by that. If that is the case, then isn't that
>>>>> > also the case with ErasureCoding policies or even ACLs or so? Can
>>>>> > you elaborate a bit on that?
>>>>> >
>>>>> > Anyway, regarding Phase-1: if you share the perf numbers with
>>>>> > proper details + the impact on memory, if any, for just phase 1, &
>>>>> > if they are good, then if you call for a branch merge vote for
>>>>> > Phase-1 FGL, you have my vote; however, you'll need to sway the
>>>>> > rest of the folks on your own :-)
>>>>> >
>>>>> > Good luck, nice work guys!!!
>>>>> >
>>>>> > -Ayush
>>>>> >
>>>>> > On Sun, 28 Apr 2024 at 18:32, Xiaoqiao He <hexiaoq...@apache.org>
>>>>> > wrote:
>>>>> >
>>>>> >> Thanks ZanderXu and Hui Fei for your work on this feature. It will
>>>>> >> be a very helpful improvement for the HDFS module in the next
>>>>> >> journey.
>>>>> >>
>>>>> >> 1. If we need any more review bandwidth, I would like to be
>>>>> >> involved to help review if possible.
>>>>> >> 2. The design document is still missing some detailed
>>>>> >> descriptions, such as snapshot, symbolic link, and reserved, etc.,
>>>>> >> as mentioned above. I think it will be helpful for newbies who
>>>>> >> want to be involved if all corner cases are considered and
>>>>> >> described.
>>>>> >> 3. From Slack, we plan to check into the trunk at this phase. I am
>>>>> >> not sure if it is the proper time; following the dev plan in the
>>>>> >> design document, there are two steps left to finish this feature,
>>>>> >> right? If so, I think we should postpone checking in until all
>>>>> >> plans are ready. Considering that there have been many unfinished
>>>>> >> attempts at this feature in history, I think postponing the
>>>>> >> check-in will be the safe way; on the other hand, it will involve
>>>>> >> more rebase cost if you keep a separate dev branch. However, I
>>>>> >> think that is not a difficult thing for you.
>>>>> >>
>>>>> >> Good luck and look forward to making that happen soon!
>>>>> >>
>>>>> >> Best Regards,
>>>>> >> - He Xiaoqiao
>>>>> >>
>>>>> >> On Fri, Apr 26, 2024 at 3:50 PM Hui Fei <feihui.u...@gmail.com>
>>>>> >> wrote:
>>>>> >> >
>>>>> >> > Thanks for the interest and advice on this.
>>>>> >> >
>>>>> >> > I would just like to share some info here.
>>>>> >> >
>>>>> >> > ZanderXu leads this feature and he has spent a lot of time on it.
>>>>> >> > He is the main developer in stage 1. Yuanboliu and
>>>>> >> > Kokonguyen191 also took some tasks. Other developers (slfan1989,
>>>>> >> > haiyang1987, huangzhaobo99, RocMarshal, kokonguyen191) helped
>>>>> >> > review PRs. (Forgive me if I missed someone.)
>>>>> >> >
>>>>> >> > Actually, haiyang1987, Yuanboliu and Kokonguyen191 are also very
>>>>> >> > familiar with this feature. We discussed many details offline.
>>>>> >> >
>>>>> >> > More people interested in joining the development and review of
>>>>> >> > stages 2 and 3 are welcome.
>>>>> >> >
>>>>> >> > Zengqiang XU <xuzengqiang5...@gmail.com> wrote on Fri, Apr 26,
>>>>> >> > 2024 at 14:56:
>>>>> >> >>
>>>>> >> >> Thanks Shilun for your response:
>>>>> >> >>
>>>>> >> >> 1. This is a big and very useful feature, so it really needs
>>>>> >> >> more developers to get on board.
>>>>> >> >> 2. This fine-grained lock has been implemented on internal
>>>>> >> >> branches and has brought benefits to many companies, such as
>>>>> >> >> Meituan, Kuaishou, Bytedance, etc. But it has not been
>>>>> >> >> contributed to the community for various reasons: there is a
>>>>> >> >> big difference between the version of the internal branch and
>>>>> >> >> the community trunk branch, the internal branch may leave out
>>>>> >> >> some functions to keep FGL simple, and the contribution needs a
>>>>> >> >> lot of work and will take a long time. It means that this
>>>>> >> >> solution has already been practiced in their prod environments.
>>>>> >> >> We have also practiced it in our prod environment and gained
>>>>> >> >> benefits, and we are willing to spend a lot of time
>>>>> >> >> contributing it to the community.
>>>>> >> >> 3. Regarding the benchmark testing, we don't need to pay much
>>>>> >> >> attention to whether the performance is improved by 5 times, 10
>>>>> >> >> times, or 20 times, because there are too many factors that
>>>>> >> >> affect it.
>>>>> >> >> 4. As I described above, this solution is already being
>>>>> >> >> practiced by many companies. Right now, we just need to think
>>>>> >> >> about how to implement it with high quality and more
>>>>> >> >> comprehensively.
>>>>> >> >> 5. I firmly believe that all problems can be solved as long as
>>>>> >> >> the overall solution is right.
>>>>> >> >> 6. I can spend a lot of time leading the promotion of this
>>>>> >> >> entire feature, and I hope more people can join us in promoting
>>>>> >> >> it.
>>>>> >> >> 7. You are always welcome to raise your concerns.
>>>>> >> >>
>>>>> >> >> Thanks Shilun again; I hope you can help review the designs and
>>>>> >> >> PRs. Thanks.
>>>>> >> >>
>>>>> >> >> On Fri, 26 Apr 2024 at 08:00, slfan1989 <slfan1...@apache.org>
>>>>> >> >> wrote:
>>>>> >> >>
>>>>> >> >> > Thank you for your hard work! This is a very meaningful
>>>>> >> >> > improvement, and from the design document, we can see a
>>>>> >> >> > significant increase in HDFS read/write throughput.
>>>>> >> >> >
>>>>> >> >> > I am happy to see the progress made on HDFS-17384.
>>>>> >> >> >
>>>>> >> >> > However, I still have some concerns, which roughly involve
>>>>> >> >> > the following aspects:
>>>>> >> >> > 1. While ZanderXu and Hui Fei have deep expertise in HDFS
>>>>> >> >> > and are familiar with the related development details, we
>>>>> >> >> > still need more community members to review the code to
>>>>> >> >> > ensure that the relevant upgrades meet expectations.
>>>>> >> >> >
>>>>> >> >> > 2. We need more details on the benchmarks to ensure that the
>>>>> >> >> > test results can be reproduced and to allow more community
>>>>> >> >> > members to participate in the testing process.
>>>>> >> >> >
>>>>> >> >> > Looking forward to everything going smoothly in the future.
>>>>> >> >> >
>>>>> >> >> > Best Regards,
>>>>> >> >> > - Shilun Fan.
>>>>> >> >> >
>>>>> >> >> > On Wed, Apr 24, 2024 at 3:51 PM Xiaoqiao He
>>>>> >> >> > <hexiaoq...@apache.org> wrote:
>>>>> >> >> >
>>>>> >> >> >> cc private@h.a.o.
>>>>> >> >> >>
>>>>> >> >> >> On Wed, Apr 24, 2024 at 3:35 PM ZanderXu
>>>>> >> >> >> <zande...@apache.org> wrote:
>>>>> >> >> >> >
>>>>> >> >> >> > Here are some summaries about the first phase:
>>>>> >> >> >> > 1. There are no big changes in this phase.
>>>>> >> >> >> > 2. This phase just uses the FS lock and BM lock to
>>>>> >> >> >> > replace the original global lock.
>>>>> >> >> >> > 3. It's useful for improving performance, since some
>>>>> >> >> >> > operations just need to hold the FS lock or BM lock
>>>>> >> >> >> > instead of the global lock.
>>>>> >> >> >> > 4. This feature is turned off by default; you can enable
>>>>> >> >> >> > it by setting dfs.namenode.lock.model.provider.class to
>>>>> >> >> >> > org.apache.hadoop.hdfs.server.namenode.fgl.FineGrainedFSNamesystemLock
>>>>> >> >> >> > 5. This phase is very important for the ongoing
>>>>> >> >> >> > development of the entire FGL.
>>>>> >> >> >> >
>>>>> >> >> >> > Here I would like to express my special thanks to
>>>>> >> >> >> > @kokonguyen191 and @yuanboliu for their contributions.
>>>>> >> >> >> > You are also welcome to join us and complete it together.
>>>>> >> >> >> >
>>>>> >> >> >> > On Wed, 24 Apr 2024 at 14:54, ZanderXu
>>>>> >> >> >> > <zande...@apache.org> wrote:
>>>>> >> >> >> >
>>>>> >> >> >> > > Hi everyone
>>>>> >> >> >> > >
>>>>> >> >> >> > > All subtasks of the first phase of the FGL have been
>>>>> >> >> >> > > completed, and I plan to merge them into the trunk and
>>>>> >> >> >> > > start the second phase based on the trunk.
>>>>> >> >> >> > >
>>>>> >> >> >> > > Here is the PR used to merge the first phase into trunk:
>>>>> >> >> >> > > https://github.com/apache/hadoop/pull/6762
>>>>> >> >> >> > > Here is the ticket:
>>>>> >> >> >> > > https://issues.apache.org/jira/browse/HDFS-17384
>>>>> >> >> >> > >
>>>>> >> >> >> > > I hope you can help review this PR when you are
>>>>> >> >> >> > > available and give some ideas.
>>>>> >> >> >> > >
>>>>> >> >> >> > > HDFS-17385
>>>>> >> >> >> > > <https://issues.apache.org/jira/browse/HDFS-17385> is
>>>>> >> >> >> > > used for the second phase, and I have created some
>>>>> >> >> >> > > subtasks to describe solutions for some problems, such
>>>>> >> >> >> > > as snapshot, getListing, and quota.
>>>>> >> >> >> > > You are welcome to join us to complete it together.
>>>>> >> >> >> > >
>>>>> >> >> >> > > ---------- Forwarded message ---------
>>>>> >> >> >> > > From: Zengqiang XU <zande...@apache.org>
>>>>> >> >> >> > > Date: Fri, 2 Feb 2024 at 11:07
>>>>> >> >> >> > > Subject: Discussion about NameNode Fine-grained locking
>>>>> >> >> >> > > To: <hdfs-dev@hadoop.apache.org>
>>>>> >> >> >> > > Cc: Zengqiang XU <xuzengqiang5...@gmail.com>
>>>>> >> >> >> > >
>>>>> >> >> >> > > Hi everyone
>>>>> >> >> >> > >
>>>>> >> >> >> > > I have started a discussion about NameNode Fine-grained
>>>>> >> >> >> > > Locking to improve the performance of write operations
>>>>> >> >> >> > > in the NameNode.
>>>>> >> >> >> > >
>>>>> >> >> >> > > I started this discussion again for several main
>>>>> >> >> >> > > reasons:
>>>>> >> >> >> > > 1. We have implemented it and gained a nearly 7x
>>>>> >> >> >> > > performance improvement in our prod environment.
>>>>> >> >> >> > > 2. Many other companies have made similar improvements
>>>>> >> >> >> > > based on their internal branches.
>>>>> >> >> >> > > 3. This topic has been discussed for a long time, but
>>>>> >> >> >> > > still without any results.
>>>>> >> >> >> > >
>>>>> >> >> >> > > I hope we can push this important improvement forward
>>>>> >> >> >> > > in the community so that all end-users can enjoy it.
>>>>> >> >> >> > >
>>>>> >> >> >> > > I'd really appreciate it if you could join in and work
>>>>> >> >> >> > > with me to push this feature forward.
>>>>> >> >> >> > >
>>>>> >> >> >> > > Thanks very much.
>>>>> >> >> >> > >
>>>>> >> >> >> > > Ticket: HDFS-17366
>>>>> >> >> >> > > <https://issues.apache.org/jira/browse/HDFS-17366>
>>>>> >> >> >> > > Design: NameNode Fine-grained locking based on
>>>>> >> >> >> > > directory tree
>>>>> >> >> >> > > <https://docs.google.com/document/d/1X499gHxT0WSU1fj8uo4RuF3GqKxWkWXznXx4tspTBLY/edit?usp=sharing>
>>>>> >> >> >>
>>>>> >> >> >> ---------------------------------------------------------------------
>>>>> >> >> >> To unsubscribe, e-mail: private-unsubscr...@hadoop.apache.org
>>>>> >> >> >> For additional commands, e-mail: private-h...@hadoop.apache.org
>>>>> >>
>>>>> >> ---------------------------------------------------------------------
>>>>> >> To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
>>>>> >> For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org