Thanks for your hard work in pushing this forward. It looks good: +1 for merging the phase 1 code. I hope we can work together to promote this major HDFS optimization so that more companies can benefit from it.
Thanks everyone~

On Tue, 31 Dec 2024 at 20:33, Ayush Saxena <ayush...@gmail.com> wrote:

> +1,
> Thanks folks for your efforts on this! I didn't have time to review everything thoroughly, but my initial pass suggests it looks good, or at least is safe to merge.
> If I find some spare time, I'll test it further and submit a ticket or so if I encounter any issues.
>
> Good Luck!!!
>
> -Ayush
>
> On Tue, 31 Dec 2024 at 16:39, Hui Fei <feihui.u...@gmail.com> wrote:
>
>> Thanks Zander for bringing this discussion up again and doing your best to push it forward. It has been a long time since the last discussion.
>>
>> It is indeed time: +1 for merging the phase 1 code, based on the following points:
>> - The phase 1 feature has been running at scale within companies for a long time.
>> - The long-term plan is clear and addresses several questions raised by the community.
>> - The test results for the upcoming features cover both memory and performance.
>>
>> On Tue, 31 Dec 2024 at 15:36, ZanderXu <zande...@apache.org> wrote:
>>
>>> Hi everyone,
>>>
>>> Time to Merge FGL Phase I
>>>
>>> The PR for FGL Phase I is ready for merging! Please take a moment to review and cast your vote: https://github.com/apache/hadoop/pull/6762.
>>>
>>> FGL Phase I has been running successfully in production for over six months at Shopee and BOSS Zhipin, with no reported performance or stability issues. It's now the right time to merge it into the trunk branch, allowing us to move forward with Phase II.
>>>
>>> The global lock remains the default lock mode, but users can enable FGL by configuring dfs.namenode.lock.model.provider.class=org.apache.hadoop.hdfs.server.namenode.fgl.FineGrainedFSNamesystemLock.
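>>> For example, a MiniDFSCluster-based test could opt in like this (a minimal sketch: only the config key and value are from this proposal; the test scaffolding is illustrative):
>>>
>>>     import org.apache.hadoop.conf.Configuration;
>>>     import org.apache.hadoop.hdfs.HdfsConfiguration;
>>>     import org.apache.hadoop.hdfs.MiniDFSCluster;
>>>
>>>     public class FglSmokeTest {
>>>       public static void main(String[] args) throws Exception {
>>>         Configuration conf = new HdfsConfiguration();
>>>         // Opt in to fine-grained locking; the global lock stays the default.
>>>         conf.set("dfs.namenode.lock.model.provider.class",
>>>             "org.apache.hadoop.hdfs.server.namenode.fgl.FineGrainedFSNamesystemLock");
>>>         MiniDFSCluster cluster = new MiniDFSCluster.Builder(conf).numDataNodes(1).build();
>>>         cluster.waitActive();
>>>         cluster.shutdown();
>>>       }
>>>     }
>>>
>>> On a real cluster the same key would go into the NameNode's hdfs-site.xml.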
>>> If there are no objections within 7 days, I will propose an official vote.
>>>
>>> Performance and Memory Usage of Phase I
>>>
>>> Conclusions:
>>> - Fine-grained locks do not lead to significant performance improvements in this phase.
>>> - Fine-grained locks do not result in additional memory consumption.
>>>
>>> Reasons:
>>> - BM operations heavily depend on FS operations: IBR and BR still acquire the global lock (FSLock and BMLock).
>>> - FS operations depend on BM operations: common operations (create, addBlock, getBlockLocations) also acquire the global lock (FSLock and BMLock).
>>>
>>> Phase II will bring significant performance improvements by decoupling the FS and BM dependencies and replacing the global FSLock with fine-grained IIPLocks.
>>>
>>> Addressing Common Questions
>>>
>>> Thank you all for raising meaningful questions! I have rewritten the design document to improve clarity: https://docs.google.com/document/d/1DXkiVxef9wCmICjpZyIQO-yxsgwc4wnf2lTKQ3UXe30/edit?usp=sharing
>>>
>>> Below is a summary of frequently asked questions and answers.
>>>
>>> Question 1: How is the performance of the LockPoolManager?
>>>
>>> Performance report:
>>> - Time to acquire a cached lock: 194 ns
>>> - Time to acquire a non-cached lock: 1044 ns
>>> - Time to release an in-use lock: 88 ns
>>> - Time to release an unused lock: 112 ns
>>>
>>> Overall performance:
>>> - QPS: over 10 million
>>> - Time to acquire and release the IIP lock for a path with depth 10:
>>>   - Fully uncached: 10440 ns to acquire (10 x 1044 ns) + 1120 ns to release (10 x 112 ns), about 11 μs
>>>   - Fully cached: 1940 ns to acquire (10 x 194 ns) + 1120 ns to release, about 3 μs
>>>
>>> In global-lock scenarios, lock wait times are typically in the millisecond range. Therefore, the cost of acquiring and releasing fine-grained locks can be ignored.
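>>> For intuition, here is a minimal sketch of what acquiring an IIP (INodesInPath) lock for a write might look like (illustrative names, not the actual FGL classes; it assumes the scheme of read-locking every ancestor and write-locking the last INode):
>>>
>>>     import java.util.List;
>>>     import java.util.concurrent.locks.ReentrantReadWriteLock;
>>>
>>>     /** Sketch only: one ReentrantReadWriteLock per path component, root first. */
>>>     class IipLockSketch {
>>>       /** Read-lock every ancestor, write-lock the final INode. For a depth-10
>>>        *  path this is the "10 acquires + 10 releases" counted above. */
>>>       static void lockForWrite(List<ReentrantReadWriteLock> inodeLocks) {
>>>         int last = inodeLocks.size() - 1;
>>>         for (int i = 0; i < last; i++) {
>>>           inodeLocks.get(i).readLock().lock();
>>>         }
>>>         inodeLocks.get(last).writeLock().lock();
>>>       }
>>>
>>>       /** Release in reverse order of acquisition. */
>>>       static void unlockForWrite(List<ReentrantReadWriteLock> inodeLocks) {
>>>         int last = inodeLocks.size() - 1;
>>>         inodeLocks.get(last).writeLock().unlock();
>>>         for (int i = last - 1; i >= 0; i--) {
>>>           inodeLocks.get(i).readLock().unlock();
>>>         }
>>>       }
>>>     }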
>>> Question 2: How much memory does the FGL consume?
>>>
>>> Memory consumption: a single LockResource contains a read-write lock and a counter, totaling approximately 200 bytes:
>>> - LockResource: 24 bytes
>>> - ReentrantReadWriteLock: 150 bytes
>>> - AtomicInteger: 16 bytes
>>>
>>> Memory usage estimates:
>>> - 10-level directory depth, 100 handlers: 1000 lock resources, approximately 200 KB
>>> - 10-level directory depth, 1000 handlers: 10000 lock resources, approximately 2 MB
>>> - 1,000,000 lock resources: approximately 200 MB
>>>
>>> Conclusion: memory consumption is negligible.
>>>
>>> Question 3: What happens if no lock is available in the LockPoolManager?
>>>
>>> If there are no available LockResources, two solutions are possible:
>>> - Return a RetryException, prompting the client to retry later.
>>> - Temporarily increase the lock entity limit, allocate more locks to meet client requests, and use an asynchronous thread to recycle locks periodically.
>>>
>>> We can provide multiple LockPoolManager implementations for users to choose from based on their production environments.
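>>> A minimal sketch of the second strategy (illustrative names, not the actual LockPoolManager API): a pool keyed by INode id that grows past its soft limit on demand, with a background thread recycling idle entries. Note that the real design keeps a per-lock reference counter (the AtomicInteger above), which makes recycling race-free; this sketch only approximates that check:
>>>
>>>     import java.util.concurrent.ConcurrentHashMap;
>>>     import java.util.concurrent.locks.ReentrantReadWriteLock;
>>>
>>>     class LockPoolSketch {
>>>       private final ConcurrentHashMap<Long, ReentrantReadWriteLock> pool = new ConcurrentHashMap<>();
>>>       private final int softLimit;
>>>
>>>       LockPoolSketch(int softLimit) { this.softLimit = softLimit; }
>>>
>>>       /** Never fails: grows past softLimit instead of throwing a RetryException. */
>>>       ReentrantReadWriteLock acquire(long inodeId) {
>>>         return pool.computeIfAbsent(inodeId, id -> new ReentrantReadWriteLock());
>>>       }
>>>
>>>       /** Called periodically by an async recycler thread. */
>>>       void recycleIdle() {
>>>         if (pool.size() <= softLimit) {
>>>           return;
>>>         }
>>>         pool.entrySet().removeIf(e -> {
>>>           ReentrantReadWriteLock l = e.getValue();
>>>           return l.getReadLockCount() == 0 && !l.isWriteLocked() && !l.hasQueuedThreads();
>>>         });
>>>       }
>>>     }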
>>> Question 4: Regarding the IIPLock depth issue, can we consider holding only the first 3 or 4 levels of directory locks?
>>>
>>> This approach is not recommended, for the following reasons:
>>> - It cannot maximize concurrency.
>>> - The savings in lock acquisition/release time and memory usage are limited, so the benefit is insignificant.
>>>
>>> Question 5: How should attributes like StoragePolicy, ErasureCoding, and ACL, which can be set on parent or ancestor directory nodes, be handled?
>>>
>>> ErasureCoding and ACL:
>>> - When changing a node's attributes, hold the corresponding INode's write lock.
>>> - When using an ancestor node's attributes, hold the corresponding INode's read lock.
>>>
>>> StoragePolicy:
>>> - More complex, due to its impact on both directory tree operations and block operations.
>>> - To improve performance, commonly used block-related operations (such as BR/IBR) should not acquire the IIPLock.
>>>
>>> Detailed design documentation: https://docs.google.com/document/d/1DXkiVxef9wCmICjpZyIQO-yxsgwc4wnf2lTKQ3UXe30/edit?tab=t.0#heading=h.96lztsl4mwfk
>>>
>>> Question 6: How should FGL be implemented for the SNAPSHOT feature?
>>>
>>> Since the Rename operation on a SNAPSHOT directory is supported, holding only the write lock of the SNAPSHOT root directory cannot cover the rename case, so the thread safety of SNAPSHOT-related operations cannot be guaranteed. It is recommended to use the global FS lock to ensure thread safety.
>>>
>>> Detailed design documentation: https://docs.google.com/document/d/1DXkiVxef9wCmICjpZyIQO-yxsgwc4wnf2lTKQ3UXe30/edit?tab=t.0#heading=h.sm36p6bfcpec
>>>
>>> Question 7: How should FGL be implemented for the Symlinks feature?
>>>
>>> The target path of a symlink is a string, and the client performs a second forward access to the target path, so the fine-grained lock project requires no special handling. For the createSymlink RPC, the FGL needs to acquire the IIPLocks for both the target and link paths.
>>>
>>> Question 8: How should FGL be implemented for the reserved-path feature?
>>>
>>> The reserved-path feature has two usage modes:
>>> - /.reserved/.inodes/${inode id}
>>> - /.reserved/raw/${path}
>>>
>>> INode-id mode: during the resolvePath phase, obtain the real IIPLock via the INode id.
>>> Path mode: during the resolvePath phase, obtain the real IIPLock via the path.
>>>
>>> Detailed design documentation: https://docs.google.com/document/d/1DXkiVxef9wCmICjpZyIQO-yxsgwc4wnf2lTKQ3UXe30/edit?tab=t.0#heading=h.h6rcpzkbpanf
>>>
>>> Question 9: Why is INodeFileLock used as the FGL for BlockInfo?
>>>
>>> INodeFile and Block have mutual dependencies:
>>> - INodeFile depends on Block for state and size.
>>> - Block depends on INodeFile for state and storage policy.
>>>
>>> Therefore, using INodeFileLock as the fine-grained lock for BlockInfo is a reasonable choice.
>>>
>>> Detailed design documentation: https://docs.google.com/document/d/1DXkiVxef9wCmICjpZyIQO-yxsgwc4wnf2lTKQ3UXe30/edit?tab=t.0#heading=h.zesd6omuu3kr
>>>
>>> Seeking Community Feedback
>>>
>>> Your questions and concerns are always welcome. We can discuss them in detail on the Slack channel: https://app.slack.com/client/T4S1WH2J3/C06UDTBQ2SH
>>>
>>> Let's work together to advance the Fine-Grained Lock project. I believe this initiative will deliver significant performance improvements to the HDFS community and help reinvigorate its activity.
>>>
>>> Wishing everyone a Happy New Year 2025!
>>>
>>> On Wed, 5 Jun 2024 at 16:17, ZanderXu <zande...@apache.org> wrote:
>>>
>>>> I plan to hold a meeting on 2024-06-06 from 3:00 PM to 4:00 PM to share the FGL's motivations and some concerns in detail, in Chinese.
>>>>
>>>> The doc is: NameNode Fine-Grained Locking Based On Directory Tree (II)
>>>>
>>>> The meeting URL is: https://sea.zoom.us/j/94168001269
>>>>
>>>> You are welcome to join this meeting.
>>>>
>>>> On Mon, 6 May 2024 at 23:57, Hui Fei <feihui.u...@gmail.com> wrote:
>>>>
>>>>> BTW, there is a Slack channel, hdfs-fgl, for this feature. You can join it and discuss more details there.
>>>>>
>>>>> Is it necessary to hold a meeting to discuss this, so that we can push it forward quickly? Agreed with ZanderXu: it seems inefficient to discuss details via the mailing list.
>>>>>
>>>>> On Mon, 6 May 2024 at 23:50, Hui Fei <feihui.u...@gmail.com> wrote:
>>>>>
>>>>>> Thanks all.
>>>>>>
>>>>>> It seems all the concerns are related to stage 2. We can address them and make things clearer before we start it.
>>>>>>
>>>>>> From development experience, I think it is reasonable to split a big feature into several stages. Stage 1 is also independent, and it can stand on its own as a minor feature that uses the FS and BM locks instead of the global lock.
>>>>>>
>>>>>> On Mon, 29 Apr 2024 at 15:17, ZanderXu <zande...@apache.org> wrote:
>>>>>>
>>>>>>> Thanks @Ayush Saxena <ayush...@gmail.com> and @Xiaoqiao He <hexiaoq...@apache.org> for your nice questions.
>>>>>>>
>>>>>>> Let me summarize your concerns and the corresponding solutions:
>>>>>>>
>>>>>>> *1. Questions about the Snapshot feature*
>>>>>>> It's difficult to apply the FGL to the Snapshot feature, but we can simply use the global FS write lock to make it thread-safe. So if we can identify whether a path involves the snapshot feature, we can use the global FS write lock to protect it. You can refer to HDFS-17479 <https://issues.apache.org/jira/browse/HDFS-17479> for how to identify it.
>>>>>>>
>>>>>>> Regarding the performance of operations related to the snapshot feature, we can discuss it in two categories:
>>>>>>>
>>>>>>> Read operations involving snapshots: the FGL branch uses the global write lock to protect them, while the GLOBAL branch uses the global read lock. It's hard to say which version performs better; it depends on the contention for the global lock.
>>>>>>>
>>>>>>> Write operations involving snapshots: both the FGL and GLOBAL branches use the global write lock to protect them. Again, it's hard to say which version performs better; it depends on the global lock contention too.
>>>>>>>
>>>>>>> So I think that if the NameNode load is low, the GLOBAL branch will perform better than FGL; if the NameNode load is high, the FGL branch may perform better than GLOBAL. It also depends on the ratio of read to write operations on the SNAPSHOT feature.
>>>>>>>
>>>>>>> We can do a few things to let end users choose the branch that better suits their business: first, make the lock mode selectable, so that end users can choose FGL or GLOBAL; second, use the global write lock to make snapshot-related operations thread-safe, as I described in HDFS-17479.
>>>>>>>
>>>>>>> *2. Questions about the Symlinks feature*
>>>>>>> If a symlink is related to a snapshot, we can follow the snapshot solution; if not, I think it fits the FGL easily. Only createSymlink involves two paths, and the FGL just needs to lock them in order to make the operation thread-safe. Other operations treat a symlink the same as any other normal INode, right? If I missed any difficult points, please let me know.
>>>>>>>
>>>>>>> *3. Questions about the memory usage of INode locks*
>>>>>>> There are many ways to limit the memory usage of these INode locks, such as using a capacity-limited lock pool to bound the maximum memory usage, or holding INode locks only for a fixed depth of directories. We can abstract a LockManager first and then provide implementations based on different ideas, so that we can cap the maximum memory usage of these INode locks. The FGL can acquire and release INode locks through the LockManager.
>>>>>>>
>>>>>>> *4. Questions about the performance of acquiring and releasing INode locks*
>>>>>>> We can add some benchmarks for the LockManager to test the performance of acquiring and releasing uncontended locks.
>>>>>>>
>>>>>>> *5. Questions about StoragePolicy, ECPolicy, ACL, Quota, etc.*
>>>>>>> These policies may be set on an ancestor node and used by child files. The set operations for these policies are protected by the directory tree, since they are all file-related operations. Apart from Quota and StoragePolicy, the use of the other policies is also protected by the directory tree, for example ECPolicy and ACL.
>>>>>>>
>>>>>>> Quota is a little special, since its update operations may not be protected by the directory tree. We can assign a lock to each QuotaFeature and use these locks to make update operations thread-safe. You can refer to HDFS-17473 <https://issues.apache.org/jira/browse/HDFS-17473> for some detailed information.
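>>>>>>> A minimal sketch of that idea (illustrative names, not the HDFS-17473 patch): each QuotaFeature owns a small lock that serializes usage updates, so a create or delete running under a read-locked ancestor can still update the quota counts safely.
>>>>>>>
>>>>>>>     import java.util.concurrent.locks.ReentrantLock;
>>>>>>>
>>>>>>>     class QuotaFeatureSketch {
>>>>>>>       private final ReentrantLock lock = new ReentrantLock();
>>>>>>>       private long namespaceUsed;
>>>>>>>       private long storagespaceUsed;
>>>>>>>
>>>>>>>       /** Thread-safe usage update without the directory tree write lock. */
>>>>>>>       void addUsage(long nsDelta, long ssDelta) {
>>>>>>>         lock.lock();
>>>>>>>         try {
>>>>>>>           namespaceUsed += nsDelta;
>>>>>>>           storagespaceUsed += ssDelta;
>>>>>>>         } finally {
>>>>>>>           lock.unlock();
>>>>>>>         }
>>>>>>>       }
>>>>>>>     }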
>>>>>>> StoragePolicy is a little special, since it is used not only by file-related operations but also by block-related operations: processExtraRedundancyBlock uses the storage policy to choose redundant replicas, and BlockReconstructionWork uses the storage policy to choose target DNs. To maximize the performance improvement, BR and IBR should only involve the INodeFile to which the block currently being processed belongs. These redundant blocks can be processed by the Redundancy monitor while holding the directory tree locks. You can refer to HDFS-17505 <https://issues.apache.org/jira/browse/HDFS-17505> for more detailed information.
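>>>>>>> To illustrate the intent (a sketch with hypothetical names, not the HDFS-17505 code): an IBR takes only the lock of the file that owns the reported block, and defers policy-dependent redundancy work to the Redundancy monitor.
>>>>>>>
>>>>>>>     import java.util.Queue;
>>>>>>>     import java.util.concurrent.locks.ReentrantReadWriteLock;
>>>>>>>
>>>>>>>     class IbrSketch {
>>>>>>>       interface ReportedBlock {
>>>>>>>         ReentrantReadWriteLock owningFileLock(); // lock of the owning INodeFile
>>>>>>>         void updateReplicaState();               // record the DN's replica state/size
>>>>>>>         boolean hasExtraRedundancy();
>>>>>>>       }
>>>>>>>
>>>>>>>       static void processIncrementalBlockReport(ReportedBlock block,
>>>>>>>           Queue<ReportedBlock> redundancyQueue) {
>>>>>>>         ReentrantReadWriteLock fileLock = block.owningFileLock();
>>>>>>>         fileLock.writeLock().lock();  // no IIPLock: only the owning file's lock
>>>>>>>         try {
>>>>>>>           block.updateReplicaState();
>>>>>>>           if (block.hasExtraRedundancy()) {
>>>>>>>             // Defer policy-driven work; the Redundancy monitor handles it
>>>>>>>             // later while holding the directory tree locks it needs.
>>>>>>>             redundancyQueue.add(block);
>>>>>>>           }
>>>>>>>         } finally {
>>>>>>>           fileLock.writeLock().unlock();
>>>>>>>         }
>>>>>>>       }
>>>>>>>     }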
>>>>>>> *6. Performance of phase 1*
>>>>>>> HDFS-17506 <https://issues.apache.org/jira/browse/HDFS-17506> is used for performance testing of phase 1, and I will complete it later.
>>>>>>>
>>>>>>> Discussing solutions through mail is not efficient; you can create sub-tasks under HDFS-17366 <https://issues.apache.org/jira/browse/HDFS-17366> to describe your concerns, and I will try to give some answers.
>>>>>>>
>>>>>>> Thanks @Ayush Saxena <ayush...@gmail.com> and @Xiaoqiao He <hexiaoq...@apache.org> again.
>>>>>>>
>>>>>>> On Mon, 29 Apr 2024 at 02:00, Ayush Saxena <ayush...@gmail.com> wrote:
>>>>>>>
>>>>>>>> Thanks everyone for chasing this. Great to see some momentum around FGL; it should be a great improvement.
>>>>>>>>
>>>>>>>> I have comments in two broad categories.
>>>>>>>>
>>>>>>>> *About the process:*
>>>>>>>> The mails above mention that phase one is complete in a feature branch and is going to be merged to trunk. If I am reading that right, you can't just hit the merge button. To merge a feature branch, you need to call a vote specific to that branch, and it requires 3 binding votes, unlike any other code change, which requires 1. It is in our bylaws. So, do follow the process.
>>>>>>>>
>>>>>>>> *About the feature itself:* (a very quick look at the doc and the Jira, so please take it with a grain of salt)
>>>>>>>> * The Google Drive link you folks shared in the first mail: I don't have access to it. Please open up the permissions for that doc or share a new link.
>>>>>>>> * From the design doc present on the Jira:
>>>>>>>> * I think we only have Phase-1 ready, so can you share some metrics just for that? Perf improvements just from splitting the FS & BM locks.
>>>>>>>> * What are the memory implications of Phase-1? I don't think there should be any major impact on memory with just phase-1.
>>>>>>>> * Regarding the snapshot stuff, you mentioned taking a lock on the root itself? Would taking a lock on just the snapshot root, rather than the FS root, work?
>>>>>>>> * Secondly, about the usage of Snapshots or Symlinks: we shouldn't operate under assumptions about whether they are widely used; we might just not know the folks who use them widely, or they may be users rather than contributors. We can accept for now that those cases aren't optimised and we just lock the entire FS space, which is what happens even today, so there is no regression there.
>>>>>>>> * Regarding memory usage: do you have some numbers on how much the memory footprint increases?
>>>>>>>> * On the lock pool: I think you are assuming that locks will be required for very few inodes at any given time, so there won't be too much heap consumption? I think you are compromising horizontal scalability here. If that assumption doesn't hold, then under heavy read load from concurrent clients accessing different inodes, the NameNode will start having memory trouble, which would do more harm than good. The NameNode heap is a far bigger problem than anything else, so we should be very careful about increasing the load on it.
>>>>>>>> * For the locks on the inodes: do you plan to have locks for each inode? Can we somehow limit that by the depth of the tree? Like currently we take a lock on the root; have a config which makes us take locks at level 2 or 3 (configurable). That might fetch some perf benefits and could be used to control memory usage as well.
>>>>>>>> * What is the cost of creating these inode locks? If a lock isn't already cached, it would incur some cost. Do you have some numbers around that? Say I disable caching altogether and then run a test load; what do the perf numbers look like in that case?
>>>>>>>> * I think we need to limit the size of the INode lock pool; we can't let it grow infinitely under heavy load, and we need some auto-throttling mechanism for it.
>>>>>>>> * I didn't catch your StoragePolicy problem. If I decode it right, the problem is that the policy can be set on an ancestor node and the children abide by it. If that is the case, isn't that also the case with ErasureCoding policies, or even ACLs and so on? Can you elaborate a bit on that?
>>>>>>>>
>>>>>>>> Anyway, regarding Phase-1: if you share the perf numbers with proper details, plus the impact on memory (if any) for just phase 1, and they look good, then if you call a branch merge vote for Phase-1 FGL you have my vote; however, you'll need to sway the rest of the folks on your own :-)
>>>>>>>>
>>>>>>>> Good Luck, Nice Work Guys!!!
>>>>>>>>
>>>>>>>> -Ayush
>>>>>>>> On Sun, 28 Apr 2024 at 18:32, Xiaoqiao He <hexiaoq...@apache.org> wrote:
>>>>>>>>
>>>>>>>>> Thanks ZanderXu and Hui Fei for your work on this feature. It will be a very helpful improvement for the HDFS module going forward.
>>>>>>>>>
>>>>>>>>> 1. If you need more review bandwidth, I would like to get involved and help review if possible.
>>>>>>>>> 2. The design document is still missing some detailed descriptions, such as snapshots, symbolic links, and reserved paths, as mentioned above. It will be helpful for newcomers who want to get involved if all corner cases are considered and described.
>>>>>>>>> 3. From Slack, we plan to check into trunk at this phase. I am not sure this is the proper time: following the dev plan, there are two steps left to finish this feature per the design document, right? If so, I think we should postpone checking in until all phases are ready. Considering that there have been many unfinished attempts at this feature in history, I think postponing the check-in is the safe way. The alternative involves more rebase cost if you keep a separate dev branch; however, I think that is not a difficult thing for you.
>>>>>>>>>
>>>>>>>>> Good luck, and I look forward to making that happen soon!
>>>>>>>>>
>>>>>>>>> Best Regards,
>>>>>>>>> - He Xiaoqiao
>>>>>>>>>
>>>>>>>>> On Fri, Apr 26, 2024 at 3:50 PM Hui Fei <feihui.u...@gmail.com> wrote:
>>>>>>>>>
>>>>>>>>>> Thanks for the interest in and advice on this.
>>>>>>>>>>
>>>>>>>>>> I would just like to share some info here.
>>>>>>>>>>
>>>>>>>>>> ZanderXu leads this feature and has spent a lot of time on it. He is the main developer in stage 1. Yuanboliu and Kokonguyen191 also took on some tasks. Other developers (slfan1989, haiyang1987, huangzhaobo99, RocMarshal, kokonguyen191) helped review PRs. (Forgive me if I missed someone.)
>>>>>>>>>>
>>>>>>>>>> Actually, haiyang1987, Yuanboliu and Kokonguyen191 are also very familiar with this feature. We discussed many details offline.
>>>>>>>>>>
>>>>>>>>>> Anyone interested in joining the development and review of stages 2 and 3 is welcome.
>>>>>>>>>>
>>>>>>>>>> On Fri, 26 Apr 2024 at 14:56, Zengqiang XU <xuzengqiang5...@gmail.com> wrote:
>>>>>>>>>>
>>>>>>>>>>> Thanks Shilun for your response:
>>>>>>>>>>>
>>>>>>>>>>> 1. This is a big and very useful feature, so it really needs more developers to get on board.
>>>>>>>>>>> 2. This fine-grained lock has been implemented on internal branches and has brought benefits to many companies, such as Meituan, Kuaishou, and Bytedance. It has not been contributed to the community for various reasons: there is a big difference between the internal branches and the community trunk branch, the internal branches may omit some functionality to keep the FGL simple, and the contribution needs a lot of work and would take a long time. This means the solution has already been proven in their production environments. We have also run it in our production environment and gained benefits, and we are willing to spend a lot of time contributing it to the community.
>>>>>>>>>>> 3. Regarding benchmark testing, we don't need to focus on whether the performance improves by 5x, 10x, or 20x, because too many factors affect it.
>>>>>>>>>>> 4. As I described above, this solution is already used by many companies. Right now, we just need to think about how to implement it with high quality and more comprehensively.
>>>>>>>>>>> 5. I firmly believe that all problems can be solved as long as the overall solution is right.
>>>>>>>>>>> 6. I can spend a lot of time leading the promotion of this entire feature, and I hope more people can join us in promoting it.
>>>>>>>>>>> 7. You are always welcome to raise your concerns.
>>>>>>>>>>>
>>>>>>>>>>> Thanks again, Shilun. I hope you can help review the designs and PRs.
>>>>>>>>>>>
>>>>>>>>>>> On Fri, 26 Apr 2024 at 08:00, slfan1989 <slfan1...@apache.org> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Thank you for your hard work! This is a very meaningful improvement, and from the design document we can see a significant increase in HDFS read/write throughput.
>>>>>>>>>>>>
>>>>>>>>>>>> I am happy to see the progress made on HDFS-17384.
>>>>>>>>>>>>
>>>>>>>>>>>> However, I still have some concerns, roughly in the following areas:
>>>>>>>>>>>>
>>>>>>>>>>>> 1. While ZanderXu and Hui Fei have deep expertise in HDFS and are familiar with the related development details, we still need more community members to review the code to ensure the relevant upgrades meet expectations.
>>>>>>>>>>>>
>>>>>>>>>>>> 2. We need more details on the benchmarks, to ensure the test results can be reproduced and to allow more community members to participate in the testing process.
>>>>>>>>>>>>
>>>>>>>>>>>> Looking forward to everything going smoothly.
>>>>>>>>>>>>
>>>>>>>>>>>> Best Regards,
>>>>>>>>>>>> - Shilun Fan.
>>>>>>>>>>>>
>>>>>>>>>>>> On Wed, Apr 24, 2024 at 3:51 PM Xiaoqiao He <hexiaoq...@apache.org> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> cc private@h.a.o.
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Wed, Apr 24, 2024 at 3:35 PM ZanderXu <zande...@apache.org> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Here are some summaries of the first phase:
>>>>>>>>>>>>>> 1. There are no big changes in this phase.
>>>>>>>>>>>>>> 2. This phase just uses an FS lock and a BM lock to replace the original global lock.
>>>>>>>>>>>>>> 3. It is useful for improving performance, since some operations only need to hold the FS lock or the BM lock instead of the global lock.
>>>>>>>>>>>>>> 4. This feature is turned off by default; you can enable it by setting dfs.namenode.lock.model.provider.class to org.apache.hadoop.hdfs.server.namenode.fgl.FineGrainedFSNamesystemLock.
>>>>>>>>>>>>>> 5. This phase is very important for the ongoing development of the entire FGL.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Here I would like to express my special thanks to @kokonguyen191 and @yuanboliu for their contributions. You are also welcome to join us and complete it together.
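>>>>>>>>>>>>>> As a minimal sketch of that split (illustrative names; the real class is FineGrainedFSNamesystemLock), the single global lock becomes a pair of read-write locks, and operations that still need global semantics take both in a fixed order to avoid deadlock:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>     import java.util.concurrent.locks.ReentrantReadWriteLock;
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>     class FsBmLockSketch {
>>>>>>>>>>>>>>       private final ReentrantReadWriteLock fsLock = new ReentrantReadWriteLock(true);
>>>>>>>>>>>>>>       private final ReentrantReadWriteLock bmLock = new ReentrantReadWriteLock(true);
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>       // FS-only or BM-only operations touch just one lock ...
>>>>>>>>>>>>>>       void fsReadLock()    { fsLock.readLock().lock(); }
>>>>>>>>>>>>>>       void fsReadUnlock()  { fsLock.readLock().unlock(); }
>>>>>>>>>>>>>>       void bmWriteLock()   { bmLock.writeLock().lock(); }
>>>>>>>>>>>>>>       void bmWriteUnlock() { bmLock.writeLock().unlock(); }
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>       // ... while operations such as create/addBlock or BR/IBR stay
>>>>>>>>>>>>>>       // effectively global in phase 1: both locks, in a fixed order.
>>>>>>>>>>>>>>       void globalWriteLock() {
>>>>>>>>>>>>>>         fsLock.writeLock().lock();
>>>>>>>>>>>>>>         bmLock.writeLock().lock();
>>>>>>>>>>>>>>       }
>>>>>>>>>>>>>>       void globalWriteUnlock() {
>>>>>>>>>>>>>>         bmLock.writeLock().unlock();
>>>>>>>>>>>>>>         fsLock.writeLock().unlock();
>>>>>>>>>>>>>>       }
>>>>>>>>>>>>>>     }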
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Wed, 24 Apr 2024 at 14:54, ZanderXu <zande...@apache.org> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Hi everyone,
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> All subtasks of the first phase of the FGL have been completed, and I plan to merge them into trunk and start the second phase based on trunk.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Here is the PR used to merge the first phase into trunk: https://github.com/apache/hadoop/pull/6762
>>>>>>>>>>>>>>> Here is the ticket: https://issues.apache.org/jira/browse/HDFS-17384
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I hope you can help review this PR when you are available and share some ideas.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> HDFS-17385 <https://issues.apache.org/jira/browse/HDFS-17385> is used for the second phase, and I have created some subtasks to describe solutions for problems such as snapshots, getListing, and quota. You are welcome to join us and complete it together.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> ---------- Forwarded message ---------
>>>>>>>>>>>>>>> From: Zengqiang XU <zande...@apache.org>
>>>>>>>>>>>>>>> Date: Fri, 2 Feb 2024 at 11:07
>>>>>>>>>>>>>>> Subject: Discussion about NameNode Fine-grained locking
>>>>>>>>>>>>>>> To: <hdfs-dev@hadoop.apache.org>
>>>>>>>>>>>>>>> Cc: Zengqiang XU <xuzengqiang5...@gmail.com>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Hi everyone,
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I have started a discussion about NameNode fine-grained locking to improve the performance of write operations in the NameNode.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I started this discussion again for several main reasons:
>>>>>>>>>>>>>>> 1. We have implemented it and gained nearly 7x performance improvement in our production environment.
>>>>>>>>>>>>>>> 2. Many other companies have made similar improvements based on their internal branches.
>>>>>>>>>>>>>>> 3. This topic has been discussed for a long time, but still without any results.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I hope we can push this important improvement forward in the community, so that all end users can enjoy this significant improvement.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I'd really appreciate it if you could join in and work with me to push this feature forward.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Thanks very much.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Ticket: HDFS-17366 <https://issues.apache.org/jira/browse/HDFS-17366>
>>>>>>>>>>>>>>> Design: NameNode Fine-grained locking based on directory tree <https://docs.google.com/document/d/1X499gHxT0WSU1fj8uo4RuF3GqKxWkWXznXx4tspTBLY/edit?usp=sharing>