Re: Discussion about NameNode Fine-grained locking

Xiaoqiao He Sun, 05 Jan 2025 20:00:53 -0800

Thanks all for your great work and the big step forward here.

A few minor comments before check-in:
a. Please check whether the failed unit tests are related to this PR at
https://github.com/apache/hadoop/pull/6762.
It would be better to rerun them and get a green result before check-in.
b. Please set the correct `Fix Version`, `Component/s`, `Labels` and
`Flags` fields for the subtask in JIRA.
Some examples are [1][2].
Good luck!


Best Regards,
- He Xiaoqiao

[1] https://issues.apache.org/jira/browse/HDFS-13891
[2] https://issues.apache.org/jira/browse/HDFS-17531


On Thu, Jan 2, 2025 at 10:54 AM Zhanghaobo <[email protected]> wrote:

> Thanks for your great work! +1 for merging the phase 1 code.
>
> My production clusters have been running the phase 1 code for several months, and it
> looks good.
>
> I hope we can push this feature forward.
>
>
>
> 张浩博
> [email protected]
>
> ---- Replied Message ----
> From: haiyang hu <[email protected]>
> Date: 12/31/2024 23:08
> To: Ayush Saxena <[email protected]>
> Cc: Hui Fei <[email protected]>, ZanderXu <[email protected]>,
> Hdfs-dev <[email protected]>, <[email protected]>,
> Xiaoqiao He <[email protected]>, slfan1989 <[email protected]>,
> <[email protected]>
> Subject: Re: Discussion about NameNode Fine-grained locking
>
> Thanks for your hard work in pushing this forward.
> It looks good. +1 for merging the phase 1 code. I hope we can work together to
> promote this major HDFS optimization
> so that more companies can benefit from it.
>
> Thanks everyone~
>
> On Tue, 31 Dec 2024 at 20:33, Ayush Saxena <[email protected]> wrote:
>
> +1,
> Thanx folks for your efforts on this! I didn't have time to review
> everything thoroughly, but my initial pass suggests it looks good, or
> at least is safe to merge.
> If I find some spare time, I'll test it further and submit a ticket or
> so if I encounter any issues.
>
> Good Luck!!!
>
> -Ayush
>
> On Tue, 31 Dec 2024 at 16:39, Hui Fei <[email protected]> wrote:
>
>
> Thanks, Zander, for raising this discussion again and doing your best to
> push it forward. It has been a long time since the last discussion.
>
> It is indeed time. +1 for merging the phase 1 code, based on the following
> points:
>
> - The phase 1 feature has been running at scale within companies for a
>   long time.
> - The long-term plan is clear and addresses some of the questions raised
>   by the community.
> - The test results on memory and performance have been provided.
>
> On Tue, 31 Dec 2024 at 15:36, ZanderXu <[email protected]> wrote:
>
>
> Hi, everyone:
>
> Time to Merge FGL Phase I
>
> The PR for FGL Phase I is ready for merging! Please take a moment to
> review and cast your vote: https://github.com/apache/hadoop/pull/6762.
>
> FGL Phase I has been running successfully in production for over
> six months at Shopee and BOSS Zhipin, with no reported performance or
> stability issues. It’s now the right time to merge it into the trunk
> branch, allowing us to move forward with Phase II.
>
>
> The global lock remains the default lock mode, but users can enable FGL
> by configuring
> dfs.namenode.lock.model.provider.class=org.apache.hadoop.hdfs.server.namenode.fgl.FineGrainedFSNamesystemLock.
>
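As a rough illustration of the setting above, here is a small Python sketch that renders the property in the usual Hadoop `<property>` layout. The property name and class name are quoted from the message; placing the result in hdfs-site.xml is an assumption based on where NameNode settings normally live, and `render_property` is a hypothetical helper, not part of Hadoop.

```python
from xml.sax.saxutils import escape

# Both strings are copied from the message above.
FGL_KEY = "dfs.namenode.lock.model.provider.class"
FGL_CLASS = (
    "org.apache.hadoop.hdfs.server.namenode.fgl."
    "FineGrainedFSNamesystemLock"
)


def render_property(name: str, value: str) -> str:
    """Render one <property> element in Hadoop's *-site.xml layout."""
    return (
        "<property>\n"
        f"  <name>{escape(name)}</name>\n"
        f"  <value>{escape(value)}</value>\n"
        "</property>"
    )


# Assumption: this fragment would go inside <configuration> in hdfs-site.xml.
fgl_property_xml = render_property(FGL_KEY, FGL_CLASS)
print(fgl_property_xml)
```

Leaving the property unset keeps the global lock, which the message says remains the default.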
>
> If there are no objections within 7 days, I will propose an official
> vote.
>
>
> Performance and Memory Usage of Phase I
>
> Conclusion:
>
> - Fine-grained locks do not lead to significant performance improvements.
> - Fine-grained locks do not result in additional memory consumption.
>
> Reasons:
>
> - BM operations heavily depend on FS operations: IBR and BR still acquire
>   the global lock (FSLock and BMLock).
> - FS operations depend on BM operations: Common operations (create,
>   addBlock, getBlockLocations) also acquire the global lock (FSLock and
>   BMLock).
>
> Phase II will bring significant performance improvements by decoupling
> FS and BM dependencies and replacing the global FSLock with a fine-grained
> IIPLock.
>
>
> Addressing Common Questions
>
> Thank you all for raising meaningful questions!
>
> I have rewritten the design document to improve clarity.
>
>
> https://docs.google.com/document/d/1DXkiVxef9wCmICjpZyIQO-yxsgwc4wnf2lTKQ3UXe30/edit?usp=sharing
>
>
> Below is a summary of frequently asked questions and answers:
>
> Summary of Questions:
>
> Question 1: How is the performance of LockPoolManager?
>
> Performance Report:
>
> Time to acquire a cached lock: 194 ns
>
> Time to acquire a non-cached lock: 1044 ns
>
> Time to release an in-use lock: 88 ns
>
> Time to release an unused lock: 112 ns
>
> Overall Performance:
>
> QPS: Over 10 million
>
> Time to acquire the IIP lock for a path with depth 10:
>
> Fully uncached: 10440 ns + 1120 ns (≈ 11 μs)
>
> Fully cached: 1940 ns + 1120 ns (≈ 3 μs)
>
> In global lock scenarios, lock wait times are typically in the
> millisecond range. Therefore, the cost of acquiring and releasing
> fine-grained locks can be ignored.
>
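The depth-10 figures in the report can be reproduced with simple arithmetic. The sketch below assumes one lock per path level, and it assumes the 1120 ns term is ten releases at the 112 ns "unused lock" figure; the message does not spell out that breakdown, so treat it as an interpretation.

```python
# Per-lock timings in nanoseconds, copied from the performance report above.
ACQUIRE_CACHED_NS = 194
ACQUIRE_UNCACHED_NS = 1044
RELEASE_UNUSED_NS = 112  # assumption: the release figure behind the 1120 ns term


def path_lock_cost_ns(depth: int, acquire_ns: int, release_ns: int) -> int:
    """Cost of acquiring and then releasing one lock per path level."""
    return depth * acquire_ns + depth * release_ns


uncached_ns = path_lock_cost_ns(10, ACQUIRE_UNCACHED_NS, RELEASE_UNUSED_NS)
cached_ns = path_lock_cost_ns(10, ACQUIRE_CACHED_NS, RELEASE_UNUSED_NS)

print(f"fully uncached: {uncached_ns} ns ({uncached_ns / 1000:.2f} us)")
print(f"fully cached:   {cached_ns} ns ({cached_ns / 1000:.2f} us)")
```

Both totals stay in the low microseconds, which is the basis for comparing them against millisecond-range waits on the global lock.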
>
> Question 2: How much memory does the FGL consume?
>
> Memory Consumption:
>
> A single LockResource contains a read-write lock and a counter,
> totaling approximately 200 bytes:
>
> - LockResource: 24 bytes
> - ReentrantReadWriteLock: 150 bytes
> - AtomicInteger: 16 bytes
>
> Memory Usage Estimates:
>
> - 10-level directory depth, 100 handlers:
>   1,000 lock resources, approximately 200 KB
> - 10-level directory depth, 1000 handlers:
>   10,000 lock resources, approximately 2 MB
> - 1,000,000 lock resources: approximately 200 MB
>
>
> Conclusion: Memory consumption is negligible.
>
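The estimates above follow from multiplying depth by handler count by the per-lock size. A minimal sketch, assuming the worst case where every handler holds one distinct lock per path level at the same time, and using the rounded 200-byte figure from the message (the three listed components sum to 190 bytes):

```python
# Component sizes in bytes, copied from the message above.
LOCK_RESOURCE_BYTES = 24
RW_LOCK_BYTES = 150
ATOMIC_INT_BYTES = 16
ROUNDED_PER_LOCK_BYTES = 200  # the message rounds 190 up to roughly 200

component_sum = LOCK_RESOURCE_BYTES + RW_LOCK_BYTES + ATOMIC_INT_BYTES


def lock_pool_bytes(depth: int, handlers: int,
                    per_lock_bytes: int = ROUNDED_PER_LOCK_BYTES) -> int:
    """Worst-case bytes if every handler holds `depth` distinct locks."""
    return depth * handlers * per_lock_bytes


small = lock_pool_bytes(depth=10, handlers=100)     # 1,000 locks
large = lock_pool_bytes(depth=10, handlers=1000)    # 10,000 locks
million = 1_000_000 * ROUNDED_PER_LOCK_BYTES        # 1,000,000 locks

print(component_sum, small, large, million)
```

In practice handlers working under the same ancestors would share lock entries, so the real number of live lock resources should be lower than this bound.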
> Question 3: What happens if no lock is available in the LockPoolManager?
>
> If no LockResource is available, there are two options:
>
> - Return a RetryException, prompting the client to retry later.
> - Temporarily increase the lock entity limit, allocate more locks to meet
>   client requests, and use an asynchronous thread to recycle locks
>   periodically.
>
> We can provide multiple LockPoolManager implementations for users to
> choose from based on their production environments.
>
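To make the first option concrete, here is a minimal Python sketch of a capacity-limited, reference-counted lock pool that rejects requests when it is full. It is not the LockPoolManager from the PR: `BoundedLockPool` and `RetryLater` are hypothetical names, and a plain re-entrant mutex stands in for the read-write lock because the Python standard library has none.

```python
import threading


class RetryLater(Exception):
    """Hypothetical stand-in for the RetryException mentioned above."""


class BoundedLockPool:
    """Reference-counted locks keyed by inode id, capped at `capacity`."""

    def __init__(self, capacity: int) -> None:
        self._capacity = capacity
        self._guard = threading.Lock()
        self._entries = {}  # key -> [lock, reference count]

    def acquire(self, key):
        with self._guard:
            entry = self._entries.get(key)
            if entry is None:
                if len(self._entries) >= self._capacity:
                    # Option 1 from the message: ask the caller to retry.
                    raise RetryLater(f"lock pool full ({self._capacity})")
                entry = [threading.RLock(), 0]
                self._entries[key] = entry
            entry[1] += 1
            lock = entry[0]
        lock.acquire()  # block outside the guard so other keys stay usable
        return lock

    def release(self, key) -> None:
        with self._guard:
            entry = self._entries[key]
            entry[0].release()
            entry[1] -= 1
            if entry[1] == 0:
                del self._entries[key]  # recycle the slot right away

    def size(self) -> int:
        with self._guard:
            return len(self._entries)


demo = BoundedLockPool(capacity=1)
demo.acquire("inode-1")
demo.release("inode-1")
print(demo.size())
```

The second option in the message would replace the `raise` with a temporary over-allocation and move the recycling into a background thread; this sketch recycles eagerly on release instead.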
>
> Question 4: Regarding the IIPLock lock depth issue, can we consider
> holding only the first 3 or 4 levels of directory locks?
>
> This approach is not recommended for the following reasons:
>
> - Cannot maximize concurrency.
> - Limited savings in lock acquisition/release time and memory usage,
>   yielding insignificant benefits.
>
>
> Question 5: How should attributes like StoragePolicy, ErasureCoding,
> and ACL, which can be set on parent or ancestor directory nodes, be
> handled?
>
> ErasureCoding and ACL:
>
> - When changing node attributes, hold the corresponding INode’s write
>   lock.
> - When using ancestor node attributes, hold the corresponding INode’s
>   read lock.
>
> StoragePolicy:
>
> - More complex due to its impact on both directory tree operations and
>   Block operations.
> - To improve performance, commonly used block-related operations (such as
>   BR/IBR) should not acquire IIPLock.
>
> Detailed design documentation:
>
>
> https://docs.google.com/document/d/1DXkiVxef9wCmICjpZyIQO-yxsgwc4wnf2lTKQ3UXe30/edit?tab=t.0#heading=h.96lztsl4mwfk
>
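The read/write rule for ErasureCoding and ACL can be pictured as a per-path lock plan. The sketch below assumes read locks on every ancestor and a write lock only on the final inode when its attributes are being changed; that root-to-leaf layout is a common hierarchical-locking pattern and an assumption here, not something confirmed by the design document. `iip_lock_plan` is a hypothetical helper.

```python
def iip_lock_plan(path: str, write_last: bool) -> list:
    """Return (inode path, mode) pairs from the root down to `path`."""
    nodes = ["/"]
    current = ""
    for component in (c for c in path.split("/") if c):
        current += "/" + component
        nodes.append(current)

    plan = []
    for index, node in enumerate(nodes):
        is_last = index == len(nodes) - 1
        # Ancestors are only read; the target is written when attributes change.
        mode = "WRITE" if (is_last and write_last) else "READ"
        plan.append((node, mode))
    return plan


# Changing an ACL on /a/b/c: read locks on ancestors, write lock on the target.
print(iip_lock_plan("/a/b/c", write_last=True))
# Using an inherited attribute while reading /a/b/c: read locks only.
print(iip_lock_plan("/a/b/c", write_last=False))
```

Under this plan, a setter running on an ancestor directory would need that directory's write lock, so it cannot proceed while a reader below it still holds the read lock.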
>
> Question 6: How should FGL be implemented for the SNAPSHOT feature?
>
> Since the Rename operation on the SNAPSHOT directory is supported,
> holding only the write lock of the SNAPSHOT root directory cannot cover the
> rename situation, so the thread safety of SNAPSHOT-related operations
> cannot be guaranteed.
>
> It is recommended to use the global FS lock to ensure thread safety.
>
> Detailed design documentation:
>
>
> https://docs.google.com/document/d/1DXkiVxef9wCmICjpZyIQO-yxsgwc4wnf2lTKQ3UXe30/edit?tab=t.0#heading=h.sm36p6bfcpec
>
>
> Question 7: How should FGL be implemented for the Symlinks feature?
>
> The target path of a symlink is a string, and the client performs a
> second access to the target path itself. So symlink resolution
> requires no special handling in the fine-grained lock project.
>
> For the createSymlink RPC, the FGL needs to acquire the IIPLocks for
> both the target and link paths.
>
>
> Question 8: How should FGL be implemented for the reserved feature?
>
> The Reserved feature has two usage modes:
>
> /.reserved/iNodes/${inode id}
>
> /.reserved/raw/${path}
>
> INodeId Mode: During the resolvePath phase, obtain the real IIPLock
> via the INodeId.
>
> Path Mode: During the resolvePath phase, obtain the real IIPLock
> via the path.
>
>
> Detailed design documentation:
>
>
> https://docs.google.com/document/d/1DXkiVxef9wCmICjpZyIQO-yxsgwc4wnf2lTKQ3UXe30/edit?tab=t.0#heading=h.h6rcpzkbpanf
>
>
> Question 9: Why is INodeFileLock used as the FGL for BlockInfo?
>
> INodeFile and Block have mutual dependencies:
>
> INodeFile depends on Block for state and size.
>
> Block depends on INodeFile for state and storage policy.
>
> Therefore, using INodeFileLock as the fine-grained lock for BlockInfo
> is a reasonable choice.
>
>
> Detailed design documentation:
>
>
> https://docs.google.com/document/d/1DXkiVxef9wCmICjpZyIQO-yxsgwc4wnf2lTKQ3UXe30/edit?tab=t.0#heading=h.zesd6omuu3kr
>
>
> Seeking Community Feedback
>
> Your questions and concerns are always welcome.
>
> We can discuss them in detail on the Slack Channel:
>
> https://app.slack.com/client/T4S1WH2J3/C06UDTBQ2SH
>
>
> Let’s work together to advance the Fine-Grained Lock project. I believe
> this initiative will deliver significant performance improvements to the
> HDFS community and help reinvigorate its activity.
>
>
> Wishing everyone a Happy New Year 2025!
>
>
> On Wed, 5 Jun 2024 at 16:17, ZanderXu <[email protected]> wrote:
>
>
> I plan to hold a meeting on 2024-06-06 from 3:00 PM - 4:00 PM to share
> the FGL's motivations and some concerns in detail, in Chinese.
>
> The doc is: NameNode Fine-Grained Locking Based On Directory Tree (II)
>
> The meeting URL is: https://sea.zoom.us/j/94168001269
>
> You are welcome to this meeting.
>
> On Mon, 6 May 2024 at 23:57, Hui Fei <[email protected]> wrote:
>
>
> BTW, there is a Slack channel, hdfs-fgl, for this feature. Feel free to join it
> and discuss more details.
>
> Is it necessary to hold a meeting to discuss this, so that we can
> push it forward quickly? I agree with ZanderXu that it seems inefficient to
> discuss details via the email list.
>
>
>
> On Mon, 6 May 2024 at 23:50, Hui Fei <[email protected]> wrote:
>
>
> Thanks all.
>
> It seems all the concerns are related to stage 2. We can address them
> and make things clearer before we start it.
>
> From development experience, I think it is reasonable to split the
> big feature into several stages. Stage 1 is also independent, and it
> can also serve as a minor feature that uses the FS and BM locks instead of the
> global lock.
>
>
>
> On Mon, 29 Apr 2024 at 15:17, ZanderXu <[email protected]> wrote:
>
>
> Thanks @Ayush Saxena <[email protected]> and @Xiaoqiao He
> <[email protected]> for your nice questions.
>
> Let me summarize your concerns and corresponding solutions:
>
> *1. Questions about the Snapshot feature*
> It's difficult to apply the FGL to the Snapshot feature, but we can just use
> the global FS write lock to make it thread safe.
> So if we can identify whether a path involves the snapshot feature, we can just
> use the global FS write lock to protect it.
>
> You can refer to HDFS-17479
> <https://issues.apache.org/jira/browse/HDFS-17479> to see how to identify
> it.
>
> Regarding the performance of operations related to the snapshot feature,
> we can discuss it in two categories:
>
> Read operations involving snapshots:
> The FGL branch uses the global write lock to protect them, while the GLOBAL
> branch uses the global read lock. It's hard to conclude
> which version has better performance; it depends on the global lock
> contention.
>
> Write operations involving snapshots:
> Both the FGL and GLOBAL branches use the global write lock to protect them. It's
> hard to conclude which version has better performance; it depends on the
> global lock contention too.
>
> So I think if the NameNode load is low, the GLOBAL branch will have
> better performance than FGL; if the NameNode load is high, the FGL branch may
> have better performance than GLOBAL, which also depends on the ratio of read
> and write operations on the SNAPSHOT feature.
>
> We can do a few things to let end users choose the better-performing
> branch for their business:
> First, we need to make the lock mode selectable, so that end users
> can choose to use FGL or GLOBAL.
> Second, we use the global write lock to make operations related to snapshots
> thread safe, as I described in HDFS-17479.
>
>
> *2. Questions about the Symlinks feature*
> If a symlink is related to a snapshot, we can refer to the snapshot
> solution; if a symlink is not related to a snapshot, I think it's easy to
> support in the FGL.
> Only createSymlink involves two paths; the FGL just needs to lock them in
> order to make this operation thread safe. For other operations, it is the same
> as for any other normal iNode, right?
>
> If I missed difficult points, please let me know.
>
>
> *3. Questions about Memory Usage of iNode locks*
> I think there are many ways to limit the memory usage of these
> iNode locks, such as using a limited-capacity lock pool to cap the
> maximum memory usage, or holding iNode locks only for a fixed depth of
> directories.
>
> We can abstract this LockManager first and then support
> implementations based on different ideas, so that we can limit the maximum
> memory usage of these iNode locks.
> FGL can acquire or lease iNode locks through the LockManager.
>
>
> *4. Questions about Performance of acquiring and releasing iNode locks*
> We can add some benchmarks for LockManager to test the performance of
> acquiring and releasing uncontended locks.
>
>
> *5. Questions about StoragePolicy, ECPolicy, ACL, Quota, etc.*
> These policies may be set on an ancestor node and used by some child
> files. The set operations for these policies will be protected by the
> directory tree, since they are all file-related operations. Apart from
> Quota and StoragePolicy, the use of other policies, such as ECPolicy and ACL,
> will also be protected by the directory tree.
>
> Quota is a little special since its update operations may not be protected
> by the directory tree. We can assign a lock to each QuotaFeature and use
> these locks to make update operations thread safe. You can refer to
> HDFS-17473 <https://issues.apache.org/jira/browse/HDFS-17473> for
> detailed information.
>
> StoragePolicy is a little special since it is used not only by file-related
> operations but also by block-related operations. ProcessExtraRedundancyBlock
> uses the storage policy to choose redundant replicas, and
> BlockReconstructionWork uses the storage policy to choose target DNs. In order
> to maximize the performance improvement, BR and IBR should only involve the
> iNodeFile to which the block currently being processed belongs. These redundant
> blocks can be processed by the Redundancy monitor while holding the
> directory tree locks. You can refer to HDFS-17505
> <https://issues.apache.org/jira/browse/HDFS-17505> for more detailed
> information.
>
> *6. Performance of the phase 1*
> HDFS-17506 <https://issues.apache.org/jira/browse/HDFS-17506> is used for
> performance testing of phase 1, and I will complete it later.
>
>
> Discussing solutions through email is not efficient. You can create
> sub-tasks under HDFS-17366
> <https://issues.apache.org/jira/browse/HDFS-17366> to describe your
> concerns, and I will try to give some answers.
>
> Thanks @Ayush Saxena <[email protected]> and @Xiaoqiao He
> <[email protected]> again.
>
>
>
> On Mon, 29 Apr 2024 at 02:00, Ayush Saxena <[email protected]> wrote:
>
> Thanx Everyone for chasing this, Great to see some momentum around FGL,
> that should be a great improvement.
>
> I have comments in two broad categories:
> ** About the process:*
> I think in the above mails, there are mentions that phase one is complete
> in a feature branch & we are gonna merge that to trunk. If I am catching it
> right, then you can't hit the merge button like that. To merge a feature
> branch, you need to call for a Vote specific to that branch & it requires 3
> binding votes to merge, unlike any other code change which requires 1. It
> is there in our Bylaws.
>
> So, do follow the process.
>
> ** About the feature itself:* (A very quick look at the doc and the Jira,
> so please take it with a grain of salt)
> * The Google Drive link that you folks shared as part of the first mail. I
> don't have access to that. So, please open up the permissions for that doc
> or share the new link
> * Chasing the design doc present on the Jira
> * I think we only have Phase-1 ready, so can you share some metrics just
> for that? Perf improvements just with splitting the FS & BM Locks
> * The memory implications of Phase-1? I don't think there should be any
> major impact on the memory in case of just phase-1
> * Regarding the snapshot stuff, you mentioned taking lock on the root
> itself? Does just taking lock on the snapshot root rather than the FS root
> work?
> * Secondly about the usage of Snapshot or Symlinks, I don't think we
> should operate under the assumptions that they aren't widely used or not,
> we might just not know folks who don't use it widely or they are just users
> not the ones contributing. We can just accept for now, that in those cases
> it isn't optimised and we just lock the entire FS space, which it does even
> today, so no regressions there.
> * Regarding memory usage: Do you have some numbers on how much the memory
> footprint increases?
> * Under the Lock Pool: I think you are assuming there would be very few
> inodes where lock would be required at any given time, so there won't be
> too much heap consumption? I think you are compromising on the Horizontal
> Scalability here. I suspect that if your assumption doesn't hold true, under heavy
> read load by concurrent clients accessing different inodes, the Namenode
> will start giving memory troubles, that would do more harm than good.
> Anyway Namenode heap is way bigger problem than anything, so we should be
> very careful increasing load over there.
> * For the Locks on the inodes: Do you plan to have locks for each inode?
> Can we somehow limit that to the depth of the tree? Like currently we take
> lock on the root, have a config which makes us take lock at Level-2 or 3
> (configurable), that might fetch some perf benefits and can be used to
> control the memory usage as well?
> * What is the cost of creating these inode locks? If the lock isn't
> already cached it would incur some cost? Do you have some numbers around
> that? Say I disable caching altogether & then let a test load run, what
> do the perf numbers look like in that case
> * I think we need to limit the size of INodeLockPool, we can't let it grow
> infinitely in case of heavy loads and we need to have some auto
> throttling mechanism for it
> * I didn't catch your Storage Policy problem. If I decode it right, the
> problem is like the policy could be set on an ancestor node & the children
> abide by that & this is the problem, if that is the case then isn't that
> the case with ErasureCoding policies or even ACLs or so? Can you elaborate
> a bit on that.
>
>
> Anyway, regarding the Phase-1. If you share (the perf numbers with proper
> details + Impact on memory if any) for just phase 1 & if they are good,
> then if you call for a branch merge vote for Phase-1 FGL, you have my vote,
> however you'll need to sway the rest of the folks on your own :-)
>
> Good Luck, Nice Work Guys!!!
>
> -Ayush
>
>
> On Sun, 28 Apr 2024 at 18:32, Xiaoqiao He <[email protected]> wrote:
>
> Thanks ZanderXu and Hui Fei for your work on this feature. It will be
> a very helpful improvement for the HDFS module in the next journal.
>
>
> 1. If we need any more review bandwidth, I would like to be involved
> to help review if possible.
> 2. The design document is still missing detailed
> descriptions of snapshots, symbolic links, reserved paths, etc., as mentioned
> above. I think it will be helpful for newcomers who want to be involved
> if all corner cases are considered and described.
> 3. From Slack, we plan to check into trunk at this phase. I am not sure
> it is the proper time. Following the dev plan in the design document, there
> are two steps left to finish this feature, right? If so, I think we should
> postpone checking in until all plans are ready. Considering that there are
> many unfinished attempts at this feature in history, I think postponing the
> check-in will be the safe way. On the other hand, it will involve more rebase
> cost if you keep a separate dev branch; however, I think that is not a
> difficult thing for you.
>
> Good luck and look forward to making that happen soon!
>
> Best Regards,
> - He Xiaoqiao
>
> On Fri, Apr 26, 2024 at 3:50 PM Hui Fei <[email protected]> wrote:
>
> Thanks for the interest and advice on this.
>
> I would just like to share some info here.
>
> ZanderXu leads this feature and he has spent a lot of time on it. He is
> the main developer in stage 1. Yuanboliu and Kokonguyen191 also took some
> tasks. Other developers (slfan1989 haiyang1987 huangzhaobo99 RocMarshal
> kokonguyen191) helped review PRs. (Forgive me if I missed someone.)
>
> Actually haiyang1987, Yuanboliu and Kokonguyen191 are also very
> familiar with this feature. We discussed many details offline.
>
> More people interested in joining the development and review
> of stages 2 and 3 are welcome.
>
>
>
> On Fri, 26 Apr 2024 at 14:56, Zengqiang XU <[email protected]> wrote:
>
>
> Thanks Shilun for your response:
>
> 1. This is a big and very useful feature, so it really needs more
> developers to get on board.
> 2. This fine-grained lock has been implemented on internal branches
> and has benefited many companies, such as Meituan, Kuaishou, and
> Bytedance. But it has not been contributed to the community for
> various reasons: there is a big difference between the version of
> the internal branch and the community trunk branch, the internal branch may
> ignore some functions to make FGL clear, and the contribution needs a lot
> of work and will take a long time. This means the solution has already
> been practiced in their prod environments. We have also practiced it in our
> prod environment and gained benefits, and we are also willing to spend a
> lot of time contributing to the community.
> 3. Regarding the benchmark testing, we don't need to pay too much attention to
> whether the performance is improved by 5 times, 10 times or 20 times,
> because there are too many factors that affect it.
> 4. As I described above, this solution is already being practiced by many
> companies. Right now, we just need to think about how to implement it with
> high quality and more comprehensively.
> 5. I firmly believe that all problems can be solved as long as the overall
> solution is right.
> 6. I can spend a lot of time leading the promotion of this entire feature
> and I hope more people can join us in promoting it.
> 7. You are always welcome to raise your concerns.
>
> Thanks Shilun again, I hope you can help review designs and PRs. Thanks
>
>
> On Fri, 26 Apr 2024 at 08:00, slfan1989 <[email protected]> wrote:
>
> Thank you for your hard work! This is a very meaningful improvement, and
> from the design document, we can see a significant increase in HDFS
> read/write throughput.
>
> I am happy to see the progress made on HDFS-17384.
>
> However, I still have some concerns, which roughly involve the following
> aspects:
>
> 1. While ZanderXu and Hui Fei have deep expertise in HDFS and are familiar
> with related development details, we still need more community members to
> review the code to ensure that the relevant upgrades meet expectations.
>
> 2. We need more details on benchmarks to ensure that test results can be
> reproduced and to allow more community members to participate in the testing
> process.
>
> Looking forward to everything going smoothly in the future.
>
> Best Regards,
> - Shilun Fan.
>
> On Wed, Apr 24, 2024 at 3:51 PM Xiaoqiao He <[email protected]> wrote:
>
>
> cc [email protected].
>
> On Wed, Apr 24, 2024 at 3:35 PM ZanderXu <[email protected]> wrote:
>
> Here are some summaries about the first phase:
> 1. There are no big changes in this phase
> 2. This phase just uses FS lock and BM lock to replace the original global
> lock
> 3. It's useful to improve the performance, since some operations just need
> to hold FS lock or BM lock instead of the global lock
> 4. This feature is turned off by default, you can enable it by setting
> dfs.namenode.lock.model.provider.class to
> org.apache.hadoop.hdfs.server.namenode.fgl.FineGrainedFSNamesystemLock
> 5. This phase is very important for the ongoing development of the entire
> FGL
>
> Here I would like to express my special thanks to @kokonguyen191 and
> @yuanboliu for their contributions. And you are also welcome to join us
> and complete it together.
>
>
> On Wed, 24 Apr 2024 at 14:54, ZanderXu <[email protected]> wrote:
>
> Hi everyone
>
> All subtasks of the first phase of the FGL have been completed and I plan
> to merge them into the trunk and start the second phase based on the
> trunk.
>
> Here is the PR used to merge the first phase into trunk:
> https://github.com/apache/hadoop/pull/6762
> Here is the ticket:
> https://issues.apache.org/jira/browse/HDFS-17384
>
> I hope you can help to review this PR when you are available and give some
> ideas.
>
> HDFS-17385 <https://issues.apache.org/jira/browse/HDFS-17385> is used for
> the second phase and I have created some subtasks to describe solutions for
> some problems, such as: snapshot, getListing, quota.
> You are welcome to join us to complete it together.
>
>
> ---------- Forwarded message ---------
> From: Zengqiang XU <[email protected]>
> Date: Fri, 2 Feb 2024 at 11:07
> Subject: Discussion about NameNode Fine-grained locking
> To: <[email protected]>
> Cc: Zengqiang XU <[email protected]>
>
>
> Hi everyone
>
> I have started a discussion about NameNode Fine-grained Locking to improve
> performance of write operations in NameNode.
>
> I started this discussion again for several main reasons:
> 1. We have implemented it and gained nearly 7x performance improvement in
> our prod environment
> 2. Many other companies made similar improvements based on their internal
> branch.
> 3. This topic has been discussed for a long time, but still without any
> results.
>
> I hope we can push this important improvement in the community so that all
> end-users can enjoy this significant improvement.
>
> I'd really appreciate it if you can join in and work with me to push this
> feature forward.
>
> Thanks very much.
>
> Ticket: HDFS-17366 <https://issues.apache.org/jira/browse/HDFS-17366>
> Design: NameNode Fine-grained locking based on directory tree
> <https://docs.google.com/document/d/1X499gHxT0WSU1fj8uo4RuF3GqKxWkWXznXx4tspTBLY/edit?usp=sharing>
>
>
>
>
>
>
> ---------------------------------------------------------------------
>
> To unsubscribe, e-mail:
>
> [email protected]
>
> For additional commands, e-mail:
>
> [email protected]
>
