Re: [PR] HDDS-7544. ContainerStateMap uses HashSet to store container replica lists [ozone]

via GitHub Tue, 02 Sep 2025 20:32:04 -0700


cchung100m commented on PR #8990:
URL: https://github.com/apache/ozone/pull/8990#issuecomment-3247540277


   Hi @sodonnel 
   
   Thank you so much for the thoughtful and detailed feedback.
   
   You're right regarding the replica count — the original Jira description was 
inaccurate in stating a max of 3 replicas. As you pointed out, in the case of 
EC 10-4, it can indeed go up to 14 or more with over-replication. I've updated 
the related code to reflect this more accurately.
   
   I also agree with your concern about the performance impact when switching 
from a hash-based lookup to list iteration. To address this, I’ve revised the 
implementation to use a HashSet for replica existence checks, restoring 
constant-time lookup. This should help preserve efficiency, especially during 
container report processing and dead node handling.
   
   Regarding the initial size of the HashSet, that's a great suggestion. I've 
adjusted the default initial capacity to 8, which seems like a good balance to 
reduce memory overhead without significantly impacting performance.
   
   Thanks again for the great feedback — it’s helped make the solution cleaner 
and more robust. Let me know if you have any other thoughts!
   
   > The original Jira states that the set can hold 3 replicas max, which is 
not true. In the case of EC 10-4, it will hold 14 replicas. If there is over 
replication it could even be a few more than that.
   > 
   > This change may be fine, but my initial thought is that we are going from 
a hash lookup to check for existence of a replica for additional or removal 
(container report processing, or dead node handling) to a list iteration, and 
therefore the runtime complexity is higher. Could this slow down container 
report processing?
   > 
   > Do we have any option to reduce the initial hashSet initial size to 6 or 8 
to waste less memory?
   > 
   > I also haven't check where this code is called, but its unfortunate that a 
new hashSet needs to be constructed each time the replicas are retrieved. I 
understand why this is the case, as the list could otherwise be modified by 
container reporting or dead node handling after the copy has been retrieved. 
However, it would be relatively rare for the replicas for a container to change 
after the cluster has reached a stead state. Perhaps we can make the entries an 
un-modifiable set / list that can be returned to the caller directly, and then 
if the set / list needs to changed a new un-modifiable copy is created with the 
change to replace the original.
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Re: [PR] HDDS-7544. ContainerStateMap uses HashSet to store container replica lists [ozone]

Reply via email to