Re: [PR] HDDS-12779. Parallelize table iteration across different ranges [ozone]

via GitHub Wed, 09 Apr 2025 09:39:07 -0700


szetszwo commented on PR #8243:
URL: https://github.com/apache/ozone/pull/8243#issuecomment-2790268291


   > ... Thus indirectly the table would be iterated based on the number of sst 
files as there are on the DB. ...
   
   What are the assumptions behind the expected performance improvement? 
   - Reading multiple files using multiple threads is faster than using a single thread?
   
   We have 
   - iterator reading files (rocksdb) -> processing the entries (Ozone)
   
   Instead of using multiple threads to read the files, it is better to use multiple threads to process the data.
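   A minimal Java sketch of the alternative suggested above: one thread walks the table sequentially (leaving any file-level parallelism to RocksDB itself), and a fixed thread pool processes batches of entries. This is an illustration under stated assumptions, not Ozone code: a plain `java.util.Iterator` stands in for an Ozone table iterator, the class name `SingleReaderParallelProcessing`, the `process` method, the batch size, and the in-flight limit are all hypothetical, and it assumes each entry returned by the iterator is an independent object that is safe to hand to another thread.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.Semaphore;
import java.util.function.ToLongFunction;

/** Hypothetical sketch: single reader thread, multiple processing threads. */
public class SingleReaderParallelProcessing {

  /**
   * Iterates the table on the calling thread only, hands batches of entries to
   * {@code workers} threads, and returns the sum of the per-entry results.
   * Assumes entries from the iterator are safe to share across threads.
   */
  public static long process(Iterator<Map.Entry<String, String>> tableIterator,
                             ToLongFunction<Map.Entry<String, String>> processor,
                             int workers, int batchSize)
      throws InterruptedException, ExecutionException {
    ExecutorService pool = Executors.newFixedThreadPool(workers);
    // Bound the number of in-flight batches so the reader cannot run far ahead of the workers.
    Semaphore inFlight = new Semaphore(2 * workers);
    List<Future<Long>> futures = new ArrayList<>();
    try {
      List<Map.Entry<String, String>> batch = new ArrayList<>(batchSize);
      // Single, sequential pass over the table: no range splitting, no RocksDB internals.
      while (tableIterator.hasNext()) {
        batch.add(tableIterator.next());
        if (batch.size() >= batchSize || !tableIterator.hasNext()) {
          List<Map.Entry<String, String>> toProcess = batch;
          batch = new ArrayList<>(batchSize);
          inFlight.acquire();
          futures.add(pool.submit(() -> {
            try {
              long sum = 0;
              for (Map.Entry<String, String> e : toProcess) {
                sum += processor.applyAsLong(e);
              }
              return sum;
            } finally {
              // Release even if the processor throws, so the reader never blocks forever.
              inFlight.release();
            }
          }));
        }
      }
      long total = 0;
      for (Future<Long> f : futures) {
        total += f.get(); // rethrows any processing failure as ExecutionException
      }
      return total;
    } finally {
      pool.shutdown();
    }
  }
}
```

   For example, passing `table.entrySet().iterator()` of a `TreeMap` and a processor such as `e -> e.getValue().length()` sums the value lengths using several threads, while the iteration itself stays on one thread. The parallelism lives entirely in the Ozone-side processing, which is the part that can be changed without relying on RocksDB internals.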
   
   RocksDB itself already handles parallelism very well. It is unlikely that Ozone could use RocksDB's internal details to improve performance. Also, Ozone should use only the public RocksDB APIs: code that depends on internals is hard to maintain, and it may even cause silent data corruption.
   
   BTW, you may consider parallelizing your pull requests, i.e. having multiple small PRs instead of a single large PR. Then different people can review different PRs at the same time.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


