devabhishekpal commented on code in PR #7835:
URL: https://github.com/apache/ozone/pull/7835#discussion_r1959134820
##########
hadoop-ozone/recon/src/main/java/org/apache/hadoop/ozone/recon/tasks/ContainerKeyMapperTask.java:
##########
@@ -87,45 +96,49 @@ public ContainerKeyMapperTask(ReconContainerMetadataManager
@Override
public Pair<String, Boolean> reprocess(OMMetadataManager omMetadataManager) {
long omKeyCount = 0;
-
+ metrics.incrTaskReprocessCount();
// In-memory maps for fast look up and batch write
// (container, key) -> count
Map<ContainerKeyPrefix, Integer> containerKeyMap = new HashMap<>();
// containerId -> key count
Map<Long, Long> containerKeyCountMap = new HashMap<>();
try {
LOG.debug("Starting a 'reprocess' run of ContainerKeyMapperTask.");
- Instant start = Instant.now();
// initialize new container DB
reconContainerMetadataManager
.reinitWithNewContainerDataFromOm(new HashMap<>());
// loop over both key table and file table
- for (BucketLayout layout : Arrays.asList(BucketLayout.LEGACY,
- BucketLayout.FILE_SYSTEM_OPTIMIZED)) {
- // (HDDS-8580) Since "reprocess" iterate over the whole key table,
- // containerKeyMap needs to be incrementally flushed to DB based on
- // configured batch threshold.
- // containerKeyCountMap can be flushed at the end since the number
- of containers in a cluster will not have significant memory overhead.
- Table<String, OmKeyInfo> omKeyInfoTable =
- omMetadataManager.getKeyTable(layout);
- try (
- TableIterator<String, ? extends Table.KeyValue<String, OmKeyInfo>>
- keyIter = omKeyInfoTable.iterator()) {
- while (keyIter.hasNext()) {
- Table.KeyValue<String, OmKeyInfo> kv = keyIter.next();
- OmKeyInfo omKeyInfo = kv.getValue();
- handleKeyReprocess(kv.getKey(), omKeyInfo, containerKeyMap,
- containerKeyCountMap);
- if (!checkAndCallFlushToDB(containerKeyMap)) {
- LOG.error("Unable to flush containerKey information to the DB");
- return new ImmutablePair<>(getTaskName(), false);
+ long startTime = Time.monotonicNow();
+ try {
+ for (BucketLayout layout : Arrays.asList(BucketLayout.LEGACY,
+ BucketLayout.FILE_SYSTEM_OPTIMIZED)) {
+ // (HDDS-8580) Since "reprocess" iterate over the whole key table,
+ // containerKeyMap needs to be incrementally flushed to DB based on
+ // configured batch threshold.
+ // containerKeyCountMap can be flushed at the end since the number
+ of containers in a cluster will not have significant memory overhead.
+ Table<String, OmKeyInfo> omKeyInfoTable =
+ omMetadataManager.getKeyTable(layout);
+ try (
+ TableIterator<String, ? extends Table.KeyValue<String, OmKeyInfo>>
+ keyIter = omKeyInfoTable.iterator()) {
+ while (keyIter.hasNext()) {
+ Table.KeyValue<String, OmKeyInfo> kv = keyIter.next();
+ OmKeyInfo omKeyInfo = kv.getValue();
+ handleKeyReprocess(kv.getKey(), omKeyInfo, containerKeyMap,
+ containerKeyCountMap);
+ if (!checkAndCallFlushToDB(containerKeyMap)) {
+ LOG.error("Unable to flush containerKey information to the DB");
+ return new ImmutablePair<>(getTaskName(), false);
+ }
+ omKeyCount++;
}
- omKeyCount++;
}
}
+ } finally {
Review Comment:
So for this:
- Writing to the DB is part of the reprocess task as well, so I think we
should include the DB write latency in the overall time taken for
reprocess. We can still derive the pure reprocess time by taking the total
time to run reprocess() at time `x` and subtracting the DB write latency
measured at the same time `x`. But in my opinion, if we separate out the DB
write latency, then:
- We will have to do something like:
```
long reprocessLatency = 0L;
for (BucketLayout layout : Arrays.asList(BucketLayout.LEGACY,
    BucketLayout.FILE_SYSTEM_OPTIMIZED)) {
  ...
  try (
      TableIterator<String, ? extends Table.KeyValue<String, OmKeyInfo>>
          keyIter = omKeyInfoTable.iterator()) {
    while (keyIter.hasNext()) {
      ...
      long startTime = Time.monotonicNow();
      handleKeyReprocess(kv.getKey(), omKeyInfo, containerKeyMap,
          containerKeyCountMap);
      reprocessLatency += Time.monotonicNow() - startTime;
      // DB flush part
      if (!checkAndCallFlushToDB(containerKeyMap)) {
        ...
      }
      ...
    }
  }
}
...
metrics.updateTaskReprocessLatency(reprocessLatency);
```
But this adds extra code just to measure the `handleKeyReprocess()`
duration.
- On the point of capturing only the final flush: that is correct, but then
we might miss scenarios where some intermediate flush takes much longer.
Say we flushed the data three times and, for some reason, the 2nd flush
took a lot longer; if we only track the final flush, we lose that
information. The same applies if we aggregate the total time across all
flushes: we can keep adding each flush latency to a variable, as in the
snippet above, and write the final sum, but we would still miss the case
where one intermediate flush took much longer than the others.
Inputs on this would be great.
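One way to keep both views is to record each flush duration as it happens, tracking the aggregate and the single slowest flush side by side. A minimal sketch below, assuming a hypothetical `FlushLatencyTracker` helper (not part of this PR) whose `record()` would be called around each `checkAndCallFlushToDB()`:

```java
// Hypothetical helper (not part of this PR): records each flush duration
// so we keep both the aggregate latency and the slowest single flush.
// A long intermediate flush is then not hidden inside the sum.
public class FlushLatencyTracker {
  private long totalFlushMillis = 0L;
  private long maxFlushMillis = 0L;
  private int flushCount = 0;

  /** Record one flush duration, e.g. measured around checkAndCallFlushToDB(). */
  public void record(long durationMillis) {
    totalFlushMillis += durationMillis;
    maxFlushMillis = Math.max(maxFlushMillis, durationMillis);
    flushCount++;
  }

  public long getTotalFlushMillis() {
    return totalFlushMillis;
  }

  public long getMaxFlushMillis() {
    return maxFlushMillis;
  }

  public int getFlushCount() {
    return flushCount;
  }
}
```

At the end of reprocess() both the total and the max could be pushed to metrics; the max (or a quantile metric, if the metrics library provides one) is what would surface the slow-2nd-flush scenario described above.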
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]