devabhishekpal commented on code in PR #7835:
URL: https://github.com/apache/ozone/pull/7835#discussion_r1959134820
##########
hadoop-ozone/recon/src/main/java/org/apache/hadoop/ozone/recon/tasks/ContainerKeyMapperTask.java:
##########
@@ -87,45 +96,49 @@ public ContainerKeyMapperTask(ReconContainerMetadataManager
@Override
public Pair<String, Boolean> reprocess(OMMetadataManager omMetadataManager) {
long omKeyCount = 0;
-
+ metrics.incrTaskReprocessCount();
// In-memory maps for fast look up and batch write
// (container, key) -> count
Map<ContainerKeyPrefix, Integer> containerKeyMap = new HashMap<>();
// containerId -> key count
Map<Long, Long> containerKeyCountMap = new HashMap<>();
try {
LOG.debug("Starting a 'reprocess' run of ContainerKeyMapperTask.");
- Instant start = Instant.now();
// initialize new container DB
reconContainerMetadataManager
.reinitWithNewContainerDataFromOm(new HashMap<>());
// loop over both key table and file table
- for (BucketLayout layout : Arrays.asList(BucketLayout.LEGACY,
- BucketLayout.FILE_SYSTEM_OPTIMIZED)) {
- // (HDDS-8580) Since "reprocess" iterate over the whole key table,
- // containerKeyMap needs to be incrementally flushed to DB based on
- // configured batch threshold.
- // containerKeyCountMap can be flushed at the end since the number
- of containers in a cluster will not have significant memory overhead.
- Table<String, OmKeyInfo> omKeyInfoTable =
- omMetadataManager.getKeyTable(layout);
- try (
- TableIterator<String, ? extends Table.KeyValue<String, OmKeyInfo>>
- keyIter = omKeyInfoTable.iterator()) {
- while (keyIter.hasNext()) {
- Table.KeyValue<String, OmKeyInfo> kv = keyIter.next();
- OmKeyInfo omKeyInfo = kv.getValue();
- handleKeyReprocess(kv.getKey(), omKeyInfo, containerKeyMap,
- containerKeyCountMap);
- if (!checkAndCallFlushToDB(containerKeyMap)) {
- LOG.error("Unable to flush containerKey information to the DB");
- return new ImmutablePair<>(getTaskName(), false);
+ long startTime = Time.monotonicNow();
+ try {
+ for (BucketLayout layout : Arrays.asList(BucketLayout.LEGACY,
+ BucketLayout.FILE_SYSTEM_OPTIMIZED)) {
+ // (HDDS-8580) Since "reprocess" iterate over the whole key table,
+ // containerKeyMap needs to be incrementally flushed to DB based on
+ // configured batch threshold.
+ // containerKeyCountMap can be flushed at the end since the number
+ of containers in a cluster will not have significant memory overhead.
+ Table<String, OmKeyInfo> omKeyInfoTable =
+ omMetadataManager.getKeyTable(layout);
+ try (
+ TableIterator<String, ? extends Table.KeyValue<String, OmKeyInfo>>
+ keyIter = omKeyInfoTable.iterator()) {
+ while (keyIter.hasNext()) {
+ Table.KeyValue<String, OmKeyInfo> kv = keyIter.next();
+ OmKeyInfo omKeyInfo = kv.getValue();
+ handleKeyReprocess(kv.getKey(), omKeyInfo, containerKeyMap,
+ containerKeyCountMap);
+ if (!checkAndCallFlushToDB(containerKeyMap)) {
+ LOG.error("Unable to flush containerKey information to the DB");
+ return new ImmutablePair<>(getTaskName(), false);
+ }
+ omKeyCount++;
}
- omKeyCount++;
}
}
+ } finally {
Review Comment:
So for this:
- Writing to the DB is part of the reprocess task as well, so I think we
should include the DB write latency in the overall time taken for
reprocess. We can still derive the pure reprocess time by taking the total
time to run reprocess() at time `x` and subtracting the DB write latency
measured at the same time `x`. But in my opinion, if we separate out the DB
write latency, then:
- We will have to do something like:
```
long reprocessLatency = 0L;
for (BucketLayout layout : Arrays.asList(BucketLayout.LEGACY,
    BucketLayout.FILE_SYSTEM_OPTIMIZED)) {
  ...
  try (
      TableIterator<String, ? extends Table.KeyValue<String, OmKeyInfo>>
          keyIter = omKeyInfoTable.iterator()) {
    while (keyIter.hasNext()) {
      ...
      long startTime = Time.monotonicNow();
      handleKeyReprocess(kv.getKey(), omKeyInfo, containerKeyMap,
          containerKeyCountMap);
      reprocessLatency += Time.monotonicNow() - startTime;
      // DB flush part
      if (!checkAndCallFlushToDB(containerKeyMap)) {
        ...
      }
      ...
    }
  }
}
...
metrics.updateTaskReprocessLatency(reprocessLatency);
```
But this adds extra code just to measure the `handleKeyReprocess()`
duration.
- On the point of capturing only the final flush: that is correct, but then
we might miss scenarios where some intermediate flush takes much longer.
Say we flushed the data three times and, for some reason, the 2nd flush
took a lot longer; if we only track the final flush, we lose that
information. The same applies if we aggregate the total time across all
flushes: we can keep adding each flush latency to a variable, as in the
snippet above, and write the final sum, but we would still miss the case
where one intermediate flush took much longer than the others.
Inputs on this would be great.
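One way to keep both views is to record each flush duration as it happens, tracking the aggregate and the single slowest flush side by side. A minimal sketch below, assuming a hypothetical `FlushLatencyTracker` helper (not part of this PR) whose `record()` would be called around each `checkAndCallFlushToDB()`:

```java
// Hypothetical helper (not part of this PR): records each flush duration
// so we keep both the aggregate latency and the slowest single flush.
// A long intermediate flush is then not hidden inside the sum.
public class FlushLatencyTracker {
  private long totalFlushMillis = 0L;
  private long maxFlushMillis = 0L;
  private int flushCount = 0;

  /** Record one flush duration, e.g. measured around checkAndCallFlushToDB(). */
  public void record(long durationMillis) {
    totalFlushMillis += durationMillis;
    maxFlushMillis = Math.max(maxFlushMillis, durationMillis);
    flushCount++;
  }

  public long getTotalFlushMillis() {
    return totalFlushMillis;
  }

  public long getMaxFlushMillis() {
    return maxFlushMillis;
  }

  public int getFlushCount() {
    return flushCount;
  }
}
```

At the end of reprocess() both the total and the max could be pushed to metrics; the max (or a quantile metric, if the metrics library provides one) is what would surface the slow-2nd-flush scenario described above.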
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]