Quanlong Huang has posted comments on this change. ( http://gerrit.cloudera.org:8080/20986 )
Change subject: IMPALA-12782: Show info of the event processing in /events webUI ...................................................................... Patch Set 3: (5 comments) http://gerrit.cloudera.org:8080/#/c/20986/2/common/thrift/JniCatalog.thrift File common/thrift/JniCatalog.thrift: http://gerrit.cloudera.org:8080/#/c/20986/2/common/thrift/JniCatalog.thrift@1066 PS2, Line 1066: } > is_current_event_batched doesn't seem to be used ? Nice catch! I planned to show the MetastoreEvent at the beginning but decided to leave it as a follow-up task. So currently only the original HMS event is shown. We can delete this field. http://gerrit.cloudera.org:8080/#/c/20986/2/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEventsProcessor.java File fe/src/main/java/org/apache/impala/catalog/events/MetastoreEventsProcessor.java: http://gerrit.cloudera.org:8080/#/c/20986/2/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEventsProcessor.java@1130 PS2, Line 1130: progressInfo.max_event_time_s = lastEvent.getEventTime(); > currentEventBatch_, currentFilteredEvents_, currentBatchStartTimeMs_, curre Yeah, I should have mentioned this somewhere. There are costs to provide consistency in metrics. E.g. if we use a mutex (synchronized) to protect these fields, the web requests could block the event processor. We already have inconsistent metrics, e.g. lastSyncedEventId_ and lastSyncedEventTimeSecs_ could be inconsistent since the updates on them are not atomic. So currently we are best effort to provide consistency on metrics but cannot garuantee it. E.g. we garuantee the consistency between num_hms_events, min_event_*, max_event* by using the snapshot of currentEventBatch_, But they could be inconsistent with other metrics like num_synced_events. I added some codes to reset these fields to reduce the possibility of seeing inconsistent metrics. Another way to fix this is wrapping these metrics into a snapshot class and create new instance whenever any metric changes. I think it doesn't worth adding this complexity since the web page is mainly used by admins when event processing is slow. In such situation, the metrics are also updated slowly. http://gerrit.cloudera.org:8080/#/c/20986/2/www/events.tmpl File www/events.tmpl: http://gerrit.cloudera.org:8080/#/c/20986/2/www/events.tmpl@53 PS2, Line 53: Started processing the current batch at {{progress-info.start_time}} ({{progress-info.elapsed_time}} ago).</br> > Nit: does 'Started processing' sound good? Done http://gerrit.cloudera.org:8080/#/c/20986/2/www/events.tmpl@54 PS2, Line 54: Started processing the current event at {{progress-info.start_time_of_event}} ({{progress-info.elapsed_time_current_event}} ago). > Nit: Should we also show the current event ID here? It's shown in the table below. Maybe it's ok to skip it here. http://gerrit.cloudera.org:8080/#/c/20986/2/www/events.tmpl@59 PS2, Line 59: {{/progress-info.current_event_batch_size}} > nit: Can we use the same style in both headers ? Event ID or EventId, Event Done -- To view, visit http://gerrit.cloudera.org:8080/20986 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I2e7d4952c7fd04ae89b6751204499bf9dd99f57c Gerrit-Change-Number: 20986 Gerrit-PatchSet: 3 Gerrit-Owner: Quanlong Huang <[email protected]> Gerrit-Reviewer: Anonymous Coward <[email protected]> Gerrit-Reviewer: Impala Public Jenkins <[email protected]> Gerrit-Reviewer: Quanlong Huang <[email protected]> Gerrit-Reviewer: Sai Hemanth Gantasala <[email protected]> Gerrit-Comment-Date: Tue, 20 Feb 2024 12:14:54 +0000 Gerrit-HasComments: Yes
