PeterZh6 opened a new pull request, #11206: URL: https://github.com/apache/inlong/pull/11206
Fixes [[Feature][Sort] Enhanced sink metric instrumentation for InLong Sort Flink Connector #11201](https://github.com/apache/inlong/issues/11201) ### Motivation This PR aims to enhance sink metric instrumentation for the InLong Sort Flink Connector, particularly the StarRocks Connector. The enhancements are designed to improve observability by adding metrics that track serialization, snapshot states, and checkpoint completion. ### Modifications **This feature focuses on Sink Metrics only** **Serialization Metrics:** - Added counters for successful and failed serialization attempts: - `numSerializeSuccess` - `numSerializeError` - Introduced a latency gauge to measure the time taken for serialization: - `serializeTimeLag` **Snapshot State Metrics:** - Added counters for: - Number of snapshots created: `numSnapshotCreate` - Errors encountered during snapshot operations: `numSnapshotError` **NotifyComplete Metrics:** - Introduced a counter for completed snapshots: - `numSnapshotComplete` - Added a latency gauge to measure the time from snapshot creation attempt to completion: - `snapshotToCheckpointTimeLag` ### Verifying this change *(Please pick either of the following options)* - [ ] This change is a trivial rework/code cleanup without any test coverage. - [ ] This change is already covered by existing tests, such as: *(please describe tests)* - [x] This change added tests and can be verified as follows: `sort-end-to-end-tests-v1.15` **The result is shown in the screenshot, with the metrics marked in red** <details> <summary>Click To View Image</summary> <img width="1280" alt="starrocks_metric_demo" src="https://github.com/user-attachments/assets/5c193462-4a7a-42cc-84c9-48c06e3cb4e0"> </details> **Preparation:** To enable self-defined metrics, one has to add `inlong.metric.labels` in the `inlong-sort/sort-end-to-end-tests/sort-end-to-end-tests-v1.15/src/test/resources/flinkSql` **Method 1:** Add `while(true){}` code snippet to the end of any test under `sort-end-to-end-test-v1.15` that involves `starrocks` connnector. For example, in `Postgres2StarRocksTest`, simply remove check result code and add a while loop at the end. And then visit `localhost:8081`, which is Flink Web Dashboard, and find Metrics column. ```java @Test public void testPostgresUpdateAndDelete() throws Exception { // test logic omitted... // Infinite loop to prevent container teardown while (true) {} // result checking part is unnecessary here } ``` **Method 2:** - Configure Flink `taskmanager` to report metrics using the Slf4jReporter. Add the following to `conf/flink-conf.yaml`: ```properties metrics.reporter.slf4j.class: org.apache.flink.metrics.slf4j.Slf4jReporter metrics.reporter.slf4j.interval: 15 SECONDS ``` ### Documentation - Does this pull request introduce a new feature? **Yes** - If yes, how is the feature documented? **JavaDocs** -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@inlong.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org