PeterZh6 opened a new pull request, #11206:
URL: https://github.com/apache/inlong/pull/11206

   Fixes [[Feature][Sort] Enhanced sink metric instrumentation for InLong Sort 
Flink Connector #11201](https://github.com/apache/inlong/issues/11201) 
   
   ### Motivation
   
   This PR aims to enhance sink metric instrumentation for the InLong Sort 
Flink Connector, particularly the StarRocks Connector. The enhancements are 
designed to improve observability by adding metrics that track serialization, 
snapshot states, and checkpoint completion.
   
   ### Modifications
   **This feature focuses on Sink Metrics only**
   
   **Serialization Metrics:**
   - Added counters for successful and failed serialization attempts: 
     - `numSerializeSuccess`
     - `numSerializeError`
   - Introduced a latency gauge to measure the time taken for serialization: 
     - `serializeTimeLag`
   
   **Snapshot State Metrics:**
   - Added counters for:
     - Number of snapshots created: `numSnapshotCreate`
     - Errors encountered during snapshot operations: `numSnapshotError`
   
   **NotifyComplete Metrics:**
   - Introduced a counter for completed snapshots: 
     - `numSnapshotComplete`
   - Added a latency gauge to measure the time from snapshot creation attempt 
to completion: 
     - `snapshotToCheckpointTimeLag`
   
   
   ### Verifying this change
   
   *(Please pick either of the following options)*
   
   - [ ] This change is a trivial rework/code cleanup without any test coverage.
   
   - [ ] This change is already covered by existing tests, such as:
     *(please describe tests)*
   
   - [x] This change added tests and can be verified as follows:
   `sort-end-to-end-tests-v1.15`
   
   **The result is shown in the screenshot, with the metrics marked in red**
   <details>
   <summary>Click To View Image</summary>
   <img width="1280" alt="starrocks_metric_demo" 
src="https://github.com/user-attachments/assets/5c193462-4a7a-42cc-84c9-48c06e3cb4e0";>
   </details>
   
   **Preparation:**
   To enable self-defined metrics, one has to add `inlong.metric.labels` in the 
`inlong-sort/sort-end-to-end-tests/sort-end-to-end-tests-v1.15/src/test/resources/flinkSql`
   **Method 1:**
   Add `while(true){}` code snippet to the end of any test under 
`sort-end-to-end-test-v1.15` that involves `starrocks` connnector. For example, 
in `Postgres2StarRocksTest`, simply remove check result code and add a while 
loop at the end. And then visit `localhost:8081`, which is Flink Web Dashboard, 
and find Metrics column.
   ```java
   @Test
   public void testPostgresUpdateAndDelete() throws Exception {
       // test logic omitted...
   
       // Infinite loop to prevent container teardown
       while (true) {}
      // result checking part is unnecessary here
   }
   ```
   
   **Method 2:**
   - Configure Flink `taskmanager` to report metrics using the Slf4jReporter. 
Add the following to `conf/flink-conf.yaml`:
   
   ```properties
   metrics.reporter.slf4j.class: org.apache.flink.metrics.slf4j.Slf4jReporter 
   metrics.reporter.slf4j.interval: 15 SECONDS
   ```
   
   
   ### Documentation
   
     - Does this pull request introduce a new feature? **Yes**
     - If yes, how is the feature documented? **JavaDocs**
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@inlong.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

Reply via email to