[PR] [INLONG-11923][Audit] Add the function of parsing Audit's basic indicator data from the database and reporting it to prometheus [inlong]

via GitHub Sat, 06 Sep 2025 04:39:53 -0700


lsq888lsq opened a new pull request, #11986:
URL: https://github.com/apache/inlong/pull/11986


   [INLONG-11923][Audit] Add the function of parsing Audit's basic indicator 
data from the database and reporting it to prometheus
   
   Fixes #11923
   
   ### Motivation
   
   Currently, Apache InLong lacks a real-time mechanism for collecting and 
exposing internal performance metrics. This gap makes it difficult to monitor 
pipeline health, locate faults, and optimize performance. This feature adds 
comprehensive collection and reporting of 26 core metrics, enabling:
   - Automatic collection of component metrics at configurable intervals
   - Multi-dimensional measurement (count, size, latency, drop/loss rates)
   - Real-time reporting to external systems like Prometheus
   
   ### Modifications
   
   **1. Metrics Collection Layer**
   - Added `InlongAuditMetricsManager` to configure basic metric data related 
to the audit component
   - Added `InlongApiMetricsManager` to configure basic metric data related 
to the API component
   - Added `InlongAgentMetricsManager` to configure basic metric data related 
to the agent component
   - Added `InlongDataproxyMetricsManager` to configure basic metric data 
related to the dataproxy component
   
   **2. Core Service Layer**
   - Implemented `BaseMetricReporter` as the core computing engine responsible 
for collecting, analyzing, and reporting metrics
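
The collect, analyze, report cycle described for `BaseMetricReporter` could be sketched roughly as follows. This is only an illustration: `ReporterSketch`, its three pluggable steps, and the `start` method are hypothetical names, and the actual class in this PR may be structured differently.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;
import java.util.function.Function;
import java.util.function.Supplier;

// Hypothetical skeleton, not the BaseMetricReporter from this PR.
class ReporterSketch<R, M> {
    private final Supplier<R> collector;   // e.g. reads raw audit rows
    private final Function<R, M> analyzer; // e.g. turns rows into metric values
    private final Consumer<M> sink;        // e.g. updates the exported gauges

    ReporterSketch(Supplier<R> collector, Function<R, M> analyzer, Consumer<M> sink) {
        this.collector = collector;
        this.analyzer = analyzer;
        this.sink = sink;
    }

    // Runs one collect -> analyze -> report cycle.
    void runOnce() {
        sink.accept(analyzer.apply(collector.get()));
    }

    // Repeats the cycle at a fixed interval on a single background thread.
    // The caller is responsible for shutting the returned scheduler down.
    ScheduledExecutorService start(long intervalSeconds) {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        scheduler.scheduleAtFixedRate(this::runOnce, 0, intervalSeconds, TimeUnit.SECONDS);
        return scheduler;
    }
}
```

Note that `scheduleAtFixedRate` stops rescheduling a task once it throws, so a production reporter would likely catch and log exceptions inside the cycle.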
   
   **3. Metrics Configuration**
   - Audit Module Metrics (InlongAuditMetricsManager)
     - inlong_audit_log_message_num_interval: Number of audit logs in interval
     - inlong_audit_log_message_size_interval: Size of audit logs in interval  
     - inlong_audit_log_message_avg_delay_interval: Average delay of audit logs in interval
   
   - API Module Metrics (InlongApiMetricsManager)
     - inlong_api_receive_num_interval: API receive message count
     - inlong_api_receive_size_interval: API receive message size
     - inlong_api_receive_avg_delay_interval: API receive average delay
     - inlong_api_send_num_interval: API send message count  
     - inlong_api_send_size_interval: API send message size
     - inlong_api_send_avg_delay_interval: API send average delay
     - inlong_api_abandon_rate_interval: API message abandon rate
   
   - Agent Module Metrics (InlongAgentMetricsManager)
     - inlong_agent_receive_num_interval: Agent receive message count
     - inlong_agent_receive_size_interval: Agent receive message size
     - inlong_agent_receive_avg_delay_interval: Agent receive average delay
     - inlong_agent_send_num_interval: Agent send message count
     - inlong_agent_send_size_interval: Agent send message size  
     - inlong_agent_send_avg_delay_interval: Agent send average delay
     - inlong_agent_abandon_rate_interval: Agent message abandon rate
     - inlong_agent_loss_rate_interval: Agent message loss rate
   
   - Dataproxy Module Metrics (InlongDataproxyMetricsManager)
     - inlong_dataproxy_receive_num_interval: Dataproxy receive message count
     - inlong_dataproxy_receive_size_interval: Dataproxy receive message size
     - inlong_dataproxy_receive_avg_delay_interval: Dataproxy receive average 
delay
     - inlong_dataproxy_send_num_interval: Dataproxy send message count
     - inlong_dataproxy_send_size_interval: Dataproxy send message size
     - inlong_dataproxy_send_avg_delay_interval: Dataproxy send average delay  
     - inlong_dataproxy_abandon_rate_interval: Dataproxy message abandon rate
     - inlong_dataproxy_loss_rate_interval: Dataproxy message loss rate
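
The names above follow a regular pattern, which the sketch below rebuilds to cross-check the total of 26 metrics stated in the motivation. `MetricNames` is a hypothetical helper written for this illustration and is not part of the PR.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper that regenerates the metric names listed above.
class MetricNames {
    private static final String[] MEASURES = {"num", "size", "avg_delay"};

    // The audit component only exposes the three log-message metrics.
    static List<String> forAudit() {
        List<String> names = new ArrayList<>();
        for (String measure : MEASURES) {
            names.add("inlong_audit_log_message_" + measure + "_interval");
        }
        return names;
    }

    // Components that receive and send messages; only some also report a loss rate.
    static List<String> forComponent(String component, boolean hasLossRate) {
        List<String> names = new ArrayList<>();
        for (String direction : new String[] {"receive", "send"}) {
            for (String measure : MEASURES) {
                names.add("inlong_" + component + "_" + direction + "_" + measure + "_interval");
            }
        }
        names.add("inlong_" + component + "_abandon_rate_interval");
        if (hasLossRate) {
            names.add("inlong_" + component + "_loss_rate_interval");
        }
        return names;
    }

    // All metric names: 3 (audit) + 7 (api) + 8 (agent) + 8 (dataproxy).
    static List<String> all() {
        List<String> names = new ArrayList<>(forAudit());
        names.addAll(forComponent("api", false));
        names.addAll(forComponent("agent", true));
        names.addAll(forComponent("dataproxy", true));
        return names;
    }

    public static void main(String[] args) {
        all().forEach(System.out::println);
    }
}
```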
   
   **4. Prometheus Integration**
   - Integrated Prometheus client library
   - Built comprehensive Gauge metrics for each component
   - Dynamic help messages with time intervals
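
To illustrate what a gauge with an interval-aware help message looks like when scraped, here is a dependency-free sketch that renders the Prometheus text exposition format by hand. The PR itself uses the Prometheus client library; `IntervalGauge`, the help wording, and the choice of minutes as the interval unit are assumptions made for this example.

```java
// Hypothetical stand-in for a Prometheus Gauge, kept dependency-free on purpose.
class IntervalGauge {
    private final String name;
    private final String help;
    private volatile double value;

    IntervalGauge(String name, String description, int intervalMinutes) {
        this.name = name;
        // The help text is built from the configured interval, so it changes with the config.
        this.help = description + " in the last " + intervalMinutes + " minute(s)";
    }

    // A gauge can go up or down, so each reporting cycle simply overwrites the value.
    void set(double newValue) {
        this.value = newValue;
    }

    // Renders the HELP line, the TYPE line, and one sample line.
    String expose() {
        return "# HELP " + name + " " + help + "\n"
                + "# TYPE " + name + " gauge\n"
                + name + " " + value + "\n";
    }

    public static void main(String[] args) {
        IntervalGauge gauge =
                new IntervalGauge("inlong_api_receive_num_interval", "API receive message count", 5);
        gauge.set(1200);
        System.out.print(gauge.expose());
    }
}
```

A gauge fits these metrics because each value describes a single interval and is replaced on the next cycle, rather than accumulating like a counter.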
   
   **5. Data Processing**
   - SQL-based data retrieval from audit database
   - Automated metrics calculation
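
The calculation step could look roughly like the sketch below, which folds rows for one interval into a message count, a total size, and an average delay. The row layout (`count`, `size`, and a delay already summed over the messages in the row) and the class names are assumptions for illustration; the PR description does not spell out the query or the schema.

```java
import java.util.List;

// Hypothetical sketch of the aggregation step; the row layout is an assumption.
class IntervalAggregator {

    // One row as it might be read from the audit database.
    static final class AuditRow {
        final long count;        // messages covered by this row
        final long size;         // total bytes of those messages
        final long totalDelayMs; // delay summed over those messages (assumed)

        AuditRow(long count, long size, long totalDelayMs) {
            this.count = count;
            this.size = size;
            this.totalDelayMs = totalDelayMs;
        }
    }

    // Values for the *_num_interval, *_size_interval and *_avg_delay_interval metrics.
    static final class IntervalStats {
        final long num;
        final long size;
        final double avgDelayMs;

        IntervalStats(long num, long size, double avgDelayMs) {
            this.num = num;
            this.size = size;
            this.avgDelayMs = avgDelayMs;
        }
    }

    static IntervalStats aggregate(List<AuditRow> rows) {
        long num = 0;
        long size = 0;
        long delay = 0;
        for (AuditRow row : rows) {
            num += row.count;
            size += row.size;
            delay += row.totalDelayMs;
        }
        // Avoid dividing by zero when the interval contains no messages.
        double avgDelay = num == 0 ? 0.0 : (double) delay / num;
        return new IntervalStats(num, size, avgDelay);
    }
}
```

Weighting the average by message count, instead of averaging per-row averages, keeps large and small rows from contributing equally.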
   
   ### Verifying this change
   
   - [x] This change added tests and can be verified as follows:
   
   **Unit Tests:**
   - Added `AuditAlertRuleTest` with comprehensive coverage
   - Tested CRUD operations and lifecycle management
   - Validated error handling and edge cases
   
   **Test Coverage:**
   - Data layer: MyBatis and SQL validation
   - Error handling: Input validation scenarios
   
   **Note:** 
   The basic metric reporting feature code (under the basemetric package) and 
related test code were independently designed and developed by me. The 
remaining functional code was completed through team collaboration.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
