codelipenghui opened a new pull request, #24716:
URL: https://github.com/apache/pulsar/pull/24716

   ## Summary
   
   This PIP proposes adding two broker-level metrics that show when 
non-recoverable data is skipped while `autoSkipNonRecoverableData` 
is enabled.
   
   ## New Metrics
   
   - `pulsar_broker_non_recoverable_ledgers_skipped_total` - Count of entire 
ledgers skipped
   - `pulsar_broker_non_recoverable_entries_skipped_total` - Count of 
individual entries skipped
   
   ## Motivation
   
   Currently, there is no visibility when Pulsar skips non-recoverable data 
during disaster recovery scenarios. This creates operational blind spots where:
   
   - Operators cannot be alerted when data loss occurs
   - No audit trail exists for compliance requirements
   - Operators cannot distinguish healthy systems from those silently losing 
data
   - Operators cannot determine whether issues are systematic (ledger-level) 
or localized (entry-level)
   
   ## Implementation
   
   - Adds counters to the `BrokerOperabilityMetrics` class
   - Integrates with existing skip methods:
     - `ManagedLedgerImpl.skipNonRecoverableLedger()` → ledger metric
     - `ManagedCursorImpl.skipNonRecoverableEntries()` → entry metric
   - Supports both Prometheus and OpenTelemetry formats
   - The broker-level approach avoids a high-cardinality burden on the metrics 
system
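
As a rough illustration of the counters described above, here is a minimal sketch. Only the two metric names come from this PIP. The class name `SkippedDataCounters`, its methods, and the use of `LongAdder` are assumptions made for the example and are not the actual `BrokerOperabilityMetrics` code; labels such as the cluster name are omitted.

```java
import java.util.concurrent.atomic.LongAdder;

/**
 * Hypothetical stand-in for the two counters this PIP proposes.
 * Not the real BrokerOperabilityMetrics implementation.
 */
final class SkippedDataCounters {
    // Metric names as listed in the PIP.
    static final String LEDGERS_METRIC =
            "pulsar_broker_non_recoverable_ledgers_skipped_total";
    static final String ENTRIES_METRIC =
            "pulsar_broker_non_recoverable_entries_skipped_total";

    // LongAdder keeps increments cheap when many threads update the counter.
    private final LongAdder ledgersSkipped = new LongAdder();
    private final LongAdder entriesSkipped = new LongAdder();

    /** Assumed to be called once per ledger skipped as non-recoverable. */
    void recordLedgerSkipped() {
        ledgersSkipped.increment();
    }

    /** Assumed to be called with the number of entries skipped in one pass. */
    void recordEntriesSkipped(long count) {
        if (count < 0) {
            throw new IllegalArgumentException("count must be >= 0");
        }
        entriesSkipped.add(count);
    }

    long ledgersSkipped() {
        return ledgersSkipped.sum();
    }

    long entriesSkipped() {
        return entriesSkipped.sum();
    }

    /** Renders both counters in the Prometheus text exposition format. */
    String toPrometheusText() {
        return "# TYPE " + LEDGERS_METRIC + " counter\n"
                + LEDGERS_METRIC + " " + ledgersSkipped() + "\n"
                + "# TYPE " + ENTRIES_METRIC + " counter\n"
                + ENTRIES_METRIC + " " + entriesSkipped() + "\n";
    }
}
```

Because both values are broker-wide totals, the number of time series stays constant regardless of how many topics the broker serves.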
   
   ## Operational Benefits
   
   - **Alerting**: Get notified immediately when data loss occurs
   - **SLA Monitoring**: Track data durability metrics over time  
   - **Root Cause Analysis**: Compare the ledger and entry metrics to tell systematic failures from localized ones
   - **Investigation Workflow**: Use metrics for alerting, then check broker 
logs for specific topic details
   
   ## Design Rationale
   
   - **Broker-level vs Topic-level**: Broker-level metrics avoid burdening the 
metrics system with per-topic series while still showing that data was skipped
   - **Two separate metrics**: Separate ledger and entry counts distinguish 
systematic (ledger-level) loss from localized (entry-level) loss
   - **Log-based investigation**: Metrics drive alerting, and broker logs 
supply the topic-level detail needed for forensics
   
   🤖 Generated with [Claude Code](https://claude.ai/code)

