codelipenghui opened a new pull request, #24716:
URL: https://github.com/apache/pulsar/pull/24716
## Summary
This PIP proposes adding two broker-level metrics to provide essential
visibility into non-recoverable data skipping when `autoSkipNonRecoverableData`
is enabled.
## New Metrics
- `pulsar_broker_non_recoverable_ledgers_skipped_total` - Count of entire
ledgers skipped
- `pulsar_broker_non_recoverable_entries_skipped_total` - Count of
individual entries skipped
## Motivation
Currently, Pulsar gives operators no visibility into when it skips
non-recoverable data during disaster recovery scenarios. This creates
operational blind spots:
- Operators cannot be alerted when data loss occurs
- No audit trail exists for compliance requirements
- Healthy systems cannot be distinguished from those silently losing data
- Operators cannot determine whether issues are systematic (ledger-level) or
localized (entry-level)
## Implementation
- Adds counters to the `BrokerOperabilityMetrics` class
- Integrates with existing skip methods:
  - `ManagedLedgerImpl.skipNonRecoverableLedger()` → ledger metric
  - `ManagedCursorImpl.skipNonRecoverableEntries()` → entry metric
- Supports both Prometheus and OpenTelemetry formats
- Uses a broker-level approach to avoid placing a high-cardinality burden on
the metrics system
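
As a rough illustration of the counters described above, the sketch below keeps two broker-wide totals and renders them in the Prometheus text exposition format. It is not the code from this pull request: the class `SkipMetricsSketch` and its method names are hypothetical, the real `BrokerOperabilityMetrics` wiring may differ, and the rendered output omits any labels (such as a cluster label) that the broker might attach.

```java
import java.util.concurrent.atomic.LongAdder;

/**
 * Hypothetical stand-in for the two broker-level counters.
 * This is an illustration only, not the BrokerOperabilityMetrics code.
 */
class SkipMetricsSketch {
    // LongAdder keeps increments cheap when many threads record skips concurrently.
    private final LongAdder ledgersSkipped = new LongAdder();
    private final LongAdder entriesSkipped = new LongAdder();

    /** Assumed hook: called once each time an entire ledger is skipped. */
    void recordLedgerSkipped() {
        ledgersSkipped.increment();
    }

    /** Assumed hook: called with the number of entries skipped in one operation. */
    void recordEntriesSkipped(long count) {
        if (count > 0) {
            entriesSkipped.add(count);
        }
    }

    long ledgersSkippedTotal() {
        return ledgersSkipped.sum();
    }

    long entriesSkippedTotal() {
        return entriesSkipped.sum();
    }

    /** Renders both counters in the Prometheus text exposition format, without labels. */
    String toPrometheusText() {
        return "# TYPE pulsar_broker_non_recoverable_ledgers_skipped_total counter\n"
                + "pulsar_broker_non_recoverable_ledgers_skipped_total " + ledgersSkippedTotal() + "\n"
                + "# TYPE pulsar_broker_non_recoverable_entries_skipped_total counter\n"
                + "pulsar_broker_non_recoverable_entries_skipped_total " + entriesSkippedTotal() + "\n";
    }

    public static void main(String[] args) {
        SkipMetricsSketch metrics = new SkipMetricsSketch();
        metrics.recordLedgerSkipped();     // one whole ledger skipped
        metrics.recordEntriesSkipped(3);   // three individual entries skipped
        System.out.print(metrics.toPrometheusText());
    }
}
```

Because neither counter carries a topic or namespace label, the number of time series stays constant per broker regardless of how many topics are affected.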
## Operational Benefits
- **Alerting**: Get notified as soon as either counter increases, indicating
that data loss has occurred
- **SLA Monitoring**: Track data durability metrics over time
- **Root Cause Analysis**: Compare the ledger and entry counters to understand
failure patterns
- **Investigation Workflow**: Use metrics for alerting, then check broker
logs for specific topic details
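
The alerting step in this workflow amounts to checking whether a counter grew between two scrapes. The sketch below shows one way that check could look, assuming the alerting side has two consecutive samples of the same counter. The class `SkipAlertSketch` and its methods are hypothetical; in practice this logic would normally live in the monitoring system rather than in the broker. A drop in the value is treated as a counter reset (for example, a broker restart), in which case the current value is taken as the increase.

```java
/** Hypothetical helper that decides whether a skip counter warrants an alert. */
class SkipAlertSketch {

    /**
     * Returns how much a monotonically increasing counter grew between two samples.
     * If the current sample is lower than the previous one, the counter is assumed
     * to have been reset, and the current value is treated as the increase.
     */
    static long increase(long previous, long current) {
        if (previous < 0 || current < 0) {
            throw new IllegalArgumentException("counter samples must be non-negative");
        }
        return current >= previous ? current - previous : current;
    }

    /** Any positive increase means data was skipped since the last sample. */
    static boolean shouldAlert(long previous, long current) {
        return increase(previous, current) > 0;
    }

    public static void main(String[] args) {
        // Example samples of pulsar_broker_non_recoverable_ledgers_skipped_total.
        long previousSample = 4;
        long currentSample = 6;
        if (shouldAlert(previousSample, currentSample)) {
            System.out.println("Ledgers skipped since last scrape: "
                    + increase(previousSample, currentSample)
                    + "; check broker logs for the affected topics");
        }
    }
}
```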
## Design Rationale
- **Broker-level vs Topic-level**: Avoids metrics system burden while
maintaining essential visibility
- **Two separate metrics**: Provides granular insight into different types
of data corruption
- **Log-based investigation**: Keeps the metrics focused on alerting and
leaves per-topic forensic detail to the broker logs
🤖 Generated with [Claude Code](https://claude.ai/code)