This is an automated email from the ASF dual-hosted git repository.

technoboy pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/pulsar.git


The following commit(s) were added to refs/heads/master by this push:
     new e466f453ebb [improve] [pip] PIP-382: Add a label named reason for topic_load_failed_total (#23351)
e466f453ebb is described below

commit e466f453ebbc3fa1999ca6acad708731deb067b6
Author: fengyubiao <[email protected]>
AuthorDate: Fri Aug 29 18:34:23 2025 +0800

    [improve] [pip] PIP-382: Add a label named reason for topic_load_failed_total (#23351)
---
 pip/pip-382.md | 48 ++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 48 insertions(+)

diff --git a/pip/pip-382.md b/pip/pip-382.md
new file mode 100644
index 00000000000..adc6e636fe4
--- /dev/null
+++ b/pip/pip-382.md
@@ -0,0 +1,48 @@
+# PIP-382: Add a label named reason for topic_load_failed_total
+
+# Background knowledge
+
+Pulsar has a metric that counts failed topic loads: `topic_load_failed_total`. It is incremented in the following cases:
+- The target bundle is unloading.
+- Failed to load policies.
+- Failed to load the Managed Ledger.
+- Failed to read the metadata store.
+- Topic initialization failed, for example when rebuilding the deduplication info fails.
+- Topic load timed out.
+- Others.
+
+# Motivation & Goals
+
+Adding a `reason` label to the metric `topic_load_failed_total` lets us quickly see which error occurred, so we can fix the issue faster.
+
+### Metrics
+
+Add a label named `reason` to `topic_load_failed_total`:
+- label name: `reason`
+- label values:
+  - `bundle_unloading`
+  - `failed_load_policies`
+  - `failed_load_ml`
+  - `failed_access_metadata_store`
+  - `failed_init`
+  - `timeout`
+  - `others`
+
+
+# Monitoring & Alternatives
+
+- If the counter with `reason = bundle_unloading` increases for a moment and then stops increasing after a while, everything is fine.
+  - Otherwise, the load balancer may have encountered an error.
+- If the counter with `reason = timeout` increases for a moment and then stops increasing after a while, too many topics were probably loaded at the same time, which may be okay.
+  - Otherwise, the broker may have hit a deadlock, or its resources may be insufficient for the current use case.
+- For any other label value, something unexpected happened, and the label value tells the failures apart.
+
+# General Notes
+
+# Links
+
+<!--
+Updated afterwards
+-->
+* Mailing List discussion thread: https://lists.apache.org/thread/f3xhmm342jor042n5ykkxoc32ffcn85s
+* Mailing List voting thread: https://lists.apache.org/thread/ng6z0dssjh1hgp91f590wkcl2ymhvn48
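
Editor's note, not part of the commit: the monitoring rules in the PIP could be sketched roughly as follows. This is a minimal illustration only, not Pulsar code. The function name `classify`, the `quiet_samples` parameter, and the idea of feeding it a plain list of counter readings are all assumptions made for the example. It also assumes the counter never resets within the window.

```python
from typing import Sequence

# Label values for which the PIP treats a short burst as potentially normal.
TRANSIENT_OK_REASONS = {"bundle_unloading", "timeout"}


def classify(reason: str, samples: Sequence[float], quiet_samples: int = 3) -> str:
    """Classify successive readings of topic_load_failed_total for one `reason`.

    `samples` are counter readings, oldest first. `quiet_samples` is a
    hypothetical knob: how many of the most recent intervals must show no
    increase before a burst is considered over.
    """
    if len(samples) < 2 or samples[-1] == samples[0]:
        return "ok"  # no failures observed in the window
    if reason not in TRANSIENT_OK_REASONS:
        return "investigate"  # any increase is unexpected for other reasons
    # Look at the last `quiet_samples` intervals (quiet_samples + 1 readings).
    recent = samples[-(quiet_samples + 1):]
    stopped = len(recent) == quiet_samples + 1 and recent[0] == recent[-1]
    return "ok" if stopped else "investigate"
```

In practice the readings would come from whatever system scrapes the broker metrics, and an alerting rule on the rate of increase would likely express the same idea more directly.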
