onceMisery commented on code in PR #25299:
URL: https://github.com/apache/pulsar/pull/25299#discussion_r2924584533


##########
pip/pip-459.md:
##########
@@ -0,0 +1,331 @@
+# PIP-459: Batch Status Summary and Filtered Listing for Pulsar Functions
+
+# Background knowledge
+
+Pulsar Functions are managed by the Functions worker and exposed through the 
Pulsar Admin API and `pulsar-admin` CLI. Today, listing functions in a 
namespace returns only function names. To understand runtime health, operators 
must fetch function status for each function separately. That creates an N+1 
request pattern: one request to list function names and one additional request 
per function to fetch status. In practice, this is slow, noisy, and hard to use 
in scripts or daily operations.
+
+Function runtime status is already represented by `FunctionStatus`, which 
contains aggregate fields such as the total configured instances and the number 
of running instances. This proposal introduces a lightweight namespace-level 
summary model built from those counts. The summary is intentionally smaller 
than the full status payload: it is designed for listing and filtering, not for 
per-instance inspection.
+
+The proposal also has to account for mixed-version deployments. A new client 
may talk to an older worker that does not expose the new summary endpoint yet. 
In that case, the client must degrade safely instead of failing outright. That 
compatibility requirement affects both the public admin interface and the 
client-side implementation strategy.
+
+# Motivation
+
+`pulsar-admin functions list` currently cannot answer a basic operational 
question: which functions in this namespace are actually running? Operators 
must either inspect each function one by one or write shell loops that issue 
many admin calls. The problem becomes more visible in namespaces containing 
dozens or hundreds of functions, where a simple health check becomes expensive 
and slow.
+
+The initial feature request was to add state-aware listing to the CLI, but the 
implementation uncovered three broader issues that should be addressed together:
+
+1. The current user experience requires N+1 admin calls for namespace-level 
health inspection.
+2. A new summary endpoint must not break new clients when they talk to older 
workers during rolling upgrades.
+3. Namespace-level summary generation can become slow if the worker builds 
results strictly serially.
+
+This PIP proposes a batch status summary API for Pulsar Functions and 
integrates it into the admin client, CLI, and worker implementation in a 
backward-compatible way.
+
+# Goals
+
+## In Scope
+
+- Add a namespace-level batch status summary API for Pulsar Functions.
+- Add a lightweight public data model that carries the function name, derived 
state, instance counts, and failure classification.
+- Add admin client support for the new summary API, including fallback to 
legacy workers that do not implement it.
+- Add CLI support for long-format listing and state-based filtering using the 
batch summary API.
+- Add pagination support for namespace-level function summaries.
+- Improve worker-side summary generation latency by using controlled 
parallelism.
+- Add a worker configuration knob to cap summary-query parallelism.
+- Add a worker metric to observe summary-query execution time.
+
+## Out of Scope
+
+- Changing the existing `functions list` endpoint that returns only function 
names.
+- Returning full per-instance function status from the new namespace-level 
endpoint.
+- Adding equivalent summary endpoints for sources or sinks in this PIP.
+- Adding server-side filtering by state in the REST endpoint.
+- Reworking the underlying function runtime status model or scheduler behavior.
+
+
+# High Level Design
+
+This proposal adds a new REST endpoint:
+
+`GET /admin/v3/functions/{tenant}/{namespace}/status/summary`
+
+The endpoint returns a list of `FunctionStatusSummary` objects. Each object 
contains:
+
+- `name`
+- `state`: `RUNNING`, `STOPPED`, `PARTIAL`, or `UNKNOWN`
+- `numInstances`
+- `numRunning`
+- `error`
+- `errorType`
+
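+This excerpt does not spell out how `state` is derived from the instance 
counts. A minimal sketch, assuming `UNKNOWN` is reported when the status lookup 
itself failed (non-null `error`), `STOPPED` when no instance is running, 
`RUNNING` when all configured instances are running, and `PARTIAL` otherwise. 
The class and method names are hypothetical:
+
```java
/** Hypothetical sketch: derive a summary state from FunctionStatus-style counts. */
final class SummaryStateSketch {

    enum SummaryState { RUNNING, STOPPED, PARTIAL, UNKNOWN }

    /**
     * Assumed rules (not confirmed by the PIP text):
     * - lookup error or invalid counts -> UNKNOWN
     * - numRunning == 0                -> STOPPED (also covers numInstances == 0)
     * - numRunning >= numInstances     -> RUNNING
     * - otherwise                      -> PARTIAL
     */
    static SummaryState deriveState(int numInstances, int numRunning, String error) {
        if (error != null || numInstances < 0 || numRunning < 0) {
            return SummaryState.UNKNOWN;
        }
        if (numRunning == 0) {
            return SummaryState.STOPPED;
        }
        if (numRunning >= numInstances) {
            return SummaryState.RUNNING;
        }
        return SummaryState.PARTIAL;
    }

    public static void main(String[] args) {
        System.out.println(deriveState(3, 3, null));                // RUNNING
        System.out.println(deriveState(3, 1, null));                // PARTIAL
        System.out.println(deriveState(3, 0, null));                // STOPPED
        System.out.println(deriveState(3, 0, "request timed out")); // UNKNOWN
    }
}
```
+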
+The server remains a generic summary provider. It does not apply state 
filtering. The CLI consumes the summary list and handles presentation concerns 
locally, such as `--state` filtering and `--long` formatting. This separation 
keeps the endpoint reusable for other clients and prevents coupling the REST 
contract to one CLI presentation format.
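+
+Because filtering happens on the client, `--state` and `--long` reduce to a 
filter and a formatter over the returned list. A minimal sketch, using a 
hypothetical `Summary` record in place of `FunctionStatusSummary`; the 
long-format column layout is an assumption, not the CLI's actual output:
+
```java
import java.util.List;
import java.util.Locale;
import java.util.stream.Collectors;

/** Hypothetical sketch of client-side --state filtering and --long formatting. */
final class CliSummarySketch {

    enum SummaryState { RUNNING, STOPPED, PARTIAL, UNKNOWN }

    /** Stand-in for FunctionStatusSummary with only the fields used here. */
    record Summary(String name, SummaryState state, int numInstances, int numRunning) { }

    /** Keeps only summaries in the requested state; a null filter keeps everything. */
    static List<Summary> filterByState(List<Summary> summaries, SummaryState wanted) {
        return summaries.stream()
                .filter(s -> wanted == null || s.state() == wanted)
                .collect(Collectors.toList());
    }

    /** Formats one row for long output: name, state, running/total instances. */
    static String formatLong(Summary s) {
        return String.format(Locale.ROOT, "%-20s %-8s %d/%d",
                s.name(), s.state(), s.numRunning(), s.numInstances());
    }

    public static void main(String[] args) {
        List<Summary> page = List.of(
                new Summary("enricher", SummaryState.RUNNING, 2, 2),
                new Summary("router", SummaryState.PARTIAL, 3, 1),
                new Summary("archiver", SummaryState.STOPPED, 1, 0));

        // Roughly what `--state PARTIAL --long` would do with this page.
        for (Summary s : filterByState(page, SummaryState.valueOf("PARTIAL"))) {
            System.out.println(formatLong(s));
        }
    }
}
```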
+
+For compatibility, the admin client first tries the new summary endpoint. If 
the server responds with `404 Not Found` or `405 Method Not Allowed`, the 
client falls back to the legacy flow: fetch the function names, apply 
name-based pagination, and then query each function status individually to 
build summaries client-side. This allows a new client to work against older 
workers during mixed-version upgrades.
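+
+A minimal sketch of that fallback, assuming a hypothetical `AdminTransport` 
interface whose calls surface the HTTP status through a hypothetical 
`HttpStatusException`. The real admin client's exception types and pagination 
parameters (`startAfter` and `limit` here) may differ:
+
```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Hypothetical sketch: call the summary endpoint, fall back to the legacy N+1 flow on 404/405. */
final class SummaryFallbackSketch {

    /** Hypothetical exception carrying the HTTP status code returned by the worker. */
    static final class HttpStatusException extends RuntimeException {
        final int status;

        HttpStatusException(int status) {
            super("HTTP " + status);
            this.status = status;
        }
    }

    /** Stand-in for FunctionStatusSummary, reduced to a few fields. */
    record Summary(String name, int numInstances, int numRunning) { }

    /** Hypothetical abstraction over the admin REST calls; not the real admin client API. */
    interface AdminTransport {
        List<Summary> getStatusSummary(String ns, String startAfter, int limit); // new endpoint
        List<String> listFunctionNames(String ns);                             // legacy name list
        Summary getStatus(String ns, String name);                             // legacy per-function status
    }

    static List<Summary> getSummaries(AdminTransport admin, String ns, String startAfter, int limit) {
        try {
            return admin.getStatusSummary(ns, startAfter, limit);
        } catch (HttpStatusException e) {
            // Only "endpoint not available" responses trigger the fallback; other errors propagate.
            if (e.status != 404 && e.status != 405) {
                throw e;
            }
        }
        // Legacy flow: list names, paginate by name, then fetch each status individually.
        List<String> names = new ArrayList<>(admin.listFunctionNames(ns));
        names.sort(Comparator.naturalOrder()); // stable lexicographic order for name-based paging
        List<Summary> page = new ArrayList<>();
        for (String name : names) {
            if (startAfter != null && name.compareTo(startAfter) <= 0) {
                continue; // already returned on an earlier page
            }
            if (page.size() >= limit) {
                break;
            }
            page.add(admin.getStatus(ns, name));
        }
        return page;
    }

    /** Stub standing in for an older worker that lacks the summary endpoint. */
    static AdminTransport legacyWorker(List<Summary> functions) {
        return new AdminTransport() {
            public List<Summary> getStatusSummary(String ns, String startAfter, int limit) {
                throw new HttpStatusException(404);
            }

            public List<String> listFunctionNames(String ns) {
                List<String> names = new ArrayList<>();
                for (Summary s : functions) {
                    names.add(s.name());
                }
                return names;
            }

            public Summary getStatus(String ns, String name) {
                for (Summary s : functions) {
                    if (s.name().equals(name)) {
                        return s;
                    }
                }
                throw new HttpStatusException(404);
            }
        };
    }

    public static void main(String[] args) {
        AdminTransport oldWorker = legacyWorker(List.of(
                new Summary("c", 1, 1), new Summary("a", 2, 0), new Summary("b", 3, 3)));
        // Page of up to 2 entries after "a": built client-side from b and c.
        System.out.println(getSummaries(oldWorker, "public/default", "a", 2));
    }
}
```
+
+Note that a `404` could also mean the namespace itself does not exist. In this 
sketch that case falls through to the legacy list call, which would then 
surface its own error.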
+
+On the worker side, summary generation is executed only for the requested page 
and uses a bounded thread pool. A new worker configuration, 
`functionsStatusSummaryMaxParallelism`, limits how many function status lookups 
may run concurrently for a single summary request.
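+
+A minimal sketch of bounded per-request parallelism, assuming a fixed-size 
pool whose size is `functionsStatusSummaryMaxParallelism` capped by the page 
size, and a hypothetical `lookup` function standing in for the per-function 
status call. The actual worker may reuse a shared executor rather than create 
one per request:
+
```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

/** Hypothetical sketch: run the status lookups for one page with bounded parallelism. */
final class BoundedSummarySketch {

    static <R> List<R> lookupAll(List<String> pageNames, int maxParallelism,
                                 Function<String, R> lookup) {
        if (pageNames.isEmpty()) {
            return List.of();
        }
        // Never start more threads than the configured cap or the number of functions on the page.
        int threads = Math.max(1, Math.min(maxParallelism, pageNames.size()));
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<R>> futures = new ArrayList<>();
            for (String name : pageNames) {
                futures.add(pool.submit(() -> lookup.apply(name)));
            }
            // Collect in submission order so results keep the page's name order.
            List<R> results = new ArrayList<>();
            for (Future<R> f : futures) {
                try {
                    results.add(f.get());
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    throw new IllegalStateException("interrupted while waiting for status lookups", e);
                } catch (ExecutionException e) {
                    throw new IllegalStateException("status lookup failed", e.getCause());
                }
            }
            return results;
        } finally {
            pool.shutdownNow();
        }
    }

    public static void main(String[] args) {
        List<String> page = List.of("a", "b", "c", "d", "e");
        // Simulated lookup: pretend each status call takes about 100 ms; at most 2 run at once.
        List<String> out = lookupAll(page, 2, name -> {
            try {
                Thread.sleep(100);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            return name + ":ok";
        });
        System.out.println(out); // [a:ok, b:ok, c:ok, d:ok, e:ok]
    }
}
```
+
+In a fuller version, a failed lookup would more likely be recorded in that 
entry's `error` and `errorType` fields than abort the whole page, as this 
sketch does.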
+
+# Detailed Design
+
+## Design & Implementation Details
+
+### Data model
+
+A new public model, `FunctionStatusSummary`, is added under the admin API data 
package. It intentionally returns only aggregate listing information:
+
+```java
+public class FunctionStatusSummary {
+    public enum SummaryState {
+        RUNNING,
+        STOPPED,
+        PARTIAL,
+        UNKNOWN
+    }
+
+    public enum ErrorType {
+        AUTHENTICATION_FAILED,
+        FUNCTION_NOT_FOUND,
+        NETWORK_ERROR,
+        INTERNAL_ERROR
+    }
+
+    private String name;
+    private SummaryState state;
+    private int numInstances;
+    private int numRunning;
+    private String error;
+    private ErrorType errorType;

Review Comment:
   I think this is a great suggestion. @shibd 



-- 
This is an automated message from the Apache Git Service.