Re: [DISCUSS] PIP-240 A new API to unload subscriptions
Yubiao, thanks for sharing your problem and a proposal, this is very helpful for the community to get in touch with the pain of Pulsar users/administrators. In my experience if a "subscription is stuck", the problems are: * the client has some problems (bug in the client/misconfiguration somewhere) - 99.9% * there is a bug in Pulsar - 0.1% Unloading a topic is an operation that triggers some reset of the state on both the broker and the clients and this usually TEMPORARY unblocks the subscription. I have never seen a problem that is temporarily solved by topic unload/broker restart to be permanently solved with that operation. If there is a problem we should spend time on investigating the problem and not in adding this kind of tool. I believe that we should not continue to add these kinds of hacks into Pulsar: - easy reset... - ignore errors... (catch Throwable...) The overall result is a system that "seems to work" but it actually doesn't work properly I would invest more time in: - monitoring tools (tools to detect quickly stuck consumers) - circuit breakers (fast fail/shut the door to consumers/producers that don't behave correctly) - guard rails (limits to clients to prevent them to exhaust the resources on the brokers) Enrico Il giorno gio 12 gen 2023 alle ore 08:22 ha scritto: > > Hi, Yubiao > > I agree with this idea because some users care about the production rate. > They don't want to unload the whole topic to fix the subscription problem. > > I've got some questions: > > 1. How do you handle the race condition when you are trying to unload the > subscription, and the new consumer wants to subscribe to this subscription at > the same time? I'm unsure if it has the race condition. I just want to remind > you about that. :) > 2. Would you like to add some restful API design to clarify the > implementation? > a. Request method > b. Request path > c. Response code > d. etc. > > > Thanks for your work. > Mattison > On Jan 11, 2023, 17:01 +0800, Yubiao Feng > , wrote: > > Hi community > > > > I am starting a DISCUSS for PIP-240: A new API to unload subscriptions. > > > > PIP issue: https://github.com/apache/pulsar/issues/19187 > > > > ### Motivation > > > > We sometimes try to unload the topic to resolve some consumption-stop > > issues. But the unloading topic will also impact the producer side. > > > > ### Goal > > > > Providing a new API to unload the subscription dimension triggers > > reconnection of all consumers on that subscription and reconnection is > > guaranteed by the client. The API will be used in these ways: > > - unload special subscription of one topic(or partitioned topic) > > - unload all subscriptions of one topic(or partitioned topic) > > - unload subscriptions of one topic(or partitioned topic) by regular > > expression > > - If a reader's subscription name is not set, a random subscription name > > prefixed with 'multiTopicsReader-' or 'reader-' will be used, and users can > > uninstall these subscriptions using regular expressions. > > > > In addition to triggering consumer disconnection, Unloading Subscribers > > will restart the Dispatcher, which resets the redeliver message queue and > > delayed message queue in the Broker's memory, which can help resolve issues > > caused by an abnormal dispatcher state. However, the execution flow of > > Unloading Subscribers does not include a restart of the Managed Cursor > > related to this dispatcher; if there is a problem with the cursor, we can > > only rely on the unload topic to solve it. > > > > Note: From the client's perspective, this connection may be shared by > > consumers, producers, and transactions, so Unloading Subscribers maybe > > impact the producer and transaction. > > > > These scenarios are not supported > > - Functions `message-dedup`, `geo-replication,` and `shadow-topic` also > > read messages from the topic, but Unloading subscribers will not support > > triggering restarts of these three functions( because the cursor is used > > directly to read the data in these scenarios, not the consumer or reader ). > > - The Compression task(subscription name is `__compaction`) also use a > > reader to read data, but Unloading Subscribers does not support it because > > this task creates a new reader each time it starts. > > - Do not support all topics related to Transaction features. > > - `__transaction_buffer_snapshot` works with the task TB recover, and > > this task will create a new reader each time they start. > > - `__transaction_pending_ack` works with the task Transaction Pending Ack > > Store replay, and this task will use managed cursor directly to read data. > > - `__transaction_log_xxx` works with the task Transaction Log, which will > > use managed cursor directly to read data. > > - `transaction_coordinator_assign` No data will be written on this topic. > > > > Special system topic supports > > The system topic `__change_events` is used to support topic-level poli
Re: [DISCUSS] PIP-232: Introduce thread monitor to check if thread is blocked for long time.
I have the following suggestions: 1. This configuration item should be dynamically updated in the Pulsar process, only as a means of troubleshooting when problems occur 2. This configuration item should be turned off by default to avoid impact on performance
Re: Experimenting with Gradle Enterprise build scans for Pulsar CI
Gradle Enterprise build scans is now in use for branches in apache/pulsar . For example, the build scan for master branch can be accessed for the 60+ maven build runs from the GitHub Actions workflow run build summary view. master branch build listing: https://github.com/apache/pulsar/actions/workflows/pulsar-ci.yaml?query=branch%3Amaster Example workflow run (contains links to Gradle Enterprise build scans): https://github.com/apache/pulsar/actions/runs/3902082126 -Lari On 2023/01/04 18:16:41 Lari Hotari wrote: > Here's an example of builds scans for all Maven builds that are run as part > of Pulsar CI workflow: > https://ge.apache.org/scans?search.names=Git%20commit%20id%20short&search.timeZoneId=Europe/Helsinki&search.values=db7ab997&selection.buildScanB=ohtx53k6ymsgc > There are 69 maven builds that are run in total in a single execution of > Pulsar CI workflow! > > The build scans are also linked from the GitHub Actions results page, for > example > https://github.com/apache/pulsar/actions/runs/3839199234#summary-10438505199 > shows this. It's easier to navigate to the builds this way. > > I created a PR https://github.com/apache/pulsar/pull/19133 for making the > changes. Please review! > > -Lari > > On 2023/01/04 11:52:14 Lari Hotari wrote: > > Hi all, > > > > ASF Infra team together with Gradle Inc. has setup a dedicated Gradle > > Enterprise instance for Apache projects at https://ge.apache.org/ . > > > > Gradle Enterprise works for both Gradle builds and Maven builds. > > There's a Maven extension which provides the integration for Maven builds, > > https://docs.gradle.com/enterprise/maven-extension/ . > > You can find more information about the benefits of Gradle Enterprise on > > gradle.com . This is also for Maven builds! > > > > The main features are the "build scan insights" and "build cache > > acceleration". At this time, I'm first looking at experimenting with "build > > scan insights" feature. > > > > Apache Beam project seems to be also looking into this: > > https://lists.apache.org/thread/v43djv8f941x2h1r3mptvt92ynplm7mk > > > > I had a chat on ASF slack with Gavin McDonald about this, and Gavin asked > > me to create an infra ticket. I created > > https://issues.apache.org/jira/browse/INFRA-24052 for configuring the > > secret token required for accessing Gradle Enterprise from CI builds. It > > turns out that all Apache projects should already have the token in > > GE_ACCESS_TOKEN environment variable. I'll experiment with that soon. The > > token won't be available for pull request builds. While experimenting, I'll > > create an experimental branch in apache/pulsar so that the builds for this > > branch have access to the required Gradle Enterprise token. > > > > After this experimentation is completed successfully, I'll create a PR for > > enabling Gradle Enterprise builds scans that run scheduled once a day on > > the master branch (scheduled at > > https://github.com/apache/pulsar/blob/9ec1d071c7188a2db694e9d7b359faaf33cb076e/.github/workflows/pulsar-ci.yaml#L25-L26). > > > > BR, > > > > Lari > > >
Re: [DISCUSS] Registering Jackson Java 8 support modules by default for all Pulsar components, including client
The task to optimize the code that calls Jackson is handled by PR https://github.com/apache/pulsar/pull/19160 as a preparation. Please review since I will merge that and consider lazy consensus unless changes are requested. In the Pulsar master code base, we are already at Java 17. It's time to add support to Jackson support classes that were added in Java 8. The draft PR is https://github.com/apache/pulsar/pull/19161 (it will be rebased after PR 19160 changes are in). I can create a PIP to formalize and document the decision before pushing for a merge for that PR. btw. The pulsar code base also includes GSON for some JSON parsing. It would be good to replace its usage with Jackson. I wonder if there's a special reason why GSON is used besides Jackson. -Lari On 2023/01/10 13:13:50 "r...@apache.org" wrote: > Hello Lari: > > Here I actually want to discuss whether it is necessary for us to introduce > Jackson modules, and what is the scope boundary we use when introducing > Jackson modules? Here are some questions about this: > > > 1. After we introduce Jackson modules, do we need to optimize the code that > originally called Jackson? > 2. After the introduction, we keep the original code using Jackson > unchanged, and only use the logic of Jackson modules in the newly added code > > If we take 1, it seems that the changes here will be relatively large. Many > Pojo objects will involve the serialization and deserialization of Json > objects, and it feels that the entire boundary is not well controlled. > > +1 for me if we're taking 2. > > -- > Thanks > Xiaolong Ran > > Lari Hotari 于2023年1月9日周一 21:25写道: > > > Any change to Jackson configuration is a potential breaking change. Yes, > > it will need a PIP. > > I guess we can continue to discuss the change in this thread before I > > create an actual PIP which can be voted on. > > > > -Lari > > > > On 2023/01/09 11:53:02 丛搏 wrote: > > > Hi, Lari: > > > > > > Will it affect compatibility? If it is just an improved function, I > > > think it can also be added to the pulsar-common module. it adds the > > > dependency, so it needs PIP to discuss. > > > > > > Thanks, > > > Bo > > > > > > Lari Hotari 于2023年1月9日周一 19:06写道: > > > > > > > > Hi all, > > > > > > > > Jackson has a separate Java 8 support modules for adding support for > > proper serialization and deserialization of new classes that were added in > > Java 8 (Java 8 was released in 2014). > > > > > > > > These Jackson Java 8 support modules haven't been used in the Pulsar > > code base. This is a pity. This causes a lot of pain when using Java Time > > classes in Pulsar applications or Pulsar Functions. There are ways to get > > the classes working for applications, but the documentation is missing. It > > would make things easier if the Java 8 support modules for Jackson would be > > included and registered by default. > > > > > > > > I have created a PR to register Jackson Java 8 support modules by > > default for all Pulsar components. The PR is > > https://github.com/apache/pulsar/pull/19161 . > > > > > > > > Please review and provide feedback. Do we need a PIP for this change? > > > > > > > > -Lari > > > > > >
[DISCUSS] PIP-241: TopicEventListener / topic events for the BrokerService
Hi, I am starting discussion for PIP-241: TopicEventListener / topic events for the BrokerService. PIP issue: https://github.com/apache/pulsar/issues/19224 ### Motivation Some Protocol Handlers may need to know about the topic-specific events to update internal caches and/or state. These mechanisms will be useful also for core Pulsar components (like the Transactions subsystem) and probably we would be able to simplify the interaction between the internal components in the broker by using an unified mechanism to handle the lifecycle of topics. Specific use cases: KOP keeps some state for the topic and needs to handle such cases as: - Topic Unloaded: release resources dedicated to the topic - Topic Loaded: trigger loading of components tied to the partition (GroupCoordinator, TransactionManager) - Topic Deleted: remove any persistent state associated to the topic that is stored in additional side system topics - Topic Created: the same as “deleted” (ensure that there is no state on system topics related to the new topic) ### Goal This PIP defines a set of events needed for the protocol handlers (and for internal broker components) to get notifications about topic-specific events as seen by BrokerService. PIP outlines changes needed for protocol handlers to keep/cache state consistent with BrokerService’s. The changes should not affect Pulsar running without protocol handlers or with protocol handlers that do not rely on the new events. ### API Changes ```java /** * Listener for the Topic events. */ @InterfaceAudience.LimitedPrivate @InterfaceStability.Evolving public interface TopicEventsListener { /** * Types of events currently supported. * create/load/unload/delete */ enum TopicEvent { // create events included into load events CREATE, LOAD, UNLOAD, DELETE, } /** * Stages of events currently supported. * before starting the event/successful completion/failed completion */ enum EventStage { BEFORE, SUCCESS, FAILURE } /** * Outcome of the listener. * Ignored for events at final stages (SUCCESS/FAILURE), * */ enum EventProcessingOutcome { OK, FAILURE, NOT_ALLOWED } /** * POJO for event processing result (outcome, message) */ @ToString(includeFieldNames=true) @Data(staticConstructor="of") class EventProcessingResult { private final EventProcessingOutcome outcome; private final String message; } /** * Handle topic event. * Choice of the thread / maintenance of the thread pool is up to the event handlers. * @param topicName - name of the topic * @param event - TopicEvent * @param stage - EventStage * @param t - exception in case of FAILURE, if present/known * @return - EventProcessingResult. * EventProcessingResult.EventProcessingOutcome != OK indicates request to cancel * event at BEFORE stage. */ EventProcessingResult handleEvent(String topicName, TopicEvent event, EventStage stage, Throwable t); } ``` BrokerService: ```java public void addTopicEventListener(TopicEventsListener... listeners) public void removeTopicEventListener(TopicEventsListener... listeners) ``` ### Implementation See PR for the proposed implementation. https://github.com/apache/pulsar/pull/19153 ### Alternatives Add new methods to the BrokerInterceptor API -- Andrey
Re: [DISCUSS] PIP-240 A new API to unload subscriptions
> I would invest more time in: > - monitoring tools (tools to detect quickly stuck consumers) > - circuit breakers (fast fail/shut the door to consumers/producers > that don't behave correctly) > - guard rails (limits to clients to prevent them to exhaust the resources on the brokers) I agree with this point of view, we should not increase the complexity of the code, add non-essential APIs to extend unload, unload is not essentially to solve consumer stuck or producer block problem. Thanks, Bo Enrico Olivelli 于2023年1月12日周四 16:12写道: > > Yubiao, > thanks for sharing your problem and a proposal, this is very helpful > for the community to get in touch with the pain of Pulsar > users/administrators. > > In my experience if a "subscription is stuck", the problems are: > * the client has some problems (bug in the client/misconfiguration > somewhere) - 99.9% > * there is a bug in Pulsar - 0.1% > > Unloading a topic is an operation that triggers some reset of the > state on both the broker and the clients and this usually TEMPORARY > unblocks the subscription. > > I have never seen a problem that is temporarily solved by topic > unload/broker restart to be permanently solved with that operation. > If there is a problem we should spend time on investigating the > problem and not in adding this kind of tool. > > I believe that we should not continue to add these kinds of hacks into Pulsar: > - easy reset... > - ignore errors... (catch Throwable...) > > The overall result is a system that "seems to work" but it actually > doesn't work properly > > I would invest more time in: > - monitoring tools (tools to detect quickly stuck consumers) > - circuit breakers (fast fail/shut the door to consumers/producers > that don't behave correctly) > - guard rails (limits to clients to prevent them to exhaust the > resources on the brokers) > > > > Enrico > > Il giorno gio 12 gen 2023 alle ore 08:22 ha scritto: > > > > Hi, Yubiao > > > > I agree with this idea because some users care about the production rate. > > They don't want to unload the whole topic to fix the subscription problem. > > > > I've got some questions: > > > > 1. How do you handle the race condition when you are trying to unload the > > subscription, and the new consumer wants to subscribe to this subscription > > at the same time? I'm unsure if it has the race condition. I just want to > > remind you about that. :) > > 2. Would you like to add some restful API design to clarify the > > implementation? > > a. Request method > > b. Request path > > c. Response code > > d. etc. > > > > > > Thanks for your work. > > Mattison > > On Jan 11, 2023, 17:01 +0800, Yubiao Feng > > , wrote: > > > Hi community > > > > > > I am starting a DISCUSS for PIP-240: A new API to unload subscriptions. > > > > > > PIP issue: https://github.com/apache/pulsar/issues/19187 > > > > > > ### Motivation > > > > > > We sometimes try to unload the topic to resolve some consumption-stop > > > issues. But the unloading topic will also impact the producer side. > > > > > > ### Goal > > > > > > Providing a new API to unload the subscription dimension triggers > > > reconnection of all consumers on that subscription and reconnection is > > > guaranteed by the client. The API will be used in these ways: > > > - unload special subscription of one topic(or partitioned topic) > > > - unload all subscriptions of one topic(or partitioned topic) > > > - unload subscriptions of one topic(or partitioned topic) by regular > > > expression > > > - If a reader's subscription name is not set, a random subscription name > > > prefixed with 'multiTopicsReader-' or 'reader-' will be used, and users > > > can > > > uninstall these subscriptions using regular expressions. > > > > > > In addition to triggering consumer disconnection, Unloading Subscribers > > > will restart the Dispatcher, which resets the redeliver message queue and > > > delayed message queue in the Broker's memory, which can help resolve > > > issues > > > caused by an abnormal dispatcher state. However, the execution flow of > > > Unloading Subscribers does not include a restart of the Managed Cursor > > > related to this dispatcher; if there is a problem with the cursor, we can > > > only rely on the unload topic to solve it. > > > > > > Note: From the client's perspective, this connection may be shared by > > > consumers, producers, and transactions, so Unloading Subscribers maybe > > > impact the producer and transaction. > > > > > > These scenarios are not supported > > > - Functions `message-dedup`, `geo-replication,` and `shadow-topic` also > > > read messages from the topic, but Unloading subscribers will not support > > > triggering restarts of these three functions( because the cursor is used > > > directly to read the data in these scenarios, not the consumer or reader > > > ). > > > - The Compression task(subscription name is `__compaction`) also use a > > > reader to read data, but Unloading Subscri