Re: [VOTE] Apache Pulsar 2.8.2 candidate 1
On 2021/12/14 03:25:45 linlin wrote: > This is the first release candidate for Apache Pulsar, version 2.8.2. > > It fixes the following issues: > https://github.com/apache/pulsar/issues?q=label%3Acherry-picked%2Fbranch-2.8+is%3Aclosed+label%3Arelease%2F2.8.2 > > *** Please download, test and vote on this release. This vote will stay open > for at least 72 hours *** > > Note that we are voting upon the source (tag), binaries are provided for > convenience. > > Source and binary files: > https://dist.apache.org/repos/dist/dev/pulsar/pulsar-2.8.2-candidate-1/ > > SHA-512 checksums: > f51e93d5caa7ea4ec2616e096ca75dd71bccb475632ee5ff35d713b8f5112689d17315a1cd9350dd8f8f0bdc2e059be5fb179b2b8b3b39aae77e466103294683 > apache-pulsar-2.8.2-bin.tar.gz > 8540641e76fb541f9dbfaff263946ed19a585266e5de011e78188d78ec4e1c828e8893eb2e783a1ebad866f5513efffd93396b7abd77c347f34ab689badf4fad > apache-pulsar-2.8.2-src.tar.gz > > > Maven staging repo: > https://repository.apache.org/content/repositories/orgapachepulsar-1108/ > > The tag to be voted upon: > v2.8.2-candidate-1 > https://github.com/apache/pulsar/releases/tag/v2.8.2-candidate-1 > > Pulsar's KEYS file containing PGP keys we use to sign the release: > https://dist.apache.org/repos/dist/dev/pulsar/KEYS > > Please download the source package, and follow the README to build > and run the Pulsar standalone service. > > Lin Lin > Sorry, I will check it again and start a new vote
Re: [DISCUSSION] PIP-118: Do not restart brokers when ZooKeeper session expires
On 2021/12/14 18:03:20 Matteo Merli wrote: > https://github.com/apache/pulsar/issues/13304 > > > Pasted below for quoting convenience. > > --- > > > ## Motivation > > After all the work done for PIP-45 that was already included in 2.8 and 2.9 > releases, it enabled the concept of re-acquirable resource locks and leader > election. > > Another important change was to avoid doing any deferrable metadata operation > when we know that we are not currently connected to the metadata service. > > Finally, that enabled stabilization in 2.9 the configuration setting that > allows > brokers to continue operating in a safe mode when the session with ZooKeeper > expires. > > The way it works is that, when we lose a ZooKeeper session, the data plane > will > continue to work undisturbed, relying on the BookKeeper fencing to avoid any > inconsistencies. > > New topics are not able to get started, but existing topics will see no > impact. > > The original intention for shutting down the brokers was to ensure that we > would automatically go back to a consistent state, with respect to which > resources are "owned" in ZooKeeper by a given broker. > > With the re-acquirable resource locks, that problem was solved and thoroughly > tested to be robust. > > ## Proposed changes > > In 2.10 release, for the setting: > > ```properties > # There are two policies to apply when a broker metadata session > expires: session expired happens, "shutdown" or "reconnect". > # With "shutdown", the broker will be restarted. > # With "reconnect", the broker will keep serving the topics, while > attempting to recreate a new session. > zookeeperSessionExpiredPolicy=shutdown > ``` > > Change its default value to `reconnect`. > > > -- > Matteo Merli > > +1 Please pay attention to the following scenarios: I tried to use `Reconnect` in the production environment, and there was a serious online failure: two Brokers kept opening and closing the Ledger of the same topic, and the Ledger of BK kept throwing fencing exceptions.
Re: [VOTE] Apache Pulsar 2.8.2 candidate 2
On 2021/12/21 10:48:41 Shivji Kumar Jha wrote: > Hi LinLin, > > Log4j version 2.16.0 has DDoS possibilities in some cases [1] . Can we move > to Log4j 2.17.0 in 2.8.2? > > Apache Log4j2 versions 2.0-alpha1 through 2.16.0 (excluding 2.12.3) did not > > protect from uncontrolled recursion from self-referential lookups. This > > allows an attacker with control over Thread Context Map data to cause a > > denial of service when a crafted string is interpreted. This issue was > > fixed in Log4j 2.17.0 and 2.12.3. Log4j 2.17.0 is already included in this release candidate.
Re: [VOTE] PIP-132: Include message header size when check maxMessageSize of non-batch message on the client side.
+1 On 2022/01/12 01:57:59 Haiting Jiang wrote: > This is the voting thread for PIP-132. It will stay open for at least 48 > hours. > > https://github.com/apache/pulsar/issues/13591 > > Pasted below for quoting convenience. > > > > ## Motivation > > Currently, Pulsar client (Java) only checks payload size for max message size > validation. > > Client throws TimeoutException if we produce a message with too many > properties, see [1]. > But the root cause is that is trigged TooLongFrameException in broker server. > > In this PIP, I propose to include message header size when check > maxMessageSize of non-batch > messages, this brings the following benefits. > 1. Clients can throw InvalidMessageException immediately if properties takes > too much storage space. > 2. This will make the behaviour consistent with topic level max message size > check in broker. > 3. Strictly limit the entry size less than maxMessageSize, avoid sending > message to bookkeeper failed. > > ## Goal > > Include message header size when check maxMessageSize for non-batch message > on the client side. > > ## Implementation > > ``` > // Add a size check in > org.apache.pulsar.client.impl.ProducerImpl#processOpSendMsg > if (op.msg != null // for non-batch messages only > && op.getMessageHeaderAndPayloadSize() > ClientCnx.getMaxMessageSize()) { > // finish send op with InvalidMessageException > releaseSemaphoreForSendOp(op); > op.sendComplete(new PulsarClientException(new InvalidMessageException, > op.sequenceId)); > } > > > // > org.apache.pulsar.client.impl.ProducerImpl.OpSendMsg#getMessageHeaderAndPayloadSize > > public int getMessageHeaderAndPayloadSize() { > ByteBuf cmdHeader = cmd.getFirst(); > cmdHeader.markReaderIndex(); > int totalSize = cmdHeader.readInt(); > int cmdSize = cmdHeader.readInt(); > int msgHeadersAndPayloadSize = totalSize - cmdSize - 4; > cmdHeader.resetReaderIndex(); > return msgHeadersAndPayloadSize; > } > ``` > > ## Reject Alternatives > Add a new property like "maxPropertiesSize" or "maxHeaderSize" in broker.conf > and pass it to > client like maxMessageSize. But the implementation is much more complex, and > don't have the > benefit 2 and 3 mentioned in Motivation. > > ## Compatibility Issue > As a matter of fact, this PIP narrows down the sendable range. Previously, > when maxMessageSize > is 1KB, it's ok to send message with 1KB properties and 1KB payload. But with > this PIP, the > sending will fail with InvalidMessageException. > > One conservative way is to add a boolean config "includeHeaderInSizeCheck" to > enable this > feature. But I think it's OK to enable this directly as it's more reasonable, > and I don't see good > migration plan if we add a config for this. > > [1] https://github.com/apache/pulsar/issues/13560 > > Thanks, > Haiting Jiang >
Re: [VOTE] PIP 131: Resolve produce chunk messages failed when topic level maxMessageSize is set
+1 On 2021/12/29 02:29:21 Haiting Jiang wrote: > This is the voting thread for PIP-131. It will stay open for at least 48h. > > https://github.com/apache/pulsar/issues/13544 > > The discussion thread is > https://lists.apache.org/thread/c63d9s73j9x1m3dkqr3r38gyp8s7cwzf > > ## Motivation > > Currently, chunk messages producing fails if topic level maxMessageSize is > set [1]. The root cause of this issue is because chunk message is using > broker level maxMessageSize as chunk size. And topic level maxMessageSize is > always <= broker level maxMessageSize. So once it is set, the on-going chunk > message producing fails. > > ## Goal > > Resolve topic level maxMessageSize compatibility issue with chunking messages. > > ## Implementation > > Current best solution would be just skipping topic level maxMessageSize check > in org.apache.pulsar.broker.service.AbstractTopic#isExceedMaximumMessageSize. > Topic level maxMessageSize is introduced in [2], for the purpose of "easier > to plan resource quotas for client allocation". And IMO this change will not > bring further complex into this. > > ## Reject Alternatives > > Add a client side topic level maxMessageSize and keep it synced with broker. > > Required changes: > - [client] Add a new field > org.apache.pulsar.client.impl.ProducerBase#maxMessageSize to store this > client side topic level maxMessageSize. > - [PulsarApi.proto] Add a new field maxMessageSize in the > CommandProducerSuccess for the initial value of ProducerBase#maxMessageSize > - [PulsarApi.proto] Add a new Command like > CommandUpdateClientPolicy{producerId, maxMessageSize} to update > ProducerBase#maxMessageSize when topic level maxMessageSize is updated. > Further more, some other data consistency issues need be handled very > carefully when maxMessageSize is updated. > This alternative is complex but can also solve other topic level > maxMessageSize issue [3] when batching is enabled (non-batching case is > solved with PR [4]). > > [1] https://github.com/apache/pulsar/issues/13360 > [2] https://github.com/apache/pulsar/pull/8732 > [3] https://github.com/apache/pulsar/issues/12958 > [4] https://github.com/apache/pulsar/pull/13147 > > Thanks, > Haiting Jiang >
Re: [VOTE] PIP-129: Introduce intermediate state for ledger deletion
+1
Re: [VOTE] Pulsar Release 2.9.2 Candidate 2
My personal opinion: if this is a blocking issue, we should tag the issue and raise it in the discussion stage, not in the final release stage, which wastes a lot of the release manager's time. But I see this issue is already closed: https://github.com/apache/pulsar/pull/14097 We can evaluate how long the fix for this issue will take. If it does not take too much time, I think it can be merged into 2.9.2. If it is too late, I suggest moving it to 2.9.3; since there are other serious issues waiting to be released, we can run in small steps.
Re: [VOTE] Pulsar Release 2.9.2 Candidate 2
+1 (binding) 1. Checked the signature 2. Started standalone 3. Published and consumed successfully 4. Checked Functions
Re: [VOTE] [PIP-143] Support split bundle by specified boundaries
+1 Thanks, Lin Lin
Re: [VOTE] Pulsar Release 2.9.2 Candidate 4
- Checked the signature - Checked the SHA-512 checksums - Started standalone - Created tenant and namespace - Published and consumed messages - Checked Functions Thanks, Lin Lin
Re: [VOTE] [PIP-152] Support subscription level dispatch rate limiter setting.
+1 Lin Lin On 2022/04/19 03:27:39 Haiting Jiang wrote: > Hi Pulsar community, > > This is the voting thread for PIP-152. It will stay open for at least 48 > hours. > > The proposal can be found: https://github.com/apache/pulsar/issues/15094 > > Discuss thread: > https://lists.apache.org/thread/r6dzr09lc42yh79vt0dvmvlv6wtz2czn > > Thanks, > Haiting Jiang >
Re: [VOTE] PIP-155: Drop support for Python2
+1 Lin Lin
Re: [VOTE] [PIP-150] Support read the message of startMessageId position on the broker side
+1 Lin Lin
Re: [DISCUSS] PIP-105 extension: per-consumer filtering
The addition of this feature looks like a matter of consumer selection. Should the consumer selection of each Dispatcher be abstracted into an extensible interface? Thanks Lin Lin
Re: Call for projects and mentors for OSPP 2022
Thanks to Penghui and Dianjin. Project Name: Implement the MongoDB Connector Project Description: Pulsar IO connectors enable us to easily create, deploy, and manage connectors that interact with external systems, such as Apache Cassandra, Aerospike, and many others. Currently, the MongoDB Connector is still in a demo state; we need a MongoDB Connector that can run in a production environment. We need to support both sink and source, full data transfer, multi-table data transfer, etc. Difficulty Level: - [x] Basic - [ ] Advanced Project Validation Items: Item 1: Understand and try to use the Pulsar connectors. Item 2: Prepare a design for this feature. Item 3: Start coding. Item 4: Add unit and integration tests. Item 5: Add docs for this feature. Project Mentor: Your Name: Lin Lin Your Email: lin...@apache.org Your Apache ID: linlin
Re: [VOTE] [PIP-182] Provide new load balance placement strategy implementation for ModularLoadManagerStrategy
+1 binding Thanks, Lin Lin
Re: [VOTE] PIP-180: Shadow Topic, an alternative way to support readonly topic ownership
+1 (binding)
Re: [E] Re: [PIP-78] Split the individual acknowledgments into multiple entries
Hi Rajan, Thank you for your PR. The main differences lie in whether 10MB is enough and in the memory-doubling problem, both of which depend on the business scenario. In some business scenarios, a QPS of 20k/s is considered very low, and requests exceeding that order of magnitude are common. If the limit is only increased to 10MB, the time before exceeding the threshold only changes from 30 seconds to 60 seconds, and the problems described in the PIP are still not solved. "Large enough" may be based on your scenario; in some scenarios it is not enough in most cases. Because the problem has not been solved, I suggest abstracting this so that different people can choose. Your PR is an improvement to the current performance; there is no conflict between the two. Thanks On 2021/01/27 03:50:07, Rajan Dhabalia wrote: > I have created a PR which should allow brokers to store up to 10M > unack-message ranges. I think it should be large enough for any usecases > and probably now, we might not need to introduce abstraction for ack > management to avoid any further complexity in message acknowledgement path > as well. > https://github.com/apache/pulsar/pull/9292 > > Thanks, > Rajan
Re: Virtual Pulsar Community Meetings
I look forward to it and hope to coordinate the time so that developers from China can also participate.
Re: [Discuss] Optimize the performance of creating Topic
On 2021/08/03 11:12:34, Ivan Kelly wrote: > > Creating a topic will first check whether the topic already exists. > > The verification will read all topics under the namespace, and then > > traverse these topics to see if the topic already exists. > > When there are a large number of topics under the namespace(about 300,000 > > topics), > > less than 10 topics can be created in one second. > Why do we need to read all topics at all? We really just need to check > whether TOPIC or TOPIC-partition-0 exist. > > Even if they do not exist, is there anything to stop one client > creating TOPIC and another creating TOPIC-partition-0? > > -Ivan > For example, see the test case "testCreatePartitionedTopicHavingNonPartitionTopicWithPartitionSuffix": some non-partitioned topics have the partition suffix. In that case, we cannot use the cache to check anymore, and we have to traverse all topics under the namespace.
Re: [Proposal] Make Dispatcher pluggable
> I also share this problem, because if you want to efficiently implement > message filtering you need to do it in the broker side. > > I am not sure that making the full Dispatcher pluggable is a good idea, > because the code is too complex and also > it really depends on the internals of the Broker. > > If we make this pluggable that we must define a limited private but > "stable" API. > > My suggestion is to define particular needs and then add features to make > pluggable single specific parts > of the dispatcher. > > For instance I would add some support for "Message filtering", leaving the > implementation of the "filter" to a plugin. > This way you could implement filtering using JMS rules, or using other > metadata or security related information > > Regards > > Enrico
Hi Enrico:
Thank you for your feedback. We already have the method AbstractBaseDispatcher#filterEntriesForConsumer; I think we can make this method pluggable. Do you think this is okay?
Provider:
```
public interface EntriesFilterProvider {
    // Use `EntriesFilterProvider` to create an `EntriesFilter` for a subscription
    EntriesFilter createEntriesFilter(Subscription subscription);

    static EntriesFilterProvider createEntriesFilterProvider(ServiceConfiguration serviceConfiguration) {
        // According to `EntriesFilterProviderClassName`, create the provider through reflection
    }
}
```
Add an interface for filtering:
```
public interface EntriesFilter {
    void filterEntriesForConsumer(Optional entryWrapper, int entryWrapperOffset,
            List entries, EntryBatchSizes batchSizes, SendMessageInfo sendMessageInfo,
            EntryBatchIndexesAcks indexesAcks, ManagedCursor cursor, boolean isReplayRead);
}
```
Regards
Lin
Re: [Proposal] Make Dispatcher pluggable
> What about: > > public interface MessageFilter { >enum FilterOutcome { >ACCEPT, -> deliver to the Consumer >REJECT, -> skip the message >SYSTEM -> use standard system processing > } > > public FilterOutcome filterMessages(List messages, > FilterContext context) throws Exception; > > } > > interface MessageWrapper { > allow to access Message payload, metadata, headers... > } > > interface FilterContext { >...isReplayRead, >...access acks >...access ManagedCursor > } > > > This way the implementation of the filter will not use internal APIs that > evolve in Pulsar sometimes even in point releases. > > Enrico Hi Enrico: Thank you for your advice. I got your point; it works for me. We can discuss the more detailed interface design in the PR.
Re: [Proposal] Make Dispatcher pluggable
On 2021/09/10 06:22:09, PengHui Li wrote: > Looks good overall, > > Do we need to consider the batch filter since the API is defined as > `Message`, not `Entry`, so parts of the batch need to filter out, does it > should be handled by Filter, or the consumer side, or the producer side? > > Thanks, > Penghui > Hi Penghui: In my opinion: 1) It should work at the Entry dimension, and it should execute before AbstractBaseDispatcher#filterEntriesForConsumer, so that custom filtering can be performed before the entries are dispatched to the consumer. 2) It should be filtered on the Broker side, which reduces the transmission of unnecessary data. Thanks
Re: [Proposal] Make Dispatcher pluggable
On 2021/09/24 14:09:14, PengHui Li wrote: > Sorry for the late reply, > > If a batch has 10 messages, but users only want to filter out parts of them > such as 3 messages, only 7 messages should be processed at the consumer > side. > So if the proposal is for the entry filter, I think we should have the > EntryFitler interface, not MessageFilter? > > Actually, I also have some doubts about the entry filter, not sure if it > can be used in the real world. Or we should disable the batch when using > the filter or deserialize > the single message metadata to decide if the consumer should skip this > message, looks both of them will bring greater overhead to the broker. > > But I am not against the pluggable filter, not all users consider the > performance first, If they are more think about it at a functional > perspective, the pluggable filter will help them. > We should clarify it in the proposal, let users know how to make trade-offs. > > Thanks, > Penghui > Hello penghui, I agree with your concerns. At this stage, we can only do Entry-level filtering. If the Message in the Entry is forced to be filtered on the Broker side, there will be problems in the subsequent consumer ack. Therefore, if we want to use this filter, we must set enableBatching=false, which is the same as delayed messages. I will explain this point in the comments of the interface Thanks, Lin Lin
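To make the enableBatching=false requirement above concrete, here is a minimal client-side sketch using the standard Pulsar producer API; the service URL and topic name are placeholders, not values from this thread:
```
import org.apache.pulsar.client.api.Producer;
import org.apache.pulsar.client.api.PulsarClient;

public class NoBatchProducerExample {
    public static void main(String[] args) throws Exception {
        PulsarClient client = PulsarClient.builder()
                .serviceUrl("pulsar://localhost:6650") // placeholder broker URL
                .build();
        // Batching is disabled so that every entry holds exactly one message,
        // letting a broker-side entry filter skip whole entries without breaking
        // consumer acknowledgements.
        Producer<byte[]> producer = client.newProducer()
                .topic("persistent://public/default/filtered-topic") // placeholder topic
                .enableBatching(false)
                .create();
        producer.send("hello".getBytes());
        producer.close();
        client.close();
    }
}
```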
Re: [VOTE] PIP-99 Pulsar Proxy Extensions
On 2021/09/27 09:00:52, Enrico Olivelli wrote: > Hello everyone, > > I would like to start a VOTE for PIP-99 Pulsar Proxy Extensions > > This is the PIP-99 > https://github.com/apache/pulsar/issues/12157 > > This is the PR for the implementation (still draft, I will complete it > after the PIP is approved) > https://github.com/apache/pulsar/pull/11838 > > Please VOTE within 72 hours, approximately Thursday 30th > > Best regards > Enrico > +1
Re: Creating Good Release notes
Hello Enrico: I am releasing 2.8.2, and I wrote a small tool to help me generate the 2.8.2 release notes. I found that I need several labels to identify PRs: 1) Component, used to identify which type of PR a given PR belongs to. I think we can discuss which components to keep in the future; there are a lot of components now. 2) cherry-picked/branch-2.8 and release/2.8.2: `release/2.8.2` is used to identify whether the PR belongs to the current release, and `cherry-picked/branch-2.8` is used to identify whether I need to generate documentation for the PR. Lin Lin On 2021/12/02 11:11:43 Enrico Olivelli wrote: > Hello community, > > There is an open discussion on the Pulsar 2.9.0 release notes PR: > https://github.com/apache/pulsar/pull/12425 > > I have created the block of release notes by downloading the list of PR > using some GitHub API. > Then I have manually classified: > - News and Noteworthy: cool things in the Release > - Breaking Changes: things you MUST know when you upgrade > - Java Client, C++ Client, Python Client, Functions/Pulsar IO > > The goal is to provide useful information for people who want to upgrade > Pulsar. > > My problems are: > - PR titles are often badly written, but I don't want to fix all of them > (typos, tenses of verbs, formatting) > - There are more than 300 PRs, I don't want to classify them manually, I > just highlighted the most important from my point of view > > If for 2.9.0 we still keep a list of PR, then I believe that the current > status of the patch is good. > > If we want to do it another way, then I am now asking if there is someone > who can volunteer in fixing and classifying the list of 300 PRs, it is a > huge task. > > There is already much more work to do to get 2.9.0 completely released (and > also PulsarAdapters) and we have to cut 2.9.1 as soon as possible due to a > bad regression found in 2.9.0. > > Thanks > Enrico >
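For readers who want to reproduce this, a rough sketch of the kind of label query such a tool can issue against the GitHub search API is below. The actual tool mentioned above is not shown in the thread, so the class name and the way the output is handled are only assumptions:
```
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;

public class ReleaseNotesQuery {
    public static void main(String[] args) throws Exception {
        // search merged PRs carrying the two labels discussed above
        String query = "repo:apache/pulsar is:pr is:merged "
                + "label:release/2.8.2 label:cherry-picked/branch-2.8";
        String url = "https://api.github.com/search/issues?per_page=100&q="
                + URLEncoder.encode(query, StandardCharsets.UTF_8);
        HttpResponse<String> response = HttpClient.newHttpClient().send(
                HttpRequest.newBuilder(URI.create(url)).build(),
                HttpResponse.BodyHandlers.ofString());
        // the JSON body lists PR numbers and titles that can then be grouped by component label
        System.out.println(response.body());
    }
}
```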
Re: [DISCUSS] Always set a broker side timestamp for message and deprecate some API
Users disabling `AppendBrokerTimestampMetadataInterceptor` does not mean that they accept this bug. This is a configuration, not an API, and it is difficult to use documentation to regulate user behavior. Maybe we can add a new field (BrokerTimestamp) to save the timestamp on the Broker side. The time priority for trimming a Ledger would be: BrokerPublishTime > BrokerTimestamp. If `BrokerPublishTime` exists, `BrokerTimestamp` is not set. If not, we set `BrokerTimestamp`, and trimming is no longer affected by the client time. On 2024/01/15 02:15:17 PengHui Li wrote: > IMO, we should enable `AppendBrokerTimestampMetadataInterceptor` by default. > Users can still have a way to disable it if they care about the additional > metadata stored > in each entry. > > For the `hasBrokerPublishTime` API. The topic might also have historical > data without > broker publish time. So, it should be fine to keep this API because we > don't know how > long users will retain their data. > > Regards, > Penghui > > On Sat, Jan 6, 2024 at 10:35 PM linlin wrote: > > > Now, if the message's metadata does not set a broker side timestamp, the > > ledger expiration check is based on the client's publish time. > > > > When the client machine's clock is incorrect (eg: set to 1 year later) , > > the ledger can not be cleaned up. Issue > > https://github.com/apache/pulsar/issues/21347 > > > > `AppendBrokerTimestampMetadataInterceptor` can set timestamp for messages > > on the broker side, but we can not ensure that the > > `AppendBrokerTimestampMetadataInterceptor` is always enable > > > > Therefore, I open this PR(https://github.com/apache/pulsar/pull/21835) to > > always set the broker timestamp for messages on the broker side. > > > > With this change , firstly we should deprecate > > AppendBrokerTimestampMetadataInterceptor. > > It no longer needs to exist > > > > Secondly, we should deprecate `hasBrokerPublishTime` in interface Message. > > It always returns true. > > This API is created in PR (https://github.com/apache/pulsar/pull/11553) > > This PR is for the client to obtain BrokerPublishTime, so the > > `hasBrokerPublishTime` API is not necessary. > >
Re: [DISCUSS] Always set a broker side timestamp for message and deprecate some API
Only enabled `AppendBrokerTimestampMetadataInterceptor` is not enough. We should also improve the TTL: When client publish time > Ledger create time + Ledger rollover time, and `brokerPublishTime` is not set, we can make Ledger TTL time = Ledger create time + Ledger rollover time. This change can make sure entry expired when client clock is not right. On 2024/01/17 04:31:16 PengHui Li wrote: > > User disabling `AppendBrokerTimestampMetadataInterceptor` does not mean > that they allow this bug. > This is a configuration, not an API. It is difficult to use documentation > to regulate user behavior. > > Actually, they need to know why they want to disable ` > AppendBrokerTimestampMetadataInterceptor`. > Otherwise, why do they want to disable `AppendBrokerTimestampMetadata > Interceptor`? > > Pulsar's default configuration tries to provide a best practice for most of > the cases. To avoid the potential > problem, Pulsar enables `AppendBrokerTimestampMetadataInterceptor` by > default. But it doesn't mean all the cases > should enable `AppendBrokerTimestampMetadataInterceptor`. If users don't > use TTL, the producers are > well-maintained. Is there any reason to have at least 16 bytes append to > each entry? The default configuration > behavior change also needs to be highlighted in the release note. Users > need to know if they don't disable the > `AppendBrokerTimestampMetadataInterceptor` manually on a newer version, the > retained data size will increase. > > BTW, we need to explain each interceptor in the `broker.conf` and why ` > AppendBrokerTimestampMetadataInterceptor` > is enabled by default. If users don't want to read it, change anything in > `broker.conf` they don't really know what it is. > How can they know what they expect? > > Regards, > Penghui > > > On Mon, Jan 15, 2024 at 11:09 PM Lin Lin wrote: > > > User disabling `AppendBrokerTimestampMetadataInterceptor` does not mean > > that they allow this bug. > > This is a configuration, not an API. It is difficult to use documentation > > to regulate user behavior. > > > > Maybe we can add a new field (BrokerTimestamp) to save the timestamp on > > the Broker side. > > The time priority for trimming Ledger is as follows: > > > > BrokerPublishTime > BrokerTimestamp > > > > If `BrokerPublishTime` exists, `BrokerReceiveTime` is not set. > > If not, we set `BrokerReceiveTime` and is no longer affected by client > > time. > > > > On 2024/01/15 02:15:17 PengHui Li wrote: > > > IMO, we should enable `AppendBrokerTimestampMetadataInterceptor` by > > default. > > > Users can still have a way to disable it if they care about the > > additional > > > metadata stored > > > in each entry. > > > > > > For the `hasBrokerPublishTime` API. The topic might also have historical > > > data without > > > broker publish time. So, it should be fine to keep this API because we > > > don't know how > > > long users will retain their data. > > > > > > Regards, > > > Penghui > > > > > > On Sat, Jan 6, 2024 at 10:35 PM linlin wrote: > > > > > > > Now, if the message's metadata does not set a broker side timestamp, > > the > > > > ledger expiration check is based on the client's publish time. > > > > > > > > When the client machine's clock is incorrect (eg: set to 1 year later) > > , > > > > the ledger can not be cleaned up. 
Issue > > > > https://github.com/apache/pulsar/issues/21347 > > > > > > > > `AppendBrokerTimestampMetadataInterceptor` can set timestamp for > > messages > > > > on the broker side, but we can not ensure that the > > > > `AppendBrokerTimestampMetadataInterceptor` is always enable > > > > > > > > Therefore, I open this PR(https://github.com/apache/pulsar/pull/21835) > > to > > > > always set the broker timestamp for messages on the broker side. > > > > > > > > With this change , firstly we should deprecate > > > > AppendBrokerTimestampMetadataInterceptor. > > > > It no longer needs to exist > > > > > > > > Secondly, we should deprecate `hasBrokerPublishTime` in interface > > Message. > > > > It always returns true. > > > > This API is created in PR (https://github.com/apache/pulsar/pull/11553 > > ) > > > > This PR is for the client to obtain BrokerPublishTime, so the > > > > `hasBrokerPublishTime` API is not necessary. > > > > > > > > > >
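To make the rule described in the last two messages concrete, here is a minimal sketch of how the expiration reference time could be chosen. All names are illustrative assumptions, not the actual Pulsar ManagedLedger code:
```
/**
 * Minimal sketch of the expiration-time rule discussed above.
 */
public final class ExpirationTimeRule {

    /**
     * @param brokerPublishTime broker-side timestamp, or -1 if it was never set
     * @param clientPublishTime publish time taken from the client
     * @param ledgerCreateTime  creation time of the ledger holding the entry
     * @param rolloverTimeMs    configured maximum ledger rollover time
     * @return the timestamp the TTL check should compare against
     */
    public static long expirationReference(long brokerPublishTime,
                                           long clientPublishTime,
                                           long ledgerCreateTime,
                                           long rolloverTimeMs) {
        if (brokerPublishTime > 0) {
            // a broker-side timestamp always wins over the client clock
            return brokerPublishTime;
        }
        long latestPossibleBrokerTime = ledgerCreateTime + rolloverTimeMs;
        // if the client clock is ahead of anything physically possible for this
        // ledger, fall back to create time + rollover time so the entry can expire
        return Math.min(clientPublishTime, latestPossibleBrokerTime);
    }
}
```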
Re: [DISCUSS] PIP-232: Introduce thread monitor to check if thread is blocked for long time.
I have the following suggestions: 1. This configuration item should be dynamically updatable in a running Pulsar process, and used only as a means of troubleshooting when problems occur. 2. This configuration item should be turned off by default to avoid any impact on performance.
Re: [DISCUSS] PIP-255: Assign topic partitions to bundle by round robin
Thanks for your review.
> Could you clarify the limitation of the current logic?
The current logic cannot guarantee that the traffic of each bundle is the same, so balance has to come from splitting. However, the load of a topic is not constant: business traffic changes over time, so the load of each bundle keeps changing. If we rely on split+unload to rebalance, the number of bundles will eventually reach the upper limit. To avoid frequent splits and unloads, the current logic has many thresholds that allow brokers to tolerate load imbalance, which is one of the reasons the load gap between different nodes of a Pulsar cluster is large.
> For this issue, the community introduced a new assignment strategy, LeastResourceUsageWithWeight, which better randomizes assignments.
Yes, but LeastResourceUsageWithWeight still cannot completely solve the current problem, only alleviate it. We also optimized based on this implementation, but we will discuss that optimization in a follow-up PIP; it is not covered by the current PIP.
> If each partition has the same load, then having the same number of topics per bundle should lead to the load balance. Then, I wonder how the current way, "hashing" does not achieve the goal here.
We think that the loads of different partitions under the same topic are the same, but the loads of partitions of different topics are different. Bundles are shared by all topics in the entire namespace. Even if we guarantee that each bundle has the same number of partitions, those partitions may come from different topics, resulting in different loads for each bundle. If we split bundles according to load, the load of each topic may differ across time periods, and it is impossible to keep the load of each bundle the same. Using the round-robin strategy, we can ensure that the number of partitions from the same topic on each bundle is roughly consistent, so the load of each bundle is also consistent.
> What happens if the leader restarts? how do we guarantee this mapping persistence?
1) First of all, we need to find the starting bundle. partition-0 finds a bundle through consistent hashing, so as long as the number of bundles remains the same, the starting bundle is the same every time, and then the other partitions 1, 2, 3, 4 ... are assigned the same result every time.
2) If the number of bundles changes, i.e. a split is triggered, the bundles of the entire namespace will be forced to be unloaded and all partitions reassigned.
> It is unclear how RoundRobinPartitionAssigner will work with the existing code.
The specific implementation has been refined; please check the latest PIP issue.
On 2023/03/16 18:20:35 Heesung Sohn wrote: > Hi, > > Thank you for sharing this. > In general, I think this can be another good option for Pulsar load > assignment logic. > However, I have some comments below.
RE: [DISCUSS] PIP-255: Assign topic partitions to bundle by round robin
Thanks for joining this discussion.
> 1. where is the partition to bundle mapping stored?
We don't need to store the mapping relationship; it can be calculated dynamically. First comes the starting bundle: partition-0 is mapped directly through consistent hashing. Subsequent partitions are assigned to subsequent bundles by round robin.
> 2. when upgrade origin logic to the new round robin logic. how the current code distinguish partition assigned by origin logic and the new created topic assign by round robin logic.
This is an incompatible modification, so the entire cluster needs to be upgraded, not just a part of the nodes.
> 2. can you explain how the re-assignment works (when bundle number change). which component will trigger and do the work ?
When a bundle split occurs, a bundle unload at the namespace level will be triggered. Within this namespace, the binding relationship between all partitions and bundles will be re-determined. The re-assignment steps are as stated in the issue: 1) partition-0 finds the starting bundle through consistent hashing; 2) subsequent partitions are assigned to subsequent bundles by round robin.
> 3. If bundle-split is not expected. how many bundle should user set. and do we need disable bundle split we the round robin logic applied.
This approach does not limit the use of bundle splits, but a split will trigger the rebinding of all partitions under the entire namespace, and that reassignment takes a certain amount of time.
RE: [DISCUSS] PIP-255: Assign topic partitions to bundle by round robin
The namespace-level bundle unload can be performed in NamespaceService#splitAndOwnBundleOnceAndRetry. A new check will be added there: after splitting the bundle, it should determine whether to unload at the namespace level. On 2023/03/19 09:53:07 lifepuzzlefun wrote: > I'm interest on the implementation details. > > > 1. where is the partition to bundle mapping stored?when upgrade origin logic > to the new round robin logic. how the current code distinguish partition > assigned by origin logic and the new created topic assign by round robin > logic. > > > 2. can you explain how the re-assignment works (when bundle number change). > which component will trigger and do the work ? > > > 3. If bundle-split is not expected. how many bundle should user set. and do > we need disable bundle split we the round robin logic applied. > > > >
Re: [DISCUSS] PIP-255: Assign topic partitions to bundle by round robin
> This appears to be the "round-robin topic-to-bundle mapping" option in the `fundBundle` function. Is this the only place that needs an update? Can you list what change is required?
In this PIP, we only discuss topic-to-bundle mapping. The changes required are:
1) At lookup time, partitions are assigned to bundles: Lookup -> NamespaceService#getBrokerServiceUrlAsync -> NamespaceService#getBundleAsync -> NamespaceBundles#findBundle. Consistent hashing is currently used to assign partitions to bundles in NamespaceBundles#findBundle. We should add a configuration item partitionAssignerClassName so that different partition assignment algorithms can be configured. The existing algorithm will be used as the default (partitionAssignerClassName=ConsistentHashingPartitionAssigner).
2) Implement a new partition assignment class RoundRobinPartitionAssigner. The new partition assignment will be implemented in this class (see the sketch below).
> How do we enable this "round-robin topic-to-bundle mapping option" (by namespace policy and broker.conf)?
In broker.conf, via a new option called `partitionAssignerClassName`.
> Can we apply this option to existing namespaces? (what's the admin operation to enable this option)?
The cluster must ensure that all nodes use the same algorithm. Broker-level configuration can be made effective by restarting or via the admin API BrokersBase#updateDynamicConfiguration.
> I assume the "round-robin topic-to-bundle mapping option" works with a single partitioned topic, because other topics might show different load per partition. Is this intention? (so users need to ensure not to put other topics in the namespace, if this option is configured)
For single-partition topics, the starting bundle is determined using a consistent hash, so single-partition topics will spread out to different bundles as much as possible. For high-load single-partition topics, current algorithms cannot solve the problem, and this PIP cannot solve it either. If it is just a low-load single-partition topic, the impact on the bundle is very small. In real scenarios, however, high-load businesses share the load through multiple partitions.
> Some brokers might have more bundles than other brokers. Do we have different logic for bundle balancing across brokers? or do we rely on the existing assign/unload/split logic to balance bundles among brokers?
In this PIP, we do not touch the mapping between bundles and brokers; the existing algorithm works well with this PIP. However, we will also contribute our bundle-to-broker mapping algorithm in a subsequent PIP. For example, bundles under the same namespace could be assigned to brokers in a round-robin manner.
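Here is a minimal sketch of the round-robin topic-to-bundle mapping described above. The class and method names are illustrative assumptions, not the actual RoundRobinPartitionAssigner that the PIP will add to the broker:
```
/**
 * Sketch of the assignment rule: partition-0 picks a starting bundle via
 * consistent hashing, every subsequent partition goes to the next bundle.
 */
public class RoundRobinPartitionAssignerSketch {

    /**
     * @param baseTopicName  partitioned topic name without the "-partition-N" suffix
     * @param partitionIndex index of the partition being looked up
     * @param bundleCount    number of bundles currently in the namespace
     * @param consistentHash the hash function already used by NamespaceBundles#findBundle
     * @return index of the bundle that should own this partition
     */
    public int selectBundle(String baseTopicName, int partitionIndex, int bundleCount,
                            java.util.function.ToIntFunction<String> consistentHash) {
        // stable as long as the number of bundles does not change
        int startBundle = Math.floorMod(
                consistentHash.applyAsInt(baseTopicName + "-partition-0"), bundleCount);
        // every subsequent partition goes to the next bundle in round-robin order
        return (startBundle + partitionIndex) % bundleCount;
    }
}
```
Because the starting bundle only depends on the topic name and the bundle count, the same mapping is recomputed identically after a leader restart, which is why no persistent state is needed.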
Re: [VOTE] PIP-254: Support configuring client version with a description suffix
+1 Thanks, Lin Lin On 2023/03/15 07:54:20 Yunze Xu wrote: > Hi all, > > This thread is to start the vote for PIP-254. > > Discussion thread: > https://lists.apache.org/thread/65cf7w76tt23sbsjnr8rpfxqf1nt9s9l > > PIP link: https://github.com/apache/pulsar/issues/19705 > > Thanks, > Yunze >
Re: [DISCUSS] PIP-255: Assign topic partitions to bundle by round robin
neral solution for common scenarios. I support making > the topic-bundle assigner pluggable without > introducing the implementation to the Pulsar repo. Users can implement > their own assigner based on the business > requirement. Pulsar's general solution may not be good for all scenarios, > but it is better for scalability (bundle split) > and enough for most common scenarios. We can keep improving the general > solution for the general requirement > for the most common scenarios. > > Regards, > Penghui > > > On Wed, Mar 22, 2023 at 9:52 AM Lin Lin wrote: > > > > > > This appears to be the "round-robin topic-to-bundle mapping" option in > > > the `fundBundle` function. Is this the only place that needs an update? > > Can > > > you list what change is required? > > > > In this PIP, we only discuss topic-to-bundle mapping > > Change is required: > > 1) > > When lookup, partitions is assigned to bundle: > > Lookup -> NamespaceService#getBrokerServiceUrlAsync -> > > NamespaceService#getBundleAsync -> > > NamespaceBundles#findBundle > > Consistent hashing is now used to assign partitions to bundle in > > NamespaceBundles#findBundle. > > We should add a configuration item partitionAssignerClassName, so that > > different partition assignment algorithms can be dynamically configured. > > The existing algorithm will be used as the default > > (partitionAssignerClassName=ConsistentHashingPartitionAssigner) > > 2) > > Implement a new partition assignment class RoundRobinPartitionAssigner. > > New partition assignments will be implemented in this class > > > > > > > How do we enable this "round-robin topic-to-bundle mapping option" (by > > > namespace policy and broker.conf)? > > > > In broker.conf, a new option called `partitionAssignerClassName` > > > > > Can we apply this option to existing namespaces? (what's the admin > > > operation to enable this option)? > > > > The cluster must ensure that all nodes use the same algorithm. > > Broker-level configuration can be made effective by restarting or admin API > > BrokersBase#updateDynamicConfiguration > > > > > I assume the "round-robin topic-to-bundle mapping option" works with a > > > single partitioned topic, because other topics might show different load > > > per partition. Is this intention? (so users need to ensure not to put > > other > > > topics in the namespace, if this option is configured) > > > > For single-partition topics, since the starting bundle is determined > > using a consistent hash. > > Therefore, single-partition topics will spread out to different bundle as > > much as possible. > > For high load single-partition topics, current algorithms cannot solve > > this problem. > > This PIP cannot solve this problem as well. > > If it just a low load single-partition topic , the impact on the entire > > bundle is very small. > > However, in real scenarios, high-load businesses will share the load > > through multiple partitions. > > > > > Some brokers might have more bundles than other brokers. Do we have > > > different logic for bundle balancing across brokers? or do we rely on the > > > existing assign/unload/split logic to balance bundles among brokers? > > > > In this PIP, we do not involve the mapping between bundles and brokers, > > the existing algorithm works well with this PIP. > > However, we will also contribute our mapping algorithm in the subsequent > > PIP. > > For example: bundles under same namespace can be assigned to broker in a > > round-robin manner. > > > > > > >
Re: [DISCUSS] PIP-255: Assign topic partitions to bundle by round robin
gt; > If this is not a general solution for common scenarios. I support making > > the topic-bundle assigner pluggable without > > introducing the implementation to the Pulsar repo. Users can implement > > their own assigner based on the business > > requirement. Pulsar's general solution may not be good for all scenarios, > > but it is better for scalability (bundle split) > > and enough for most common scenarios. We can keep improving the general > > solution for the general requirement > > for the most common scenarios. > > > > Regards, > > Penghui > > > > > > On Wed, Mar 22, 2023 at 9:52 AM Lin Lin wrote: > > > > > > > > > This appears to be the "round-robin topic-to-bundle mapping" option in > > > > the `fundBundle` function. Is this the only place that needs an update? > > > Can > > > > you list what change is required? > > > > > > In this PIP, we only discuss topic-to-bundle mapping > > > Change is required: > > > 1) > > > When lookup, partitions is assigned to bundle: > > > Lookup -> NamespaceService#getBrokerServiceUrlAsync -> > > > NamespaceService#getBundleAsync -> > > > NamespaceBundles#findBundle > > > Consistent hashing is now used to assign partitions to bundle in > > > NamespaceBundles#findBundle. > > > We should add a configuration item partitionAssignerClassName, so that > > > different partition assignment algorithms can be dynamically configured. > > > The existing algorithm will be used as the default > > > (partitionAssignerClassName=ConsistentHashingPartitionAssigner) > > > 2) > > > Implement a new partition assignment class RoundRobinPartitionAssigner. > > > New partition assignments will be implemented in this class > > > > > > > > > > How do we enable this "round-robin topic-to-bundle mapping option" (by > > > > namespace policy and broker.conf)? > > > > > > In broker.conf, a new option called `partitionAssignerClassName` > > > > > > > Can we apply this option to existing namespaces? (what's the admin > > > > operation to enable this option)? > > > > > > The cluster must ensure that all nodes use the same algorithm. > > > Broker-level configuration can be made effective by restarting or admin > > API > > > BrokersBase#updateDynamicConfiguration > > > > > > > I assume the "round-robin topic-to-bundle mapping option" works with a > > > > single partitioned topic, because other topics might show different > > load > > > > per partition. Is this intention? (so users need to ensure not to put > > > other > > > > topics in the namespace, if this option is configured) > > > > > > For single-partition topics, since the starting bundle is determined > > > using a consistent hash. > > > Therefore, single-partition topics will spread out to different bundle > > as > > > much as possible. > > > For high load single-partition topics, current algorithms cannot solve > > > this problem. > > > This PIP cannot solve this problem as well. > > > If it just a low load single-partition topic , the impact on the entire > > > bundle is very small. > > > However, in real scenarios, high-load businesses will share the load > > > through multiple partitions. > > > > > > > Some brokers might have more bundles than other brokers. Do we have > > > > different logic for bundle balancing across brokers? or do we rely on > > the > > > > existing assign/unload/split logic to balance bundles among brokers? > > > > > > In this PIP, we do not involve the mapping between bundles and brokers, > > > the existing algorithm works well with this PIP. 
> > > However, we will also contribute our mapping algorithm in the subsequent > > > PIP. > > > For example: bundles under same namespace can be assigned to broker in a > > > round-robin manner. > > > > > > > > > > > >
Re: [DISCUSS] Sorting out pulsar's internal thread pools
This is a good idea. Thanks, Lin Lin On 2023/04/18 02:07:55 mattisonc...@gmail.com wrote: > > Hello, folks. > > I would like to start discussing the pulsar internal thread pool sorting out. > > How did I get this idea? > > Recently, we met some problems with the BK operation timeout. After > investigating, we found an issue that is we share the IO executor(workgroup) > with the Bookkeeper client and internal client and do some other async task > in the dispatcher or somewhere to avoid deadlock. > > But the problem over here. If we use this executor to do some kind of > `blocking`(or spend much time computing. e.g. reply to many delayed messages) > operation, it will block BK clients from sending requests if they are using > the same thread. > > And then, I checked all the usage of the thread pool. We need the rule to > constrain what thread pool we should use. > > What am I expecting? > > I want to collect all the thread pools and define a clear usage guide to > avoid wrong use and improve the fault tolerance(the component problem > shouldn't affect the whole broker) > > > > I need to hear your guy's opinions. Please feel free to leave any questions. > Thanks! > > > Best, > Mattison > > >
Re: Re:Re: [DISCUSS] PIP-255: Assign topic partitions to bundle by round robin
We make this configuration item to be a dynamic configuration. We change change it on broker level. If we can change it on namespace level, even load of bundle in some namespace is balanced, it is still difficult to make broker balance On 2023/04/16 16:07:45 lifepuzzlefun wrote: > I think this feature is very helpful on heavy traffic topic which have > continuous stable load on each partition. > > > Is there a way we can set some kind of namespace policy to set the plugin > PartitionAssigner. Hope this can be set on namespace level, > if this can be achieved, it is more adoptable to try this feature in > production environment. : - ) > > At 2023-04-12 11:24:11, "Lin Lin" wrote: > >As I mentioned in the implementation of PIP, we will plug-in the partition > >assignment strategy. > > > >However, in the same cluster, it is impossible for some Brokers to use > >consistent hashing and some Brokers to use round robin. > > > >On 2023/04/11 07:37:19 Xiangying Meng wrote: > >> Hi Linlin, > >> > This is an incompatible modification, so the entire cluster needs to be > >> upgraded, not just a part of the nodes > >> > >> Appreciate your contribution to the new feature in PIP-255. > >> I have a question regarding the load-balancing aspect of this feature. > >> > >> You mentioned that this is an incompatible modification, > >> and the entire cluster needs to be upgraded, not just a part of the nodes. > >> I was wondering why we can only have one load-balancing strategy. > >> Would it be possible to abstract the logic here and make it an optional > >> choice? > >> This way, we could have multiple load-balancing strategies, > >> such as hash-based, round-robin, etc., available for users to choose from. > >> > >> I'd love to hear your thoughts on this. > >> > >> Best regards, > >> Xiangying > >> > >> On Mon, Apr 10, 2023 at 8:23 PM PengHui Li wrote: > >> > >> > Hi Lin, > >> > > >> > > The load managed by each Bundle is not even. Even if the number of > >> > partitions managed > >> >by each bundle is the same, there is no guarantee that the sum of the > >> > loads of these partitions > >> >will be the same. > >> > > >> > Do we expect that the bundles should have the same loads? The bundle is > >> > the > >> > base unit of the > >> > load balancer, we can set the high watermark of the bundle, e.g., the > >> > maximum topics and throughput. > >> > But the bundle can have different real loads, and if one bundle runs out > >> > of > >> > the high watermark, the bundle > >> > will be split. Users can tune the high watermark to distribute the loads > >> > evenly across brokers. > >> > > >> > For example, there are 4 bundles with loads 1, 3, 2, 4, the maximum load > >> > of > >> > a bundle is 5 and 2 brokers. > >> > We can assign bundle 0 and bundle 3 to broker-0 and bundle 1 and bundle 2 > >> > to broker-2. > >> > > >> > Of course, this is the ideal situation. If bundle 0 has been assigned to > >> > broker-0 and bundle 1 has been > >> > assigned to broker-1. Now, bundle 2 will go to broker 1, and bundle 3 > >> > will > >> > go to broker 1. The loads for each > >> > broker are 3 and 7. Dynamic programming can help to find an optimized > >> > solution with more bundle unloads. > >> > > >> > So, should we design the bundle to have even loads? It is difficult to > >> > achieve in reality. And the proposal > >> > said, "Let each bundle carry the same load as possible". Is it the > >> > correct > >> > direction for the load balancer? > >> > > >> > > Doesn't shed loads very well. 
The existing default policy > >> > ThresholdShedder has a relatively high usage > >> >threshold, and various traffic thresholds need to be set. Many > >> > clusters > >> > with high TPS and small message > >> >bodies may have high CPU but low traffic; And for many small-scale > >> > clusters, the threshold needs to be > >> >modified according to the actual business. > >> > > >> > Can it be resolved by introducing the entry write/read rate to the bundle > >> > stats? > >> > > >> > > The r
Re: [VOTE] Pulsar Manager Release 0.4.0 Candidate 2
+1 binding - Verified shasum, and asc of released files. - Build and run from source with JDK 8 - Test basic CRUD Thanks, Lin Lin On 2023/05/05 11:04:19 tison wrote: > Hi everyone, > > Please review and vote on the release candidate #1 for the version 0.4.0, > as follows: > > [ ] +1, Approve the release > [ ] -1, Do not approve the release (please provide specific comments) > > The complete staging area is available for your review, which includes: > * Release notes > https://github.com/apache/pulsar-manager/releases/tag/v0.4.0-candidate-2 > * The official Apache source and binary distributions to be deployed to > dist.apache.org > * Source code tag "v0.4.0-candidate-2" > > PulsarManager's KEYS file contains PGP keys we used to sign this release: > https://downloads.apache.org/pulsar/KEYS > > Please download these packages and review this release candidate: > > - Review release notes > - Download the source package (verify shasum, and asc) and follow the > instructions to build and run the pulsar-manager front end and back-end > service. > - Download the binary package (verify shasum, and asc) and follow the > instructions to run the pulsar-manager front-end and back-end service. > > The vote will be open for at least 72 hours. It is adopted by majority > approval, with at least 3 PMC affirmative votes. > > Source and binary files: > https://dist.apache.org/repos/dist/dev/pulsar/pulsar-manager/apache-pulsar-manager-0.4.0/apache-pulsar-manager-0.4.0-bin.tar.gz > https://dist.apache.org/repos/dist/dev/pulsar/pulsar-manager/apache-pulsar-manager-0.4.0/apache-pulsar-manager-0.4.0-src.tar.gz > > Best, > tison. >
Re: [VOTE] Release Apache Pulsar 3.0.11 based on 3.0.11-candidate-1
+1 - Built from source - Checked the signatures of the source and binary release artifacts - Ran pulsar standalone - Checked producer and consumer On 2025/03/31 09:00:49 Lari Hotari wrote: > Hello Apache Pulsar Community, > > This is a call for the vote to release the Apache Pulsar version > 3.0.11 based on 3.0.11-candidate-1. > > Included changes since the previous release: > https://github.com/apache/pulsar/compare/v3.0.10...v3.0.11-candidate-1 > > *** Please download, test and vote on this release. This vote will stay open > for at least 24 hours *** > > Only votes from PMC members are binding, but members of the community are > encouraged to test the release and vote with "(non-binding)". > > Note that we are voting upon the source (tag), binaries are provided for > convenience. > > The release candidate is available at: > https://dist.apache.org/repos/dist/dev/pulsar/pulsar-3.0.11-candidate-1/ > > SHA-512 checksums: > 02770f68a0d07ec23ce1509b5bbdd833a5eed22486daf431c9a12d598fed557bbd1bd537a5e197f05c570130fc7b8710c56891419e97e41d916fc3b7674f060b > apache-pulsar-3.0.11-src.tar.gz > d80c4f6c6b3cfa2d245c19093c00f10dd4b7353e9c264a9ed4c0e0e64a08098626a03ff7d48519fae90c0d708c4f20038e425217723ec69a12bc4e4f0b7d9be6 > apache-pulsar-3.0.11-bin.tar.gz > > Maven staging repo: > https://repository.apache.org/content/repositories/orgapachepulsar-1334 > > The tag to be voted upon: > v3.0.11-candidate-1 (commit a51279eb79abafd7a12d8a82db33f4899e58d1d2) > https://github.com/apache/pulsar/releases/tag/v3.0.11-candidate-1 > > Pulsar's KEYS file containing PGP keys you use to sign the release: > https://downloads.apache.org/pulsar/KEYS > > Docker images: > docker pull lhotari/pulsar:3.0.11-a51279e > https://hub.docker.com/layers/lhotari/pulsar/3.0.11-a51279e/images/sha256-266bb1ab1aa8c347410d4dd195113cdd6d01a8eb29504c5725def16022a34143 > docker pull lhotari/pulsar-all:3.0.11-a51279e > https://hub.docker.com/layers/lhotari/pulsar-all/3.0.11-a51279e/images/sha256-8163f45a1777289ab37367072015a52ecb7efdf2ef6d3491acff6b6da42fc482 > > Please download the source package, and follow the README to build > and run the Pulsar standalone service. > > More advanced release validation instructions can be found at > https://pulsar.apache.org/contribute/validate-release-candidate/ > > Thanks, > > Lari Hotari >
Re: [Discussion] PR #24423 Handling Overloaded Netty Channels in Apache Pulsar
Hi Yubiao: If a sufficient number of PulsarClients are started at one time, the number of channels will also increase, and concurrent requests can still cause a broker OOM. For example, with each channel limited to 100MB, 1 million channels can still buffer far more than the broker's memory. This is what I meant by the lack of a global perspective: this throttling isn't calculated across all connections. If it is a direct memory overflow, should we impose limits on direct memory allocation? All components share the same direct memory pool, which is globally counted, so that would enable more effective throttling. If it is caused by excessive retries, the client should back off. Even with this feature added, the client still keeps retrying; although the broker will not have an OOM, produce and consume will still be affected, and the unavailability issue remains. On 2025/07/11 01:50:40 Yubiao Feng wrote: > Hi LinLin > > > We can establish a multi-layered mechanism: > > API layer restrictions: Limiting based on in-flight requests, such as > producer throttling and consumer throttling. > > Memory layer restrictions: Memory pool limitations, blocking new request > processing rather than allocating more memory when the pool is exhausted. > > The PR #24423 happens to be the implementer of layer 2 of your suggested > mechanism. We can separate a PR to do the layer 1 > > > Lacks a global perspective. When the number of clients is sufficiently > high, it can still lead to Out-Of-Memory (OOM) errors. > > As has been answered in the previous text, this fix only focuses on > addressing the OOM caused by the accumulation of a large number of > responses in memory due to the channel granularity being unwritable. > BTW, If you have a large number of clients, you can reduce the backlog of > responses allowed for each channel by adjusting the parameter > `connectionMaxPendingWriteBytes` > > > Mutual interference. It can negatively impact production, consumption, > heartbeat, and other operations. > > As has been answered in the previous text, once the channel is not > writable, all requests that are sent to this channel can not receive > a reply anymore because the response can not be written out. The results > are the same; the clients receive replies delayed. To improve performance, > users should consider using more channels; in other words, they can set a > bigger `connectionsPerBroker` or separate clients. > > > > On Fri, Jul 11, 2025 at 8:54 AM Lin Lin wrote: > > > Connection-level rate limiting has the following drawbacks: > > > > Lacks a global perspective. When the number of clients is sufficiently > > high, it can still lead to Out-Of-Memory (OOM) errors. > > > > Mutual interference. It can negatively impact production, consumption, > > heartbeat, and other operations. > > > > Design consistency. When should rate limiting be TCP-based, and when > > should it be based on a RateLimiter? > > > > We can establish a multi-layered mechanism: > > > > API layer restrictions: Limiting based on in-flight requests, such as > > producer throttling and consumer throttling. > > > > Memory layer restrictions: Memory pool limitations, blocking new request > > processing rather than allocating more memory when the pool is exhausted. > > > > On 2025/07/08 06:07:10 Yubiao Feng wrote: > > > Hi all > > > > > > I want to satrt a discussion, which relates to the PR.
#24423: Handling > > > Overloaded Netty Channels in Apache Pulsar > > > > > > Problem Statement > > > We've encountered a critical issue in our Apache Pulsar clusters where > > > brokers experience Out-Of-Memory (OOM) errors and continuous restarts > > under > > > specific load patterns. This occurs when Netty channel write buffers > > become > > > full, leading to a buildup of unacknowledged responses in the broker's > > > memory. > > > > > > Background > > > Our clusters are configured with numerous namespaces, each containing > > > approximately 8,000 to 10,000 topics. Our consumer applications are quite > > > large, with each consumer using a regular expression (regex) pattern to > > > subscribe to all topics within a namespace. > > > > > > The problem manifests particularly during consumer application restarts. > > > When a consumer restarts, it issues a getTopicsOfNamespace request. Due > > to > > > the sheer number of topics, the response size is extremely large. This > > > massive response overwhelms the socket output buffer, causing it to fill > > up > > > rapidly. Consequently, the broker's responses get backlogged i
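Regarding the global "memory layer restriction" idea raised in this thread, below is a minimal sketch of a byte-counting permit pool: callers acquire permits before buffering a response and block (back-pressure) instead of allocating more memory once the budget is exhausted. The class name, budget value, and the idea of wiring it into response handling are assumptions, not an existing Pulsar API:
```
import java.util.concurrent.Semaphore;

public class GlobalResponseMemoryLimiter {
    private final Semaphore budgetBytes;

    public GlobalResponseMemoryLimiter(int maxPendingResponseBytes) {
        // one permit per byte of allowed pending-response memory, counted globally
        this.budgetBytes = new Semaphore(maxPendingResponseBytes);
    }

    /** Blocks until the requested number of bytes fits into the global budget. */
    public void acquire(int responseSizeBytes) throws InterruptedException {
        budgetBytes.acquire(responseSizeBytes);
    }

    /** Called once the response has been flushed to the socket. */
    public void release(int responseSizeBytes) {
        budgetBytes.release(responseSizeBytes);
    }
}
```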
Re: [Discussion] PR #24423 Handling Overloaded Netty Channels in Apache Pulsar
Connection-level rate limiting has the following drawbacks: Lacks a global perspective. When the number of clients is sufficiently high, it can still lead to Out-Of-Memory (OOM) errors. Mutual interference. It can negatively impact production, consumption, heartbeat, and other operations. Design consistency. When should rate limiting be TCP-based, and when should it be based on a RateLimiter? We can establish a multi-layered mechanism: API layer restrictions: Limiting based on in-flight requests, such as producer throttling and consumer throttling. Memory layer restrictions: Memory pool limitations, blocking new request processing rather than allocating more memory when the pool is exhausted. On 2025/07/08 06:07:10 Yubiao Feng wrote: > Hi all > > I want to satrt a discussion, which relates to the PR. #24423: Handling > Overloaded Netty Channels in Apache Pulsar > > Problem Statement > We've encountered a critical issue in our Apache Pulsar clusters where > brokers experience Out-Of-Memory (OOM) errors and continuous restarts under > specific load patterns. This occurs when Netty channel write buffers become > full, leading to a buildup of unacknowledged responses in the broker's > memory. > > Background > Our clusters are configured with numerous namespaces, each containing > approximately 8,000 to 10,000 topics. Our consumer applications are quite > large, with each consumer using a regular expression (regex) pattern to > subscribe to all topics within a namespace. > > The problem manifests particularly during consumer application restarts. > When a consumer restarts, it issues a getTopicsOfNamespace request. Due to > the sheer number of topics, the response size is extremely large. This > massive response overwhelms the socket output buffer, causing it to fill up > rapidly. Consequently, the broker's responses get backlogged in memory, > eventually leading to the broker's OOM and subsequent restart loop. > > Why "Returning an Error" Is Not a Solution > A common approach to handling overload is to simply return an error when > the broker cannot process a request. However, in this specific scenario, > this solution is ineffective. If a consumer application fails to start due > to an error, it triggers a user pod restart, which then leads to the same > getTopicsOfNamespace request being reissued, resulting in a continuous loop > of errors and restarts. This creates an unrecoverable state for the > consumer application and puts immense pressure on the brokers. > > Proposed Solution and Justification > We believe the solution proposed in > https://github.com/apache/pulsar/pull/24423 is highly suitable for > addressing this issue. The core mechanism introduced in this PR – pausing > acceptance of new requests when a channel cannot handle more output – is > exceptionally reasonable and addresses the root cause of the memory > pressure. > > This approach prevents the broker from accepting new requests when its > write buffers are full, effectively backpressuring the client and > preventing the memory buildup that leads to OOMs. Furthermore, we > anticipate that this mechanism will not significantly increase future > maintenance costs, as it elegantly handles overload scenarios at a > fundamental network layer. > > I invite the community to discuss this solution and its potential benefits > for the overall stability and resilience of Apache Pulsar. 
> > Thanks > Yubiao Feng