BewareMyPower opened a new pull request, #24577:
URL: https://github.com/apache/pulsar/pull/24577

   ### Motivation
   
   I observed a topic has been loaded for more than 30 seconds in production 
environment recently after its namespace bundle ownership was transferred. From 
the logs and heap dump, the conclusion is that the topic has been stuck before 
calling `asyncOpen` in `createPersistentTopic0`.
   
   After revisiting the long code path of topic loading, I found many places 
are not efficient, and the existing `pulsar_topic_load_times` metric is 
incorrect. This metric only counts the time from the beginning of 
`createPersistentTopic0`. However, there are some other time-consuming tasks 
before it's called.
   
   
https://github.com/apache/pulsar/blob/c96f27a4dde9e4211a5b83e91f5e384d7ac8d904/pulsar-broker/src/main/java/org/apache/pulsar/broker/service/BrokerService.java#L1171
   
   It waits for topic policies of this topic are available before inserting a 
topic future to `BrokerService#topics`. In extreme cases, there could be many 
concurrent `TopicPoliciesService#getTopicPoliciesAsync` call.
   
   
https://github.com/apache/pulsar/blob/c96f27a4dde9e4211a5b83e91f5e384d7ac8d904/pulsar-broker/src/main/java/org/apache/pulsar/broker/service/BrokerService.java#L1670
   
   In `loadOrCreatePersistentTopic`, it will check the ownership via 
`checkTopicNsOwnership` before acquiring the topic load semaphore here: 
https://github.com/apache/pulsar/blob/c96f27a4dde9e4211a5b83e91f5e384d7ac8d904/pulsar-broker/src/main/java/org/apache/pulsar/broker/service/BrokerService.java#L1674
   
   This violates the semantics of the `maxConcurrentTopicLoadRequest` config.
   
   After acquiring the semaphore, it calls 
`checkOwnershipAndCreatePersistentTopic`, which validates the ownership again 
via `NamespaceService#isServiceUnitActiveAsync`: 
https://github.com/apache/pulsar/blob/c96f27a4dde9e4211a5b83e91f5e384d7ac8d904/pulsar-broker/src/main/java/org/apache/pulsar/broker/service/BrokerService.java#L1739
   
   Actually, the implementation of `isServiceUnitActiveAsync` is exactly the 
same with `checkTopicNsOwnership`, where the only difference is that the 
previous one returns a boolean future, while the latter one returns a failed 
future if it's false.
   
   Even after that, it could fetch the topic policies before 
`createPersistentTopic0`: 
https://github.com/apache/pulsar/blob/c96f27a4dde9e4211a5b83e91f5e384d7ac8d904/pulsar-broker/src/main/java/org/apache/pulsar/broker/service/BrokerService.java#L1745
   
   
   ### Modifications
   
   Major changes:
   - Remove the `getTopicPoliciesBypassSystemTopic` call before inserting a 
topic future. This method is only used in `getManagedLedgerConfig`.
   - Remove the duplicated ownership validation before acquiring the topic load 
semaphore
   - Avoid calling `getManagedLedgerConfig` and 
`fetchPartitionedTopicMetadataAsync` repeatedly by executing these tasks before 
other tasks that depend on them.
   - Perform the validations concurrently, including 
`checkMaxTopicsPerNamespace`, `checkTopicAlreadyMigrated`, 
`validateTopicConsistency`.
   
   Though many tasks are metadata store access with `MetadataCache` used, 
executing them concurrently is still more efficient.
   
   For observability,
   - Take all tasks after acquiring the topic load semaphore into consideration 
of the `pulsar_topic_load_times` metric.
   - Add a log for topic policies get latency specifically when loading a 
topic. From my experience, using a reader with `hasMessageAvailable` and 
`readNext` loop could have poor performance when CPU pressure is high. This 
read loop is also too heavy.
   
   Other changes and refactoring:
   - Pass the `TopicName` instance across the whole flow to avoid unnecessary 
conversions between `TopicName` and `String`.
   - Move the `isTransactionInternalName` at the beginning
   - Replace `isServiceUnitActiveAsync` with `checkTopicNsOwnership` and remove 
this method to avoid users using this method, which makes code hard to read.
   
   ### Documentation
   
   <!-- DO NOT REMOVE THIS SECTION. CHECK THE PROPER BOX ONLY. -->
   
   - [ ] `doc` <!-- Your PR contains doc changes. -->
   - [ ] `doc-required` <!-- Your PR changes impact docs and you will update 
later -->
   - [x] `doc-not-needed` <!-- Your PR changes do not impact docs -->
   - [ ] `doc-complete` <!-- Docs have been already added -->
   
   ### Matching PR in forked repository
   
   PR in forked repository: https://github.com/BewareMyPower/pulsar/pull/50
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to