This is an automated email from the ASF dual-hosted git repository.
lhotari pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/pulsar-site.git
The following commit(s) were added to refs/heads/main by this push:
new 7dbb370aa65 Apply #1034 changes to version-4.0.x docs
7dbb370aa65 is described below
commit 7dbb370aa6559b1c382405e00f060b4272645e46
Author: Lari Hotari <[email protected]>
AuthorDate: Tue Sep 9 09:58:59 2025 +0300
Apply #1034 changes to version-4.0.x docs
---
.../concepts-architecture-overview.md | 43 +++++++-
.../concepts-broker-load-balancing-quick-start.md | 10 +-
versioned_docs/version-4.0.x/concepts-clients.md | 44 ++++++++
versioned_docs/version-4.0.x/concepts-messaging.md | 3 +
versioned_docs/version-4.0.x/concepts-overview.md | 8 +-
.../version-4.0.x/concepts-replication.md | 2 +-
.../version-4.0.x/concepts-tiered-storage.md | 112 ++++++++++++++++++++-
.../version-4.0.x/concepts-topic-compaction.md | 42 ++++++--
.../version-4.0.x/concepts-transactions.md | 103 +++++++++++++++++++
versioned_docs/version-4.0.x/helm-deploy.md | 14 +--
10 files changed, 349 insertions(+), 32 deletions(-)
diff --git a/versioned_docs/version-4.0.x/concepts-architecture-overview.md
b/versioned_docs/version-4.0.x/concepts-architecture-overview.md
index 0b5d40b730d..93d29c13955 100644
--- a/versioned_docs/version-4.0.x/concepts-architecture-overview.md
+++ b/versioned_docs/version-4.0.x/concepts-architecture-overview.md
@@ -9,9 +9,9 @@ At the highest level, a Pulsar instance is composed of one or
more Pulsar cluste
A Pulsar cluster consists of the following components:
-* One or more brokers handles and [load
balances](administration-load-balance.md) incoming messages from producers,
dispatches messages to consumers, communicates with the Pulsar configuration
store to handle various coordination tasks, stores messages in BookKeeper
instances (aka bookies), relies on a cluster-specific ZooKeeper cluster for
certain tasks, and more.
+* One or more brokers handle and [load
balance](administration-load-balance.md) incoming messages from producers,
dispatch messages to consumers, communicate with the Pulsar metadata store
to handle various coordination tasks, and store messages in BookKeeper instances
(aka bookies).
* A BookKeeper cluster consisting of one or more bookies handles [persistent
storage](#persistent-storage) of messages.
-* A ZooKeeper cluster specific to that cluster handles coordination tasks
between Pulsar clusters.
+* A metadata store cluster (ZooKeeper, etcd, or other supported backend)
handles coordination tasks and cluster-specific metadata storage.
The diagram below illustrates a Pulsar cluster:
@@ -46,9 +46,36 @@ Clusters can replicate among themselves using
[geo-replication](concepts-replica
## Metadata store
-The Pulsar metadata store maintains all the metadata of a Pulsar cluster, such
as topic metadata, schema, broker load data, and so on. Pulsar uses [Apache
ZooKeeper](https://zookeeper.apache.org/) for metadata storage, cluster
configuration, and coordination. The Pulsar metadata store can be deployed on a
separate ZooKeeper cluster or deployed on an existing ZooKeeper cluster. You
can use one ZooKeeper cluster for both Pulsar metadata store and BookKeeper
metadata store. If you want to d [...]
+The Pulsar metadata store maintains all the metadata of a Pulsar cluster, such
as topic metadata, schema, broker load data, and so on. Pulsar supports
multiple metadata store backends to provide flexibility in deployment
architectures and operational requirements:
-> Pulsar also supports more metadata backend services, including
[etcd](https://etcd.io/) and [RocksDB](http://rocksdb.org/) (for standalone
Pulsar only).
+### Supported Metadata Store Backends
+
+- **[Apache ZooKeeper](https://zookeeper.apache.org/)** - Default option,
production-ready metadata store with strong consistency guarantees.
+- **[etcd](https://etcd.io/)** - Cloud-native distributed key-value store,
ideal for Kubernetes environments and cloud deployments.
+- **[RocksDB](http://rocksdb.org/)** - Embedded key-value store for standalone
Pulsar deployments, eliminating the need for external coordination services.
+- **[Oxia](https://github.com/oxia-db/oxia/)** - A robust, scalable metadata
store and coordination system designed for large-scale distributed systems,
with built-in support for stream index storage to optimize real-time data
management.
+
+### Configuration
+
+You can configure the metadata store using the `metadataStoreUrl` parameter:
+
+```bash
+# ZooKeeper
+metadataStoreUrl=zk:my-zk-1:2181,my-zk-2:2181,my-zk-3:2181
+
+# etcd
+metadataStoreUrl=etcd:my-etcd-1:2379,my-etcd-2:2379,my-etcd-3:2379
+
+# RocksDB (standalone)
+metadataStoreUrl=rocksdb:///path/to/data
+
+# Oxia
+metadataStoreUrl=oxia:oxia-server:6648
+```
+
+### Deployment Considerations
+
+The Pulsar metadata store can be deployed on a separate cluster or integrated
with existing infrastructure. You can use one ZooKeeper cluster for both Pulsar
metadata store and BookKeeper metadata store. If you want to deploy Pulsar
brokers connected to an existing BookKeeper cluster, you need to deploy
separate clusters for Pulsar metadata store and BookKeeper metadata store
respectively.
In a Pulsar instance:
@@ -125,13 +152,19 @@ The **Pulsar proxy** provides a solution to this problem
by acting as a single g
> For the sake of performance and fault tolerance, you can run as many
> instances of the Pulsar proxy as you'd like.
-Architecturally, the Pulsar proxy gets all the information it requires from
ZooKeeper. When starting the proxy on a machine, you only need to provide
metadata store connection strings for the cluster-specific and instance-wide
configuration store clusters. Here's an example:
+Architecturally, the Pulsar proxy gets all the information it requires from
the metadata store. When starting the proxy on a machine, you only need to
provide metadata store connection strings for the cluster-specific and
instance-wide configuration store clusters. Here's an example:
```bash
cd /path/to/pulsar/directory
+# Using ZooKeeper
bin/pulsar proxy \
--metadata-store zk:my-zk-1:2181,my-zk-2:2181,my-zk-3:2181 \
--configuration-metadata-store zk:my-zk-1:2181,my-zk-2:2181,my-zk-3:2181
+
+# Using etcd
+bin/pulsar proxy \
+ --metadata-store etcd:my-etcd-1:2379,my-etcd-2:2379 \
+ --configuration-metadata-store etcd:my-etcd-1:2379,my-etcd-2:2379
```
> #### Pulsar proxy docs
diff --git
a/versioned_docs/version-4.0.x/concepts-broker-load-balancing-quick-start.md
b/versioned_docs/version-4.0.x/concepts-broker-load-balancing-quick-start.md
index db45b6a129a..d9e03007cc1 100644
--- a/versioned_docs/version-4.0.x/concepts-broker-load-balancing-quick-start.md
+++ b/versioned_docs/version-4.0.x/concepts-broker-load-balancing-quick-start.md
@@ -37,7 +37,7 @@ networks:
services:
# Start ZooKeeper
zookeeper:
- image: apachepulsar/pulsar:3.0.1
+ image: apachepulsar/pulsar:latest
container_name: zookeeper
restart: on-failure
networks:
@@ -61,7 +61,7 @@ services:
pulsar-init:
container_name: pulsar-init
hostname: pulsar-init
- image: apachepulsar/pulsar:3.0.1
+ image: apachepulsar/pulsar:latest
networks:
- pulsar
command: |
@@ -77,7 +77,7 @@ services:
# Start bookie
bookie:
- image: apachepulsar/pulsar:3.0.1
+ image: apachepulsar/pulsar:latest
container_name: bookie
restart: on-failure
networks:
@@ -99,7 +99,7 @@ services:
# Start broker 1
broker-1:
- image: apachepulsar/pulsar:3.0.1
+ image: apachepulsar/pulsar:latest
container_name: broker-1
hostname: broker-1
restart: on-failure
@@ -131,7 +131,7 @@ services:
# Start broker 2
broker-2:
- image: apachepulsar/pulsar:3.0.1
+ image: apachepulsar/pulsar:latest
container_name: broker-2
hostname: broker-2
restart: on-failure
diff --git a/versioned_docs/version-4.0.x/concepts-clients.md
b/versioned_docs/version-4.0.x/concepts-clients.md
index 735d3b4b188..5ba436849d9 100644
--- a/versioned_docs/version-4.0.x/concepts-clients.md
+++ b/versioned_docs/version-4.0.x/concepts-clients.md
@@ -117,3 +117,47 @@ Each TableView uses one Reader instance per partition, and
reads the topic start
The following figure illustrates the dynamic construction of a TableView
updated with newer values of each key.

+
+## Transactions
+
+Pulsar clients support transactions, which enable atomic operations across
multiple topics and partitions. Transactions provide exactly-once semantics:
either all operations within a transaction succeed together, or they all fail together.
+
+With transactions, Pulsar clients can:
+
+* **Atomic message production**: Produce messages to multiple topics
atomically within a transaction boundary.
+* **Atomic message acknowledgment**: Acknowledge messages within transaction
boundaries, ensuring processed messages are only committed when the transaction
succeeds.
+* **Cross-topic operations**: Perform operations spanning multiple topics as
part of a single atomic transaction.
+
+### Transaction workflow
+
+1. **Begin transaction**: Create a new transaction with configurable timeout.
+2. **Perform operations**: Send messages and acknowledge consumed messages
within the transaction context.
+3. **Commit or abort**: Either commit the transaction (making all operations
permanent) or abort it (rolling back all operations).
+
+Example transaction usage:
+
+```java
+// Requires a client built with enableTransaction(true) and brokers
+// running with transactionCoordinatorEnabled=true
+Transaction txn = client.newTransaction()
+        .withTransactionTimeout(1, TimeUnit.MINUTES)
+        .build().get();
+
+try {
+    // Send messages within the transaction
+    producer.newMessage(txn).value("message-1").send();
+    producer.newMessage(txn).value("message-2").send();
+
+    // Acknowledge a consumed message within the same transaction
+    consumer.acknowledgeAsync(messageId, txn).get();
+
+    // Commit: all sends and acks become visible atomically
+    txn.commit().get();
+} catch (Exception e) {
+    // Abort on failure: all sends and acks are rolled back
+    txn.abort().get();
+}
+```
+
+Transactions are particularly useful for building exactly-once processing
pipelines, ensuring data consistency across multiple Pulsar topics, and
implementing complex event processing patterns.
+
+For more details, see [Pulsar transactions](concepts-transactions.md).
diff --git a/versioned_docs/version-4.0.x/concepts-messaging.md
b/versioned_docs/version-4.0.x/concepts-messaging.md
index fe7c2106d45..223b03d8b52 100644
--- a/versioned_docs/version-4.0.x/concepts-messaging.md
+++ b/versioned_docs/version-4.0.x/concepts-messaging.md
@@ -1417,6 +1417,9 @@ delayedDeliveryTickTimeMillis=1000
# has passed, and they may be as late as the deliverAt time plus the
tickTimeMillis for the topic plus the
# delayedDeliveryTickTimeMillis.
isDelayedDeliveryDeliverAtTimeStrict=false
+
+# Maximum number of delayed messages per dispatcher. Once this limit is
reached, no more delayed messages are allowed.
+maxNumDelayedDeliveryTrackerMemoryEntries=100000
```
### Producer
diff --git a/versioned_docs/version-4.0.x/concepts-overview.md
b/versioned_docs/version-4.0.x/concepts-overview.md
index 135d3f00365..8cde9bb68db 100644
--- a/versioned_docs/version-4.0.x/concepts-overview.md
+++ b/versioned_docs/version-4.0.x/concepts-overview.md
@@ -14,12 +14,14 @@ Key features of Pulsar are listed below:
* Native support for multiple clusters in a Pulsar instance, with seamless
[geo-replication](administration-geo.md) of messages across clusters.
* Very low publish and end-to-end latency.
* Seamless scalability to over a million topics.
-* A simple [client API](concepts-clients.md) with bindings for
[Java](client-libraries-java.md), [Go](client-libraries-go.md),
[Python](client-libraries-python.md) and [C++](client-libraries-cpp.md).
-* Multiple [subscription types](concepts-messaging.md#subscription-types)
([exclusive](concepts-messaging.md#exclusive),
[shared](concepts-messaging.md#shared), and
[failover](concepts-messaging.md#failover)) for topics.
+* A simple [client API](concepts-clients.md) with bindings for
[Java](client-libraries-java.md), [Go](client-libraries-go.md),
[Python](client-libraries-python.md), [C++](client-libraries-cpp.md),
[C#/.NET](client-libraries-dotnet.md), [Node.js](client-libraries-node.md), and
[WebSocket](client-libraries-websocket.md).
+* Multiple [subscription types](concepts-messaging.md#subscription-types)
([exclusive](concepts-messaging.md#exclusive),
[shared](concepts-messaging.md#shared),
[failover](concepts-messaging.md#failover), and
[key_shared](concepts-messaging.md#key_shared)) for topics.
* Guaranteed message delivery with [persistent message
storage](concepts-architecture-overview.md#persistent-storage) provided by
[Apache BookKeeper](http://bookkeeper.apache.org/).
-A serverless lightweight computing framework [Pulsar
Functions](functions-overview.md) offers the capability for stream-native data
processing.
+* A serverless lightweight computing framework [Pulsar
Functions](functions-overview.md) offers the capability for stream-native data
processing.
* A serverless connector framework [Pulsar IO](io-overview.md), which is built
on Pulsar Functions, makes it easier to move data in and out of Apache Pulsar.
* [Tiered Storage](tiered-storage-overview.md) offloads data from hot/warm
storage to cold/long-term storage (such as S3 and GCS) when the data is aging
out.
+* Native support for [transactions](concepts-transactions.md) enabling atomic
operations across topics and partitions.
+* Flexible [authentication and authorization](concepts-authentication.md) with
support for multiple providers including OAuth/OIDC.
## Contents
diff --git a/versioned_docs/version-4.0.x/concepts-replication.md
b/versioned_docs/version-4.0.x/concepts-replication.md
index 934659198fd..2916b6099e4 100644
--- a/versioned_docs/version-4.0.x/concepts-replication.md
+++ b/versioned_docs/version-4.0.x/concepts-replication.md
@@ -33,7 +33,7 @@ In synchronous geo-replication, data is synchronously
replicated to multiple dat

-Synchronous geo-replication in Pulsar is achieved by BookKeeper. A synchronous
geo-replicated cluster consists of a cluster of bookies and a cluster of
brokers that run in multiple data centers, and a global Zookeeper installation
(a ZooKeeper ensemble is running across multiple data centers). You need to
configure a BookKeeper region-aware placement policy to store data across
multiple data centers and guarantee availability constraints on writes.
+Synchronous geo-replication in Pulsar is achieved by BookKeeper. A synchronous
geo-replicated cluster consists of a cluster of bookies and a cluster of
brokers that run in multiple data centers, and a global metadata store
installation (such as ZooKeeper, etcd, or other supported metadata stores
running across multiple data centers). You need to configure a BookKeeper
region-aware placement policy to store data across multiple data centers and
guarantee availability constraints on writes.
Synchronous geo-replication provides the highest availability and also
guarantees stronger data consistency between different data centers. However,
your applications have to pay an extra latency penalty across data centers.
diff --git a/versioned_docs/version-4.0.x/concepts-tiered-storage.md
b/versioned_docs/version-4.0.x/concepts-tiered-storage.md
index 559410ef12a..3ca528fa203 100644
--- a/versioned_docs/version-4.0.x/concepts-tiered-storage.md
+++ b/versioned_docs/version-4.0.x/concepts-tiered-storage.md
@@ -10,8 +10,116 @@ One way to alleviate this cost is to use Tiered Storage.
With tiered storage, ol

+## How Tiered Storage Works
+
+Tiered storage leverages Pulsar's segment-based architecture where data is
stored in immutable segments (ledgers) in BookKeeper. When segments are sealed
and become read-only, they can be safely offloaded to external storage systems.
+
+### Offloading Process
+
+1. **Segment Sealing**: When a BookKeeper ledger is closed (due to size
limits, time limits, or manual triggers), it becomes immutable.
+2. **Eligibility Check**: The broker determines which segments are eligible
for offloading based on configured policies.
+3. **Data Transfer**: Eligible segments are copied to the configured long-term
storage backend.
+4. **Metadata Update**: BookKeeper metadata is updated to reference the
external storage location.
+5. **Local Deletion**: After a configurable delay (default: 4 hours), the
original data is deleted from BookKeeper.
+
+### Transparent Access
+
+Consumers can access offloaded data transparently. When a consumer requests
data that has been offloaded:
+- The broker checks BookKeeper metadata to determine the storage location
+- If data is in external storage, the broker retrieves it seamlessly
+- The consumer receives the data as if it were still in BookKeeper
+
+## Supported Storage Backends
+
+Pulsar supports multiple storage backends for tiered storage:
+
+### Cloud Storage Providers
+
+- **Amazon S3**: Industry-standard object storage with multiple storage classes
+- **Google Cloud Storage (GCS)**: Google's object storage with lifecycle
management
+- **Microsoft Azure Blob Storage**: Azure's object storage solution
+- **Alibaba Cloud OSS**: Alibaba's object storage service
+
+### On-Premises Solutions
+
+- **Filesystem**: Local or network-attached storage for on-premises deployments
+- **S3-Compatible Storage**: MinIO, Ceph, and other S3-compatible solutions
+
+### Storage Classes and Cost Optimization
+
+Different storage backends offer various storage classes:
+- **Hot storage**: Immediate access, higher cost (S3 Standard, GCS Standard)
+- **Cool storage**: Infrequent access, lower cost (S3 IA, GCS Nearline)
+- **Cold storage**: Archive storage, lowest cost (S3 Glacier, GCS Coldline)
+
+## Configuration and Policies
+
+### Offloading Triggers
+
+Tiered storage can be triggered by:
+
+1. **Size-based policies**: Offload when topic backlog exceeds a certain size
+2. **Time-based policies**: Offload data older than a specified age
+3. **Manual triggers**: Administrative commands via REST API or CLI
+4. **Namespace-level policies**: Automatic offloading based on namespace
configuration
+
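+The manual and namespace-level triggers above can be driven with
+`pulsar-admin`; a sketch (the topic and namespace names are placeholders):
+
+```bash
+# Manual trigger: keep at most 10G of backlog in BookKeeper,
+# offloading everything older to long-term storage
+bin/pulsar-admin topics offload --size-threshold 10G \
+  persistent://public/default/my-topic
+
+# Check how the offload operation is progressing
+bin/pulsar-admin topics offload-status persistent://public/default/my-topic
+
+# Namespace-level policy: offload automatically once backlog exceeds 10G
+bin/pulsar-admin namespaces set-offload-threshold --size 10G public/default
+```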
+### Key Configuration Parameters
+
+- **Offload threshold**: Minimum backlog size before offloading begins
+- **Offload deletion lag**: Delay before deleting data from BookKeeper after
offloading
+- **Max block size**: Maximum size of data blocks uploaded to external storage
+- **Read buffer size**: Buffer size for reading offloaded data
+- **Offload driver**: Backend storage driver configuration
+
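+The parameters above correspond to broker-level offload settings; a minimal
+`broker.conf` sketch for the S3 driver (the bucket and region values are
+placeholders, not recommendations):
+
+```bash
+managedLedgerOffloadDriver=aws-s3
+s3ManagedLedgerOffloadBucket=pulsar-offload
+s3ManagedLedgerOffloadRegion=eu-west-1
+# Maximum size of blocks uploaded to S3 (64 MB)
+s3ManagedLedgerOffloadMaxBlockSizeInBytes=67108864
+# Buffer size for reading back offloaded data (1 MB)
+s3ManagedLedgerOffloadReadBufferSizeInBytes=1048576
+# Delete from BookKeeper 4 hours after a successful offload
+managedLedgerOffloadDeletionLagMs=14400000
+```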
+## Performance Considerations
+
+### Read Performance
+
+- **First access**: Reading offloaded data is slower than BookKeeper due to
network latency
+- **Caching**: Some implementations cache frequently accessed offloaded data
+- **Prefetching**: Brokers may prefetch data based on access patterns
+
+### Write Performance
+
+- **Asynchronous offloading**: Offloading occurs in the background without
affecting write performance
+- **Parallel transfers**: Multiple segments can be offloaded concurrently
+- **Bandwidth management**: Configurable limits to prevent offloading from
overwhelming network resources
+
+### Cost Benefits
+
> Data written to BookKeeper is replicated to 3 physical machines by default.
> However, once a segment is sealed in BookKeeper it becomes immutable and can
> be copied to long term storage. Long term storage can achieve cost savings
> by using mechanisms such as [Reed-Solomon error
> correction](https://en.wikipedia.org/wiki/Reed%E2%80%93Solomon_error_correction)
> to require fewer physical copies of data.
-Pulsar currently supports S3, Google Cloud Storage (GCS), and filesystem for
[long-term storage](cookbooks-tiered-storage.md). Offloading to long-term
storage is triggered via a Rest API or command line interface. The user passes
in the amount of topic data they wish to retain on BookKeeper, and the broker
will copy the backlog data to long-term storage. The original data will then be
deleted from BookKeeper after a configured delay (4 hours by default).
+Cost savings come from:
+- **Reduced replication**: External storage typically requires fewer replicas
+- **Storage class optimization**: Use cheaper storage classes for older data
+- **Operational efficiency**: Reduced BookKeeper cluster storage requirements
+
+## Use Cases and Benefits
+
+### Long-term Data Retention
+
+- **Compliance requirements**: Meet regulatory requirements for data retention
+- **Historical analysis**: Maintain access to historical data for analytics
+- **Audit trails**: Preserve message history for compliance and auditing
+
+### Cost Optimization
+
+- **Storage cost reduction**: Move cold data to cheaper storage tiers
+- **Operational efficiency**: Reduce BookKeeper storage requirements
+- **Elastic scaling**: Handle varying data volumes without over-provisioning
+
+## Management and Operations
+
+### Monitoring
+
+- **Offloading metrics**: Track offloading progress and performance
+- **Storage usage**: Monitor storage consumption across tiers
+- **Access patterns**: Analyze data access patterns for optimization
+
+### Automation
+
+- **Policy-based management**: Automated offloading based on predefined
policies
+- **Lifecycle management**: Integration with cloud provider lifecycle policies
+- **Alerting**: Notifications for offloading failures or threshold breaches
-> For a guide for setting up tiered storage, see the [Tiered storage
cookbook](cookbooks-tiered-storage.md).
+> For detailed setup instructions and configuration examples, see the [Tiered
storage cookbook](cookbooks-tiered-storage.md).
diff --git a/versioned_docs/version-4.0.x/concepts-topic-compaction.md
b/versioned_docs/version-4.0.x/concepts-topic-compaction.md
index 5dfc7396df1..6075a97d810 100644
--- a/versioned_docs/version-4.0.x/concepts-topic-compaction.md
+++ b/versioned_docs/version-4.0.x/concepts-topic-compaction.md
@@ -32,17 +32,17 @@ When topic compaction is triggered [via the
CLI](cookbooks-compaction.md), it wo
2. After that, the broker will create a new [BookKeeper
ledger](concepts-architecture-overview.md#ledgers) and make a second iteration
through each message on the topic. For each message:
- - If the key matches the latest occurrence of that key, then the key's
data payload, message ID, and metadata will be written to the newly created
ledger.
-
- - If the key doesn't match the latest then the message will be skipped and
left alone.
-
- - If any given message has an empty payload, it will be skipped and
considered deleted (akin to the concept of
[tombstones](https://en.wikipedia.org/wiki/Tombstone_(data_store)) in key-value
databases).
-
-3. At the end of this second iteration through the topic, the newly created
BookKeeper ledger is closed and two things are written to the topic's metadata:
+ - If the key matches the latest occurrence of that key, then the key's
data payload, message ID, and metadata will be written to the newly created
ledger.
+
+ - If the key doesn't match the latest then the message will be skipped and
left alone.
+
+ - If any given message has an empty payload, it will be skipped and
considered deleted (akin to the concept of
[tombstones](https://en.wikipedia.org/wiki/Tombstone_(data_store)) in key-value
databases).
+
+3. At the end of this second iteration through the topic, the newly created
BookKeeper ledger is closed and two things are written to the topic's metadata:
- The ID of the BookKeeper ledger
- - The message ID of the last compacted message (this is known as the
**compaction horizon** of the topic).
-
+ - The message ID of the last compacted message (this is known as the
**compaction horizon** of the topic).
+
Once this metadata is written compaction is complete.
4. After the initial compaction operation, the Pulsar
[broker](concepts-architecture-overview.md#brokers) that owns the topic is
notified whenever any future changes are made to the compaction horizon and
compacted backlog. When such changes occur:
@@ -51,4 +51,28 @@ When topic compaction is triggered [via the
CLI](cookbooks-compaction.md), it wo
* Read from the topic like normal (if the message ID is greater than or
equal to the compaction horizon) or
* Read beginning at the compaction horizon (if the message ID is lower
than the compaction horizon)
+## Compaction Configuration
+
+Topic compaction behavior can be configured through various broker settings:
+
+### Key Configuration Parameters
+
+- **`compactionRetainNullKey`**: Controls whether null keys are retained
during compaction. When set to `true`, messages with null keys are preserved in
the compacted ledger. When `false` (default), messages with null keys are
treated as non-key messages and may be removed during compaction.
+
+- **`brokerServiceCompactionThreshold`**: The threshold size (in bytes) that
triggers automatic compaction for a topic's backlog. When the topic's backlog
exceeds this size, compaction will be triggered automatically.
+
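+Compaction can be triggered and monitored with `pulsar-admin`; a sketch (the
+topic and namespace names are placeholders):
+
+```bash
+# Trigger compaction for a single topic
+bin/pulsar-admin topics compact persistent://public/default/my-topic
+
+# Check the status of the compaction run
+bin/pulsar-admin topics compaction-status persistent://public/default/my-topic
+
+# Compact automatically once a namespace's backlog exceeds 100M
+bin/pulsar-admin namespaces set-compaction-threshold --threshold 100M public/default
+```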
+### Null Key Handling
+
+The `compactionRetainNullKey` parameter is particularly important for topics
that contain messages without keys. This configuration determines how the
compaction process handles such messages:
+
+- **Enabled (`true`)**: Messages with null keys are preserved during
compaction, ensuring that unkeyed messages remain accessible in the compacted
view.
+- **Disabled (`false`)**: Messages with null keys may be removed during
compaction, as they cannot be properly deduplicated without a key.
+
+This setting is useful for topics that mix keyed and unkeyed messages,
allowing administrators to control whether unkeyed messages should be retained
in the compacted topic.
+
+### Performance Considerations
+
+- **Compaction frequency**: Balance between storage savings and CPU/I/O
overhead
+- **Topic size**: Larger topics take longer to compact but may benefit more
from compaction
+- **Key distribution**: Topics with many unique keys benefit less from
compaction
diff --git a/versioned_docs/version-4.0.x/concepts-transactions.md
b/versioned_docs/version-4.0.x/concepts-transactions.md
index 808eeccd08f..838981984a3 100644
--- a/versioned_docs/version-4.0.x/concepts-transactions.md
+++ b/versioned_docs/version-4.0.x/concepts-transactions.md
@@ -27,3 +27,106 @@ Messages produced within a transaction are stored in the
transaction buffer. The
Message acknowledges within a transaction are maintained by the pending
acknowledge state before the transaction completes. If a message is in the
pending acknowledge state, the message cannot be acknowledged by other
transactions until the message is removed from the pending acknowledge state.
The pending acknowledge state is persisted in the pending acknowledge log. The
pending acknowledge log is backed by a Pulsar topic. A new broker can restore
the state from the pending acknowledge log to ensure the acknowledgment is not
lost.
+
+## Performance Optimizations
+
+### Transaction Log Batching
+
+Pulsar supports batched writing to transaction logs to improve performance and
reduce the overhead of maintaining transaction state. When enabled, multiple
transaction log entries are batched together before writing to storage,
reducing I/O operations and improving throughput.
+
+**Key configuration parameters:**
+- `transactionLogBatchedWriteEnabled`: Enable batched writing for transaction
logs
+- `transactionLogBatchedWriteMaxRecords`: Maximum number of records in a batch
+- `transactionLogBatchedWriteMaxSize`: Maximum size of a batch in bytes
+- `transactionLogBatchedWriteMaxDelayInMillis`: Maximum delay before flushing
a batch
+
+### Pending Acknowledgment Batching
+
+To improve performance when handling large numbers of pending acknowledgments,
Pulsar supports batching of pending acknowledgment operations. This reduces the
overhead of maintaining pending ack state for high-throughput transactional
workloads.
+
+### Segmented Transaction Buffer Snapshots
+
+For handling large numbers of aborted transactions efficiently, Pulsar
implements segmented snapshot functionality. This feature helps manage
transaction buffer snapshots more effectively when dealing with scenarios
involving many aborted transactions.
+
+**Benefits:**
+- Improved memory management for transaction buffers
+- Better handling of abort scenarios with large transaction volumes
+- Reduced snapshot overhead for transaction recovery
+
+### Transaction Buffer Performance Tuning
+
+Recent improvements include enhanced transaction buffer configurations for
performance tuning:
+
+- **Buffer size optimization**: Configurable buffer sizes for different
workload patterns
+- **Batch processing**: Improved batching within transaction buffers
+- **Memory management**: Better memory allocation strategies for transaction
data
+
+## Transaction Isolation and Consistency
+
+### Read Committed Isolation
+
+Pulsar transactions provide a **read committed** isolation level, ensuring that:
+- Consumers only see messages from committed transactions
+- Uncommitted messages remain invisible until transaction commit
+- Aborted transactions have their messages discarded automatically
+
+### Cross-Partition Consistency
+
+Transactions in Pulsar can span multiple topics and partitions while
maintaining consistency:
+- **Atomic commits**: All operations within a transaction succeed or fail
together
+- **Coordinator-managed state**: Transaction coordinator ensures consistent
state across partitions
+- **Failure recovery**: System can recover to consistent state after
coordinator failures
+
+## Transaction Timeouts and Recovery
+
+### Timeout Management
+
+The transaction coordinator handles transaction timeouts to prevent
indefinitely hanging transactions:
+- **Configurable timeouts**: Set appropriate timeout values for different use
cases
+- **Automatic abort**: Transactions are automatically aborted when they exceed
timeout
+- **Resource cleanup**: Timed-out transactions have their resources cleaned up
automatically
+
+### Coordinator Recovery
+
+When a transaction coordinator fails, recovery mechanisms ensure transaction
consistency:
+- **State restoration**: Transaction state is restored from transaction logs
+- **Pending transaction handling**: In-progress transactions are properly
handled during recovery
+- **Metadata consistency**: Transaction metadata remains consistent across
coordinator restarts
+
+## Configuration and Best Practices
+
+### Key Configuration Parameters
+
+#### Core Transaction Settings
+- `transactionCoordinatorEnabled`: Enable transaction coordinator in broker
(default: `false`)
+- `transactionMetadataStoreProviderClassName`: Transaction metadata store
provider class (default:
`org.apache.pulsar.transaction.coordinator.impl.MLTransactionMetadataStoreProvider`)
+- `transactionBufferProviderClassName`: Transaction buffer provider class
(default:
`org.apache.pulsar.broker.transaction.buffer.impl.TopicTransactionBufferProvider`)
+- `transactionPendingAckStoreProviderClassName`: Transaction pending ack store
provider class (default:
`org.apache.pulsar.broker.transaction.pendingack.impl.MLPendingAckStoreProvider`)
+
+#### Batched Write Settings
+- `transactionLogBatchedWriteEnabled`: Enable batched transaction log writes
for better efficiency (default: `false`)
+- `transactionLogBatchedWriteMaxRecords`: Maximum log records count in a batch
(default: `512`)
+- `transactionLogBatchedWriteMaxSize`: Maximum bytes size in a batch (default:
`4194304` - 4 MB)
+- `transactionLogBatchedWriteMaxDelayInMillis`: Maximum wait time for first
record in batch (default: `1`)
+- `transactionPendingAckBatchedWriteEnabled`: Enable batched writes for
pending ack store (default: `false`)
+
+#### Buffer and Snapshot Settings
+- `transactionBufferSegmentedSnapshotEnabled`: Enable segmented buffer
snapshots for handling large numbers of aborted transactions (default: `false`)
+- `transactionBufferSnapshotMaxTransactionCount`: Take snapshot after this
many transaction operations (default: `1000`)
+- `transactionBufferSnapshotMinTimeInMillis`: Interval for taking snapshots in
milliseconds (default: `5000`)
+- `transactionBufferSnapshotSegmentSize`: Size of snapshot segment in bytes
(default: `262144` - 256 KB)
+
+#### Performance and Limits
+- `maxActiveTransactionsPerCoordinator`: Maximum active transactions per
coordinator (default: `0` - no limit)
+- `numTransactionReplayThreadPoolSize`: Thread pool size for transaction
replay (default: number of CPU cores)
+- `transactionBufferClientMaxConcurrentRequests`: Maximum concurrent requests
for buffer client (default: `1000`)
+- `transactionBufferClientOperationTimeoutInMills`: Buffer client operation
timeout in milliseconds (default: `3000`)
+
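+A minimal `broker.conf` sketch that enables transactions together with the
+batching and snapshot optimizations described above (the values are
+illustrative, not tuned recommendations):
+
+```bash
+# Required for any transactional use
+transactionCoordinatorEnabled=true
+
+# Batch transaction log and pending-ack writes for throughput
+transactionLogBatchedWriteEnabled=true
+transactionPendingAckBatchedWriteEnabled=true
+
+# Segmented snapshots for workloads with many aborted transactions
+transactionBufferSegmentedSnapshotEnabled=true
+```
+
+On an existing cluster, the coordinator metadata also needs to be initialized
+before first use (for example with `bin/pulsar initialize-transaction-coordinator-metadata`).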
+### Performance Considerations
+
+- **Batch size tuning**: Optimize batch sizes for your workload characteristics
+- **Coordinator placement**: Distribute transaction coordinators appropriately
+- **Resource allocation**: Ensure adequate resources for transaction processing
+- **Monitoring**: Monitor transaction metrics for performance optimization
+
+For detailed configuration and usage examples, see [Pulsar
transactions](txn-what.md).
diff --git a/versioned_docs/version-4.0.x/helm-deploy.md
b/versioned_docs/version-4.0.x/helm-deploy.md
index 5be5ec2ac68..2aaf7cfce6d 100644
--- a/versioned_docs/version-4.0.x/helm-deploy.md
+++ b/versioned_docs/version-4.0.x/helm-deploy.md
@@ -134,27 +134,27 @@ The Pulsar Helm Chart is designed to enable controlled
upgrades. So it can confi
images:
zookeeper:
repository: apachepulsar/pulsar-all
- tag: @pulsar:version@
+ tag: latest
pullPolicy: IfNotPresent
bookie:
repository: apachepulsar/pulsar-all
- tag: @pulsar:version@
+ tag: latest
pullPolicy: IfNotPresent
autorecovery:
repository: apachepulsar/pulsar-all
- tag: @pulsar:version@
+ tag: latest
pullPolicy: IfNotPresent
broker:
repository: apachepulsar/pulsar-all
- tag: @pulsar:version@
+ tag: latest
pullPolicy: IfNotPresent
proxy:
repository: apachepulsar/pulsar-all
- tag: @pulsar:version@
+ tag: latest
pullPolicy: IfNotPresent
functions:
repository: apachepulsar/pulsar-all
- tag: @pulsar:version@
+ tag: latest
pulsar_manager:
repository: apachepulsar/pulsar-manager
tag: v0.3.0
@@ -172,7 +172,7 @@ pulsar_metadata:
component: pulsar-init
image:
repository: apachepulsar/pulsar-all
- tag: @pulsar:version@
+ tag: latest
pullPolicy: IfNotPresent
```