This is an automated email from the ASF dual-hosted git repository.
lhotari pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/pulsar-site.git
The following commit(s) were added to refs/heads/main by this push:
new 7dbb370aa65 Apply #1034 changes to version-4.0.x docs
7dbb370aa65 is described below
commit 7dbb370aa6559b1c382405e00f060b4272645e46
Author: Lari Hotari <[email protected]>
AuthorDate: Tue Sep 9 09:58:59 2025 +0300
Apply #1034 changes to version-4.0.x docs
---
.../concepts-architecture-overview.md | 43 +++++++-
.../concepts-broker-load-balancing-quick-start.md | 10 +-
versioned_docs/version-4.0.x/concepts-clients.md | 44 ++++++++
versioned_docs/version-4.0.x/concepts-messaging.md | 3 +
versioned_docs/version-4.0.x/concepts-overview.md | 8 +-
.../version-4.0.x/concepts-replication.md | 2 +-
.../version-4.0.x/concepts-tiered-storage.md | 112 ++++++++++++++++++++-
.../version-4.0.x/concepts-topic-compaction.md | 42 ++++++--
.../version-4.0.x/concepts-transactions.md | 103 +++++++++++++++++++
versioned_docs/version-4.0.x/helm-deploy.md | 14 +--
10 files changed, 349 insertions(+), 32 deletions(-)
diff --git a/versioned_docs/version-4.0.x/concepts-architecture-overview.md
b/versioned_docs/version-4.0.x/concepts-architecture-overview.md
index 0b5d40b730d..93d29c13955 100644
--- a/versioned_docs/version-4.0.x/concepts-architecture-overview.md
+++ b/versioned_docs/version-4.0.x/concepts-architecture-overview.md
@@ -9,9 +9,9 @@ At the highest level, a Pulsar instance is composed of one or
more Pulsar cluste
A Pulsar cluster consists of the following components:
-* One or more brokers handles and [load
balances](administration-load-balance.md) incoming messages from producers,
dispatches messages to consumers, communicates with the Pulsar configuration
store to handle various coordination tasks, stores messages in BookKeeper
instances (aka bookies), relies on a cluster-specific ZooKeeper cluster for
certain tasks, and more.
+* One or more brokers handle and [load
balance](administration-load-balance.md) incoming messages from producers,
dispatch messages to consumers, communicate with the Pulsar metadata store
to handle various coordination tasks, and store messages in BookKeeper instances
(aka bookies).
* A BookKeeper cluster consisting of one or more bookies handles [persistent
storage](#persistent-storage) of messages.
-* A ZooKeeper cluster specific to that cluster handles coordination tasks
between Pulsar clusters.
+* A metadata store cluster (ZooKeeper, etcd, or other supported backend)
handles coordination tasks and cluster-specific metadata storage.
The diagram below illustrates a Pulsar cluster:
@@ -46,9 +46,36 @@ Clusters can replicate among themselves using
[geo-replication](concepts-replica
## Metadata store
-The Pulsar metadata store maintains all the metadata of a Pulsar cluster, such
as topic metadata, schema, broker load data, and so on. Pulsar uses [Apache
ZooKeeper](https://zookeeper.apache.org/) for metadata storage, cluster
configuration, and coordination. The Pulsar metadata store can be deployed on a
separate ZooKeeper cluster or deployed on an existing ZooKeeper cluster. You
can use one ZooKeeper cluster for both Pulsar metadata store and BookKeeper
metadata store. If you want to d [...]
+The Pulsar metadata store maintains all the metadata of a Pulsar cluster, such
as topic metadata, schema, broker load data, and so on. Pulsar supports
multiple metadata store backends to provide flexibility in deployment
architectures and operational requirements:
-> Pulsar also supports more metadata backend services, including
[etcd](https://etcd.io/) and [RocksDB](http://rocksdb.org/) (for standalone
Pulsar only).
+### Supported Metadata Store Backends
+
+- **[Apache ZooKeeper](https://zookeeper.apache.org/)** - Default option,
production-ready metadata store with strong consistency guarantees.
+- **[etcd](https://etcd.io/)** - Cloud-native distributed key-value store,
ideal for Kubernetes environments and cloud deployments.
+- **[RocksDB](http://rocksdb.org/)** - Embedded key-value store for standalone
Pulsar deployments, eliminating the need for external coordination services.
+- **[Oxia](https://github.com/oxia-db/oxia/)** - A robust, scalable metadata
store and coordination system designed for large-scale distributed systems,
with built-in support for stream index storage to optimize real-time data
management.
+
+### Configuration
+
+You can configure the metadata store using the `metadataStoreUrl` parameter:
+
+```bash
+# ZooKeeper
+metadataStoreUrl=zk:my-zk-1:2181,my-zk-2:2181,my-zk-3:2181
+
+# etcd
+metadataStoreUrl=etcd:my-etcd-1:2379,my-etcd-2:2379,my-etcd-3:2379
+
+# RocksDB (standalone)
+metadataStoreUrl=rocksdb:///path/to/data
+
+# Oxia
+metadataStoreUrl=oxia:oxia-server:6648
+```
+
+### Deployment Considerations
+
+The Pulsar metadata store can be deployed on a separate cluster or integrated
with existing infrastructure. You can use one ZooKeeper cluster for both Pulsar
metadata store and BookKeeper metadata store. If you want to deploy Pulsar
brokers connected to an existing BookKeeper cluster, you need to deploy
separate clusters for Pulsar metadata store and BookKeeper metadata store
respectively.
In a Pulsar instance:
@@ -125,13 +152,19 @@ The **Pulsar proxy** provides a solution to this problem
by acting as a single g
> For the sake of performance and fault tolerance, you can run as many
> instances of the Pulsar proxy as you'd like.
-Architecturally, the Pulsar proxy gets all the information it requires from
ZooKeeper. When starting the proxy on a machine, you only need to provide
metadata store connection strings for the cluster-specific and instance-wide
configuration store clusters. Here's an example:
+Architecturally, the Pulsar proxy gets all the information it requires from
the metadata store. When starting the proxy on a machine, you only need to
provide metadata store connection strings for the cluster-specific and
instance-wide configuration store clusters. Here's an example:
```bash
cd /path/to/pulsar/directory
+# Using ZooKeeper
bin/pulsar proxy \
--metadata-store zk:my-zk-1:2181,my-zk-2:2181,my-zk-3:2181 \
--configuration-metadata-store zk:my-zk-1:2181,my-zk-2:2181,my-zk-3:2181
+
+# Using etcd
+bin/pulsar proxy \
+ --metadata-store etcd:my-etcd-1:2379,my-etcd-2:2379 \
+ --configuration-metadata-store etcd:my-etcd-1:2379,my-etcd-2:2379
```
> #### Pulsar proxy docs
diff --git
a/versioned_docs/version-4.0.x/concepts-broker-load-balancing-quick-start.md
b/versioned_docs/version-4.0.x/concepts-broker-load-balancing-quick-start.md
index db45b6a129a..d9e03007cc1 100644
--- a/versioned_docs/version-4.0.x/concepts-broker-load-balancing-quick-start.md
+++ b/versioned_docs/version-4.0.x/concepts-broker-load-balancing-quick-start.md
@@ -37,7 +37,7 @@ networks:
services:
# Start ZooKeeper
zookeeper:
- image: apachepulsar/pulsar:3.0.1
+ image: apachepulsar/pulsar:latest
container_name: zookeeper
restart: on-failure
networks:
@@ -61,7 +61,7 @@ services:
pulsar-init:
container_name: pulsar-init
hostname: pulsar-init
- image: apachepulsar/pulsar:3.0.1
+ image: apachepulsar/pulsar:latest
networks:
- pulsar
command: |
@@ -77,7 +77,7 @@ services:
# Start bookie
bookie:
- image: apachepulsar/pulsar:3.0.1
+ image: apachepulsar/pulsar:latest
container_name: bookie
restart: on-failure
networks:
@@ -99,7 +99,7 @@ services:
# Start broker 1
broker-1:
- image: apachepulsar/pulsar:3.0.1
+ image: apachepulsar/pulsar:latest
container_name: broker-1
hostname: broker-1
restart: on-failure
@@ -131,7 +131,7 @@ services:
# Start broker 2
broker-2:
- image: apachepulsar/pulsar:3.0.1
+ image: apachepulsar/pulsar:latest
container_name: broker-2
hostname: broker-2
restart: on-failure
diff --git a/versioned_docs/version-4.0.x/concepts-clients.md
b/versioned_docs/version-4.0.x/concepts-clients.md
index 735d3b4b188..5ba436849d9 100644
--- a/versioned_docs/version-4.0.x/concepts-clients.md
+++ b/versioned_docs/version-4.0.x/concepts-clients.md
@@ -117,3 +117,47 @@ Each TableView uses one Reader instance per partition, and
reads the topic start
The following figure illustrates the dynamic construction of a TableView
updated with newer values of each key.

+
+## Transactions
+
+Pulsar clients support transactions, which enable atomic operations across
multiple topics and partitions. Transactions provide exactly-once semantics:
either all operations within a transaction succeed together, or they all fail together.
+
+With transactions, Pulsar clients can:
+
+* **Atomic message production**: Produce messages to multiple topics
atomically within a transaction boundary.
+* **Atomic message acknowledgment**: Acknowledge messages within transaction
boundaries, ensuring processed messages are only committed when the transaction
succeeds.
+* **Cross-topic operations**: Perform operations spanning multiple topics as
part of a single atomic transaction.
+
+### Transaction workflow
+
+1. **Begin transaction**: Create a new transaction with configurable timeout.
+2. **Perform operations**: Send messages and acknowledge consumed messages
within the transaction context.
+3. **Commit or abort**: Either commit the transaction (making all operations
permanent) or abort it (rolling back all operations).
+
+Example transaction usage:
+
+```java
+// Requires a client built with enableTransaction(true) and brokers
+// running with transactionCoordinatorEnabled=true
+Transaction txn = client.newTransaction()
+        .withTransactionTimeout(1, TimeUnit.MINUTES)
+        .build().get();
+
+try {
+    // Send messages within the transaction
+    producer.newMessage(txn).value("message-1").send();
+    producer.newMessage(txn).value("message-2").send();
+
+    // Acknowledge a consumed message within the same transaction
+    consumer.acknowledgeAsync(messageId, txn).get();
+
+    // Commit: all sends and acks become visible atomically
+    txn.commit().get();
+} catch (Exception e) {
+    // Abort on failure: all sends and acks are rolled back
+    txn.abort().get();
+}
+```
+
+Transactions are particularly useful for building exactly-once processing
pipelines, ensuring data consistency across multiple Pulsar topics, and
implementing complex event processing patterns.
+
+For more details, see [Pulsar transactions](concepts-transactions.md).
diff --git a/versioned_docs/version-4.0.x/concepts-messaging.md
b/versioned_docs/version-4.0.x/concepts-messaging.md
index fe7c2106d45..223b03d8b52 100644
--- a/versioned_docs/version-4.0.x/concepts-messaging.md
+++ b/versioned_docs/version-4.0.x/concepts-messaging.md
@@ -1417,6 +1417,9 @@ delayedDeliveryTickTimeMillis=1000
# has passed, and they may be as late as the deliverAt time plus the
tickTimeMillis for the topic plus the
# delayedDeliveryTickTimeMillis.
isDelayedDeliveryDeliverAtTimeStrict=false
+
+# Maximum number of delayed messages per dispatcher. Once this limit is
reached, no more delayed messages are allowed.
+maxNumDelayedDeliveryTrackerMemoryEntries=100000
```
### Producer
diff --git a/versioned_docs/version-4.0.x/concepts-overview.md
b/versioned_docs/version-4.0.x/concepts-overview.md
index 135d3f00365..8cde9bb68db 100644
--- a/versioned_docs/version-4.0.x/concepts-overview.md
+++ b/versioned_docs/version-4.0.x/concepts-overview.md
@@ -14,12 +14,14 @@ Key features of Pulsar are listed below:
* Native support for multiple clusters in a Pulsar instance, with seamless
[geo-replication](administration-geo.md) of messages across clusters.
* Very low publish and end-to-end latency.
* Seamless scalability to over a million topics.
-* A simple [client API](concepts-clients.md) with bindings for
[Java](client-libraries-java.md), [Go](client-libraries-go.md),
[Python](client-libraries-python.md) and [C++](client-libraries-cpp.md).
-* Multiple [subscription types](concepts-messaging.md#subscription-types)
([exclusive](concepts-messaging.md#exclusive),
[shared](concepts-messaging.md#shared), and
[failover](concepts-messaging.md#failover)) for topics.
+* A simple [client API](concepts-clients.md) with bindings for
[Java](client-libraries-java.md), [Go](client-libraries-go.md),
[Python](client-libraries-python.md), [C++](client-libraries-cpp.md),
[C#/.NET](client-libraries-dotnet.md), [Node.js](client-libraries-node.md), and
[WebSocket](client-libraries-websocket.md).
+* Multiple [subscription types](concepts-messaging.md#subscription-types)
([exclusive](concepts-messaging.md#exclusive),
[shared](concepts-messaging.md#shared),
[failover](concepts-messaging.md#failover), and
[key_shared](concepts-messaging.md#key_shared)) for topics.
* Guaranteed message delivery with [persistent message
storage](concepts-architecture-overview.md#persistent-storage) provided by
[Apache BookKeeper](http://bookkeeper.apache.org/).
-A serverless lightweight computing framework [Pulsar
Functions](functions-overview.md) offers the capability for stream-native data
processing.
+* A serverless lightweight computing framework [Pulsar
Functions](functions-overview.md) offers the capability for stream-native data
processing.
* A serverless connector framework [Pulsar IO](io-overview.md), which is built
on Pulsar Functions, makes it easier to move data in and out of Apache Pulsar.
* [Tiered Storage](tiered-storage-overview.md) offloads data from hot/warm
storage to cold/long-term storage (such as S3 and GCS) when the data is aging
out.
+* Native support for [transactions](concepts-transactions.md) enabling atomic
operations across topics and partitions.
+* Flexible [authentication and authorization](concepts-authentication.md) with
support for multiple providers including OAuth/OIDC.
## Contents
diff --git a/versioned_docs/version-4.0.x/concepts-replication.md
b/versioned_docs/version-4.0.x/concepts-replication.md
index 934659198fd..2916b6099e4 100644
--- a/versioned_docs/version-4.0.x/concepts-replication.md
+++ b/versioned_docs/version-4.0.x/concepts-replication.md
@@ -33,7 +33,7 @@ In synchronous geo-replication, data is synchronously
replicated to multiple dat

-Synchronous geo-replication in Pulsar is achieved by BookKeeper. A synchronous
geo-replicated cluster consists of a cluster of bookies and a cluster of
brokers that run in multiple data centers, and a global Zookeeper installation
(a ZooKeeper ensemble is running across multiple data centers). You need to
configure a BookKeeper region-aware placement policy to store data across
multiple data centers and guarantee availability constraints on writes.
+Synchronous geo-replication in Pulsar is achieved by BookKeeper. A synchronous
geo-replicated cluster consists of a cluster of bookies and a cluster of
brokers that run in multiple data centers, and a global metadata store
installation (such as ZooKeeper, etcd, or other supported metadata stores
running across multiple data centers). You need to configure a BookKeeper
region-aware placement policy to store data across multiple data centers and
guarantee availability constraints on writes.
Synchronous geo-replication provides the highest availability and also
guarantees stronger data consistency between different data centers. However,
your applications have to pay an extra latency penalty across data centers.
diff --git a/versioned_docs/version-4.0.x/concepts-tiered-storage.md
b/versioned_docs/version-4.0.x/concepts-tiered-storage.md
index 559410ef12a..3ca528fa203 100644
--- a/versioned_docs/version-4.0.x/concepts-tiered-storage.md
+++ b/versioned_docs/version-4.0.x/concepts-tiered-storage.md
@@ -10,8 +10,116 @@ One way to alleviate this cost is to use Tiered Storage.
With tiered storage, ol

+## How Tiered Storage Works
+
+Tiered storage leverages Pulsar's segment-based architecture where data is
stored in immutable segments (ledgers) in BookKeeper. When segments are sealed
and become read-only, they can be safely offloaded to external storage systems.
+
+### Offloading Process
+
+1. **Segment Sealing**: When a BookKeeper ledger is closed (due to size
limits, time limits, or manual triggers), it becomes immutable.
+2. **Eligibility Check**: The broker determines which segments are eligible
for offloading based on configured policies.
+3. **Data Transfer**: Eligible segments are copied to the configured long-term
storage backend.
+4. **Metadata Update**: BookKeeper metadata is updated to reference the
external storage location.
+5. **Local Deletion**: After a configurable delay (default: 4 hours), the
original data is deleted from BookKeeper.
+
+### Transparent Access
+
+Consumers can access offloaded data transparently. When a consumer requests
data that has been offloaded:
+- The broker checks BookKeeper metadata to determine the storage location
+- If data is in external storage, the broker retrieves it seamlessly
+- The consumer receives the data as if it were still in BookKeeper
+
+## Supported Storage Backends
+
+Pulsar supports multiple storage backends for tiered storage:
+
+### Cloud Storage Providers
+
+- **Amazon S3**: Industry-standard object storage with multiple storage classes
+- **Google Cloud Storage (GCS)**: Google's object storage with lifecycle
management
+- **Microsoft Azure Blob Storage**: Azure's object storage solution
+- **Alibaba Cloud OSS**: Alibaba's object storage service
+
+### On-Premises Solutions
+
+- **Filesystem**: Local or network-attached storage for on-premises deployments
+- **S3-Compatible Storage**: MinIO, Ceph, and other S3-compatible solutions
+
+### Storage Classes and Cost Optimization
+
+Different storage backends offer various storage classes:
+- **Hot storage**: Immediate access, higher cost (S3 Standard, GCS Standard)
+- **Cool storage**: Infrequent access, lower cost (S3 IA, GCS Nearline)
+- **Cold storage**: Archive storage, lowest cost (S3 Glacier, GCS Coldline)
+
+## Configuration and Policies
+
+### Offloading Triggers
+
+Tiered storage can be triggered by:
+
+1. **Size-based policies**: Offload when topic backlog exceeds a certain size
+2. **Time-based policies**: Offload data older than a specified age
+3. **Manual triggers**: Administrative commands via REST API or CLI
+4. **Namespace-level policies**: Automatic offloading based on namespace
configuration
+
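+The manual and namespace-level triggers above can be driven with
+`pulsar-admin`; a sketch (the topic and namespace names are placeholders):
+
+```bash
+# Manual trigger: keep at most 10G of backlog in BookKeeper,
+# offloading everything older to long-term storage
+bin/pulsar-admin topics offload --size-threshold 10G \
+  persistent://public/default/my-topic
+
+# Check how the offload operation is progressing
+bin/pulsar-admin topics offload-status persistent://public/default/my-topic
+
+# Namespace-level policy: offload automatically once backlog exceeds 10G
+bin/pulsar-admin namespaces set-offload-threshold --size 10G public/default
+```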
+### Key Configuration Parameters
+
+- **Offload threshold**: Minimum backlog size before offloading begins
+- **Offload deletion lag**: Delay before deleting data from BookKeeper after
offloading
+- **Max block size**: Maximum size of data blocks uploaded to external storage
+- **Read buffer size**: Buffer size for reading offloaded data
+- **Offload driver**: Backend storage driver configuration
+
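+The parameters above correspond to broker-level offload settings; a minimal
+`broker.conf` sketch for the S3 driver (the bucket and region values are
+placeholders, not recommendations):
+
+```bash
+managedLedgerOffloadDriver=aws-s3
+s3ManagedLedgerOffloadBucket=pulsar-offload
+s3ManagedLedgerOffloadRegion=eu-west-1
+# Maximum size of blocks uploaded to S3 (64 MB)
+s3ManagedLedgerOffloadMaxBlockSizeInBytes=67108864
+# Buffer size for reading back offloaded data (1 MB)
+s3ManagedLedgerOffloadReadBufferSizeInBytes=1048576
+# Delete from BookKeeper 4 hours after a successful offload
+managedLedgerOffloadDeletionLagMs=14400000
+```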
+## Performance Considerations
+
+### Read Performance
+
+- **First access**: Reading offloaded data is slower than BookKeeper due to
network latency
+- **Caching**: Some implementations cache frequently accessed offloaded data
+- **Prefetching**: Brokers may prefetch data based on access patterns
+
+### Write Performance
+
+- **Asynchronous offloading**: Offloading occurs in the background without
affecting write performance
+- **Parallel transfers**: Multiple segments can be offloaded concurrently
+- **Bandwidth management**: Configurable limits to prevent offloading from
overwhelming network resources
+
+### Cost Benefits
+
> Data written to BookKeeper is replicated to 3 physical machines by default.
> However, once a segment is sealed in BookKeeper it becomes immutable and can
> be copied to long term storage. Long term storage can achieve cost savings
> by using mechanisms such as [Reed-Solomon error
> correction](https://en.wikipedia.org/wiki/Reed%E2%80%93Solomon_error_correction)
> to require fewer physical copies of data.
-Pulsar currently supports S3, Google Cloud Storage (GCS), and filesystem for
[long-term storage](cookbooks-tiered-storage.md). Offloading to long-term
storage is triggered via a Rest API or command line interface. The user passes
in the amount of topic data they wish to retain on BookKeeper, and the broker
will copy the backlog data to long-term storage. The original data will then be
deleted from BookKeeper after a configured delay (4 hours by default).
+Cost savings come from:
+- **Reduced replication**: External storage typically requires fewer replicas
+- **Storage class optimization**: Use cheaper storage classes for older data
+- **Operational efficiency**: Reduced BookKeeper cluster storage requirements
+
+## Use Cases and Benefits
+
+### Long-term Data Retention
+
+- **Compliance requirements**: Meet regulatory requirements for data retention
+- **Historical analysis**: Maintain access to historical data for analytics
+- **Audit trails**: Preserve message history for compliance and auditing
+
+### Cost Optimization
+
+- **Storage cost reduction**: Move cold data to cheaper storage tiers
+- **Operational efficiency**: Reduce BookKeeper storage requirements
+- **Elastic scaling**: Handle varying data volumes without over-provisioning
+
+## Management and Operations
+
+### Monitoring
+
+- **Offloading metrics**: Track offloading progress and performance
+- **Storage usage**: Monitor storage consumption across tiers
+- **Access patterns**: Analyze data access patterns for optimization
+
+### Automation
+
+- **Policy-based management**: Automated offloading based on predefined
policies
+- **Lifecycle management**: Integration with cloud provider lifecycle policies
+- **Alerting**: Notifications for offloading failures or threshold breaches
-> For a guide for setting up tiered storage, see the [Tiered storage
cookbook](cookbooks-tiered-storage.md).
+> For detailed setup instructions and configuration examples, see the [Tiered
storage cookbook](cookbooks-tiered-storage.md).
diff --git a/versioned_docs/version-4.0.x/concepts-topic-compaction.md
b/versioned_docs/version-4.0.x/concepts-topic-compaction.md
index 5dfc7396df1..6075a97d810 100644
--- a/versioned_docs/version-4.0.x/concepts-topic-compaction.md
+++ b/versioned_docs/version-4.0.x/concepts-topic-compaction.md
@@ -32,17 +32,17 @@ When topic compaction is triggered [via the
CLI](cookbooks-compaction.md), it wo
2. After that, the broker will create a new [BookKeeper
ledger](concepts-architecture-overview.md#ledgers) and make a second iteration
through each message on the topic. For each message:
- - If the key matches the latest occurrence of that key, then the key's
data payload, message ID, and metadata will be written to the newly created
ledger.
-
- - If the key doesn't match the latest then the message will be skipped and
left alone.
-
- - If any given message has an empty payload, it will be skipped and
considered deleted (akin to the concept of
[tombstones](https://en.wikipedia.org/wiki/Tombstone_(data_store)) in key-value
databases).
-
-3. At the end of this second iteration through the topic, the newly created
BookKeeper ledger is closed and two things are written to the topic's metadata:
+ - If the key matches the latest occurrence of that key, then the key's
data payload, message ID, and metadata will be written to the newly created
ledger.
+
+ - If the key doesn't match the latest then the message will be skipped and
left alone.
+
+ - If any given message has an empty payload, it will be skipped and
considered deleted (akin to the concept of
[tombstones](https://en.wikipedia.org/wiki/Tombstone_(data_store)) in key-value
databases).
+
+3. At the end of this second iteration through the topic, the newly created
BookKeeper ledger is closed and two things are written to the topic's metadata:
- The ID of the BookKeeper ledger
- - The message ID of the last compacted message (this is known as the
**compaction horizon** of the topic).
-
+ - The message ID of the last compacted message (this is known as the
**compaction horizon** of the topic).
+
Once this metadata is written compaction is complete.
4. After the initial compaction operation, the Pulsar
[broker](concepts-architecture-overview.md#brokers) that owns the topic is
notified whenever any future changes are made to the compaction horizon and
compacted backlog. When such changes occur:
@@ -51,4 +51,28 @@ When topic compaction is triggered [via the
CLI](cookbooks-compaction.md), it wo
* Read from the topic like normal (if the message ID is greater than or
equal to the compaction horizon) or
* Read beginning at the compaction horizon (if the message ID is lower
than the compaction horizon)
+## Compaction Configuration
+
+Topic compaction behavior can be configured through various broker settings:
+
+### Key Configuration Parameters
+
+- **`compactionRetainNullKey`**: Controls whether null keys are retained
during compaction. When set to `true`, messages with null keys are preserved in
the compacted ledger. When `false` (default), messages with null keys are
treated as non-key messages and may be removed during compaction.
+
+- **`brokerServiceCompactionThreshold`**: The threshold size (in bytes) that
triggers automatic compaction for a topic's backlog. When the topic's backlog
exceeds this size, compaction will be triggered automatically.
+
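+Compaction can be triggered and monitored with `pulsar-admin`; a sketch (the
+topic and namespace names are placeholders):
+
+```bash
+# Trigger compaction for a single topic
+bin/pulsar-admin topics compact persistent://public/default/my-topic
+
+# Check the status of the compaction run
+bin/pulsar-admin topics compaction-status persistent://public/default/my-topic
+
+# Compact automatically once a namespace's backlog exceeds 100M
+bin/pulsar-admin namespaces set-compaction-threshold --threshold 100M public/default
+```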
+### Null Key Handling
+
+The `compactionRetainNullKey` parameter is particularly important for topics
that contain messages without keys. This configuration determines how the
compaction process handles such messages:
+
+- **Enabled (`true`)**: Messages with null keys are preserved during
compaction, ensuring that unkeyed messages remain accessible in the compacted
view.
+- **Disabled (`false`)**: Messages with null keys may be removed during
compaction, as they cannot be properly deduplicated without a key.
+
+This setting is useful for topics that mix keyed and unkeyed messages,
allowing administrators to control whether unkeyed messages should be retained
in the compacted topic.
+
+### Performance Considerations
+
+- **Compaction frequency**: Balance between storage savings and CPU/I/O
overhead
+- **Topic size**: Larger topics take longer to compact but may benefit more
from compaction
+- **Key distribution**: Topics with many unique keys benefit less from
compaction
diff --git a/versioned_docs/version-4.0.x/concepts-transactions.md
b/versioned_docs/version-4.0.x/concepts-transactions.md
index 808eeccd08f..838981984a3 100644
--- a/versioned_docs/version-4.0.x/concepts-transactions.md
+++ b/versioned_docs/version-4.0.x/concepts-transactions.md
@@ -27,3 +27,106 @@ Messages produced within a transaction are stored in the
transaction buffer. The
Message acknowledges within a transaction are maintained by the pending
acknowledge state before the transaction completes. If a message is in the
pending acknowledge state, the message cannot be acknowledged by other
transactions until the message is removed from the pending acknowledge state.
The pending acknowledge state is persisted in the pending acknowledge log. The
pending acknowledge log is backed by a Pulsar topic. A new broker can restore
the state from the pending acknowledge log to ensure the acknowledgment is not
lost.
+
+## Performance Optimizations
+
+### Transaction Log Batching
+
+Pulsar supports batched writing to transaction logs to improve performance and
reduce the overhead of maintaining transaction state. When enabled, multiple
transaction log entries are batched together before writing to storage,
reducing I/O operations and improving throughput.
+
+**Key configuration parameters:**
+- `transactionLogBatchedWriteEnabled`: Enable batched writing for transaction
logs
+- `transactionLogBatchedWriteMaxRecords`: Maximum number of records in a batch
+- `transactionLogBatchedWriteMaxSize`: Maximum size of a batch in bytes
+- `transactionLogBatchedWriteMaxDelayInMillis`: Maximum delay before flushing
a batch
+
+### Pending Acknowledgment Batching
+
+To improve performance when handling large numbers of pending acknowledgments,
Pulsar supports batching of pending acknowledgment operations. This reduces the
overhead of maintaining pending ack state for high-throughput transactional
workloads.
+
+### Segmented Transaction Buffer Snapshots
+
+For handling large numbers of aborted transactions efficiently, Pulsar
implements segmented snapshot functionality. This feature helps manage
transaction buffer snapshots more effectively when dealing with scenarios
involving many aborted transactions.
+
+**Benefits:**
+- Improved memory management for transaction buffers
+- Better handling of abort scenarios with large transaction volumes
+- Reduced snapshot overhead for transaction recovery
+
+### Transaction Buffer Performance Tuning
+
+Recent improvements include enhanced transaction buffer configurations for
performance tuning:
+
+- **Buffer size optimization**: Configurable buffer sizes for different
workload patterns
+- **Batch processing**: Improved batching within transaction buffers
+- **Memory management**: Better memory allocation strategies for transaction
data
+
+## Transaction Isolation and Consistency
+
+### Read Committed Isolation
+
+Pulsar transactions provide a **read committed** isolation level, ensuring that:
+- Consumers only see messages from committed transactions
+- Uncommitted messages remain invisible until transaction commit
+- Aborted transactions have their messages discarded automatically
+
+### Cross-Partition Consistency
+
+Transactions in Pulsar can span multiple topics and partitions while
maintaining consistency:
+- **Atomic commits**: All operations within a transaction succeed or fail
together
+- **Coordinator-managed state**: Transaction coordinator ensures consistent
state across partitions
+- **Failure recovery**: System can recover to consistent state after
coordinator failures
+
+## Transaction Timeouts and Recovery
+
+### Timeout Management
+
+The transaction coordinator handles transaction timeouts to prevent
indefinitely hanging transactions:
+- **Configurable timeouts**: Set appropriate timeout values for different use
cases
+- **Automatic abort**: Transactions are automatically aborted when they exceed
timeout
+- **Resource cleanup**: Timed-out transactions have their resources cleaned up
automatically
+
+### Coordinator Recovery
+
+When a transaction coordinator fails, recovery mechanisms ensure transaction
consistency:
+- **State restoration**: Transaction state is restored from transaction logs
+- **Pending transaction handling**: In-progress transactions are properly
handled during recovery
+- **Metadata consistency**: Transaction metadata remains consistent across
coordinator restarts
+
+## Configuration and Best Practices
+
+### Key Configuration Parameters
+
+#### Core Transaction Settings
+- `transactionCoordinatorEnabled`: Enable transaction coordinator in broker
(default: `false`)
+- `transactionMetadataStoreProviderClassName`: Transaction metadata store
provider class (default:
`org.apache.pulsar.transaction.coordinator.impl.MLTransactionMetadataStoreProvider`)
+- `transactionBufferProviderClassName`: Transaction buffer provider class
(default:
`org.apache.pulsar.broker.transaction.buffer.impl.TopicTransactionBufferProvider`)
+- `transactionPendingAckStoreProviderClassName`: Transaction pending ack store
provider class (default:
`org.apache.pulsar.broker.transaction.pendingack.impl.MLPendingAckStoreProvider`)
+
+#### Batched Write Settings
+- `transactionLogBatchedWriteEnabled`: Enable batched transaction log writes
for better efficiency (default: `false`)
+- `transactionLogBatchedWriteMaxRecords`: Maximum log records count in a batch
(default: `512`)
+- `transactionLogBatchedWriteMaxSize`: Maximum bytes size in a batch (default:
`4194304` - 4 MB)
+- `transactionLogBatchedWriteMaxDelayInMillis`: Maximum wait time for first
record in batch (default: `1`)
+- `transactionPendingAckBatchedWriteEnabled`: Enable batched writes for
pending ack store (default: `false`)
+
+#### Buffer and Snapshot Settings
+- `transactionBufferSegmentedSnapshotEnabled`: Enable segmented buffer
snapshots for handling large numbers of aborted transactions (default: `false`)
+- `transactionBufferSnapshotMaxTransactionCount`: Take snapshot after this
many transaction operations (default: `1000`)
+- `transactionBufferSnapshotMinTimeInMillis`: Interval for taking snapshots in
milliseconds (default: `5000`)
+- `transactionBufferSnapshotSegmentSize`: Size of snapshot segment in bytes
(default: `262144` - 256 KB)
+
+#### Performance and Limits
+- `maxActiveTransactionsPerCoordinator`: Maximum active transactions per
coordinator (default: `0` - no limit)
+- `numTransactionReplayThreadPoolSize`: Thread pool size for transaction
replay (default: number of CPU cores)
+- `transactionBufferClientMaxConcurrentRequests`: Maximum concurrent requests
for buffer client (default: `1000`)
+- `transactionBufferClientOperationTimeoutInMills`: Buffer client operation
timeout in milliseconds (default: `3000`)
+
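+A minimal `broker.conf` sketch that enables transactions together with the
+batching and snapshot optimizations described above (the values are
+illustrative, not tuned recommendations):
+
+```bash
+# Required for any transactional use
+transactionCoordinatorEnabled=true
+
+# Batch transaction log and pending-ack writes for throughput
+transactionLogBatchedWriteEnabled=true
+transactionPendingAckBatchedWriteEnabled=true
+
+# Segmented snapshots for workloads with many aborted transactions
+transactionBufferSegmentedSnapshotEnabled=true
+```
+
+On an existing cluster, the coordinator metadata also needs to be initialized
+before first use (for example with `bin/pulsar initialize-transaction-coordinator-metadata`).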
+### Performance Considerations
+
+- **Batch size tuning**: Optimize batch sizes for your workload characteristics
+- **Coordinator placement**: Distribute transaction coordinators appropriately
+- **Resource allocation**: Ensure adequate resources for transaction processing
+- **Monitoring**: Monitor transaction metrics for performance optimization
+
+For detailed configuration and usage examples, see [Pulsar
transactions](txn-what.md).
diff --git a/versioned_docs/version-4.0.x/helm-deploy.md
b/versioned_docs/version-4.0.x/helm-deploy.md
index 5be5ec2ac68..2aaf7cfce6d 100644
--- a/versioned_docs/version-4.0.x/helm-deploy.md
+++ b/versioned_docs/version-4.0.x/helm-deploy.md
@@ -134,27 +134,27 @@ The Pulsar Helm Chart is designed to enable controlled
upgrades. So it can confi
images:
zookeeper:
repository: apachepulsar/pulsar-all
- tag: @pulsar:version@
+ tag: latest
pullPolicy: IfNotPresent
bookie:
repository: apachepulsar/pulsar-all
- tag: @pulsar:version@
+ tag: latest
pullPolicy: IfNotPresent
autorecovery:
repository: apachepulsar/pulsar-all
- tag: @pulsar:version@
+ tag: latest
pullPolicy: IfNotPresent
broker:
repository: apachepulsar/pulsar-all
- tag: @pulsar:version@
+ tag: latest
pullPolicy: IfNotPresent
proxy:
repository: apachepulsar/pulsar-all
- tag: @pulsar:version@
+ tag: latest
pullPolicy: IfNotPresent
functions:
repository: apachepulsar/pulsar-all
- tag: @pulsar:version@
+ tag: latest
pulsar_manager:
repository: apachepulsar/pulsar-manager
tag: v0.3.0
@@ -172,7 +172,7 @@ pulsar_metadata:
component: pulsar-init
image:
repository: apachepulsar/pulsar-all
- tag: @pulsar:version@
+ tag: latest
pullPolicy: IfNotPresent
```