This is an automated email from the ASF dual-hosted git repository. palashc pushed a commit to branch PHOENIX-7800-add-phoenix-5.3.1-release in repository https://gitbox.apache.org/repos/asf/phoenix-site.git
commit 34f088de814322eefc64915cbc75304766660078 Author: Palash Chauhan <[email protected]> AuthorDate: Mon May 18 17:45:48 2026 -0700 PHOENIX-7800 : Add Phoenix 5.3.1 release and features Add the 5.3.1 row to the Downloads page, bump PHOENIX_VERSION to 5.3.1, add a 5.3.1 highlights section to Recent Improvements, refresh the home page "What's New" badge and add a 4th card for Eventually Consistent Indexes. Add docs pages for the new 5.3.1 features: - Eventually Consistent Global Indexes (CONSISTENCY=EVENTUAL) - High Availability (graceful failover, ACTIVE_TO_STANDBY) - PhoenixSyncTable data-validation tool - ROW_SIZE() / RAW_ROW_SIZE() built-in functions Append two new sections to the existing Metrics page covering the new HBase scan-latency metrics (PHOENIX-7704) and the top-N slowest parallel scans API (PHOENIX-7729). Register the new feature pages in features/meta.json and cross-link from secondary-indexes.mdx to the new EC indexes page. Co-authored-by: Cursor <[email protected]> --- .../features/eventually-consistent-indexes.mdx | 146 ++++++++++++++++ .../(multi-page)/features/high-availability.mdx | 187 +++++++++++++++++++++ .../docs/_mdx/(multi-page)/features/meta.json | 4 + .../docs/_mdx/(multi-page)/features/metrics.mdx | 79 +++++++++ .../(multi-page)/features/phoenix-sync-table.mdx | 159 ++++++++++++++++++ .../docs/_mdx/(multi-page)/features/row-size.mdx | 91 ++++++++++ .../(multi-page)/features/secondary-indexes.mdx | 2 + app/pages/_landing/downloads/content.md | 2 + app/pages/_landing/home/whats-new.tsx | 25 ++- app/pages/_landing/recent-improvements/content.mdx | 9 + phoenix-version.ts | 2 +- 11 files changed, 699 insertions(+), 7 deletions(-) diff --git a/app/pages/_docs/docs/_mdx/(multi-page)/features/eventually-consistent-indexes.mdx b/app/pages/_docs/docs/_mdx/(multi-page)/features/eventually-consistent-indexes.mdx new file mode 100644 index 00000000..eaf087b7 --- /dev/null +++ 
b/app/pages/_docs/docs/_mdx/(multi-page)/features/eventually-consistent-indexes.mdx @@ -0,0 +1,146 @@ +--- +title: "Eventually Consistent Global Indexes" +description: "Global secondary indexes maintained asynchronously off the data-table write path — for write-heavy workloads that can tolerate a bounded staleness window on the index in exchange for higher write throughput." +--- + +An **eventually consistent global index** behaves like a regular Phoenix +[global index](/docs/features/secondary-indexes#global-indexes) at the SQL +level — same DDL, same `INCLUDE`, same query planning — but Phoenix maintains +it asynchronously instead of on the synchronous write path. Writes commit +faster on the data table, the index catches up shortly after, and reads from +the index are read-repaired against the data table so they never return +incorrect data. Introduced in Phoenix 5.3.1 +([PHOENIX-7794](https://issues.apache.org/jira/browse/PHOENIX-7794)). + +## When to use it [#ec-indexes-when] + +Pick eventually consistent over the default (strong) consistency when **both** +are true: + +- Synchronous index maintenance is your write-path bottleneck — typically a + data table fanning out to several large indexes per mutation. +- A bounded staleness window on the index (seconds) is acceptable for the + queries that hit it. + +Stay with the default `CONSISTENCY=STRONG` for indexes that back +read-your-write flows — e.g. "insert a row, then immediately query it via the +new index" inside a single user request. + +This is a property of **global** indexes only. `CONSISTENCY=EVENTUAL` on a +`LOCAL` index parses but has no runtime effect. 
+ +## Creating an EC index [#ec-indexes-create] + +`CONSISTENCY=EVENTUAL` is set in the trailing properties slot of `CREATE INDEX`: + +```sql +CREATE INDEX my_idx + ON my_table (v1) + INCLUDE (v2) + CONSISTENCY=EVENTUAL; +``` + +`UNCOVERED` and `ASYNC` compose normally: + +```sql +CREATE UNCOVERED INDEX my_idx ON my_table (city, name) CONSISTENCY=EVENTUAL; +CREATE INDEX my_idx ON my_table (v1) INCLUDE (v2) ASYNC CONSISTENCY=EVENTUAL; +``` + +The default is `CONSISTENCY=STRONG`. Flip an existing index in either direction: + +```sql +ALTER INDEX my_idx ON my_table CONSISTENCY=EVENTUAL; +ALTER INDEX my_idx ON my_table CONSISTENCY=STRONG; +``` + +The consistency mode is a per-index property — not a connection setting and +not a query hint. Every reader of the index sees the same mode. + +## How it works [#ec-indexes-how] + +EC indexes are maintained by a per-region background consumer that reads a +[Change Data Capture](/docs/features/change-data-capture) stream on the data +table and applies the resulting mutations to the index. The first EC index +on a table provisions the stream automatically; subsequent EC indexes on the +same table share it. + +The consumer supports two strategies for turning a CDC event into an index +mutation, with opposite write-vs-read IO tradeoffs: + +- **Derive on consume (default).** The CDC event carries a lightweight + data-row-state marker; the consumer re-reads the data row at consume time + to compute the index mutation. Cheap on the write path, one extra + data-table read per event on the consume path. Relies on + `phoenix.max.lookback.age.seconds` being long enough for the data table to + retain the before image of every modified row until the consumer catches up. +- **Serialize on write.** The index mutation is computed at write time and + serialized into the CDC event itself; the consumer just replays it. More + write IO (and optionally compressed), no extra read on consume. 
Useful when + the consumer's data-table read is the bottleneck or max-lookback is tight. + +Toggle with `phoenix.index.cdc.mutation.serialize` (see Tuning). For most +workloads the default — derive on consume — is the right choice. + +## How reads behave [#ec-indexes-reads] + +There is **no query-side change** — no hint, no new syntax. The planner picks +an EC index exactly like a STRONG index. The visibility contract differs in +two ways: + +- A row recently inserted on the data table may not yet appear in the index. +- An existing index row's covered column values may be stale until the next + update is applied. + +Phoenix never returns incorrect rows: any index row not yet caught up is +verified against the data table before being returned, exactly like a STRONG +index. The practical visibility window is **a few seconds** on a healthy +cluster, governed by the tunables below. + +## Tuning [#ec-indexes-tuning] + +Set on the RegionServer side in `hbase-site.xml`. The defaults are sensible +for most clusters; the two knobs you will typically reach for are batch size +(throughput) and timestamp buffer (visibility delay floor). + +| Property | Default | Description | +| -------------------------------------------------------- | ------: | --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | +| `phoenix.index.cdc.consumer.enabled` | `true` | Master switch for the async index maintenance subsystem. Disable to halt all EC-index maintenance cluster-wide. | +| `phoenix.index.cdc.consumer.batch.size` | `500` | Events drained per iteration. Larger amortizes overhead; smaller bounds staleness on bursty workloads. | +| `phoenix.index.cdc.consumer.poll.interval.ms` | `1000` | Sleep when there is no work to do. Raise to reduce idle wake-ups. 
| +| `phoenix.index.cdc.consumer.timestamp.buffer.ms` | `5000` | Safety buffer subtracted from "now" before consuming. Floor for visibility delay. | +| `phoenix.index.cdc.consumer.startup.delay.ms` | `10000` | Delay before a freshly opened region starts consuming. | +| `phoenix.index.cdc.consumer.max.data.visibility.retries` | `10` | Retries when the data row is not yet visible to the consumer. | +| `phoenix.index.cdc.consumer.retry.pause.ms` | `2000` | Sleep between data-visibility retries. | +| `phoenix.index.cdc.mutation.serialize` | `false` | Selects the consumer strategy (see [How it works](#ec-indexes-how)). `false` derives index mutations at consume time (lower write IO); `true` serializes them at write time (no consume-side read). | +| `phoenix.index.cdc.mutations.compress.enabled` | `false` | Snappy-compress the serialized index mutation. Only relevant when `phoenix.index.cdc.mutation.serialize=true`. | + +With defaults, expect end-to-end index visibility of **~5–10 seconds**. + +## Limitations [#ec-indexes-limitations] + +- **Global indexes only.** `LOCAL INDEX ... CONSISTENCY=EVENTUAL` is a no-op. +- **Designed and tested for non-transactional tables.** Combining EC indexes + with transactional tables is undefined in 5.3.1. +- **Salted data tables are not supported** — the underlying change stream + Phoenix needs to maintain an EC index is incompatible with salting. +- **Data-table max lookback age must outlive the async lag.** Increase + `phoenix.max.lookback.age.seconds` on the data table if you tune the + consumer for low throughput or you expect long catch-up periods. + +## Operational notes [#ec-indexes-ops] + +- The first EC index on a table provisions the underlying change stream + automatically. Subsequent EC indexes on the same table reuse it. +- Region splits and merges are handled transparently — no operator action is + required to keep the index up to date through topology changes. +- `ALTER INDEX ... 
CONSISTENCY=STRONG` returns the index to synchronous + maintenance for **new** writes. Any in-flight async work continues to flow + through until the queue drains. + +## See also [#ec-indexes-see-also] + +- [Secondary Indexes](/docs/features/secondary-indexes) — global, local, + covered, uncovered, and functional indexes. +- [Change Data Capture](/docs/features/change-data-capture) — exposed + Phoenix-level change streams for application use. diff --git a/app/pages/_docs/docs/_mdx/(multi-page)/features/high-availability.mdx b/app/pages/_docs/docs/_mdx/(multi-page)/features/high-availability.mdx new file mode 100644 index 00000000..64fad51f --- /dev/null +++ b/app/pages/_docs/docs/_mdx/(multi-page)/features/high-availability.mdx @@ -0,0 +1,187 @@ +--- +title: "High Availability" +description: "Active/active and active/standby Phoenix client wiring across two HBase clusters, with graceful failover that lets in-flight readers drain while writes are blocked." +--- + +Phoenix High Availability (HA) lets a JDBC client transparently target a pair +of HBase clusters that mirror the same Phoenix schema, so an operator-driven +or fault-driven failover never requires the application to restart, reconnect, +or rewrite URLs. + +Phoenix 5.3.1 adds **graceful failover** — an intermediate `ACTIVE_TO_STANDBY` +role that lets writes drain while readers keep working — plus support for +HBase's `MASTER` and `RPC` connection registries +([PHOENIX-7493](https://issues.apache.org/jira/browse/PHOENIX-7493), +[PHOENIX-7495](https://issues.apache.org/jira/browse/PHOENIX-7495), +[PHOENIX-7586](https://issues.apache.org/jira/browse/PHOENIX-7586)). + +## Concepts [#ha-concepts] + +An **HA group** is a named tuple of two HBase clusters and an HA policy, +shared by every client that participates. The current role of each cluster +lives in a JSON record in ZooKeeper, replicated to both clusters' ZK +ensembles and watched by the client; role changes are picked up +automatically. 
+ +### Cluster roles + +| Role | Clients can connect? | Meaning | +| ------------------- | :------------------: | ------------------------------------------------------------------------------------------------------------------------------------- | +| `ACTIVE` | yes | Cluster is serving live reads and writes. | +| `STANDBY` | yes | Cluster is reachable but not the current primary; FAILOVER clients refuse to bind to it. | +| `ACTIVE_TO_STANDBY` | yes | Transitional state during graceful failover: existing readers continue to work; writes may be rejected (see Graceful failover below). | +| `OFFLINE` | no | Cluster is intentionally taken out of rotation. | +| `UNKNOWN` | no | Role has not been initialized or the record could not be read. | + +### HA policies + +The policy is part of the HA-group record. Clients do not pick it — operators +do, when provisioning the group. + +- **`FAILOVER`** — exactly one cluster (the `ACTIVE`) serves the connection at + any moment. The client transparently re-binds on role change. +- **`PARALLEL`** — every statement is issued to **both** clusters in parallel, + with the faster result returned. Useful when both clusters carry identical + data and you want to mask single-cluster tail latency. + +### Failover sub-policies (FAILOVER only) + +Controls how a FAILOVER connection reacts when its bound cluster transitions +away from `ACTIVE`: + +| `phoenix.ha.failover.policy` | Behavior | +| ---------------------------- | ------------------------------------------------------------------------------------------------------------------------------------- | +| `explicit` (default) | Subsequent operations throw `FailoverSQLException`; the application calls `failover()` to rebind to the new `ACTIVE`. | +| `active` | Connection transparently rebinds to the new `ACTIVE` on the next statement, up to `phoenix.ha.failover.count` attempts (default `3`). 
| + +## JDBC URL [#ha-url] + +The HA URL is a bracketed pair of per-cluster endpoints separated by `|`, +optionally followed by a principal: + +``` +jdbc:phoenix+zk:[zk1\:2181::/hbase|zk2\:2181::/hbase]:my_principal +``` + +The presence of `|` inside the URL is what triggers the HA code path. The two +URLs inside the brackets are always ZooKeeper quorums — Phoenix uses them to +read the HA-group record. + +The per-cluster connection Phoenix opens underneath may use any of the +supported HBase registries (`ZK`, `MASTER`, or `RPC`), based on what the +operator configured in the HA-group record. `RPC` requires HBase 2.5+. + +### Connecting + +Set the HA group name as a JDBC property and open a connection like any other: + +```java +Properties props = new Properties(); +props.setProperty("phoenix.ha.group.name", "myGroup"); +try (Connection conn = DriverManager.getConnection( + "jdbc:phoenix+zk:[zk1\\:2181::/hbase|zk2\\:2181::/hbase]", props)) { + // Normal JDBC usage. +} +``` + +The returned `Connection` honors the policy declared in the HA-group record. + +## Graceful failover [#ha-graceful] + +Graceful failover is a two-step demotion of the `ACTIVE` cluster: + +1. **`ACTIVE → ACTIVE_TO_STANDBY`.** The operator flips the source cluster + into `ACTIVE_TO_STANDBY`. **Existing connections remain open and in-flight + reads complete normally.** On the server side, with + `phoenix.cluster.role.based.mutation.block.enabled=true`, new mutations + are rejected with `MutationBlockedIOException` so replication to the peer + can drain. +2. **`ACTIVE_TO_STANDBY → STANDBY`.** Once replication has caught up, the + operator demotes the source the rest of the way. Phoenix closes all + wrapped per-cluster connections to the demoted cluster; the peer is + meanwhile promoted to `ACTIVE`. `explicit` clients see + `FailoverSQLException` on their next statement; `active` clients + transparently rebind to the new `ACTIVE`. 
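The two-step demotion above can be sketched as a small state model. This is only an illustration of the documented role changes and of the role-based write block, not Phoenix's implementation; `ClusterRole`, `gracefulNext`, and `rejectsMutations` are hypothetical names chosen for this sketch.

```java
import java.util.EnumSet;
import java.util.Set;

// Illustrative model of the graceful-failover steps described above.
// NOT Phoenix's implementation; the type and method names are hypothetical.
enum ClusterRole {
    ACTIVE, ACTIVE_TO_STANDBY, STANDBY, OFFLINE, UNKNOWN;

    // Roles a cluster may move to next on the graceful path.
    Set<ClusterRole> gracefulNext() {
        switch (this) {
            case ACTIVE:
                // Step 1: the operator starts draining the source cluster.
                return EnumSet.of(ACTIVE_TO_STANDBY);
            case ACTIVE_TO_STANDBY:
                // Step 2 (finish the demotion), or roll back to ACTIVE.
                return EnumSet.of(STANDBY, ACTIVE);
            default:
                // Other roles are outside the scope of this sketch.
                return EnumSet.noneOf(ClusterRole.class);
        }
    }

    // True when the role-based check would reject a new mutation:
    // only while draining, and only if the server-side flag is enabled.
    boolean rejectsMutations(boolean blockEnabled) {
        return blockEnabled && this == ACTIVE_TO_STANDBY;
    }
}
```

Under this model, `ClusterRole.ACTIVE_TO_STANDBY.rejectsMutations(false)` is `false`, which mirrors the point that the write block only takes effect once `phoenix.cluster.role.based.mutation.block.enabled` is set to `true` on the source cluster.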
+ +Rolling back is supported: an `ACTIVE → ACTIVE_TO_STANDBY → ACTIVE` sequence +leaves connections fully intact at every step. + +### Enabling server-side write blocking + +Graceful failover relies on the source cluster's RegionServers rejecting +writes while the role is `ACTIVE_TO_STANDBY`. Enable in the source cluster's +`hbase-site.xml`: + +```xml +<property> + <name>phoenix.cluster.role.based.mutation.block.enabled</name> + <value>true</value> +</property> +``` + +Default is `false`. Readers are unaffected either way. + +## Configuration [#ha-config] + +All HA-related keys. All can be set in `hbase-site.xml` on the client and/or +as JDBC connection properties. + +### Required + +| Key | Notes | +| ----------------------- | ------------------------------------------------------------------ | +| `phoenix.ha.group.name` | Name of the HA group — must match the operator-provisioned record. | + +### ZooKeeper tuning (client side) + +| Key | Default | +| ------------------------------------- | ------: | +| `phoenix.ha.zk.connection.timeout.ms` | `4000` | +| `phoenix.ha.zk.session.timeout.ms` | `4000` | +| `phoenix.ha.zk.retry.base.sleep.ms` | `1000` | +| `phoenix.ha.zk.retry.max` | `5` | +| `phoenix.ha.zk.retry.max.sleep.ms` | `10000` | + +### Fallback to a single cluster + +| Key | Default | +| ----------------------------- | -------------------------------------------------------------------------------------------------- | +| `phoenix.ha.fallback.enabled` | `true` — if the HA record cannot be read from either ZK, fall back to a single-cluster connection. | +| `phoenix.ha.fallback.cluster` | (empty) — JDBC URL of the fallback cluster. | + +### Failover behavior + +| Key | Default | +| ---------------------------------- | -------------------------------------------------------------------------------------- | +| `phoenix.ha.transition.timeout.ms` | `300000` (5 min) — time the client gets to close connections during a role transition. 
| +| `phoenix.ha.failover.policy` | `explicit` — or `active` to auto-rebind. | +| `phoenix.ha.failover.count` | `3` — max auto-rebind attempts for the `active` sub-policy. | +| `phoenix.ha.failover.timeout.ms` | `10000` — wait timeout for a single failover operation. | + +### Server-side write blocking + +| Key | Default | +| --------------------------------------------------- | ----------------------------------------------------------------------------------------------------------- | +| `phoenix.cluster.role.based.mutation.block.enabled` | `false` — set `true` in the source cluster's `hbase-site.xml` to block writes while in `ACTIVE_TO_STANDBY`. | + +## Operator workflow [#ha-operator-workflow] + +A canonical graceful failover from cluster A → cluster B: + +1. Confirm `phoenix.cluster.role.based.mutation.block.enabled=true` is set on + cluster A and that replication A → B is healthy. +2. Set A → `ACTIVE_TO_STANDBY`, B remains `STANDBY`. Writes to A start being + rejected; existing reads continue. +3. Wait for A → B replication lag to reach zero. +4. Promote: A → `STANDBY`, B → `ACTIVE`. `active` clients transparently + rebind; `explicit` clients receive `FailoverSQLException` and call + `failover()` themselves. + +For an unplanned failover, skip step 2 and go directly to step 4. The +`ACTIVE_TO_STANDBY` role is a graceful-failover convenience, not a +correctness requirement. + +## See also [#ha-see-also] + +- [Metrics](/docs/features/metrics) — client metrics are emitted by both + wrapped and HA connections. 
diff --git a/app/pages/_docs/docs/_mdx/(multi-page)/features/meta.json b/app/pages/_docs/docs/_mdx/(multi-page)/features/meta.json index c2680852..22d047c4 100644 --- a/app/pages/_docs/docs/_mdx/(multi-page)/features/meta.json +++ b/app/pages/_docs/docs/_mdx/(multi-page)/features/meta.json @@ -4,6 +4,7 @@ "transactions", "user-defined-functions", "secondary-indexes", + "eventually-consistent-indexes", "storage-formats", "atomic-upsert", "namespace-mapping", @@ -25,8 +26,11 @@ "bson", "change-data-capture", "bulk-loading", + "phoenix-sync-table", "query-server", + "high-availability", "metrics", + "row-size", "tracing", "cursor" ] diff --git a/app/pages/_docs/docs/_mdx/(multi-page)/features/metrics.mdx b/app/pages/_docs/docs/_mdx/(multi-page)/features/metrics.mdx index f5626055..fd75fe10 100644 --- a/app/pages/_docs/docs/_mdx/(multi-page)/features/metrics.mdx +++ b/app/pages/_docs/docs/_mdx/(multi-page)/features/metrics.mdx @@ -117,3 +117,82 @@ service.submit(new Runnable() { } }); ``` + +## Scan latency metrics [#scan-latency-metrics] + +Phoenix 5.3.1 surfaces HBase's per-scan latency and IO metrics through the +request-level metrics API, giving per-query visibility into where a scan +actually spent its time +([PHOENIX-7704](https://issues.apache.org/jira/browse/PHOENIX-7704)). + +| Metric | Description | +| ---------------------------- | ----------------------------------------------------------------------- | +| `FS_READ_TIME` | Time (ms) spent in the underlying filesystem (HDFS/S3) for HFile reads. | +| `BYTES_READ_FROM_FS` | Bytes read from the filesystem — cold reads that missed every cache. | +| `BYTES_READ_FROM_MEMSTORE` | Bytes read from memstore — un-flushed cells. | +| `BYTES_READ_FROM_BLOCKCACHE` | Bytes served from the HBase block cache — warm reads. | +| `BLOCK_READ_OPS_COUNT` | Number of HFile block reads. | +| `RPC_SCAN_PROCESSING_TIME` | Time (ms) the RegionServer spent processing the scan RPC. 
| +| `RPC_SCAN_QUEUE_WAIT_TIME` | Time (ms) the scan RPC spent in the RegionServer's call queue. | + +These are request-level metrics — they appear in the per-table map returned by +`PhoenixRuntime.getRequestReadMetricInfo(rs)` alongside the existing entries: + +```java +try (ResultSet rs = stmt.executeQuery(sql)) { + while (rs.next()) { + // ... + } + Map<String, Map<MetricType, Long>> readMetrics = + PhoenixRuntime.getRequestReadMetricInfo(rs); + long fsReadMs = readMetrics.get(tableName).get(MetricType.FS_READ_TIME); + long blocksRead = readMetrics.get(tableName).get(MetricType.BLOCK_READ_OPS_COUNT); + // ... +} +``` + +They piggyback on `phoenix.trace.read.metrics.enabled` (the existing +request-level metrics switch, on by default) and require no additional +configuration. Values originate from HBase 2.6.3+; on older HBase versions +they are silently zero. + +## Top-N slowest parallel scans [#top-n-slowest-scans] + +For diagnosing tail-latency on a single statement, Phoenix can retain the +top-N slowest scan paths it issued while executing a query — useful when a +query fans out across many regions and you want to find the worst few +([PHOENIX-7729](https://issues.apache.org/jira/browse/PHOENIX-7729)). + +### Enabling + +Off by default. Set the count to a positive number on the connection +(`Properties`) or globally in `hbase-site.xml`: + +| Property | Default | Description | +| ---------------------------------------- | ------: | -------------------------------------------------------------------------------------------------------------------------------- | +| `phoenix.slowest.scan.metrics.count` | `0` | N — number of slowest scans to retain. `0` disables the feature. | +| `phoenix.scan.metrics.by.region.enabled` | `false` | When `true` (and `count > 0`), the per-scan record also carries the region name and serving RegionServer. Requires HBase 2.6.3+. | + +Request-level metrics (`phoenix.trace.read.metrics.enabled`) must also be on, +which is the default. 
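A minimal sketch of the connection-level settings from the table above, assuming they are supplied as JDBC `Properties`. The class and method names (`SlowScanProps`, `forTopN`) are illustrative and not part of Phoenix; only the two property keys come from this page.

```java
import java.util.Properties;

// Hypothetical helper: builds the connection Properties that switch on
// top-N slowest-scan retention. Only the property keys come from the docs.
class SlowScanProps {
    static Properties forTopN(int n, boolean byRegion) {
        if (n < 0) {
            throw new IllegalArgumentException("count must be >= 0");
        }
        Properties props = new Properties();
        // 0 leaves the feature disabled; a positive N retains the N slowest scans.
        props.setProperty("phoenix.slowest.scan.metrics.count", Integer.toString(n));
        // Region / RegionServer detail only applies when the count is positive.
        props.setProperty("phoenix.scan.metrics.by.region.enabled",
                Boolean.toString(byRegion && n > 0));
        return props;
    }
}
```

The resulting object would then be handed to `DriverManager.getConnection(url, props)` when opening the connection.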
+ +### Retrieving the result + +`PhoenixRuntime.getTopNSlowestScanMetrics(rs)` returns the slowest scan paths +seen while executing the query, as a list of JSON-object lists — one outer +entry per slow scan path, one inner entry per HBase table touched on that +path. Forward to your telemetry sink as needed. + +```java +try (ResultSet rs = stmt.executeQuery(sql)) { + while (rs.next()) { + // ... + } + List<List<JsonObject>> slowest = PhoenixRuntime.getTopNSlowestScanMetrics(rs); + for (List<JsonObject> path : slowest) { + for (JsonObject scan : path) { + log.info("slow scan on {}: {}", scan.get("table"), scan); + } + } +} +``` diff --git a/app/pages/_docs/docs/_mdx/(multi-page)/features/phoenix-sync-table.mdx b/app/pages/_docs/docs/_mdx/(multi-page)/features/phoenix-sync-table.mdx new file mode 100644 index 00000000..a3d68382 --- /dev/null +++ b/app/pages/_docs/docs/_mdx/(multi-page)/features/phoenix-sync-table.mdx @@ -0,0 +1,159 @@ +--- +title: "PhoenixSyncTable Tool" +description: "Detect data divergence between a source and a target Phoenix table across two HBase clusters via a chunked hash comparison driven by MapReduce." +--- + +`PhoenixSyncTableTool` is a MapReduce-based divergence detector for Phoenix +tables that are replicated (or migrated) between two HBase clusters. It +compares chunks of source and target data without transferring full rows over +the network and records any chunk whose hashes disagree to a Phoenix system +table for later inspection. Available in Phoenix 5.3.1 +([PHOENIX-7751](https://issues.apache.org/jira/browse/PHOENIX-7751)). + +The tool is conceptually similar to HBase's `HashTable`/`SyncTable` pair but +is Phoenix-aware (respects TTL, `CURRENT_SCN`, tenant id, indexes, and the +column-encoding scheme) and runs as a **single** MapReduce job with no HDFS +intermediate. Output is a Phoenix table, queryable with SQL. + +`PhoenixSyncTableTool` performs **detection only** in 5.3.1; it does not +modify the target cluster. 
+ +## When to use it [#sync-table-when] + +Reach for `PhoenixSyncTableTool` to verify: + +- A cluster migration that used HBase snapshots, replication, or both — to + confirm the target is byte-for-byte identical after cutover. +- Long-running HBase replication — to detect cases where a replication peer + has silently drifted. +- DR drills — to confirm the standby is in sync before a planned failover. + +For ad-hoc row-count or row-key spot-checks you usually want a small SQL +query instead; `PhoenixSyncTableTool` is the right choice when you need +**full-data** confidence with bounded network cost. + +## Running the tool [#sync-table-running] + +The tool runs through `hbase` (or `hadoop jar`) and takes only two mandatory +flags — the source table name and the target cluster's ZooKeeper quorum. + +```bash +hbase org.apache.phoenix.mapreduce.PhoenixSyncTableTool \ + --table-name MY_SCHEMA.MY_TABLE \ + --target-cluster zk1,zk2,zk3:2181:/hbase \ + --run-foreground +``` + +The source cluster comes from the Hadoop/HBase configuration the job is +submitted under, so `--target-cluster` is the ZooKeeper quorum of the +**other** cluster. Accepted quorum formats: + +- `host:port:/znode` +- `h1,h2:port:/znode` +- `h1:p1,h2:p2:/znode` + +### Flags + +| Short | Long | Required | Default | Purpose | +| --------- | --------------------- | :------: | -------------------- | ------------------------------------------------------------------------------------------------------------------------ | +| `-tn` | `--table-name` | yes | — | Source table (physical name; index physical names are also accepted). | +| `-tc` | `--target-cluster` | yes | — | ZK quorum of the target cluster. | +| `-s` | `--schema` | no | — | Phoenix schema name. | +| `-tenant` | `--tenant-id` | no | — | Tenant id for tenant-specific sync. | +| `-ft` | `--from-time` | no | `0` | Lower bound of the cell-timestamp window, in ms. | +| `-tt` | `--to-time` | no | `now - 1 hour` | Upper bound; also used as `CURRENT_SCN`. 
The 1-hour buffer gives async replication time to catch up. | +| `-cs` | `--chunk-size` | no | `1073741824` (1 GiB) | Approximate chunk size in bytes. Smaller chunks narrow the divergence search radius at the cost of more checkpoint rows. | +| `-rs` | `--raw-scan` | no | `false` | Include delete markers. | +| `-rav` | `--read-all-versions` | no | `false` | Compare every cell version, not just the latest. | +| `-coal` | `--coalesce-split` | no | `false` | Coalesce multiple source regions into one mapper. | +| `-runfg` | `--run-foreground` | no | `false` | Block until the job completes (default is fire-and-forget submit). | +| `-dr` | `--dry-run` | no | `false` | Marker only — reserved for a future auto-repair extension. | +| `-h` | `--help` | no | — | Print help and exit. | + +The mapper count is implicitly the number of source-table regions (one +mapper per region) unless `--coalesce-split` is set. + +## Output [#sync-table-output] + +### MapReduce counters + +When `--run-foreground` is set, the tool logs counters from the +`PhoenixSyncTableMapper$SyncCounters` group: + +- `MAPPERS_VERIFIED`, `MAPPERS_MISMATCHED` +- `CHUNKS_VERIFIED`, `CHUNKS_MISMATCHED` +- `SOURCE_ROWS_PROCESSED`, `TARGET_ROWS_PROCESSED` + +### `PHOENIX_SYNC_TABLE_CHECKPOINT` + +The tool auto-creates a Phoenix system table on the **source** cluster (90-day +TTL, Snappy compression) with one row per chunk and per region. To list +divergences from the last run: + +```sql +SELECT START_ROW_KEY, END_ROW_KEY, COUNTERS, EXECUTION_END_TIME +FROM PHOENIX_SYNC_TABLE_CHECKPOINT +WHERE TABLE_NAME = 'MY_TABLE' + AND TARGET_CLUSTER = 'zk1,zk2,zk3:2181:/hbase' + AND TYPE = 'CHUNK' + AND STATUS = 'MISMATCHED'; +``` + +Each row carries `STATUS` (`VERIFIED` or `MISMATCHED`), `TYPE` (`CHUNK` or +`REGION`), the key range, and a comma-separated `COUNTERS` string with +per-chunk source and target row counts. 
+ +### Resumability + +A re-run of the same `(table, target, from-time, to-time, tenant)` tuple +picks up where the previous run left off — already-verified sub-ranges are +skipped. + +## Prerequisites [#sync-table-prereqs] + +- **Cross-cluster line of sight.** Mapper YARN nodes need ZooKeeper and RPC + reachability to **both** clusters' RegionServers. +- **Both clusters must run Phoenix 5.3.1+.** +- **Live read, not snapshot-based.** Both clusters are scanned through the + regular Phoenix read path. +- **Kerberos** delegation tokens for the target cluster are acquired + automatically when security is enabled. +- The submitter principal needs `READ` on the physical HBase tables on both + clusters, plus `WRITE` to `PHOENIX_SYNC_TABLE_CHECKPOINT` on the source. +- Views and logical (not physical) index names are rejected. Pass the + physical index table name to validate an index. + +## Tuning [#sync-table-tuning] + +`--chunk-size` is the main lever: + +- Larger chunks (e.g. 4 GiB) reduce checkpoint rows and per-chunk overhead + but make every mismatch report a coarser range. +- Smaller chunks (e.g. 64 MiB) narrow the mismatch search radius and produce + more checkpoint rows. + +The tool runs at long-scan timescales. Adjust these client-side timeouts +(set in the Hadoop `Configuration` the job is submitted with) if you see +scanner timeouts on very large regions: + +| Property | Default | +| ------------------------------------------- | ------------ | +| `phoenix.sync.table.query.timeout` | ~150 minutes | +| `phoenix.sync.table.rpc.timeout` | 30 minutes | +| `phoenix.sync.table.client.scanner.timeout` | 30 minutes | +| `phoenix.sync.table.rpc.retries.counter` | 5 | + +## Limitations [#sync-table-limitations] + +- **Detection only.** Mismatched chunks are recorded but not repaired in + 5.3.1. `--dry-run` is a marker reserved for a future auto-repair pass. +- **No views.** Only physical tables and index physical names are accepted. 
+- The default `--to-time` is `now - 1 hour`. To compare data written less + than one hour ago, pass an explicit `--to-time`. + +## See also [#sync-table-see-also] + +- [Bulk Loading](/docs/features/bulk-loading) — the other heavyweight + data-movement MapReduce tool that ships with Phoenix. +- [Change Data Capture](/docs/features/change-data-capture) — for streaming + delta capture instead of post-hoc verification. diff --git a/app/pages/_docs/docs/_mdx/(multi-page)/features/row-size.mdx b/app/pages/_docs/docs/_mdx/(multi-page)/features/row-size.mdx new file mode 100644 index 00000000..01b9317e --- /dev/null +++ b/app/pages/_docs/docs/_mdx/(multi-page)/features/row-size.mdx @@ -0,0 +1,91 @@ +--- +title: "ROW_SIZE() and RAW_ROW_SIZE()" +description: "Built-in SQL functions that return the on-the-wire HBase byte footprint of a row — useful for hot-row diagnosis, capacity planning, and finding outliers directly from SQL." +--- + +`ROW_SIZE()` and `RAW_ROW_SIZE()` return the HBase byte footprint of the row +currently being scanned. Both are zero-argument scalar functions that return +`UNSIGNED_LONG`. Available in Phoenix 5.3.1 +([PHOENIX-7705](https://issues.apache.org/jira/browse/PHOENIX-7705)). + +| | `ROW_SIZE()` | `RAW_ROW_SIZE()` | +| ----------- | ----------------------------------- | ------------------------------------------------- | +| Arguments | none | none | +| Return type | `UNSIGNED_LONG` | `UNSIGNED_LONG` | +| Counts | latest visible version of each cell | all retained cell versions **and** delete markers | + +Each cell's contribution is its full HBase footprint — row key, column +family, qualifier, timestamp, type byte, tags, and value — so the result is +**not** equal to the sum of user-visible column value lengths. + +## Usage [#row-size-usage] + +`ROW_SIZE()` is only valid **as an argument to an aggregate** in the `SELECT` +list, or inside a `WHERE` clause. A bare projection (`SELECT ROW_SIZE() FROM t`) +is rejected at compile time. 
+ +### Total table footprint + +```sql +SELECT SUM(ROW_SIZE()) FROM my_table; +``` + +### Per-row size + +Group by the primary key so each group is a single row: + +```sql +SELECT SUM(ROW_SIZE()) FROM my_table GROUP BY id; +``` + +### Distribution + +```sql +SELECT AVG(ROW_SIZE()), MIN(ROW_SIZE()), MAX(ROW_SIZE()) FROM my_table; +``` + +### Find rows whose footprint exceeds a threshold + +`ROW_SIZE()` is also valid in `WHERE`: + +```sql +SELECT COUNT(1) +FROM my_table +WHERE ROW_SIZE() > 1024 AND status = 'ACTIVE'; +``` + +### Including delete markers and old versions + +```sql +SELECT organization_id, SUM(RAW_ROW_SIZE()) +FROM my_table +GROUP BY organization_id; +``` + +`RAW_ROW_SIZE()` counts every retained cell version and the bytes of any +delete-family or delete-column tombstones — useful for measuring the +post-compaction-debt footprint of a row. + +## Limitations and caveats [#row-size-caveats] + +- **Wrap in an aggregate.** `SELECT ROW_SIZE() FROM t` is rejected. Use + `SUM(ROW_SIZE())` (with `GROUP BY <pk>` for per-row values). +- **Forces a full row read.** A query using either function reads more bytes + per row than the same query without it, because the scan can no longer + use empty-column / key-only / encoded-qualifier optimizations. Reach for + these functions for diagnostics, not hot-path scans. +- **`RAW_ROW_SIZE()` reads all versions and tombstones.** Both the byte count + and the row count it produces will exceed `ROW_SIZE()` on the same data. +- **Cell footprint, not user-data size.** Adding a column generally + increases `ROW_SIZE()` by more than that column's value byte length. +- **Measures whatever physical row the planner scans.** If the optimizer + chooses a secondary index, `ROW_SIZE()` will measure the index row, not + the data row. Pin the plan with a hint (e.g. `/*+ NO_INDEX */`) if you + need a specific physical answer. 
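The "cell footprint, not user-data size" caveat can be made concrete with a rough estimate of one cell's bytes. The TypeScript sketch below assumes the classic HBase `KeyValue` serialization (4-byte key length, 4-byte value length, 2-byte row length, 1-byte family length, 8-byte timestamp, 1-byte type, and a 2-byte tags length when tags are present). Whether `ROW_SIZE()` counts exactly these bytes is an assumption; this page does not confirm it. `CellParts`, `asciiBytes`, and `estimateCellBytes` are hypothetical names used only for illustration.

```typescript
// Hypothetical illustration only: estimates a cell's byte footprint under the
// classic HBase KeyValue layout. This is not Phoenix's implementation.
interface CellParts {
  rowKey: Uint8Array;
  family: Uint8Array;
  qualifier: Uint8Array;
  value: Uint8Array;
  tags?: Uint8Array;
}

const KEY_LENGTH_FIELD = 4; // int: length of the key portion
const VALUE_LENGTH_FIELD = 4; // int: length of the value
const ROW_LENGTH_FIELD = 2; // short: length of the row key
const FAMILY_LENGTH_FIELD = 1; // byte: length of the column family
const TIMESTAMP_BYTES = 8; // long: cell timestamp
const TYPE_BYTES = 1; // byte: Put / Delete / DeleteFamily ...
const TAGS_LENGTH_FIELD = 2; // short: present only when the cell has tags

// Encode an ASCII string as bytes (enough for this example).
function asciiBytes(s: string): Uint8Array {
  return new Uint8Array(s.split("").map((ch) => ch.charCodeAt(0)));
}

function estimateCellBytes(cell: CellParts): number {
  const keyBytes =
    ROW_LENGTH_FIELD +
    cell.rowKey.length +
    FAMILY_LENGTH_FIELD +
    cell.family.length +
    cell.qualifier.length +
    TIMESTAMP_BYTES +
    TYPE_BYTES;
  const tagBytes =
    cell.tags && cell.tags.length > 0
      ? TAGS_LENGTH_FIELD + cell.tags.length
      : 0;
  return (
    KEY_LENGTH_FIELD + VALUE_LENGTH_FIELD + keyBytes + cell.value.length + tagBytes
  );
}

// A 6-byte value stored under an 8-byte row key, 1-byte family, 6-byte qualifier.
const exampleCell: CellParts = {
  rowKey: asciiBytes("row-0001"),
  family: asciiBytes("0"),
  qualifier: asciiBytes("STATUS"),
  value: asciiBytes("ACTIVE"),
};

const estimatedBytes = estimateCellBytes(exampleCell); // 41
const overheadBytes = estimatedBytes - exampleCell.value.length; // 35
```

Under these assumptions a 6-byte value occupies 41 bytes, which is why adding a column tends to raise the reported size by more than the value length, and why short values in tables with long row keys are dominated by per-cell overhead.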
+ +## See also [#row-size-see-also] + +- [Metrics](/docs/features/metrics) — runtime measurements on the client + side (scan bytes, mutation bytes, scan latency, etc.). +- [Statistics Collection](/docs/features/statistics-collection) — aggregate + estimates without scanning every row. diff --git a/app/pages/_docs/docs/_mdx/(multi-page)/features/secondary-indexes.mdx b/app/pages/_docs/docs/_mdx/(multi-page)/features/secondary-indexes.mdx index a8a1f12b..524394ee 100644 --- a/app/pages/_docs/docs/_mdx/(multi-page)/features/secondary-indexes.mdx +++ b/app/pages/_docs/docs/_mdx/(multi-page)/features/secondary-indexes.mdx @@ -48,6 +48,8 @@ Each are useful in different scenarios and have their own failure profiles and p Global indexing targets _read heavy_ use cases. With global indexes, all the performance penalties for indexes occur at write time. We intercept the data table updates on write ([DELETE](/docs/grammar#delete), [UPSERT VALUES](/docs/grammar#upsert-values) and [UPSERT SELECT](/docs/grammar#upsert-select)), build the index update and then send any necessary updates to all interested index tables. At read time, Phoenix will select the index table to use that will produce the fastest query t [...] +For write-heavy workloads where synchronous index maintenance is the bottleneck and a bounded staleness window on the index is acceptable, a global index can be created with `CONSISTENCY=EVENTUAL` to move its maintenance off the data-table write path. See [Eventually Consistent Global Indexes](/docs/features/eventually-consistent-indexes). + ## Local Indexes Local indexing targets _write heavy_, _space constrained_ use cases. Just like with global indexes, Phoenix will automatically select whether or not to use a local index at query-time. With local indexes, index data and table data co-reside on the same server, preventing any network overhead during writes. Local indexes can be used even when the query isn't fully covered (i.e. 
Phoenix automatically retrieves the columns not in the index through point gets against the data table). Unlike global [...] diff --git a/app/pages/_landing/downloads/content.md b/app/pages/_landing/downloads/content.md index 57843259..6cfa7a95 100644 --- a/app/pages/_landing/downloads/content.md +++ b/app/pages/_landing/downloads/content.md @@ -11,6 +11,7 @@ Current release 4.16.1 can run on Apache HBase 1.3, 1.4, 1.5 and 1.6. Current release 5.1.3 can run on Apache HBase 2.1, 2.2, 2.3, 2.4 and 2.5. Current release 5.2.1 can run on Apache HBase 2.4, 2.5 and 2.6 versions. Current release 5.3.0 can run on Apache HBase 2.5 and 2.6 versions. +Current release 5.3.1 can run on Apache HBase 2.5 and 2.6 versions. Please follow the appropriate link depending on your HBase version. @@ -18,6 +19,7 @@ Please follow the appropriate link depending on your HBase version. | Phoenix Version | Release Date | Download | Release Notes | Changes | | --------------- | ------------ | ------------------------------------------------- | ------------------------------- | ----------------------------------------------------------------------------------------------------- | +| 5.3.1 | 18 May 2026 | [Download](https://downloads.apache.org/phoenix/) | [Release Notes](/release-notes) | [Changes](https://issues.apache.org/jira/secure/ReleaseNote.jspa?version=12334393&projectId=12315120) | | 5.3.0 | 2 Oct 2025 | [Download](https://downloads.apache.org/phoenix/) | [Release Notes](/release-notes) | [Changes](https://issues.apache.org/jira/secure/ReleaseNote.jspa?version=12334393&projectId=12315120) | | 5.2.1 | 12 Nov 2024 | [Download](https://downloads.apache.org/phoenix/) | [Release Notes](/release-notes) | [Changes](https://issues.apache.org/jira/secure/ReleaseNote.jspa?version=12334393&projectId=12315120) | | 5.2.0 | 16 Apr 2024 | [Download](https://downloads.apache.org/phoenix/) | [Release Notes](/release-notes) | 
[Changes](https://issues.apache.org/jira/secure/ReleaseNote.jspa?version=12334393&projectId=12315120) | diff --git a/app/pages/_landing/home/whats-new.tsx b/app/pages/_landing/home/whats-new.tsx index ebe46397..5d81ae2c 100644 --- a/app/pages/_landing/home/whats-new.tsx +++ b/app/pages/_landing/home/whats-new.tsx @@ -16,7 +16,14 @@ // limitations under the License. // -import { ArrowRight, Braces, Globe, RadioTower, Sparkles } from "lucide-react"; +import { + ArrowRight, + Braces, + Globe, + RadioTower, + Sparkles, + Workflow +} from "lucide-react"; import { Link } from "@/components/link"; export function WhatsNewSection() { @@ -38,6 +45,12 @@ export function WhatsNewSection() { desc: "Stream row-level changes as ordered, partitioned events — read with standard SQL, with full split/merge lineage.", href: "/docs/features/change-data-capture", Icon: RadioTower + }, + { + title: "Eventually Consistent Indexes", + desc: "Move global secondary index maintenance off the write path for write-heavy workloads — higher throughput, bounded staleness, no query-side changes.", + href: "/docs/features/eventually-consistent-indexes", + Icon: Workflow } ]; @@ -47,16 +60,16 @@ export function WhatsNewSection() { <div className="mb-8 text-center"> <div className="bg-primary/10 text-primary border-primary/20 inline-flex items-center gap-2 rounded-full border px-3 py-1 text-xs font-semibold tracking-wide uppercase"> <Sparkles className="size-3.5" aria-hidden /> - What's New in 5.3.0 + What's New in 5.3.1 </div> <h2 className="mt-4 text-3xl font-semibold tracking-tight md:text-4xl"> Latest Phoenix Highlights </h2> <p className="text-muted-foreground mt-2"> - Recent capabilities now available in the 5.3.0 release. + Recent capabilities now available in the 5.3.1 release. 
</p> </div> - <div className="grid grid-cols-1 gap-4 sm:grid-cols-2 lg:grid-cols-3"> + <div className="grid grid-cols-1 gap-4 sm:grid-cols-2 lg:grid-cols-2 xl:grid-cols-4"> {items.map(({ title, desc, href, Icon }) => ( <Link key={title} @@ -85,10 +98,10 @@ export function WhatsNewSection() { </div> <div className="mt-8 text-center"> <Link - to="/recent-improvements#release-5-3-0" + to="/recent-improvements#release-5-3-1" className="text-primary inline-flex items-center gap-1 text-sm font-medium hover:underline" > - See all features in 5.3.0 + See all features in 5.3.1 <ArrowRight className="size-3.5" aria-hidden /> </Link> </div> diff --git a/app/pages/_landing/recent-improvements/content.mdx b/app/pages/_landing/recent-improvements/content.mdx index 32f132b5..86be0f6e 100644 --- a/app/pages/_landing/recent-improvements/content.mdx +++ b/app/pages/_landing/recent-improvements/content.mdx @@ -1,5 +1,14 @@ # New Features +## 5.3.1 [#release-5-3-1] + +1. **[Eventually Consistent Global Indexes](/docs/features/eventually-consistent-indexes)**. Relaxes the strong-consistency contract on global indexes for write-heavy workloads that can tolerate brief index/data divergence in exchange for higher write throughput and reduced write amplification ([PHOENIX-7794](https://issues.apache.org/jira/browse/PHOENIX-7794)). +1. **Multi-row `UPSERT ... VALUES`**. Standard SQL multi-row value constructors are now supported in a single `UPSERT` statement, eliminating per-row round-trips for client-side bulk inserts ([PHOENIX-7198](https://issues.apache.org/jira/browse/PHOENIX-7198)). +1. **[`ROW_SIZE()` SQL function](/docs/features/row-size)**. Returns the serialized byte size of a row — useful for hot-row diagnosis and capacity planning directly from SQL ([PHOENIX-7705](https://issues.apache.org/jira/browse/PHOENIX-7705)). +1. **[Graceful Failover for Phoenix HA](/docs/features/high-availability#ha-graceful)**. 
Coordinated active-to-standby role transitions in the Failover HA policy so in-flight clients drain cleanly during planned failover instead of being hard-killed ([PHOENIX-7493](https://issues.apache.org/jira/browse/PHOENIX-7493)). +1. **Improved Scan Metrics**. Phoenix now surfaces HBase's [per-scan latency metrics](/docs/features/metrics#scan-latency-metrics) end-to-end and adds [top-N slowest parallel scan reporting](/docs/features/metrics#top-n-slowest-scans), giving operators per-query visibility into scan tail-latency without resorting to RegionServer-level metrics ([PHOENIX-7704](https://issues.apache.org/jira/browse/PHOENIX-7704), [PHOENIX-7729](https://issues.apache.org/jira/browse/PHOENIX-7729)). +1. **[`PhoenixSyncTable` data-validation tool](/docs/features/phoenix-sync-table)**. New MapReduce-based tool that compares row data between a source and a target cluster for the same Phoenix table — useful for verifying replication, snapshot-based migrations, and DR drills ([PHOENIX-7751](https://issues.apache.org/jira/browse/PHOENIX-7751)). + ## 5.3.0 [#release-5-3-0] 1. **[Change Data Capture](/docs/features/change-data-capture)**. Stream row-level changes as ordered, partitioned events with configurable pre/post/change scopes. Includes TTL-delete events and partition tracking that survives region splits, merges, and table drops ([PHOENIX-7001](https://issues.apache.org/jira/browse/PHOENIX-7001)). diff --git a/phoenix-version.ts b/phoenix-version.ts index 475e6ea8..2566023d 100644 --- a/phoenix-version.ts +++ b/phoenix-version.ts @@ -17,4 +17,4 @@ // // Update this value when a new Phoenix release becomes current in docs. -export const PHOENIX_VERSION = "5.3.0"; +export const PHOENIX_VERSION = "5.3.1";
