[jira] [Commented] (KUDU-3577) Altering a table with per-range hash partitions might make the table unusable

ASF subversion and git services (Jira) Fri, 07 Jun 2024 15:47:14 -0700


    [ 
https://issues.apache.org/jira/browse/KUDU-3577?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17853303#comment-17853303
 ]


ASF subversion and git services commented on KUDU-3577:
-------------------------------------------------------

Commit d254964e6037f6ae0c9459d99cffa13303596f07 in kudu's branch 
refs/heads/master from Alexey Serbin
[ https://gitbox.apache.org/repos/asf?p=kudu.git;h=d254964e6 ]

KUDU-3577 fix altering tables with custom hash schemas

Since partition boundaries for ranges with custom hash schemas are
represented via RowOperationsPB (see RangeWithHashSchemaPB::range_bounds
field in src/kudu/common/common.proto), addressing this design defect
requires re-encoding the information as a part of PartitionSchemaPB
stored in the system catalog upon particular modifications of the
table's schema.  This patch does exactly that, and also adds a
corresponding test scenario that would fail without the fix.

A proper solution would be to use a primary-key-only projection of a
table's schema to encode the information on range boundaries, but it's
necessary to provide backwards compatibility with already released Kudu
clients.  See KUDU-3577 for details.

Change-Id: I21a775538063768b986edd2b6bc25d03360b5216
Reviewed-on: http://gerrit.cloudera.org:8080/21486
Tested-by: Alexey Serbin <ale...@apache.org>
Reviewed-by: Mahesh Reddy <mre...@cloudera.com>
Reviewed-by: Abhishek Chennaka <achenn...@cloudera.com>
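
The re-encoding idea from the commit message above can be illustrated
with a toy model.  The sketch below is not Kudu code and does not
reproduce the actual RowOperationsPB wire format: the byte layout (a
type tag, an "is set" bitmap, a null bitmap that is present only when the
schema has a nullable column, then the int32 values), the tag values,
and all function names are assumptions made up for illustration.  It
only shows why bounds encoded against one schema may become unreadable
once a column is dropped, and how decoding with the old schema and
encoding with the new one restores them.

```python
import struct

# Hypothetical type tags for this toy model (not taken from Kudu's proto files).
LOWER_BOUND, UPPER_BOUND = 6, 7


def bitmap_len(ncols):
    """Number of bytes needed for a one-bit-per-column bitmap."""
    return (ncols + 7) // 8


def encode_bound(op_type, schema, values):
    """Encode one bound. schema: [(name, nullable)]; values: {name: int32}."""
    n = len(schema)
    isset = bytearray(bitmap_len(n))
    payload = b""
    for i, (name, _nullable) in enumerate(schema):
        if name in values:
            isset[i // 8] |= 1 << (i % 8)
            payload += struct.pack("<i", values[name])
    out = bytes([op_type]) + bytes(isset)
    if any(nullable for _, nullable in schema):
        out += bytes(bitmap_len(n))  # null bitmap, all zeros: nothing is NULL
    return out + payload


def decode_bounds(buf, schema):
    """Decode a sequence of bounds, interpreting the bytes against schema."""
    n = len(schema)
    has_nullable = any(nullable for _, nullable in schema)
    rows, pos = [], 0
    while pos < len(buf):
        op_type = buf[pos]
        pos += 1
        if op_type not in (LOWER_BOUND, UPPER_BOUND):
            raise ValueError(f"invalid row type {op_type}")
        isset = buf[pos:pos + bitmap_len(n)]
        pos += bitmap_len(n)
        if has_nullable:
            pos += bitmap_len(n)  # skip the null bitmap
        values = {}
        for i, (name, _nullable) in enumerate(schema):
            if isset[i // 8] & (1 << (i % 8)):
                (values[name],) = struct.unpack_from("<i", buf, pos)
                pos += 4
        rows.append((op_type, values))
    return rows


def reencode(buf, old_schema, new_schema):
    """Decode with the schema used for encoding, then encode with the new one."""
    return b"".join(encode_bound(t, new_schema, v)
                    for t, v in decode_bounds(buf, old_schema))


if __name__ == "__main__":
    full = [("id", False), ("name", False), ("age", False), ("city", True)]
    dropped = full[:3]  # schema after dropping the nullable 'city' column
    stored = (encode_bound(LOWER_BOUND, full, {"age": 90}) +
              encode_bound(UPPER_BOUND, full, {"age": 120}))
    try:
        decode_bounds(stored, dropped)
    except ValueError as e:
        print("stale encoding:", e)  # the bytes no longer line up
    print("re-encoded:", decode_bounds(reencode(stored, full, dropped), dropped))
```

In this toy layout the null bitmap disappears together with the last
nullable column, so every byte after it shifts and the decoder reads a
payload byte where it expects the next type tag.  Whether the real
failure follows exactly this path is not stated in the commit message.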


> Altering a table with per-range hash partitions might make the table unusable
> -----------------------------------------------------------------------------
>
>                 Key: KUDU-3577
>                 URL: https://issues.apache.org/jira/browse/KUDU-3577
>             Project: Kudu
>          Issue Type: Bug
>          Components: client, master, tserver
>    Affects Versions: 1.17.0
>            Reporter: Alexey Serbin
>            Priority: Major
>
> For particular table schemas with per-range hash schemas, dropping a nullable
> column might make the table unusable.  A workaround exists: add the
> dropped column back using the {{kudu table add_column}} CLI tool.  For
> example, for the reproduction scenario below, use the following command to
> restore access to the table's data:
> {noformat}
> $ kudu table add_column $M test city string
> {noformat}
> The reproduction scenario is the following sequence of {{kudu}} CLI
> commands.
> Set an environment variable for the Kudu cluster's RPC endpoint:
> {noformat}
> $ export M=<master_RPC_address(es)>
> {noformat}
> Create a table with two range partitions.  It's crucial that the {{city}} 
> column is nullable.
> {noformat}
> $ kudu table create $M '{ "table_name": "test", "schema": { "columns": [ { 
> "column_name": "id", "column_type": "INT64" }, { "column_name": "name", 
> "column_type": "STRING" }, { "column_name": "age", "column_type": "INT32" }, 
> { "column_name": "city", "column_type": "STRING", "is_nullable": true } ], 
> "key_column_names": ["id", "name", "age"] }, "partition": { 
> "hash_partitions": [ {"columns": ["id"], "num_buckets": 4, "seed": 1}, 
> {"columns": ["name"], "num_buckets": 4, "seed": 2} ], "range_partition": { 
> "columns": ["age"], "range_bounds": [ { "lower_bound": {"bound_type": 
> "inclusive", "bound_values": ["30"]}, "upper_bound": {"bound_type": 
> "exclusive", "bound_values": ["60"]} }, { "lower_bound": {"bound_type": 
> "inclusive", "bound_values": ["60"]}, "upper_bound": {"bound_type": 
> "exclusive", "bound_values": ["90"]} } ] } }, "num_replicas": 1 }'
> {noformat}
> Add an extra range partition with a custom hash schema:
> {noformat}
> $ kudu table add_range_partition $M test '[90]' '[120]' --hash_schema 
> '{"hash_schema": [ {"columns": ["id"], "num_buckets": 3, "seed": 5}, 
> {"columns": ["name"], "num_buckets": 3, "seed": 6} ]}'
> {noformat}
> Check the updated partitioning info:
> {noformat}
> $ kudu table describe $M test
> TABLE test (
>     id INT64 NOT NULL,
>     name STRING NOT NULL,
>     age INT32 NOT NULL,
>     city STRING NULLABLE,
>     PRIMARY KEY (id, name, age)
> )
> HASH (id) PARTITIONS 4 SEED 1,
> HASH (name) PARTITIONS 4 SEED 2,
> RANGE (age) (
>     PARTITION 30 <= VALUES < 60,
>     PARTITION 60 <= VALUES < 90,
>     PARTITION 90 <= VALUES < 120 HASH(id) PARTITIONS 3 HASH(name) PARTITIONS 3
> )
> OWNER root
> REPLICAS 1
> COMMENT 
> {noformat}
> Drop the {{city}} column:
> {noformat}
> $ kudu table delete_column $M test city
> {noformat}
> Now run {{kudu table describe}} against the table after the {{city}}
> column has been dropped.  It errors out with {{Invalid argument}}:
> {noformat}
> $ kudu table describe $M test
> Invalid argument: Invalid split row type UNKNOWN
> {noformat}
> A similar issue manifests itself when trying to run {{kudu table scan}} 
> against the table:
> {noformat}
> $ kudu table scan $M test
> Invalid argument: Invalid split row type UNKNOWN
> {noformat}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
