[jira] [Updated] (KUDU-3577) Altering a table with per-range hash partitions might make the table unusable

Alexey Serbin (Jira) Thu, 15 Aug 2024 17:44:24 -0700


     [ https://issues.apache.org/jira/browse/KUDU-3577?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


Alexey Serbin updated KUDU-3577:
--------------------------------
    Description: 
For tables with per-range hash schemas, dropping or adding a particular number 
of columns might make the table inaccessible for Kudu client applications.

For example, dropping a nullable column from a table with per-range hash 
bucketing might make the table unusable.  In this particular case, a workaround 
exists: just add the dropped column back using the {{kudu table add_column}} 
CLI tool.

In the reproduction scenario below, for instance, the following command 
restores access to the table's data:
{noformat}
$ kudu table add_column $M test city string
{noformat}
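Before applying the workaround, it can help to confirm that a table is actually affected. The shell function below is a minimal sketch, not part of the {{kudu}} CLI: {{check_table_access}} is a hypothetical helper name, and it assumes that {{kudu table describe}} exits with a non-zero status and prints the {{Invalid split row type UNKNOWN}} message shown in this report when the table is affected. It only prints the suggested {{add_column}} command rather than running it.

```shell
# Hypothetical helper, not part of the kudu CLI. It assumes that
# 'kudu table describe' exits with a non-zero status and prints
# "Invalid split row type UNKNOWN" when a table is affected by this issue.
# Usage: check_table_access <master_addresses> <table> <column> <type>
check_table_access() {
  local masters="$1" table="$2" column="$3" ctype="$4" out
  # Capture stdout and stderr together so the error text can be matched.
  if out=$(kudu table describe "$masters" "$table" 2>&1); then
    echo "OK: table '$table' is accessible"
    return 0
  fi
  case "$out" in
    *"Invalid split row type UNKNOWN"*)
      # Only suggest the workaround; re-adding the column is left to the operator.
      echo "AFFECTED: try: kudu table add_column $masters $table $column $ctype"
      return 2
      ;;
  esac
  echo "ERROR: $out"
  return 1
}
```

For the scenario in this report, {{check_table_access $M test city string}} would be expected to print the same {{add_column}} command as above once the {{city}} column has been dropped.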

The reproduction scenario consists of the following sequence of {{kudu}} CLI 
commands.

Set an environment variable to the Kudu cluster's master RPC address(es):
{noformat}
$ export M=<master_RPC_address(es)>
{noformat}

Create a table with two range partitions.  It's crucial that the {{city}} 
column is nullable.
{noformat}
$ kudu table create $M '{ "table_name": "test",
  "schema": {
    "columns": [
      { "column_name": "id", "column_type": "INT64" },
      { "column_name": "name", "column_type": "STRING" },
      { "column_name": "age", "column_type": "INT32" },
      { "column_name": "city", "column_type": "STRING", "is_nullable": true }
    ],
    "key_column_names": ["id", "name", "age"] },
  "partition": {
    "hash_partitions": [
      { "columns": ["id"], "num_buckets": 4, "seed": 1 },
      { "columns": ["name"], "num_buckets": 4, "seed": 2 }
    ],
    "range_partition": { "columns": ["age"], "range_bounds": [
      { "lower_bound": { "bound_type": "inclusive", "bound_values": ["30"] },
        "upper_bound": { "bound_type": "exclusive", "bound_values": ["60"] } },
      { "lower_bound": { "bound_type": "inclusive", "bound_values": ["60"] },
        "upper_bound": { "bound_type": "exclusive", "bound_values": ["90"] } }
    ] } },
  "num_replicas": 1 }'
{noformat}

Add an extra range partition with a custom hash schema:
{noformat}
$ kudu table add_range_partition $M test '[90]' '[120]' --hash_schema \
  '{"hash_schema": [ {"columns": ["id"], "num_buckets": 3, "seed": 5},
  {"columns": ["name"], "num_buckets": 3, "seed": 6} ]}'
{noformat}
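Long JSON arguments like this one are easy to break when copying from e-mail or a wrapped web page: a bare line break between {{--hash_schema}} and its value turns the command into two separate shell commands. One way to sidestep that is to keep the JSON in a shell variable and pass it in double quotes. This is a sketch only: {{HASH_SCHEMA}} and {{add_custom_hash_range}} are hypothetical names, and the JSON is the same per-range hash schema as in the command above.

```shell
# Hypothetical variable name; the JSON is the per-range hash schema from the
# reproduction scenario (3 buckets on 'id' and 3 buckets on 'name').
HASH_SCHEMA='{"hash_schema": [
  {"columns": ["id"],   "num_buckets": 3, "seed": 5},
  {"columns": ["name"], "num_buckets": 3, "seed": 6}
]}'

# Hypothetical wrapper: double quotes keep the multi-line JSON as a single
# argument, so no line break can split the command. Expects $M to be set.
add_custom_hash_range() {
  kudu table add_range_partition "$M" test '[90]' '[120]' --hash_schema "$HASH_SCHEMA"
}
```

With {{M}} exported as in the scenario, running {{add_custom_hash_range}} issues the same {{add_range_partition}} call as the command above.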

Check the updated partitioning info:
{noformat}
$ kudu table describe $M test
TABLE test (
    id INT64 NOT NULL,
    name STRING NOT NULL,
    age INT32 NOT NULL,
    city STRING NULLABLE,
    PRIMARY KEY (id, name, age)
)
HASH (id) PARTITIONS 4 SEED 1,
HASH (name) PARTITIONS 4 SEED 2,
RANGE (age) (
    PARTITION 30 <= VALUES < 60,
    PARTITION 60 <= VALUES < 90,
    PARTITION 90 <= VALUES < 120 HASH(id) PARTITIONS 3 HASH(name) PARTITIONS 3
)
OWNER root
REPLICAS 1
COMMENT 
{noformat}
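The output above shows what makes this table special: the two original ranges use the table-wide hash schema (4 x 4 buckets), while the range added later uses its own schema (3 x 3 buckets). Assuming Kudu's usual layout of one tablet per combination of hash buckets within each range, the table has 16 + 16 + 9 = 41 tablets. The short sketch below spells out that arithmetic; the function name {{count_tablets}} and its "4x4" argument encoding are hypothetical.

```shell
# Hypothetical helper: each argument describes one range partition as the
# bucket counts of its hash dimensions joined by 'x' (e.g. "4x4").
# Within a range, the tablet count is the product of its bucket counts.
count_tablets() {
  local total=0 range product b
  for range in "$@"; do
    product=1
    # Split "4x4" on 'x' into individual bucket counts.
    for b in $(echo "$range" | tr 'x' ' '); do
      product=$(( product * b ))
    done
    total=$(( total + product ))
  done
  echo "$total"
}

# Two ranges with the table-wide schema plus one with the custom schema.
count_tablets 4x4 4x4 3x3   # prints 41
```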

Drop the {{city}} column:
{noformat}
$ kudu table delete_column $M test city
{noformat}

Now try running {{kudu table describe}} against the table after the {{city}} 
column has been dropped.  It fails with an {{Invalid argument}} error:
{noformat}
$ kudu table describe $M test
Invalid argument: Invalid split row type UNKNOWN
{noformat}

Running {{kudu table scan}} against the table fails with the same error:
{noformat}
$ kudu table scan $M test
Invalid argument: Invalid split row type UNKNOWN
{noformat}

  was:
For tables with particular schemas and per-range hash schemas, dropping a 
nullable column might make the table unusable.  A workaround exists: just add 
the dropped column back using the {{kudu table add_column}} CLI tool.  In the 
reproduction scenario below, for instance, the following command restores 
access to the table's data:
{noformat}
$ kudu table add_column $M test city string
{noformat}

The reproduction scenario consists of the following sequence of {{kudu}} CLI 
commands.

Set an environment variable to the Kudu cluster's master RPC address(es):
{noformat}
$ export M=<master_RPC_address(es)>
{noformat}

Create a table with two range partitions.  It's crucial that the {{city}} 
column is nullable.
{noformat}
$ kudu table create $M '{ "table_name": "test",
  "schema": {
    "columns": [
      { "column_name": "id", "column_type": "INT64" },
      { "column_name": "name", "column_type": "STRING" },
      { "column_name": "age", "column_type": "INT32" },
      { "column_name": "city", "column_type": "STRING", "is_nullable": true }
    ],
    "key_column_names": ["id", "name", "age"] },
  "partition": {
    "hash_partitions": [
      { "columns": ["id"], "num_buckets": 4, "seed": 1 },
      { "columns": ["name"], "num_buckets": 4, "seed": 2 }
    ],
    "range_partition": { "columns": ["age"], "range_bounds": [
      { "lower_bound": { "bound_type": "inclusive", "bound_values": ["30"] },
        "upper_bound": { "bound_type": "exclusive", "bound_values": ["60"] } },
      { "lower_bound": { "bound_type": "inclusive", "bound_values": ["60"] },
        "upper_bound": { "bound_type": "exclusive", "bound_values": ["90"] } }
    ] } },
  "num_replicas": 1 }'
{noformat}

Add an extra range partition with a custom hash schema:
{noformat}
$ kudu table add_range_partition $M test '[90]' '[120]' --hash_schema \
  '{"hash_schema": [ {"columns": ["id"], "num_buckets": 3, "seed": 5},
  {"columns": ["name"], "num_buckets": 3, "seed": 6} ]}'
{noformat}

Check the updated partitioning info:
{noformat}
$ kudu table describe $M test
TABLE test (
    id INT64 NOT NULL,
    name STRING NOT NULL,
    age INT32 NOT NULL,
    city STRING NULLABLE,
    PRIMARY KEY (id, name, age)
)
HASH (id) PARTITIONS 4 SEED 1,
HASH (name) PARTITIONS 4 SEED 2,
RANGE (age) (
    PARTITION 30 <= VALUES < 60,
    PARTITION 60 <= VALUES < 90,
    PARTITION 90 <= VALUES < 120 HASH(id) PARTITIONS 3 HASH(name) PARTITIONS 3
)
OWNER root
REPLICAS 1
COMMENT 
{noformat}

Drop the {{city}} column:
{noformat}
$ kudu table delete_column $M test city
{noformat}

Now try running {{kudu table describe}} against the table after the {{city}} 
column has been dropped.  It fails with an {{Invalid argument}} error:
{noformat}
$ kudu table describe $M test
Invalid argument: Invalid split row type UNKNOWN
{noformat}

Running {{kudu table scan}} against the table fails with the same error:
{noformat}
$ kudu table scan $M test
Invalid argument: Invalid split row type UNKNOWN
{noformat}


> Altering a table with per-range hash partitions might make the table unusable
> -----------------------------------------------------------------------------
>
>                 Key: KUDU-3577
>                 URL: https://issues.apache.org/jira/browse/KUDU-3577
>             Project: Kudu
>          Issue Type: Bug
>          Components: client, master, tserver
>    Affects Versions: 1.17.0
>            Reporter: Alexey Serbin
>            Priority: Major
>             Fix For: 1.17.1 1.18.0



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
