[ https://issues.apache.org/jira/browse/KUDU-3577?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17853303#comment-17853303 ]
ASF subversion and git services commented on KUDU-3577:
-------------------------------------------------------

Commit d254964e6037f6ae0c9459d99cffa13303596f07 in kudu's branch refs/heads/master from Alexey Serbin
[ https://gitbox.apache.org/repos/asf?p=kudu.git;h=d254964e6 ]

KUDU-3577 fix altering tables with custom hash schemas

Since partition boundaries for ranges with custom hash schemas are
represented via RowOperationsPB (see the RangeWithHashSchemaPB::range_bounds
field in src/kudu/common/common.proto), addressing this design defect
requires re-encoding that information as part of the PartitionSchemaPB
stored in the system catalog upon certain modifications of the table's
schema. This patch does exactly that, and also adds a corresponding test
scenario which would fail without the fix.

A proper solution would be to use a primary-key-only projection of the
table's schema to encode the range boundaries, but backward compatibility
with already released Kudu clients has to be preserved.

See KUDU-3577 for details.

Change-Id: I21a775538063768b986edd2b6bc25d03360b5216
Reviewed-on: http://gerrit.cloudera.org:8080/21486
Tested-by: Alexey Serbin <ale...@apache.org>
Reviewed-by: Mahesh Reddy <mre...@cloudera.com>
Reviewed-by: Abhishek Chennaka <achenn...@cloudera.com>


Altering a table with per-range hash partitions might make the table unusable
-----------------------------------------------------------------------------

Key: KUDU-3577
URL: https://issues.apache.org/jira/browse/KUDU-3577
Project: Kudu
Issue Type: Bug
Components: client, master, tserver
Affects Versions: 1.17.0
Reporter: Alexey Serbin
Priority: Major

For certain tables with per-range hash schemas, dropping a nullable column
might make the table unusable. A workaround exists: add the dropped column
back using the {{kudu table add_column}} CLI tool.
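Why would dropping a non-key column break range bounds at all? The following is a simplified Python model, not Kudu code, built on the RowOperationsPB row layout as described in Kudu's wire_protocol.proto: a one-byte operation type, an isset bitmap, a null bitmap that is present only if the schema has nullable columns, then the data of each set column. It assumes the bounds were encoded against the table's full schema (which the commit's remark about a primary-key-only projection suggests), models every set cell as an INT32, and uses op-type codes meant to mirror RowOperationsPB::Type for illustration only. The names encode_bound, decode_bounds, FULL, AFTER_DROP and READDED are hypothetical. Under these assumptions, dropping {{city}} removes the schema's only nullable column, so the decoder stops skipping a null bitmap, every offset shifts by one byte, and the second bound's op-type byte is read as 0 (UNKNOWN).

```python
import struct

# Op-type codes meant to mirror RowOperationsPB::Type (illustrative only).
RANGE_LOWER_BOUND = 6
RANGE_UPPER_BOUND = 7
OP_NAMES = {0: "UNKNOWN", RANGE_LOWER_BOUND: "RANGE_LOWER_BOUND",
            RANGE_UPPER_BOUND: "RANGE_UPPER_BOUND"}

# A schema is modeled as a list of (column_name, is_nullable) pairs.
FULL = [("id", False), ("name", False), ("age", False), ("city", True)]
AFTER_DROP = FULL[:3]                    # 'city' dropped: no nullable columns left
READDED = AFTER_DROP + [("city", True)]  # workaround: a nullable 'city' is back


def bitmap_size(num_columns):
    """Bytes needed for one bit per column."""
    return (num_columns + 7) // 8


def encode_bound(schema, op_type, values):
    """Encode one bound row: op byte, isset bitmap, optional null bitmap, data.

    'values' maps the names of the set columns to INT32 values.
    """
    nbytes = bitmap_size(len(schema))
    isset = bytearray(nbytes)
    data = b""
    for idx, (name, _) in enumerate(schema):
        if name in values:
            isset[idx // 8] |= 1 << (idx % 8)
            data += struct.pack("<i", values[name])
    row = bytes([op_type]) + bytes(isset)
    if any(nullable for _, nullable in schema):
        row += bytes(nbytes)  # null bitmap; no cell in a bound row is NULL
    return row + data


def decode_bounds(schema, buf):
    """Decode consecutive bound rows, interpreting 'buf' with 'schema'."""
    nbytes = bitmap_size(len(schema))
    has_nullables = any(nullable for _, nullable in schema)
    rows, pos = [], 0
    while pos < len(buf):
        op = buf[pos]
        if op not in (RANGE_LOWER_BOUND, RANGE_UPPER_BOUND):
            raise ValueError(f"Invalid split row type {OP_NAMES.get(op, op)}")
        isset = buf[pos + 1:pos + 1 + nbytes]
        pos += 1 + nbytes
        if has_nullables:
            pos += nbytes  # skip the null bitmap
        row = {}
        for idx, (name, _) in enumerate(schema):
            if isset[idx // 8] & (1 << (idx % 8)):
                row[name] = struct.unpack_from("<i", buf, pos)[0]
                pos += 4
        rows.append((OP_NAMES[op], row))
    return rows


# Bounds of the [90, 120) range, encoded while 'city' still exists.
stored = (encode_bound(FULL, RANGE_LOWER_BOUND, {"age": 90})
          + encode_bound(FULL, RANGE_UPPER_BOUND, {"age": 120}))

print(decode_bounds(FULL, stored))
# [('RANGE_LOWER_BOUND', {'age': 90}), ('RANGE_UPPER_BOUND', {'age': 120})]
try:
    decode_bounds(AFTER_DROP, stored)
except ValueError as err:
    print(err)  # Invalid split row type UNKNOWN
print(decode_bounds(READDED, stored) == decode_bounds(FULL, stored))  # True
```

In this model, re-adding a nullable column restores the byte layout the bounds were written with, which appears consistent with why the workaround helps. Bounds encoded against a primary-key-only projection (FULL[:3] here) would not depend on non-key columns at all.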
For example, in the reproduction scenario below, use the following command to
restore access to the table's data:
{noformat}
$ kudu table add_column $M test city string
{noformat}

The reproduction scenario is the following sequence of {{kudu}} CLI commands.

Set an environment variable for the Kudu cluster's RPC endpoint:
{noformat}
$ export M=<master_RPC_address(es)>
{noformat}

Create a table with two range partitions. It's crucial that the {{city}}
column is nullable.
{noformat}
$ kudu table create $M '{
  "table_name": "test",
  "schema": {
    "columns": [
      { "column_name": "id", "column_type": "INT64" },
      { "column_name": "name", "column_type": "STRING" },
      { "column_name": "age", "column_type": "INT32" },
      { "column_name": "city", "column_type": "STRING", "is_nullable": true }
    ],
    "key_column_names": ["id", "name", "age"]
  },
  "partition": {
    "hash_partitions": [
      {"columns": ["id"], "num_buckets": 4, "seed": 1},
      {"columns": ["name"], "num_buckets": 4, "seed": 2}
    ],
    "range_partition": {
      "columns": ["age"],
      "range_bounds": [
        { "lower_bound": {"bound_type": "inclusive", "bound_values": ["30"]},
          "upper_bound": {"bound_type": "exclusive", "bound_values": ["60"]} },
        { "lower_bound": {"bound_type": "inclusive", "bound_values": ["60"]},
          "upper_bound": {"bound_type": "exclusive", "bound_values": ["90"]} }
      ]
    }
  },
  "num_replicas": 1
}'
{noformat}

Add an extra range partition with a custom hash schema:
{noformat}
$ kudu table add_range_partition $M test '[90]' '[120]' \
    --hash_schema '{"hash_schema": [ {"columns": ["id"], "num_buckets": 3, "seed": 5}, {"columns": ["name"], "num_buckets": 3, "seed": 6} ]}'
{noformat}

Check the updated partitioning info:
{noformat}
$ kudu table describe $M test
TABLE test (
    id INT64 NOT NULL,
    name STRING NOT NULL,
    age INT32 NOT NULL,
    city STRING NULLABLE,
    PRIMARY KEY (id, name, age)
)
HASH (id) PARTITIONS 4 SEED 1,
HASH (name) PARTITIONS 4 SEED 2,
RANGE (age) (
    PARTITION 30 <= VALUES < 60,
    PARTITION 60 <= VALUES < 90,
    PARTITION 90 <= VALUES < 120 HASH(id) PARTITIONS 3 HASH(name) PARTITIONS 3
)
OWNER root
REPLICAS 1
COMMENT
{noformat}

Drop the {{city}} column:
{noformat}
$ kudu table delete_column $M test city
{noformat}

Now try running {{kudu table describe}} against the table after the {{city}}
column has been dropped. It fails with {{Invalid argument}}:
{noformat}
$ kudu table describe $M test
Invalid argument: Invalid split row type UNKNOWN
{noformat}

A similar issue manifests itself when running {{kudu table scan}} against the
table:
{noformat}
$ kudu table scan $M test
Invalid argument: Invalid split row type UNKNOWN
{noformat}
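The manual workaround can be scripted. This is a minimal Python sketch, not part of Kudu, under these assumptions: the {{kudu}} binary is on PATH, an affected table makes {{kudu table describe}} exit with a non-zero status and print the {{Invalid split row type UNKNOWN}} message shown above, and the dropped column's name and type are known. The names run_kudu and restore_dropped_column are hypothetical; the command runner is a parameter so the logic can be exercised without a cluster.

```python
import subprocess
from typing import Callable, Sequence

# Error text reported by 'kudu table describe' and 'kudu table scan' in KUDU-3577.
SYMPTOM = "Invalid split row type UNKNOWN"

Runner = Callable[[Sequence[str]], subprocess.CompletedProcess]


def run_kudu(args: Sequence[str]) -> subprocess.CompletedProcess:
    """Run the kudu CLI (assumed to be on PATH), capturing its output as text."""
    return subprocess.run(["kudu", *args], capture_output=True, text=True)


def restore_dropped_column(masters: str, table: str, column: str,
                           col_type: str, run: Runner = run_kudu) -> bool:
    """Re-add 'column' if describing 'table' fails with the KUDU-3577 error.

    Returns True if the column was re-added, False if the symptom was not seen.
    """
    probe = run(["table", "describe", masters, table])
    output = (probe.stdout or "") + (probe.stderr or "")
    if probe.returncode == 0 or SYMPTOM not in output:
        return False
    # Same command as the documented workaround: kudu table add_column $M test city string
    fix = run(["table", "add_column", masters, table, column, col_type])
    if fix.returncode != 0:
        raise RuntimeError(f"add_column failed: {(fix.stderr or '').strip()}")
    return True
```

For the scenario above, restore_dropped_column(M, "test", "city", "string") would issue the same {{kudu table add_column}} command as the workaround. It only restores access; the underlying encoding problem is what the commit addresses.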