github-actions[bot] commented on code in PR #64007:
URL: https://github.com/apache/doris/pull/64007#discussion_r3371098337

##########
regression-test/suites/external_table_p0/hive/test_hive_view_schema_drift.groovy:
##########
@@ -0,0 +1,115 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements. See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership. The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License. You may obtain a copy of the License at
+//
+// http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied. See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+// Regression test for: LogicalView.computeOutput() IndexOutOfBoundsException when
+// an underlying Hive table gains new columns (schema drift) after the Hive view was created.
+//
+// Repro:
+// 1. Create a Hive base table (3 cols) and a native Hive VIEW on it.
+// 2. In Doris, register the Hive catalog and query the external view — OK (3 cols).
+// 3. ADD COLUMN to the Hive base table via hive_docker.
+// 4. REFRESH TABLE <base_table> in Doris (view HMS schema NOT refreshed).
+// 5. Query the external view again — used to crash:
+//      errCode = 2, detailMessage = Index 3 out of bounds for length 3
+//    because LogicalView.computeOutput() iterated childOutput (4 slots from the
+//    re-analyzed view body) but called view.getFullSchema().get(i) on a 3-element
+//    list (the Hive view's HMS schema at creation time).
+//
+// The fix: use Math.min(childOutput.size(), fullSchema.size()) as the loop bound,
+// preserving the view's declared output contract while preventing the crash.
+
+suite("test_hive_view_schema_drift", "p0,external,hive_docker") {
+
+    String enabled = context.config.otherConfigs.get("enableHiveTest")
+    if (enabled == null || !enabled.equalsIgnoreCase("true")) {
+        logger.info("disable Hive test.")
+        return;
+    }
+
+    for (String hivePrefix : ["hive2", "hive3"]) {
+        setHivePrefix(hivePrefix)
+        String hms_port = context.config.otherConfigs.get(hivePrefix + "HmsPort")
+        String externalEnvIp = context.config.otherConfigs.get("externalEnvIp")
+        String catalog_name = "test_${hivePrefix}_view_schema_drift"
+        String db = "test_view_schema_drift_db"
+        String base_table = "test_view_schema_drift_base"
+        String hive_view = "test_view_schema_drift_view"
+
+        try {
+            // ---- Register Hive catalog in Doris ----
+            sql """drop catalog if exists ${catalog_name}"""
+            sql """CREATE CATALOG ${catalog_name} PROPERTIES (
+                'type'='hms',
+                'hive.metastore.uris' = 'thrift://${externalEnvIp}:${hms_port}',
+                'hadoop.username' = 'hive'
+            )"""
+
+            // ---- Create Hive database, base table (3 cols), and a native Hive VIEW ----
+            // The view is created through hive_docker so it is a native Hive view
+            // (ExternalView in Doris). Its HMS schema records exactly 3 columns.
+            hive_docker """drop database if exists ${db} cascade"""
+            hive_docker """create database ${db}"""
+            hive_docker """
+                create table ${db}.${base_table} (
+                    id bigint,
+                    name string,
+                    age string
+                )
+                partitioned by (dt string)
+                stored as parquet
+            """
+            hive_docker """
+                create view ${db}.${hive_view} as

Review Comment:
This regression still does not exercise the overflow path fixed in `LogicalView.computeOutput()`. The Hive view body is defined with an explicit projection (`select id, name, age`), and `BindRelation.parseAndAnalyzeExternalView()` re-analyzes the SQL returned by `HMSExternalTable.getViewText()`.
After adding `score` to the base table, that view body still produces only the original three slots, so `childOutput.size()` does not exceed `view.getFullSchema().size()` and the old code would not hit `view.getFullSchema().get(3)`. As a result, the test can pass without the production fix. Please make the view body actually re-expand to include the newly added base-table column (or otherwise assert a pre-fix failure), so this end-to-end test proves the schema-drift crash is covered.
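
To make the size argument above concrete, here is a minimal Java sketch of the two loop bounds. It is a simplified model, not the Doris source: the class `ViewOutputSketch`, the methods `bindOld` and `bindFixed`, and the use of plain column-name strings in place of slots and schema columns are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified model of the loop bound under discussion; NOT the actual Doris code.
// Column names stand in for slots (childOutput) and schema columns (fullSchema).
class ViewOutputSketch {

    // Pre-fix shape as described in the test header: iterate over every child
    // slot and index the view's declared schema at the same position.
    static List<String> bindOld(List<String> childOutput, List<String> fullSchema) {
        List<String> out = new ArrayList<>();
        for (int i = 0; i < childOutput.size(); i++) {
            out.add(fullSchema.get(i)); // throws once childOutput is longer
        }
        return out;
    }

    // Post-fix shape: bound the loop by the shorter of the two lists.
    static List<String> bindFixed(List<String> childOutput, List<String> fullSchema) {
        List<String> out = new ArrayList<>();
        int bound = Math.min(childOutput.size(), fullSchema.size());
        for (int i = 0; i < bound; i++) {
            out.add(fullSchema.get(i));
        }
        return out;
    }

    public static void main(String[] args) {
        // Schema recorded for the view when it was created.
        List<String> declared = List.of("id", "name", "age");
        // Explicit projection: the view body still yields three slots after ADD COLUMN.
        List<String> explicitProjection = List.of("id", "name", "age");
        // Hypothetical view body that re-expands to include the new column.
        List<String> reExpanded = List.of("id", "name", "age", "score");

        // Old loop succeeds here, so a test with this shape passes without the fix.
        System.out.println(bindOld(explicitProjection, declared));

        // Old loop fails only when the child output outgrows the declared schema.
        try {
            bindOld(reExpanded, declared);
        } catch (IndexOutOfBoundsException e) {
            System.out.println("old loop fails: " + e.getMessage());
        }

        // Fixed loop keeps the declared three columns.
        System.out.println(bindFixed(reExpanded, declared));
    }
}
```

In this model the explicit-projection case never reaches index 3, which is why the regression test needs a view body whose re-analyzed output is wider than the stored schema, or an explicit assertion that the pre-fix code fails.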
