GoGoWen opened a new pull request, #11742: URL: https://github.com/apache/doris/pull/11742
# Proposed changes

Enhance loading from Parquet or ORC files: when a column of the target table does not exist in the file, set it to NULL by default instead of failing with "Invalid Column".

## Problem summary

1. Create the table:

        CREATE TABLE `t22` (
          `name` bigint(20) NOT NULL COMMENT "",
          `id` bigint(20) NOT NULL COMMENT "",
          `id2` bigint(20) NULL COMMENT "",
          `impressions` bigint(20) SUM NULL DEFAULT "0" COMMENT "total user impressions",
          `click` double SUM NULL DEFAULT "0" COMMENT "total user clicks",
          `cost` bigint(20) SUM NULL DEFAULT "0" COMMENT "total user cost"
        ) ENGINE=OLAP
        AGGREGATE KEY(`name`, `id`, `id2`)
        COMMENT "OLAP"
        PARTITION BY RANGE(`name`)
        (PARTITION p201901 VALUES [("1"), ("100")))
        DISTRIBUTED BY HASH(`id`) BUCKETS 16
        PROPERTIES (
          "replication_allocation" = "tag.location.default: 3",
          "in_memory" = "false",
          "storage_format" = "V2"
        )

2. Try to load Parquet or ORC data with broker load.

   2.1 The data looks like this:

        name,id,click,impressions,cost
        1,111,1111,11111,111111
        1,11,,1111,11111
        22,222,2222,22222,22222
        3,33,333,3333,33333
        4,44,444,4444,44444
        5,55,555,5555,55555

   2.2 Run a broker load without a column list, like this:

        LOAD LABEL label1
        (
            DATA INFILE("hdfs://filepath")
            INTO TABLE `t22`
            FORMAT AS "parquet"
        )
        WITH BROKER broker_name (....)

3. The load should produce the result below instead of failing with "Invalid Column". The column `id2` is NULL because it does not exist in the file.

        +------+------+------+-------------+-------+--------+
        | name | id   | id2  | impressions | click | cost   |
        +------+------+------+-------------+-------+--------+
        |   22 |  222 | NULL |       22222 |  2222 |  22222 |
        |   11 |  111 | NULL |       11111 |  1111 | 111111 |
        |    5 |   55 | NULL |        5555 |   555 |  55555 |
        |    4 |   44 | NULL |        4444 |   444 |  44444 |
        |    3 |   33 | NULL |        3333 |   333 |  33333 |
        |    1 |   11 | NULL |        1111 |  NULL |  11111 |
        +------+------+------+-------------+-------+--------+

Describe your changes.

## Checklist(Required)

1. Does it affect the original behavior:
    - [x] Yes
    - [ ] No
    - [ ] I don't know
2. Has unit tests been added:
    - [ ] Yes
    - [ ] No
    - [ ] No Need
3. Has document been added or modified:
    - [ ] Yes
    - [ ] No
    - [ ] No Need
4. Does it need to update dependencies:
    - [ ] Yes
    - [ ] No
5. Are there any changes that cannot be rolled back:
    - [ ] Yes (If Yes, please explain WHY)
    - [ ] No

## Further comments

If this is a relatively large or complex change, kick off the discussion at [d...@doris.apache.org](mailto:d...@doris.apache.org) by explaining why you chose the solution you did and what alternatives you considered, etc...

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@doris.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org
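For readers who want the intended semantics of the change in executable form, here is a minimal Python sketch. It is not the Doris implementation and does not reflect how the loader is structured; `fill_missing_columns` and its parameters are hypothetical names chosen for illustration. It only models the behavior described in the problem summary: a table column that is absent from the file schema is filled with NULL (`None` here) instead of raising an error.

```python
from typing import Any, Dict, List, Optional, Sequence


def fill_missing_columns(
    table_columns: Sequence[str],
    file_columns: Sequence[str],
    file_rows: Sequence[Sequence[Any]],
) -> List[Dict[str, Optional[Any]]]:
    """Map file rows onto the table schema (hypothetical helper).

    Columns that the table declares but the file lacks are set to None,
    mirroring the "NULL instead of Invalid Column" behavior described above.
    """
    # Position of every column that actually exists in the file.
    index_of = {name: i for i, name in enumerate(file_columns)}
    result = []
    for row in file_rows:
        result.append(
            {
                col: (row[index_of[col]] if col in index_of else None)
                for col in table_columns
            }
        )
    return result


# Schema of table t22 and the column order of the sample file from this PR.
TABLE_COLUMNS = ["name", "id", "id2", "impressions", "click", "cost"]
FILE_COLUMNS = ["name", "id", "click", "impressions", "cost"]

# Two of the sample rows; the empty click value is represented as None.
FILE_ROWS = [
    (22, 222, 2222, 22222, 22222),
    (1, 11, None, 1111, 11111),
]

rows = fill_missing_columns(TABLE_COLUMNS, FILE_COLUMNS, FILE_ROWS)
for r in rows:
    print(r)
```

Columns are matched by name, not by position, which is why `click` and `impressions` land in the right table columns even though the file stores them in a different order.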