GoGoWen opened a new pull request, #11742:
URL: https://github.com/apache/doris/pull/11742

   # Proposed changes
   
   Enhance loading from Parquet or ORC files: when a column does not exist in 
the file, set it to NULL by default instead of failing with an "Invalid Column" error.
   
   ## Problem summary
   1. Create the table:
   CREATE TABLE `t22` (
     `name` bigint(20) NOT NULL COMMENT "",
     `id` bigint(20) NOT NULL COMMENT "",
     `id2` bigint(20) NULL COMMENT "",
     `impressions` bigint(20) SUM NULL DEFAULT "0" COMMENT "total user impressions",
     `click` double SUM NULL DEFAULT "0" COMMENT "total user clicks",
     `cost` bigint(20) SUM NULL DEFAULT "0" COMMENT "total user cost"
   ) ENGINE=OLAP
   AGGREGATE KEY(`name`, `id`, `id2`)
   COMMENT "OLAP"
   PARTITION BY RANGE(`name`)
   (PARTITION p201901 VALUES [("1"), ("100")))
   DISTRIBUTED BY HASH(`id`) BUCKETS 16
   PROPERTIES (
   "replication_allocation" = "tag.location.default: 3",
   "in_memory" = "false",
   "storage_format" = "V2"
   )
   2. Try to load data in Parquet or ORC format with broker load.
   2.1 The data looks like this:
   name,id,click,impressions,cost
   1,111,1111,11111,111111
   1,11,,1111,11111
   22,222,2222,22222,22222
   3,33,333,3333,33333
   4,44,444,4444,44444
   5,55,555,5555,55555
   
   2.2 Run a broker load without a column list, like this:
   LOAD LABEL label1 (DATA INFILE (hdfs://filepath) into table 't22' format as "parquet" with broker broker_name (....)
   
   3. The load should produce the result below instead of failing with an 
"Invalid Column" error; the column id2 is NULL because it does not exist in the file.
   +------+------+------+-------------+-------+--------+
   | name | id   | id2  | impressions | click | cost   |
   +------+------+------+-------------+-------+--------+
   |   22 |  222 | NULL |       22222 |  2222 |  22222 |
   |   11 |  111 | NULL |       11111 |  1111 | 111111 |
   |    5 |   55 | NULL |        5555 |   555 |  55555 |
   |    4 |   44 | NULL |        4444 |   444 |  44444 |
   |    3 |   33 | NULL |        3333 |   333 |  33333 |
   |    1 |   11 | NULL |        1111 |  NULL |  11111 |
   +------+------+------+-------------+-------+--------+
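The intended behavior can be illustrated with a minimal Python sketch. This is not the code in this pull request; the function name `project_row` and the dictionary representation of file rows are assumptions made for illustration only. It shows a file row being mapped onto the table's column list, with any column absent from the file (here `id2`) becoming NULL instead of raising an error. The sketch does not model how a missing NOT NULL column would be handled, since the description above does not say.

```python
from typing import Dict, List, Optional

# Column order of table `t22`, taken from the CREATE TABLE statement above.
TABLE_COLUMNS = ["name", "id", "id2", "impressions", "click", "cost"]


def project_row(file_row: Dict[str, Optional[float]],
                table_columns: List[str]) -> List[Optional[float]]:
    """Map one file row onto the table's columns (hypothetical helper).

    A column that is absent from the file becomes None (NULL)
    rather than causing an "Invalid Column" failure.
    """
    return [file_row.get(col) for col in table_columns]


# Two of the sample rows from step 2.1; the file has no `id2` column,
# and the first row has an empty `click` value.
file_rows = [
    {"name": 1, "id": 11, "click": None, "impressions": 1111, "cost": 11111},
    {"name": 3, "id": 33, "click": 333, "impressions": 3333, "cost": 33333},
]

loaded = [project_row(row, TABLE_COLUMNS) for row in file_rows]
for row in loaded:
    print(row)
```

In both projected rows the `id2` position holds `None`, which corresponds to the NULL values in the `id2` column of the result table.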
   
   Describe your changes.
   
   ## Checklist(Required)
   
   1. Does it affect the original behavior: 
       - [x] Yes
       - [ ] No
       - [ ] I don't know
   2. Has unit tests been added:
       - [ ] Yes
       - [ ] No
       - [ ] No Need
   3. Has document been added or modified:
       - [ ] Yes
       - [ ] No
       - [ ] No Need
   4. Does it need to update dependencies:
       - [ ] Yes
       - [ ] No
   5. Are there any changes that cannot be rolled back:
       - [ ] Yes (If Yes, please explain WHY)
       - [ ] No
   
   ## Further comments
   
   If this is a relatively large or complex change, kick off the discussion at 
[d...@doris.apache.org](mailto:d...@doris.apache.org) by explaining why you 
chose the solution you did and what alternatives you considered, etc...
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@doris.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

