imay opened a new issue #1155: Support format in LoadStmt
URL: https://github.com/apache/incubator-doris/issues/1155
 
 
   **Is your feature request related to a problem? Please describe.**
   
   The current load statement in Doris supports only the CSV file format. 
However, we need to support more source file formats, such as Parquet and JSON, 
for load and for external tables. 
   
   In this issue, I want to discuss adding format support to our `Load` 
statement.
    
   **Describe the solution you'd like**
   
   The current `Load` statement syntax is:
   
   ```
       LOAD LABEL load_label
       (
       data_desc1[, data_desc2, ...]
       )
       [opt_properties];
   
   data_desc := 
               DATA INFILE
               (
               "file_path1"[, file_path2, ...]
               )
               [NEGATIVE]
               INTO TABLE `table_name`
               [PARTITION (p1, p2)]
               [COLUMNS TERMINATED BY "column_separator"]
               [(column_list)]
               [SET (k1 = func(k2))]
   ```
   I want to add a `FORMAT` clause to `data_desc`. If `FORMAT` is not specified, 
the format is inferred from the file name's suffix; if Doris does not recognize 
the suffix, the file is treated as CSV.
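
The fallback rule above can be sketched as follows. This is only an illustration of the proposed behavior, not Doris code: the function name `resolve_format` and the suffix table are assumptions, and the table lists only the formats named in this issue.

```python
import os

# Hypothetical suffix table; the issue only names CSV, Parquet and JSON.
KNOWN_SUFFIXES = {
    ".csv": "csv",
    ".parquet": "parquet",
    ".json": "json",
}


def resolve_format(file_path, explicit_format=None):
    """Pick the file format for one path of a data_desc.

    An explicit FORMAT AS value wins; otherwise the file name's suffix
    decides; an unknown or missing suffix falls back to CSV.
    """
    if explicit_format is not None:
        return explicit_format.lower()
    suffix = os.path.splitext(file_path)[1].lower()
    return KNOWN_SUFFIXES.get(suffix, "csv")
```

With this sketch, a path ending in `.parquet` resolves to `parquet`, while a path ending in `.txt` (or with no suffix at all) resolves to `csv` unless `FORMAT AS` says otherwise.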
   
   After supporting `FORMAT`, the syntax of `data_desc` will change to 
   
   ```
   data_desc := 
               DATA INFILE
               (
               "file_path1"[, file_path2, ...]
               )
               [NEGATIVE]
               INTO TABLE `table_name`
               [PARTITION (p1, p2)]
               [COLUMNS TERMINATED BY "column_separator"]
               [FORMAT AS format]
               [(column_list)]
               [SET (k1 = func(k2))]
   ```
   
   As for the `SET` clause, we currently support only some functions. I also 
want to support references to the columns named in `column_list`. For example, 
in the following statement:
   ```
   DATA INFILE ("file_path") INTO TABLE testTable (c1_tmp, c2_tmp) SET (c1=c1_tmp, c2=c2_tmp)
   ```
   `c1_tmp` and `c2_tmp` are field names in the source file. Some formats, such 
as Parquet, store field names in the file itself, so we use `column_list` to 
specify which fields we need from the source file, and `SET` to convert those 
fields into the values stored in the Doris table. 
   
   So I would like to support column references in the `SET` clause.
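
To make the intended semantics concrete, here is a minimal sketch of how `column_list` and `SET` could combine for one source record. It is an assumption about the proposal, not an implementation: `apply_data_desc` is a hypothetical helper, and each `SET` expression is modeled as a Python callable over the selected source fields.

```python
def apply_data_desc(source_record, column_list, set_clause):
    """Map one named source record to one target table row.

    column_list names the fields to read from the source file;
    set_clause maps each target column to an expression over those fields.
    """
    # Keep only the source fields that column_list asks for.
    selected = {name: source_record[name] for name in column_list}
    # Evaluate every SET expression against the selected fields.
    return {target: expr(selected) for target, expr in set_clause.items()}


# One record from a self-describing file (for example Parquet).
record = {"c1_tmp": 1, "c2_tmp": "a", "unused": 3.5}

# Equivalent of: (c1_tmp, c2_tmp) SET (c1=c1_tmp, c2=c2_tmp)
row = apply_data_desc(
    record,
    ["c1_tmp", "c2_tmp"],
    {"c1": lambda f: f["c1_tmp"], "c2": lambda f: f["c2_tmp"]},
)
```

Here a plain column reference is just the simplest possible `SET` expression: it returns the selected source field unchanged, and fields not listed in `column_list` never reach the target row.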
   
   
