github-actions[bot] commented on code in PR #64071:
URL: https://github.com/apache/doris/pull/64071#discussion_r3374434556


##########
fe/fe-core/src/main/java/org/apache/doris/nereids/load/NereidsLoadScanProvider.java:
##########
@@ -195,9 +196,16 @@ private void fillContextExprMap(List<NereidsImportColumnDesc> columnDescList, Ne
         // If user does not specify the file field names, generate it by using base schema of table.
         // So that the following process can be unified
         boolean specifyFileFieldNames = copiedColumnExprs.stream().anyMatch(p -> p.isColumn());
-        if (!specifyFileFieldNames) {
+        if (!specifyFileFieldNames || isFillMissingColumns(fileGroup)) {

Review Comment:
   For routine load, the `isFillMissingColumns(fileGroup)` part of this
condition will still be false even when the job property is set.
`KafkaRoutineLoadJob.toNereidsRoutineLoadTaskInfo()` copies `jobProperties`
into `NereidsRoutineLoadTaskInfo`, but
`NereidsDataDescription(NereidsLoadTaskInfo)` copies only JSON properties such
as `strip_outer_array`, `jsonpaths`, `json_root`, `fuzzy_parse`,
`read_json_by_line`, and `num_as_string` into `analysisMap`. It never copies
`fill_missing_columns`, so `analyzeFileFormatProperties()` builds a
`JsonFileFormatProperties` with the default `false`, and the new
`fill_missing_columns` behavior does not take effect on the routine-load
execution path. Please add the property to the Nereids
task-info/data-description propagation path and cover it with a test that
reaches `NereidsLoadScanProvider`.
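
   The gap can be illustrated with a minimal, self-contained sketch. The class
`PropagationSketch`, its methods, and the `COPIED_TODAY` constant are
hypothetical stand-ins and not the actual Doris code; the real
`NereidsDataDescription` constructor may copy properties differently. The
sketch only models the pattern described above: a key that is not copied into
`analysisMap` falls back to its default downstream.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of the propagation step; names are illustrative only.
class PropagationSketch {
    // Keys this review comment lists as being copied into analysisMap.
    static final List<String> COPIED_TODAY = List.of(
            "strip_outer_array", "jsonpaths", "json_root",
            "fuzzy_parse", "read_json_by_line", "num_as_string");

    // Copies only the listed keys; any other job property is silently dropped.
    static Map<String, String> buildAnalysisMap(Map<String, String> jobProperties,
                                                List<String> keysToCopy) {
        Map<String, String> analysisMap = new HashMap<>();
        for (String key : keysToCopy) {
            String value = jobProperties.get(key);
            if (value != null) {
                analysisMap.put(key, value);
            }
        }
        return analysisMap;
    }

    // A key that never reached analysisMap resolves to the default "false".
    static boolean isFillMissingColumns(Map<String, String> analysisMap) {
        return Boolean.parseBoolean(
                analysisMap.getOrDefault("fill_missing_columns", "false"));
    }

    public static void main(String[] args) {
        Map<String, String> jobProperties = Map.of(
                "strip_outer_array", "true",
                "fill_missing_columns", "true");

        // Current behavior as described: the property is lost on the way.
        Map<String, String> before = buildAnalysisMap(jobProperties, COPIED_TODAY);
        System.out.println("before fix: " + isFillMissingColumns(before));

        // Assumed fix: include fill_missing_columns in the copied keys.
        List<String> withFix = new ArrayList<>(COPIED_TODAY);
        withFix.add("fill_missing_columns");
        Map<String, String> after = buildAnalysisMap(jobProperties, withFix);
        System.out.println("after fix: " + isFillMissingColumns(after));
    }
}
```

   A test for the real fix would follow the same shape: set the job property,
run the propagation, and assert on the value seen by
`NereidsLoadScanProvider`.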



##########
fe/fe-core/src/main/java/org/apache/doris/nereids/load/NereidsLoadScanProvider.java:
##########
@@ -420,15 +428,37 @@ private void fillContextExprMap(List<NereidsImportColumnDesc> columnDescList, Ne
     }
 
     /**
-     * if not set sequence column and column size is null or only have deleted sign ,return true
+     * Returns true when the sequence column should be auto-added, i.e.,
+     * if not set sequence column and column size is null or only have deleted sign,
+     * or fill_missing_columns is enabled, meaning schema will be auto-filled.
      */
-    private boolean shouldAddSequenceColumn(List<NereidsImportColumnDesc> columnDescList) {
+    private boolean shouldAddSequenceColumn(List<NereidsImportColumnDesc> columnDescList,
+            NereidsBrokerFileGroup fileGroup) {
+        if (isFillMissingColumns(fileGroup)) {
+            return true;
+        }
         if (columnDescList.isEmpty()) {
             return true;
         }
         return columnDescList.size() == 1 && columnDescList.get(0).getColumnName().equalsIgnoreCase(Column.DELETE_SIGN);
     }
 
+    /**
+     * Returns true if the file format is JSON and fill_missing_columns is enabled. Only meaningful for JSON.
+     */
+    private boolean isFillMissingColumns(NereidsBrokerFileGroup fileGroup) {
+        return fileGroup.getFileFormatProperties() instanceof JsonFileFormatProperties
+                && ((JsonFileFormatProperties) fileGroup.getFileFormatProperties()).isFillMissingColumns();
+    }
+
+    /**
+     * Returns true if the file format is JSON and fill_missing_columns is enabled. Only meaningful for JSON.
+     */
+    private boolean isFillMissingColumns(NereidsBrokerFileGroup fileGroup) {

Review Comment:
   This duplicates the `isFillMissingColumns(NereidsBrokerFileGroup)` method
declared just above at lines 449-452, so `NereidsLoadScanProvider` will not
compile (`method isFillMissingColumns(...) is already defined`). Please remove
one of the two declarations so that FE compilation can pass.
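
   For reference, a minimal sketch of the helper declared exactly once. The
nested types below are hypothetical stand-ins for the Doris classes named in
the diff, and the helper takes the format properties directly instead of a
`NereidsBrokerFileGroup` to keep the example self-contained.

```java
// Hypothetical stand-ins; only the single-declaration shape is the point.
class DedupSketch {
    interface FileFormatProperties {}

    static final class JsonFileFormatProperties implements FileFormatProperties {
        private final boolean fillMissingColumns;

        JsonFileFormatProperties(boolean fillMissingColumns) {
            this.fillMissingColumns = fillMissingColumns;
        }

        boolean isFillMissingColumns() {
            return fillMissingColumns;
        }
    }

    static final class CsvFileFormatProperties implements FileFormatProperties {}

    // One declaration only: true when the format is JSON and the flag is set.
    static boolean isFillMissingColumns(FileFormatProperties props) {
        return props instanceof JsonFileFormatProperties
                && ((JsonFileFormatProperties) props).isFillMissingColumns();
    }

    public static void main(String[] args) {
        System.out.println(isFillMissingColumns(new JsonFileFormatProperties(true)));
        System.out.println(isFillMissingColumns(new CsvFileFormatProperties()));
    }
}
```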



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

