github-actions[bot] commented on code in PR #64071:
URL: https://github.com/apache/doris/pull/64071#discussion_r3465871456
##########
fe/fe-core/src/main/java/org/apache/doris/nereids/load/NereidsLoadScanProvider.java:
##########
@@ -195,9 +196,21 @@ private void fillContextExprMap(List<NereidsImportColumnDesc> columnDescList, Ne
// If user does not specify the file field names, generate it by using base schema of table.
// So that the following process can be unified
boolean specifyFileFieldNames = copiedColumnExprs.stream().anyMatch(p -> p.isColumn());
- if (!specifyFileFieldNames) {
+ boolean fillMissing = isFillMissingColumns(fileGroup);
+ if (!specifyFileFieldNames || fillMissing) {
+ // Only dedup against already-present columns for the fill_missing_columns path so that
+ // the existing !specifyFileFieldNames behavior stays byte-for-byte identical.
+ Set<String> existingColumns = new TreeSet<>(String.CASE_INSENSITIVE_ORDER);
Review Comment:
This de-dup should not treat every existing descriptor as proof that the
matching file slot is available. With `fill_missing_columns=true`, a same-name
mapping like `COLUMNS(k1 = k1)` starts `copiedColumnExprs` with
`NereidsImportColumnDesc("k1", UnboundSlot("k1"))`. Adding that target to
`existingColumns` makes the base-schema loop skip the plain `k1` descriptor,
and the later scan-slot loop only creates file slots for descriptors whose
`expr == null`.
The resulting reduced plan is:
```text
LogicalLoadProject(k1 := UnboundSlot(k1))
  LogicalProject(scanSlots without k1)
    LogicalOneRowRelation(scanSlots without k1)
```
So the mapping still references the input `k1`, but its child no longer
outputs that slot. The old `!specifyFileFieldNames` path added the base-schema
descriptor and kept this source slot available. Please de-dup only true
file-field descriptors and constant mappings, or otherwise add the base scan
descriptor whenever a mapping expression can still reference the same source
column.
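
A minimal sketch of the first option (de-dup only against plain file-field descriptors and constant mappings), written against a simplified stand-in rather than the real Doris classes. `ColumnDesc`, its `constantExpr` flag, and `fillMissing` are hypothetical names used for illustration only; the sketch also assumes that `isColumn()` means `expr == null`, which matches the scan-slot behavior described above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

class FillMissingDedupSketch {
    // Hypothetical stand-in for NereidsImportColumnDesc.
    // expr == null models a plain file-field descriptor.
    static final class ColumnDesc {
        final String name;
        final String expr;          // simplified: the real class holds an Expression
        final boolean constantExpr; // assumption: true when expr references no source slot

        ColumnDesc(String name, String expr, boolean constantExpr) {
            this.name = name;
            this.expr = expr;
            this.constantExpr = constantExpr;
        }

        boolean isColumn() {
            return expr == null;
        }
    }

    // Appends a plain base-schema descriptor unless the column is already covered
    // by a file-field descriptor or by a mapping that cannot read a source slot.
    static List<ColumnDesc> fillMissing(List<ColumnDesc> copied, List<String> baseSchema) {
        Set<String> existing = new TreeSet<>(String.CASE_INSENSITIVE_ORDER);
        for (ColumnDesc desc : copied) {
            // A mapping such as k1 = k1 is deliberately NOT recorded here,
            // because its expression still needs the k1 file slot.
            if (desc.isColumn() || desc.constantExpr) {
                existing.add(desc.name);
            }
        }
        List<ColumnDesc> result = new ArrayList<>(copied);
        for (String column : baseSchema) {
            // Set.add returns false when the name is already present (case-insensitively).
            if (existing.add(column)) {
                result.add(new ColumnDesc(column, null, false));
            }
        }
        return result;
    }
}
```

With this shape, `COLUMNS(k1 = k1)` still gets a plain `k1` descriptor appended, so the later scan-slot loop creates the file slot that the mapping reads. How the real code would decide that an expression is constant is left open here.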
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]