killme2008 opened a new issue, #20544: URL: https://github.com/apache/datafusion/issues/20544
### Describe the bug

When an `INSERT INTO ... VALUES` statement contains a mix of bare placeholders (`$1`, `$3`) and function-wrapped placeholders (`func($2)`), the inferred types for placeholders can be incorrectly assigned due to an index-corruption bug in [insert_to_plan](https://github.com/apache/datafusion/blob/e937cadbcceff6a42bee2c5fc8d03068fa0eb30c/datafusion/sql/src/statement.rs#L2257).

The function scans the `VALUES` clause and collects placeholder-to-column-type mappings into a `BTreeMap<usize, FieldRef>`. Only direct placeholders are matched; placeholders nested inside function calls are skipped. The `BTreeMap` is then converted to a `Vec` via `into_values().collect()`, which discards the keys and compresses the sparse entries into a dense `Vec`.

For example, given:

```sql
INSERT INTO t (id, name, age) VALUES ($1, upper($2), $3)
-- columns: id(UInt32), name(Utf8), age(Int32)
```

1. The scan produces `BTreeMap { 0: UInt32, 2: Int32 }` (index 1 is skipped because `$2` is inside `upper()`).
2. `into_values().collect()` produces `Vec [UInt32, Int32]`, so the entries at indices 0 and 2 are collapsed to 0 and 1.
3. When `create_placeholder_expr` resolves `$2`, it computes `idx = 2 - 1 = 1`, then looks up `Vec[1] = Int32`.
4. `$2` is incorrectly inferred as `Int32` (the type of `age`) instead of being left as unknown (`None`).

This causes downstream type-checking failures. In real-world usage (e.g., GreptimeDB https://github.com/GreptimeTeam/greptimedb/issues/7610), `INSERT INTO t VALUES ($1, parse_json($2), $3)` fails because `$2` is wrongly inferred as `Timestamp` instead of `String`, making `parse_json(Timestamp)` fail type validation.
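The collapse can be reproduced standalone. This is a minimal sketch (not DataFusion's actual code, with `&str` standing in for `FieldRef`) showing how `into_values()` silently closes the gap left by the skipped placeholder:

```rust
use std::collections::BTreeMap;

// The buggy conversion: keys are discarded, so sparse entries become dense.
fn buggy_collapse(types: BTreeMap<usize, &'static str>) -> Vec<&'static str> {
    types.into_values().collect()
}

fn main() {
    let mut types = BTreeMap::new();
    types.insert(0, "UInt32"); // $1 -> id
    types.insert(2, "Int32");  // $3 -> age; index 1 ($2) was never recorded

    let dense = buggy_collapse(types);
    assert_eq!(dense, vec!["UInt32", "Int32"]);

    // Resolving $2 computes idx = 2 - 1 = 1 and reads the wrong slot:
    assert_eq!(dense[1], "Int32"); // $2 wrongly inferred as age's Int32
}
```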
# Fix

Replace `into_values().collect()` with a conversion that preserves index positions, filling gaps with `None`:

```rust
// Before (broken): keys are discarded, collapsing sparse indices.
let prepare_param_data_types = prepare_param_data_types.into_values().collect();

// After (correct): every index up to the maximum key gets a slot.
// The max key is copied out first so the immutable borrow ends
// before the mutable borrow taken by `remove`.
let max_idx = prepare_param_data_types.keys().last().copied();
let prepare_param_data_types: Vec<Option<FieldRef>> = match max_idx {
    Some(max_idx) => (0..=max_idx)
        .map(|i| prepare_param_data_types.remove(&i))
        .collect(),
    None => vec![],
};
```

This requires changing the `prepare_param_data_types` type from `Vec<FieldRef>` to `Vec<Option<FieldRef>>` throughout the `PlannerContext` API to properly represent "type unknown for this placeholder position".

I'd love to create a PR for this fix if you agree. Thanks!

### To Reproduce

_No response_

### Expected behavior

_No response_

### Additional context

_No response_
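The proposed conversion can be checked in isolation. This is a sketch with `&str` standing in for `FieldRef`; the function name `dense_with_gaps` is hypothetical:

```rust
use std::collections::BTreeMap;

// Gap-preserving conversion: every index up to the maximum key gets a
// slot, and indices never recorded by the scan become None instead of
// being compressed away.
fn dense_with_gaps(mut types: BTreeMap<usize, &'static str>) -> Vec<Option<&'static str>> {
    // Copy the max key out so the immutable borrow ends before `remove`.
    let max_idx = types.keys().next_back().copied();
    match max_idx {
        Some(max_idx) => (0..=max_idx).map(|i| types.remove(&i)).collect(),
        None => vec![],
    }
}

fn main() {
    let mut types = BTreeMap::new();
    types.insert(0, "UInt32");
    types.insert(2, "Int32"); // index 1 ($2) was skipped by the scan

    // $2 now resolves to None (type unknown) rather than a wrong type.
    assert_eq!(
        dense_with_gaps(types),
        vec![Some("UInt32"), None, Some("Int32")]
    );
}
```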
