YoannAbriel opened a new pull request, #62865:
URL: https://github.com/apache/airflow/pull/62865
# fix: stop parameterizing SQL keywords in MySQL bulk_load_custom
## Problem
`MySqlHook.bulk_load_custom()` passes `duplicate_key_handling` (e.g.
`IGNORE`, `REPLACE`) and `extra_options` as parameterized query values via
`cursor.execute(sql, parameters)`. The MySQL driver treats parameterized values
as data and quotes them as string literals, producing invalid SQL like:
```sql
LOAD DATA LOCAL INFILE '/tmp/file' 'IGNORE' INTO TABLE `my_table` 'FIELDS TERMINATED BY ...'
```
This regression was introduced in PR #33328, which switched these keywords
from string concatenation to parameterization. An earlier fix attempt (#41078)
was closed without being merged.
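To illustrate the failure mode, here is a minimal sketch of client-side parameter substitution. `quote_literal` and `render` are hypothetical helpers written for this example; they only approximate how a MySQL driver escapes string parameters and are not the driver's actual code.

```python
def quote_literal(value: str) -> str:
    """Simplified stand-in for a driver's string escaping (assumption)."""
    escaped = value.replace("\\", "\\\\").replace("'", "\\'")
    return f"'{escaped}'"


def render(sql: str, params: tuple) -> str:
    """Substitute every %s placeholder with a quoted string literal."""
    return sql % tuple(quote_literal(p) for p in params)


# The pre-fix statement shape: all three values are passed as parameters.
sql = "LOAD DATA LOCAL INFILE %s %s INTO TABLE `my_table` %s"
rendered = render(sql, ("/tmp/file", "IGNORE", "FIELDS TERMINATED BY ...") )

# The keyword ends up wrapped in quotes, which MySQL rejects as syntax.
print(rendered)
```

Every parameter is treated as data, so the keyword `IGNORE` is emitted as the string literal `'IGNORE'` rather than as SQL syntax.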
## Root Cause
In `providers/mysql/src/airflow/providers/mysql/hooks/mysql.py`, the
`bulk_load_custom` method builds the SQL as:
```python
sql_statement = f"LOAD DATA LOCAL INFILE %s %s INTO TABLE `{table}` %s"
parameters = (tmp_file, duplicate_key_handling, extra_options)
```
Both `duplicate_key_handling` and `extra_options` are SQL syntax (a keyword
and a clause fragment, respectively), not data values. Only `tmp_file` is
actual data that should be parameterized.
## Fix
Changed `bulk_load_custom` to interpolate `duplicate_key_handling` and
`extra_options` directly into the SQL statement via f-string, while keeping
`tmp_file` as the sole parameterized value:
```python
sql_statement = f"LOAD DATA LOCAL INFILE %s {duplicate_key_handling} INTO TABLE `{table}` {extra_options}"
parameters = (tmp_file,)
```
Updated existing tests (`test_bulk_load_custom`,
`test_bulk_load_custom_hook_lineage`) to assert the new SQL shape and parameter
tuple. Added a new parametrized test
`test_bulk_load_custom_duplicate_key_not_parameterized` that validates both
`IGNORE` and `REPLACE` appear literally in the executed SQL and only `tmp_file`
is parameterized.
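The shape of that check can be sketched without Airflow or a database. This is an illustrative approximation, not the test code from this PR: `bulk_load_custom_sketch` and `check_keyword_is_literal` are hypothetical names, and the cursor is a plain `MagicMock`.

```python
from unittest import mock


def bulk_load_custom_sketch(cursor, table, tmp_file,
                            duplicate_key_handling="IGNORE", extra_options=""):
    """Hypothetical stand-in mirroring the fixed statement construction."""
    sql_statement = (
        f"LOAD DATA LOCAL INFILE %s {duplicate_key_handling} "
        f"INTO TABLE `{table}` {extra_options}"
    )
    parameters = (tmp_file,)  # only the file path is real data
    cursor.execute(sql_statement, parameters)


def check_keyword_is_literal(keyword):
    """Assert the keyword appears in the SQL text, not in the parameters."""
    cursor = mock.MagicMock()
    bulk_load_custom_sketch(
        cursor, "my_table", "/tmp/file", keyword, "FIELDS TERMINATED BY ';'"
    )
    sql, params = cursor.execute.call_args.args
    assert f"%s {keyword} INTO TABLE" in sql
    assert params == ("/tmp/file",)
    return sql


for keyword in ("IGNORE", "REPLACE"):
    check_keyword_is_literal(keyword)
```

Asserting on both the SQL text and the parameter tuple guards against the keyword silently moving back into the parameters.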
Closes: #62506
##### Was generative AI tooling used to co-author this PR?
- [X] Yes — Claude Code