Jens-G opened a new pull request, #4813:
URL: https://github.com/apache/cassandra/pull/4813
## Summary

`COPY TO` followed by `COPY FROM` corrupts text column values that contain backslashes: each round-trip doubles the backslash count. Reported in [CASSANDRA-21131](https://issues.apache.org/jira/browse/CASSANDRA-21131).

**Before (one round-trip):**

- Stored: `V\S` → exported CSV: `V\\\\S` → re-imported: `V\\S` ❌
- Stored: `\"Marianne"\` → re-imported: `\\"Marianne"\\` ❌

`list<text>`, `set<text>`, `map<text,text>`, tuples and UDTs with text fields are affected in the same way.

## Root Cause

`format_value_text` in `formatting.py` doubles backslashes unconditionally:

```python
escapedval = val.replace('\\', '\\\\')
```

This is intentional for **terminal display**: SELECT output shows `V\\S` so that the backslash is visible. However, `ExportProcess.format_value` in `copyutil.py` calls the same function when writing CSV. The `csv.writer` (configured with `escapechar='\\'`) then escapes backslashes a **second time**, quadrupling them in the CSV file. On `COPY FROM`, the `csv.reader` unescapes only once, so the doubled backslashes are written back to Cassandra.

## Fix

Add an `escape_backslash` parameter (default `True`, preserving the existing terminal display behaviour) to `format_value_text`, `format_simple_collection`, and all collection formatters. Pass `escape_backslash=False` from `ExportProcess.format_value` so that `csv.writer` alone handles backslash escaping.
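The round-trip described under Root Cause can be reproduced in a few lines. This is a minimal sketch, not cqlsh code: `format_value_text` here is a simplified stand-in that only models the backslash replacement, and the CSV writer's escaping is modelled by the hypothetical helper `write_csv_field`, assuming (as stated above) that `csv.writer` with `escapechar='\\'` escapes each backslash once. Only the import side uses the real `csv.reader`.

```python
import csv
import io


def format_value_text(val, escape_backslash=True):
    # Simplified stand-in for cqlsh's format_value_text: only the backslash
    # handling is modelled (no colouring, quoting or control-char handling).
    return val.replace('\\', '\\\\') if escape_backslash else val


def write_csv_field(field):
    # Hypothetical model of csv.writer(escapechar='\\') as described in the
    # PR: every backslash in the field is escaped once more.
    return field.replace('\\', '\\\\')


def read_csv_field(line):
    # The import side uses the real csv.reader, which unescapes once.
    reader = csv.reader(io.StringIO(line), escapechar='\\')
    return next(reader)[0]


def round_trip(stored, escape_backslash):
    # COPY TO (format, then CSV-escape) followed by COPY FROM (CSV-unescape).
    exported = write_csv_field(format_value_text(stored, escape_backslash))
    return exported, read_csv_field(exported)


if __name__ == '__main__':
    stored = 'V\\S'  # the stored value V\S
    print('buggy', *round_trip(stored, escape_backslash=True))   # buggy V\\\\S V\\S
    print('fixed', *round_trip(stored, escape_backslash=False))  # fixed V\\S V\S
```

With `escape_backslash=True` the re-imported value gains one backslash per original backslash on every round-trip; with `escape_backslash=False` the writer's escaping and the reader's unescaping cancel out.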
Changed functions:

- `format_value_text`: new parameter
- `format_simple_collection`: new parameter, propagated to element `format_value` calls
- `format_value_list`, `format_value_set`, `format_value_tuple`: new parameter, forwarded to `format_simple_collection`
- `format_value_map`: new parameter, propagated through `subformat`
- `format_value_utype`: new parameter, propagated through `format_field_value`
- `ExportProcess.format_value` in `copyutil.py`: passes `escape_backslash=False`

## Test Plan

Two standalone Python test scripts (no running Cassandra cluster required) are attached to the JIRA ticket; they reproduce the bug and verify the fix:

- `test_cassandra_21131.py`: 10 test cases for plain `text` columns (**5/10 pass before the fix → 10/10 after**)
- `test_cassandra_21131_collections.py`: 12 test cases for `list/set/map<text>` (**3/12 pass before the fix → 12/12 after**)

Integration testing against a live cluster with the exact scenario from the bug report (`COPY TO` → `TRUNCATE` → `COPY FROM` → `SELECT`) is still needed before merge.

## Notes

- A separate but related bug (`UNICODE_CONTROLCHARS_RE` converting control characters such as `\n` to repr notation `\\n` during CSV export) was discovered and will be tracked in a separate ticket.
- The `Generated-by:` commit token is included per the [ASF generative tooling policy](https://www.apache.org/legal/generative-tooling.html). The fix was developed with AI assistance (Claude Sonnet 4.6 / Anthropic) under human review and direction. All code has been verified manually.

🤖 Generated with [Claude Code](https://claude.com/claude-code)
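The propagation listed under Changed functions can be illustrated with a simplified sketch. The function names mirror those above, but the signatures are hypothetical: the real cqlsh formatters take many more arguments and dispatch on CQL type rather than Python type, and `format_element` and `export_format_value` are illustrative helpers that do not exist in cqlsh. Quoting of text elements is also simplified (embedded single quotes are not escaped).

```python
def format_value_text(val, escape_backslash=True):
    # Simplified stand-in: only backslash handling is modelled.
    return val.replace('\\', '\\\\') if escape_backslash else val


def format_value(val, escape_backslash=True):
    # Hypothetical dispatcher keyed on Python type (cqlsh uses CQL type).
    if isinstance(val, dict):
        return format_value_map(val, escape_backslash)
    if isinstance(val, (set, frozenset)):
        return format_value_set(val, escape_backslash)
    if isinstance(val, list):
        return format_value_list(val, escape_backslash)
    return format_value_text(val, escape_backslash)


def format_element(val, escape_backslash):
    # Hypothetical helper: text elements are single-quoted; nested
    # collections are formatted recursively with the same flag.
    formatted = format_value(val, escape_backslash)
    return "'%s'" % formatted if isinstance(val, str) else formatted


def format_simple_collection(val, lbracket, rbracket, escape_backslash=True):
    # The flag is forwarded to every element.
    parts = (format_element(v, escape_backslash) for v in val)
    return lbracket + ', '.join(parts) + rbracket


def format_value_list(val, escape_backslash=True):
    return format_simple_collection(val, '[', ']', escape_backslash)


def format_value_set(val, escape_backslash=True):
    # Sorted only to make the output deterministic in this sketch.
    return format_simple_collection(sorted(val), '{', '}', escape_backslash)


def format_value_map(val, escape_backslash=True):
    def subformat(v):
        return format_element(v, escape_backslash)

    items = ('%s: %s' % (subformat(k), subformat(v))
             for k, v in sorted(val.items(), key=lambda kv: kv[0]))
    return '{' + ', '.join(items) + '}'


def export_format_value(val):
    # Hypothetical analogue of ExportProcess.format_value: display escaping
    # is disabled so csv.writer is the only place backslashes are escaped.
    return format_value(val, escape_backslash=False)
```

Threading the flag explicitly through every level keeps the default (`True`) path, and therefore SELECT output, unchanged, while the export path opts out once at the top.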

