arvindKandpal-ksolves opened a new pull request, #4819:
URL: https://github.com/apache/cassandra/pull/4819
### What this PR does / why we need it:
This PR fixes CASSANDRA-21381, where exporting data via `COPY TO` corrupts
control characters (such as newlines and null bytes) by replacing them with
their Python `repr()` notation (e.g., a newline is written as the two
characters `\n`).
**Root Cause:** `format_value_text` is shared between terminal display
(`SELECT`) and CSV export (`COPY TO`). The `UNICODE_CONTROLCHARS_RE` substitution
ran unconditionally, so multi-line values were written to the CSV with their
control characters escaped instead of preserved.
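To make the failure mode concrete, here is a minimal sketch of the pre-fix behaviour. It is not the cqlshlib code: `CONTROL_CHARS_RE`, `show_control_char` and `format_text_buggy` are hypothetical, simplified stand-ins, and the character range is an assumption that may differ from the real `UNICODE_CONTROLCHARS_RE`. The point is only that a single formatter always rewrites control characters into their `repr()` form, whichever caller it serves.

```python
import re

# Hypothetical, simplified stand-in for cqlsh's UNICODE_CONTROLCHARS_RE;
# the real pattern in cqlshlib may cover a different character range.
CONTROL_CHARS_RE = re.compile(r'[\x00-\x1f\x7f]')


def show_control_char(match: re.Match) -> str:
    # repr('\n') is "'\\n'"; slicing off the quotes leaves the two
    # characters backslash + 'n'.
    return repr(match.group(0))[1:-1]


def format_text_buggy(value: str) -> str:
    """Pre-fix behaviour (sketch): one formatter serves both SELECT output
    and COPY TO, so control characters are always rewritten."""
    return CONTROL_CHARS_RE.sub(show_control_char, value)


if __name__ == '__main__':
    original = 'list\nitem'
    formatted = format_text_buggy(original)
    print(formatted)              # prints: list\nitem  (on a single line)
    print(formatted == original)  # prints: False
```

Escaping is helpful on a terminal, where a raw newline or null byte would break the table layout, but in a CSV file it silently changes the stored value.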
**Fix:**
Added an `escape_control_chars` parameter to `format_value_text` and all related
collection formatters (`format_value_list`, `format_value_set`,
`format_value_map`, `format_value_tuple`, `format_value_utype`). It defaults to
`True`, so terminal display is unchanged. During CSV export in `copyutil.py`
(`ExportProcess.format_value`), we now pass `escape_control_chars=False` to
preserve the actual control characters. This approach is analogous to the
`escape_backslash` fix introduced in CASSANDRA-21131.
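The shape of the fix can be sketched as follows. This is a simplified model, not the patch itself: the real formatters in cqlshlib take additional arguments (encoding, colour map, and so on) that are omitted here, and `format_for_display` / `format_for_export` are hypothetical wrappers standing in for the `SELECT` display path and `ExportProcess.format_value`. Only `format_value_list` is shown; the set, map, tuple and UDT formatters would forward the flag the same way.

```python
import re
from typing import List

# Hypothetical, simplified stand-in for UNICODE_CONTROLCHARS_RE (assumption).
CONTROL_CHARS_RE = re.compile(r'[\x00-\x1f\x7f]')


def _show_control_char(match: re.Match) -> str:
    # Replace one control character with its repr() body, e.g. '\n' -> '\\n'.
    return repr(match.group(0))[1:-1]


def format_value_text(value: str, escape_control_chars: bool = True) -> str:
    """Escape control characters only when the caller asks for it."""
    if escape_control_chars:
        return CONTROL_CHARS_RE.sub(_show_control_char, value)
    return value


def format_value_list(values: List[str], escape_control_chars: bool = True) -> str:
    """Forward the flag to every element so nested text behaves like
    top-level text. Embedded single quotes are not escaped in this sketch."""
    items = [format_value_text(v, escape_control_chars=escape_control_chars)
             for v in values]
    return '[' + ', '.join("'" + item + "'" for item in items) + ']'


def format_for_display(values: List[str]) -> str:
    # Hypothetical SELECT path: relies on the default (escape).
    return format_value_list(values)


def format_for_export(values: List[str]) -> str:
    # Hypothetical COPY TO path: opts out, as ExportProcess.format_value does.
    return format_value_list(values, escape_control_chars=False)


if __name__ == '__main__':
    row = ['list\nitem']
    print(format_for_display(row))  # prints ['list\nitem'] on one line
    print(format_for_export(row))   # prints the value across two lines
```

Defaulting the flag to `True` means only the export call site needs to change, which keeps the behaviour of every existing display caller intact.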
### Steps to reproduce the issue / verify the fix:
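You can verify the fix in `cqlsh` with the following schema, which stores
multi-line text inside a list, set, map, tuple and UDT. With the fix, the
exported `test_collections.csv` keeps the embedded newlines instead of writing
them as `\n`: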
```sql
CREATE KEYSPACE IF NOT EXISTS test WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 1};
USE test;
CREATE TYPE IF NOT EXISTS my_type (a int, b text);
CREATE TABLE IF NOT EXISTS test_collections (
    id int PRIMARY KEY,
    my_list list<text>,
    my_set set<text>,
    my_map map<text, text>,
    my_tuple tuple<int, text>,
    my_udt frozen<my_type>
);
INSERT INTO test_collections (id, my_list, my_set, my_map, my_tuple, my_udt)
VALUES (
1,
['list
item'],
{'set
item'},
{'map_key': 'map
value'},
(10, 'tuple
item'),
{a: 100, b: 'udt
item'}
);
COPY test_collections TO 'test_collections.csv';
```