arvindKandpal-ksolves opened a new pull request, #4819:
URL: https://github.com/apache/cassandra/pull/4819

   ### What this PR does / why we need it:
   This PR fixes CASSANDRA-21381, where exporting data via `COPY TO` corrupts 
control characters (such as newlines and null bytes) by replacing them with 
their Python `repr()` notation (e.g., a real newline becomes the literal 
two-character sequence `\n`).
   
   **Root Cause:** `format_value_text` is shared between terminal display 
(`SELECT`) and CSV export (`COPY TO`). The `UNICODE_CONTROLCHARS_RE` substitution 
ran unconditionally on both paths, so multi-line values were corrupted in the 
exported CSV.
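
To illustrate why the unconditional escaping is lossy for CSV, here is a small sketch using Python's standard `csv` module. It is not cqlsh code: the `round_trip` helper is hypothetical, and the `replace` call only imitates what the display-style escaping does to a newline.

```python
import csv
import io


def round_trip(value):
    """Write one value as a CSV row, read it back, and return the field."""
    buf = io.StringIO(newline='')
    csv.writer(buf).writerow([value])
    buf.seek(0)
    return next(csv.reader(buf))[0]


original = "list\nitem"

# Assumption: stands in for the display escaping, which turns a real
# newline into a backslash followed by the letter n.
escaped = original.replace("\n", "\\n")

# The csv module quotes fields containing newlines, so the raw value survives.
preserved = round_trip(original)

# The escaped value also round-trips, but it is no longer the original data.
corrupted = round_trip(escaped)
```

The raw value comes back unchanged because the writer quotes the multi-line field, while the escaped value comes back as different text, which is the corruption described above.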
   
   **Fix:**
   This PR adds an `escape_control_chars` parameter to `format_value_text` and 
all related collection formatters (`format_value_list`, `format_value_set`, 
`format_value_map`, `format_value_tuple`, `format_value_utype`). It defaults to 
`True`, so terminal display is unchanged. During CSV export in `copyutil.py` 
(`ExportProcess.format_value`), we now pass `escape_control_chars=False` to 
preserve the actual control characters. This approach is analogous to the 
`escape_backslash` fix introduced in CASSANDRA-21131.
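
The pattern described above can be sketched roughly as follows. This is a simplified illustration, not the actual cqlsh implementation: `CONTROL_CHARS_RE`, `format_text`, and `format_list` are hypothetical stand-ins for `UNICODE_CONTROLCHARS_RE`, `format_value_text`, and `format_value_list`, and the regex here only covers ASCII control characters as an assumption.

```python
import re

# Hypothetical stand-in for UNICODE_CONTROLCHARS_RE; the real pattern may differ.
CONTROL_CHARS_RE = re.compile(r'[\x00-\x1f\x7f]')


def _show_control_char(match):
    # repr('\n') is "'\\n'"; strip the surrounding quotes to get \n as text.
    return repr(match.group(0))[1:-1]


def format_text(val, escape_control_chars=True):
    """Format a text value, escaping control characters only when asked."""
    if escape_control_chars:
        return CONTROL_CHARS_RE.sub(_show_control_char, val)
    return val


def format_list(items, escape_control_chars=True):
    """Collection formatters forward the flag to each element."""
    inner = ', '.join(
        "'" + format_text(item, escape_control_chars) + "'" for item in items
    )
    return '[' + inner + ']'


# Terminal display path: default keeps the escaping.
displayed = format_list(["list\nitem"])

# CSV export path: the caller opts out so real control characters survive.
exported = format_list(["list\nitem"], escape_control_chars=False)
```

Keeping the default at `True` means existing display callers need no changes; only the export path has to pass the flag explicitly.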
   
   ### Steps to reproduce the issue / verify the fix:
   You can verify the fix using `cqlsh` with the following schema containing 
normal text and collections:
   
   ```sql
   CREATE KEYSPACE IF NOT EXISTS test WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 1};
   USE test;
   
   CREATE TYPE IF NOT EXISTS my_type (a int, b text);
   
   CREATE TABLE IF NOT EXISTS test_collections (
       id int PRIMARY KEY,
       my_list list<text>,
       my_set set<text>,
       my_map map<text, text>,
       my_tuple tuple<int, text>,
       my_udt frozen<my_type>
   );
   
   INSERT INTO test_collections (id, my_list, my_set, my_map, my_tuple, my_udt) 
   VALUES (
       1, 
       ['list
   item'], 
       {'set
   item'}, 
       {'map_key': 'map
   value'}, 
       (10, 'tuple
   item'), 
       {a: 100, b: 'udt
   item'}
   );
   
   COPY test_collections TO 'test_collections.csv';
   ```

