arvindKandpal-ksolves commented on PR #4819:
URL: https://github.com/apache/cassandra/pull/4819#issuecomment-4495599327

   Hi @bschoening and @Jens-G,
   
   Thanks for the suggestions! I investigated the alternative approach of 
bypassing `format_value` for `text/varchar/ascii` columns directly within 
`copyutil.py`. While it cleanly handles top-level text columns, it 
unfortunately breaks for text embedded inside collections (List, Map, Set, 
Tuple, UDT).
   
   ### Technical Analysis:
   1. **The Collection Path:** For types like `list<text>`, 
`ExportProcess.format_value` only sees the outer type (`list`) and falls 
through to `format_value_list` -> `format_simple_collection`. The recursive 
descent happens entirely inside `formatting.py` and routes the elements back 
through the dispatch table straight to `format_value_text`. The bypass in 
`copyutil.py` never gets a second chance, causing nested strings to be 
re-corrupted (`\n` → `\\n` and `\` → `\\\\`).
   2. **The UDT Path:** `format_value_utype` explicitly invokes 
`format_value_text(name, ...)` by name for field identifiers rather than going 
through type dispatch. A bypass or generic formatter swap misses this path entirely.
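
To illustrate the collection path, here is a toy model of the dispatch problem. It is a sketch only: `format_text`, `format_list`, `dispatch`, and `export_value` are hypothetical stand-ins, not the real functions in `formatting.py` or `copyutil.py`, and the escaping rules are simplified.

```python
# Toy model only: every name here is a hypothetical stand-in, not cqlsh code.

def format_text(val):
    # Stand-in for a text formatter that always escapes.
    return val.replace('\\', '\\\\').replace('\n', '\\n')

def format_list(val):
    # The recursion re-enters the dispatch table for each element,
    # so it never passes back through the export-side bypass.
    return '[' + ', '.join(dispatch(v) for v in val) + ']'

FORMATTERS = {str: format_text, list: format_list}

def dispatch(val):
    return FORMATTERS[type(val)](val)

def export_value(val):
    # Top-level bypass: only the outer type is inspected here.
    if isinstance(val, str):
        return val
    return dispatch(val)

top = export_value('a\nb')       # bypass applies, newline is preserved
nested = export_value(['a\nb'])  # bypass is skipped, element gets escaped
```

In this model the top-level string comes back unchanged, while the same string inside a list is escaped, which is the asymmetry described above.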
   
   To stay consistent with the existing architecture, the fix propagates an 
explicit kwarg, mirroring how rendering-context parameters (`colormap`, 
`nullval`, `encoding`, `decimal_sep`, etc.) are already plumbed through these 
layers.
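
A minimal sketch of that kind of kwarg propagation, under stated assumptions: the function names and signatures below are illustrative rather than the actual `formatting.py` API, and the default value of `escape_control_chars` is assumed.

```python
# Hypothetical sketch; names and signatures are illustrative only.

def format_text(val, escape_control_chars=True, **_):
    # Escaping happens only when the caller asks for it.
    if escape_control_chars:
        val = val.replace('\\', '\\\\').replace('\n', '\\n')
    return val

def format_list(val, **kwargs):
    # Collections forward every kwarg unchanged to their elements.
    return '[' + ', '.join(format_any(v, **kwargs) for v in val) + ']'

def format_any(val, **kwargs):
    formatter = format_list if isinstance(val, list) else format_text
    return formatter(val, **kwargs)

display = format_any(['C:\\tmp', 'a\nb'])
csv_out = format_any(['C:\\tmp', 'a\nb'], escape_control_chars=False)
```

Because the flag travels with the call, nested elements see the same setting as the top-level value, with no need for a bypass at any single layer.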
   
   ### Latest Updates in this PR:
   * **Jens-G's Feedback Resolved:** I have fixed the unconditional backslash 
substitution. The replacement `val.replace('\\', '\\\\')` is now guarded by an 
`if escape_control_chars:` block, so CSV export mode leaves backslashes 
untouched.
   * **Expanded Test Coverage:** The unit tests in `test_formatting.py` have 
been updated with a mixed string (`"C:\\Users\\alice\nHello\x00"`) and strict 
regression assertions to prevent backslash doubling from ever creeping back. 
All 10 tests pass.
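
For illustration, a regression check in this spirit might look like the sketch below. The `format_text` function is a hypothetical stand-in for the formatter under test, and the test names are invented; only the mixed input string is taken from the description above.

```python
import unittest

def format_text(val, escape_control_chars=True):
    # Hypothetical stand-in for the formatter under test.
    if escape_control_chars:
        # Backslashes are doubled first so the escapes added next are not doubled.
        val = val.replace('\\', '\\\\')
        val = val.replace('\n', '\\n').replace('\x00', '\\x00')
    return val

class BackslashRegressionTest(unittest.TestCase):
    MIXED = "C:\\Users\\alice\nHello\x00"

    def test_csv_mode_leaves_value_untouched(self):
        out = format_text(self.MIXED, escape_control_chars=False)
        self.assertEqual(out, self.MIXED)

    def test_display_mode_escapes_exactly_once(self):
        out = format_text(self.MIXED)
        self.assertEqual(out, "C:\\\\Users\\\\alice\\nHello\\x00")
```

The first case pins the CSV behavior (no change at all), and the second pins single escaping in display mode, so any reintroduced doubling fails one of the two.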
   
   *(Note: This architectural deep-dive and regression analysis was conducted 
with the assistance of Claude Code / Anthropic).*
   
   Please review the updated changes whenever you have time. I look forward to 
your thoughts!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

