mistercrunch commented on PR #34235:
URL: https://github.com/apache/superset/pull/34235#issuecomment-3104642667

   My take here is that we should just make things work properly with UTF-8 out 
of the box, but I get why we need the BOM now.
   
   From my understanding, the issue is Excel: without the BOM signature bytes, 
Excel assumes a legacy code page instead of UTF-8 when it opens our CSV files, 
so it mangles any non-ASCII characters. That's why `utf-8-sig` matters: it 
prepends those 3 magic bytes (`\xEF\xBB\xBF`) that tell Excel "hey, this is UTF-8!"
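
   To make the difference concrete, here is a minimal sketch using only the 
Python standard library (the sample text is made up, not taken from the PR). It 
shows that `utf-8-sig` produces the same bytes as `utf-8` with the 3-byte BOM 
in front:

```python
import codecs

# Made-up sample CSV text with non-ASCII characters.
text = "名前,都市\nJosé,São Paulo\n"

plain = text.encode("utf-8")         # no signature bytes
with_bom = text.encode("utf-8-sig")  # same payload, BOM prepended

# The only difference is the 3-byte signature at the start.
assert with_bom[:3] == codecs.BOM_UTF8 == b"\xef\xbb\xbf"
assert with_bom[3:] == plain

# Decoding with utf-8-sig strips the BOM again; decoding with plain
# utf-8 keeps it as a leading U+FEFF character.
assert with_bom.decode("utf-8-sig") == text
assert with_bom.decode("utf-8") == "\ufeff" + text
```

   The last assertion is the one caveat worth knowing: a consumer that reads 
the file as plain `utf-8` will see a stray `\ufeff` at the start of the first 
header cell.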
   
   But I still think we should avoid config complexity. A few thoughts:
   
   - Let's just default to `utf-8-sig` for all CSV exports - it's backward 
compatible and fixes the Excel issue
   - The BOM is valid UTF-8 and most modern tools handle it fine (unlike the 
bad old days)
   - No need for a `CSV_EXPORT` config - this should "just work" without users 
having to know about BOMs
   
   Can we simplify this to:
   1. Always use `utf-8-sig` for CSV exports (handles Excel + international 
chars)  
   2. Remove the config option entirely
   3. Add tests for various UTF-8 scenarios (Arabic, Chinese, etc.)
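
   For item 3, something along these lines might be a starting point. This is 
only a sketch: `export_csv` and `read_csv` are hypothetical helpers built on 
the standard `csv` module, not Superset's actual export code, and the sample 
rows are invented:

```python
import csv
import io


def export_csv(rows):
    """Hypothetical helper: serialize rows to CSV bytes encoded as utf-8-sig."""
    buf = io.StringIO(newline="")
    csv.writer(buf).writerows(rows)
    return buf.getvalue().encode("utf-8-sig")


def read_csv(data):
    """Hypothetical helper: parse CSV bytes, stripping a leading BOM if present."""
    text = data.decode("utf-8-sig")
    return [row for row in csv.reader(io.StringIO(text, newline=""))]


# Invented sample rows covering several scripts.
SAMPLES = [
    ["name", "city"],
    ["محمد", "القاهرة"],    # Arabic
    ["王小明", "北京"],       # Chinese
    ["José", "São Paulo"],  # Latin with diacritics
    ["emoji", "🚀"],        # outside the Basic Multilingual Plane
]


def test_round_trip():
    data = export_csv(SAMPLES)
    # The export starts with the UTF-8 BOM...
    assert data.startswith(b"\xef\xbb\xbf")
    # ...and every row survives a round trip unchanged.
    assert read_csv(data) == SAMPLES
```

   A real test would call the actual export path instead of these helpers, but 
the two assertions (BOM present, content round-trips) are the part that matters.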
   
   What do you think about just making this the default behavior? (while noting 
the change in UPDATING.md)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

