[issue27580] CSV Null Byte Error

Daniel Jewell Fri, 29 May 2020 23:07:05 -0700


Daniel Jewell <danieljew...@gmail.com> added the comment:


Forgive my frustration, but @Skip I really don't see how the definition of CSV 
relating to Excel (or Gnumeric or LibreOffice) has any relevance as to whether 
or not the module (and perhaps Python more generally) supports chr(0x00) as a 
delimiter. (Neither you nor I get to decide how someone else might write output 
data...) 

While the module is called CSV, it's really not just *Comma* Separated Values;
rather, it's a rough approximation of a database table with an optional header
row where rows/records are separated by <record separator> and fields are
separated by <field separator>. Sometimes the field separator is chr(0x2c)
(i.e. a comma), sometimes it's chr(0x09) (i.e. a tab, or in ASCII parlance
"Horizontal Tab/HT") ... or maybe even the actual ASCII "Record Separator"
character (i.e. chr(0x1e)) ... or maybe NUL, chr(0x00).

(1) The module should be 100% agnostic about the separator - the current 
(3.8.3) error text when trying to use csv.reader(..., delimiter=chr(0x00)) is 
"TypeError: "delimiter" must be a 1-character string" ... well, chr(0x00) *is* 
a 1-character string. It's not a 1-character *printable* string... But then 
again neither is chr(0x1e) (ASCII "RS" Record Separator) .. and csv.reader(..., 
delimiter=chr(0x1e)) appears to work (I haven't tried actual data yet). 
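
A minimal sketch of the comparison in (1), assuming small in-memory sample data. The `parse` helper and the sample strings are illustrative and not part of the original report. On 3.8.3 the NUL case raises the TypeError quoted above; other interpreter versions may behave differently, so the sketch captures the error instead of assuming either outcome.

```python
import csv
import io


def parse(text, delimiter):
    """Parse `text` with csv.reader.

    Returns the list of rows, or the exception object if the csv module
    rejects the delimiter or the data.
    """
    try:
        # newline="" leaves line endings untouched for the csv module.
        source = io.StringIO(text, newline="")
        return list(csv.reader(source, delimiter=delimiter))
    except (TypeError, csv.Error) as exc:
        return exc


# ASCII "Record Separator" (0x1e) used as the field delimiter.
rs_rows = parse("a\x1eb\r\nc\x1ed\r\n", chr(0x1e))

# NUL (0x00) used as the field delimiter; may be rows or an exception.
nul_result = parse("a\x00b\r\nc\x00d\r\n", chr(0x00))
```

Here `rs_rows` holds the parsed rows for the 0x1e case, while `nul_result` is either the same rows or the exception raised by the csv module, depending on the Python version in use.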


(1a) chr(0x00), or '\0', is used quite often in the *NIX world as a convenient
record separator that doesn't have escaping problems because by its very
nature it's non-printable, e.g. "find . -iname "*something*" -print0 |
xargs -0 <program>" ...
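
As a rough illustration of consuming that kind of output without the csv module, the sketch below splits NUL-terminated records in plain Python. The function name `split_nul_records` and the sample bytes are hypothetical; the input is assumed to look like what "find ... -print0" emits (each record followed by a single NUL byte).

```python
def split_nul_records(data: bytes) -> list:
    """Split a byte string of NUL-terminated records into a list of records."""
    records = data.split(b"\0")
    # Every record ends with NUL, so the final empty piece is not a record.
    if records and records[-1] == b"":
        records.pop()
    return records


# Hypothetical sample: two paths, one containing a space and one a newline.
sample = b"./a file.txt\0./dir/b\nc.txt\0"
names = split_nul_records(sample)
```

Because the separator is NUL, the space and the embedded newline in the sample paths need no quoting or escaping.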

As to the difficulty in handling 0x00 characters, I dunno ... it appears that
GNU find, xargs, and gawk all handle them; same with FreeBSD. FreeBSD writes
the output for "-print0" like this:
https://github.com/freebsd/freebsd/blob/508f3673dec94b03f89b9ce9569390d6d9b86a89/usr.bin/find/function.c#L1383
... and BSD xargs handles it too. I haven't looked at the CPython source to
see what's going on - it might be tricky to modify the code to support this... 
(but then again, IMHO, this sort of thing should have been a consideration in 
the first place....) 

I suppose in many ways, the very existence of this specific issue at all is 
just one example of what seems to be a larger issue with Python's overall 
development: It's a great language for *many* things and in many ways. But I've 
run into so many little fringe "gotchas" where something doesn't work or is 
limited in some way because, seemingly, functionality is designed 
around/defined by a practical-example-use-case and not what is or might be 
*possible* (e.g. the CSV-as-only-a-spreadsheet-interface example -- and I 
really *don't* mean that as a personal attack @Skip - I am very appreciative of 
the time and effort you and everyone has poured into the project...) Is it 
possible to write a NUL (0x00) character to a file? Through a *NIX pipe? You 
bet. 
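
A small sketch of that last claim, assuming a temporary file and an OS-level pipe created with os.pipe(). The payload is made up for illustration and is kept short so that it fits in the pipe buffer.

```python
import os
import tempfile

# Hypothetical payload with embedded NUL bytes.
payload = b"field1\x00field2\x00"

# Round-trip through a regular (temporary) file.
with tempfile.TemporaryFile() as fh:
    fh.write(payload)
    fh.seek(0)
    from_file = fh.read()

# Round-trip through an OS pipe.
read_fd, write_fd = os.pipe()
os.write(write_fd, payload)
os.close(write_fd)
from_pipe = os.read(read_fd, 1024)
os.close(read_fd)
```

Both `from_file` and `from_pipe` come back with the NUL bytes intact.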

(I got a little rant-y .. sorry... I'm sure there's a _lot_ more going on 
underneath the covers and there are a lot of factors - not limited to just the 
csv module - as you mentioned. I just really feel like something is "off". 
Maybe it's my brain - ha. :))

----------
nosy: +danieljewell
type: enhancement -> behavior
versions: +Python 3.7, Python 3.8

_______________________________________
Python tracker <rep...@bugs.python.org>
<https://bugs.python.org/issue27580>
_______________________________________