On Thu, Nov 27, 2014 at 2:40 PM, Albert-Jan Roskam <fo...@yahoo.com> wrote:
>
>>CsvIter._get_row_lookup should work on a regular file from built-in
>>open (not codecs.open), opened in binary mode. I/O on a regular file
>>will release the GIL back to the main thread. mmap objects don't do
>>this.
>
> Will io.open also work? Until today I thought that Python 3's open was what is
> codecs.open in Python 2 (probably because Python3 is all about ustrings, and
> py3-open has an encoding argument).

If you're using mmap in __getitem__, then open the file in binary mode
to parse the byte offsets for lines. This makes the operation of
__getitem__ lockless, except for initialization. If you instead use
the file interface (tell, seek, read) in __getitem__, you'll have to
synchronize access to protect the file pointer.

>>Binary mode ensures the offsets are valid for use with
>>the mmap object in __getitem__. This requires an ASCII compatible
>>encoding such as UTF-8.
>
> What do you mean exactly with "ascii compatible"? Does it mean 'superset of ascii',
> such as utf-8, windows-1252, latin-1? Hmmm, but Asian encodings like cp874 and
> shift-JIS are thai/japanese on top of ascii, so this makes me doubt. In my code I
> am using icu to guess the encoding; I simply put 'utf-8' in the sample code for
> brevity.

The 2.x csv module only works with byte strings that are ASCII
compatible. It doesn't support encodings such as UTF-16 that have
nulls. Also, the reader is hard-coded to use ASCII '\r' and '\n' as
line terminators. I'd have to read the source to see what else is hard
coded.
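
To make the binary-mode/mmap suggestion concrete, here is a minimal sketch. It is not the CsvIter code from this thread: the class name LineIndex and its methods are made up for illustration, and it assumes the file is non-empty and its lines end in '\n' (so '\r\n' works too). The offsets are collected once from a file opened in binary mode, and __getitem__ only slices the mmap, so it never touches a shared file pointer.

```python
import mmap


class LineIndex(object):
    """Hypothetical helper: random access to lines via byte offsets."""

    def __init__(self, path):
        # Binary mode, so every length counted here is a true byte count
        # and the offsets are valid for slicing the mmap below.
        self._offsets = [0]
        with open(path, "rb") as f:
            pos = 0
            for line in f:  # binary-mode iteration splits on b"\n"
                pos += len(line)
                self._offsets.append(pos)
        # Keep a second handle open for the lifetime of the mapping.
        # Note: mmap refuses to map an empty file.
        self._file = open(path, "rb")
        self._map = mmap.mmap(self._file.fileno(), 0,
                              access=mmap.ACCESS_READ)

    def __len__(self):
        return len(self._offsets) - 1

    def __getitem__(self, i):
        if i < 0:
            i += len(self)
        if not 0 <= i < len(self):
            raise IndexError(i)
        start, end = self._offsets[i], self._offsets[i + 1]
        # Slicing copies the bytes out; no seek/tell/read is involved.
        return self._map[start:end]

    def close(self):
        self._map.close()
        self._file.close()
```

Each item comes back as a byte string including its line terminator. Decoding it (for example with .decode('utf-8')) or handing it to the csv module is a separate step and is where the encoding matters.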