Re: [Tutor] multiprocessing question

eryksun Fri, 28 Nov 2014 09:17:37 -0800

On Thu, Nov 27, 2014 at 2:40 PM, Albert-Jan Roskam <[email protected]> wrote:
>
>>CsvIter._get_row_lookup should work on a regular file from built-in
>>open (not codecs.open), opened in binary mode. I/O on a regular file
>>will release the GIL back to the main thread. mmap objects don't do
>>this.
>
> Will io.open also work? Until today I thought that Python 3's open was
> what is codecs.open in Python 2 (probably because Python3 is all about
> ustrings, and py3-open has an encoding argument).


If you're using mmap in __getitem__, then open the file in binary mode to
parse the byte offsets for lines. This makes the operation of __getitem__
lockless, except for initialization. If you instead use the file interface
(tell, seek, read) in __getitem__, you'll have to synchronize access to
protect the file pointer.
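
Here's a rough sketch of the mmap approach, in case it helps. The names
LineIndex and demo are made up for illustration; this is not the CsvIter
code from the thread, and it assumes a non-empty file with '\n' line
endings. The offsets are counted from a binary-mode pass over the file,
and __getitem__ only slices the map, so it never moves a shared file
pointer.

```python
import mmap
import os
import tempfile


class LineIndex(object):
    """Hypothetical helper: random access to lines by byte offset."""

    def __init__(self, path):
        # One pass in binary mode, so each (start, length) pair is a true
        # byte position that can be used directly on the mmap object.
        self._offsets = []
        pos = 0
        with open(path, "rb") as f:
            for line in f:
                self._offsets.append((pos, len(line)))
                pos += len(line)
        # mmap raises ValueError for an empty file; not handled here.
        self._file = open(path, "rb")
        self._map = mmap.mmap(self._file.fileno(), 0,
                              access=mmap.ACCESS_READ)

    def __len__(self):
        return len(self._offsets)

    def __getitem__(self, i):
        start, length = self._offsets[i]
        # Slicing the map returns bytes; no seek/tell/read is involved.
        return self._map[start:start + length]

    def close(self):
        self._map.close()
        self._file.close()


def demo():
    """Write a small UTF-8 file and fetch its lines out of order."""
    data = u"id,name\n1,caf\u00e9\n2,z\n".encode("utf-8")
    fd, path = tempfile.mkstemp()
    with os.fdopen(fd, "wb") as f:
        f.write(data)
    try:
        index = LineIndex(path)
        rows = [index[i] for i in (2, 0, 1)]
        index.close()
    finally:
        os.remove(path)
    return rows
```

The rows come back as byte strings, so decoding (or handing them to the
csv module) is a separate step.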

>>Binary mode ensures the offsets are valid for use with
>>the mmap object in __getitem__. This requires an ASCII compatible
>>encoding such as UTF-8.
>
> What do you mean exactly with "ascii compatible"? Does it mean 'superset
> of ascii', such as utf-8, windows-1252, latin-1? Hmmm, but Asian encodings
> like cp874 and shift-JIS are thai/japanese on top of ascii, so this makes
> me doubt. In my code I am using icu to guess the encoding; I simply put
> 'utf-8' in the sample code for brevity.

The 2.x csv module only works with byte strings that are ASCII compatible.
It doesn't support encodings such as UTF-16 that have nulls. Also, the
reader is hard-coded to use ASCII '\r' and '\n' as line terminators. I'd
have to read the source to see what else is hard coded.
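
To make the difference concrete, here's a small sketch that only looks at
the encoded bytes of one row. It doesn't call the csv module at all, and
the helper name describe is made up for the example. It checks for null
bytes and for whether the row still ends with the ASCII bytes '\r\n'.

```python
ROW = u"1,caf\u00e9\r\n"


def describe(encoding):
    """Encode ROW and report two properties of the resulting bytes."""
    raw = ROW.encode(encoding)
    return {
        # Null bytes are what rule out encodings such as UTF-16.
        "has_null": b"\x00" in raw,
        # True only if the terminator is still the plain ASCII bytes.
        "ends_with_crlf": raw.endswith(b"\r\n"),
    }


utf8 = describe("utf-8")
latin1 = describe("latin-1")
utf16 = describe("utf-16-le")
```

With UTF-8 and Latin-1 the row has no null bytes and still ends in
'\r\n'. With UTF-16-LE every ASCII character is followed by a null byte,
so the terminator is no longer the two-byte sequence the reader looks for.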

