Daniel Geržo wrote:
> Thomas 'PointedEars' Lahn wrote:
>> Chris Rebert wrote:
>>> Daniel Geržo wrote:
>>>> [f.newlines is None after f.readlines()
>>>> when f = codecs.open(…, mode='rU', encoding='ascii'),
>>>> but not when f = codecs.open(…, mode='rU')]
>>>
>>> […]
>>> I would speculate that the upshot of this is that codecs.open() ends
>>> up calling built-in open() with a nonsense `mode` of "rUb" or similar,
>>> resulting in strange behavior.
>>>
>>> If this explanation is correct, then there are 2 bugs:
>>> 1. Built-in open() should treat "b" and "U" as mutually exclusive and
>>> reject mode strings which involve both.
>>> 2. codecs.open() should either reject modes involving "U", or be fixed
>>> so that they work as expected.
>>
>> You might be correct that it is a bug (already fixed in versions newer
>> than 2.5), since codecs.open() from my Python 2.6 reads as follows:
>
> Well I am doing this on:
> Python 2.7.1 (r271:86832, Mar 7 2011, 14:28:09)
> [GCC 4.2.1 (Apple Inc. build 5664)] on darwin
>
> So what do you guys advise me to do?

RTSL, fix when necessary (see my other follow-up), check the trunk, and if
necessary submit a patch.

For an immediate solution, do not do what is not supposed to work (calling
codecs.open(…, mode='U')). You can find which of the three kinds of
newlines occur in the text with, e.g.

    self.newline = list(
        set(re.findall(r'\r?\n|\r', ''.join(fobj.readlines()))))

Please trim your quotes to the relevant minimum (see above for example).

-- 
PointedEars
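
For reference, the newline-detection idea above can be written as a small
standalone sketch that does not rely on the 'U' mode at all. This is only an
illustration under stated assumptions: the names find_newlines and
detect_newlines are made up for this example and do not come from the thread
or from the codecs module, and the 'ascii' default simply mirrors the
encoding mentioned in the original question.

```python
import re

# Try '\r\n' first so a Windows line ending is not counted as a lone '\r'
# followed by a lone '\n'.
NEWLINE_RE = re.compile(r'\r\n|\n|\r')


def find_newlines(text):
    """Return the set of newline conventions that occur in `text`.

    Hypothetical helper for illustration only.
    """
    return set(NEWLINE_RE.findall(text))


def detect_newlines(path, encoding='ascii'):
    """Return the set of newline conventions used in the file at `path`.

    Hypothetical helper for illustration only. The file is opened in
    binary mode so that no newline translation takes place before the
    text is inspected.
    """
    with open(path, 'rb') as fobj:
        text = fobj.read().decode(encoding)
    return find_newlines(text)
```

Reading the file in binary mode and decoding it by hand sidesteps the
question of how codecs.open() combines 'U' with 'b'. An empty set means the
text contains no line terminator at all.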