[issue9985] difflib.SequenceMatcher has slightly buggy and undocumented caching behavior
Christoph Burgmer added the comment: Here's a test case and a fix for get_matching_blocks() to return the same content on subsequent calls. -- keywords: +patch nosy: +christoph Added file: http://bugs.python.org/file19084/get_matching_blocks.diff ___ Python tracker <http://bugs.python.org/issue9985> ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
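For readers without the attached diff, the expectation can be sketched roughly as follows. This is not the attached test case; the helper name `matching_blocks_twice` and the sample strings are made up for illustration. It only checks that two consecutive calls on the same matcher agree in content and in element type.

```python
import difflib

def matching_blocks_twice(a, b):
    """Return the results of two consecutive get_matching_blocks() calls.

    Hypothetical helper for illustration; not part of difflib or the patch.
    """
    sm = difflib.SequenceMatcher(None, a, b)
    first = list(sm.get_matching_blocks())
    second = list(sm.get_matching_blocks())  # second call on the same matcher
    return first, second

first, second = matching_blocks_twice("abxcd", "abcd")
# Both calls should agree in content and in the type of each element.
print(first == second)
print([type(m) for m in first] == [type(m) for m in second])
```

On an interpreter where the caching is consistent, both lines should print True.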
[issue9985] difflib.SequenceMatcher has slightly buggy and undocumented caching behavior
Christoph Burgmer added the comment: BTW, here's the commit that broke the behavior in the first place: http://svn.python.org/view/python/trunk/Lib/difflib.py?r1=54230&r2=59907
[issue8192] SQLite3 PRAGMA table_info doesn't respect database on Win32
New submission from Christoph Burgmer:

'PRAGMA database.table_info("SOME_TABLE_NAME")' reports table metadata for the given database. The main database, called 'main', can be extended by attaching further databases via 'ATTACH DATABASE'. The above PRAGMA should respect the chosen database, but fails to do so on Win32 (tested on Wine), while it works on Linux.

How to reproduce:

FILE 'first.db' has table:
    CREATE TABLE "First" ( "Test" INTEGER NOT NULL );
FILE 'second.db' has table:
    CREATE TABLE "Second" ( "Test" INTEGER NOT NULL );

The final result of the following code should be empty, but it returns table data from second.db instead.

    Y:\>python
    Python 2.6.5 (r265:79096, Mar 19 2010, 21:48:26) [MSC v.1500 32 bit (Intel)] on win32
    Type "help", "copyright", "credits" or "license" for more information.
    >>> import sqlite3
    >>> conn = sqlite3.connect('first.db')
    >>> c = conn.cursor()
    >>> c.execute("ATTACH DATABASE 'second.db' AS 'second'")
    >>> for row in c:
    ...     print repr(row)
    ...
    >>> c.execute("PRAGMA 'main'.table_info('Second')")
    >>> for row in c:
    ...     print repr(row)
    ...
    (0, u'Test', u'INTEGER', 99, None, 0)
    >>>

In contrast, sqlite3.exe respects the database name for the same command:

    Y:\>sqlite3.exe first.db
    SQLite version 3.6.23
    Enter ".help" for instructions
    Enter SQL statements terminated with a ";"
    sqlite> .tables
    First
    sqlite> ATTACH DATABASE 'second.db' AS 'second';
    sqlite> .tables
    First
    sqlite> PRAGMA main.table_info('Second');
    sqlite> PRAGMA second.table_info('Second');
    0|Test|INTEGER|1||0
    sqlite>

Advice on further debugging possibilities is requested. I do not have a Windows system available, though, nor can I currently compile for Win32.
-- components: Library (Lib) messages: 101440 nosy: christoph severity: normal status: open title: SQLite3 PRAGMA table_info doesn't respect database on Win32 versions: Python 2.6
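A self-contained way to repeat the check without pre-built files might look like the sketch below. It is written in Python 3 syntax rather than the 2.6 of the report, creates both databases in a temporary directory, and the helper name `table_info_per_schema` is hypothetical. The expected outcome described in the report is an empty result for 'main' and one row for 'second'.

```python
import os
import sqlite3
import tempfile

def table_info_per_schema():
    """Recreate first.db/second.db and query table_info for each schema.

    Hypothetical helper for illustration only.
    """
    with tempfile.TemporaryDirectory() as tmp:
        first = os.path.join(tmp, "first.db")
        second = os.path.join(tmp, "second.db")
        # Build the two single-table databases described in the report.
        for path, table in ((first, "First"), (second, "Second")):
            conn = sqlite3.connect(path)
            conn.execute('CREATE TABLE "%s" ("Test" INTEGER NOT NULL)' % table)
            conn.commit()
            conn.close()

        conn = sqlite3.connect(first)
        try:
            conn.execute("ATTACH DATABASE ? AS second", (second,))
            in_main = conn.execute("PRAGMA main.table_info('Second')").fetchall()
            in_second = conn.execute("PRAGMA second.table_info('Second')").fetchall()
        finally:
            conn.close()  # close before the temporary directory is removed
    return in_main, in_second

in_main, in_second = table_info_per_schema()
print("main:", in_main)
print("second:", in_second)
```

If 'main' reports a row for table "Second", the schema qualifier is being ignored, which is the behavior described above.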
[issue2517] Error when printing an exception containing a Unicode string
New submission from Christoph Burgmer:

Python seems to have problems when an exception is thrown that contains non-ASCII text as a message and is converted to a string.

    >>> try:
    ...     raise Exception(u'Error when printing ü')
    ... except Exception, e:
    ...     print e
    ...
    Traceback (most recent call last):
      File "<stdin>", line 4, in ?
    UnicodeEncodeError: 'ascii' codec can't encode character u'\xfc' in position 20: ordinal not in range(128)

See http://www.stud.uni-karlsruhe.de/~uyhc/de/content/python-and-exceptions-containing-unicode-messages

-- components: Unicode messages: 64770 nosy: christoph severity: normal status: open title: Error when printing an exception containing a Unicode string type: behavior versions: Python 2.4, Python 2.5
[issue2517] Error when printing an exception containing a Unicode string
Christoph Burgmer added the comment:

To be more precise: I see no easy way to get the encapsulated non-ASCII data back out of the exception as a string. Taking e from my last post, none of the following will work:

    str(e)              # UnicodeDecodeError
    e.__str__()         # UnicodeDecodeError
    e.__unicode__()     # AttributeError
    unicode(e)          # UnicodeDecodeError
    unicode(e, 'utf8')  # TypeError

My workaround right now is raising the exception with an already converted string (see the link I provided). But as the tutorials speak of simply "print e", I guess the behaviour described above is some kind of bug.
[issue2517] Error when printing an exception containing a Unicode string
Christoph Burgmer added the comment:

Thanks, this does work. But where in the docs can I find the piece of information you just gave me? I couldn't find any interface definition for Exceptions.

Furthermore, will this be regarded as a bug? From [1] I understand that "unicode(e)" and "unicode(e, 'utf8')" are supposed to work. No limitations are made on the type of the object. And I suppose that unicode() is the exact equivalent of str() in that it copes with unicode strings. Not expecting the string representation of an Exception to return a Unicode string when its content is non-ASCII, whereas this kind of simple string conversion is wished for with ASCII text, seems unnecessarily cumbersome.

Please reopen if my report does have a point.

[1] http://docs.python.org/lib/built-in-funcs.html
[issue2517] Error when printing an exception containing a Unicode string
Christoph Burgmer added the comment:

Though I welcome the reopening of the bug for Python 3.0, I must say that plans not to fix a core element rather surprise me. I never believed Python to be a programming language with good Unicode integration. Several points were missing that would have been nice or even essential for good development with Unicode, most of them ignored for the sake of maintaining backward compatibility. This, though, is not the fault of the Unicode class itself and its supporting packages. Some modules, like the one for CSV, lack full Unicode support. Nevertheless, basic Python would always give you the possibility to use Unicode in (at least) a consistent way. For me, raising exceptions counts as basic support of this kind. So I still hope to see this solved for the 2.x versions, which I read will be maintained even after the release of 3.0.
[issue2517] Error when printing an exception containing a Unicode string
Christoph Burgmer added the comment:

JFTR:

> print unicode(e.message).encode("utf-8")

only works for Python 2.5, not for earlier versions.
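As a side note to the thread: exception objects also keep their constructor arguments in the `args` attribute, so the original message can be read from there without going through `e.message`. The sketch below is written in Python 3 syntax, where `str` is already Unicode, so it illustrates the access pattern rather than the 2.x failure; the helper name `exception_text` is made up.

```python
def exception_text(exc):
    """Return the first constructor argument of exc as text, or '' if none.

    Hypothetical helper; it reads exc.args instead of exc.message.
    """
    return str(exc.args[0]) if exc.args else ""

try:
    raise Exception("Error when printing \u00fc")
except Exception as e:
    message = exception_text(e)

print(message)
```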
[issue6412] Titlecase as defined in Unicode Case Mappings not followed
Christoph Burgmer added the comment: Casing algorithms should follow Section 3.13 "Default Case Algorithms" in the standard itself, not UTR#21. See http://www.unicode.org/Public/5.2.0/ucd/DerivedCoreProperties-5.2.0d11. Unicode 5.2. A nice mail on the Unicode mailing list explains this a bit: http://www.unicode.org/mail-arch/unicode-ml/y2009-
[issue6625] UnicodeEncodeError on pydoc's CLI
New submission from Christoph Burgmer: pydoc fails with a UnicodeEncodeError for properly specified Unicode docstrings (u"""...""") on the command line interface. See attached patch that encodes the output with the system's encoding. -- components: Extension Modules files: unicode.patch keywords: patch messages: 91182 nosy: christoph severity: normal status: open title: UnicodeEncodeError on pydoc's CLI versions: Python 2.5, Python 2.6 Added file: http://bugs.python.org/file14626/unicode.patch
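The idea of encoding before writing can be sketched as below. This is not the attached patch: the function name `encode_for_output` is hypothetical, the code is Python 3 syntax, and the "backslashreplace" error handler is an added assumption so that characters the target encoding cannot represent do not raise.

```python
import sys

def encode_for_output(text, encoding=None):
    """Encode text for an output stream without raising UnicodeEncodeError.

    Hypothetical helper: falls back to the stream's encoding, then to ASCII,
    and escapes unencodable characters instead of failing.
    """
    enc = encoding or getattr(sys.stdout, "encoding", None) or "ascii"
    return text.encode(enc, "backslashreplace")

# With an ASCII target the umlaut is escaped rather than raising.
print(encode_for_output("f\u00fcr", "ascii"))
```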
[issue6625] UnicodeEncodeError on pydoc's CLI
Christoph Burgmer added the comment:

Here is a diff for test/test_pydoc.py (against Python 2.6), which however doesn't trigger the failure because of how Python handles output encoding. The test here will pass, but pydoc will still fail:

    $ pydoc test/pydoc_mod.py > /dev/null
    Traceback (most recent call last):
      File "/usr/bin/pydoc", line 5, in <module>
        pydoc.cli()
      File "/usr/lib/python2.5/pydoc.py", line 2226, in cli
        help.help(arg)
      File "/usr/lib/python2.5/pydoc.py", line 1691, in help
        else: doc(request, 'Help on %s:')
      File "/usr/lib/python2.5/pydoc.py", line 1482, in doc
        pager(title % desc + '\n\n' + text.document(object, name))
      File "/usr/lib/python2.5/pydoc.py", line 1300, in pager
        pager(text)
      File "/usr/lib/python2.5/pydoc.py", line 1398, in plainpager
        sys.stdout.write(plain(text))
    UnicodeEncodeError: 'ascii' codec can't encode character u'\xfc' in position 936: ordinal not in range(128)

-- Added file: http://bugs.python.org/file14656/pydoc_unicode_testcase_notworking.diff
[issue6656] locale.format_string fails on escaped percentage
New submission from Christoph Burgmer:

locale.format_string doesn't return the same result as a normal "string" % format directive, but raises a TypeError. See attached test case for Python 2.6.

    >>> locale.format_string('%f%%', 1.0)
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "/usr/lib/python2.5/locale.py", line 195, in format_string
        return new_f % val
    TypeError: not enough arguments for format string
    >>> '%f%%' % 1.0
    '1.000000%'

-- components: Library (Lib) files: locale_percents_test.diff keywords: patch messages: 91352 nosy: christoph severity: normal status: open title: locale.format_string fails on escaped percentage versions: Python 2.5, Python 2.6 Added file: http://bugs.python.org/file14665/locale_percents_test.diff
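The expectation can be written as a small check, sketched here in Python 3 syntax and assuming the default "C" numeric locale (no setlocale() call has been made). The helper name `formats_agree` is hypothetical and this is not the attached test case.

```python
import locale

def formats_agree(fmt, value):
    """Compare locale.format_string() with the plain % operator.

    Hypothetical helper; only meaningful while the numeric locale is "C",
    where no localized decimal point or grouping is applied.
    """
    return locale.format_string(fmt, value) == fmt % value

# An escaped percent sign should survive both code paths.
print(formats_agree('%f%%', 1.0))
```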
[issue6656] locale.format_string fails on escaped percentage
Christoph Burgmer added the comment: This patch removes '%%' entities from the regex results and only replaces other matches with '%s' which later then get replaced by localized versions so that escaped percentage entities don't show up in localized parsing anymore. Removing case '%%' from the regex completely does not sound feasible and will result in '%%d' having a match '%d', though d should be a normal character. The replacing of regex matches does not look that beautiful, feel free to rewrite said part. -- Added file: http://bugs.python.org/file14666/locale_percents.diff
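The '%%d' point can be demonstrated with two simplified patterns. These are assumptions made for illustration, not the regex used in locale.py or in the patch: one pattern accepts '%' as a conversion character, the other does not.

```python
import re

# Simplified, hypothetical patterns: optional modifiers, then a conversion char.
WITH_PERCENT = re.compile(r'%[-#0-9 +*.hlL]*?[eEfFgGdiouxXcrs%]')
WITHOUT_PERCENT = re.compile(r'%[-#0-9 +*.hlL]*?[eEfFgGdiouxXcrs]')

def directives(pattern, fmt):
    """Return the substrings of fmt that the pattern treats as directives."""
    return [m.group() for m in pattern.finditer(fmt)]

# With '%' in the pattern, '%%' is consumed as one unit and can be skipped later.
print(directives(WITH_PERCENT, '%%d'))
# Without it, the second '%' pairs up with 'd' and is mistaken for a directive.
print(directives(WITHOUT_PERCENT, '%%d'))
```

Keeping '%%' in the pattern and filtering it out afterwards therefore avoids the false '%d' match.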
[issue6412] Titlecase as defined in Unicode Case Mappings not followed
Christoph Burgmer added the comment: Implementing full patch solving it the old way (UTR#21). The correct way for the latest Unicode version would be to implement the word breaking algorithm described in (UAX#29) [1] first. [1] http://www.unicode.org/reports/tr29/#Word_Boundaries -- Added file: http://bugs.python.org/file14890/unicodeobject.titlecase.2.diff
[issue6412] Titlecase as defined in Unicode Case Mappings not followed
Christoph Burgmer added the comment: I should add that I didn't include the two header files generated by Tools/unicode/makeunicodedata.py
[issue6412] Titlecase as defined in Unicode Case Mappings not followed
Christoph Burgmer added the comment:

> * U+0027 APOSTROPHE hardcoded (see below)
> * U+00AD SOFT HYPHEN (SHY) has the "Format (Cf)" property and thus is included automatically
> * U+2019 RIGHT SINGLE QUOTATION MARK hardcoded (see below)

I hardcoded some characters into Tools/unicode/makeunicodedata.py:

    >>> print ' '.join([u':', u'\xb7', u'\u0387', u'\u05f4', u'\u2027', u'\ufe13', u'\ufe55', u'\uff1a'] + [u"'", u'.', u'\u2018', u'\u2019', u'\u2024', u'\ufe52', u'\uff07', u'\uff0e'])
    : · · ״ ‧ ︓ ﹕ : ' . ‘ ’ ․ ﹒ ' .

Those cannot currently be extracted automatically, as neither DerivedCoreProperties.txt nor the source file for the property "Word_Break(C) = MidLetter or MidNumLet" is available to the script.

As I said, the patch is only a second-best solution, as the correct path would be implementing the word breaking algorithm as described in the newest standard. This patch is just an improvement over the current situation.
[issue7008] str.title() misbehaves with apostrophes
Christoph Burgmer added the comment: I admit I don't fully understand the semantics of capwords(). But from what I believe it should do, this function could be happily replaced by the word-breaking algorithm as defined in http://www.unicode.org/reports/tr29/. This algorithm should be implemented anyway, to properly solve issue6412. -- nosy: +christoph
[issue6412] Titlecase as defined in Unicode Case Mappings not followed
Christoph Burgmer added the comment:

New patch
- updated comments to reflect needed integration of DerivedCoreProperties.txt
- cleaned up if(...) construct
- updated (from issue7008) and integrated testcase

When applying this patch, run Tools/unicode/makeunicodedata.py to regenerate the header files.

Note, though, that with this patch str and unicode objects will not behave equally:

    >>> s = "This isn't right"
    >>> s.title() == unicode(s).title()
    False

-- Added file: http://bugs.python.org/file14994/unicodeobject.titlecase.3.diff
[issue7008] str.title() misbehaves with apostrophes
Christoph Burgmer added the comment:

Antoine Pitrou wrote:
> capwords() itself could be deprecated, since it's an obvious one-
> Replacing it with another method, however, will just confuse and annoy
> existing users.

Yes, sorry, I meant the semantics, whereas you are right about the specific function.

Marc-Andre Lemburg wrote:
> Note however, that word boundaries are just as complicated as casing:
> there are lots of special cases in different languages or locales
> (see the notes after the word boundary rules in the TR29).

ICU already has the full implementation, so Python could get away with just supporting the default implementation (as seen with other case mappings).

    >>> from PyICU import UnicodeString, Locale, BreakIterator
    >>> en_US_locale = Locale('en_US')
    >>> breakIter = BreakIterator.createWordInstance(en_US_locale)
    >>> s = UnicodeString("There's a hole in the bucket.")
    >>> print s.toTitle(breakIter, en_US_locale)
    There's A Hole In The Bucket.
    >>> breakIter.setText("There's a hole in the bucket.")
    >>> last = 0
    >>> for i in breakIter:
    ...     print s[last:i]
    ...     last = i
    ...
    There's A Hole In The Bucket .
[issue3955] maybe doctest doesn't understand unicode_literals?
Christoph Burgmer added the comment:

OutputChecker.check_output() seems to be responsible for comparing the 'example.want' and 'got' literals, and this is obviously done literally. So as "u'1'" is different from "'1'", this is reflected in the result. This gets more complicated with literals like "[u'1', u'2']", I believe.

So eval() could be used for testing for equality:

    >>> repr(['1', '2']) == repr([u'1', u'2'])
    False

but

    >>> eval(repr(['1', '2'])) == eval(repr([u'1', u'2']))
    True

doctests are already compiled and executed, but evaluating the doctest code's result is probably a security issue, so a method doing the inverse of repr() could be used that only works on variables; something like Pickle, but without its own protocol.

-- nosy: +christoph
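One candidate for such an "inverse of repr()" is ast.literal_eval() from the standard library, which only accepts literal structures and does not execute arbitrary code. The sketch below is in Python 3 syntax; the helper name `same_literal` is made up and this is not how doctest actually compares output.

```python
import ast

def same_literal(want, got):
    """Compare two output strings by value when both are Python literals.

    Hypothetical helper: falls back to plain string comparison when either
    side is not a literal that ast.literal_eval() can parse.
    """
    try:
        return ast.literal_eval(want) == ast.literal_eval(got)
    except (ValueError, SyntaxError):
        return want == got

# The u'' prefix no longer makes the two outputs differ.
print(same_literal("[u'1', u'2']", "['1', '2']"))
```

The fallback keeps ordinary, non-literal output (prose, tracebacks) comparing exactly as before.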
[issue3955] maybe doctest doesn't understand unicode_literals?
Christoph Burgmer added the comment:

This problem seems more severe, as the appended test case shows. That gives me:

    Expected: u'ī'
    Got: u'\u012b'

Both literals are the same.

Unicode literals in doc strings are not treated like other escaped characters:

    >>> repr(r'\n')
    "'\\\\n'"
    >>> repr('\n')
    "'\\n'"

but:

    >>> repr(ur'\u012b')
    "u'\\u012b'"
    >>> repr(u'\u012b')
    "u'\\u012b'"

So there is no workaround in the docstring's reference itself.

I file this here, even though the problems are not strictly equal. I do believe, though, that there is or should be a common solution to these issues. Both results need to be interpreted on a more abstract scale.

-- Added file: http://bugs.python.org/file14406/test.py
[issue1293741] doctest runner cannot handle non-ascii characters
Christoph Burgmer added the comment: See attached patch which works for error reporting and verbose output. -- keywords: +patch nosy: +christoph Added file: http://bugs.python.org/file14407/doctest.unicode.patch
[issue1293741] doctest runner cannot handle non-ascii characters
Christoph Burgmer added the comment: My last patch only changed the encoding used in DocTestRunner.run(). This new patch will apply the same to DocTestCase.runTest(). -- Added file: http://bugs.python.org/file14422/doctest.unicode.patch
[issue3955] maybe doctest doesn't understand unicode_literals?
Christoph Burgmer added the comment: JFTR: To yield the results of my last comment, you need to apply the patch posted in http://bugs.python.org/issue1293741
[issue6412] Titlecase as defined in Unicode Case Mappings not followed
New submission from Christoph Burgmer:

Titlecase, i.e. istitle() and title(), is buggy when the string includes combining diacritical marks.

    >>> u'H\u0301ngh'.istitle()
    False
    >>> u'H\u0301ngh'.title()
    u'H\u0301Ngh'
    >>>

The string given is already in titlecase, so the following result is expected:

    >>> u'H\u0301ngh'.istitle()
    True
    >>> u'H\u0301ngh'.title()
    u'H\u0301ngh'
    >>>

UTR#21 Case Mappings defines the following algorithm for titlecase mapping [1]:

    For each character C, find the preceding character B.
        ignore any intervening case-ignorable characters when finding B.
    If B exists, and is cased
        map C to UCD_lower(C)
    Otherwise,
        map C to UCD_title(C)

The class of 'case-ignorable' is defined under [2] and includes Nonspacing Marks (Mn) as listed in [3]. This includes diacritical marks and others. These should not be handled like spaces, which they currently are, thus dividing words.

A patch including the above test case is attached.

[1] http://unicode.org/reports/tr21/tr21-5.html#Case_Conversion_of_Strings
[2] http://unicode.org/reports/tr21/tr21-5.html#Definitions
[3] http://www.fileformat.info/info/unicode/category/Mn/list.htm

-- components: Library (Lib) files: test_unicode.titlecase.diff keywords: patch messages: 90086 nosy: christoph severity: normal status: open title: Titlecase as defined in Unicode Case Mappings not followed versions: Python 2.5, Python 2.6 Added file: http://bugs.python.org/file14443/test_unicode.titlecase.diff
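The UTR#21 steps quoted above can be sketched in pure Python (Python 3 syntax). This is not the attached patch, which works on the C implementation. The function names are hypothetical, and both "case-ignorable" and "cased" are approximated here from general categories plus a few hardcoded characters, which is an assumption rather than the exact Unicode definition.

```python
import unicodedata

# Assumption: approximate "case-ignorable" by general category plus a few
# hardcoded characters; the exact definition lives in the Unicode data files.
_IGNORABLE_CATEGORIES = {"Mn", "Me", "Cf", "Lm", "Sk"}
_IGNORABLE_CHARS = {"'", "\u00ad", "\u2019"}

def is_case_ignorable(ch):
    return ch in _IGNORABLE_CHARS or unicodedata.category(ch) in _IGNORABLE_CATEGORIES

def is_cased(ch):
    # Assumption: upper-, lower- and titlecase letters count as "cased".
    return unicodedata.category(ch) in {"Lu", "Ll", "Lt"}

def utr21_title(text):
    """Titlecase text following the per-character rule quoted above."""
    out = []
    for i, ch in enumerate(text):
        # Find the preceding character B, skipping case-ignorable characters.
        j = i - 1
        while j >= 0 and is_case_ignorable(text[j]):
            j -= 1
        if j >= 0 and is_cased(text[j]):
            out.append(ch.lower())   # B exists and is cased
        else:
            out.append(ch.title())   # start of a word
    return "".join(out)

print(utr21_title("H\u0301ngh") == "H\u0301ngh")
print(utr21_title("there's a hole"))
```

Because the combining acute accent is skipped when looking for B, the 'n' after it sees the cased 'H' and stays lowercase.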
[issue6412] Titlecase as defined in Unicode Case Mappings not followed
Christoph Burgmer added the comment: Adding an incomplete patch in need of a function Py_UNICODE_ISCASEIGNORABLE defining the case-ignorable class. I don't want to touch capitalize(), as I don't fully understand its semantics and where it differs from title(). It seems, though, that following UTR#21 it is not the first character that should be uppercased, but the first character with casing. -- Added file: http://bugs.python.org/file1/unicodeobject.titlecase.diff
[issue1602] windows console doesn't print utf8 (Py30a2)
Christoph Burgmer added the comment:

Will this bug be tackled for Python 2.7? And is there a way to get hold of the access denied error?

Here are my steps to reproduce. I started the console with "cmd /u /k chcp 65001":

    Aktive Codepage: 65001.

    C:\Dokumente und Einstellungen\root>set PYTHONIOENCODING=UTF-8
    C:\Dokumente und Einstellungen\root>d:
    D:\>cd Python31
    D:\Python31>python
    Python 3.1.2 (r312:79149, Mar 21 2010, 00:41:52) [MSC v.1500 32 bit (Intel)] on win32
    Type "help", "copyright", "credits" or "license" for more information.
    >>> print("\u573a")
    场
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    IOError: [Errno 13] Permission denied
    >>>

I see a rectangle on screen, but obviously c&p works.

-- nosy: +christoph
[issue6412] Titlecase as defined in Unicode Case Mappings not followed
Christoph Burgmer added the comment: @Terry How is the behavior changed? To me it seems the same as initially reported. The results are consistent but nonetheless wrong. It's not about whether you agree with the result, but rather about following the Unicode standard.