[issue8504] bsddb databases in python 2.6 are not compatible with python 2.5 and are slow in python 2.6
Peter Landgren added the comment:

I can add what I have found using bsddb with Python 2.5 and 2.6 under Windows XP SP3.

In my installation:

    Python 2.5.4  bsddb 4.4.5.3
    Python 2.6.4  bsddb 4.7.3

What I did: in Gramps, I imported an XML backup file into an empty bsddb database. It took about 1 hour with 2.6.4 and 2 minutes with 2.5.4!

I have also installed bsddb3:

    Python 2.6.4  bsddb3 4.8.4

and with the same import I'm back to 2 minutes.

I have pstat logs which I could provide.

--
nosy: +PeterL
___
Python tracker <http://bugs.python.org/issue8504>
___
___
Python-bugs-list mailing list
Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue8504] bsddb databases in python 2.6 are not compatible with python 2.5 and are slow in python 2.6
Peter Landgren added the comment:

Requested data on my Windows box:

    Python 2.5  bsddb 4.4.5.3  4.4.20
    Python 2.6  bsddb 4.7.3    4.7.25
    Python 2.6  bsddb 4.8.4    4.8.26

OK?
[issue8504] bsddb databases in python 2.6 are not compatible with python 2.5 and are slow in python 2.6
Peter Landgren added the comment:

Maybe I should add that there is no speed degradation between 2.5 and 2.6 when doing the same thing in Linux.
[issue8504] bsddb databases in python 2.6 are not compatible with python 2.5 and are slow in python 2.6
Peter Landgren added the comment:

In Linux it is: 4.4.5.3 (4, 6, 21)

You asked for a test case. I'm not sure how I can provide one without you having Gramps installed to test it. Do you mean the whole database environment?
[issue8504] bsddb databases in python 2.6 are not compatible with python 2.5 and are slow in python 2.6
Peter Landgren added the comment:

To make it 100% clear: the versions are almost the same for Linux and Windows.

              Python 2.5            Python 2.6
    Windows   4.4.5.3 (4, 6, 20)    4.7.3 (4.7.25)
    Linux     4.4.5.3 (4, 6, 21)    4.7.3 (4.7.25)
[issue8516] Speed difference between Python 2.5 and 2.6 during filling bsddb database.
New submission from Peter Landgren:

The time it takes, in the application Gramps, to fill an empty bsddb database by importing an XML backup or a GEDCOM file increases from about 2 minutes to about an hour on Windows XP and Windows 7 when going from Python 2.5 to Python 2.6. No such degradation has been seen on Linux. The Gramps code was the same in all test cases.

The running conditions were:

              Python 2.5            Python 2.6
    Windows   4.4.5.3 (4, 6, 20)    4.7.3 (4.7.25)
    Linux     4.4.5.3 (4, 6, 21)    4.7.3 (4.7.25)

Note the one small version difference between Windows and Linux.

If I install bsddb3 and change the Gramps code to use it, no noticeable speed degradation can be seen. (Windows only, with Python 2.6: bsddb3 4.8.4 (4.8.26).)

I have run profiling and attach the results. (Sorry for the fuss I made in issue 8504.)

The only way of providing a test case, as far as I can find, is to install Gramps, create a new Family Tree (empty database) and import a test XML backup.

There are two test cases (*.gramps) available in:
http://gramps.svn.sourceforge.net/viewvc/gramps/branches/maintenance/gramps32/example/gramps/

Gramps can be found at:
http://www.gramps-project.org/wiki/index.php?title=Installation

--
components: Library (Lib)
files: statistics_for_python_25_26_run.txt.tar.gz
messages: 104067
nosy: PeterL
severity: normal
status: open
title: Speed difference between Python 2.5 and 2.6 during filling bsddb database.
type: performance
versions: Python 2.6
Added file: http://bugs.python.org/file17063/statistics_for_python_25_26_run.txt.tar.gz
___
Python tracker <http://bugs.python.org/issue8516>
___
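For readers who want to collect comparable timing data, the kind of profiling run mentioned in the report can be outlined with the standard cProfile and pstats modules. This is only a minimal sketch in Python 3 syntax, not the Gramps import code: `fill_database`, `profile_fill` and the plain dict standing in for a bsddb database are hypothetical placeholders.

```python
import cProfile
import io
import pstats


def fill_database(db, n_records):
    """Hypothetical stand-in for the Gramps import: write n_records keys."""
    for i in range(n_records):
        db["key%06d" % i] = "value%06d" % i
    return db


def profile_fill(n_records=1000, top=5):
    """Profile fill_database and return (db, report_text)."""
    db = {}  # assumption: a dict replaces the real bsddb handle
    profiler = cProfile.Profile()
    profiler.enable()
    fill_database(db, n_records)
    profiler.disable()

    # Collect the report in memory, sorted by cumulative time.
    out = io.StringIO()
    stats = pstats.Stats(profiler, stream=out)
    stats.sort_stats("cumulative").print_stats(top)
    return db, out.getvalue()


if __name__ == "__main__":
    _, report = profile_fill()
    print(report)
```

Running the same script under both interpreter versions and comparing the cumulative-time columns is one way to see which calls account for the difference.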
[issue8516] Speed difference between Python 2.5 and 2.6 during filling bsddb database.
Peter Landgren added the comment:

1. Sorry, I made a mistake this morning. (Had to run to a funeral.) These are the correct versions:

              Python 2.5            Python 2.6
    Windows   4.4.5.3 (4, 4, 20)    4.7.3 (4.7.25)
    Linux     4.4.5.3 (4, 6, 21)    4.7.3 (4.7.25)

So the same versions of bsddb and DB in Python 2.6 give slow performance on Windows but not on Linux. This means that the Windows and Linux environments are equal as far as I can see.

2. I installed bsddb3 5.0.0 without any problem, but I had to move libdb48.dll from c:\Python26\bsddb3\utils\ to c:\Python26\Lib\site-packages\bsddb3\, otherwise it could not be found. Any explanation for this?

3. I could not run Gramps on Windows with Py 2.7, as Gramps needs pygtk, pycairo and pygobject to run.

It seems to be a strange issue. It can be worked around by using bsddb3 instead in Gramps for those who need it. It is only a problem when you import a backup or a GEDCOM file and when you rebuild reference maps, which you don't do very often. It's not an issue with normal usage of Gramps. So, maybe let it wait until 2.7 is out?
[issue5200] unicode.normalize gives wrong result for some characters
New submission from Peter Landgren:

If any of the Swedish characters "åäöÅÄÖ" are input to unicode.normalize(form, ustr) with form = "NFD" or "NFKD", the result will be "aaoAAO".

"åäöÅÄÖ" are normal characters and should be the same after normalize. They are not connected to aaoAAO other than for historic reasons, and not in modern languages. It's a common misinterpretation that the dots and circle above them are diacritic signs, but those letters should behave as the (Danish) "Ø", which is normalized correctly.

From Wikipedia:
Å is often perceived as an A with a ring, interpreting the ring as a diacritical mark. However, in the languages that use it, the ring is not considered a diacritic but part of the letter.
The letter Ö in the Swedish and Icelandic alphabets historically arises from the Germanic umlaut, but it is considered a separate letter from O.
See http://en.wikipedia.org/wiki/%C3%85

I think this is probably impossible to solve, as it will be mixed up with "umlaut" and you don't know what language the specific word is connected to.

--
components: Library (Lib)
messages: 81536
nosy: PeterL
severity: normal
status: open
title: unicode.normalize gives wrong result for some characters
type: behavior
versions: Python 2.5
___
Python tracker <http://bugs.python.org/issue5200>
___
[issue5200] unicode.normalize gives wrong result for some characters
Peter Landgren added the comment:

Thanks for the fast response.

I understand that Python follows the Unicode specification. I think the Unicode standard is not correct in this case for the Swedish letters. I have asked unicode.org for an explanation.

Should not the Danish letter "Ø" be normalized as "O"? I get "Ø" for all NFC/NFD/NFKC/NFKD normalizations.

Regards,
Peter Landgren

> Martin v. Löwis <mar...@v.loewis.de> added the comment:
>
> It is not true that normalize produces "aaoAAO". Instead, it produces
>
> u'a\u030aa\u0308o\u0308A\u030aA\u0308O\u0308'
>
> This is the correct result, according to the Unicode specification. It
> would be incorrect to normalize them unchanged under the Unicode Normal
> Form D (for decomposed); the decomposed character for 'LATIN SMALL
> LETTER A WITH RING ABOVE' (for example) is 'LATIN SMALL LETTER A' +
> 'COMBINING RING ABOVE'.
>
> The wikipedia article is irrelevant; refer to the Unicode specification
> for a normative reference.
>
> Closing as invalid.
>
> --
> nosy: +loewis
> resolution: -> invalid
> status: open -> closed

Added file: http://bugs.python.org/file13018/unnamed
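The decomposition described in the quoted reply can be inspected directly with the standard unicodedata module. The sketch below uses Python 3 syntax (the thread itself is about Python 2.5); `decompose` and `strip_marks` are hypothetical helper names introduced only for illustration. It shows that NFD keeps the ring as a separate combining character, and that "aaoAAO" only appears once the combining marks are discarded.

```python
import unicodedata


def decompose(text):
    """Return the NFD form of text as a list of Unicode character names."""
    return [unicodedata.name(ch) for ch in unicodedata.normalize("NFD", text)]


def strip_marks(text):
    """Drop combining marks after NFD; this is the step that turns Å into A."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


if __name__ == "__main__":
    print(decompose("\u00c5"))  # Å: base letter plus combining ring
    print(decompose("\u00d8"))  # Ø: no canonical decomposition
    print(strip_marks("\u00e5\u00e4\u00f6\u00c5\u00c4\u00d6"))
```

Ø stays a single code point because Unicode defines no canonical decomposition for it, which is why it looks "correct" in the original report.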
[issue5200] unicode.normalize gives wrong result for some characters
Peter Landgren added the comment:

> I think you have a fundamental misunderstanding what a "decomposition"
> is. "Ø" should *not* be decomposed as "O", because clearly, "Ø" and "O"
> are different letters. If anything, it would be decomposed as
> "O" + PLUS SOME COMBINING MARK

The same applies to "Å" and "A", "Ä" and "A", and "Ö" and "O", which are also different letters, just as "Ø" and "O" are. ("Ø" is the Danish version of "Ö".) Maybe not in the Unicode world, but in real life. That's why I'm a little confused. I will wait and see what, if anything, the Unicode people say.

In any case, thanks for the discussion.

Regards,
/Peter

Added file: http://bugs.python.org/file13019/unnamed
[issue5200] unicode.normalize gives wrong result for some characters
Peter Landgren added the comment:

> Martin v. Löwis <mar...@v.loewis.de> added the comment:
>
> > The same applies "Å" and "A", "Ä" and "A" and "Ö" and "O"
> > which also are also different letters as "Ø" and "O" are.
>
> Sure. And rightfully, they "Å" is *not* (I repeat: not)
> normalized as "A", under NFD:
>
> py> unicodedata.normalize("NFD", u"Å")
> u'A\u030a'
>
> > Maybe not in the unicode world but in treal life.
>
> They are different letters also in the Unicode world.
>
> > That's why I'm a little confused.
>
> I think the confusion comes from your assumption that
> normalizing "Å" produces "A". It does not. Really not.

Yes, you are right. However, the confusion/problem shows up when normalize is used in the application to build an alphabet and group, for example, all versions of E (E, É, È, Ë, Ê) together under E. The first character in the result of normalize is used to build alphabet labels for surnames:

    letter = normalize("NFD", surname)[0].upper()
    if letter != last_letter:
        last_letter = letter

and this is why I get "A" when the surname begins with "Å". This way it works for all variations of E to be grouped under "E", but it fails in that "Å" is shown under the label "A"; not the "A" at the beginning of the alphabet, but after "Z", where "ÅÄÖ" come. So a previous sorting of the surnames works correctly. (The Swedish alphabet has 29 letters: A,B,C... X,Y,Z,Å,Ä,Ö)

Can you think of any solution to this conflict?

I still think "Å" or "Ä" or "Ö" should behave as "Ø":

    >>> unicodedata.normalize("NFD",u"Ø")
    u'\xd8'

Now, as you said:

    >>> unicodedata.normalize("NFD",u"Å")
    u'A\u030a'

But it should be (in my opinion):

    >>> unicodedata.normalize("NFD",u"Å")
    u'\xc5'

This is obviously the result of how the Unicode spec is written, interpreting "Å" as a variation of "A", which it is not. I have asked the Unicode people, but have not got any answer yet.

The application is GRAMPS: http://gramps-project.org/

Once again, thanks for making some of the Unicode stuff clear!

Regards,
Peter Landgren

Added file: http://bugs.python.org/file13025/unnamed
[issue5200] unicode.normalize gives wrong result for some characters
Peter Landgren added the comment:

> I don't quite understand why you want to place É, È, Ë, Ê all along
> with E, yet Å,Ä,Ö after Z. Because that's what the Swedish alphabet
> says?

The È... comes from French surnames, and our French developer wants to group all versions of E together. The É... can be found in French surnames in Sweden as well as in Germany. The program, GRAMPS, is a genealogy program used in about 20 languages, so there is no preferred language.

> Please understand that collation varies across languages. For example
> in German, we also have Ä, but it does *not* come after Z. Instead,
> there are two ways to collate Ä (telephone book vs. dictionary):
> 1. Ä sorts exactly like A
> 2. Ä sorts as if it was transcribed as Ae

I know. However, Swedish telephone books and dictionaries are sorted the same way: A,B,C... X,Y,Z,Å,Ä,Ö.

> So there is no one true collation of Ä, but you have to take into
> account what language rules you want to follow.

True. I agree. GRAMPS runs in the locale of the user, but must be able to handle information coming from many other languages/countries. That's why it's hard to be universal.

> If you want to implement Swedish rules, why then do you also want
> to support É, È, Ë, Ê? Do you have these letters in Swedish at all?

We can have them in names. See above.

> If you want to use obscure collation rules, you might have to
> implement the collation algorithm yourself. For example, assign
> each letter a unique number (different from the Unicode ordinal),
> and then sort by these numbers.
>
> Take a look at ICU, which already includes collation algorithms
> for many locales.

I think we have found a solution that can handle most cases. We treat surnames beginning with "ÅÄÖ" specially. I don't think that there are many surnames outside the Nordic countries that start with any of these three letters.

Vielen Dank!
/Peter

Added file: http://bugs.python.org/file13034/unnamed
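The approach described in this exchange (special-casing Å, Ä and Ö, and giving each letter its own sort position) might look roughly like the following. This is a hedged sketch in Python 3 syntax, not the actual Gramps code; `KEEP_AS_IS`, `SWEDISH_ORDER`, `first_letter_label` and `label_sort_key` are hypothetical names, and the letter table only covers the 29-letter Swedish alphabet mentioned in the thread.

```python
import unicodedata

# Assumption: letters that keep their own label instead of folding to a base letter.
KEEP_AS_IS = frozenset("\u00c5\u00c4\u00d6")  # Å, Ä, Ö

# Assumption: label order follows the 29-letter Swedish alphabet.
SWEDISH_ORDER = "ABCDEFGHIJKLMNOPQRSTUVWXYZ\u00c5\u00c4\u00d6"


def first_letter_label(surname):
    """Return the alphabet label for a surname."""
    if not surname:
        return ""
    # Compose first so that a decomposed 'A' + ring is seen as one letter.
    first = unicodedata.normalize("NFC", surname)[0].upper()
    if first in KEEP_AS_IS:
        return first
    # Fold accented variants such as É, È, Ë, Ê onto their base letter.
    return unicodedata.normalize("NFD", first)[0]


def label_sort_key(label):
    """Sort labels by their position in SWEDISH_ORDER; unknown labels go last."""
    pos = SWEDISH_ORDER.find(label) if len(label) == 1 else -1
    return (pos if pos >= 0 else len(SWEDISH_ORDER), label)


if __name__ == "__main__":
    names = ["\u00c5kesson", "\u00c9ric", "Berg", "\u00d6dman", "Adams"]
    labels = sorted({first_letter_label(n) for n in names}, key=label_sort_key)
    print(labels)
```

For rules beyond one locale, the ICU library suggested in the quoted reply is the more general route; the table-based key above only illustrates the "assign each letter a number" idea.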
[issue6525] Problem with string.lowercase in Windows XP
New submission from Peter Landgren:

string.lowercase is changed after locale.setlocale(locale.LC_ALL,'') in Windows XP but not in Linux.

This little test script, run on Windows XP and Linux, shows the problem:

    import locale
    import string
    print string.lowercase
    print locale.setlocale(locale.LC_ALL,'C')
    print string.lowercase
    print locale.setlocale(locale.LC_ALL,'')
    print string.lowercase

Result on Win XP with Python 2.5.1:

    abcdefghijklmnopqrstuvwxyz
    C
    abcdefghijklmnopqrstuvwxyz
    Swedish_Sweden.1252
    abcdefghijklmnopqrstuvwxyzâܣ׬Á║▀ÓßÔÒõÕµþÞÚÛÙýݯ´±‗¾¶§÷°¨·¹³²■

Result on Linux with Python 2.5.2:

    abcdefghijklmnopqrstuvwxyz
    C
    abcdefghijklmnopqrstuvwxyz
    sv_SE.UTF-8
    abcdefghijklmnopqrstuvwxyz

--
components: Extension Modules
messages: 90733
nosy: PeterL
severity: normal
status: open
title: Problem with string.lowercase in Windows XP
type: crash
versions: Python 2.5
___
Python tracker <http://bugs.python.org/issue6525>
___
[issue6525] Problem with string.lowercase in Windows XP
Peter Landgren added the comment:

True, but later in the application, code like this:

    a = u"qaz" + string.lowercase[26]

causes:

    a = u"qaz" + string.lowercase[26]
    UnicodeDecodeError: 'ascii' codec can't decode byte 0x83 in position 0: ordinal not in range(128)

0x83 corresponds to â.
[issue6525] Problem with string.lowercase in Windows XP
Peter Landgren added the comment:

OK about 2.5. I downloaded and installed Python 2.6.2 on my Win XP box and get the same error as with Python 2.5.1.

OK about Python 3; it will be nice when we have upgraded our application, Gramps, to that version and got rid of all kinds of coding issues.
[issue6525] Problem with string.lowercase in Windows XP
Peter Landgren added the comment:

Just some more tests. I compared the results of string.letters, string.uppercase and string.lowercase in 2.5 and 2.6:

Python25:

    Letters= ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyzâèîÄܣ׃¬Á║└┴┬├─┼ãà ╚╔╩╦╠═╬¤ðÐÊËÈıÍÏ┘┌█▄¦Ì▀ÓßÔÒõÕµþÞÚÛÙýݯ´±‗¾¶§÷°¨·¹³²■
    Upper= ABCDEFGHIJKLMNOPQRSTUVWXYZèîă└┴┬├─┼ãÃ╚╔╩╦╠═╬¤ðÐÊËÈıÍÏ┘┌█▄¦Ì
    Lower= abcdefghijklmnopqrstuvwxyzâܣ׬Á║▀ÓßÔÒõÕµþÞÚÛÙýݯ´±‗¾¶§÷°¨·¹³²■

Python26:

    Letters= ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyzƒSOZsozYªµºÀÁÂÃÄÅÆÇ ÈÉÊËÌÍÎÏÐÑÒÓÔÕÖØÙÚÛÜÝÞßàáâãäåæçèéêëìíîïðñòóôõöøùúûüýþÿ
    Upper= ABCDEFGHIJKLMNOPQRSTUVWXYZSOZYÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖØÙÚÛÜÝÞ
    Lower= abcdefghijklmnopqrstuvwxyzƒsozªµºßàáâãäåæçèéêëìíîïðñòóôõöøùúûüýþÿ

They return different contents, but the lengths are the same!
[issue6525] Problem with string.lowercase in Windows XP
Peter Landgren added the comment:

OK, agreed for 2.6. But for 2.5, many of the characters returned by string.lowercase:

    âܣ׬Á║▀ÓßÔÒõÕµþÞÚÛÙýݯ´±‗¾¶§÷°¨·¹³²■

are not lowercase letters at all, but that is history now, as 2.5 is history. We solved it by using ascii_lowercase.

Thanks,
Peter Landgren

> Georg Brandl added the comment:
>
> This behavior is not a bug - when setting the locale, string.lowercase
> and friends are augmented by whatever the locale considers uppercase and
> lowercase letters, as byte strings. This will lead to decoding errors
> when these strings are combined with Unicode strings.
>
> Either you use string.ascii_lowercase and friends, or you make sure you
> know what encoding the strings will be in, and decode accordingly.
>
> --
> nosy: +georg.brandl
> resolution: -> wont fix
> status: open -> closed
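Both options from the quoted reply can be illustrated briefly. The sketch below is written in Python 3 syntax, where `string.lowercase` no longer exists, so it decodes the raw byte from the traceback instead; `decode_locale_bytes` is a hypothetical helper. The cp1252 encoding comes from the reported locale (Swedish_Sweden.1252); treating the console as cp850 is an assumption that would explain why the same byte was displayed as "â".

```python
import string


def decode_locale_bytes(raw, encoding):
    """Decode a locale-dependent byte string with an explicit encoding."""
    return raw.decode(encoding)


if __name__ == "__main__":
    # Option 1: string.ascii_lowercase never changes with the locale.
    print(string.ascii_lowercase)

    # Option 2: know the encoding and decode explicitly.
    # The byte 0x83 from the traceback, decoded two ways:
    print(decode_locale_bytes(b"\x83", "cp1252"))  # Windows ANSI code page
    print(decode_locale_bytes(b"\x83", "cp850"))   # assumed console code page
```

Decoding with the ascii codec, which is what the implicit str-to-unicode conversion did in Python 2, fails for any byte above 0x7F.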
[issue6525] Problem with string.lowercase in Windows XP
Peter Landgren added the comment:

Obviously, 2.5 and 2.6 decode the "string.lowercase" when print is used, and 2.6 seems to be the correct one.

Yes. I get exactly the same result in both Python 2.5.2 (r252:60911, Jan 8 2009, 12:17:37) and Python 2.6.2 (r262:71600, Jul 23 2009, 09:01:02), showing that string.lowercase does NOT change with locale.

    'sv_SE.UTF-8'
    >>> a = string.lowercase
    >>> len(a)
    26
    >>> a
    'abcdefghijklmnopqrstuvwxyz'
    >>> print a
    abcdefghijklmnopqrstuvwxyz
    >>> string.ascii_lowercase == string.lowercase
    True
    >>>
[issue8859] split() splits on non whitespace char when ther is no separator given.
New submission from Peter Landgren:

When the variable label is equal to '\xc5\xa0 Z\nX W', this line sequence

    label = " ".join(label.split())
    label = unicode(label)

results in:

    7347: ERROR: gramps.py: line 138: Unhandled exception
    Traceback (most recent call last):
      File "C:\Program Files (x86)\gramps\gui\views\listview.py", line 660, in row_changed
        self.uistate.modify_statusbar(self.dbstate)
      File "C:\Program Files (x86)\gramps\DisplayState.py", line 521, in modify_statusbar
        name, obj = navigation_label(dbstate.db, nav_type, active_handle)
      File "C:\Program Files (x86)\gramps\Utils.py", line 1358, in navigation_label
        label = unicode(label)
    UnicodeDecodeError: 'utf8' codec can't decode bytes in position 0-1: invalid data

While this line sequence:

    label = unicode(label)
    label = " ".join(label.split())

gives the correct result and no error.

In the failing case, the variable label changes from '\xc5\xa0 Z\nX W' to '\xc5 Z X W' at the line:

    label = " ".join(label.split())

Note that '\xa0' has been dropped, interpreted as "whitespace"?

This happens on Windows. It works perfectly well on Linux.

--
components: Library (Lib)
messages: 106773
nosy: PeterL
priority: normal
severity: normal
status: open
title: split() splits on non whitespace char when ther is no separator given.
type: behavior
versions: Python 2.6
___
Python tracker <http://bugs.python.org/issue8859>
___
[issue8859] split() splits on non whitespace char when ther is no separator given.
Peter Landgren added the comment:

I am not sure I can follow you. I will try to be more specific. The test string originally consists of one character, the Czech Š.

1. On Linux with Python 2.6.4

1.1 If I keep the original code line order:

    label = obj.get()
    print type(label), repr(label)
    label = " ".join(label.split())
    print type(label), repr(label)
    label = unicode(label)
    if len(label) > 40:
        label = label[:40] + "..."

Both lines print type(label), repr(label) give:

    '\xc5\xa0'

1.2 If I change the order and do the unicode conversion first:

    label = obj.get()
    label = unicode(label)
    print type(label), repr(label)
    label = " ".join(label.split())
    print type(label), repr(label)
    if len(label) > 40:
        label = label[:40] + "..."

Both lines print type(label), repr(label) give:

    u'\u0160'

2. On Windows with Python 2.6.5

2.1 The original code line order. The lines print type(label), repr(label) give:

    '\xc5\xa0'
    '\xc5'
    8217: ERROR: gramps.py: line 138: Unhandled exception

2.2 If I change the order and do the unicode conversion first, both lines print type(label), repr(label) give:

    u'\u0160'

3. If I use this little piece of code:

    # -*- coding: utf-8 -*-
    label = 'Š'
    print type(label), repr(label)
    label = " ".join(label.split())
    print type(label), repr(label)

I get

    '\xc5\xa0'
    '\xc5\xa0'

on both Linux and Windows.

The examples above under 1. and 2. come from an application, Gramps.

There is still something I don't understand.
[issue8859] split() splits on non whitespace char when ther is no separator given.
Peter Landgren added the comment:

So, as a summary of what Ezio Melotti said: I should always specify the encoding when calling split() to be sure nothing nasty happens? (I believe Ezio Melotti meant "calling split()", not "calling unicode()", in his last answer?)

Thanks for pointing this out.
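The ordering issue in this thread (decode before splitting) can be sketched as follows. The example uses Python 3 syntax and a hypothetical helper, `normalize_label`; the UTF-8 default is an assumption based on the byte values shown in the report. The observed behaviour appears consistent with a locale-aware byte-string split treating the byte 0xA0 as whitespace under the Windows code page, which cuts the two-byte UTF-8 sequence for Š in half.

```python
def normalize_label(raw, encoding="utf-8"):
    """Decode first, then collapse runs of whitespace to single spaces."""
    text = raw.decode(encoding) if isinstance(raw, bytes) else raw
    return " ".join(text.split())


if __name__ == "__main__":
    # The label from the report: UTF-8 bytes for Š followed by more text.
    print(repr(normalize_label(b"\xc5\xa0 Z\nX W")))

    # What is left after the second byte has been dropped is no longer valid UTF-8.
    try:
        b"\xc5 Z X W".decode("utf-8")
    except UnicodeDecodeError as exc:
        print("cannot decode:", exc.reason)
```

Once the text is decoded, splitting works on characters rather than bytes, so a multi-byte character can no longer be broken apart.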
[issue8924] Error in error message in logging
New submission from Peter Landgren:

This is in Windows.

I got an error message in logging in my application Gramps. However, the logging call itself generates an error, so the original message is never output. The last line indicates a problem with bytes in certain positions. I was able to check it, and it is the Swedish character "å".

The logging call is:

    ...
    except (IOError, OSError), msg:
        msg = unicode(str(msg), sys.getfilesystemencoding())
        LOG.error(_("Could not make database directory: ") + msg)

which results in:

    3165: ERROR: clidbman.py: line 335: Kunde inte skapa databasmappen: [Error 3] Det gÃ¥r inte att hitta sökvägen: u'F:\\'
    Traceback (most recent call last):
      File "C:\Python26\lib\logging\__init__.py", line 769, in emit
        msg = self.format(record)
      File "C:\Python26\lib\logging\__init__.py", line 649, in format
        return fmt.format(record)
      File "C:\Python26\lib\logging\__init__.py", line 448, in format
        s = s + record.exc_text
    UnicodeDecodeError: 'utf8' codec can't decode bytes in position 608-610: invalid data
    4893: ERROR: gramps.py: line 138: Unhandled exception
    Traceback (most recent call last):
      File "C:\Program Files (x86)\gramps\gui\grampsgui.py", line 353, in __startgramps
        "Error details: %s %s" % (repr(e), fn), exc_info=True)
      File "C:\Python26\lib\logging\__init__.py", line 1075, in error
        self._log(ERROR, msg, args, **kwargs)
      File "C:\Python26\lib\logging\__init__.py", line 1166, in _log
        self.handle(record)
      File "C:\Python26\lib\logging\__init__.py", line 1176, in handle
        self.callHandlers(record)
      File "C:\Python26\lib\logging\__init__.py", line 1213, in callHandlers
        hdlr.handle(record)
      File "C:\Python26\lib\logging\__init__.py", line 674, in handle
        self.emit(record)
      File "C:\Program Files (x86)\gramps\GrampsLogger\_GtkHandler.py", line 26, in emit
        ErrorView(error_detail=self,rotate_handler=self._rotate_handler)
      File "C:\Program Files (x86)\gramps\GrampsLogger\_ErrorView.py", line 39, in __init__
        self.draw_window()
      File "C:\Program Files (x86)\gramps\GrampsLogger\_ErrorView.py", line 94, in draw_window
        tb_label.get_buffer().set_text(self._error_detail.get_formatted_log())
      File "C:\Program Files (x86)\gramps\GrampsLogger\_GtkHandler.py", line 29, in get_formatted_log
        return self.format(self._record)
      File "C:\Python26\lib\logging\__init__.py", line 649, in format
        return fmt.format(record)
      File "C:\Python26\lib\logging\__init__.py", line 448, in format
        s = s + record.exc_text
    UnicodeDecodeError: 'utf8' codec can't decode bytes in position 608-610: invalid data

If I change line 448 in "C:\Python26\lib\logging\__init__.py" to:

    s = s + unicode(str(record.exc_text), sys.getfilesystemencoding())

I get the correct message:

    4523: ERROR: grampsgui.py: line 353: Gramps terminated because of OS Error
    Error details: WindowsError(3, 'Det g\xe5r inte att hitta s\xf6kv\xe4gen') F:\grdbtest\*.*
    Traceback (most recent call last):
      File "C:\Program Files (x86)\gramps\gui\grampsgui.py", line 337, in __startgramps
        Gramps(argparser)
      File "C:\Program Files (x86)\gramps\gui\grampsgui.py", line 268, in __init__
        gui=True)
      File "C:\Program Files (x86)\gramps\cli\arghandler.py", line 81, in __init__
        self.dbman = CLIDbManager(self.dbstate)
      File "C:\Program Files (x86)\gramps\cli\clidbman.py", line 100, in __init__
        self._populate_cli()
      File "C:\Program Files (x86)\gramps\cli\clidbman.py", line 175, in _populate_cli
        for dpath in os.listdir(dbdir):
    WindowsError: [Error 3] Det gÃ¥r inte att hitta sökvägen: u'F:\\grdbtest\\*.*'

(There is a secondary problem with rendering some characters. All strings were generated on a Windows system, but I am reporting from a Linux system.)

--
components: Library (Lib)
messages: 107208
nosy: PeterL
priority: normal
severity: normal
status: open
title: Error in error message in logging
type: behavior
versions: Python 2.6
___
Python tracker <http://bugs.python.org/issue8924>
___
[issue8924] Error in error message in logging
Peter Landgren added the comment:

Answer to your first question:
- The variable s is of type 'unicode'
- The variable record.exc_text, which is what Formatter.formatException returns, is of type 'str'

For your second question: I'm not a Python expert, so I can't follow you there. I don't know what to do to test this.
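The type mismatch described here (a unicode string concatenated with a byte string) is avoided when every piece is converted to text with a known encoding before it reaches the logger. The following is only a sketch in Python 3 syntax, not a patch for the logging module; `to_text` and `log_os_error` are hypothetical helpers, and cp1252 is an assumed encoding for the Swedish Windows error text whose bytes appear in the report.

```python
import io
import logging


def to_text(value, encoding="utf-8"):
    """Return value as text, decoding byte strings with a known encoding."""
    if isinstance(value, bytes):
        # errors="replace" keeps logging alive even if the guess is wrong.
        return value.decode(encoding, errors="replace")
    return str(value)


def log_os_error(logger, prefix, raw_message, encoding):
    """Log prefix + message after converting both parts to text."""
    logger.error("%s%s", to_text(prefix), to_text(raw_message, encoding))


if __name__ == "__main__":
    stream = io.StringIO()
    demo_logger = logging.getLogger("demo")
    demo_logger.addHandler(logging.StreamHandler(stream))
    log_os_error(
        demo_logger,
        "Could not make database directory: ",
        b"Det g\xe5r inte att hitta s\xf6kv\xe4gen",  # bytes as shown in the report
        "cp1252",  # assumption: Windows ANSI code page for Swedish
    )
    print(stream.getvalue())
```

Decoding at the boundary, where the byte string enters the program, keeps the formatter from having to guess an encoding later.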