[issue8504] bsddb databases in python 2.6 are not compatible with python 2.5 and are slow in python 2.6

2010-04-23 Thread Peter Landgren

Peter Landgren  added the comment:

I could add what I have found using bsddb in Python 2.5 and 2.6 under Windows 
XP SP3. In my installation:
Python 2.5.4 bsddb 4.4.5.3
Python 2.6.4 bsddb 4.7.3
What I did: in Gramps, I imported an XML backup file into an empty bsddb database. It 
took about 1 hour with 2.6.4 and 2 minutes with 2.5.4!
I have also installed bsddb3:
Python 2.6.4 bsddb3 4.8.4
and with the same import I'm back to 2 minutes.
I have pstat logs which I could provide.

--
nosy: +PeterL

___
Python tracker 
<http://bugs.python.org/issue8504>
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue8504] bsddb databases in python 2.6 are not compatible with python 2.5 and are slow in python 2.6

2010-04-23 Thread Peter Landgren

Peter Landgren  added the comment:

Requested data on my Windows box:
Python 2.5  bsddb 4.4.5.3  4.4.20
Python 2.6  bsddb 4.7.3    4.7.25
Python 2.6  bsddb 4.8.4    4.8.26

OK?

--

___
Python tracker 
<http://bugs.python.org/issue8504>
___



[issue8504] bsddb databases in python 2.6 are not compatible with python 2.5 and are slow in python 2.6

2010-04-23 Thread Peter Landgren

Peter Landgren  added the comment:

Maybe I should add that there is no speed degradation between 2.5 and 2.6 when 
doing the same thing in Linux.

--

___
Python tracker 
<http://bugs.python.org/issue8504>
___



[issue8504] bsddb databases in python 2.6 are not compatible with python 2.5 and are slow in python 2.6

2010-04-23 Thread Peter Landgren

Peter Landgren  added the comment:

In Linux it is:
4.4.5.3 (4, 6, 21)

You asked for a test case. I'm not sure how I can provide one without you 
having Gramps installed to test it.
Do you mean the whole database environment?

--

___
Python tracker 
<http://bugs.python.org/issue8504>
___



[issue8504] bsddb databases in python 2.6 are not compatible with python 2.5 and are slow in python 2.6

2010-04-23 Thread Peter Landgren

Peter Landgren  added the comment:

To make it 100% clear:

The versions are almost the same for Linux and Windows.
         Python 2.5            Python 2.6
Windows  4.4.5.3 (4, 6, 20)    4.7.3 (4.7.25)
Linux    4.4.5.3 (4, 6, 21)    4.7.3 (4.7.25)

--

___
Python tracker 
<http://bugs.python.org/issue8504>
___



[issue8516] Speed difference between Python 2.5 and 2.6 during filling bsddb database.

2010-04-24 Thread Peter Landgren

New submission from Peter Landgren :

The time it takes, in the application Gramps, to fill an empty bsddb database 
by importing an XML backup or a GEDCOM file increases from about 2 minutes to 
about an hour in Windows XP and Windows 7 when going from Python 2.5 to 
Python 2.6. No such degradation has been seen in Linux.

The Gramps code was the same in all test cases.
The running conditions were:
 
         Python 2.5            Python 2.6
Windows  4.4.5.3 (4, 6, 20)    4.7.3 (4.7.25)
Linux    4.4.5.3 (4, 6, 21)    4.7.3 (4.7.25)

Note one little version difference between Windows and Linux.

If I install bsddb3 and change the Gramps code to use it, no noticeable speed 
degradation can be seen.
This was tested on Windows only, with Python 2.6 and bsddb3 4.8.4 (4.8.26).

I have run profiling and attach the results.
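
For readers who want to collect comparable statistics, here is a minimal sketch of how such a profile can be gathered with the standard cProfile and pstats modules. It is written for Python 3, and `fill_database` is a hypothetical stand-in that fills a plain dict; it is not the Gramps import and does not touch bsddb.

```python
import cProfile
import io
import pstats


def fill_database(n):
    # Hypothetical stand-in for the import step: fills a dict, not a bsddb database.
    db = {}
    for i in range(n):
        db["key%d" % i] = "value%d" % i
    return db


def profile_call(func, *args):
    # Run func under the profiler and return its result plus a text report.
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args)
    buf = io.StringIO()
    stats = pstats.Stats(profiler, stream=buf)
    # Sort by cumulative time and keep the ten most expensive entries.
    stats.sort_stats("cumulative").print_stats(10)
    return result, buf.getvalue()


result, report = profile_call(fill_database, 1000)
print(report)
```

Running the same call under both interpreter versions and comparing the two reports would show which functions account for the extra time.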

(Sorry for the fuss I made in issue 8504.)

The only way of providing a test case, as far as I can find, is to install 
Gramps, create a new Family Tree (empty database) and import a test XML 
backup. There are two test cases (*.gramps) available in:
http://gramps.svn.sourceforge.net/viewvc/gramps/branches/maintenance/gramps32/example/gramps/

Gramps can be found at: 
http://www.gramps-project.org/wiki/index.php?title=Installation

--
components: Library (Lib)
files: statistics_for_python_25_26_run.txt.tar.gz
messages: 104067
nosy: PeterL
severity: normal
status: open
title: Speed difference between Python 2.5 and 2.6 during filling bsddb 
database.
type: performance
versions: Python 2.6
Added file: 
http://bugs.python.org/file17063/statistics_for_python_25_26_run.txt.tar.gz

___
Python tracker 
<http://bugs.python.org/issue8516>
___



[issue8516] Speed difference between Python 2.5 and 2.6 during filling bsddb database.

2010-04-24 Thread Peter Landgren

Peter Landgren  added the comment:

1. Sorry, I made a mistake this morning. (Had to run to a funeral.)
These are the correct versions:
         Python 2.5            Python 2.6
Windows  4.4.5.3 (4, 4, 20)    4.7.3 (4.7.25)
Linux    4.4.5.3 (4, 6, 21)    4.7.3 (4.7.25)

So, the same versions of bsddb and DB in Python 2.6 give slow 
performance on Windows but not on Linux. This means that the Windows and 
Linux environments are equal as far as I can see.
 
2. I installed bsddb3 5.0.0 without any problem, but I had to move libdb48.dll 
from c:\Python26\bsddb3\utils\
to c:\Python26\Lib\site-packages\bsddb3\
otherwise it could not be found. Any explanation for this?

3. Could not run Gramps in Windows with Py 2.7 as Gramps needs pygtk, pycairo 
and pygobject to run.

It seems to be a strange issue. It can be worked around by using bsddb3 
instead in Gramps for those who need it. It is only a problem when you import a 
backup or a GEDCOM and when you rebuild reference maps, which you don't do very 
often. It's not an issue with normal usage of Gramps.

So, maybe let it wait until 2.7 is out?

--

___
Python tracker 
<http://bugs.python.org/issue8516>
___



[issue5200] unicode.normalize gives wrong result for some characters

2009-02-10 Thread Peter Landgren

New submission from Peter Landgren :

If any of the Swedish characters "åäöÅÄÖ" are input to
unicode.normalize(form, ustr) with form = "NFD" or "NFKD" the result
will be "aaoAAO". "åäöÅÄÖ" are normal characters and should be the same
after normalize. They are not connected to aaoAAO other than for
historic reasons, but not in modern languages. It's a common
misinterpretation that the dots and circle above them are diacritic
signs, but those letters should behave as the (Danish)
"Ø" which is normalized correctly.

From Wikipedia:
Å is often perceived as an A with a ring, interpreting the ring as a
diacritical mark. However, in the languages that use it, the ring is not
considered a diacritic but part of the letter.
The letter Ö in the Swedish and Icelandic alphabets historically arises
from the Germanic umlaut, but it is considered a separate letter from O.
See http://en.wikipedia.org/wiki/%C3%85

I think this is probably impossible to solve as it will be mixed up with
"umlaut" and you don't know what language the specific word is connected to.
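
For reference, here is a small sketch of what NFD returns for these letters. It uses Python 3's `unicodedata` module (the report concerns Python 2.5, so the string literals differ) and writes the letters as escapes so the example does not depend on the file's encoding.

```python
import unicodedata

# "åäöÅÄÖ" written as escapes so the source encoding does not matter.
swedish = "\u00e5\u00e4\u00f6\u00c5\u00c4\u00d6"
decomposed = unicodedata.normalize("NFD", swedish)

# Each letter becomes a base letter followed by a combining mark,
# so the decomposed string has twice as many code points.
print(ascii(decomposed))
print(len(swedish), len(decomposed))

# "Ø" (U+00D8) has no canonical decomposition, so NFD leaves it unchanged.
o_slash = "\u00d8"
print(unicodedata.normalize("NFD", o_slash) == o_slash)

# NFC recomposes the pairs and gives back the original string.
print(unicodedata.normalize("NFC", decomposed) == swedish)
```

The base letters print as "aaoAAO" only if the combining marks are dropped or not rendered. The marks themselves are still present in the string.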

--
components: Library (Lib)
messages: 81536
nosy: PeterL
severity: normal
status: open
title: unicode.normalize gives wrong result for some characters
type: behavior
versions: Python 2.5

___
Python tracker 
<http://bugs.python.org/issue5200>
___



[issue5200] unicode.normalize gives wrong result for some characters

2009-02-10 Thread Peter Landgren

Peter Landgren  added the comment:

Thanks for the fast response.

I understand that Python follows the Unicode specification. I think the Unicode 
standard is not correct in this case for the Swedish letters. I have asked 
unicode.org for an explanation. 

Should not the Danish letter "Ø" be normalized as "O"? I get "Ø" for all 
NFC/NFD/NFKC/NFKD normalizations.

Regards,
Peter Landgren

> Martin v. Löwis <mar...@v.loewis.de> added the comment:
>
> It is not true that normalize produces "aaoAAO". Instead, it produces
>
> u'a\u030aa\u0308o\u0308A\u030aA\u0308O\u0308'
>
> This is the correct result, according to the Unicode specification. It
> would be incorrect to normalize them unchanged under the Unicode Normal
> Form D (for decomposed); the decomposed character for 'LATIN SMALL
> LETTER A WITH RING ABOVE' (for example) is 'LATIN SMALL LETTER A' +
> 'COMBINING RING ABOVE'.
>
> The wikipedia article is irrelevant; refer to the Unicode specification
> for a normative reference.
>
> Closing as invalid.
>
> --
> nosy: +loewis
> resolution:  -> invalid
> status: open -> closed
>
> ___
> Python tracker <rep...@bugs.python.org>
> <http://bugs.python.org/issue5200>
> ___

-- 
Peter Landgren
Talken Hagen
671 94  BRUNSKOG
0570-530 21
070-635 4719
peter.tal...@telia.com
Skype: pgl4820.2

Added file: http://bugs.python.org/file13018/unnamed

___
Python tracker 
<http://bugs.python.org/issue5200>
___



[issue5200] unicode.normalize gives wrong result for some characters

2009-02-10 Thread Peter Landgren

Peter Landgren  added the comment:

> I think you have a fundamental misunderstanding what a "decomposition"
> is. "Ø" should *not* be decomposed as "O", because clearly, "Ø" and "O"
> are different letters. If anything, it would be decomposed as
> "O" + PLUS SOME COMBINING MARK

The same applies to "Å" and "A", "Ä" and "A", and "Ö" and "O",
which are also different letters, just as "Ø" and "O" are. ("Ø" is the Danish 
version of "Ö".)
Maybe not in the Unicode world, but in real life.

That's why I'm a little confused.
I will wait and see what, if anything, the Unicode people say.
In any case, thanks for the discussion.

Regards,
/Peter

Added file: http://bugs.python.org/file13019/unnamed

___
Python tracker 
<http://bugs.python.org/issue5200>
___




[issue5200] unicode.normalize gives wrong result for some characters

2009-02-11 Thread Peter Landgren

Peter Landgren  added the comment:

> Martin v. Löwis  added the comment:
> > The same applies  "Å" and "A", "Ä" and "A" and "Ö" and "O"
> > which also are also different letters as "Ø" and "O" are.
>
> Sure. And rightfully, they "Å" is *not* (I repeat: not)
> normalized as "A", under NFD:
>
> py> unicodedata.normalize("NFD", u"Å")
> u'A\u030a'
>
> > Maybe not in the unicode world but in treal life.
>
> They are different letters also in the Unicode world.
>
> > That's why I'm a little confused.
>
> I think the confusion comes from your assumption that
> normalizing "Å" produces "A". It does not. Really not.

Yes, you are right.

However, the confusion/problem shows up when it is used in the application to
build an alphabet and group, for example, all versions of E, É, È, Ë, Ê
together under E. The first character in the result of normalize is
used to build alphabet labels for surnames:

letter = normalize("NFD", surname)[0].upper()
if letter != last_letter:
    last_letter = letter

and this is why I get "A" when the surname begins with "Å".

This way it works for all variations of E to be grouped under "E",
but fails as "Å" is shown under the label "A", not the "A" in the
beginning of the alphabet but after "Z", where "ÅÄÖ" comes.
So a previous sorting of the surnames works correctly.
(The Swedish alphabet has 29 letters: A,B,C... X,Y,Z,Å,Ä,Ö)
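
The labelling behaviour described above, together with one possible way of keeping Å, Ä and Ö as labels of their own, could be sketched as follows. This is a Python 3 illustration, not the Gramps code: `naive_label`, `alphabet_label` and `OWN_LABEL` are hypothetical names, and the set of exempt letters assumes the Swedish convention.

```python
import unicodedata

# Letters that keep their own label (assumption: Swedish convention Å, Ä, Ö).
OWN_LABEL = "\u00c5\u00c4\u00d6"


def naive_label(surname):
    # First code point of the decomposed form: the base letter without its mark.
    return unicodedata.normalize("NFD", surname)[0].upper()


def alphabet_label(surname):
    # Compose first, so a letter and its combining mark count as one character.
    first = unicodedata.normalize("NFC", surname)[0].upper()
    if first in OWN_LABEL:
        return first
    # Any other accented letter is grouped under its base letter.
    return unicodedata.normalize("NFD", first)[0]


print(naive_label("\u00c5berg"))      # Åberg is filed under A
print(alphabet_label("\u00c5berg"))   # Åberg keeps the label Å
print(alphabet_label("\u00c9ric"))    # Éric is still grouped under E
```

Both functions assume a non-empty surname. A locale-aware collation library would be the more general approach, since the right grouping depends on the language.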

Can you think of any solution to this conflict? 

I still think "Å" or "Ä" or "Ö" should behave as "Ø":
>>> unicodedata.normalize("NFD",u"Ø")
u'\xd8'

Now, as you said:
>>> unicodedata.normalize("NFD",u"Å")
u'A\u030a'

But it should be (in my opinion):
>>> unicodedata.normalize("NFD",u"Å")
u'\xc5'

This is obviously the result of how the Unicode spec is written,
interpreting "Å" as a variation of "A", which it is not.

I have asked the Unicode people, but have not got any answer yet.

The application is GRAMPS: http://gramps-project.org/

Once again, thanks for making some of the Unicode stuff clear!
Regards,
Peter Landgren

Added file: http://bugs.python.org/file13025/unnamed

___
Python tracker 
<http://bugs.python.org/issue5200>
___



[issue5200] unicode.normalize gives wrong result for some characters

2009-02-11 Thread Peter Landgren

Peter Landgren  added the comment:

> I don't quite understand why you want to place É, È, Ë, Ê all along
> with E, yet Å,Ä,Ö after Z. Because that's what the Swedish alphabet
> says?

The È... comes from French surnames, and our French developer wants to group all 
versions of E together. The É... can be found in French surnames in Sweden as 
well as in Germany.
The program, GRAMPS, is a genealogy program used in about 20 languages, so there 
is no preferred language.

> Please understand that collation varies across languages. For example
> in German, we also have Ä, but it does *not* come after Z. Instead,
> there are two ways to collate Ä (telephone book vs. dictionary):
> 1. Ä sorts exactly like A
> 2. Ä sorts as if it was transcribed as Ae

I know. However, Swedish telephone books and dictionaries are sorted the same way:
A,B,C... X,Y,Z,Å,Ä,Ö.

> So there is no one true collation of Ä, but you have to take into
> account what language rules you want to follow.

True. I agree. 
GRAMPS runs in the locale of the user, but must be able to handle information 
coming from many other languages/countries. That's why it's hard to be universal.

> If you want to implement Swedish rules, why then do you also want
> to support É, È, Ë, Ê? Do you have these letters in Swedish at all?

We can have them in names. See above.

> If you want to use obscure collation rules, you might have to
> implement the collation algorithm yourself. For example, assign
> each letter a unique number (different from the Unicode ordinal),
> and then sort by these numbers.
>
> Take a look at ICU, which already includes collation algorithms
> for many locales.

I think we have found a solution that can handle most cases.
We treat surnames beginning with "ÅÄÖ" specially. I don't think that there are 
many surnames outside the Nordic countries that start with any of these three 
letters.

Vielen Dank!

/Peter

Added file: http://bugs.python.org/file13034/unnamed

___
Python tracker 
<http://bugs.python.org/issue5200>
___



[issue6525] Problem with string.lowercase in Windows XP

2009-07-20 Thread Peter Landgren

New submission from Peter Landgren :

string.lowercase is changed after locale.setlocale(locale.LC_ALL,'') in
Windows XP but not in Linux.
This little test script on Windows XP and Linux explains the problem:

import locale
import string
print string.lowercase
print locale.setlocale(locale.LC_ALL,'C')
print string.lowercase
print locale.setlocale(locale.LC_ALL,'')
print string.lowercase

Result on Win XP with Python 2.5.1:
abcdefghijklmnopqrstuvwxyz
C
abcdefghijklmnopqrstuvwxyz
Swedish_Sweden.1252
abcdefghijklmnopqrstuvwxyzâܣ׬Á║▀ÓßÔÒõÕµþÞÚÛÙýݯ´­±‗¾¶§÷°¨·¹³²■

Result on Linux with Python 2.5.2:
abcdefghijklmnopqrstuvwxyz
C
abcdefghijklmnopqrstuvwxyz
sv_SE.UTF-8
abcdefghijklmnopqrstuvwxyz

--
components: Extension Modules
messages: 90733
nosy: PeterL
severity: normal
status: open
title: Problem with string.lowercase in Windows XP
type: crash
versions: Python 2.5

___
Python tracker 
<http://bugs.python.org/issue6525>
___



[issue6525] Problem with string.lowercase in Windows XP

2009-07-20 Thread Peter Landgren

Peter Landgren  added the comment:

True, but later in the application code like this
a = u"qaz" + string.lowercase[26]

causes
   a = u"qaz" + string.lowercase[26]
UnicodeDecodeError: 'ascii' codec can't decode byte 0x83 in position 0:
ordinal not in range(128)

0x83 corresponds to â.
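
A short sketch of why this byte trips the implicit ASCII decode, and why it can show up as different characters. It is written for Python 3, where bytes must be decoded explicitly. The two code pages are assumptions: cp1252 is taken from the locale name "Swedish_Sweden.1252" in the report, and cp850 is assumed as the console code page that rendered the byte as "â".

```python
raw = b"\x83"  # the byte named in the UnicodeDecodeError

# The same byte maps to different characters in different code pages.
as_cp1252 = raw.decode("cp1252")  # U+0192, the florin sign
as_cp850 = raw.decode("cp850")    # U+00E2, a with circumflex

# ASCII only covers 0x00-0x7F, so decoding 0x83 as ASCII fails,
# which is what the implicit str-to-unicode conversion attempted.
try:
    raw.decode("ascii")
    ascii_ok = True
except UnicodeDecodeError:
    ascii_ok = False

# Decoding with a known encoding before concatenating avoids the error.
label = "qaz" + raw.decode("cp1252")

print(ascii(as_cp1252), ascii(as_cp850), ascii_ok, ascii(label))
```

Which encoding is correct depends on where the bytes came from, so it has to be known rather than guessed.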

--

___
Python tracker 
<http://bugs.python.org/issue6525>
___



[issue6525] Problem with string.lowercase in Windows XP

2009-07-20 Thread Peter Landgren

Peter Landgren  added the comment:

OK about 2.5
Downloaded and installed Python 2.6.2 on my Win XP box and get the same
error as with Python 2.5.1.

OK about Python 3; it will be nice when we have upgraded our
application, Gramps, to that version and got rid of all kinds of encoding
issues.

--

___
Python tracker 
<http://bugs.python.org/issue6525>
___



[issue6525] Problem with string.lowercase in Windows XP

2009-07-20 Thread Peter Landgren

Peter Landgren  added the comment:

Just some more tests. I compared the results of string.letters,
string.uppercase and string.lowercase in 2.5 and 2.6:

Python25:
Letters=
ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyzâèîÄܣ׃¬Á║└┴┬├─┼ãÃ
╚╔╩╦╠═╬¤ðÐÊËÈıÍÏ┘┌█▄¦Ì▀ÓßÔÒõÕµþÞÚÛÙýݯ´­±‗¾¶§÷°¨·¹³²■ 
Upper= ABCDEFGHIJKLMNOPQRSTUVWXYZèîă└┴┬├─┼ãÃ╚╔╩╦╠═╬¤ðÐÊËÈıÍÏ┘┌█▄¦Ì
Lower= abcdefghijklmnopqrstuvwxyzâܣ׬Á║▀ÓßÔÒõÕµþÞÚÛÙýݯ´­±‗¾¶§÷°¨·¹³²■ 

Python26:
Letters=
ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyzƒSOZsozYªµºÀÁÂÃÄÅÆÇ
ÈÉÊËÌÍÎÏÐÑÒÓÔÕÖØÙÚÛÜÝÞßàáâãäåæçèéêëìíîïðñòóôõöøùúûüýþÿ
Upper= ABCDEFGHIJKLMNOPQRSTUVWXYZSOZYÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖØÙÚÛÜÝÞ
Lower= abcdefghijklmnopqrstuvwxyzƒsozªµºßàáâãäåæçèéêëìíîïðñòóôõöøùúûüýþÿ

They return different contents, but the lengths are the same!

--

___
Python tracker 
<http://bugs.python.org/issue6525>
___



[issue6525] Problem with string.lowercase in Windows XP

2009-07-22 Thread Peter Landgren

Peter Landgren  added the comment:

OK,
Agreed for 2.6.

But for 2.5 many of the characters returned by string.lowercase:
âܣ׬Á║▀ÓßÔÒõÕµþÞÚÛÙýݯ´­±‗¾¶§÷°¨·¹³²■ 
are not lowercase letters at all, but that is history now, as 2.5 is history.
We solved it by using ascii_lowercase.
Thanks,
Peter Landgren

> Georg Brandl  added the comment:
>
> This behavior is not a bug - when setting the locale, string.lowercase
> and friends are augmented by whatever the locale considers uppercase and
> lowercase letters, as byte strings.  This will lead to decoding errors
> when these strings are combined with Unicode strings.
>
> Either you use string.ascii_lowercase and friends, or you make sure you
> know what encoding the strings will be in, and decode accordingly.
>
> --
> nosy: +georg.brandl
> resolution:  -> wont fix
> status: open -> closed
>
> ___
> Python tracker 
> <http://bugs.python.org/issue6525>
> ___
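
The first suggestion in the quoted comment can be illustrated with a minimal sketch. It is written for Python 3, where the locale-dependent `string.lowercase` no longer exists, so it only shows that `string.ascii_lowercase` is unaffected by the locale setting.

```python
import locale
import string

before = string.ascii_lowercase

# Switch to the user's default locale; skip silently if none is configured.
try:
    locale.setlocale(locale.LC_ALL, "")
except locale.Error:
    pass

after = string.ascii_lowercase

# ascii_lowercase is a fixed constant, so the locale change has no effect.
print(before == after, after)
```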

--

___
Python tracker 
<http://bugs.python.org/issue6525>
___



[issue6525] Problem with string.lowercase in Windows XP

2009-07-23 Thread Peter Landgren

Peter Landgren  added the comment:

Obviously, 2.5 and 2.6 decode "string.lowercase" differently when print is used, and 
2.6 seems to be the correct one.

Yes. I get exactly the same result in both
Python 2.5.2 (r252:60911, Jan  8 2009, 12:17:37)
and
Python 2.6.2 (r262:71600, Jul 23 2009, 09:01:02)
showing that string.lowercase does NOT change with locale.

'sv_SE.UTF-8'
>>> a = string.lowercase
>>> len(a)
26
>>> a
'abcdefghijklmnopqrstuvwxyz'
>>> print a
abcdefghijklmnopqrstuvwxyz
>>> string.ascii_lowercase == string.lowercase
True
>>>

--

___
Python tracker 
<http://bugs.python.org/issue6525>
___



[issue8859] split() splits on non whitespace char when ther is no separator given.

2010-05-30 Thread Peter Landgren

New submission from Peter Landgren :

When the variable label is equal to '\xc5\xa0 Z\nX W'
this line sequence
label = " ".join(label.split())
label = unicode(label)
results in:
7347: ERROR: gramps.py: line 138: Unhandled exception
Traceback (most recent call last):
  File "C:\Program Files (x86)\gramps\gui\views\listview.py", line 660, in row_changed
    self.uistate.modify_statusbar(self.dbstate)
  File "C:\Program Files (x86)\gramps\DisplayState.py", line 521, in modify_statusbar
    name, obj = navigation_label(dbstate.db, nav_type, active_handle)
  File "C:\Program Files (x86)\gramps\Utils.py", line 1358, in navigation_label
    label = unicode(label)
UnicodeDecodeError: 'utf8' codec can't decode bytes in position 0-1: invalid data

While this line sequence:
label = unicode(label)
label = " ".join(label.split())
gives correct result and no error.

With the error the variable label changes from
'\xc5\xa0 Z\nX W'
to
'\xc5 Z X W'
by the line:
label = " ".join(label.split())
Note '\xa0' has been dropped, interpreted as "whitespace"?
This happens on Windows. It works perfectly well on Linux.
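
One plausible reading of the behaviour can be sketched in Python 3. The assumption, which the report itself does not confirm, is that the Windows locale treats byte 0xA0 (the no-break space in cp1252) as whitespace when a byte string is split. The sketch simulates that by decoding the bytes as cp1252 before splitting.

```python
raw = b"\xc5\xa0 Z\nX W"  # UTF-8 bytes: U+0160 followed by " Z\nX W"

# Order 1: decode to text first, then normalise whitespace.
safe_label = " ".join(raw.decode("utf-8").split())

# Order 2 (simulated): split while each byte is read as a cp1252 character.
# 0xA0 is then a no-break space, which split() treats as whitespace.
mangled = " ".join(raw.decode("cp1252").split()).encode("cp1252")

# The second byte of the two-byte UTF-8 sequence is gone, so decoding fails.
try:
    mangled.decode("utf-8")
    decodable = True
except UnicodeDecodeError:
    decodable = False

print(ascii(safe_label), mangled, decodable)
```

Under that assumption, decoding before splitting is the safe order, because the split then works on characters rather than on individual bytes of a multi-byte sequence.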

--
components: Library (Lib)
messages: 106773
nosy: PeterL
priority: normal
severity: normal
status: open
title: split() splits on non whitespace char when ther is no separator given.
type: behavior
versions: Python 2.6

___
Python tracker 
<http://bugs.python.org/issue8859>
___



[issue8859] split() splits on non whitespace char when ther is no separator given.

2010-05-30 Thread Peter Landgren

Peter Landgren  added the comment:

I am not sure I can follow you. I will try to be more specific.

The test string consists originally of one character; the Czech Š.

1. On Linux with Python 2.6.4
1.1 If I keep the original code line order:
label = obj.get()
print type(label), repr(label)
label = " ".join(label.split())
print type(label), repr(label)
label = unicode(label)
if len(label) > 40:
    label = label[:40] + "..."

Both "print type(label), repr(label)" lines give:
<type 'str'> '\xc5\xa0'

1.2 If I change order and take the unicode conversion first:
label = obj.get()
label = unicode(label)
print type(label), repr(label)
label = " ".join(label.split())
print type(label), repr(label)
if len(label) > 40:
    label = label[:40] + "..."

Both "print type(label), repr(label)" lines give:
<type 'unicode'> u'\u0160'

2. On Windows with Python 2.6.5
2.1 The original code line order:
The "print type(label), repr(label)" lines give:
<type 'str'> '\xc5\xa0'
<type 'str'> '\xc5'
8217: ERROR: gramps.py: line 138: Unhandled exception
 

2.2 If I change order and take the unicode conversion first:
Both "print type(label), repr(label)" lines give:
<type 'unicode'> u'\u0160'

3.
If I use this little code:
# -*- coding: utf-8 -*-
label = 'Š'
print type(label), repr(label)
label = " ".join(label.split())
print type(label), repr(label)
I get
<type 'str'> '\xc5\xa0'
<type 'str'> '\xc5\xa0'
on both Linux and Windows.

The examples above under 1. and 2. come from an application, Gramps.

There is still something I don't understand.

--

___
Python tracker 
<http://bugs.python.org/issue8859>
___



[issue8859] split() splits on non whitespace char when ther is no separator given.

2010-05-31 Thread Peter Landgren

Peter Landgren  added the comment:

So, as a summary of what Ezio Melotti said:
I should always specify the encoding when calling split() to be sure nothing nasty 
happens? (I believe Ezio Melotti meant "calling split()", not "calling unicode()", 
in his last answer?)

Thanks for pointing this out.

--

___
Python tracker 
<http://bugs.python.org/issue8859>
___



[issue8924] Error in error message in logging

2010-06-06 Thread Peter Landgren

New submission from Peter Landgren :

This is in Windows.
I got an error message in logging in my application Gramps.
However, an error is raised by the logging itself, so the 
original message is never output.
The last line indicates a problem with bytes in certain positions. I was able to 
check it, and it is the Swedish character "å".

The logging call is:
...
except (IOError, OSError), msg:
    msg = unicode(str(msg), sys.getfilesystemencoding())
    LOG.error(_("Could not make database directory: ") + msg)
which results in:
3165: ERROR: clidbman.py: line 335: Kunde inte skapa databasmappen: [Error 3] Det går inte att hitta sökvägen: u'F:\\'

Traceback (most recent call last):
  File "C:\Python26\lib\logging\__init__.py", line 769, in emit
    msg = self.format(record)
  File "C:\Python26\lib\logging\__init__.py", line 649, in format
    return fmt.format(record)
  File "C:\Python26\lib\logging\__init__.py", line 448, in format
    s = s + record.exc_text
UnicodeDecodeError: 'utf8' codec can't decode bytes in position 608-610: invalid data
4893: ERROR: gramps.py: line 138: Unhandled exception
Traceback (most recent call last):
  File "C:\Program Files (x86)\gramps\gui\grampsgui.py", line 353, in __startgramps
    "Error details: %s %s" % (repr(e), fn), exc_info=True)
  File "C:\Python26\lib\logging\__init__.py", line 1075, in error
    self._log(ERROR, msg, args, **kwargs)
  File "C:\Python26\lib\logging\__init__.py", line 1166, in _log
    self.handle(record)
  File "C:\Python26\lib\logging\__init__.py", line 1176, in handle
    self.callHandlers(record)
  File "C:\Python26\lib\logging\__init__.py", line 1213, in callHandlers
    hdlr.handle(record)
  File "C:\Python26\lib\logging\__init__.py", line 674, in handle
    self.emit(record)
  File "C:\Program Files (x86)\gramps\GrampsLogger\_GtkHandler.py", line 26, in emit
    ErrorView(error_detail=self,rotate_handler=self._rotate_handler)
  File "C:\Program Files (x86)\gramps\GrampsLogger\_ErrorView.py", line 39, in __init__
    self.draw_window()
  File "C:\Program Files (x86)\gramps\GrampsLogger\_ErrorView.py", line 94, in draw_window
    tb_label.get_buffer().set_text(self._error_detail.get_formatted_log())
  File "C:\Program Files (x86)\gramps\GrampsLogger\_GtkHandler.py", line 29, in get_formatted_log
    return self.format(self._record)
  File "C:\Python26\lib\logging\__init__.py", line 649, in format
    return fmt.format(record)
  File "C:\Python26\lib\logging\__init__.py", line 448, in format
    s = s + record.exc_text
UnicodeDecodeError: 'utf8' codec can't decode bytes in position 608-610: invalid data

If I change line 448 in "C:\Python26\lib\logging\__init__.py" to:
s = s + unicode(str(record.exc_text), sys.getfilesystemencoding())

I get the correct message:
4523: ERROR: grampsgui.py: line 353: Gramps terminated because of OS Error
Error details: WindowsError(3, 'Det g\xe5r inte att hitta s\xf6kv\xe4gen') F:\grdbtest\*.*
Traceback (most recent call last):
  File "C:\Program Files (x86)\gramps\gui\grampsgui.py", line 337, in __startgramps
    Gramps(argparser)
  File "C:\Program Files (x86)\gramps\gui\grampsgui.py", line 268, in __init__
    gui=True)
  File "C:\Program Files (x86)\gramps\cli\arghandler.py", line 81, in __init__
    self.dbman = CLIDbManager(self.dbstate)
  File "C:\Program Files (x86)\gramps\cli\clidbman.py", line 100, in __init__
    self._populate_cli()
  File "C:\Program Files (x86)\gramps\cli\clidbman.py", line 175, in _populate_cli
    for dpath in os.listdir(dbdir):
WindowsError: [Error 3] Det går inte att hitta sökvägen: u'F:\\grdbtest\\*.*'

(There is a secondary problem with rendering some characters. All strings were 
generated on a Windows system,
but I report using a Linux system.)
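
The idea behind the change to line 448 described above, decoding the byte string explicitly before it is joined to text, can be sketched as follows. This is a Python 3 illustration and not the logging module's code: `to_text` is a hypothetical helper, and cp1252 is an assumed encoding standing in for `sys.getfilesystemencoding()` on the reporter's Windows system.

```python
def to_text(value, encoding, errors="replace"):
    # Decode byte strings explicitly; pass text through unchanged.
    if isinstance(value, bytes):
        return value.decode(encoding, errors)
    return value


message = "Could not make database directory: "
# Bytes as they appear in the report's WindowsError repr (cp1252 assumed).
exc_text = b"Det g\xe5r inte att hitta s\xf6kv\xe4gen"

combined = message + to_text(exc_text, "cp1252")
print(ascii(combined))
```

Using `errors="replace"` is a design choice for the sketch: a log formatter that substitutes a placeholder for an undecodable byte still emits the original message, whereas a strict decode would raise inside the handler again.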

--
components: Library (Lib)
messages: 107208
nosy: PeterL
priority: normal
severity: normal
status: open
title: Error in error message in logging
type: behavior
versions: Python 2.6

___
Python tracker 
<http://bugs.python.org/issue8924>
___



[issue8924] Error in error message in logging

2010-06-11 Thread Peter Landgren

Peter Landgren  added the comment:

Answer to your first question:
- The variable s is of type 'unicode'
- The variable record.exc_text, which is what Formatter.formatException 
returns, is of type 'str'

For your second question; I'm not a python expert, so I can't follow you there. 
I don't know what to do to test this.

--

___
Python tracker 
<http://bugs.python.org/issue8924>
___