accessing individual characters in unicode strings

2008-04-11 Thread Peter Robinson
Dear list
I am at my wits' end over what seemed a very simple task:
I have some Greek text, nicely encoded in UTF-8, going in and out of an
XML database, being passed over and beautifully displayed on the web.
For example: the most common Greek word of all, 'kai' (or και if your
mailer can display UTF-8).
So all I want to do is:
step through this string a character at a time, and do something for
each character (actually set a width attribute somewhere else for each
character).

Should be simple, yes?
It turns out to be near impossible. I tried simple character indexing
such as ustr[0], ustr[1], ... and this gives rubbish.
So I used len() to find out how long my simple Greek string is, and of
course it is NOT three characters long.
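
A minimal sketch of why len() disagrees, assuming the string arrives as
raw UTF-8 bytes (the names word, raw and text are made up for
illustration). Each of these Greek letters occupies two bytes in UTF-8,
so indexing the undecoded bytes steps through half-characters; decoding
as UTF-8 first gives one item per character. It is written for Python 3;
in Python 2 the equivalent decode would be unicode(raw, 'utf-8').

```python
# 'kai' in Greek, written with escapes so the source file stays ASCII
word = u"\u03ba\u03b1\u03b9"

# Assumption: this is what comes out of the database, i.e. UTF-8 bytes
raw = word.encode("utf-8")
print(len(raw))    # 6: two bytes per Greek letter

# Decode the bytes once, then work with the decoded text
text = raw.decode("utf-8")
print(len(text))   # 3: one item per character

# Step through the string a character at a time
codepoints = []
for ch in text:
    codepoints.append("U+%04X" % ord(ch))
print(codepoints)  # ['U+03BA', 'U+03B1', 'U+03B9']
```

With the decoded text, the loop body is where a per-character width
could be looked up or set.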

A day of intensive searching around the lists tells me that Unicode
in Python is a moving target: so many fixes are suggested for similar
problems, none apparently working with mine.

Here is the best I can do so far.
I convert the UTF-8 string using
ustr = repr(unicode(thisword, 'iso-8859-7'))
For kai this gives the following:

u'\u039e\u038a\u039e\xb1\u039e\u0389'

So now things should be simple, yes? Just go through this and identify
each character...

Not so simple at all.
k, kappa: turns out to be TWO \u escapes, not one: thus \u039e\u038a
Similarly, iota is also two \u escapes: \u039e\u0389
Alpha is a \u escape followed by a \x escape: \u039e\xb1
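
The pairs listed above are what one would expect if UTF-8 bytes are
decoded with the ISO-8859-7 codec: every byte becomes its own character,
so each two-byte Greek letter turns into two unrelated characters. The
sketch below reproduces the output shown for kai and then decodes the
same bytes as UTF-8 instead. It assumes the input really is UTF-8, as
described; raw, wrong, right and recovered are illustrative names, and
the code is Python 3 (in Python 2, unicode(thisword, 'utf-8') would be
the corresponding call).

```python
# The UTF-8 bytes for 'kai' in Greek: CE BA, CE B1, CE B9
raw = u"\u03ba\u03b1\u03b9".encode("utf-8")

# Decoding with the wrong codec maps every single byte to a character
wrong = raw.decode("iso-8859-7")
print(ascii(wrong))   # '\u039e\u038a\u039e\xb1\u039e\u0389'

# Decoding with the codec the data was written in gives three characters
right = raw.decode("utf-8")
print(ascii(right))   # '\u03ba\u03b1\u03b9'

# An already mis-decoded string can be repaired by reversing the wrong
# step, provided every byte was representable in ISO-8859-7
recovered = wrong.encode("iso-8859-7").decode("utf-8")
print(recovered == right)   # True
```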

Looking elsewhere in the record,

my particular favourite is the midpoint character: this comes out as
\u03b1\x90\xa7 !
And in the middle of all this, there are some non-Unicode characters:
\u039e\u038fc is o followed by c!

Well, I don't have many characters to deal with, and I could cope with
this mess by tedious matching character by character.
But surely, there is a better way...
Help please


Peter Robinson: [EMAIL PROTECTED]


-- 
http://mail.python.org/mailman/listinfo/python-list

DBXML and removeDocument in Python

2008-04-16 Thread Peter Robinson
I am trying to add and remove documents in a container in Berkeley/Oracle
DB XML within Python, on Mac OS X Leopard. putDocument works
fine, but I keep getting an AttributeError when I try removeDocument.
I can't find any documentation on removeDocument in Python and it is
not in examples.py.
My code looks like:

results = p.query('//[EMAIL PROTECTED]"I-57-1r"]')
xc = p.getcontainer()
xm = p.getxmlmanager()
uc = xm.createUpdateContext()
if results.hasNext():
    # delete the document!
    xc.removeDocument('57-1r', uc)
    # add a new document with the same name
    xc.putDocument('57-1r' , ', uc)

putDocument works fine. I can remove the document using
removeDocument from the shell, but not from within Python. Help...
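
An AttributeError on a method call generally means the object has no
attribute of that name, so one way to narrow this down is to list what
the container object actually exposes. The sketch below does that with
plain introspection. FakeContainer is a hypothetical stand-in so the
example runs without DB XML installed; it is NOT the real dbxml class.
Its method names reflect my understanding that the DB XML API calls the
container method deleteDocument while removeDocument is the name of the
dbxml shell command; whether the Python binding matches is an assumption
to check against the real xc object.

```python
def methods_matching(obj, fragment):
    """List public callable attributes of obj whose name contains fragment."""
    fragment = fragment.lower()
    return sorted(
        name for name in dir(obj)
        if not name.startswith("_")
        and fragment in name.lower()
        and callable(getattr(obj, name))
    )


class FakeContainer(object):
    """Hypothetical stand-in for the container object, for illustration only."""

    def putDocument(self, name, content, context):
        pass

    def deleteDocument(self, name, context):
        pass


xc = FakeContainer()
print(methods_matching(xc, "document"))   # ['deleteDocument', 'putDocument']
print(hasattr(xc, "removeDocument"))      # False
```

Running methods_matching(xc, "document") against the real container
should show which document-related methods the binding provides.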
Peter Robinson: [EMAIL PROTECTED]
Scholarly Digital Editions
12 The Old Silverworks
54a Spencer Street
Jewellery Quarter
Birmingham B18 6JT
fax: 44 (0) 121 275 6212

