Re: [GENERAL] Unicode normalization

2009-09-17 Thread Andreas Kalsch
My standard encoding is UTF-8 on all levels, so I don't need this high-cost call: plpy.execute("select setting from pg_settings where name = 'server_encoding'"); Additionally, I want to keep the original case. For this purpose my solution still fits my needs. But it is not the one you

Re: [GENERAL] Unicode normalization

2009-09-17 Thread Sam Mason
On Thu, Sep 17, 2009 at 12:01:57AM -0400, Alvaro Herrera wrote: > http://wiki.postgresql.org/wiki/Strip_accents_from_strings I'm still confused as to why plpython doesn't know the server's encoding already; seems as though all text operations are predicated on knowing this and hence all but the mo

Re: [GENERAL] Unicode normalization

2009-09-16 Thread Alvaro Herrera
Andreas Kalsch wrote:
> 2) Transferring this to PL/Python:
>
> CREATE OR REPLACE FUNCTION test (str text)
> RETURNS text
> AS $$
>    import unicodedata
>    return unicodedata.normalize('NFKD', str.decode('UTF-8'))
> $$ LANGUAGE plpythonu;
This is wrong, which is why we published a correct vers

Re: [GENERAL] Unicode normalization

2009-09-16 Thread Scott Marlowe
On Wed, Sep 16, 2009 at 4:42 PM, Sam Mason wrote:
> On Wed, Sep 16, 2009 at 09:35:02PM +0200, Andreas Kalsch wrote:
>> CREATE OR REPLACE FUNCTION test (str text)
>> RETURNS text
>> AS $$
>>    import unicodedata
>>    return unicodedata.normalize('NFKD', str.decode('UTF-8'))
>> $$ LANGUAGE plpyth

Re: [GENERAL] Unicode normalization

2009-09-16 Thread Sam Mason
On Wed, Sep 16, 2009 at 09:35:02PM +0200, Andreas Kalsch wrote:
> CREATE OR REPLACE FUNCTION test (str text)
> RETURNS text
> AS $$
>    import unicodedata
>    return unicodedata.normalize('NFKD', str.decode('UTF-8'))
> $$ LANGUAGE plpythonu;
I'd guess you want that to be: return unicodedata.

Re: [GENERAL] Unicode normalization

2009-09-16 Thread Andreas Kalsch
Update: The error is, of course, that the function tries to return "str" instead of unicode. It is not str.decode('UTF-8') which causes the error. Andreas Kalsch wrote: No, I need a solution which is as generic as possible. I use UTF-8 encoded unicode strings on all levels. This is what I have do

Re: [GENERAL] Unicode normalization

2009-09-16 Thread Andreas Kalsch
No, I need a solution which is as generic as possible. I use UTF-8 encoded unicode strings on all levels. This is what I have done so far:
1) Writing a separate Python command line script for testing - works as expected:
#!/usr/bin/python
import sys
import unicodedata
str = sys.argv[1].d

Re: [GENERAL] Unicode normalization

2009-09-16 Thread David Fetter
On Wed, Sep 16, 2009 at 07:20:21PM +0200, Andreas Kalsch wrote:
> Has somebody integrated Unicode normalization into Postgres? If not, I
> would have to implement my own function by using this CPAN module:
> http://search.cpan.org/~sadahiro/Unicode-Normalize-1.03/ .
>
> I need a function which

[GENERAL] Unicode normalization

2009-09-16 Thread Andreas Kalsch
Has somebody integrated Unicode normalization into Postgres? If not, I would have to implement my own function by using this CPAN module: http://search.cpan.org/~sadahiro/Unicode-Normalize-1.03/ . I need a function which removes all diacritics (1) and transforms some characters to a more compa
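
The diacritic-removal part of this request can be sketched in plain Python. This is only an illustration, assuming Python 3 and the standard unicodedata module; the name strip_diacritics is hypothetical and does not come from the thread or the wiki page. It decomposes the text with NFKD, as in the functions quoted in this thread, and then drops the combining marks.

```python
import unicodedata


def strip_diacritics(text: str) -> str:
    """Return text with combining marks removed (hypothetical helper)."""
    # NFKD splits precomposed characters into a base character
    # followed by its combining marks, e.g. "é" -> "e" + U+0301.
    decomposed = unicodedata.normalize("NFKD", text)
    # unicodedata.combining() returns a non-zero combining class
    # for combining marks, so those characters are filtered out.
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


if __name__ == "__main__":
    print(strip_diacritics("Crème brûlée"))
```

Case is preserved, since nothing is lowercased. Characters that have no Unicode decomposition, such as "ø", pass through unchanged, so a mapping table would still be needed for those.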