On Jan 8, 4:14 am, [EMAIL PROTECTED] (Tom Lane) wrote: > [EMAIL PROTECTED] writes: > > I am using tsearch2 with pgsql 8.2.5 and get the following error when > > calling to_tsvector : > > "translation failed from server encoding to wchar_t" > > My database is UTF8 encoded and the data sent to to_tsvector comes > > from a bytea column converted to text with > > encode(COLUMN, 'escape'). > > Two likely theories: > > 1. Your database encoding is UTF-8, but your locale (LC_CTYPE) assumes > some other encoding. > > 2. The encode() is yielding something that isn't valid UTF-8. > > PG 8.3 contains checks that should complain about both of these > scenarios, but IIRC 8.2 does not. > > regards, tom lane > > ---------------------------(end of broadcast)--------------------------- > TIP 1: if posting/reading through Usenet, please send an appropriate > subscribe-nomail command to [EMAIL PROTECTED] so that your > message can get through to the mailing list cleanly
Dear Tom, Thanks for your reply. This is the output of `locale` on my system : # locale LANG=en_US.UTF-8 LC_CTYPE="en_US.UTF-8" LC_NUMERIC="en_US.UTF-8" LC_TIME="en_US.UTF-8" LC_COLLATE="en_US.UTF-8" LC_MONETARY="en_US.UTF-8" LC_MESSAGES="en_US.UTF-8" LC_PAPER="en_US.UTF-8" LC_NAME="en_US.UTF-8" LC_ADDRESS="en_US.UTF-8" LC_TELEPHONE="en_US.UTF-8" LC_MEASUREMENT="en_US.UTF-8" LC_IDENTIFICATION="en_US.UTF-8" LC_ALL= As for your second scenario I guess you are right, it's possible encode does not return all UTF8 characters. But to_tsvector() succeeds and fails at random with this kind of characters... So how can I sanitize output from encode before I pipe it to to_tsvector() ? Regards, Ilan ---------------------------(end of broadcast)--------------------------- TIP 7: You can help support the PostgreSQL project by donating at http://www.postgresql.org/about/donate