[GENERAL] UNICODE problem on 7.4 with COPY

Toby Doig Mon, 01 Dec 2003 09:14:02 -0800

When I try to import data from a Unicode file into PostgreSQL 7.4 under FreeBSD, it 
appears not to understand the Unicode file format.


To demonstrate, I export a set of integers into a Unicode file from MSSQL 2000. I copy 
the file to a FreeBSD box via Samba and try to import it from psql with COPY. It fails. 
Wordpad and Notepad both read the file OK, even after I bounce the file via the FreeBSD 
box (to test that Samba didn't munge it).

FreeBSD 5.1-RELEASE #0
PGSql 7.4 (dl'd and compiled fri 28th Nov 2003)
Dual 800MHz P3's

I create a database with encoding = UNICODE.
I create a table

CREATE TABLE testunicode
(
  anum int4
) WITHOUT OIDS;

I then use psql to import the file, which is a single column of integers.

copy testunicode from '/home/toby/itxt/anum.txt';     
ERROR:  invalid input syntax for integer: "��1"
CONTEXT:  COPY testunicode, line 1, column anum: "��1"


When viewing the file as hex I see:
FF FE 31 00 31 00 32 00 37 00 39 00 30 00 0D 00 0A 00
 �  �  1  .  1  .  2  .  7  .  9  .  0  .  .  .  .  .
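
The dump above can be checked with a short script. This is only a sketch, assuming Python 3 is available; the helper names detect_utf16_bom and to_utf8 are hypothetical and not part of PostgreSQL. It inspects the byte order mark, decodes the bytes as UTF-16, and re-encodes them as UTF-8, which is the byte format a database created with encoding = UNICODE stores.

```python
import codecs

# Bytes copied from the hex dump above.
raw = bytes.fromhex("FF FE 31 00 31 00 32 00 37 00 39 00 30 00 0D 00 0A 00")

def detect_utf16_bom(data: bytes):
    """Return 'utf-16-le', 'utf-16-be', or None based on the leading BOM."""
    if data.startswith(codecs.BOM_UTF16_LE):  # FF FE
        return "utf-16-le"
    if data.startswith(codecs.BOM_UTF16_BE):  # FE FF
        return "utf-16-be"
    return None

def to_utf8(data: bytes) -> bytes:
    """Hypothetical helper: drop the BOM, decode UTF-16, emit UTF-8 with LF line endings."""
    enc = detect_utf16_bom(data)
    if enc is None:
        raise ValueError("no UTF-16 byte order mark found")
    text = data[2:].decode(enc)  # skip the 2-byte BOM
    return text.replace("\r\n", "\n").encode("utf-8")

encoding = detect_utf16_bom(raw)
converted = to_utf8(raw)
print(encoding, converted)
```

For these bytes the script reports utf-16-le and yields the ASCII digits 112790 followed by a newline. The same conversion could presumably be applied to the whole file before running COPY.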

According to http://www.crispen.org/src/archive/0013.html

FF FE   UTF-16/UCS-2, little endian

So, what is going wrong? Why can't I import this very simple unicode file?
I've searched the archives and google, but to no avail.

Btw, the actual stuff I want to import is larger and more complex; this little table 
is only to demonstrate the problem.

Help would be muchly appreciated.
Toby
