Andrew Dunstan <[EMAIL PROTECTED]> writes:
> Here are some timing tests in 1m rows of random utf8 encoded 100 char
> data. It doesn't look to me like the saving you're suggesting is worth
> the trouble.
Hmm ... not sure I believe your numbers. Using a test file of 1m lines of 100 random latin1 characters converted to utf8 (thus, about half and half 7-bit ASCII and 2-byte utf8 characters), I get this in SQL_ASCII encoding:

regression=# \timing
Timing is on.
regression=# create temp table test(f1 text);
CREATE TABLE
Time: 5.047 ms
regression=# copy test from '/home/tgl/zzz1m';
COPY 1000000
Time: 4337.089 ms

and this in UTF8 encoding:

utf8=# \timing
Timing is on.
utf8=# create temp table test(f1 text);
CREATE TABLE
Time: 5.108 ms
utf8=# copy test from '/home/tgl/zzz1m';
COPY 1000000
Time: 7776.583 ms

The numbers aren't super repeatable, but it sure looks to me like the encoding check adds at least 50% to the runtime in this example; so doing it twice seems unpleasant.

(This is CVS HEAD, compiled without assert checking, on an x86_64 Fedora Core 6 box.)

			regards, tom lane