[BUGS] BUG #5532: Valid UTF8 sequence errors as invalid

2010-06-30 Thread Michael Lewis

The following bug has been logged online:

Bug reference:  5532
Logged by:  Michael Lewis
Email address:  mikelikes...@gmail.com
PostgreSQL version: 9.0 trunk
Operating system:   OS X
Description:  Valid UTF8 sequence errors as invalid
Details: 

I'm using Python to sanitize my logs from invalid UTF8 characters before
COPYing them into postgres.  I came across this one sequence that seems to
be valid UTF8 (in the extended range I believe).

It goes through both Python's encoding and iconv without an error, and it
is valid as far as my understanding of UTF8 goes, so I am assuming it is a
bug.

Test case:

create table t (v varchar);
insert into t values (E'\xed\xbc\xad');


In bash you can do:

echo -e "\xed\xbc\xad" | iconv -f UTF-8 ; echo $?

to validate it


Thanks,
Mike

-- 
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs


[BUGS] BUG #5531: REGEXP_ REPLACE causes connection drop

2010-06-30 Thread Ola Sergatchov

The following bug has been logged online:

Bug reference:  5531
Logged by:  Ola Sergatchov
Email address:  ola_sergatc...@hotmail.com
PostgreSQL version: 8.1.18
Operating system:   RedHat Linux 4.1.2-46
Description:  REGEXP_ REPLACE causes connection drop
Details: 

Passing a large string to the REGEXP_REPLACE function causes the DB connection
to drop. We executed this function with very large strings (20,000 - 250,000
characters) both from RedHat and PgAdmin, and in both cases the function
fails to return and eventually the connection drops. The documentation makes
no reference to the maximum size of string that this function can process.



Re: [BUGS] BUG #5531: REGEXP_ REPLACE causes connection drop

2010-06-30 Thread Bruce Momjian
Ola Sergatchov wrote:
> Passing a large string to the REGEXP_REPLACE function causes the DB connection
> to drop. We executed this function with very large strings (20,000 - 250,000
> characters) both from RedHat and PgAdmin, and in both cases the function
> fails to return and eventually the connection drops. The documentation makes
> no reference to the maximum size of string that this function can process.

Can you show us any relevant entries in the server logs?  FYI, 8.1.18 is
old in terms of both its minor and its major release.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + None of us is going to be here forever. +



Re: [BUGS] BUG #5531: REGEXP_ REPLACE causes connection drop

2010-06-30 Thread Tom Lane
"Ola Sergatchov" writes:
> Passing a large string to the REGEXP_REPLACE function causes the DB connection
> to drop. We executed this function with very large strings (20,000 - 250,000
> characters) both from RedHat and PgAdmin, and in both cases the function
> fails to return and eventually the connection drops.

I don't see any such misbehavior here: it either completes or reports
"out of memory".  Could we see a specific problematic example?  Also
what sort of hardware are you on?  (32-bit vs 64-bit seems relevant
in particular.)

regards, tom lane



Re: [BUGS] BUG #5532: Valid UTF8 sequence errors as invalid

2010-06-30 Thread Tom Lane
"Michael Lewis" writes:
> I'm using Python to sanitize my logs from invalid UTF8 characters before
> COPYing them into postgres.  I came across this one sequence that seems to
> be valid UTF8 (in the extended range I believe).

It is not valid.  See http://tools.ietf.org/html/rfc3629 --- a sequence
beginning with ED must have a second byte in the range 80-9F to be
legal, and this doesn't.  The example you give would decode as U+DF2D,
ie part of a surrogate pair, which is specifically disallowed in UTF8
--- you're supposed to code the original character directly, not via a
surrogate pair.  The primary reason for this rule is that otherwise
there are multiple ways to encode the same character, which can be a
security hazard.
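
For illustration, the arithmetic behind that decoding can be sketched in
Python 3. The helper names decode_three_byte and is_surrogate are made up for
this example. The sketch unpacks the payload bits of the three bytes by hand,
checks the RFC 3629 second-byte rule for a lead byte of ED, and then asks
Python 3's strict decoder for its verdict.

```python
def decode_three_byte(seq: bytes) -> int:
    """Unpack a 3-byte UTF-8-style sequence arithmetically, with no validity checks."""
    b0, b1, b2 = seq
    # Layout: 1110xxxx 10yyyyyy 10zzzzzz -> code point xxxxyyyyyyzzzzzz
    return ((b0 & 0x0F) << 12) | ((b1 & 0x3F) << 6) | (b2 & 0x3F)


def is_surrogate(cp: int) -> bool:
    """True if the code point lies in the UTF-16 surrogate range U+D800..U+DFFF."""
    return 0xD800 <= cp <= 0xDFFF


seq = b"\xed\xbc\xad"
cp = decode_three_byte(seq)
print(f"U+{cp:04X}", is_surrogate(cp))  # prints: U+DF2D True

# RFC 3629: after a lead byte of ED, the second byte must be in 80-9F.
second_byte_ok = 0x80 <= seq[1] <= 0x9F

# Python 3's strict UTF-8 decoder applies the same rule.
try:
    seq.decode("utf-8")
    strict_ok = True
except UnicodeDecodeError:
    strict_ok = False
```

Here second_byte_ok and strict_ok both end up False: the second byte BC is
outside 80-9F, and the result falls in the surrogate range.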

> It goes through both Python's encoding and iconv without an error

You should file bugs against those tools.

regards, tom lane



Re: [BUGS] BUG #5532: Valid UTF8 sequence errors as invalid

2010-06-30 Thread Mike Lewis
> It is not valid.  See http://tools.ietf.org/html/rfc3629 --- a sequence
> beginning with ED must have a second byte in the range 80-9F to be
> legal, and this doesn't.  The example you give would decode as U+DF2D,
> ie part of a surrogate pair, which is specifically disallowed in UTF8
> --- you're supposed to code the original character directly, not via a
> surrogate pair.  The primary reason for this rule is that otherwise
> there are multiple ways to encode the same character, which can be a
> security hazard.

Thanks for the explanation.  Unicode has always given me a hard time.

> You should file bugs against those tools.

I certainly will.  I apologize for filing the bug against postgres (I
suppose the "voting" method of figuring out which piece of software is the
buggy one has failed me).


I've run into a fair number of Unicode errors when trying to copy in log
files.  Would you recommend using bytea or another data type instead of text
or varchar, or at least copying to a staging table with bytea columns and
filtering out invalid rows when moving the data to the main table?
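
As a rough sketch of the "filter out invalid rows" idea done on the client
side before COPY, assuming Python 3 and that each log line is available as
raw bytes: the function name split_valid_utf8 and the sample lines are
invented for illustration. It relies only on the strict UTF-8 decoder, which
rejects encoded surrogates and truncated sequences.

```python
from typing import Iterable, List, Tuple


def split_valid_utf8(lines: Iterable[bytes]) -> Tuple[List[str], List[bytes]]:
    """Separate lines that decode as strict UTF-8 from lines that do not."""
    valid: List[str] = []
    rejected: List[bytes] = []
    for raw in lines:
        try:
            valid.append(raw.decode("utf-8"))  # strict error handling by default
        except UnicodeDecodeError:
            rejected.append(raw)  # keep the raw bytes for later inspection
    return valid, rejected


sample = [
    b"plain ascii",
    "caf\u00e9".encode("utf-8"),
    b"bad \xed\xbc\xad here",  # encoded surrogate, as in the test case
    b"truncated \xe2\x82",     # incomplete multi-byte sequence
]
good, bad = split_valid_utf8(sample)
```

The rejected lines could be written to a separate file or loaded into a bytea
staging column. Note that this check does not cover everything the server
refuses: PostgreSQL text values cannot contain a NUL byte, which the decoder
above accepts.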


Re: [BUGS] BUG #5532: Valid UTF8 sequence errors as invalid

2010-06-30 Thread Tom Lane
Mike Lewis writes:
> I've run into a fair number of Unicode errors when trying to copy in log
> files.  Would you recommend using bytea or another data type instead of text
> or varchar, or at least copying to a staging table with bytea columns and
> filtering out invalid rows when moving the data to the main table?

My guess is that you're working with data that was originally
represented in UTF16, and you've used a tool that doesn't really know
what it's doing to convert to UTF8.  A correct conversion has to reunite
surrogate pairs into wider-than-16-bit Unicode characters and then
encode those as single UTF8 sequences.  Dunno if you can easily identify
the culprit, but fixing that conversion is the long-term solution.

(BTW, I should think that iconv or some related tool would have a
solution for fixing this miscoding; it's not an uncommon problem.)
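
One possible repair, sketched in Python 3 rather than iconv and offered as an
assumption about what such a fix could look like: decode with the
"surrogatepass" error handler so the encoded surrogates survive, round-trip
through UTF-16 so that each high/low pair is reunited into one character, and
re-encode as UTF-8. The function name reunite_surrogates is invented for this
example.

```python
def reunite_surrogates(data: bytes) -> bytes:
    """Re-encode UTF-8-like bytes whose supplementary characters were
    written as two separately encoded UTF-16 surrogates."""
    # "surrogatepass" makes the decoder accept encoded surrogates.
    text = data.decode("utf-8", "surrogatepass")
    # Through UTF-16, a high/low pair becomes a single code point.
    # A lone surrogate makes the strict UTF-16 decode raise UnicodeDecodeError.
    text = text.encode("utf-16-le", "surrogatepass").decode("utf-16-le")
    return text.encode("utf-8")


# U+1F600 miscoded as the surrogates D83D and DE00, three bytes each.
miscoded = b"\xed\xa0\xbd\xed\xb8\x80"
fixed = reunite_surrogates(miscoded)  # b"\xf0\x9f\x98\x80"
```

A lone surrogate such as the ED BC AD sequence from the bug report has no
partner to be reunited with, so this sketch raises an error for it rather than
guessing at the intended character.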

regards, tom lane



[BUGS] Libpq.dll: File not recognized

2010-06-30 Thread Bidski
Hi all,

I downloaded the "one-click" installer for PostgreSQL 9.0 beta2 for Windows x64
(postgresql-9.0.0-beta2-windows-x64.exe) and installed it on my Windows 7 64-bit
system. Installation seemed to go off without a hitch, but when I tried to
link to libpq in my project I received the following error:

C:\Program Files\PostgreSQL\lib/libpq.dll: file not recognized: File format not recognized
collect2: ld returned 1 exit status

I originally thought that this might have something to do with the use of
different compilers/linkers (the compiler/linker that I am using compared to
the one used to build libpq), but now I'm not so sure.

Some system info:

OS: Windows 7 64-bit
MSYS ver: 1.0.14
MinGW ver: 4.5.0
Make ver: GNU make 3.81

If it matters, my IDE is Eclipse Galileo and I am compiling my project
using the MinGW toolchain.



Regards

Bidski


Re: [BUGS] Libpq.dll: File not recognized

2010-06-30 Thread John R Pierce

On 06/30/10 3:23 PM, Bidski wrote:
> Hi all,
> I downloaded the "one-click" installer for PostgreSQL 9.0 beta2 for
> Windows x64 (postgresql-9.0.0-beta2-windows-x64.exe) and installed it
> on my Windows 7 64-bit system. Installation seemed to go off without a
> hitch, but when I tried to link to libpq in my project I received the
> following error:
>
> C:\Program Files\PostgreSQL\lib/libpq.dll: file not recognized: File format not recognized
> collect2: ld returned 1 exit status
>
> I originally thought that this might have something to do with the use
> of different compilers/linkers (the compiler/linker that I am using
> compared to the one used to build libpq), but now I'm not so sure.
>
> Some system info:
>
> OS: Windows 7 64-bit
> MSYS ver: 1.0.14
> MinGW ver: 4.5.0
> Make ver: GNU make 3.81
>
> If it matters, my IDE is Eclipse Galileo and I am compiling my
> project using the MinGW toolchain.



wild guess says your project is 64 bit, but libpq.dll is 32 bit.
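
One way to test a guess like this, sketched in Python 3 under the assumption
that Python is available on the machine: read the Machine field from the
DLL's PE header. The function name pe_machine is invented for this example,
and only the two machine codes relevant here are mapped. The thread does not
establish which side is 32-bit, so the mismatch could run in either direction.

```python
import struct

# Machine codes from the PE/COFF file header (only the two relevant here).
MACHINE_NAMES = {0x014C: "x86 (32-bit)", 0x8664: "x64 (64-bit)"}


def pe_machine(path: str) -> str:
    """Return the target architecture recorded in a PE file (DLL or EXE)."""
    with open(path, "rb") as f:
        dos_header = f.read(0x40)
        if len(dos_header) < 0x40 or dos_header[:2] != b"MZ":
            raise ValueError("not a PE file (missing MZ header)")
        # Offset 0x3C holds the file offset of the "PE\0\0" signature.
        (pe_offset,) = struct.unpack_from("<I", dos_header, 0x3C)
        f.seek(pe_offset)
        sig_and_machine = f.read(6)
    if len(sig_and_machine) < 6 or sig_and_machine[:4] != b"PE\0\0":
        raise ValueError("not a PE file (missing PE signature)")
    # The 2-byte little-endian Machine field follows the signature.
    (machine,) = struct.unpack_from("<H", sig_and_machine, 4)
    return MACHINE_NAMES.get(machine, f"unknown (0x{machine:04X})")
```

Calling pe_machine(r"C:\Program Files\PostgreSQL\lib\libpq.dll") and comparing
the result with the target of the MinGW linker in use would show whether the
two actually differ.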







Re: [BUGS] Libpq.dll: File not recognized

2010-06-30 Thread Bidski

John R Pierce writes:
> wild guess says your project is 64 bit, but libpq.dll is 32 bit.

Why would there be a 32-bit DLL in the 64-bit pre-built package?

Regards
Bidski



Re: [BUGS] Libpq.dll: File not recognized

2010-06-30 Thread John R Pierce

On 06/30/10 3:40 PM, Bidski wrote:
> John R Pierce writes:
>> wild guess says your project is 64 bit, but libpq.dll is 32 bit.
>
> Why would there be a 32bit dll in the 64bit pre-built package?

Look and see if there's another /lib64/ or similar directory?

I didn't notice earlier that you are running the postgres 9 beta.




Re: [BUGS] Libpq.dll: File not recognized

2010-06-30 Thread Bidski

John R Pierce writes:
> look and see if there's another /lib64/ or similar directory ?

Nope, no lib64 directory or anything like that, just a lib directory.

> I didn't notice earlier that you are running the postgres 9 beta.

Yep, I'm running PostgreSQL 9.0 beta2.

Regards
Bidski


