The following bug has been logged on the website:

Bug reference:      6375
Logged by:          Valentine Gogichashvili
Email address:      val...@gmail.com
PostgreSQL version: 9.1.1
Operating system:   Debian 4.4.5-8
Description:

Hello,

The default tsearch parser does not recognize all valid email addresses. It treats some of them as plain text and splits them into separate tokens.

For example:

postgres=# select to_tsquery('simple', 'nor...@email.com' );
     to_tsquery
────────────────────
 'nor...@email.com'
(1 row)

Here it behaves OK.

postgres=# select to_tsquery('simple', '-still-nor...@email.com' );
        to_tsquery
──────────────────────────
 'still-nor...@email.com'
(1 row)

Here it trims the '-' from the beginning of the email address. This is not correct, but it will at least find that email.

postgres=# select to_tsquery('simple', '-not-normal-with-da...@email.com' );
                                  to_tsquery
───────────────────────────────────────────────────────────────────────────────
 'not-normal-with-dash' & 'not' & 'normal' & 'with' & 'dash' & 'email.com'
(1 row)

This is a real problem, because the query now matches emails that are not the same address but are "super-sets" of it.

Other valid email characters that are not handled correctly include at least '+' and '.'.

With my best regards,

-- Valentine Gogichashvili

--
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)