Andrew, All, > On 5/22/07, Andrew Dunstan <[EMAIL PROTECTED]> wrote: > > But before I commit this I'd appreciate seeing some more testing, both > > for correctness and performance.
I finally found some time to test this patch on our data. As our production database is still using 8.1, I made my tests with 8.1.10 and 8.3devel. As I had very weird results, I tested also 8.2.5. The patch seems to work as expected in my locale. I didn't notice problems during the tests I made except for the performance problem I describe below. The box is a recent dual core box using CentOS 5. It's a test box installed specifically to test PostgreSQL 8.3. Every version is compiled with the same compiler. Locale is fr_FR.UTF-8 and database is UTF-8 too. The table used to make the tests fits entirely in RAM. I tested a simple ILIKE query on our data with 8.3devel and it was far slower than with 8.1.10 (2 times slower). It was obviously not the expected result as it should have been faster considering your work. So I decided to test also with 8.2.5 and it seems a performance regression was introduced in 8.2 (and not in 8.3 which is in fact a bit faster than 8.2). I saw this item in 8.2 release notes: Allow ILIKE to work for multi-byte encodings (Tom) Internally, ILIKE now calls lower() and then uses LIKE. Locale-specific regular expression patterns still do not work in these encodings. Could it be responsible of such a slow down? I attached the results of my tests. If anyone needs more information, I'll be glad to provide them. Regards, -- Guillaume
** Environment ** cityvox=# select version(); version ----------------------------------------------------------------------------------------------------- PostgreSQL 8.1.10 on i686-pc-linux-gnu, compiled by GCC gcc (GCC) 4.1.1 20070105 (Red Hat 4.1.1-52) (1 row) cityvox=# show lc_collate; lc_collate ------------- fr_FR.UTF-8 (1 row) cityvox=# show lc_ctype; lc_ctype ------------- fr_FR.UTF-8 (1 row) cityvox=# \l List of databases Name | Owner | Encoding -----------+----------+---------- cityvox | postgres | UTF8 postgres | postgres | UTF8 template0 | postgres | UTF8 template1 | postgres | UTF8 (4 rows) cityvox=# show shared_buffers; shared_buffers ---------------- 16384 (1 row) cityvox=# show work_mem; work_mem ---------- 32768 (1 row) ** Seqscan on the table ** cityvox=# select count(*) from evenement; count -------- 128780 (1 row) Time: 57.335 ms cityvox=# select count(*) from evenement; count -------- 128780 (1 row) Time: 57.317 ms ** Query with LIKE ** cityvox=# SELECT e.numeve FROM evenement e WHERE e.libgeseve LIKE '%hocus pocus%' OR e.mots_cle LIKE '%hocus pocus%'; numeve -------- (0 rows) Time: 188.312 ms cityvox=# SELECT e.numeve FROM evenement e WHERE e.libgeseve LIKE '%hocus pocus%' OR e.mots_cle LIKE '%hocus pocus%'; numeve -------- (0 rows) Time: 188.235 ms ** Query with ILIKE ** cityvox=# SELECT e.numeve FROM evenement e WHERE e.libgeseve ILIKE '%hocus pocus%' OR e.mots_cle ILIKE '%hocus pocus%'; numeve ----------- 900024298 87578 161108 (3 rows) Time: 227.048 ms cityvox=# SELECT e.numeve FROM evenement e WHERE e.libgeseve ILIKE '%hocus pocus%' OR e.mots_cle ILIKE '%hocus pocus%'; numeve ----------- 900024298 87578 161108 (3 rows) Time: 226.586 ms cityvox=# EXPLAIN ANALYZE SELECT e.numeve FROM evenement e WHERE e.libgeseve ILIKE '%hocus pocus%' OR e.mots_cle ILIKE '%hocus pocus%'; QUERY PLAN ----------------------------------------------------------------------------------------------------------- Seq Scan on evenement e (cost=0.00..6743.01 rows=1 width=4) (actual time=45.907..226.702 rows=3 loops=1) Filter: (((libgeseve)::text ~~* '%hocus pocus%'::text) OR ((mots_cle)::text ~~* '%hocus pocus%'::text)) Total runtime: 226.736 ms (3 rows) Time: 227.216 ms ** Query with only one condition with ILIKE ** cityvox=# SELECT e.numeve FROM evenement e WHERE e.libgeseve ILIKE '%hocus pocus%'; numeve ----------- 900024298 87578 (2 rows) Time: 177.318 ms
** Environment ** cityvox=# select version(); version ---------------------------------------------------------------------------------------------------- PostgreSQL 8.2.5 on i686-pc-linux-gnu, compiled by GCC gcc (GCC) 4.1.1 20070105 (Red Hat 4.1.1-52) (1 row) cityvox=# show lc_collate; lc_collate ------------- fr_FR.UTF-8 (1 row) cityvox=# show lc_ctype; lc_ctype ------------- fr_FR.UTF-8 (1 row) cityvox=# \l List of databases Name | Owner | Encoding -----------+----------+---------- cityvox | postgres | UTF8 postgres | postgres | UTF8 template0 | postgres | UTF8 template1 | postgres | UTF8 (4 rows) cityvox=# show shared_buffers ; shared_buffers ---------------- 128MB (1 row) cityvox=# show work_mem; work_mem ---------- 32MB (1 row) ** Seqscan on the table ** cityvox=# select count(*) from evenement; count -------- 128780 (1 row) Time: 34.524 ms cityvox=# select count(*) from evenement; count -------- 128780 (1 row) Time: 34.509 ms ** Query with LIKE ** cityvox=# SELECT e.numeve FROM evenement e WHERE e.libgeseve LIKE '%hocus pocus%' OR e.mots_cle LIKE '%hocus pocus%'; numeve -------- (0 rows) Time: 180.982 ms cityvox=# SELECT e.numeve FROM evenement e WHERE e.libgeseve LIKE '%hocus pocus%' OR e.mots_cle LIKE '%hocus pocus%'; numeve -------- (0 rows) Time: 164.684 ms ** Query with ILIKE ** cityvox=# SELECT e.numeve FROM evenement e WHERE e.libgeseve ILIKE '%hocus pocus%' OR e.mots_cle ILIKE '%hocus pocus%'; numeve ----------- 900024298 87578 161108 (3 rows) Time: 617.654 ms cityvox=# SELECT e.numeve FROM evenement e WHERE e.libgeseve ILIKE '%hocus pocus%' OR e.mots_cle ILIKE '%hocus pocus%'; numeve ----------- 900024298 87578 161108 (3 rows) Time: 617.568 ms cityvox=# EXPLAIN ANALYZE SELECT e.numeve FROM evenement e WHERE e.libgeseve ILIKE '%hocus pocus%' OR e.mots_cle ILIKE '%hocus pocus%'; QUERY PLAN ------------------------------------------------------------------------------------------------------------ Seq Scan on evenement e (cost=0.00..6743.70 rows=1 width=4) (actual time=125.531..614.073 rows=3 loops=1) Filter: (((libgeseve)::text ~~* '%hocus pocus%'::text) OR ((mots_cle)::text ~~* '%hocus pocus%'::text)) Total runtime: 614.105 ms (3 rows) ** Query with only one condition with ILIKE ** cityvox=# SELECT e.numeve FROM evenement e WHERE e.libgeseve ILIKE '%hocus pocus%'; numeve ----------- 900024298 87578 (2 rows) Time: 411.671 ms
** Environment ** cityvox=# select version(); version ------------------------------------------------------------------------------------------------------- PostgreSQL 8.3devel on i686-pc-linux-gnu, compiled by GCC gcc (GCC) 4.1.1 20070105 (Red Hat 4.1.1-52) (1 row) cityvox=# show lc_collate; lc_collate ------------- fr_FR.UTF-8 (1 row) cityvox=# show lc_ctype; lc_ctype ------------- fr_FR.UTF-8 (1 row) cityvox=# \l List of databases Name | Owner | Encoding -----------+----------+---------- cityvox | postgres | UTF8 postgres | postgres | UTF8 template0 | postgres | UTF8 template1 | postgres | UTF8 (4 rows) cityvox=# show shared_buffers ; shared_buffers ---------------- 128MB (1 row) cityvox=# show work_mem ; work_mem ---------- 32MB (1 row) ** Seqscan on the table ** cityvox=# select count(*) from evenement; count -------- 128780 (1 row) Time: 33.181 ms cityvox=# select count(*) from evenement; count -------- 128780 (1 row) Time: 33.152 ms ** Query with LIKE ** cityvox=# SELECT e.numeve FROM evenement e WHERE e.libgeseve LIKE '%hocus pocus%' OR e.mots_cle LIKE '%hocus pocus%'; numeve -------- (0 rows) Time: 148.927 ms cityvox=# SELECT e.numeve FROM evenement e WHERE e.libgeseve LIKE '%hocus pocus%' OR e.mots_cle LIKE '%hocus pocus%'; numeve -------- (0 rows) Time: 148.931 ms ** Query with ILIKE ** cityvox=# SELECT e.numeve FROM evenement e WHERE e.libgeseve ILIKE '%hocus pocus%' OR e.mots_cle ILIKE '%hocus pocus%'; numeve ----------- 900024298 87578 161108 (3 rows) Time: 597.008 ms cityvox=# SELECT e.numeve FROM evenement e WHERE e.libgeseve ILIKE '%hocus pocus%' OR e.mots_cle ILIKE '%hocus pocus%'; numeve ----------- 900024298 87578 161108 (3 rows) Time: 597.340 ms cityvox=# EXPLAIN ANALYZE SELECT e.numeve FROM evenement e WHERE e.libgeseve ILIKE '%hocus pocus%' OR e.mots_cle ILIKE '%hocus pocus%'; QUERY PLAN ------------------------------------------------------------------------------------------------------------ Seq Scan on evenement e (cost=0.00..5821.35 rows=1 width=4) (actual time=122.567..598.229 rows=3 loops=1) Filter: (((libgeseve)::text ~~* '%hocus pocus%'::text) OR ((mots_cle)::text ~~* '%hocus pocus%'::text)) Total runtime: 598.263 ms (3 rows) Time: 598.899 ms ** Query with only one condition with ILIKE ** cityvox=# SELECT e.numeve FROM evenement e WHERE e.libgeseve ILIKE '%hocus pocus%'; numeve ----------- 900024298 87578 (2 rows) Time: 399.534 ms
---------------------------(end of broadcast)--------------------------- TIP 2: Don't 'kill -9' the postmaster