Re: [GENERAL] How to speed up pg_trgm / gin index scan

Oleg Bartunov Mon, 22 Jun 2015 16:32:03 -0700

Try 9.4 and you'll surprise.

1. GIN has compression
2. GIN has fast scan feature.


Oleg

On Mon, Jun 22, 2015 at 7:51 AM, Christian Ramseyer <r...@networkz.ch> wrote:

> Hi
>
> I have a pretty large table with syslog messages.
>
> It is already partitioned by month, and for a single month I have e.g.
>
>
> DM=# \d+ logs_01
>
>     Column    |            Type             |
> --------------+-----------------------------+
>  host         | character varying(255)      |
>  facility     | character varying(10)       |
>  priority     | character varying(10)       |
>  tag          | character varying(255)      |
>  log_date     | timestamp without time zone |
>  program      | character varying(255)      |
>  msg          | text                        |
>  seq          | bigint                      |
>
> Indexes:
>     "logs_01_pkey" PRIMARY KEY, btree (seq)
>     "idx_logs_01_lower_host" btree (lower(host::text) varchar_pattern_ops)
>     "logs_01_date_index" btree (log_date)
>     "tridx_logs_01_msg" gin (msg gin_trgm_ops)
>
>
> DM=# select count(*) from logs_01;
>   count
> ----------
>  83052864
>
>
> I'd like to provide a fast "like %x%" search on the msg column, hence I
> added a trigram based gin index on it. It is around 60 GB on the 35 GB
> table:
>
> DM=# select count(*) from logs_01;
>   count
> ----------
>  83052864
>
>
> DM=# \dt+ logs_01
>                      List of relations
>  Schema |  Name   | Type  |  Owner   | Size  | Description
> --------+---------+-------+----------+-------+-------------
>  public | logs_01 | table | postgres | 35 GB |
>
> DM=# \di+ tridx_logs_01_msg
>                                List of relations
>  Schema |       Name        | Type  |  Owner   |  Table  | Size  |
> Description
>
> --------+-------------------+-------+----------+---------+-------+-------------
>  public | tridx_logs_01_msg | index | postgres | logs_01 | 58 GB |
>
>
> A typical query on this table looks like this:
>
> explain analyze
> select log_date, host, msg
> from logs_01 as log   where  log.msg like '%192.23.33.177%'
>     and log.log_date >= '2015-1-18 1:45:24'
>     and log.log_date <= '2015-1-19 1:45:24'
>     order by log_date asc offset 200 limit 50;
>
>
> It yields a promising explain that shows that the index is used, but
> unfortunately the bitmap index scan on the GIN index takes quite long (40
> seconds)
>
>
>                       QUERY PLAN
>
> ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
>  Limit  (cost=34510.06..34510.06 rows=1 width=195) (actual
> time=42971.002..42971.015 rows=50 loops=1)
>    ->  Sort  (cost=34509.75..34510.06 rows=124 width=195) (actual
> time=42970.960..42970.990 rows=250 loops=1)
>          Sort Key: log_date
>          Sort Method: top-N heapsort  Memory: 152kB
>          ->  Bitmap Heap Scan on logs_01 log  (cost=34009.21..34505.44
> rows=124 width=195) (actual time=42963.969..42969.725 rows=2472 loops=1)
>                Recheck Cond: ((msg ~~ '%192.23.33.177%'::text) AND
> (log_date >= '2015-01-18 01:45:24'::timestamp without time zone) AND
> (log_date <= '2015-01-19 01:45:24'::timestamp without time zone))
>                ->  BitmapAnd  (cost=34009.21..34009.21 rows=124 width=0)
> (actual time=42962.562..42962.562 rows=0 loops=1)
>                      ->  Bitmap Index Scan on tridx_logs_01_msg
> (cost=0.00..6992.15 rows=8020 width=0) (actual time=42731.145..42731.145
> rows=168489 loops=1)
>                            Index Cond: (msg ~~ '%192.23.33.177%'::text)
>                      ->  Bitmap Index Scan on logs_01_date_index
> (cost=0.00..27016.75 rows=1287939 width=0) (actual time=180.055..180.055
> rows=1173048 loops=1)
>                            Index Cond: ((log_date >= '2015-01-18
> 01:45:24'::timestamp without time zone) AND (log_date <= '2015-01-19
> 01:45:24'::timestamp without time zone))
>  Total runtime: 42971.137 ms
>
> (also on http://explain.depesz.com/s/KpaB)
>
>
> Any good ideas on how I could speed this up a bit?
>
> I have already tried to throw quite a bunch of memory at the problem:
>
> shared_buffers = 64GB
> work_mem = 16GB
>
> but it didn't improve between this and the 32GB shared/ 2GB work GB I had
> before.
>
> This is on Postgres 9.1.15 on Linux.
>
> Thanks
> Christian
>
>
> --
> Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-general
>

Re: [GENERAL] How to speed up pg_trgm / gin index scan

Reply via email to