Try 9.4 and you'll surprise. 1. GIN has compression 2. GIN has fast scan feature.
Oleg On Mon, Jun 22, 2015 at 7:51 AM, Christian Ramseyer <r...@networkz.ch> wrote: > Hi > > I have a pretty large table with syslog messages. > > It is already partitioned by month, and for a single month I have e.g. > > > DM=# \d+ logs_01 > > Column | Type | > --------------+-----------------------------+ > host | character varying(255) | > facility | character varying(10) | > priority | character varying(10) | > tag | character varying(255) | > log_date | timestamp without time zone | > program | character varying(255) | > msg | text | > seq | bigint | > > Indexes: > "logs_01_pkey" PRIMARY KEY, btree (seq) > "idx_logs_01_lower_host" btree (lower(host::text) varchar_pattern_ops) > "logs_01_date_index" btree (log_date) > "tridx_logs_01_msg" gin (msg gin_trgm_ops) > > > DM=# select count(*) from logs_01; > count > ---------- > 83052864 > > > I'd like to provide a fast "like %x%" search on the msg column, hence I > added a trigram based gin index on it. It is around 60 GB on the 35 GB > table: > > DM=# select count(*) from logs_01; > count > ---------- > 83052864 > > > DM=# \dt+ logs_01 > List of relations > Schema | Name | Type | Owner | Size | Description > --------+---------+-------+----------+-------+------------- > public | logs_01 | table | postgres | 35 GB | > > DM=# \di+ tridx_logs_01_msg > List of relations > Schema | Name | Type | Owner | Table | Size | > Description > > --------+-------------------+-------+----------+---------+-------+------------- > public | tridx_logs_01_msg | index | postgres | logs_01 | 58 GB | > > > A typical query on this table looks like this: > > explain analyze > select log_date, host, msg > from logs_01 as log where log.msg like '%192.23.33.177%' > and log.log_date >= '2015-1-18 1:45:24' > and log.log_date <= '2015-1-19 1:45:24' > order by log_date asc offset 200 limit 50; > > > It yields a promising explain that shows that the index is used, but > unfortunately the bitmap index scan on the GIN index takes quite long (40 > seconds) > > > QUERY PLAN > > ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- > Limit (cost=34510.06..34510.06 rows=1 width=195) (actual > time=42971.002..42971.015 rows=50 loops=1) > -> Sort (cost=34509.75..34510.06 rows=124 width=195) (actual > time=42970.960..42970.990 rows=250 loops=1) > Sort Key: log_date > Sort Method: top-N heapsort Memory: 152kB > -> Bitmap Heap Scan on logs_01 log (cost=34009.21..34505.44 > rows=124 width=195) (actual time=42963.969..42969.725 rows=2472 loops=1) > Recheck Cond: ((msg ~~ '%192.23.33.177%'::text) AND > (log_date >= '2015-01-18 01:45:24'::timestamp without time zone) AND > (log_date <= '2015-01-19 01:45:24'::timestamp without time zone)) > -> BitmapAnd (cost=34009.21..34009.21 rows=124 width=0) > (actual time=42962.562..42962.562 rows=0 loops=1) > -> Bitmap Index Scan on tridx_logs_01_msg > (cost=0.00..6992.15 rows=8020 width=0) (actual time=42731.145..42731.145 > rows=168489 loops=1) > Index Cond: (msg ~~ '%192.23.33.177%'::text) > -> Bitmap Index Scan on logs_01_date_index > (cost=0.00..27016.75 rows=1287939 width=0) (actual time=180.055..180.055 > rows=1173048 loops=1) > Index Cond: ((log_date >= '2015-01-18 > 01:45:24'::timestamp without time zone) AND (log_date <= '2015-01-19 > 01:45:24'::timestamp without time zone)) > Total runtime: 42971.137 ms > > (also on http://explain.depesz.com/s/KpaB) > > > Any good ideas on how I could speed this up a bit? > > I have already tried to throw quite a bunch of memory at the problem: > > shared_buffers = 64GB > work_mem = 16GB > > but it didn't improve between this and the 32GB shared/ 2GB work GB I had > before. > > This is on Postgres 9.1.15 on Linux. > > Thanks > Christian > > > -- > Sent via pgsql-general mailing list (pgsql-general@postgresql.org) > To make changes to your subscription: > http://www.postgresql.org/mailpref/pgsql-general >