Re: [GENERAL] grep -f keyword data query

Hiroyuki Sato Sun, 27 Dec 2015 18:02:38 -0800

Hello Tom.

Thank you for replying.


I re-created index with pg_trgm.
Execution time is 210sec.
Yes It is faster than btree index. But still slow.
It is possible to improve this query speed?
Should I use another query or idex?

Best regards.

1, query

    SELECT
      u.url
    FROM
      url_lists4 u,
      keywords4 k
    WHERE
      u.url like k.url
    AND
      k.name = 'esc_url';


2, create table (gist grgm_ops)

    drop table if exists url_lists4;
    create table url_lists4 (
      id int not null primary key,
      url text not null
    );
    --create index ix_url_url_lists4 on url_lists4(url);
    --
    -- change index
    --
    create index ix_url_url_lists4 on url_lists4 using gist(url
gist_trgm_ops);

    drop table if exists keywords4;
    create table keywords4 (
      id int not null primary key,
      name varchar(40) not null,
      url text not null
    );

    create index ix_url_keywords4 on keywords4(url);
    create index ix_name_keywords4 on keywords4(name);


    \copy url_lists4(id,url) from 'sample.txt' with delimiter ',';
    \copy keywords4(id,name,url) from 'keyword.txt' with delimiter ',';

    vacuum url_lists4;
    vacuum keywords4;
    analyze url_lists4;
    analyze keywords4;

3, EXPLAIN

                                               QUERY PLAN


-------------------------------------------------------------------------------------------------
     Nested Loop  (cost=0.41..432684.50 rows=12500000 width=57)
       ->  Seq Scan on keywords4 k  (cost=0.00..104.50 rows=5000 width=28)
             Filter: ((name)::text = 'esc_url'::text)
       ->  Index Scan using ix_url_url_lists4 on url_lists4 u
 (cost=0.41..61.52 rows=2500 width=57)
             Index Cond: (url ~~ k.url)
    (5 rows)

4, EXPLAIN ANALYZE

 QUERY PLAN


------------------------------------------------------------------------------------------------------------------------------------------------
     Nested Loop  (cost=0.41..432684.50 rows=12500000 width=57) (actual
time=1048.428..210641.655 rows=4850 loops=1)
       ->  Seq Scan on keywords4 k  (cost=0.00..104.50 rows=5000 width=28)
(actual time=0.033..28.878 rows=5000 loops=1)
             Filter: ((name)::text = 'esc_url'::text)
       ->  Index Scan using ix_url_url_lists4 on url_lists4 u
 (cost=0.41..61.52 rows=2500 width=57) (actual time=41.426..42.115 rows=1
loops=5000)
             Index Cond: (url ~~ k.url)
             Rows Removed by Index Recheck: 0
     Planning time: 0.138 ms
     Execution time: 210642.896 ms
    (8 rows)


2015年12月28日(月) 9:36 Tom Lane <t...@sss.pgh.pa.us>:

> Hiroyuki Sato <hiroys...@gmail.com> writes:
> > Sorry, I re-created my questions. I was mis-pasted query log on previous
> > question.
> > (@~ operator is PGroonga extension (http://pgroonga.github.io))
> > [ "like" is actually the operator of interest ]
>
> Ah.  You might get some good results with trigram indexes (ie,
> contrib/pg_trgm and a GIST or GIN index).  See
>
> http://www.postgresql.org/docs/9.4/static/pgtrgm.html
>
>                         regards, tom lane
>

Re: [GENERAL] grep -f keyword data query

Reply via email to