I've got a table that stores some multilingual content:

    CREATE TABLE search (
      content text NOT NULL,
      language regconfig NOT NULL,
      fulltext tsvector
    );
    CREATE INDEX search_fulltext ON search USING GIN(fulltext);

    INSERT INTO search (language, content) VALUES  
      ('dutch', 'Als achter vliegen vliegen vliegen vliegen vliegen vliegen 
achterna'),
      ('dutch', 'Langs de koele kali liep een kale koeli met een kilo kali op 
zijn kale koeli-kop.'),
      ('dutch', 'Moeder sneed zeven scheve sneden brood'),
      ('english', 'I saw Susie sitting in a shoe shine shop. Where she sits she 
shines, and where she shines she sits.'),
      ('english', 'How can a clam cram in a clean cream can?'),
      ('english', 'Can you can a can as a canner can can a can?');

    UPDATE search SET fulltext = to_tsvector(language, content);

To make sure I always search in the correct language I use these queries:

    SELECT FROM search WHERE fulltext @@ to_tsquery(language, 'shine');
    (1 row)

    SELECT FROM search WHERE fulltext @@ to_tsquery(language, 'vlieg');
    (1 row)

Because hardcoding the language doesn't give the correct results:

    SELECT FROM search WHERE fulltext @@ to_tsquery('dutch', 'shine');
    (0 rows)

    SELECT FROM search WHERE fulltext @@ to_tsquery('english', 'vlieg');
    (0 rows)

The problem however is that PostgreSQL doesn't use the GIN index when using the 
first set of queries and instead does a sequential scan:

(Note: I've disabled scanning using SET enable_seqscan = OFF; for these 
examples because of the low amount of rows)

    EXPLAIN ANALYZE SELECT * FROM search WHERE fulltext @@ to_tsquery(language, 
'shine’);

    Seq Scan on search  (cost=0.00..17.35 rows=2 width=136) (actual 
time=0.040..0.044 rows=1 loops=1)
        Filter: (fulltext @@ to_tsquery(language, 'shine'::text))
        Rows Removed by Filter: 5
    Planning time: 0.039 ms
    Execution time: 0.064 ms

While it does when hardcoding a language:

    EXPLAIN ANALYZE SELECT FROM search WHERE fulltext @@ to_tsquery('dutch', 
'vlieg');
    Bitmap Heap Scan on search  (cost=12.63..23.66 rows=82 width=0) (actual 
time=0.044..0.044 rows=1 loops=1)
      Recheck Cond: (fulltext @@ '''vlieg'''::tsquery)
      Heap Blocks: exact=1
        ->  Bitmap Index Scan on search_fulltext  (cost=0.00..12.61 rows=82 
width=0) (actual time=0.037..0.037 rows=1 loops=1)
          Index Cond: (fulltext @@ '''vlieg'''::tsquery)
    Planning time: 0.128 ms
    Execution time: 0.065 ms

So my question is: Is it at all possible to use a column in the ts_query to use 
the correct language config and still have Postgres use the GIN index?

I’ve tried this on both PostgreSQL 9.4 and 9.5.

Obviously the real table has a lot more rows, so here's the execution plan from 
the real table:

Using a column for language config:

    Seq Scan on search  (cost=0.00..8727.25 rows=188 width=0) (actual 
time=0.725..352.307 rows=1689 loops=1)
      Filter: (fulltext @@ to_tsquery(language_config, 'example'::text))
      Rows Removed by Filter: 35928
    Planning time: 0.053 ms
    Execution time: 352.915 ms

When hardcoding the language:

    Bitmap Heap Scan on search  (cost=28.65..4088.92 rows=1633 width=0) (actual 
time=0.514..10.475 rows=1684 loops=1)
      Recheck Cond: (fulltext @@ '''exampl'''::tsquery)
      Heap Blocks: exact=1522  
        ->  Bitmap Index Scan on search_fulltext  (cost=0.00..28.24 rows=1633 
width=0) (actual time=0.333..0.333 rows=1684 loops=1)        
          Index Cond: (fulltext @@ '''exampl'''::tsquery)
    Planning time: 0.180 ms
    Execution time: 10.564 ms

Note: I previously asked this on stackexhange 
(http://dba.stackexchange.com/questions/149765/postgresql-gin-index-not-used-when-ts-query-language-is-fetched-from-a-column)
 but I’m assuming there are more knowledgeable people on this mailing list ;-)

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

Reply via email to