> On 15 Sep 2016, at 14:46, Jaap Roes <jr...@leukeleu.nl> wrote:
> 
> I've got a table that stores some multilingual content:
> 
>    CREATE TABLE search (
>      content text NOT NULL,
>      language regconfig NOT NULL,
>      fulltext tsvector
>    );
>    CREATE INDEX search_fulltext ON search USING GIN(fulltext);
...
> To make sure I always search in the correct language I use these queries:
> 
>    SELECT FROM search WHERE fulltext @@ to_tsquery(language, 'shine');
>    (1 row)
> 
>    SELECT FROM search WHERE fulltext @@ to_tsquery(language, 'vlieg');
>    (1 row)
> 
> Because hardcoding the language doesn't give the correct results:
> 
>    SELECT FROM search WHERE fulltext @@ to_tsquery('dutch', 'shine');
>    (0 rows)
> 
>    SELECT FROM search WHERE fulltext @@ to_tsquery('english', 'vlieg');
>    (0 rows)
> 
> The problem however is that PostgreSQL doesn't use the GIN index when using 
> the first set of queries and instead does a sequential scan:
...
>    EXPLAIN ANALYZE SELECT * FROM search WHERE fulltext @@ 
> to_tsquery(language, 'shine’);
> 
>    Seq Scan on search  (cost=0.00..17.35 rows=2 width=136) (actual 
> time=0.040..0.044 rows=1 loops=1)
>        Filter: (fulltext @@ to_tsquery(language, 'shine'::text))
>        Rows Removed by Filter: 5
>    Planning time: 0.039 ms
>    Execution time: 0.064 ms
> 
> While it does when hardcoding a language:
> 
>    EXPLAIN ANALYZE SELECT FROM search WHERE fulltext @@ to_tsquery('dutch', 
> 'vlieg');
>    Bitmap Heap Scan on search  (cost=12.63..23.66 rows=82 width=0) (actual 
> time=0.044..0.044 rows=1 loops=1)
>      Recheck Cond: (fulltext @@ '''vlieg'''::tsquery)
>      Heap Blocks: exact=1
>        ->  Bitmap Index Scan on search_fulltext  (cost=0.00..12.61 rows=82 
> width=0) (actual time=0.037..0.037 rows=1 loops=1)
>          Index Cond: (fulltext @@ '''vlieg'''::tsquery)
>    Planning time: 0.128 ms
>    Execution time: 0.065 ms
> 
> So my question is: Is it at all possible to use a column in the ts_query to 
> use the correct language config and still have Postgres use the GIN index?

So I came up with a solution. Pre-localise the query in a join and use that to 
filter the matches:

    SELECT * FROM search s 
    INNER JOIN (
        SELECT 'dutch'::regconfig AS language, to_tsquery('dutch', 'shine') as q
        UNION SELECT 'english'::regconfig AS language, to_tsquery('english', 
'shine') as q
        UNION SELECT 'simple'::regconfig AS language, to_tsquery('simple', 
'shine') as q
    ) q ON (s.language=q.language) 
    WHERE fulltext @@ q;

This seems to work, but the query plan looks a bit confusing, so I’m not super 
confident about the correctness:

    Nested Loop  (cost=205.44..1327.12 rows=188 width=1590) (actual 
time=3.350..7.010 rows=16 loops=1)
      ->  Unique  (cost=0.08..0.11 rows=3 width=0) (actual time=0.010..0.021 
rows=3 loops=1)
            ->  Sort  (cost=0.08..0.09 rows=3 width=0) (actual 
time=0.008..0.011 rows=3 loops=1)
                  Sort Key: ('dutch'::regconfig), ('''vlieg'''::tsquery)
                  Sort Method: quicksort  Memory: 25kB
                  ->  Append  (cost=0.00..0.06 rows=3 width=0) (actual 
time=0.001..0.002 rows=3 loops=1)
                        ->  Result  (cost=0.00..0.01 rows=1 width=0) (actual 
time=0.001..0.001 rows=1 loops=1)
                        ->  Result  (cost=0.00..0.01 rows=1 width=0) (actual 
time=0.001..0.001 rows=1 loops=1)
                        ->  Result  (cost=0.00..0.01 rows=1 width=0) (actual 
time=0.000..0.000 rows=1 loops=1)
      ->  Bitmap Heap Scan on search s  (cost=205.36..441.70 rows=63 
width=1554) (actual time=2.310..2.318 rows=5 loops=3)
            Recheck Cond: ((fulltext @@ ('''vlieg'''::tsquery)) AND 
((language)::oid = (('dutch'::regconfig))::oid))
            Heap Blocks: exact=16
            ->  BitmapAnd  (cost=205.36..205.36 rows=63 width=0) (actual 
time=2.303..2.303 rows=0 loops=3)
                  ->  Bitmap Index Scan on search_fulltext  (cost=0.00..17.41 
rows=188 width=0) (actual time=0.018..0.018 rows=16 loops=3)
                        Index Cond: (fulltext @@ ('''vlieg'''::tsquery))
                  ->  Bitmap Index Scan on search_language  (cost=0.00..187.67 
rows=12539 width=0) (actual time=2.277..2.277 rows=12539 loops=3)
                        Index Cond: ((language_config)::oid = 
(('dutch'::regconfig))::oid)
    Planning time: 0.228 ms
    Execution time: 7.058 ms

Is this the way to go? Or is there a better way, I’m eager to find out!



-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

Reply via email to