[PERFORM] updating statistics on slow running query

2014-11-10 Thread Eric Ramirez
Hi,
I have created a sample database with test data to help benchmark our
application. The database has ten million records and is running on a
dedicated server (Postgres 9.3) with 8GB of RAM. Our queries are pretty
slow with this amount of data, and it is my job to get them to run at an
acceptable speed. The first thing I noticed was that the planner's row
estimates are off by a large number of records (millions). I have updated
the statistics target, but it didn't seem to make a difference. The relevant
output follows.
Am I looking in the wrong place? Is there something else I should be trying?
Thanks in advance for your comments/suggestions,
Eric.


=# show work_mem;
 work_mem
----------
 1GB
(1 row)

=# show effective_cache_size;
 effective_cache_size
----------------------
 5GB
(1 row)

=# ALTER TABLE TAR_MVW_TARGETING_RECORD ALTER COLUMN
household_member_first_name SET STATISTICS 5000;
=# vacuum analyse TAR_MVW_TARGETING_RECORD;
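One way to confirm the raised statistics target actually changed what the planner sees is to look at pg_stats after the ANALYZE. This is just a sketch; it assumes the table lives in the default public schema:

```sql
-- Inspect the gathered statistics for the column after ANALYZE.
-- n_distinct and null_frac feed directly into the planner's row estimates.
SELECT attname, n_distinct, null_frac
  FROM pg_stats
 WHERE schemaname = 'public'
   AND tablename  = 'tar_mvw_targeting_record'
   AND attname    = 'household_member_first_name';
```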

=# \d tar_mvw_targeting_record;
          Table "public.tar_mvw_targeting_record"
           Column            |         Type          | Modifiers
-----------------------------+-----------------------+-----------
 household_member_id         | bigint                |
 form_id                     | bigint                |
 status                      | character varying(64) |
 gender                      | character varying(64) |
 household_member_first_name | character varying(64) |
 household_member_last_name  | character varying(64) |

Indexes:
    "tar_mvw_targeting_record_form_id_household_member_id_idx" UNIQUE, btree (form_id, household_member_id)
    "tar_mvw_targeting_record_lower_idx" gist (lower(household_member_first_name::text) extensions.gist_trgm_ops) WHERE status::text <> 'ANULLED'::text
    "tar_mvw_targeting_record_lower_idx1" gist (lower(household_member_last_name::text) extensions.gist_trgm_ops) WHERE status::text <> 'ANULLED'::text


=# explain (analyse on,buffers on)select T.form_id from
TAR_MVW_targeting_record AS T where T.status NOT IN ('ANULLED')  AND
LOWER(T.household_member_last_name) LIKE LOWER('%tu%') AND
T.gender='FEMALE' group by T.form_id;

                                                                QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------
 HashAggregate  (cost=450994.35..452834.96 rows=184061 width=8) (actual time=11932.959..12061.206 rows=442453 loops=1)
   Buffers: shared hit=307404 read=109743
   ->  Bitmap Heap Scan on tar_mvw_targeting_record t  (cost=110866.33..448495.37 rows=999592 width=8) (actual time=3577.301..11629.132 rows=500373 loops=1)
         Recheck Cond: ((lower((household_member_last_name)::text) ~~ '%tu%'::text) AND ((status)::text <> 'ANULLED'::text))
         Rows Removed by Index Recheck: 979
         Filter: ((gender)::text = 'FEMALE'::text)
         Rows Removed by Filter: 499560
         Buffers: shared hit=307404 read=109743
         ->  Bitmap Index Scan on tar_mvw_targeting_record_lower_idx1  (cost=0.00..110616.43 rows=202 width=0) (actual time=3471.142..3471.142 rows=1012 loops=1)
               Index Cond: (lower((household_member_last_name)::text) ~~ '%tu%'::text)
               Buffers: shared hit=36583 read=82935
 Total runtime: 12092.059 ms
(12 rows)

Time: 12093.107 ms

p.s. This query was run three times; the first run took 74 seconds.


Re: [PERFORM] updating statistics on slow running query

2014-11-10 Thread Eric Ramirez
Hi Matteo,
Thanks for your suggestions. I just ran some tests with ILIKE and LIKE, and
ILIKE is consistently slower, so I think I will keep the LOWER() calls.
As per your suggestion, I have switched the indexes to GIN; they seem to
build and read a bit faster, but the Recheck step still shows up in the
query plan. I have removed the gender column from the query since it is not
relevant to my tests. With all this playing around, the stats now look a
bit more accurate.
The query went down to 9 seconds; ideally I would like it to execute in
2 seconds. Any thoughts on what else I could try?
Thanks again,
Eric
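For reference, the GiST-to-GIN switch described above would look roughly like this. The index name matches the one appearing in the plan below, and the operator class mirrors the original GiST definitions; both are assumptions, as is the "extensions" schema for pg_trgm:

```sql
-- Hypothetical GIN replacement for the GiST trigram index on last name.
-- Assumes pg_trgm is installed in the "extensions" schema, matching the
-- original index definitions; the exact name used is illustrative.
DROP INDEX IF EXISTS tar_mvw_targeting_record_lower_idx1;
CREATE INDEX tar_mvw_targeting_record_lower_idx4
    ON tar_mvw_targeting_record
 USING gin (lower(household_member_last_name::text) extensions.gin_trgm_ops)
 WHERE status::text <> 'ANULLED';
```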

=# explain (analyse on,buffers on)select T.form_id from
TAR_MVW_targeting_record AS T where T.status NOT IN ('ANULLED')  AND
LOWER(T.household_member_last_name) LIKE LOWER('%tu%')  group by T.form_id;

                                                                QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------
 HashAggregate  (cost=557677.27..561360.83 rows=368356 width=8) (actual time=10172.672..10410.068 rows=786669 loops=1)
   Buffers: shared hit=304998
   ->  Bitmap Heap Scan on tar_mvw_targeting_record t  (cost=80048.06..552677.27 rows=202 width=8) (actual time=2481.418..9564.280 rows=33 loops=1)
         Recheck Cond: ((status)::text <> 'ANULLED'::text)
         Filter: (lower((household_member_last_name)::text) ~~ '%tu%'::text)
         Rows Removed by Filter: 979
         Buffers: shared hit=304998
         ->  Bitmap Index Scan on tar_mvw_targeting_record_lower_idx4  (cost=0.00..79548.06 rows=1012 width=0) (actual time=2375.399..2375.399 rows=1012 loops=1)
               Buffers: shared hit=7369
 Total runtime: 10475.240 ms




On Mon, Nov 10, 2014 at 1:57 PM, desmodemone  wrote:

>
>
> 2014-11-10 18:43 GMT+01:00 Eric Ramirez :
>
>>
>> Hi,
>> I have created a sample database with test data to help benchmark our
>> application. The database has ten million records, and is running on a
>> dedicated server(postgres 9.3) with 8GB of RAM.  Our queries are pretty
>> slow with this amount of data and is my job to get them to run to at
>> acceptable speed. First thing that I notice was that the planner's row
>> estimates are off by a large number or records (millions) I have updated
>> the statistics target but didn't seem to make a difference. The relevant
>> output follows.
>> Am I looking in the wrong place, something else I should be trying?
>> Thanks in advance for your comments/suggestions,
>> Eric.
>>
>>
>> =# show work_mem;
>>  work_mem
>> --
>>  1GB
>> (1 row)
>> =# show effective_cache_size;
>>  effective_cache_size
>> --
>>  5GB
>> (1 row)
>>
>> =#ALTER TABLE TAR_MVW_TARGETING_RECORD ALTER COLUMN
>> household_member_first_name SET STATISTICS 5000;
>> =# vacuum analyse TAR_MVW_TARGETING_RECORD;
>>
>> =# \d tar_mvw_targeting_record;
>>  Table "public.tar_mvw_targeting_record"
>>Column| Type  | Modifiers
>> -+---+---
>>  household_member_id | bigint|
>>  form_id | bigint|
>>  status  | character varying(64) |
>>  gender  | character varying(64) |
>>  household_member_first_name | character varying(64) |
>>  household_member_last_name  | character varying(64) |
>>
>> Indexes:
>>"tar_mvw_targeting_record_form_id_household_member_id_idx" UNIQUE,
>> btree (form_id, household_member_id)
>>  "tar_mvw_targeting_record_lower_idx" gist
>> (lower(household_member_first_name::text) extensions.gist_trgm_ops)
>>  WHERE status::text <> 'ANULLED'::text
>> "tar_mvw_targeting_record_lower_idx1" gist
>> (lower(household_member_last_name::text) extensions.gist_trgm_ops)
>>  WHERE status::text <> 'ANULLED'::text
>>
>>
>> =# explain (analyse on,buffers on)select T.form_id from
>> TAR_MVW_targeting_record AS T where T.status NOT IN ('ANULLED')  AND
>> LOWER(T.household_member_last_name) LIKE LOWER('%tu%') AND
>> T.gender='FEMALE' group by T.form_id;
>>
>> QUERY PLAN
>>
>>
>> ---
>> ---
>>  HashAggregate  (cost=450994.35..452834.96 rows=1840