Re: Postgresql RDS DB Latency Chossing Hash join Plan

github kran Thu, 14 Feb 2019 14:11:53 -0800

On Thu, Feb 14, 2019 at 12:43 PM Michael Lewis <mle...@entrata.com> wrote:


> How many total rows in these tables? I am assuming these are partitions
> and those 100 million rows got distributed. If the data difference is
> significant, then you aren't guaranteed similar performance. You may want
> to follow more of the suggested steps on.
>
> https://wiki.postgresql.org/wiki/SlowQueryQuestions
>
>
> *Michael Lewis  |  Software Engineer*
> *Entrata*
> *c: **619.370.8697 <619-370-8697>*
>
> Michael - Yes correct the data of 100 million rows is distributed to all
the partitions.
FInally I feel we have come to conclusion after we changed the
random_page_cost from 4 to 2 in Non prod instance and we see improvements
in the query which use to take from 1 minute to 1 -2 seconds.
That's correct we have around 490 million rows in few of our partition
tables. The partition tables are created for every 7 days.

We ran our API test which hits the backend database Aurora RDS PostgreSQL
and see our query response times , requests/sec are better than before. Do
you recommend this setting on a Production instance? Right now we are
planning to go
implement this option of random_page_cost to 2. ( We know it also works if
we turn off the hash_join but we dont want to implement this change but
rather use random_page_cost to 2).

Questions.
1) What is your recommendation on this ? Can we modify this change on Prod
instance which is performing better today or only keep this change to Non
prod instance ?. ( Actually we want to implement this change on Non Prod
instance)
2) Are there are any downside in changing this value ?.
3) Also another question we have a 64TB storage and running on r4 16 x
large EC2 instance. Few of our partitions are bigger and few of them are
smaller. We have data from last  9 months and we are planning to keep the
data for about
close to 1 year till May. Do you see any concerns on this ?. Eventually we
are thinking to archive this data in next 2 months by dropping of older
partitions.
3) What could be the problems of keeping the data longer if there is a
regular maintenance like  VACUUM and other maintenace activities



>
> On Thu, Feb 14, 2019 at 8:48 AM github kran <githubk...@gmail.com> wrote:
>
>>
>>
>> On Wed, Feb 13, 2019 at 11:38 AM Michael Lewis <mle...@entrata.com>
>> wrote:
>>
>>> I didn't see your email yesterday, sorry about that. Index scans instead
>>> of sequential scans and nested loop instead of hash join means that you
>>> have bad row count estimates on "Non prod Aurora RDS instance" as far as I
>>> can figure. Have you run commands like-
>>>
>>> analyze asset_info_2019_2_part4;
>>> analyze asset_info_2019_2_part2;
>>>
>>> etc? If data are very similar, indexes all exist, and
>>> default_statistics_target are the same, then you should be getting the same
>>> plans.
>>>
>>>
>>> *Michael Lewis  |  Software Engineer*
>>> *Entrata*
>>> *c: **619.370.8697 <619-370-8697>*
>>>
>>> 1)  Yes did the VACUUM for all the tables like asset_info_2019_2_part1,
>> part2 , part4 and also for location_info_xxx  to remove any dead tuples and
>> also rebuilt the indexes..
>> 2)   REINDEX table  location_data_2018_12_part4;( Like wise ran for all
>> the tables and also VACUUM for all the tables).
>> 3)  The data in Non prod instance is more.  One thing to mention here
>> when we built the Non prod instance we copied SNAPSHOT from Prod instance
>> and on top of that inserted data about 100 million rows and then did VACUUM
>> and re-indexed the tables.
>>
>> I cant think of anything we can do here but let us know if you need any
>> more details on this problem. Iam happy to share more details.
>>
>>
>>>
>>> On Wed, Feb 13, 2019 at 8:49 AM github kran <githubk...@gmail.com>
>>> wrote:
>>>
>>>>
>>>>
>>>> On Tue, Feb 12, 2019 at 12:55 PM github kran <githubk...@gmail.com>
>>>> wrote:
>>>>
>>>>>
>>>>>
>>>>> On Tue, Feb 12, 2019 at 12:33 PM Michael Lewis <mle...@entrata.com>
>>>>> wrote:
>>>>>
>>>>>> Did you update the stats by running ANALYZE on the tables involved,
>>>>>> or perhaps the entire database on the 'Non prod Aurora RDS instance'? Can
>>>>>> you share the two execution plans?
>>>>>>
>>>>>> *Michael Lewis  |  Software Engineer*
>>>>>> *Entrata*
>>>>>> *c: **619.370.8697 <619-370-8697>*
>>>>>>
>>>>>
>>>>> Here is the plan for both of the DB instances.
>>>>>
>>>>
>>>> Michael - Did you get any chance to look at this issue. Also we see
>>>> there is a sequential scan being done instead of index scan.
>>>>
>>>>>
>>>>>>
>>>>>> On Tue, Feb 12, 2019 at 11:27 AM github kran <githubk...@gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On Tue, Feb 12, 2019 at 7:07 AM github kran <githubk...@gmail.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> On Mon, Feb 11, 2019 at 6:00 PM github kran <githubk...@gmail.com>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Mon, Feb 11, 2019 at 3:29 PM Michael Lewis <mle...@entrata.com>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>> Are default statistics target the same on both prod and AWS? Have
>>>>>>>>>> you analyzed all tables being used in this query to ensure stats are 
>>>>>>>>>> up
>>>>>>>>>> proper? If the optimizer is choosing a different plan, then the 
>>>>>>>>>> stats must
>>>>>>>>>> be different IMO.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> *Michael Lewis |  Software Engineer*
>>>>>>>>>> *Entrata*
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Thanks for your reply  I have verified few of the tables and their
>>>>>>>>> default statistics target and they seem to be same but is there 
>>>>>>>>> anything in
>>>>>>>>> particular you want me to look at it to differentiate Prod and Non 
>>>>>>>>> prod
>>>>>>>>> databases ?. ( Also the DB instance size is same but there is little
>>>>>>>>> more data in the Non prod Aurora RDS instance compared to Prod
>>>>>>>>> instance).
>>>>>>>>>
>>>>>>>>> Query used.
>>>>>>>>>>    select * from pg_stats where tablename = 'tableName'
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Mon, Feb 11, 2019 at 2:15 PM github kran <githubk...@gmail.com>
>>>>>>>>>> wrote:
>>>>>>>>>>
>>>>>>>>>>> Hi Postgres Team,
>>>>>>>>>>> We are using AWS Aurora RDS Postgres DB 9.6.8 Engine. We
>>>>>>>>>>> recently deleted few million rows from the database and ran into a 
>>>>>>>>>>> issue in
>>>>>>>>>>> one of our dev account where the
>>>>>>>>>>> DB was not normal after this deletion. We did re index,
>>>>>>>>>>> vacuuming entire database but we couldnt bring it to the same state 
>>>>>>>>>>> as
>>>>>>>>>>> earlier. So next steps we deleted the database and
>>>>>>>>>>> recreated the database by copying the snapshot from a production
>>>>>>>>>>> instance. Further did vacumming, re-index on the database.
>>>>>>>>>>>
>>>>>>>>>>> After this now the dev database seems to be in a better state
>>>>>>>>>>> than earlier but we are seeing few of our DB calls are taking more 
>>>>>>>>>>> than 1
>>>>>>>>>>> minute when we are fetching data and we observed
>>>>>>>>>>> this is because the query plan was executing a hash join as part
>>>>>>>>>>> of the query whereas a similar query on prod instance is not doing 
>>>>>>>>>>> any hash
>>>>>>>>>>> join and is returning faster.
>>>>>>>>>>>
>>>>>>>>>>> Also we did not want to experiment by modifing the DB settings
>>>>>>>>>>> by doing enable_hash_join to off or random_page_count to 1 as we 
>>>>>>>>>>> dont have
>>>>>>>>>>> these settings in Prod instance.
>>>>>>>>>>>
>>>>>>>>>>> Note:
>>>>>>>>>>> The partition table sizes we have here is between 40 GB to 75 GB
>>>>>>>>>>> and this is our normal size range, we have a new partition table 
>>>>>>>>>>> for every
>>>>>>>>>>> 7 days.
>>>>>>>>>>>
>>>>>>>>>>> Appreciate your ideas on what we could be missing and what we
>>>>>>>>>>> can correct here to reduce the query latency.
>>>>>>>>>>>
>>>>>>>>>>> Thanks
>>>>>>>>>>> githubKran
>>>>>>>>>>>
>>>>>>>>>>

Re: Postgresql RDS DB Latency Chossing Hash join Plan

Reply via email to