On Thu, Feb 14, 2019 at 12:43 PM Michael Lewis <mle...@entrata.com> wrote:
> How many total rows in these tables? I am assuming these are partitions > and those 100 million rows got distributed. If the data difference is > significant, then you aren't guaranteed similar performance. You may want > to follow more of the suggested steps on. > > https://wiki.postgresql.org/wiki/SlowQueryQuestions > > > *Michael Lewis | Software Engineer* > *Entrata* > *c: **619.370.8697 <619-370-8697>* > > Michael - Yes correct the data of 100 million rows is distributed to all the partitions. FInally I feel we have come to conclusion after we changed the random_page_cost from 4 to 2 in Non prod instance and we see improvements in the query which use to take from 1 minute to 1 -2 seconds. That's correct we have around 490 million rows in few of our partition tables. The partition tables are created for every 7 days. We ran our API test which hits the backend database Aurora RDS PostgreSQL and see our query response times , requests/sec are better than before. Do you recommend this setting on a Production instance? Right now we are planning to go implement this option of random_page_cost to 2. ( We know it also works if we turn off the hash_join but we dont want to implement this change but rather use random_page_cost to 2). Questions. 1) What is your recommendation on this ? Can we modify this change on Prod instance which is performing better today or only keep this change to Non prod instance ?. ( Actually we want to implement this change on Non Prod instance) 2) Are there are any downside in changing this value ?. 3) Also another question we have a 64TB storage and running on r4 16 x large EC2 instance. Few of our partitions are bigger and few of them are smaller. We have data from last 9 months and we are planning to keep the data for about close to 1 year till May. Do you see any concerns on this ?. Eventually we are thinking to archive this data in next 2 months by dropping of older partitions. 3) What could be the problems of keeping the data longer if there is a regular maintenance like VACUUM and other maintenace activities > > On Thu, Feb 14, 2019 at 8:48 AM github kran <githubk...@gmail.com> wrote: > >> >> >> On Wed, Feb 13, 2019 at 11:38 AM Michael Lewis <mle...@entrata.com> >> wrote: >> >>> I didn't see your email yesterday, sorry about that. Index scans instead >>> of sequential scans and nested loop instead of hash join means that you >>> have bad row count estimates on "Non prod Aurora RDS instance" as far as I >>> can figure. Have you run commands like- >>> >>> analyze asset_info_2019_2_part4; >>> analyze asset_info_2019_2_part2; >>> >>> etc? If data are very similar, indexes all exist, and >>> default_statistics_target are the same, then you should be getting the same >>> plans. >>> >>> >>> *Michael Lewis | Software Engineer* >>> *Entrata* >>> *c: **619.370.8697 <619-370-8697>* >>> >>> 1) Yes did the VACUUM for all the tables like asset_info_2019_2_part1, >> part2 , part4 and also for location_info_xxx to remove any dead tuples and >> also rebuilt the indexes.. >> 2) REINDEX table location_data_2018_12_part4;( Like wise ran for all >> the tables and also VACUUM for all the tables). >> 3) The data in Non prod instance is more. One thing to mention here >> when we built the Non prod instance we copied SNAPSHOT from Prod instance >> and on top of that inserted data about 100 million rows and then did VACUUM >> and re-indexed the tables. >> >> I cant think of anything we can do here but let us know if you need any >> more details on this problem. Iam happy to share more details. >> >> >>> >>> On Wed, Feb 13, 2019 at 8:49 AM github kran <githubk...@gmail.com> >>> wrote: >>> >>>> >>>> >>>> On Tue, Feb 12, 2019 at 12:55 PM github kran <githubk...@gmail.com> >>>> wrote: >>>> >>>>> >>>>> >>>>> On Tue, Feb 12, 2019 at 12:33 PM Michael Lewis <mle...@entrata.com> >>>>> wrote: >>>>> >>>>>> Did you update the stats by running ANALYZE on the tables involved, >>>>>> or perhaps the entire database on the 'Non prod Aurora RDS instance'? Can >>>>>> you share the two execution plans? >>>>>> >>>>>> *Michael Lewis | Software Engineer* >>>>>> *Entrata* >>>>>> *c: **619.370.8697 <619-370-8697>* >>>>>> >>>>> >>>>> Here is the plan for both of the DB instances. >>>>> >>>> >>>> Michael - Did you get any chance to look at this issue. Also we see >>>> there is a sequential scan being done instead of index scan. >>>> >>>>> >>>>>> >>>>>> On Tue, Feb 12, 2019 at 11:27 AM github kran <githubk...@gmail.com> >>>>>> wrote: >>>>>> >>>>>>> >>>>>>> >>>>>>> On Tue, Feb 12, 2019 at 7:07 AM github kran <githubk...@gmail.com> >>>>>>> wrote: >>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> On Mon, Feb 11, 2019 at 6:00 PM github kran <githubk...@gmail.com> >>>>>>>> wrote: >>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> On Mon, Feb 11, 2019 at 3:29 PM Michael Lewis <mle...@entrata.com> >>>>>>>>> wrote: >>>>>>>>> >>>>>>>>>> Are default statistics target the same on both prod and AWS? Have >>>>>>>>>> you analyzed all tables being used in this query to ensure stats are >>>>>>>>>> up >>>>>>>>>> proper? If the optimizer is choosing a different plan, then the >>>>>>>>>> stats must >>>>>>>>>> be different IMO. >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> *Michael Lewis | Software Engineer* >>>>>>>>>> *Entrata* >>>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> Thanks for your reply I have verified few of the tables and their >>>>>>>>> default statistics target and they seem to be same but is there >>>>>>>>> anything in >>>>>>>>> particular you want me to look at it to differentiate Prod and Non >>>>>>>>> prod >>>>>>>>> databases ?. ( Also the DB instance size is same but there is little >>>>>>>>> more data in the Non prod Aurora RDS instance compared to Prod >>>>>>>>> instance). >>>>>>>>> >>>>>>>>> Query used. >>>>>>>>>> select * from pg_stats where tablename = 'tableName' >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> On Mon, Feb 11, 2019 at 2:15 PM github kran <githubk...@gmail.com> >>>>>>>>>> wrote: >>>>>>>>>> >>>>>>>>>>> Hi Postgres Team, >>>>>>>>>>> We are using AWS Aurora RDS Postgres DB 9.6.8 Engine. We >>>>>>>>>>> recently deleted few million rows from the database and ran into a >>>>>>>>>>> issue in >>>>>>>>>>> one of our dev account where the >>>>>>>>>>> DB was not normal after this deletion. We did re index, >>>>>>>>>>> vacuuming entire database but we couldnt bring it to the same state >>>>>>>>>>> as >>>>>>>>>>> earlier. So next steps we deleted the database and >>>>>>>>>>> recreated the database by copying the snapshot from a production >>>>>>>>>>> instance. Further did vacumming, re-index on the database. >>>>>>>>>>> >>>>>>>>>>> After this now the dev database seems to be in a better state >>>>>>>>>>> than earlier but we are seeing few of our DB calls are taking more >>>>>>>>>>> than 1 >>>>>>>>>>> minute when we are fetching data and we observed >>>>>>>>>>> this is because the query plan was executing a hash join as part >>>>>>>>>>> of the query whereas a similar query on prod instance is not doing >>>>>>>>>>> any hash >>>>>>>>>>> join and is returning faster. >>>>>>>>>>> >>>>>>>>>>> Also we did not want to experiment by modifing the DB settings >>>>>>>>>>> by doing enable_hash_join to off or random_page_count to 1 as we >>>>>>>>>>> dont have >>>>>>>>>>> these settings in Prod instance. >>>>>>>>>>> >>>>>>>>>>> Note: >>>>>>>>>>> The partition table sizes we have here is between 40 GB to 75 GB >>>>>>>>>>> and this is our normal size range, we have a new partition table >>>>>>>>>>> for every >>>>>>>>>>> 7 days. >>>>>>>>>>> >>>>>>>>>>> Appreciate your ideas on what we could be missing and what we >>>>>>>>>>> can correct here to reduce the query latency. >>>>>>>>>>> >>>>>>>>>>> Thanks >>>>>>>>>>> githubKran >>>>>>>>>>> >>>>>>>>>>