Thanks for the feedback.

On Tue, Feb 19, 2019 at 11:12 AM Michael Lewis <mle...@entrata.com> wrote:
> 1) You can increase it as much as you want, but (auto)analyze will take longer to examine the values of default_statistics_target * 300 rows and compute the most common values and the frequencies of those values. How much variation does your data actually have? If your data only has 50 distinct values with a fairly even distribution, then there is no need to increase it from 100 at all. Also, deciding on the best query plan will take a little more time for the optimizer, since it will be examining a bigger collection of stats on the tables you have increased.
>
> 2) I am not aware of any.
>
> 3) I am not aware of anything about your application, so I can't recommend a number outright, but 4MB for work_mem definitely seems low to me, assuming you have 16GB or more memory available, unless you have very high concurrency. It will depend on how many sorts per statement, how many users, etc. If you spill over to disk on routine operations, then things are definitely going to be much slower than if you are able to keep things in memory. You could try running EXPLAIN ANALYZE and verify that you are keeping things in memory. You could also turn on automatic gathering of EXPLAIN ANALYZE plans on live if you have the room for logging and can tolerate a little latency.
>
> https://wiki.postgresql.org/wiki/Tuning_Your_PostgreSQL_Server
> https://www.postgresql.org/docs/current/runtime-config-resource.html#GUC-WORK-MEM
>
> I'm glad your performance is improved in the AWS/dev environment. It can be a big hassle to test things in an environment that performs significantly differently.
>
> Michael Lewis
>
> On Sun, Feb 17, 2019 at 10:01 AM github kran <githubk...@gmail.com> wrote:
>
>> On Thu, Feb 14, 2019 at 4:58 PM Michael Lewis <mle...@entrata.com> wrote:
>>
>>> This is beyond my expertise, except to say that if your storage is SSDs in AWS, then you definitely want random_page_cost close to the same as seq_page_cost (1 by default), assuming your data is likely to be in cache, as discussed in the documentation <https://www.postgresql.org/docs/current/runtime-config-query.html#RUNTIME-CONFIG-QUERY-CONSTANTS>. As it says, "Reducing this value relative to seq_page_cost will cause the system to prefer index scans", as you saw. Changing the value on production would again depend on the storage type used and how good the cache hit rate is.
>>>
>>> As far as I know, dropping old partitions should not be significantly impactful to the system other than no longer needing to store that data (cost, time for full backups, etc.).
>>>
>>> Again, as I understand things, there is not a big impact from having old unused tables in terms of maintenance. They should be ignored by normal processes.
>>>
>>> Glad you got your issue resolved.
>>>
>>> Michael Lewis
>>
>> Thanks for the feedback. You have been giving your thoughts and suggestions since the beginning of this case, and it has been helpful. I realized later, based on your suggestion to increase the default statistics target from 100, that I had not set it correctly initially: I had set it at the session level without setting it on the partition tables. As a next step I set the statistics target to 1000 on all of the partition tables we use from Nov 2018 to Feb 2019 and then ran ANALYZE. I am currently running a load test to see how the DB is performing now, and so far it is running better than before. (I have reset all the previous changes done, except the statistics change.) I will keep you posted after the test finishes.
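For reference, a minimal sketch of that change on one partition table (the statistics target is set per column, so asset_id below is only a placeholder for whichever columns the slow queries actually filter or join on):

    -- asset_id is a placeholder column name, not taken from the thread
    ALTER TABLE asset_info_2019_2_part4
        ALTER COLUMN asset_id SET STATISTICS 1000;
    ANALYZE asset_info_2019_2_part4;

    -- confirm which columns now carry a non-default target (-1 means default)
    SELECT attname, attstattarget
    FROM pg_attribute
    WHERE attrelid = 'asset_info_2019_2_part4'::regclass
      AND attstattarget <> -1;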
>> Questions:
>> 1) Can I further increase the setting to 3000 and see how the system behaves? How do I know the best value to use for my database in terms of the sampling limit with the default statistics setting?
>> 2) Apart from analyzing the tables, do I need to make any other changes along with the statistics setting?
>> 3) Also, the current work_mem is set to 4 MB and we haven't played with this value so far. For future needs, can I increase the work_mem setting?
>>
>> Appreciate your reply.
>>
>> Thanks
>>
>>> On Thu, Feb 14, 2019 at 3:11 PM github kran <githubk...@gmail.com> wrote:
>>>
>>>> On Thu, Feb 14, 2019 at 12:43 PM Michael Lewis <mle...@entrata.com> wrote:
>>>>
>>>>> How many total rows in these tables? I am assuming these are partitions and those 100 million rows got distributed. If the data difference is significant, then you aren't guaranteed similar performance. You may want to follow more of the suggested steps on
>>>>> https://wiki.postgresql.org/wiki/SlowQueryQuestions
>>>>>
>>>>> Michael Lewis | Software Engineer | Entrata | c: 619.370.8697
>>>>
>>>> Michael - Yes, correct, the data of 100 million rows is distributed to all the partitions.
>>>> Finally I feel we have come to a conclusion: after we changed random_page_cost from 4 to 2 on the Non prod instance, we see improvements in the query, which used to take 1 minute and now takes 1-2 seconds. That's correct, we have around 490 million rows in a few of our partition tables. The partition tables are created for every 7 days.
>>>>
>>>> We ran our API test, which hits the backend Aurora RDS PostgreSQL database, and our query response times and requests/sec are better than before. Do you recommend this setting on a Production instance? Right now we are planning to implement this option of random_page_cost = 2. (We know it also works if we turn off hash joins, but we don't want to implement that change; we would rather set random_page_cost to 2.)
>>>>
>>>> Questions:
>>>> 1) What is your recommendation on this? Can we make this change on the Prod instance, which is performing better today, or only keep this change on the Non prod instance? (Actually we want to implement this change on the Non Prod instance.)
>>>> 2) Are there any downsides in changing this value?
>>>> 3) Another question: we have 64TB of storage and are running on an r4.16xlarge EC2 instance. A few of our partitions are bigger and a few are smaller. We have data from the last 9 months and we are planning to keep the data for close to 1 year, until May. Do you see any concerns with this? Eventually we are thinking of archiving this data in the next 2 months by dropping the older partitions.
>>>> 4) What could be the problems of keeping the data longer if there is regular maintenance like VACUUM and other maintenance activities?
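As a reference point, a change like this can be tried at the session level before committing to anything persistent (on Aurora/RDS a permanent change would normally go through the DB parameter group rather than ALTER SYSTEM); a minimal sketch:

    SHOW random_page_cost;      -- 4 by default
    SET random_page_cost = 2;   -- affects only the current session
    -- re-run the slow query here with EXPLAIN (ANALYZE, BUFFERS) and compare the plan
    RESET random_page_cost;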
>>>>> On Thu, Feb 14, 2019 at 8:48 AM github kran <githubk...@gmail.com> wrote:
>>>>>
>>>>>> On Wed, Feb 13, 2019 at 11:38 AM Michael Lewis <mle...@entrata.com> wrote:
>>>>>>
>>>>>>> I didn't see your email yesterday, sorry about that. Index scans instead of sequential scans, and nested loop instead of hash join, means that you have bad row count estimates on the "Non prod Aurora RDS instance" as far as I can figure. Have you run commands like
>>>>>>>
>>>>>>> analyze asset_info_2019_2_part4;
>>>>>>> analyze asset_info_2019_2_part2;
>>>>>>>
>>>>>>> etc.? If the data are very similar, the indexes all exist, and default_statistics_target is the same, then you should be getting the same plans.
>>>>>>>
>>>>>>> Michael Lewis | Software Engineer | Entrata | c: 619.370.8697
>>>>>>
>>>>>> 1) Yes, we did VACUUM for all the tables like asset_info_2019_2_part1, part2, part4 and also for location_info_xxx to remove any dead tuples, and also rebuilt the indexes.
>>>>>> 2) REINDEX TABLE location_data_2018_12_part4; (likewise ran it for all the tables, and also VACUUM for all the tables).
>>>>>> 3) The data in the Non prod instance is more. One thing to mention here: when we built the Non prod instance we copied a SNAPSHOT from the Prod instance, inserted about 100 million rows on top of that, and then did VACUUM and re-indexed the tables.
>>>>>>
>>>>>> I can't think of anything else we can do here, but let us know if you need any more details on this problem. I am happy to share more details.
>>>>>>
>>>>>>> On Wed, Feb 13, 2019 at 8:49 AM github kran <githubk...@gmail.com> wrote:
>>>>>>>
>>>>>>>> On Tue, Feb 12, 2019 at 12:55 PM github kran <githubk...@gmail.com> wrote:
>>>>>>>>
>>>>>>>>> On Tue, Feb 12, 2019 at 12:33 PM Michael Lewis <mle...@entrata.com> wrote:
>>>>>>>>>
>>>>>>>>>> Did you update the stats by running ANALYZE on the tables involved, or perhaps the entire database, on the 'Non prod Aurora RDS instance'? Can you share the two execution plans?
>>>>>>>>>>
>>>>>>>>>> Michael Lewis | Software Engineer | Entrata | c: 619.370.8697
>>>>>>>>>
>>>>>>>>> Here is the plan for both of the DB instances.
>>>>>>>>
>>>>>>>> Michael - Did you get any chance to look at this issue? Also, we see that a sequential scan is being done instead of an index scan.
>>>>>>>>
>>>>>>>>>> On Tue, Feb 12, 2019 at 11:27 AM github kran <githubk...@gmail.com> wrote:
>>>>>>>>>>
>>>>>>>>>>> On Tue, Feb 12, 2019 at 7:07 AM github kran <githubk...@gmail.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> On Mon, Feb 11, 2019 at 6:00 PM github kran <githubk...@gmail.com> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> On Mon, Feb 11, 2019 at 3:29 PM Michael Lewis <mle...@entrata.com> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Are the default statistics targets the same on both prod and AWS? Have you analyzed all tables being used in this query to ensure the stats are up to date? If the optimizer is choosing a different plan, then the stats must be different IMO.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Michael Lewis | Software Engineer | Entrata
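A slightly more targeted version of that comparison, run on both instances and then diffed (the table name is taken from the thread; this is only a sketch):

    SHOW default_statistics_target;

    -- n_distinct, null_frac, and the most common values drive the planner's row estimates
    SELECT attname, n_distinct, null_frac, most_common_vals, correlation
    FROM pg_stats
    WHERE tablename = 'asset_info_2019_2_part4'
    ORDER BY attname;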
>>>>>>>>>>>>> Thanks for your reply. I have verified a few of the tables and their default statistics targets, and they seem to be the same, but is there anything in particular you want me to look at to differentiate the Prod and Non prod databases? (Also, the DB instance size is the same, but there is a little more data in the Non prod Aurora RDS instance compared to the Prod instance.)
>>>>>>>>>>>>>
>>>>>>>>>>>>> Query used:
>>>>>>>>>>>>> select * from pg_stats where tablename = 'tableName'
>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Mon, Feb 11, 2019 at 2:15 PM github kran <githubk...@gmail.com> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Hi Postgres Team,
>>>>>>>>>>>>>>> We are using the AWS Aurora RDS Postgres DB 9.6.8 engine. We recently deleted a few million rows from the database and ran into an issue in one of our dev accounts where the DB was not normal after this deletion. We re-indexed and vacuumed the entire database, but we couldn't bring it back to the same state as earlier. So as a next step we deleted the database and recreated it by copying the snapshot from a production instance, and then vacuumed and re-indexed the database.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> After this, the dev database seems to be in a better state than earlier, but we are seeing that a few of our DB calls take more than 1 minute when fetching data, and we observed this is because the query plan was executing a hash join as part of the query, whereas a similar query on the prod instance does not do any hash join and returns faster.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Also, we did not want to experiment by modifying the DB settings, such as setting enable_hashjoin to off or random_page_cost to 1, as we don't have these settings in the Prod instance.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Note:
>>>>>>>>>>>>>>> The partition table sizes we have here are between 40 GB and 75 GB, and this is our normal size range; we have a new partition table for every 7 days.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Appreciate your ideas on what we could be missing and what we can correct here to reduce the query latency.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Thanks
>>>>>>>>>>>>>>> githubKran
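For anyone reading along later, a couple of generic sanity checks after a bulk delete like this (the partition name is taken from the thread; the rest is a sketch, not the exact commands that were run):

    -- how much dead-tuple bloat remains, and when each table was last (auto)vacuumed/analyzed
    SELECT relname, n_live_tup, n_dead_tup,
           last_vacuum, last_autovacuum, last_analyze, last_autoanalyze
    FROM pg_stat_user_tables
    ORDER BY n_dead_tup DESC
    LIMIT 20;

    -- reclaim dead tuples and refresh planner stats on an affected partition
    VACUUM (VERBOSE, ANALYZE) location_data_2018_12_part4;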