On Wed, Dec 2, 2020 at 7:11 PM Masahiko Sawada <sawada.m...@gmail.com> wrote:
> On Wed, Dec 2, 2020 at 3:33 PM Fujii Masao <masao.fu...@oss.nttdata.com> wrote:
> > On 2020/12/02 12:53, Masahiko Sawada wrote:
> > > On Tue, Dec 1, 2020 at 5:31 PM Masahiko Sawada <sawada.m...@gmail.com> wrote:
> > >> On Tue, Dec 1, 2020 at 4:32 PM Fujii Masao <masao.fu...@oss.nttdata.com> wrote:
> > >>> On 2020/12/01 16:23, Masahiko Sawada wrote:
> > >>>> On Tue, Dec 1, 2020 at 1:48 PM Kasahara Tatsuhito <kasahara.tatsuh...@gmail.com> wrote:
> > >>>>> Hi,
> > >>>>> On Mon, Nov 30, 2020 at 8:59 PM Fujii Masao <masao.fu...@oss.nttdata.com> wrote:
> > >>>>>> On 2020/11/30 10:43, Masahiko Sawada wrote:
> > >>>>>>> On Sun, Nov 29, 2020 at 10:34 PM Kasahara Tatsuhito <kasahara.tatsuh...@gmail.com> wrote:
> > >>>>>>>> Hi, Thanks for your comments.
> > >>>>>>>> On Fri, Nov 27, 2020 at 9:51 PM Fujii Masao <masao.fu...@oss.nttdata.com> wrote:
> > >>>>>>>>> On 2020/11/27 18:38, Kasahara Tatsuhito wrote:
> > >>>>>>>>>> Hi,
> > >>>>>>>>>> On Fri, Nov 27, 2020 at 1:43 AM Fujii Masao <masao.fu...@oss.nttdata.com> wrote:
> > >>>>>>>>>>> On 2020/11/26 10:41, Kasahara Tatsuhito wrote:
> > >>>>>>>>>>>> On Wed, Nov 25, 2020 at 8:46 PM Masahiko Sawada <sawada.m...@gmail.com> wrote:
> > >>>>>>>>>>>>> On Wed, Nov 25, 2020 at 4:18 PM Kasahara Tatsuhito <kasahara.tatsuh...@gmail.com> wrote:
> > >>>>>>>>>>>>>> Hi,
> > >>>>>>>>>>>>>> On Wed, Nov 25, 2020 at 2:17 PM Masahiko Sawada <sawada.m...@gmail.com> wrote:
> > >>>>>>>>>>>>>>> On Fri, Sep 4, 2020 at 7:50 PM Kasahara Tatsuhito <kasahara.tatsuh...@gmail.com> wrote:
> > >>>>>>>>>>>>>>>> Hi,
> > >>>>>>>>>>>>>>>> On Wed, Sep 2, 2020 at 2:10 AM Kasahara Tatsuhito <kasahara.tatsuh...@gmail.com> wrote:
> > >>>>>>>>>>>>>>>>>> I wonder if we could have table_recheck_autovac do two probes of the stats data. First probe the existing stats data, and if it shows the table to be already vacuumed, return immediately. If not, *then* force a stats re-read, and check a second time.
> > >>>>>>>>>>>>>>>>> Does the above mean that the second and subsequent table_recheck_autovac() will be improved to first check using the previously refreshed statistics? I think that certainly works.
> > >>>>>>>>>>>>>>>>> If that's correct, I'll try to create a patch for the PoC
> > >>>>>>>>>>>>>>>> I still don't know how to reproduce Jim's troubles, but I was able to reproduce what was probably a very similar problem.
> > >>>>>>>>>>>>>>>> This problem seems to be more likely to occur in cases where you have a large number of tables, i.e., a large amount of stats, and many small tables need VACUUM at the same time.
> > >>>>>>>>>>>>>>>> So I followed Tom's advice and created a patch for the PoC. This patch sets a flag in the table_recheck_autovac function so that the existing stats are used next time, if the check made after the stats have been refreshed finds that VACUUM (or ANALYZE) has already been done by another worker. If the tables continue to require VACUUM after the refresh, then a refresh will be required instead of using the existing statistics.
> > >>>>>>>>>>>>>>>> I did a simple test with HEAD and HEAD + this PoC patch. The tests were conducted in two cases. (I changed a few configurations; see attached scripts.)
> > >>>>>>>>>>>>>>>> 1. Normal VACUUM case
> > >>>>>>>>>>>>>>>>   - SET autovacuum = off
> > >>>>>>>>>>>>>>>>   - CREATE tables with 100 rows
> > >>>>>>>>>>>>>>>>   - DELETE 90 rows from each table
> > >>>>>>>>>>>>>>>>   - SET autovacuum = on and restart PostgreSQL
> > >>>>>>>>>>>>>>>>   - Measure the time it takes for all tables to be VACUUMed
> > >>>>>>>>>>>>>>>> 2. Anti-wraparound VACUUM case
> > >>>>>>>>>>>>>>>>   - CREATE blank tables
> > >>>>>>>>>>>>>>>>   - SELECT all of these tables (to generate stats)
> > >>>>>>>>>>>>>>>>   - SET autovacuum_freeze_max_age to low values and restart PostgreSQL
> > >>>>>>>>>>>>>>>>   - Consume a lot of XIDs by using txid_current()
> > >>>>>>>>>>>>>>>>   - Measure the time it takes for all tables to be VACUUMed
> > >>>>>>>>>>>>>>>> For each test case, the following results were obtained by changing the autovacuum_max_workers parameter to 1, 2, 3 (default), 5 and 10, and by changing the number of tables to 1000, 5000, 10000 and 20000.
> > >>>>>>>>>>>>>>>> Due to the poor VM environment (2 vCPU / 4 GB), the results are a little unstable, but I think they are enough to see a trend.
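A minimal, self-contained C sketch of the recheck flow described above. This is not the actual PoC patch: stats_refresh() and needs_vacuum() are hypothetical stand-ins for autovac_refresh_stats() and the needs-vacuum check inside table_recheck_autovac(); only the carried-over use_existing_stats flag and the probe order are taken from the description above.

#include <stdbool.h>
#include <stdio.h>

static bool use_existing_stats = false;    /* carried over between rechecks */

/* Placeholder for the expensive re-read of the stats file
 * (autovac_refresh_stats() in the real code). */
static void
stats_refresh(void)
{
    printf("  (expensive stats refresh)\n");
}

/* Placeholder for the needs-vacuum test against the current snapshot;
 * here we simply pretend relid 42 was already vacuumed by another worker. */
static bool
needs_vacuum(int relid)
{
    return relid != 42;
}

/* Sketch of the recheck: cheap probe first (when allowed), refresh second. */
static bool
recheck_table(int relid)
{
    if (use_existing_stats)
    {
        /* First probe: reuse the snapshot we already have. */
        if (!needs_vacuum(relid))
            return false;       /* already vacuumed; skipped without a refresh */
    }

    /* Second probe: force a refresh and check against fresh stats. */
    stats_refresh();
    if (needs_vacuum(relid))
    {
        /* We will vacuum it ourselves; the snapshot gets invalidated anyway,
         * so refresh again at the next recheck. */
        use_existing_stats = false;
        return true;
    }

    /* Another worker got there even after the refresh; the freshly read stats
     * are probably still good enough for the next table, so allow the cheap
     * probe next time. */
    use_existing_stats = true;
    return false;
}

int
main(void)
{
    int relids[] = {41, 42, 43};

    for (int i = 0; i < 3; i++)
        printf("relid %d: %s\n", relids[i],
               recheck_table(relids[i]) ? "vacuum" : "skip");
    return 0;
}

The point of the carried-over flag is that the cheap probe against the already-loaded snapshot is only attempted when the previous recheck suggested other workers are ahead of us; otherwise the expensive refresh happens straight away.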
> > >>>>>>>>>>>>>>>> ===========================================================================
> > >>>>>>>>>>>>>>>> [1. Normal VACUUM case]
> > >>>>>>>>>>>>>>>> tables:1000
> > >>>>>>>>>>>>>>>>   autovacuum_max_workers 1:  (HEAD) 20 sec  VS (with patch) 20 sec
> > >>>>>>>>>>>>>>>>   autovacuum_max_workers 2:  (HEAD) 18 sec  VS (with patch) 16 sec
> > >>>>>>>>>>>>>>>>   autovacuum_max_workers 3:  (HEAD) 18 sec  VS (with patch) 16 sec
> > >>>>>>>>>>>>>>>>   autovacuum_max_workers 5:  (HEAD) 19 sec  VS (with patch) 17 sec
> > >>>>>>>>>>>>>>>>   autovacuum_max_workers 10: (HEAD) 19 sec  VS (with patch) 17 sec
> > >>>>>>>>>>>>>>>> tables:5000
> > >>>>>>>>>>>>>>>>   autovacuum_max_workers 1:  (HEAD) 77 sec  VS (with patch) 78 sec
> > >>>>>>>>>>>>>>>>   autovacuum_max_workers 2:  (HEAD) 61 sec  VS (with patch) 43 sec
> > >>>>>>>>>>>>>>>>   autovacuum_max_workers 3:  (HEAD) 38 sec  VS (with patch) 38 sec
> > >>>>>>>>>>>>>>>>   autovacuum_max_workers 5:  (HEAD) 45 sec  VS (with patch) 37 sec
> > >>>>>>>>>>>>>>>>   autovacuum_max_workers 10: (HEAD) 43 sec  VS (with patch) 35 sec
> > >>>>>>>>>>>>>>>> tables:10000
> > >>>>>>>>>>>>>>>>   autovacuum_max_workers 1:  (HEAD) 152 sec VS (with patch) 153 sec
> > >>>>>>>>>>>>>>>>   autovacuum_max_workers 2:  (HEAD) 119 sec VS (with patch) 98 sec
> > >>>>>>>>>>>>>>>>   autovacuum_max_workers 3:  (HEAD) 87 sec  VS (with patch) 78 sec
> > >>>>>>>>>>>>>>>>   autovacuum_max_workers 5:  (HEAD) 100 sec VS (with patch) 66 sec
> > >>>>>>>>>>>>>>>>   autovacuum_max_workers 10: (HEAD) 97 sec  VS (with patch) 56 sec
> > >>>>>>>>>>>>>>>> tables:20000
> > >>>>>>>>>>>>>>>>   autovacuum_max_workers 1:  (HEAD) 338 sec VS (with patch) 339 sec
> > >>>>>>>>>>>>>>>>   autovacuum_max_workers 2:  (HEAD) 231 sec VS (with patch) 229 sec
> > >>>>>>>>>>>>>>>>   autovacuum_max_workers 3:  (HEAD) 220 sec VS (with patch) 191 sec
> > >>>>>>>>>>>>>>>>   autovacuum_max_workers 5:  (HEAD) 234 sec VS (with patch) 147 sec
> > >>>>>>>>>>>>>>>>   autovacuum_max_workers 10: (HEAD) 320 sec VS (with patch) 113 sec
> > >>>>>>>>>>>>>>>> [2. Anti-wraparound VACUUM case]
> > >>>>>>>>>>>>>>>> tables:1000
> > >>>>>>>>>>>>>>>>   autovacuum_max_workers 1:  (HEAD) 19 sec  VS (with patch) 18 sec
> > >>>>>>>>>>>>>>>>   autovacuum_max_workers 2:  (HEAD) 14 sec  VS (with patch) 15 sec
> > >>>>>>>>>>>>>>>>   autovacuum_max_workers 3:  (HEAD) 14 sec  VS (with patch) 14 sec
> > >>>>>>>>>>>>>>>>   autovacuum_max_workers 5:  (HEAD) 14 sec  VS (with patch) 16 sec
> > >>>>>>>>>>>>>>>>   autovacuum_max_workers 10: (HEAD) 16 sec  VS (with patch) 14 sec
> > >>>>>>>>>>>>>>>> tables:5000
> > >>>>>>>>>>>>>>>>   autovacuum_max_workers 1:  (HEAD) 69 sec  VS (with patch) 69 sec
> > >>>>>>>>>>>>>>>>   autovacuum_max_workers 2:  (HEAD) 66 sec  VS (with patch) 47 sec
> > >>>>>>>>>>>>>>>>   autovacuum_max_workers 3:  (HEAD) 59 sec  VS (with patch) 37 sec
> > >>>>>>>>>>>>>>>>   autovacuum_max_workers 5:  (HEAD) 39 sec  VS (with patch) 28 sec
> > >>>>>>>>>>>>>>>>   autovacuum_max_workers 10: (HEAD) 39 sec  VS (with patch) 29 sec
> > >>>>>>>>>>>>>>>> tables:10000
> > >>>>>>>>>>>>>>>>   autovacuum_max_workers 1:  (HEAD) 139 sec VS (with patch) 138 sec
> > >>>>>>>>>>>>>>>>   autovacuum_max_workers 2:  (HEAD) 130 sec VS (with patch) 86 sec
> > >>>>>>>>>>>>>>>>   autovacuum_max_workers 3:  (HEAD) 120 sec VS (with patch) 68 sec
> > >>>>>>>>>>>>>>>>   autovacuum_max_workers 5:  (HEAD) 96 sec  VS (with patch) 41 sec
> > >>>>>>>>>>>>>>>>   autovacuum_max_workers 10: (HEAD) 90 sec  VS (with patch) 39 sec
> > >>>>>>>>>>>>>>>> tables:20000
> > >>>>>>>>>>>>>>>>   autovacuum_max_workers 1:  (HEAD) 313 sec VS (with patch) 331 sec
> > >>>>>>>>>>>>>>>>   autovacuum_max_workers 2:  (HEAD) 209 sec VS (with patch) 201 sec
> > >>>>>>>>>>>>>>>>   autovacuum_max_workers 3:  (HEAD) 227 sec VS (with patch) 141 sec
> > >>>>>>>>>>>>>>>>   autovacuum_max_workers 5:  (HEAD) 236 sec VS (with patch) 88 sec
> > >>>>>>>>>>>>>>>>   autovacuum_max_workers 10: (HEAD) 309 sec VS (with patch) 74 sec
> > >>>>>>>>>>>>>>>> ===========================================================================
> > >>>>>>>>>>>>>>>> In the cases without the patch, worker scalability decreased as the number of tables increased. In fact, the more workers there are, the longer it takes to complete VACUUM on all tables. The cases with the patch show good scalability with respect to the number of workers.
> > >>>>>>>>>>>>>>> It seems a good performance improvement even without the patch of shared memory based stats collector.
> > >>>>>>>>>>> Sounds great!
> > >>>>>>>>>>>>>>>> Note that perf top results showed that hash_search_with_hash_value, hash_seq_search and pgstat_read_statsfiles are dominant during VACUUM in all patterns, with or without the patch.
> > >>>>>>>>>>>>>>>> Therefore, there is still a need to find ways to optimize the reading of large amounts of stats. However, this patch is effective in its own right, and since there are only a few parts to modify, I think it should be able to be applied to current (preferably pre-v13) PostgreSQL.
> > >>>>>>>>>>>>>>> +1
> > >>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>> +
> > >>>>>>>>>>>>>>> +       /* We might be better to refresh stats */
> > >>>>>>>>>>>>>>> +       use_existing_stats = false;
> > >>>>>>>>>>>>>>>     }
> > >>>>>>>>>>>>>>> +   else
> > >>>>>>>>>>>>>>> +   {
> > >>>>>>>>>>>>>>> -       heap_freetuple(classTup);
> > >>>>>>>>>>>>>>> +       heap_freetuple(classTup);
> > >>>>>>>>>>>>>>> +       /* The relid has already vacuumed, so we might be better to use exiting stats */
> > >>>>>>>>>>>>>>> +       use_existing_stats = true;
> > >>>>>>>>>>>>>>> +   }
> > >>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>> With that patch, the autovacuum process refreshes the stats in the next check if it finds out that the table still needs to be vacuumed. But I guess it's not necessarily true because the next table might be vacuumed already. So I think we might want to always use the existing stats for the first check. What do you think?
> > >>>>>>>>>>>>>> Thanks for your comment.
> > >>>>>>>>>>>>>> If we assume the case where some workers vacuum large tables and a single worker vacuums small tables, the processing performance of the single worker will be slightly lower if the existing statistics are checked every time.
> > >>>>>>>>>>>>>> In fact, at first I tried to check the existing stats every time, but the performance was slightly worse in cases with a small number of workers.
> > >>>>>>>>>>> Do you have this benchmark result?
> > >>>>>>>>>>>>>> (Checking the existing stats is lightweight, but at high frequency it affects processing performance.) Therefore, the decision whether autovacuum should use the existing statistics is made right after refreshing them.
> > >>>>>>>>>>>>> Yeah, since the test you used uses a lot of small tables, if there are a few workers, checking the existing stats is unlikely to return true (no need to vacuum). So the cost of the existing stats check ends up being overhead. Not sure how slow always checking the existing stats was, but given that the shared memory based stats collector patch could improve the performance of refreshing stats, it might be better not to check the existing stats frequently like the patch does. Anyway, I think it's better to evaluate the performance improvement with other cases too.
> > >>>>>>>>>>>> Yeah, I would like to see how much the performance changes in other cases. In addition, if the shared-based-stats patch is applied, we won't need to reload a huge stats file, so we will just have to check the stats on shared memory every time. Perhaps the logic of table_recheck_autovac could be simpler.
> > >>>>>>>>>>>>>> BTW, I found some typos in comments, so attached a fixed version.
> > >>>>>>>>>>> The patch adds some duplicated code into table_recheck_autovac(). It's better to make a common function performing them and make table_recheck_autovac() call that common function, to simplify the code.
> > >>>>>>>>>> Thanks for your comment. Hmm.. I've cut out the duplicate part. Attached the patch. Could you confirm that it fits your expectations?
> > >>>>>>>>> Yes, thanks for updating the patch! Here are more review comments.
> > >>>>>>>>>
> > >>>>>>>>> +       shared = pgstat_fetch_stat_dbentry(InvalidOid);
> > >>>>>>>>> +       dbentry = pgstat_fetch_stat_dbentry(MyDatabaseId);
> > >>>>>>>>>
> > >>>>>>>>> When using the existing stats, ISTM that these are not necessary and we can reuse "shared" and "dbentry" obtained before. Right?
> > >>>>>>>> Yeah, but unless autovac_refresh_stats() is called, these functions read the information from the local hash table without re-reading the stats file, so the process is very light. Therefore, I think it is better to keep the current logic to keep the code simple.
> > >>>>>>>>> +       /* We might be better to refresh stats */
> > >>>>>>>>> +       use_existing_stats = false;
> > >>>>>>>>>
> > >>>>>>>>> I think that we should add more comments about why it's better to refresh the stats in this case.
> > >>>>>>>>>
> > >>>>>>>>> +       /* The relid has already vacuumed, so we might be better to use existing stats */
> > >>>>>>>>> +       use_existing_stats = true;
> > >>>>>>>>>
> > >>>>>>>>> I think that we should add more comments about why it's better to reuse the stats in this case.
> > >>>>>>>> I added comments.
> > >>>>>>>> Attached the patch.
> > >>>>>>> Thank you for updating the patch. Here are some small comments on the latest (v4) patch.
> > >>>>>>>
> > >>>>>>> + * So if the last time we checked a table that was already vacuumed after
> > >>>>>>> + * refres stats, check the current statistics before refreshing it.
> > >>>>>>> + */
> > >>>>>>>
> > >>>>>>> s/refres/refresh/
> > >>>>> Thanks! fixed.
> > >>>>> Attached the patch.
> > >>>>>>>
> > >>>>>>> -----
> > >>>>>>> +/* Counter to determine if statistics should be refreshed */
> > >>>>>>> +static bool use_existing_stats = false;
> > >>>>>>> +
> > >>>>>>>
> > >>>>>>> I think 'use_existing_stats' can be declared within table_recheck_autovac().
> > >>>>>>>
> > >>>>>>> -----
> > >>>>>>> While testing the performance, I realized that the statistics are reset every time one table is vacuumed, leading to re-reading the stats file even if 'use_existing_stats' is true. Note that vacuum() eventually calls AtEOXact_PgStat(), which calls pgstat_clear_snapshot().
> > >>>>>> Good catch!
> > >>>>>>> I believe that's why the performance of the method of always checking the existing stats wasn't as good as expected.
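An illustrative, self-contained C sketch of the snapshot behaviour being discussed (not backend code; load_stats_file() and StatsSnapshot are hypothetical stand-ins): fetches are cheap while a local snapshot exists, and clearing it, as AtEOXact_PgStat() ends up doing via pgstat_clear_snapshot() at the end of each vacuum transaction, forces the next fetch to pay the full stats-file read again.

#include <stdio.h>
#include <stddef.h>

typedef struct StatsSnapshot
{
    int dummy;                  /* the real snapshot is a set of hash tables */
} StatsSnapshot;

static StatsSnapshot snapshot_storage;
static StatsSnapshot *snapshot = NULL;

/* Placeholder for the expensive read of the on-disk stats file. */
static StatsSnapshot *
load_stats_file(void)
{
    printf("  reading stats file (expensive)\n");
    return &snapshot_storage;
}

/* Roughly the pattern pgstat_fetch_stat_dbentry() relies on: build the
 * local snapshot once, then answer later calls from it for free. */
static StatsSnapshot *
fetch_snapshot(void)
{
    if (snapshot == NULL)
        snapshot = load_stats_file();
    return snapshot;
}

/* Roughly what pgstat_clear_snapshot() does: throw the local copy away. */
static void
clear_snapshot(void)
{
    snapshot = NULL;
}

int
main(void)
{
    fetch_snapshot();           /* reads the stats file */
    fetch_snapshot();           /* cheap: local snapshot reused */
    clear_snapshot();           /* what happens at the end of each vacuum */
    fetch_snapshot();           /* has to read the stats file again */
    return 0;
}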
> > >>>>>>> So if we save the statistics somewhere and use them for rechecking, the results of the performance benchmark will differ between these two methods.
> > >>>>> Thanks for your checks.
> > >>>>> But, if a worker did vacuum(), that means this worker had determined in table_recheck_autovac() that a vacuum was needed. So use_existing_stats is set to false and the stats are refreshed next time. Therefore I think the current patch is fine, as we want to avoid unnecessary refreshing of statistics before the actual vacuum(), right?
> > >>>> Yes, you're right.
> > >>>> When I benchmarked the performance of the method of always checking existing stats, I edited your patch so that it sets use_existing_stats = true even if the second check is false (i.e., vacuum is needed). And the result I got was worse than expected, especially in the case of a few autovacuum workers. But it doesn't evaluate the performance of that method correctly, as the stats snapshot is cleared every time we vacuum. Given you had similar results, I guess you used a similar way when evaluating it, is that right? If so, it's better to fix this issue and see how the performance benchmark results will differ.
> > >>>> For example, the results of the test case with 10000 tables and 1 autovacuum worker I reported before were:
> > >>>>
> > >>>> 10000 tables:
> > >>>> autovac_workers 1 : 158s, 157s, 290s
> > >>>>
> > >>>> But after fixing that issue in the third method (always checking the existing stats), the results are:
> > >>> Could you tell me how you fixed that issue? Did you copy the stats to somewhere as you suggested, or skip pgstat_clear_snapshot() as I suggested?
> > >> I used the way you suggested in this quick test; skipped pgstat_clear_snapshot().
> > >>> Kasahara-san seems not to like the latter idea because it might cause a bad side effect. So we should use the former idea?
> > >> Not sure. I'm also concerned about the side effect but I've not checked yet.
> > >> Since there is probably no big difference between the two ways in terms of performance, I'm going to see how the performance benchmark result will change first.
> > > I've tested the performance improvement again. From the left: the execution time of the current HEAD, Kasahara-san's patch, and the method of always checking the existing stats (using the approach suggested by Fujii-san), in seconds.
> > > 1000 tables:
> > > autovac_workers 1 : 13s, 13s, 13s
> > > autovac_workers 2 : 6s, 4s, 4s
> > > autovac_workers 3 : 3s, 4s, 3s
> > > autovac_workers 5 : 3s, 3s, 2s
> > > autovac_workers 10: 2s, 3s, 2s
> > >
> > > 5000 tables:
> > > autovac_workers 1 : 71s, 71s, 72s
> > > autovac_workers 2 : 37s, 32s, 32s
> > > autovac_workers 3 : 29s, 26s, 26s
> > > autovac_workers 5 : 20s, 19s, 18s
> > > autovac_workers 10: 13s, 8s, 8s
> > >
> > > 10000 tables:
> > > autovac_workers 1 : 158s, 157s, 159s
> > > autovac_workers 2 : 80s, 53s, 78s
> > > autovac_workers 3 : 75s, 67s, 67s
> > > autovac_workers 5 : 61s, 42s, 42s
> > > autovac_workers 10: 69s, 26s, 25s
> > >
> > > 20000 tables:
> > > autovac_workers 1 : 379s, 380s, 389s
> > > autovac_workers 2 : 236s, 232s, 233s
> > > autovac_workers 3 : 222s, 181s, 182s
> > > autovac_workers 5 : 212s, 132s, 139s
> > > autovac_workers 10: 317s, 91s, 89s
> > >
> > > I don't see a big difference between Kasahara-san's patch and the method of always checking the existing stats.
> > Thanks for doing the benchmark!
> > This benchmark result makes me think that we don't need to tweak AtEOXact_PgStat() and can use Kasahara-san's approach.
> That's good news :)
> Yeah, given that all autovacuum workers have the list of tables to vacuum in the same order in most cases, the assumption in Kasahara-san's patch that if a worker needs to vacuum a table it's unlikely that it will be able to skip the next table using the current snapshot of stats makes sense to me.
> One small comment on v6 patch:
>
> + /* When we decide to do vacuum or analyze, the existing stats cannot
> + * be reused in the next cycle because it's cleared at the end of vacuum
> + * or analyze (by AtEOXact_PgStat()).
> + */
> + use_existing_stats = false;
>
> I think the comment should start on the second line (i.e., \n is needed after /*).

Oops, thanks. Fixed.
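For reference, the layout being requested is the usual multi-line comment style; a sketch using the wording from the quoted hunk (the exact text in the v7 patch may differ):

    /*
     * When we decide to do vacuum or analyze, the existing stats cannot
     * be reused in the next cycle because it's cleared at the end of vacuum
     * or analyze (by AtEOXact_PgStat()).
     */
    use_existing_stats = false;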
Best regards,

> Regards,
>
> --
> Masahiko Sawada
> EnterpriseDB: https://www.enterprisedb.com/

-- 
Tatsuhito Kasahara
kasahara.tatsuhito _at_ gmail.com
v7_mod_table_recheck_autovac.patch
Description: Binary data