Hi, Thanks for you comments. On Fri, Nov 27, 2020 at 9:51 PM Fujii Masao <masao.fu...@oss.nttdata.com> wrote: > > > > On 2020/11/27 18:38, Kasahara Tatsuhito wrote: > > Hi, > > > > On Fri, Nov 27, 2020 at 1:43 AM Fujii Masao <masao.fu...@oss.nttdata.com> > > wrote: > >> > >> > >> > >> On 2020/11/26 10:41, Kasahara Tatsuhito wrote: > >>> On Wed, Nov 25, 2020 at 8:46 PM Masahiko Sawada <sawada.m...@gmail.com> > >>> wrote: > >>>> > >>>> On Wed, Nov 25, 2020 at 4:18 PM Kasahara Tatsuhito > >>>> <kasahara.tatsuh...@gmail.com> wrote: > >>>>> > >>>>> Hi, > >>>>> > >>>>> On Wed, Nov 25, 2020 at 2:17 PM Masahiko Sawada <sawada.m...@gmail.com> > >>>>> wrote: > >>>>>> > >>>>>> On Fri, Sep 4, 2020 at 7:50 PM Kasahara Tatsuhito > >>>>>> <kasahara.tatsuh...@gmail.com> wrote: > >>>>>>> > >>>>>>> Hi, > >>>>>>> > >>>>>>> On Wed, Sep 2, 2020 at 2:10 AM Kasahara Tatsuhito > >>>>>>> <kasahara.tatsuh...@gmail.com> wrote: > >>>>>>>>> I wonder if we could have table_recheck_autovac do two probes of > >>>>>>>>> the stats > >>>>>>>>> data. First probe the existing stats data, and if it shows the > >>>>>>>>> table to > >>>>>>>>> be already vacuumed, return immediately. If not, *then* force a > >>>>>>>>> stats > >>>>>>>>> re-read, and check a second time. > >>>>>>>> Does the above mean that the second and subsequent > >>>>>>>> table_recheck_autovac() > >>>>>>>> will be improved to first check using the previous refreshed > >>>>>>>> statistics? > >>>>>>>> I think that certainly works. > >>>>>>>> > >>>>>>>> If that's correct, I'll try to create a patch for the PoC > >>>>>>> > >>>>>>> I still don't know how to reproduce Jim's troubles, but I was able to > >>>>>>> reproduce > >>>>>>> what was probably a very similar problem. > >>>>>>> > >>>>>>> This problem seems to be more likely to occur in cases where you have > >>>>>>> a large number of tables, > >>>>>>> i.e., a large amount of stats, and many small tables need VACUUM at > >>>>>>> the same time. > >>>>>>> > >>>>>>> So I followed Tom's advice and created a patch for the PoC. > >>>>>>> This patch will enable a flag in the table_recheck_autovac function > >>>>>>> to use > >>>>>>> the existing stats next time if VACUUM (or ANALYZE) has already been > >>>>>>> done > >>>>>>> by another worker on the check after the stats have been updated. > >>>>>>> If the tables continue to require VACUUM after the refresh, then a > >>>>>>> refresh > >>>>>>> will be required instead of using the existing statistics. > >>>>>>> > >>>>>>> I did simple test with HEAD and HEAD + this PoC patch. > >>>>>>> The tests were conducted in two cases. > >>>>>>> (I changed few configurations. see attached scripts) > >>>>>>> > >>>>>>> 1. Normal VACUUM case > >>>>>>> - SET autovacuum = off > >>>>>>> - CREATE tables with 100 rows > >>>>>>> - DELETE 90 rows for each tables > >>>>>>> - SET autovacuum = on and restart PostgreSQL > >>>>>>> - Measure the time it takes for all tables to be VACUUMed > >>>>>>> > >>>>>>> 2. Anti wrap round VACUUM case > >>>>>>> - CREATE brank tables > >>>>>>> - SELECT all of these tables (for generate stats) > >>>>>>> - SET autovacuum_freeze_max_age to low values and restart > >>>>>>> PostgreSQL > >>>>>>> - Consumes a lot of XIDs by using txid_curent() > >>>>>>> - Measure the time it takes for all tables to be VACUUMed > >>>>>>> > >>>>>>> For each test case, the following results were obtained by changing > >>>>>>> autovacuum_max_workers parameters to 1, 2, 3(def) 5 and 10. > >>>>>>> Also changing num of tables to 1000, 5000, 10000 and 20000. > >>>>>>> > >>>>>>> Due to the poor VM environment (2 VCPU/4 GB), the results are a > >>>>>>> little unstable, > >>>>>>> but I think it's enough to ask for a trend. > >>>>>>> > >>>>>>> =========================================================================== > >>>>>>> [1.Normal VACUUM case] > >>>>>>> tables:1000 > >>>>>>> autovacuum_max_workers 1: (HEAD) 20 sec VS (with patch) 20 sec > >>>>>>> autovacuum_max_workers 2: (HEAD) 18 sec VS (with patch) 16 sec > >>>>>>> autovacuum_max_workers 3: (HEAD) 18 sec VS (with patch) 16 sec > >>>>>>> autovacuum_max_workers 5: (HEAD) 19 sec VS (with patch) 17 sec > >>>>>>> autovacuum_max_workers 10: (HEAD) 19 sec VS (with patch) 17 sec > >>>>>>> > >>>>>>> tables:5000 > >>>>>>> autovacuum_max_workers 1: (HEAD) 77 sec VS (with patch) 78 sec > >>>>>>> autovacuum_max_workers 2: (HEAD) 61 sec VS (with patch) 43 sec > >>>>>>> autovacuum_max_workers 3: (HEAD) 38 sec VS (with patch) 38 sec > >>>>>>> autovacuum_max_workers 5: (HEAD) 45 sec VS (with patch) 37 sec > >>>>>>> autovacuum_max_workers 10: (HEAD) 43 sec VS (with patch) 35 sec > >>>>>>> > >>>>>>> tables:10000 > >>>>>>> autovacuum_max_workers 1: (HEAD) 152 sec VS (with patch) 153 > >>>>>>> sec > >>>>>>> autovacuum_max_workers 2: (HEAD) 119 sec VS (with patch) 98 > >>>>>>> sec > >>>>>>> autovacuum_max_workers 3: (HEAD) 87 sec VS (with patch) 78 > >>>>>>> sec > >>>>>>> autovacuum_max_workers 5: (HEAD) 100 sec VS (with patch) 66 > >>>>>>> sec > >>>>>>> autovacuum_max_workers 10: (HEAD) 97 sec VS (with patch) 56 > >>>>>>> sec > >>>>>>> > >>>>>>> tables:20000 > >>>>>>> autovacuum_max_workers 1: (HEAD) 338 sec VS (with patch) 339 > >>>>>>> sec > >>>>>>> autovacuum_max_workers 2: (HEAD) 231 sec VS (with patch) 229 > >>>>>>> sec > >>>>>>> autovacuum_max_workers 3: (HEAD) 220 sec VS (with patch) 191 > >>>>>>> sec > >>>>>>> autovacuum_max_workers 5: (HEAD) 234 sec VS (with patch) 147 > >>>>>>> sec > >>>>>>> autovacuum_max_workers 10: (HEAD) 320 sec VS (with patch) 113 > >>>>>>> sec > >>>>>>> > >>>>>>> [2.Anti wrap round VACUUM case] > >>>>>>> tables:1000 > >>>>>>> autovacuum_max_workers 1: (HEAD) 19 sec VS (with patch) 18 sec > >>>>>>> autovacuum_max_workers 2: (HEAD) 14 sec VS (with patch) 15 sec > >>>>>>> autovacuum_max_workers 3: (HEAD) 14 sec VS (with patch) 14 sec > >>>>>>> autovacuum_max_workers 5: (HEAD) 14 sec VS (with patch) 16 sec > >>>>>>> autovacuum_max_workers 10: (HEAD) 16 sec VS (with patch) 14 sec > >>>>>>> > >>>>>>> tables:5000 > >>>>>>> autovacuum_max_workers 1: (HEAD) 69 sec VS (with patch) 69 sec > >>>>>>> autovacuum_max_workers 2: (HEAD) 66 sec VS (with patch) 47 sec > >>>>>>> autovacuum_max_workers 3: (HEAD) 59 sec VS (with patch) 37 sec > >>>>>>> autovacuum_max_workers 5: (HEAD) 39 sec VS (with patch) 28 sec > >>>>>>> autovacuum_max_workers 10: (HEAD) 39 sec VS (with patch) 29 sec > >>>>>>> > >>>>>>> tables:10000 > >>>>>>> autovacuum_max_workers 1: (HEAD) 139 sec VS (with patch) 138 sec > >>>>>>> autovacuum_max_workers 2: (HEAD) 130 sec VS (with patch) 86 sec > >>>>>>> autovacuum_max_workers 3: (HEAD) 120 sec VS (with patch) 68 sec > >>>>>>> autovacuum_max_workers 5: (HEAD) 96 sec VS (with patch) 41 sec > >>>>>>> autovacuum_max_workers 10: (HEAD) 90 sec VS (with patch) 39 sec > >>>>>>> > >>>>>>> tables:20000 > >>>>>>> autovacuum_max_workers 1: (HEAD) 313 sec VS (with patch) 331 sec > >>>>>>> autovacuum_max_workers 2: (HEAD) 209 sec VS (with patch) 201 sec > >>>>>>> autovacuum_max_workers 3: (HEAD) 227 sec VS (with patch) 141 sec > >>>>>>> autovacuum_max_workers 5: (HEAD) 236 sec VS (with patch) 88 sec > >>>>>>> autovacuum_max_workers 10: (HEAD) 309 sec VS (with patch) 74 sec > >>>>>>> =========================================================================== > >>>>>>> > >>>>>>> The cases without patch, the scalability of the worker has decreased > >>>>>>> as the number of tables has increased. > >>>>>>> In fact, the more workers there are, the longer it takes to complete > >>>>>>> VACUUM to all tables. > >>>>>>> The cases with patch, it shows good scalability with respect to the > >>>>>>> number of workers. > >>>>>> > >>>>>> It seems a good performance improvement even without the patch of > >>>>>> shared memory based stats collector. > >> > >> Sounds great! > >> > >> > >>>>>> > >>>>>>> > >>>>>>> Note that perf top results showed that hash_search_with_hash_value, > >>>>>>> hash_seq_search and > >>>>>>> pgstat_read_statsfiles are dominant during VACUUM in all patterns, > >>>>>>> with or without the patch. > >>>>>>> > >>>>>>> Therefore, there is still a need to find ways to optimize the reading > >>>>>>> of large amounts of stats. > >>>>>>> However, this patch is effective in its own right, and since there are > >>>>>>> only a few parts to modify, > >>>>>>> I think it should be able to be applied to current (preferably > >>>>>>> pre-v13) PostgreSQL. > >>>>>> > >>>>>> +1 > >>>>>> > >>>>>> + > >>>>>> + /* We might be better to refresh stats */ > >>>>>> + use_existing_stats = false; > >>>>>> } > >>>>>> + else > >>>>>> + { > >>>>>> > >>>>>> - heap_freetuple(classTup); > >>>>>> + heap_freetuple(classTup); > >>>>>> + /* The relid has already vacuumed, so we might be better to > >>>>>> use exiting stats */ > >>>>>> + use_existing_stats = true; > >>>>>> + } > >>>>>> > >>>>>> With that patch, the autovacuum process refreshes the stats in the > >>>>>> next check if it finds out that the table still needs to be vacuumed. > >>>>>> But I guess it's not necessarily true because the next table might be > >>>>>> vacuumed already. So I think we might want to always use the existing > >>>>>> for the first check. What do you think? > >>>>> Thanks for your comment. > >>>>> > >>>>> If we assume the case where some workers vacuum on large tables > >>>>> and a single worker vacuum on small tables, the processing > >>>>> performance of the single worker will be slightly lower if the > >>>>> existing statistics are checked every time. > >>>>> > >>>>> In fact, at first I tried to check the existing stats every time, > >>>>> but the performance was slightly worse in cases with a small number of > >>>>> workers. > >> > >> Do you have this benchmark result? > >> > >> > >>>>> (Checking the existing stats is lightweight , but at high frequency, > >>>>> it affects processing performance.) > >>>>> Therefore, at after refresh statistics, determine whether autovac > >>>>> should use the existing statistics. > >>>> > >>>> Yeah, since the test you used uses a lot of small tables, if there are > >>>> a few workers, checking the existing stats is unlikely to return true > >>>> (no need to vacuum). So the cost of existing stats check ends up being > >>>> overhead. Not sure how slow always checking the existing stats was, > >>>> but given that the shared memory based stats collector patch could > >>>> improve the performance of refreshing stats, it might be better not to > >>>> check the existing stats frequently like the patch does. Anyway, I > >>>> think it’s better to evaluate the performance improvement with other > >>>> cases too. > >>> Yeah, I would like to see how much the performance changes in other cases. > >>> In addition, if the shared-based-stats patch is applied, we won't need to > >>> reload > >>> a huge stats file, so we will just have to check the stats on > >>> shared-mem every time. > >>> Perhaps the logic of table_recheck_autovac could be simpler. > >>> > >>>>> BTW, I found some typos in comments, so attache a fixed version. > >> > >> The patch adds some duplicated codes into table_recheck_autovac(). > >> It's better to make the common function performing them and make > >> table_recheck_autovac() call that common function, to simplify the code. > > Thanks for your comment. > > Hmm.. I've cut out the duplicate part. > > Attach the patch. > > Could you confirm that it fits your expecting? > > Yes, thanks for updataing the patch! Here are another review comments. > > + shared = pgstat_fetch_stat_dbentry(InvalidOid); > + dbentry = pgstat_fetch_stat_dbentry(MyDatabaseId); > > When using the existing stats, ISTM that these are not necessary and > we can reuse "shared" and "dbentry" obtained before. Right? Yeah, but unless autovac_refresh_stats() is called, these functions read the information from the local hash table without re-read the stats file, so the process is very light. Therefore, I think, it is better to keep the current logic to keep the code simple.
> > + /* We might be better to refresh stats */ > + use_existing_stats = false; > > I think that we should add more comments about why it's better to > refresh the stats in this case. > > + /* The relid has already vacuumed, so we might be better to > use existing stats */ > + use_existing_stats = true; > > I think that we should add more comments about why it's better to > reuse the stats in this case. I added comments. Attache the patch. Best regards, > > Regards, > > > -- > Fujii Masao > Advanced Computing Technology Center > Research and Development Headquarters > NTT DATA CORPORATION -- Tatsuhito Kasahara kasahara.tatsuhito _at_ gmail.com
v4_mod_table_recheck_autovac.patch
Description: Binary data