Re: [HACKERS] Heavily modified big table bloat even in auto vacuum is running

Haribabu kommi Wed, 11 Dec 2013 22:56:26 -0800

On 06 December 2013 11:57 Amit Kapila wrote:
> On Fri, Nov 29, 2013 at 6:55 PM, Haribabu kommi
> <[email protected]> wrote:
> > On 29 November 2013 12:00 Amit Kapila wrote:
> >> On Tue, Nov 26, 2013 at 7:26 PM, Haribabu kommi
> >> <[email protected]> wrote:
> >> Few questions about your latest patch:
> >> a. Is there any reason why you are doing estimation of dead tuples
> >> only for Autovacuum and not for Vacuum.
> >
> > No, changed.
> >
> >> /* clear and get the new stats for calculating proper dead tuples */
> >> pgstat_clear_snapshot(); tabentry =
> >> pgstat_fetch_stat_tabentry(RelationGetRelid(onerel));
> >> b. In the above code, to get latest data you are first clearing
> >> snapshot and then calling pgstat function. It will inturn perform
> I/O
> >>     (read of stats file) and send/receive message from stats
> >> collector to ensure it can read latest data. I think it will add
> overhead
> >>     to Vacuum, especially if 'nkeep' calculated in function
> >> lazy_scan_heap() can serve the purpose. In my simple test[1], I
> >> observed
> >>     that value of keep can serve the purpose.
> >>
> >> Can you please once try the test on 'nkeep' approach patch.
> >
> > Using the nkeep and snapshot approach, I ran the test for 40 mins
> with
> > a high analyze_threshold and results are below.
> >
> >                         Auto vacuum count               Bloat size
> > Master           11                      220MB
> > Patched_nkeep      14                      215MB
> > Patched_snapshot           18                              198MB
> >
> > Both the approaches are showing good improvement in the test.
> > Updated patches, test script and configuration is attached in the
> mail.
> 
> I think updating dead tuple count using nkeep is good idea as similar
> thing is done for Analyze as well in acquire_sample_rows().
> One minor point, I think it is better to log dead tuples is below error
> message:
> ereport(LOG,
> (errmsg("automatic vacuum of table \"%s.%s.%s\": index scans: %d\n"
> "pages: %d removed, %d remain\n"
> "tuples: %.0f removed, %.0f remain\n"
> 
> "tuples: %.0f removed, %.0f remain, %.0f dead\n"
> 
> 
> About your test, how to collect the data by running this script, are
> you manually stopping it after 40 mins, because I ran it for more than
> an hour, the final result didn't came.
> As I mentioned you last time, please simplify your test, for other
> person in its current form, it is difficult to make meaning out of it.
> Write comments on top of it in steps form to explain what exactly it is
> doing and how to take data using it (for example, do I need to wait,
> till script ends; how long this test can take to complete).


A simplified test and updated patch by taking care the above comment are 
attached in the mail.
I am not able to reduce the test duration but changed as the test automatically 
exists after 45 mins run.
Please check vacuum_test.sh file more details for running the test.

              Auto vacuum count   Bloat size
Master           15                220MB
Patched_nkeep    18                213MB

Please let me know your suggestions.

Regards,
Hari babu.

vacuum_fix_v7_nkeep.patch
Description: vacuum_fix_v7_nkeep.patch

test_script.tar.gz
Description: test_script.tar.gz

-- 
Sent via pgsql-hackers mailing list ([email protected])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Heavily modified big table bloat even in auto vacuum is running

Reply via email to