>> Each table has 4 indices. The updates are to the indexed columns.
>> Here is the schema of a table: http://pastebin.com/ffu0dUjS All tables
>> have this same schema except that some tables don't have a port column
>> and so will have one less index.
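[Since the pastebin link may rot, the indexes being discussed can always be pulled straight from the catalogs; a sketch, using 'xyz' as a stand-in table name:]

```sql
-- List every index on a table, with its definition
-- ('xyz' is a stand-in for one of the stats tables)
SELECT indexname, indexdef
FROM pg_indexes
WHERE tablename = 'xyz';

-- Per-index on-disk size, to see how the space splits
-- between the heap and the four indexes
SELECT indexrelname, pg_size_pretty(pg_relation_size(indexrelid)) AS size
FROM pg_stat_user_indexes
WHERE relname = 'xyz';
```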
> What indexes exist? Are the updates to indexed columns?

Sorry, I was wrong when I said the updates are to the indexed columns. The
updates are to the 'data[]' column, which doesn't have any index.

On Wed, Apr 30, 2014 at 10:45 AM, Elanchezhiyan Elango <elanela...@gmail.com> wrote:

> I missed answering this one:
>
>> Is the 5GB for the table plus indexes, or just the table itself?
>
> No, it's not including the indices. Including indices it's actually 17GB!
>
> On Wed, Apr 30, 2014 at 10:40 AM, Elanchezhiyan Elango <elanela...@gmail.com> wrote:
>
>> Sergey,
>> Thanks for the aggressive settings. I have listed some settings I am
>> planning to try below. Please review and let me know your feedback.
>>
>> Francisco,
>> Thanks for the partitioning idea. I used to have the tables partitioned,
>> but now that I have moved to a schema where data is split across about
>> ~90 tables I have moved away from partitioning. It's something I have to
>> reconsider, at least for the high-traffic tables.
>>
>>> What indexes exist? Are the updates to indexed columns?
>>
>> Each table has 4 indices. The updates are to the indexed columns.
>> Here is the schema of a table: http://pastebin.com/ffu0dUjS All tables
>> have this same schema except that some tables don't have a port column
>> and so will have one less index.
>>
>>> Which day's tuples are deleted every night? Is it the day a week ago,
>>> or a month ago, or something else?
>>
>> Tables1, Tables2: anything older than 1 month is deleted.
>> Tables3, Tables4: anything older than 3 months is deleted.
>>
>>> If only 50,000 get inserted daily and 1.2M get deleted, this table will
>>> soon be empty! I think you said daily when you meant hourly somewhere
>>> in there.
>>
>> Sorry, that was wrong.
>> For Tables3 it should have been 50000 rows (not 1.2M) getting deleted daily.
>> And for Tables4 it should have been 2000 rows (not 48000) getting deleted daily.
>>
>>> Why do you have a 4 minute timeout?
>>> That seems counter-productive.
>>
>> Oh, is it less or more?
>>
>>> You delete a bunch of tuples every night, so of course a vacuum full
>>> after that is going to return a lot of space. But that space is probably
>>> just going to be needed again the next day. If you don't do the vacuum
>>> full, does the *peak* space keep increasing, or does it stabilize?
>>
>> I haven't tested to see if the space keeps on increasing.
>>
>> I ran pgstattuple() on one of the tables:
>>
>> managed_target_stats=# select * from pgstattuple('xyz');
>>  table_len  | tuple_count | tuple_len  | tuple_percent | dead_tuple_count | dead_tuple_len | dead_tuple_percent | free_space | free_percent
>> ------------+-------------+------------+---------------+------------------+----------------+--------------------+------------+--------------
>>  5642526720 |    18880283 | 4042874387 |         71.65 |           122037 |       21495560 |               0.38 | 1239598044 |        21.97
>>
>> This is one of the Tables1 tables, after running for a week or so with
>> the default autovacuum settings. The dead_tuple_percent looks good, but
>> the free_percent looks high. Is this normal? Also, when I enabled
>> autovacuum logging, I saw autovacuum triggering very rarely. Maybe
>> that's the reason for so much free space? I am going to try the
>> following settings:
>>
>> log_autovacuum_min_duration = 0
>> autovacuum_vacuum_scale_factor = 0
>> autovacuum_vacuum_threshold = 40000
>> autovacuum_vacuum_cost_delay = 10ms
>> autovacuum_max_workers = 5
>> autovacuum_analyze_scale_factor = 0
>> autovacuum_analyze_threshold = 40000
>>
>> Do these settings look OK? I am setting autovacuum_vacuum_scale_factor
>> and autovacuum_analyze_scale_factor to 0 and the thresholds to a
>> constant 40000. My reasoning: updates happen in a burst every 5 minutes,
>> and the upper bound is 50000 updates every 5 minutes, so I have just
>> hardcoded a threshold of 40000.
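[Since only the 8 high-traffic tables need these aggressive thresholds, the same values could also be applied per table as storage parameters rather than globally; a sketch, again with 'xyz' as a stand-in table name:]

```sql
-- Per-table autovacuum settings (supported in 9.1); these override
-- the global settings for this table only
ALTER TABLE xyz SET (
    autovacuum_vacuum_scale_factor  = 0,
    autovacuum_vacuum_threshold     = 40000,
    autovacuum_vacuum_cost_delay    = 10,   -- in milliseconds
    autovacuum_analyze_scale_factor = 0,
    autovacuum_analyze_threshold    = 40000
);
```

[Note that autovacuum_max_workers has no per-table form; it must stay in postgresql.conf and requires a server restart to change.]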
>> Also, I am leaving autovacuum_naptime at the default of 1 min, because
>> updates happen at most once every 5 minutes anyway, so I don't see a
>> point in running autovacuum more frequently than every 1 min. Correct
>> me if my settings look wrong.
>>
>> Thanks,
>> Elan.
>>
>> On Wed, Apr 30, 2014 at 8:36 AM, Jeff Janes <jeff.ja...@gmail.com> wrote:
>>
>>> On Tue, Apr 29, 2014 at 4:59 PM, Elanchezhiyan Elango
>>> <elanela...@gmail.com> wrote:
>>>
>>>> Hi,
>>>>
>>>> I need help on deciding my vacuuming strategy. I need to know if I
>>>> ever need to do 'vacuum full' for my tables.
>>>>
>>>> Tables1: Following is the query pattern on 4 high-traffic tables in
>>>> my database:
>>>> 1. Every 5 minutes about 50000 rows in the table are updated, and for
>>>> a given clock hour the same 50000 rows are updated again and again.
>>>> 2. When a new hour begins, another 50000 rows get inserted and they
>>>> get updated every 5 minutes.
>>>> 3. Every night a day's worth of old stats is deleted, so this would
>>>> be 24 * 50000 = 1.2M records deleted every night.
>>>
>>> What indexes exist? Are the updates to indexed columns? Which day's
>>> tuples are deleted every night? Is it the day a week ago, or a month
>>> ago, or something else?
>>>
>>>> Tables3: Another 4 tables have the following pattern:
>>>> 1. Every 1 hour 50000 rows get updated. For a given day the same
>>>> 50000 rows are updated again and again.
>>>> 2. When a new day begins, another 50000 rows get inserted and they
>>>> get updated every hour.
>>>> 3. Every night 1.2M records get deleted.
>>>
>>> If only 50,000 get inserted daily and 1.2M get deleted, this table
>>> will soon be empty! I think you said daily when you meant hourly
>>> somewhere in there.
>>>
>>>> Tables4: Another 39 tables have the following pattern:
>>>> 1. Every 1 hour 2000 rows get updated. For a given day the same 2000
>>>> rows are updated again and again.
>>>> 2. When a new day begins, another 2000 rows get inserted and they get
>>>> updated every hour.
>>>> 3. Every night 48000 rows get deleted.
>>>>
>>>> With the above query pattern of intensive updates and deletes, I need
>>>> to do some aggressive vacuuming.
>>>>
>>>> Current strategy: I am running with default autovacuum settings
>>>> (postgres 9.1.9), and I tried doing a 'vacuum full' of the 8
>>>> high-traffic tables (Tables1 and Tables3) every night. But after a
>>>> point, the 'vacuum full's started timing out (with a 4 minute
>>>> timeout) every night.
>>>
>>> Why do you have a 4 minute timeout? That seems counter-productive.
>>>
>>>> I think this is because the table is growing bigger (~5GB) and doing
>>>> a vacuum full every night is probably not feasible.
>>>
>>> It is probably not necessary, but it certainly seems feasible. 4 min
>>> * 8 tables = 32 minutes. Call it one hour, since the 4 minute timeout
>>> has started not being enough. Is the 5GB for the table plus indexes,
>>> or just the table itself?
>>>
>>>> Going with the default autovacuum settings and not doing 'vacuum
>>>> full' at all is also not enough for my use case. Whenever vacuum full
>>>> succeeded every night, it did seem to reclaim a considerable amount
>>>> of space.
>>>
>>> You delete a bunch of tuples every night, so of course a vacuum full
>>> after that is going to return a lot of space. But that space is
>>> probably just going to be needed again the next day. If you don't do
>>> the vacuum full, does the *peak* space keep increasing, or does it
>>> stabilize?
>>>
>>> Cheers,
>>>
>>> Jeff
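[One way to answer the peak-space question empirically is to snapshot table and index sizes once a day and watch whether they plateau after a full retention cycle; a sketch, with 'size_history' and 'xyz' as hypothetical names:]

```sql
-- Hypothetical history table for tracking growth
CREATE TABLE size_history (
    captured_at  timestamptz DEFAULT now(),
    relname      name,
    heap_bytes   bigint,   -- table only
    total_bytes  bigint    -- table + indexes + TOAST
);

-- Run once a day (e.g. from cron). If total_bytes levels off once the
-- nightly deletes have cycled through, plain autovacuum is keeping up
-- and VACUUM FULL is unnecessary.
INSERT INTO size_history (relname, heap_bytes, total_bytes)
SELECT relname, pg_relation_size(oid), pg_total_relation_size(oid)
FROM pg_class
WHERE relkind = 'r' AND relname = 'xyz';
```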