Re: [GENERAL] How to raise index points when equal and like is used with gist ?
On 2012-10-12 01:14, Sergey Konoplev wrote:
> On Thu, Oct 11, 2012 at 2:23 AM, Condor wrote:
>> explain analyze SELECT * FROM table WHERE phone LIKE '12%' AND firstname = 'OLEG'
>>     AND middlename || lastname LIKE '%KUZNICOV%IGORU%';
>>
>>                                   QUERY PLAN
>> ------------------------------------------------------------------------------
>>  Bitmap Heap Scan on abonats_tbl  (cost=1638.89..1816.65 rows=1 width=601) (actual time=219.793..219.793 rows=0 loops=1)
>>    Recheck Cond: ((firstname = 'OLEG'::text) AND (phone ~~ '12%'::text))
>>    Filter: ((middlename || lastname) ~~ '%KUZNICOV%IGORU%'::text)
>>    Rows Removed by Filter: 65
>>    ->  BitmapAnd  (cost=1638.89..1638.89 rows=45 width=0) (actual time=219.197..219.197 rows=0 loops=1)
>>          ->  Bitmap Index Scan on table_firstname_idx  (cost=0.00..34.42 rows=1690 width=0) (actual time=0.867..0.867 rows=1732 loops=1)
>>                Index Cond: (firstname = 'OLEG'::text)
>>          ->  Bitmap Index Scan on table_phonegist_idx  (cost=0.00..1604.22 rows=33995 width=0) (actual time=217.639..217.639 rows=33256 loops=1)
>>                Index Cond: (phone ~~ '12%'::text)
>>  Total runtime: 220.426 ms
>>
>> My question is: is there any way to make PostgreSQL search first on the field with the equality condition (which is indexed) and only then filter the result on the other conditions, first the GiST one and then the rest? I think maybe I should play with the index costs.
>
> What about dropping the table_phonegist_idx index? Is it used somewhere else?

I'll try that tonight; no, it isn't used anywhere else.

> ps. BTW how do you cope with the pg_trgm ASCII alphanumeric restriction? Transliteration?

The tel field holds alphanumeric values, and sometimes an md5 hash of a phone number. The server is set up and started with CP1251 encoding.
Re: [GENERAL] How to raise index points when equal and like is used with gist ?
On 2012-10-12 03:27, Tom Lane wrote:
> Condor writes:
>> explain analyze SELECT * FROM table WHERE phone LIKE '12%' AND firstname = 'OLEG'
>>     AND middlename || lastname LIKE '%KUZNICOV%IGORU%';
>>
>>                                   QUERY PLAN
>> ------------------------------------------------------------------------------
>>  Bitmap Heap Scan on abonats_tbl  (cost=1638.89..1816.65 rows=1 width=601) (actual time=219.793..219.793 rows=0 loops=1)
>>    Recheck Cond: ((firstname = 'OLEG'::text) AND (phone ~~ '12%'::text))
>>    Filter: ((middlename || lastname) ~~ '%KUZNICOV%IGORU%'::text)
>>    Rows Removed by Filter: 65
>>    ->  BitmapAnd  (cost=1638.89..1638.89 rows=45 width=0) (actual time=219.197..219.197 rows=0 loops=1)
>>          ->  Bitmap Index Scan on table_firstname_idx  (cost=0.00..34.42 rows=1690 width=0) (actual time=0.867..0.867 rows=1732 loops=1)
>>                Index Cond: (firstname = 'OLEG'::text)
>>          ->  Bitmap Index Scan on table_phonegist_idx  (cost=0.00..1604.22 rows=33995 width=0) (actual time=217.639..217.639 rows=33256 loops=1)
>>                Index Cond: (phone ~~ '12%'::text)
>>  Total runtime: 220.426 ms
>
> You sure that server is 9.2? Because that looks like a planner bug we squelched some time ago, wherein it was way too enthusiastic about adding more indexes to a BitmapAnd.

Yes, I'm sure:

PostgreSQL 9.2.1 on x86_64-slackware-linux-gnu, compiled by x86_64-slackware-linux-gcc (GCC) 4.7.1, 64-bit

> If it is 9.2, please send a self-contained test case, that is some test data (and settings, if you're using nondefault ones) that makes it do this.

Hm ... I've run into a strange problem. When I try to reproduce it with a test table (I made a very small one: http://pastebin.com/nEK3cRr2), the same type of query gives a different plan:

 Seq Scan on users  (cost=0.00..1.12 rows=1 width=26) (actual time=0.014..0.016 rows=1 loops=1)
   Filter: ((tel ~~ '09%'::text) AND (firstname = 'GREG'::text) AND ((middlename || lastname) ~~ '%%'::text))
   Rows Removed by Filter: 5
 Total runtime: 0.042 ms
(4 rows)

Okay, maybe the problem is that I use CP1251 encoding... so I changed the data: dropped the table, inserted CP1251 values, ran VACUUM, and got the same speed (Total runtime: 0.052 ms) and the same type of scan:

 Seq Scan on users  (cost=0.00..1.14 rows=1 width=132) (actual time=0.019..0.021 rows=1 loops=1)
   Filter: ((tel ~~ '09%'::text) AND (firstname = 'CP1251 CHARS HERE'::text) AND ((middlename || lastname) ~~ '%%'::text))
   Rows Removed by Filter: 6
 Total runtime: 0.052 ms

Even without the tel field the result and type of scan are the same (Seq Scan). The first name here is written in Cyrillic and means "GREG" (I replaced it with CP1251 CHARS HERE, because some people don't have a Cyrillic encoding). But when I run the same query on the same database against a different table, it gives the strange Bitmap Heap Scan result. The indexed fields are the same as in the test table from the pastebin, no difference.

And here I should mention the history of that table, in case it matters: it was created on PostgreSQL 7.3 and has been migrated on every major upgrade of the server that required a dump/restore of the database.

Does anyone have an idea what is going wrong with that table? Why does the same query on two different tables with the same data give me different scan results?

Regards,
C
Re: [GENERAL] How to raise index points when equal and like is used with gist ?
On Fri, Oct 12, 2012 at 1:20 AM, Condor wrote:
> Even without the tel field the result and type of scan are the same (Seq Scan).

This is because your table has too few rows, so a seq scan is cheaper. Add more rows, e.g. 100 000, then ANALYZE the table and run the tests again. Use random() and generate_series() to generate the data.

> Now the first name is written in Cyrillic and means "GREG" (I replaced it with
> CP1251 CHARS HERE, because some people don't have a Cyrillic encoding). When I
> run the same query on the same database but a different table it gives the
> strange Bitmap Heap Scan result. The indexed fields are the same as in the
> test table from the pastebin, no difference.
>
> And here I should mention the history of the table: it was created on
> PostgreSQL 7.3 and has been migrated on every major upgrade of the server
> that required a dump/restore of the database, if that information is of
> any value.
>
> Does anyone have an idea what is going wrong with that table? Why does the
> same query on two different tables with the same data give me different
> scan results?
>
> Regards,
>
> C

--
Sergey Konoplev
a database and software architect
http://www.linkedin.com/in/grayhemp

Jabber: gray...@gmail.com Skype: gray-hemp Phone: +14158679984
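For example, a test set along those lines could be built like this (the table, column, and index names are made up for illustration, loosely following the pastebin example; the GiST index assumes the pg_trgm extension is installed):

 CREATE EXTENSION IF NOT EXISTS pg_trgm;

 -- Hypothetical test table
 CREATE TABLE users_test (
     tel        text,
     firstname  text,
     middlename text,
     lastname   text
 );

 -- Load 100 000 semi-random rows with generate_series() and random()
 INSERT INTO users_test (tel, firstname, middlename, lastname)
 SELECT lpad((random() * 999999999)::bigint::text, 9, '0'),
        (ARRAY['GREG', 'OLEG', 'IVAN', 'ANNA'])[1 + (random() * 3)::int],
        md5(random()::text),
        md5(random()::text)
 FROM generate_series(1, 100000);

 CREATE INDEX users_test_firstname_idx ON users_test (firstname);
 CREATE INDEX users_test_tel_gist_idx  ON users_test USING gist (tel gist_trgm_ops);

 ANALYZE users_test;

 EXPLAIN ANALYZE
 SELECT * FROM users_test
 WHERE tel LIKE '09%' AND firstname = 'GREG';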
Re: [GENERAL] Limit+Offset query wrong result in Postgres 9.0.3 ?
urkpostenardr wrote:
> Is this a bug in Postgres?
> If yes, is it fixed in the latest release?
> Shouldn't the second query return 2 rows instead of 1?
>
> create table t(i int);
> insert into t values(1);
> insert into t values(2);
> insert into t values(3);
>
> pgdb=# select i from t order by i limit 9223372036854775806 offset 1;
>  i
>  2
>  3
> (2 rows)
>
> pgdb=# select i from t order by i limit 9223372036854775807 offset 1;
>  i
>  2
> (1 row)
>
> My server version is postgres (PostgreSQL) 9.0.3

That looks like a bug somewhere; at least on my 9.2.0 on Linux I get the result that you would expect. Which operating system and architecture is that?

Yours,
Laurenz Albe
Re: [GENERAL] Limit+Offset query wrong result in Postgres 9.0.3 ?
On 12 October 2012 04:55, urkpostenardr wrote:
> Hi,
>
> Is this a bug in Postgres?
> If yes, is it fixed in the latest release?
> Shouldn't the second query return 2 rows instead of 1?
>
> create table t(i int);
> insert into t values(1);
> insert into t values(2);
> insert into t values(3);
>
> pgdb=# select i from t order by i limit 9223372036854775806 offset 1;
>  i
>  2
>  3
> (2 rows)
>
> pgdb=# select i from t order by i limit 9223372036854775807 offset 1;
>  i
>  2
> (1 row)

You seem to have hit the end of a 64-bit signed integer (9223372036854775807 is the largest bigint) and it wraps around. There's probably some internal code that modifies limit values < 1 to 1, or you wouldn't have gotten any results at all...

It does seem a fairly insane number to use for LIMIT; it's probably better to leave it out entirely if you're going to accept that many results.

--
If you can't see the forest for the trees,
Cut the trees and you'll see there is no forest.
[GENERAL] Strategies/Best Practises Handling Large Tables
Hi,

I currently have a table that is growing very quickly - i.e. 7 million records in 5 days. This table acts as a placeholder for statistics, and hence the records are merely inserted and never updated or deleted.

Many queries are run on this table to obtain trend analysis. However, these queries are now starting to take a very long time (hours) to execute due to the size of the table.

I have put indexes on this table, to no significant benefit. Some of the other strategies I have thought of:
1. Purge old data
2. Reindex
3. Partition
4. Creation of daily, monthly, yearly summary tables that contain aggregated data specific to the statistics required

Does anyone know what the best practice is to handle this situation?

I would appreciate knowledge sharing on the pros and cons of the above, or if there are any other strategies that I could put in place.

Thanking you in advance.
Re: [GENERAL] Strategies/Best Practises Handling Large Tables
On Fri, Oct 12, 2012 at 7:44 AM, Chitra Creta wrote:
> Hi,
>
> I currently have a table that is growing very quickly - i.e. 7 million
> records in 5 days. This table acts as a placeholder for statistics, and
> hence the records are merely inserted and never updated or deleted.
>
> Many queries are run on this table to obtain trend analysis. However, these
> queries are now starting to take a very long time (hours) to execute due to
> the size of the table.
>
> I have put indexes on this table, to no significant benefit. Some of the
> other strategies I have thought of:
> 1. Purge old data
> 2. Reindex
> 3. Partition
> 4. Creation of daily, monthly, yearly summary tables that contain
> aggregated data specific to the statistics required
>
> Does anyone know what the best practice is to handle this situation?
>
> I would appreciate knowledge sharing on the pros and cons of the above, or
> if there are any other strategies that I could put in place.

Partitioning is probably your best solution. 3 & 4 sound like variations on the same thing.

Before you go that route, you should make sure that your bottleneck is really a result of the massive amount of data, and not some other problem. Are you sure that the indices you created are being used, and that you have all the indices that you need for your queries? Look at the query plan output from EXPLAIN, and/or post here if you're unsure.

Reindexing shouldn't make a difference unless something is wrong with the indices that you already have in place. Purging old data is only a good solution if you do not need the data, and never will need the data.
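As a minimal sketch of that kind of check (the table, column, and index names below are hypothetical, since the actual schema wasn't posted):

 -- Does a typical trend query use an index at all?
 EXPLAIN ANALYZE
 SELECT date_trunc('day', created_at) AS day, count(*)
 FROM stats
 WHERE created_at >= now() - interval '7 days'
 GROUP BY 1
 ORDER BY 1;

 -- If the plan shows a sequential scan over the whole table for a query
 -- that only touches a narrow date range, an index on the filter column
 -- may help:
 CREATE INDEX CONCURRENTLY stats_created_at_idx ON stats (created_at);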
Re: [GENERAL] Postgres DB Migration from 8.3 to 9.1
On Fri, Oct 12, 2012 at 7:45 AM, Vishalakshi Navaneethakrishnan <nvishalak...@sirahu.com> wrote:
> Hi Friends,
>
> We have our production environment database server on Postgres 8.3. We have
> planned to upgrade to the latest version, 9.1. A dump from 8.3 and restore
> into Postgres 9.1 takes more than 5 hours. Is there any other, quicker
> method to upgrade from 8.3 to 9.1? We need to reduce our downtime to below
> 1 hour. Any possibilities?
>
> Thanks in Advance.
>
> --
> Best Regards,
> Vishalakshi.N

Try using the -j option to speed up the restore process. See http://www.postgresql.org/docs/9.1/static/app-pgrestore.html . Not sure though whether it will bring it within your range.

Amitabh
Re: [GENERAL] Strategies/Best Practises Handling Large Tables
Hi,

On 13 October 2012 01:44, Chitra Creta wrote:
> I currently have a table that is growing very quickly - i.e. 7 million
> records in 5 days. This table acts as a placeholder for statistics, and
> hence the records are merely inserted and never updated or deleted.
>
> Many queries are run on this table to obtain trend analysis. However, these
> queries are now starting to take a very long time (hours) to execute due to
> the size of the table.

Have you changed the autovacuum settings to make it more aggressive? Another option is to run ANALYZE after loading.

> I have put indexes on this table, to no significant benefit. Some of the
> other strategies I have thought of:
> 1. Purge old data
> 3. Partition

Those two go together. Partitioning is useful if you can constrain queries to specific ranges, i.e. "this query needs the last two days". You shouldn't go over 200 - 300 partitions per table. Partition granularity should match the amount of data in an average query: if you run weekly queries then use weekly partitions (bi-weekly or daily partitions might work, but I do not have good experience with them). Purging old data is easy, because you only need to drop the unwanted partitions (no table/index bloat). Loading is a little bit tricky, because you have to load the data into the right partition.

> 4. Creation of daily, monthly, yearly summary tables that contain
> aggregated data specific to the statistics required

I think this is the way to hell. You start with a few tables and then you add more tables until you realise that it takes longer to update them than to run your queries :)

You might benefit from query parallelisation, for example pgpool-II, Stado, Postgres XC, or from not using Postgres at all; for example, any column-oriented NoSQL database might be a good choice.

--
Ondrej Ivanic
(ondrej.iva...@gmail.com)
(http://www.linkedin.com/in/ondrejivanic)
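As a rough sketch of what range partitioning looked like in the 9.x era (inheritance plus CHECK constraints; all table and column names below are made up for illustration):

 -- Parent table; the children hold the actual rows
 CREATE TABLE stats (
     created_at timestamptz NOT NULL,
     metric     text        NOT NULL,
     value      bigint      NOT NULL
 );

 -- One child per week, with a CHECK constraint the planner can use
 CREATE TABLE stats_2012_w41 (
     CHECK (created_at >= '2012-10-08' AND created_at < '2012-10-15')
 ) INHERITS (stats);

 CREATE INDEX stats_2012_w41_created_at_idx ON stats_2012_w41 (created_at);

 -- With constraint_exclusion = partition (the default), the planner
 -- skips children whose CHECK constraint rules them out:
 EXPLAIN SELECT count(*)
 FROM stats
 WHERE created_at >= '2012-10-08' AND created_at < '2012-10-10';

Loading then means inserting directly into the right child table, or routing rows through a trigger on the parent.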
Re: [GENERAL] Postgres DB Migration from 8.3 to 9.1
On 10/12/2012 08:05 AM, Amitabh Kant wrote:
> On Fri, Oct 12, 2012 at 7:45 AM, Vishalakshi Navaneethakrishnan
> <nvishalak...@sirahu.com> wrote:
>> Hi Friends,
>>
>> We have our production environment database server on Postgres 8.3. We have
>> planned to upgrade to the latest version, 9.1. A dump from 8.3 and restore
>> into Postgres 9.1 takes more than 5 hours. Is there any other, quicker
>> method to upgrade from 8.3 to 9.1? We need to reduce our downtime to below
>> 1 hour. Any possibilities?
>>
>> Thanks in Advance.
>>
>> --
>> Best Regards,
>> Vishalakshi.N
>
> Try using the -j option to speed up the restore process. See
> http://www.postgresql.org/docs/9.1/static/app-pgrestore.html . Not sure
> though whether it will bring it within your range.
>
> Amitabh

Note that one issue with the -j option is that it requires the input to be a regular file rather than a pipe, so you have to wait until you have a complete dump stored on disk somewhere before you can start the restore. This delay may offset, eliminate or overshadow any benefit from the parallel-restore speedup.

pg_upgrade does support upgrades from 8.3: http://www.postgresql.org/docs/current/static/pgupgrade.html but you will need to set up a dev system to become familiar with the process.

Depending on the nature of your data, you may be able to roll your own. Some of our systems have large tables of data that, once collected, remain static. If you have that type of situation you may be able to pre-migrate historical data and then have a reduced window to migrate recent/live data.

Cheers,
Steve
Re: [GENERAL] Strategies/Best Practises Handling Large Tables
On Sat, Oct 13, 2012 at 01:44:02AM +1100, Chitra Creta wrote:
> Hi,
>
> I currently have a table that is growing very quickly - i.e. 7 million
> records in 5 days. This table acts as a placeholder for statistics, and
> hence the records are merely inserted and never updated or deleted.
>
> Many queries are run on this table to obtain trend analysis. However, these
> queries are now starting to take a very long time (hours) to execute due to
> the size of the table.
>
> I have put indexes on this table, to no significant benefit. Some of the
> other strategies I have thought of:
> 1. Purge old data
> 2. Reindex
> 3. Partition
> 4. Creation of daily, monthly, yearly summary tables that contain
> aggregated data specific to the statistics required
>
> Does anyone know what the best practice is to handle this situation?

Three and four will probably be your biggest wins. We do both. There are trade-offs for both. If you need the absolute best in response times (e.g., in a web application), summary tables are the way to go. If you're regularly querying data in a specific date range, but response times aren't as critical (e.g., a daily report generated at the end of the day), partitioning will also help there. Having the correct indexes is crucial, as always.

Additionally, you will be able to purge old data relatively quickly and without bloat using the partitioning method.

> I would appreciate knowledge sharing on the pros and cons of the above, or
> if there are any other strategies that I could put in place.
>
> Thanking you in advance.

-Ryan Kelly
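A summary table can be as simple as a daily rollup refreshed by a scheduled job (the names below continue the hypothetical stats table from the partitioning sketch above):

 -- Daily rollup table
 CREATE TABLE stats_daily (
     day         date   NOT NULL,
     metric      text   NOT NULL,
     value_sum   bigint NOT NULL,
     value_count bigint NOT NULL,
     PRIMARY KEY (day, metric)
 );

 -- Run once per day (e.g. from cron) to aggregate the previous day
 INSERT INTO stats_daily (day, metric, value_sum, value_count)
 SELECT created_at::date, metric, sum(value), count(*)
 FROM stats
 WHERE created_at >= date_trunc('day', now() - interval '1 day')
   AND created_at <  date_trunc('day', now())
 GROUP BY 1, 2;

Trend queries then read the small stats_daily table instead of scanning the raw data.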
[GENERAL] PostgreSQL and WMS
Dear all,

I would like to know if it is possible to set up a service (WMS/WFS) for a layer stored within Postgres. I have a WEBGIS application that can't connect to PostgreSQL directly, but if the data inside Postgres is exposed through WMS I can use the WMS URL to add the maps.

Best Regards,
José Santos
Re: [GENERAL] Postgres DB Migration from 8.3 to 9.1
On Thu, Oct 11, 2012 at 07:38:07PM -0700, John R Pierce wrote:
> On 10/11/12 7:15 PM, Vishalakshi Navaneethakrishnan wrote:
>> We have our production environment database server on Postgres 8.3. We
>> have planned to upgrade to the latest version, 9.1. A dump from 8.3 and
>> restore into Postgres 9.1 takes more than 5 hours. Is there any other,
>> quicker method to upgrade from 8.3 to 9.1? We need to reduce our
>> downtime to below 1 hour. Any possibilities?
>
> is it the dump or the restore taking the lion's share of that time?
>
> I don't know if pg_upgrade supports 8.3, but that would be one approach.
> Getting it set up to work correctly can require some tinkering, but once
> you have that sorted, you start with a base backup of the 8.3 file
> system, and pg_upgrade 'converts' it to the newer version. You need both
> runtimes set up side by side so either can be run, as pg_upgrade will
> need to start the old version in order to dump its metadata catalogs
> prior to migrating the data files. If you put both data directories on
> the same file system, it can use hard linking to 'move' the data files.

Upgrading with pg_upgrade from 8.3 is going to require 9.2 to be compiled with --disable-integer-datetimes.

--
Bruce Momjian  http://momjian.us
EnterpriseDB   http://enterprisedb.com

+ It's impossible for everything to be true. +
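One way to see how a given cluster stores datetimes before attempting pg_upgrade is to ask the server itself; integer_datetimes is a standard read-only parameter, so this works on both the old and the new cluster (the "off" value shown is only an example, a particular 8.3 build may differ):

 SHOW integer_datetimes;
 --  integer_datetimes
 -- -------------------
 --  off
 -- pg_upgrade needs the old and new clusters to use the same datetime storage.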
Re: [GENERAL] Strategies/Best Practises Handling Large Tables
On 10/12/12 7:44 AM, Chitra Creta wrote:
> 1. Purge old data
> 2. Reindex
> 3. Partition
> 4. Creation of daily, monthly, yearly summary tables that contain
> aggregated data specific to the statistics required

If most of your queries read the majority of the table, indexing will be of little help.

Partitioning will aid in purging old data, as you can partition by date (for instance, by week) and drop whole partitions rather than deleting individual records.

Aggregate tables will likely be the biggest win for your statistics if they reduce the amount of data you need to query.

--
john r pierce                          N 37, W 122
santa cruz ca                       mid-left coast
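For example, with the week-based child tables from the earlier partitioning sketch (hypothetical names), purging a week of expired data becomes a metadata-only operation:

 -- Instead of a slow, bloat-producing bulk delete:
 --   DELETE FROM stats WHERE created_at < '2012-01-09';
 -- simply drop the child table that holds the expired range:
 DROP TABLE stats_2012_w01;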
Re: [GENERAL] PostgreSQL and WMS
Yes, it is. I forget exactly how to do it, but you just need to add the connection information in the WMS configuration file.

From: José Pedro Santos
To: Postgres Ajuda
Sent: Friday, October 12, 2012 5:46 PM
Subject: [GENERAL] PostgreSQL and WMS

Dear all,

I would like to know if it is possible to set up a service (WMS/WFS) for a layer stored within Postgres. I have a WEBGIS application that can't connect to PostgreSQL directly, but if the data inside Postgres is exposed through WMS I can use the WMS URL to add the maps.

Best Regards,
José Santos
Re: [GENERAL] Limit+Offset query wrong result in Postgres 9.0.3 ?
On Fri, Oct 12, 2012 at 3:33 AM, Alban Hertroys wrote:
> On 12 October 2012 04:55, urkpostenardr wrote:
>> Hi,
>>
>> Is this a bug in Postgres?
>> If yes, is it fixed in the latest release?
>> Shouldn't the second query return 2 rows instead of 1?
>>
>> create table t(i int);
>> insert into t values(1);
>> insert into t values(2);
>> insert into t values(3);
>>
>> pgdb=# select i from t order by i limit 9223372036854775806 offset 1;
>>  i
>>  2
>>  3
>> (2 rows)
>>
>> pgdb=# select i from t order by i limit 9223372036854775807 offset 1;
>>  i
>>  2
>> (1 row)
>
> You seem to have hit the end of a 64-bit signed integer and it wraps
> around. There's probably some internal code that modifies limit values
> < 1 to 1, or you wouldn't have gotten any results at all...
>
> It does seem a fairly insane number to use for LIMIT; it's probably
> better to leave it out if you're going to accept that many results.

This was previously reported as bug #6139, and fixed in 89df948ec26679e09.

Josh
[GENERAL] tablespace_oid alias?
Hi all,

I want to get the databases that correspond to a certain tablespace through a call to pg_tablespace_databases(tablespace_oid). Which would be the OID alias type for tablespace_oid? I've tried:

 # select pg_tablespace_databases('pg_default'::XXX);

with XXX as any of the OID aliases mentioned in the manual's chapter 8 section on "Object Identifier Types", but without success.

Any hints? Am I missing something?

TIA,
Thalis K.
Re: [GENERAL] Postgres DB Migration from 8.3 to 9.1
On Thu, Oct 11, 2012 at 8:15 PM, Vishalakshi Navaneethakrishnan wrote:
> Hi Friends,
>
> We have our production environment database server on Postgres 8.3. We have
> planned to upgrade to the latest version, 9.1. A dump from 8.3 and restore
> into Postgres 9.1 takes more than 5 hours. Is there any other, quicker
> method to upgrade from 8.3 to 9.1? We need to reduce our downtime to below
> 1 hour. Any possibilities?

As the downtime required approaches zero, the likelihood of needing Slony approaches 1.
Re: [GENERAL] tablespace_oid alias?
Thalis Kalfigkopoulos writes:
> Which would be the OID alias type for tablespace_oid?

There is none. Use "select oid from pg_tablespace where spcname = 'whatever'".

(In general, OID alias types only get invented for object types where the lookup rules are more complicated than that, e.g. where you have schema qualification or arguments to think about.)

regards, tom lane
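Putting the two together, one way to list the databases that have objects in a given tablespace (any tablespace name can be substituted for 'pg_default'):

 SELECT d.datname
 FROM pg_database d
 WHERE d.oid IN (SELECT pg_tablespace_databases(t.oid)
                 FROM pg_tablespace t
                 WHERE t.spcname = 'pg_default');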
Re: [GENERAL] Postgres DB Migration from 8.3 to 9.1
Hi all,

While testing the upgrade facility, I installed Postgres 9.2 from the source package, because it needs to be compiled with --disable-integer-datetimes. I used this command:

 ./configure --prefix=/opt/PostgreSQL/9.2/ --disable-integer-datetimes --without-readline

After installation I found the pg_upgrade binary missing from the bin directory. How can I get it?

Thanks in Advance.

On Sat, Oct 13, 2012 at 3:37 AM, Scott Marlowe wrote:
> On Thu, Oct 11, 2012 at 8:15 PM, Vishalakshi Navaneethakrishnan wrote:
>> Hi Friends,
>>
>> We have our production environment database server on Postgres 8.3. We have
>> planned to upgrade to the latest version, 9.1. A dump from 8.3 and restore
>> into Postgres 9.1 takes more than 5 hours. Is there any other, quicker
>> method to upgrade from 8.3 to 9.1? We need to reduce our downtime to below
>> 1 hour. Any possibilities?
>
> As the downtime required approaches zero, the likelihood of needing Slony
> approaches 1.

--
Best Regards,
Vishalakshi.N
Re: [GENERAL] Postgres DB Migration from 8.3 to 9.1
On 10/12/12 9:52 PM, Vishalakshi Navaneethakrishnan wrote:
> While testing the upgrade facility, I installed Postgres 9.2 from the source
> package, because it needs to be compiled with --disable-integer-datetimes.
> I used this command:
>
>  ./configure --prefix=/opt/PostgreSQL/9.2/ --disable-integer-datetimes --without-readline
>
> After installation I found the pg_upgrade binary missing from the bin
> directory. How can I get it?

pg_upgrade is part of the optional 'contrib' packages; these have to be built explicitly. From your top-level source directory, try:

 make contrib
 make install
 ...

--
john r pierce                          N 37, W 122
santa cruz ca                       mid-left coast
Re: [GENERAL] Help estimating database and WAL size
On 2012-10-08, Daniel Serodio (lists) wrote:
> We are preparing a PostgreSQL database for production usage and we need to
> estimate the storage size for this database. We're a team of developers with
> low expertise on database administration, so we are doing research, reading
> manuals and using our general IT knowledge to achieve this.
>
> We have actual data to migrate to this database and some rough estimations
> of growth. For the sake of the example, let's say we have an estimation of
> growth of 50% per year.
>
> The point is: what's the general proper technique for doing a good size
> estimation?
>
> We are estimating the storage usage by the following rules. Topics where we
> need advice are marked with ** asterisks **. Feedback on the whole process
> is more than welcome.
>
> 1) Estimate the size of each table
>    1.1) Discover the actual size of each row.
>         - For fields with a fixed size (like bigint, char, etc) we used the
>           sizes described in the documentation
>         - For fields with a dynamic size (like text) we estimated the string
>           length and used the function select pg_column_size('expected text
>           here'::text)

Long text is subject to compression; pg_column_size doesn't seem to test compression. The compression is some sort of LZ.

>         - We added 4 more bytes for the OID that PostgreSQL uses internally

OID is optional; IIRC PGXID is not.

>    1.2) Multiply the size of each row by the number of estimated rows
>    ** Do I need to consider any overhead here, like row or table metadata? **

Page size is 8K. Column overhead is 1 byte per not-NULL column; NULLs are free.

> 2) Estimate the size of each table index
>    ** Don't know how to estimate this, need advice here **

IIRC, (data being indexed + 8 bytes) / fill factor.

> 3) Estimate the size of the transaction log
>    ** We've got no idea how to estimate this, need advice **

How big are your transactions?

> 4) Estimate the size of the backups (full and incremental)
>    ** Don't know how to estimate this, need advice here **

That depends on the format you use; backups tend to compress well.

> 5) Sum all the estimates for the actual minimum size

No, you get the estimated size.

> 6) Apply a factor of 1.5x (the 50% growth) to the sum of the estimates 1, 2
>    and 4 for the minimum size after 1 year
>
> 7) Apply an overall factor of 1.2 ~ 1.4 (20% to 40% more) to estimates 5 and
>    6 for a good safety margin
>
> I know the rules got pretty extensive, please let me know if you need more
> data or examples for a better understanding.
>
> We've also posted this question to http://dba.stackexchange.com/q/25617/10166
>
> Thanks in advance,
> Daniel Serodio

--
⚂⚃ 100% natural
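Once some representative data has been loaded, the estimates can be sanity-checked against what Postgres actually reports. These functions are standard; only the table name is a placeholder:

 -- Size of a single value as passed in; note that values stored in a
 -- table may additionally be TOAST-compressed, which this call on a
 -- bare literal does not reflect:
 SELECT pg_column_size('expected text here'::text);

 -- Actual on-disk sizes of a loaded sample table:
 SELECT pg_size_pretty(pg_relation_size('my_table'))       AS heap_only,
        pg_size_pretty(pg_indexes_size('my_table'))        AS indexes,
        pg_size_pretty(pg_total_relation_size('my_table')) AS total_incl_toast;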
Re: [GENERAL] Help estimating database and WAL size
On 10/08/12 1:39 PM, Daniel Serodio (lists) wrote:
> 3) Estimate the size of the transaction log
>    ** We've got no idea how to estimate this, need advice **

postgres doesn't have a 'transaction log', it has the WAL (Write-Ahead Logs). These are typically 16MB each. On databases with a really heavy write load, I might bump checkpoint_segments as high as 60, which seems to result in about 120 of them being created, 2GB total. These files get reused, unless you are archiving them to implement a continuous realtime backup system (which enables "PITR", Point-in-Time Recovery).

--
john r pierce                          N 37, W 122
santa cruz ca                       mid-left coast
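To check how much space the WAL segments actually occupy on a running 9.x cluster (the directory is pg_xlog in these versions, and pg_ls_dir requires superuser):

 SELECT count(*) AS wal_segments,
        pg_size_pretty(sum((pg_stat_file('pg_xlog/' || f)).size)::bigint) AS wal_size
 FROM pg_ls_dir('pg_xlog') AS f
 WHERE f ~ '^[0-9A-F]{24}$';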