Re: [GENERAL] Integrity on large sites
Tom Lane wrote:
> Dave Page <[EMAIL PROTECTED]> writes:
>> I can't imagine Flickr or Slashdot ...
>
> I'm pretty sure I remember reading that Slashdot had to put enormous
> amounts of cacheing in front of their DB to keep it from falling over
> on a regular basis.

Yes, slashdot and flickr both use memcached:

http://lists.danga.com/pipermail/memcached/2005-November/001726.html
  "...we use both memcached and squid at flickr. ..."

http://lists.danga.com/pipermail/memcached/2003-July/17.html
  "...Slashdot's been using memcached to serve its comment text"
Re: [GENERAL] Vacuum DB in Postgres Vs similar concept in other RDBMS
Tom Lane wrote:
> Exactly. VACUUM sucks (ahem) in all ways but one: it pushes the
> maintenance costs associated with MVCC out of the foreground query code
> paths and into an asynchronous cleanup task. AFAIK we are the only DBMS
> that does it that way. Personally I believe it's a fundamentally
> superior approach --- because when you are under peak load you can defer
> the cleanup work --- but you do need to pay attention to make sure that
> the async cleanup isn't postponed too long. We're still fooling around
> with autovacuum and related tuning issues to make it work painlessly...

Should this paragraph be added to the FAQ here?
http://www.postgresql.org/docs/faqs.FAQ.html
Re: [GENERAL] Faster data type for one-length values
Alvaro Herrera wrote:
> André Volpato wrote:
>> The ammount of space saved seems pretty clear to me.
>
> Yeah, zero most of the time due to alignment.

So trading off more I/O for less CPU? I wonder if for any I/O bound
database servers it might be worth packing tightly rather than aligning
indexes on one-byte data.
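For what it's worth, here's a minimal sketch of watching that padding
happen - pg_column_size() on row values that differ only in field order
(the values are made up):

  -- An int4 placed after a 1-byte "char" is aligned to a 4-byte
  -- boundary, costing 3 padding bytes; int4-first avoids that.
  SELECT pg_column_size(ROW('y'::"char", 1::int4)) AS char_then_int,
         pg_column_size(ROW(1::int4, 'y'::"char")) AS int_then_char;

The two sizes should differ by the padding bytes.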
Re: [GENERAL] Linux distro
Chris Browne wrote:
> The server does not need the overhead of having *any* of the "X
> desktop" things running; it doesn't even need an X server.
>
> You don't need X running on the server in order use those "enterprise
> management" tools; indeed, in a "lights out" environment, that server
> hasn't even got a graphics card, which means that an X server *can't*
> be running on it.

Well, sure it can. Nothing says an X server has to write directly to a
graphics card or anything. It could write to some frame buffer in memory
and allow access to it through VNC, for example.
http://www.cl.cam.ac.uk/research/dtg/attarchive/vnc/xvnc.html

But of course I agree it makes no sense for a database server to also be
running a GUI server; and ideally not even wasting electricity and space
in the box for a graphics chip or monitor connector.
Re: [GENERAL] tsearch2: plainto_tsquery() with OR?
Tom Lane wrote:
> Oleg Bartunov <[EMAIL PROTECTED]> writes:
>> On Wed, 8 Aug 2007, cluster wrote:
>>> Does anyone know where I can request an OR-version of plainto_tsquery()?
>
>> plainto_tsquery expects plain text, use to_tsquery for boolean operators.
>
> Are either of these definitions really right? If I type "foo bar baz"
> into Google, for instance, it seems to produce some sort of weighted
> result, neither a strict AND nor a strict OR. Google didn't get where
> they are by misjudging what the simplest search behavior should be like.

For what it's worth, Google states [1]

  Automatic "and" queries
  By default, Google only returns pages that include all of your search
  terms. There is no need to include "and" between terms. Keep in mind
  that the order in which the terms are typed will affect the search
  results. To restrict a search further, just include more terms. For
  example, to plan a vacation to Hawaii, simply type vacation hawaii.

and also describes "OPERATOR EXAMPLE...vacation hawaii" as "FINDS PAGES
CONTAINING...the words vacation and Hawaii" [2]. If I'm not mistaken, it
sounds the same as what tsearch describes.

[1] http://www.google.com/intl/en/help/basics.html#and
[2] http://www.google.com/intl/en/help/cheatsheet.html
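A quick illustration of the two behaviors being contrasted (using the
8.3 built-in function names; the tsearch2 contrib spellings were similar):

  -- plainto_tsquery ANDs the words together, like Google's default:
  SELECT plainto_tsquery('english', 'foo bar baz');
  --  'foo' & 'bar' & 'baz'

  -- to_tsquery lets the caller spell out booleans, including OR:
  SELECT to_tsquery('english', 'foo | bar | baz');
  --  'foo' | 'bar' | 'baz'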
Re: [GENERAL] Transactional DDL
Scott Marlowe wrote:
> On 8/14/07, Harpreet Dhaliwal <[EMAIL PROTECTED]> wrote:
>> Hi,
>> I read a few lines about SP compilation in postgres
>> http://searchoracle.techtarget.com/originalContent/0,289142,sid41_gci1179016,00.html
>>
>> 1. stored procedure compilation is transactional.
>> "You can recompile a stored procedure on a live system, and only
>> transactions starting after that compilation will see the changes," he
>> said. "Transactions in process can complete with the old version.
>> Oracle just blocks on the busy procedure."

Really? When I tried it [1] - changing a function definition in the
middle of a long-running query that used the function - I got the
surprising result that some rows were processed using the old definition
of the function and some with the new one. The explanation from Tom [2]
was that there was some good reason function lookups used SnapshotNow.

   Ron

[1] http://archives.postgresql.org/pgsql-bugs/2005-04/msg00163.php
    I have a long query something like
      select slow_function(col) from large_table;
    and half way through the query, in a separate connection, I
    CREATE OR REPLACE slow_function
    I was surprised to see that some of the rows in my select were
    processed by the old definition and some by the new.

[2] http://archives.postgresql.org/pgsql-bugs/2005-04/msg00179.php
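A minimal sketch of how to reproduce [1] (made-up names; whether you
actually see the mix depends on timing and server version):

  -- Session 1: a deliberately slow function, then a long query using it.
  CREATE FUNCTION slow_function(i int) RETURNS int AS $$
  BEGIN
    PERFORM pg_sleep(0.01);   -- make each row take a while
    RETURN 1;                 -- "old" definition tags rows with 1
  END $$ LANGUAGE plpgsql;

  SELECT slow_function(g) FROM generate_series(1, 10000) g;

  -- Session 2, while session 1 is still running:
  CREATE OR REPLACE FUNCTION slow_function(i int) RETURNS int AS $$
  BEGIN
    PERFORM pg_sleep(0.01);
    RETURN 2;                 -- "new" definition tags rows with 2
  END $$ LANGUAGE plpgsql;

  -- Session 1's single result set could then contain a mix of 1s and 2s.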
Re: [GENERAL] Interpreting statistics collector output
Decibel! wrote:
> On Aug 15, 2007, at 2:11 PM, Gregory Stark wrote:
>> "Decibel!" <[EMAIL PROTECTED]> writes:
>>> On Wed, Aug 15, 2007 at 01:26:02PM -0400, Steve Madsen wrote:
>>>> On Aug 15, 2007, at 11:52 AM, Decibel! wrote:
>>>>> I can't really think of a case where a seqscan wouldn't return all
>>>>> the rows in the table... that's what it's meant to do.
>>
>> LIMIT
>
> Ok, you got me. :P But normally you wouldn't do a LIMIT without some
> kind of an ORDER BY, which would mean scanning the whole table.

Seems "where exists" does it too, no?

test=# explain analyze select 1 where exists (select * from bigtbl);
                                  QUERY PLAN
------------------------------------------------------------------------------
 Result  (cost=9681.61..9681.62 rows=1 width=0) (actual time=0.033..0.034 rows=1 loops=1)
   One-Time Filter: $0
   InitPlan
     ->  Seq Scan on bigtbl  (cost=0.00..9681.61 rows=140461 width=443) (actual time=0.027..0.027 rows=1 loops=1)
 Total runtime: 0.177 ms
(5 rows)
Re: [GENERAL] language interface in postgresql
David Fetter wrote:
>> Dollar-quoting is a cute technical solution to that, but you can't
>> deny that it's simpler if you just restrict the function language to
>> be SQL-ish so that CREATE FUNCTION can parse it without any
>> interesting quoting rules. So sayeth Oracle and the SQL standards
>> committee, anyway.
>
> I think the aforementioned entities got it wrong where we got it right :)

Though Oracle says [1] 'Java-based stored procedures are a standard (ANSI
SQL-1999 SQLJ-Part-1 a.k.a. "SQL routines using Java") database-independent
alternative to proprietary procedural extensions to SQL'; so I guess they
did both think of some flexibility in that regard, no?

[1] http://www.oracle.com/technology/sample_code/tech/java/jsp/samples/jwcache/Abstract.html
Re: [GENERAL] Enterprise Wide Deployment
Scott Marlowe wrote:
> On 8/14/07, john_sm <[EMAIL PROTECTED]> wrote:
>> Hey guys, for an enterprise wide deployment, what will you suggest and
>> why among - Red Hat Linux, Suse Linux and Ubuntu Linux, also, do you
>> think, we can negotiate the support pricing down?
>
> It's more about your skill set and familiarity than whether any one of
> those will work well enough.

One more consideration - consider using one of the distros supported by
your hardware vendor for your model of server. There's nothing more
frustrating than having hardware tech support point fingers at a software
vendor and software tech support point fingers at the hardware vendor.

HP supports Debian, Red Hat, and Novell, at least on servers. I'm not
sure what they'd do if you call them about Ubuntu or CentOS.
Re: [GENERAL] Autovacuum not vacuuming pg_largeobject
Denis Gasparin wrote:
>> Yeah, you're wrong. The difference is that plain vacuum does not try
>> very hard to reduce the length of a table file --- it just frees up
>> space within the file for reuse. vacuum full will actually move things
>> from the end of the file to free space nearer the head of the file,
>> so that it can shorten the file.
>>
>> What I suspect the above observations really prove is you don't have
>> max_fsm_pages set high enough, and so pg_largeobject was bloating
>> because the free space was being forgotten instead of reused.
>
> I tried to issue the vacuum command on one of my database and i got the
> following log messages:
>
>   LOG: max_fsm_relations(1000) equals the number of relations checked
>   HINT: You have at least 1000 relations. Consider increasing the
>   configuration parameter
>
> I suspect I must increase max_fsm_relations.
> The value of max_fsm_pages is 2 (the default value).
> I suspect I must change this to a higher value... but how high should
> this value be?

First increase max_fsm_relations to more than the number of tables, etc.
in your system. Then, after it's running a while, do the same vacuum
command and see if it gives you a hint to increase that parameter too.
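For what it's worth, a rough recipe for picking the values (the numbers
below are placeholders, not recommendations):

  -- How many relations does the free space map need to track?
  SELECT count(*) FROM pg_class WHERE relkind IN ('r', 'i', 't');

  -- Check the current settings...
  SHOW max_fsm_relations;   -- should exceed the count above
  SHOW max_fsm_pages;       -- raise if VACUUM VERBOSE keeps hinting

  -- ...then set larger values in postgresql.conf and restart, e.g.:
  --   max_fsm_relations = 2000
  --   max_fsm_pages     = 200000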
Re: [GENERAL] an other provokative question??
Dann Corbit wrote:
> All of the database systems that I know of that use this
> column-oriented scheme are in-memory database systems. I don't know if
> Mr. Stonebraker's is also.

KDB+ (http://kx.com/) is column-oriented and has both on-disk and
in-memory capabilities: http://kx.com/faq/#6

It's been around since 1998; and both column- and row-oriented databases
are still around, so I think it'd be more fair to say both are mature
technologies.
Re: [GENERAL] an other provokative question??
Greg Sabino Mullane wrote:
>> ...in favor of renaming the database "Horizontica".
>
> ...should definitely be "HorizonticaSQL"

Surely that should be capitalized "HorizonticASQL", no?
Re: [GENERAL] select random order by random
Chris Browne wrote:
> If I replicate your query, with extra columns, AND NAMES, I get the
> following:
>
> [EMAIL PROTECTED]:5433=# select random() as r1, random() as r2,
>                          random() as r3
>                          from generate_series(1,10) order by random();
>          r1         |        r2         |        r3
> --------------------+-------------------+-------------------
>  0.0288224648684263 | 0.904462072532624 |  0.27792159980163
>   0.144174488261342 | 0.406729203648865 | 0.452183415647596
> ...
> It is indeed somewhat curious that the query parser chose to interpret
> that the "order by random()" was referring to column #1.

And even more curiously, IMHO, even specifying column names isn't enough.
Note that this:

li=# select * from (select (random()*10)::int as a, (random()*10)::int as b
     from generate_series(1,10) order by a) as x order by b;
 a | b
---+----
 0 |  8
 1 | 10
 3 |  4
 4 |  8
 5 |  1
 5 |  9
 6 |  4
 6 |  5
 8 |  4
 9 |  0
(10 rows)

is sorted by "a" even though the outermost "order by" clause explicitly
said to order by "b". Seems like it's a known odd behavior ...

http://archives.postgresql.org/pgsql-general/2006-11/msg01523.php
http://archives.postgresql.org/pgsql-general/2006-11/msg01539.php
http://archives.postgresql.org/pgsql-general/2006-11/msg01544.php
Re: [GENERAL] Primary Key
Joshua D. Drake wrote:
> On Mon, 26 Nov 2007 10:28:03 -0800 (PST)
> Richard Broersma Jr <[EMAIL PROTECTED]> wrote:
>> --- On Mon, 11/26/07, Joshua D. Drake <[EMAIL PROTECTED]> wrote:
>>> In "theory" the item that would be a natural key
>>> in this instance is the VIN.

And you then need to deal with cars that have mismatched body VIN numbers
and engine VIN numbers? It's not uncommon on older cars or crash-damaged
cars.
Re: [GENERAL] top posting
Gregory Stark wrote:
> We're not goldfish, we can remember the topic of discussion for at
> least a few hours.

So can goldfish. Apparently they have a 3-month+ memory.
http://nootropics.com/intelligence/smartfish.html
http://en.wikipedia.org/wiki/MythBusters_(season_1)#Goldfish_Memory

With a memory like that, perhaps a goldfish should replace the elephant
for the mascot - or did the mysql guys already take that one?
Re: [GENERAL] Your opinion about PostgreSQL?
Cindy Parker wrote:
> ...choice of Oracle or PostgreSQL for the back-end...
> Can you help me compare PostgreSQL to SQL Server and/or Oracle? Do
> you know of any websites or blogs that discuss these issues? ...
> I did look at http://sql-info.de/postgresql/postgres-gotchas.html, an
> excellent page. Any others like this that you could recommend would
> be helpful.

I think this is one of the more useful database comparison pages:
http://troels.arvin.dk/db/rdbms/

It compares all the major players (Oracle, MS, My, PostgreSQL) not just
against each other but against the SQL standards as well.
Re: [GENERAL] count(*) and bad design was: Experiences with extensibility
Chris Browne wrote:
> [EMAIL PROTECTED] (Zoltan Boszormenyi) writes:
>> SELECT COUNT(*)
> [Waving hands for a moment]

Would what Chris describes below be a good candidate for a pgfoundry
project that has functions that'll create the triggers for you? (Yeah, I
might be volunteering, but would undoubtedly need help.) Then when people
ask it again and again the response could be "install
http://pgfoundry.org/fast_count_star" rather than "go read the mailing
list archives and roll your own - but remember to worry about deadlock
and contention on the table containing the counts". A rough sketch of the
triggers such a project might create appears after the quote below.

> What I would do *instead* would be for each INSERT to add a tuple with
> a count of 1, and for each DELETE to add a tuple with count of -1, and
> then to periodically have a single process walk through to summarize
> the table. There may be a further optimization to be had by doing a
> per-statement trigger that counts the number of INSERTs/DELETEs done,
> so that inserting 30 tuples (in the table being tracked) leads to
> adding a single tuple with count of 30 in the summary table.
>
> That changes the tradeoffs, again...
>
> - Since each INSERT/DELETE is simply doing an INSERT into the summary
>   table, the ongoing activity is *never* blocking anything
>
> - You get the count by requesting
>     SELECT SUM(rec_cnt) as record_count
>       from rec_count where tablename = 'foo';
>
> - Note that the query is MVCC-consistent with the table!
>
> - Once in a while, you'll want to run a single procedure that, for
>   each table, deletes all the existing records, and replaces them
>   with a single one consisting of the sum of the individual values.
>
> - You can re-sync a table by running the query:
>     begin;
>     delete from record_count where tablename = 'foo';
>     insert into record_count(tablename, rec_cnt)
>       select 'foo', (select count(*) from foo);
>     commit;
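Something like this, perhaps - entirely hypothetical names, and ignoring
the per-statement optimization Chris mentions:

  CREATE TABLE record_count (tablename text, rec_cnt bigint);

  CREATE FUNCTION track_count() RETURNS trigger AS $$
  BEGIN
    IF TG_OP = 'INSERT' THEN
      INSERT INTO record_count VALUES (TG_TABLE_NAME, 1);
    ELSE  -- DELETE
      INSERT INTO record_count VALUES (TG_TABLE_NAME, -1);
    END IF;
    RETURN NULL;  -- AFTER trigger, the return value is ignored
  END $$ LANGUAGE plpgsql;

  CREATE TRIGGER foo_track_count AFTER INSERT OR DELETE ON foo
    FOR EACH ROW EXECUTE PROCEDURE track_count();

  -- The MVCC-consistent, non-blocking count:
  SELECT sum(rec_cnt) FROM record_count WHERE tablename = 'foo';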
Re: [GENERAL] Postgres and MySQL Rosetta stone??
Scott Marlowe wrote:
>> In particular, MySQl seems to have richer string functions to parse
>> out sub strings and als trim a string for automatic table insertion
>> from long multifield strings.
>
> Have you read the postgresql manual on string functions? Seriously,
> it's easily a match for MySQL in that arena, and beats it hands down
> in a few.

And if you need more - like extracting noun clauses from sentences - you
can do that with PostgreSQL and PL/Perl, using natural language
processing libraries that are callable from perl. I'd be interested to
know how MySQL would handle that kind of string processing function.
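For instance, something along these lines - a hypothetical sketch that
assumes the Lingua::EN::Tagger CPAN module is installed on the server
(and plperlu, since it loads an external module):

  CREATE OR REPLACE FUNCTION noun_phrases(txt text) RETURNS text AS $$
    use Lingua::EN::Tagger;
    my $tagger = Lingua::EN::Tagger->new;
    my $tagged = $tagger->add_tags($_[0]);
    # get_noun_phrases returns a hash of phrase => occurrence count
    my %np = $tagger->get_noun_phrases($tagged);
    return join(', ', keys %np);
  $$ LANGUAGE plperlu;

  SELECT noun_phrases('The quick brown fox jumped over the lazy dog.');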
Re: [GENERAL] Understanding PostgreSQL Storage Engines
Pavel Stehule wrote:
> 2010/10/8 Carlos Mennens :
>> I know that MySQL uses MyISAM storage engine by default... what
>> storage engine does PostgreSQL use by default ...
>
> PostgreSQL supports and uses just only one storage engine - PostgreSQL.

That said, ISTM one of Postgres's bigger strengths commercially seems to
be that vendors can reasonably easily plug in different storage engines.

Isn't the biggest SQL database in the world basically postgres using a
non-default storage engine [1]? Heck, companies have even made
FPGA/hardware-accelerated storage engines for postgres [2]. Bigger IT
companies than Oracle have sold PostgreSQL using different storage
engines [3].

Couldn't one almost say that one of the big differences between MySQL and
Postgres is that MySQL only offers a couple storage engines, while
Postgres has many vendors offering engines? [ 1/2 :-) ]

  Ron

[1] http://www.computerworld.com/s/article/9087918/Size_matters_Yahoo_claims_2_petabyte_database_is_world_s_biggest_busiest
    "Yahoo brought the database in-house and continued to enhance it,
    including tighter data compression, more parallel data processing
    and more optimized queries. The top layer remains PostgreSQL"

[2] http://www.dbms2.com/2007/09/27/the-netezza-developer-network/
    "My understanding is that they started with PostgreSQL and then
    rewrote the back-end to embed in the FPGA. Query processing on a SPU
    is split between the general purpose CPU and the FPGA, with the
    latter mostly responsible for restricting rows and projecting
    columns."

[3] http://www.fujitsu.com/downloads/MAG/vol40-1/paper15.pdf
    "Fujitsu loaded the storage management mechanism of Symfoware Server
    into PostgreSQL."
Re: [GENERAL] Why facebook used mysql ?
Lincoln Yeoh wrote:
> What's more important to such companies is the ability to scale over
> multiple machines.

That question - how much work it is to administer thousands of database
servers - seems to have been largely missing from this conversation.

Apparently back in 2008, Facebook had 1800 MySQL servers with 2 DBAs. [1]
I wonder how that compares with large-scale Postgres deployments.

  Ron

[1] http://perspectives.mvdirona.com/2008/04/22/1800MySQLServersWithTwoDBAs.aspx
Re: [GENERAL] Survey on backing up unlogged tables: help us with PostgreSQL development!
Glen Parker wrote:
> As was already mentioned, application logs. Unlogged tables would be
> perfect for that, provided they don't go *poof* every now and then for
> no good reason. Nobody's going to be too heart broken if a handful of
> log records go missing, or get garbled, after a server crash or power
> outage. Delete 'em all after every restart though, and that's a problem.

How often are you doing unintentional restarts? I'd guess for many people
it's "whenever I've had so many backend crashes that I get motivated to
check if I'm running the latest minor release".

And if it's an intentional restart - surely you could archive your
application logs before doing the restart, no?
Re: [GENERAL] Survey on backing up unlogged tables: help us with PostgreSQL development!
Josh Berkus wrote:
>> With the current patches, the data survives a restart just fine.
>
> Per -hackers, that's not guarenteed.

"Not guaranteed" is fine. What people are asking for is "often survives".

AFAIK we don't truncate the log file created by the log_filename GUC on
every unclean crash and every clean shutdown. Should we? :-) Why not?
For people who intend to use these tables to log application data, they'd
have the exact same reasons for not wanting them truncated when they
don't need to be.
Re: [GENERAL] Postgres DOD Certification Common Criteria Level
Eric McDonald wrote:
> Greetings All:
>
> Does anyone here have any insight on to what EAL level Postgres is at
> for DOD/Military installations? I see that there's an SE-Linux
> fortified version on the Wiki, but no certifications are listed in the
> contents.
>
> Any direction to certifications, STIG, or otherwise would be greatly
> appreciated--

Well, there's an (ancient) 8.1.5 which NTT got certified at EAL1 back in
'07. You can go here:
http://www.commoncriteriaportal.org/products/
and expand "Databases" to see it.

It seems like there are some proprietary forks on the list as well, at
much higher levels (EAL4+); but I guess those forks have diverged quite
a bit.

I guess I'd be somewhat surprised to see the community version on the
list, since Wikipedia claims that getting such certifications cost
millions even back in the 90's.
http://en.wikipedia.org/wiki/Evaluation_Assurance_Level
Re: [GENERAL] arrays of floating point numbers / linear algebra operations into the DB
Webb Sprague wrote:
> On Feb 1, 2008 2:31 AM, Enrico Sirola <[EMAIL PROTECTED]> wrote:
>> I'd like to perform linear algebra operations on float4/8 arrays...
>
> If there were a coherently designed, simple, and fast LAPACK/ MATLAB
> style library and set of datatypes for matrices and vectors in
> Postgres, I think that would be a HUGE plus for the project!

I'd also be very excited about this project. Especially if some GIST or
similar index could efficiently search for vectors "close" to other
vectors. I assume something like "within an n-dimensional bounding box"
would be possible with GIST.

I'd be eager to help, test, debug, etc; but probably am not qualified to
take the lead on such a project.
Re: [GENERAL] arrays of floating point numbers / linear algebra operations into the DB
Webb Sprague wrote:
> On Feb 1, 2008 12:19 PM, Ron Mayer <[EMAIL PROTECTED]> wrote:
>> Webb Sprague wrote:
>>> On Feb 1, 2008 2:31 AM, Enrico Sirola <[EMAIL PROTECTED]> wrote:
>>>> ...linear algebra ...
>>> ... matrices and vectors .
>> ...Especially if some GIST or similar index could efficiently search
>> for vectors "close" to other vectors...
>
> Hmm. If I get some more interest on this list (I need just one LAPACK
> / BLAS hacker...), I will apply for a pgFoundry project and appoint
> myself head of the peanut gallery...

I think you should start one. I'd be happy to help.

I'm rather proficient in C; somewhat literate about postgres' GIST stuff
(I think a couple of my bugfix patches were accepted in postgis); and
deal with a big database doing lots of similarity-based searches (a 6'2"
guy with light brown hair being similar to a 6'1" guy with dark blond
hair) - and am experimenting with modeling some of the data as vectors
in postgres.
Re: [GENERAL] arrays of floating point numbers / linear algebra operations into the DB
Ted Byers wrote:
> --- Webb Sprague <[EMAIL PROTECTED]> wrote:
>>> ...linear algebra ...
>> ... matrices and vectors .
>> ...Especially if some GIST or similar index could efficiently search
>> for vectors "close" to other vectors...
>
> I see a potential problem here, in terms of how one defines "close" or
> similitude. I think, though, practical answers can be found in examples
> of applying quantitative methods in some subdisciplines of biology.

Even if the best GIST can give is selecting vectors constrained by an
n-dimensional bounding box (in much the same way it does for postgis) it
can help a lot. Then your application can select everything in a
conservatively large box, and use whatever its favorite metric is to
narrow the data further (see the sketch at the end of this message).

> Someone pointed to the potential utility of pl/R. I would be interested
> at least in learning about your assessment of the two (postgis and pl/r).

I think they'd be complementary. IMHO if a native postgresql datatype
could allow indexes to narrow the amount of data that needs to be
processed, it'd be great to do the rest of the work using R (though
we're perhaps foolishly using something else in our application).

> If you're looking at similarity, and some practicality in the USE of
> quantitative procedures, you may want to look into the biogeography and
> numerical taxonomy literature, and to a lesser extent quantitative
> plant ecology.

Indeed. Though the current literature I wade through is crime analysis.
Ideally our software tries to match witness descriptions of "black SUV"
as substantially similar to "dark green 4runner" - especially when seen
at night; and understand that a "Glock 19" is a "9mm handgun" and that a
"9mm handgun" might be but isn't necessarily a "Glock 19".

Also - two person records with a tattoo of a cross might contribute a
little similarity - but two records with tattoos of Darth Maul / Maori
face art contribute much more to the similarity scores because they're
so much more distinctive. And of course there are entire companies
focused on similarity metrics for fingerprints, DNA, and names (Bill is
arguably more similar to William than to Bell). Any magic indexes to
help such queries would be very cool; but so far we do most of it in our
application logic.

> A good linear algebra library would be useful, but there are a lot of
> nonlinear analyses that would be of interest; and there are
> nonparametric, yet quantitative approaches that are of considerable
> interest in assessing similarity.

True, many things don't map cleanly to linear algebra; but it would
still be quite useful.

> If I can manage the time, I hope to start a project where I can store
> description data for specimens of plants and animals, use analyses
> including but not limited to ordination, clustering, discriminant
> functions, cannonical correlation, to create a structure for comparing
> them, and for identifying new specimens, or at a minimum, if the
> specimen is truly something unknown, learn what known specimens or
> groups thereof it is most similar to, and how it is different.

Very cool. Sounds like a somewhat similar issue to mine.
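Here's the sketch mentioned above. contrib/cube already has a GiST
opclass that could do the bounding-box pass (hypothetical table and
feature encoding; operator spellings per recent releases):

  -- Encode each record as a point in n-space, e.g. (height, hair code):
  CREATE TABLE people (id serial PRIMARY KEY, features cube);
  CREATE INDEX people_features_idx ON people USING gist (features);

  -- Index-assisted "conservatively large box": 6'1"-6'3", hair codes 1-3
  SELECT id FROM people
   WHERE features <@ cube(array[73, 1], array[75, 3]);
  -- ...then re-rank the survivors with the application's real metric.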
Re: [GENERAL] [pgsql-advocacy] PostgreSQL Certification
Josh Berkus wrote:
> Id really prefer my company be certified by the community rather than
> by a company, despite the full respect I have in SRA's engagement in
> PostgreSQL and that we all know their contributions.

What would it mean for a company to be certified? I'd hope it'd mean
that I can have some degree of confidence hiring that organization for
Postgresql support. No? It seems to have very similar benefits as
certifying individuals.

Microsoft seems to have something like that for their partners in their
"Database Management competency":
https://partner.microsoft.com/global/40012911
[GENERAL] Dump format for long term archiving.
If one wanted to dump some postgres databases for long term archival
storage (maybe decades), what's the recommended dump format? Is tar or
plain text preferred, or is there some other approach (xml? csv?) I
should be looking at instead?

Or should we just leave these in some postgres database and keep
upgrading it every few years?
Re: [GENERAL] postgre vs MySQL
Greg Smith wrote:
> On Fri, 14 Mar 2008, Andrej Ricnik-Bay wrote:
>> A silly question in this context: If we know of a company that does
>> use PostgreSQL but doesn't list it anywhere ... can we take the
>> liberty to publicise this somewhere anyway?

I notice Oracle (and sleepycat before them) had a lot of fun pointing
out when Microsoft uses BDB:
http://www.oracle.com/technology/oramag/oracle/07-jan/o17opensource.html
  "You'll find Oracle Berkeley DB 'under the hood' in everything from
  Motorola cell phones, Microsoft/Groove's collaboration suite"
and it seems unlikely Microsoft gave them their blessing.

> Bad idea. There are companies who consider being listed as a user of a
> product a sort of recommendation of that technology, and accordingly

Other reasons a company might get offended by this:

 * They might consider it a trade secret and a competitive advantage
   over competitors; and internally enjoy giggling when they see their
   competitors sign deals with expensive databases.

 * They might have a close business partnership with Microsoft or Oracle
   that could be strained if they support other databases.

I suspect my employer would not like it announced, for both reasons.

> they will get really annoyed...asked to be removed from the list of
> those using PostgreSQL. ... PostgreSQL inside, it's best not to publish
> the results unless you like to collect cease & desist letters.

While I agree companies are likely to get annoyed - just like fast food
companies do when you say how much trans-fat their products contain -
I'm rather curious what such a cease & desist letter would say.
Re: [GENERAL] Survey: renaming/removing script binaries (createdb, createuser...)
Zdeněk Kotala wrote:
> 1) What type of names do you prefer?

I'd prefer a "pg" program that took as arguments the command. So you'd
have "pg createdb" instead of "pg_createdb". There are many precedents:
"cvs update", "git pull", "apt-get install". Anyone else like this
approach? Of the choices, though, I prefer (d).

> 2) How often do you use these tools?
>    a) every day (e.g. in my cron)
>    b) one per week
>    c) one time
>    d) never

About monthly. I guess (b) is closest.

> 3) What name of initdb do you prefer?
>    a) initdb
>    b) pg_initdb
>    c) pg_init
>    d) pg_ctl -d init (replace initdb with pg_ctl new functionality)
>    e) What is initdb? My start/stop script does it automatically.

pg initdb
Of the choices, though, I guess (d).

> 4) How do you perform VACUUM?
>    a) vacuumdb - shell command
>    b) VACUUM - SQL command
>    c) autovacuum
>    d) What is vacuum?

(c) on most databases
(b) on certain databases
Re: [GENERAL] IBM investing in EnterpriseDB
Clodoaldo wrote:
> ...IBM is investing...What does it mean for Postgresql?

One cool thing it means is that there are now *two* companies (thanks
again Fujitsu) bigger than Oracle backing (to some extent) Postgres. And
now one company bigger than Microsoft.

Yeah, this doesn't affect the community much. But it sure comes in
useful when your CFO calls you into a meeting and says "Hey, I just had
lunch with our Microsoft rep and he asked why we're running some
unsupported freeware database." Your CFO wouldn't want to run your
company on databases - like Oracle 10g, MySQL, and SQL Server - that are
only backed by little (under $50B revenue) guys, would he? :-)
Re: [GENERAL] Survey: renaming/removing script binaries (createdb, createuser...)
Tom Lane wrote:
> "Leif B. Kristensen" <[EMAIL PROTECTED]> writes:
>> On Wednesday 26. March 2008, Ron Mayer wrote:
>>> I'd prefer a "pg" program that took as arguments the command. So
>>> you'd have "pg createdb" instead of "pg_createdb".
>
> I like this too. It'd be considerably more work than the currently
> proposed patch, though, since we'd have to meld the currently separate
> programs into one executable.

If it'd help make it happen, I could work on a patch as a strawman.

> One fairly serious objection is that doing so would eliminate the
> current distinction between client-side and server-side applications,
> at least if we wanted to fold both sets into one "pg" executable. So a
> client-only install would be carrying some baggage in the form of code
> that's useless if the server isn't local. If we are OK with restricting
> the scope of the "pg" program to client-side functionality, then
> there's no problem.

Could it detect if the server-side components aren't installed, and give
a "server components aren't installed" error message if they aren't
available? This could probably be handled reasonably portably if the
"pg" program called a separate executable for the server functionality
behind the scenes.

From a user point of view, I think the single executable's nice. From an
installer point of view, I think keeping them separate is nice. Seems
both would be possible this way, though.
Re: [GENERAL] Survey: renaming/removing script binaries (createdb, createuser...)
Tom Lane wrote:
> "Leif B. Kristensen" <[EMAIL PROTECTED]> writes:
>> On Wednesday 26. March 2008, Ron Mayer wrote:
>>> ...a "pg" program that took as arguments the command. So you'd have
>>> "pg createdb" instead of "pg_createdb".
>>
>> I'll second that. ...
>
> I like this too.

Though I guess we might need to find a different name if we want to go
down that path. "man pg" tells me "browse pagewise through text files".

IMHO "postgres createdb" is still better than the whole bunch of
separate commands. Or even "pgSQL createdb" just so people know it's not
a pre-SQL postgres database :-)
Re: [GENERAL] Schema design question
Ben wrote:
> I'm working on a project which requires me to keep track of objects,
> each of which can have an arbitrary number of attributes. Although
> there will be many attributes that an object can have,...
>
> Anyway, this seems like a common problem without a perfect solution,
> and I'm sure people must have hindsight opinions on how they solved it.
> Your thoughts?

If each attribute can only occur once for any object, and if your
queries are based more on exact matches of the values rather than range
queries, you might want to look into the hstore module:
http://www.postgresql.org/docs/current/static/hstore.html

If attributes can occur more than once, you might even want to serialize
the whole object as some text format (yaml, xml, etc) instead.
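For example, a minimal hstore sketch (hypothetical names; load contrib's
hstore into the database first):

  CREATE TABLE objects (id serial PRIMARY KEY, attrs hstore);

  INSERT INTO objects (attrs)
    VALUES ('color => red, weight => 10'),
           ('color => blue, shape => round');

  -- Exact-match attribute lookups, index-assisted with GiST (or GIN):
  CREATE INDEX objects_attrs_idx ON objects USING gist (attrs);
  SELECT id, attrs -> 'weight' FROM objects WHERE attrs @> 'color => red';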
Re: [GENERAL] How to recovery data from folder data installation?
Magnus Hagander wrote:
> Craig Ringer wrote:
>> with a version of PostgreSQL with the same minor version as the one
>> you were using on the server, eg if you were using 8.1.4 you should
>> get the latest PostgreSQL in the 8.1 series (NOT 8.2 or 8.3) to try to
>> read the data.
>
> What you mean here is of course that you should be using the same
> *major* version, but the latest available *minor* version. Other than
> that, it's correct - and the version numbers in the example is correct :-)

If we want to stop people from being confused about this terminology, we
would need to make our definitions of these terms more visible on the
main web site. Many places talk about the "latest release" - but
unfortunately with differing definitions.

For example, "latest release" is defined as 8.3 in the english language
FAQ; defined as 8.3.1 and 8.2.7, etc. on the english language home page;
defined as 8.2.3 on the traditional chinese FAQ, etc. Some places (the
english FAQ) talk about "major releases" happening every year and "minor
releases" happening every few months, but without defining what that
means to the numbering.

Perhaps there should be a FAQ that says that our numbering system
defines the digits as "marketing"."major"."minor". It's no surprise to
me that people don't use those terms consistently.
Re: [GENERAL] Probably been asked a hundred times before.
Lincoln Yeoh wrote:
> At 10:30 PM 6/24/2008, David Siebert wrote:
>> Which disto is best for running a Postgres server?

Just to add one more slightly different philosophy.

For servers I manage, I run the most conservative and slow-changing
distros that only update security releases (Debian Stable and RHEL are
good choices; no doubt Solaris would be too; Ubuntu updates too
frequently for my tastes).

For the components less core to our business (ssh, munin, etc) we trust
the distro provider to provide security updates and to do the very
minimum of other changes that might have compatibility issues.

For the components that are more core to our business, though, we get
the source from the projects themselves (like postgresql.org) and
compile from source. This gives us the advantages of being totally in
control of when updates occur, and of having developers be able to
attach debuggers if need be.
Re: [GENERAL] INTERVAL SECOND limited to 59 seconds?
Finally got around to looking at this thread. Looks like the original
questions from the thread got resolved, but I found this behaviour
surprising:

regression=# select interval '1' day to second;
 interval
----------
 @ 1 hour
(1 row)

Should this be 1 second? If so I can send a patch.

regression=# select version();
                                version
-----------------------------------------------------------------------
 PostgreSQL 8.4beta2 on i686-pc-linux-gnu, compiled by GCC gcc (GCC)
 4.2.4 (Ubuntu 4.2.4-1ubuntu3), 32-bit
(1 row)
Re: [GENERAL] INTERVAL SECOND limited to 59 seconds?
Tom Lane wrote:
> Ron Mayer writes:
>> Looks like the original questions from the thread got resolved, but I
>> found this behaviour surprising:
>>
>> regression=# select interval '1' day to second;
>>  interval
>> ----------
>>  @ 1 hour
>> (1 row)
>>
>> Should this be 1 second?
>
> That is a bit odd, especially seeing that eg. '1' hour to second comes
> out as 1 second. What's making it do that?

From a design point of view? Seems like it's a side effect of the logic
that makes:
  select interval '1 2';
know that the 2 means hours rather than seconds.

Code-wise, it seems to happen because around line 2906 in DecodeInterval:

  switch (range)
  ...
  case INTERVAL_MASK(DAY) | INTERVAL_MASK(HOUR) |
       INTERVAL_MASK(MINUTE) | INTERVAL_MASK(SECOND):
      type = DTK_HOUR;

But if I naively change that by making it DTK_SECOND, I'd break
"select interval '1 2' day to second;". I guess I'd need to tweak it to
say: if it follows a days field it means hours; but by itself it means
seconds?

There's a bit of other odd stuff around there. It seems CVS head accepts
"select interval '1 2' hour;" but not "select interval '1 2' hour to
minute;":

  regression=# select interval '1 2' hour;
      interval
  ----------------
   1 day 02:00:00
  (1 row)

and I would have guessed that either both should succeed or both should
fail. And if both succeed I wouldn't have expected 1 day 2 hours.

I'd still be happy to send a patch, but am still trying to figure out
what the desired behavior is. My current impression of the open
questions - what's the desired behavior for each of these?

select interval '1' day to second;
  - should it be 1 second, to be consistent with "select interval '1';"?
  - or an error, as Sebastien argued in a different part of the thread?

select interval '1 2' hour;
  - should it be an error, as "select interval '1 2' hour to minute" is?
  - should it be "1 day 2 hours", as CVS head treats
    "select interval '1 day 2 hours' hour to minute;"?
  - or should it be 2 hours?

select interval '1 2' hour to minute;
  - should it be an error, as it is today?
  - should it be "1 day 2 hours", as CVS head treats
    "select interval '1 day 2 hours' hour to minute;"?
  - or should it be 2 hours?
Re: [GENERAL] INTERVAL SECOND limited to 59 seconds?
Tom Lane wrote:
> I wrote:
>> I'm inclined to say that these two cases are out of line with what the
>> rest of the code does and we should change them.
> ...
> Now, all three of these cases throw "invalid input syntax" in 8.3, so
> this is not a regression from released behavior. The question is does
> anyone think that these syntaxes should be valid? They're not legal
> per spec, for sure, and they seem pretty ambiguous to me.

Seems to do a sane thing for all sane inputs I threw at it. It still
accepts one odd input that 8.3 rejected:

  regression=# select interval '1 1' hour;

Perhaps the additional patch below fixes that?

*** 3022,3028 **** DecodeInterval(char **field, int *ftype, int nf, int range,
  				tm->tm_hour += val;
  				AdjustFractSeconds(fval, tm, fsec, SECS_PER_HOUR);
  				tmask = DTK_M(HOUR);
! 				type = DTK_DAY;		/* set for next field */
  				break;
  			case DTK_DAY:
--- 3022,3029 ----
  				tm->tm_hour += val;
  				AdjustFractSeconds(fval, tm, fsec, SECS_PER_HOUR);
  				tmask = DTK_M(HOUR);
! 				if (range == (INTERVAL_MASK(DAY) | INTERVAL_MASK(HOUR)))
! 					type = DTK_DAY;	/* set for next field */
  				break;
  			case DTK_DAY:

It also gives different answers than 8.3 for "select interval '1 1:'
hour" but I guess that's intended, right?
Re: [GENERAL] INTERVAL SECOND limited to 59 seconds?
Tom Lane wrote:
> Ron Mayer writes:
>> regression=# select interval '1 1' hour;
>
> Hmm, not sure about that one. We decided a week or two back that we
> don't want the thing discarding higher-order field values, and this
> seems pretty close to that. As the code is set up (plus my patch) I
> think it's the case that only the rightmost field specification of the
> interval qualifier makes any difference for parsing the value; the
> leftmost field doesn't really affect what we think the constant means.
> That seems like a nice simple consistency property ...

Sounds good. I'm happy with it then.
Re: [GENERAL] generic modelling of data models; enforcing constraints dynamically...
Sam Mason wrote:
> It all depends on the problem domain of course, but this seems to work
> OK for us! I really want to hack Samba around so that the users can
> view the files directly from inside the database, but I'm not sure how
> good an idea this really is.

"Hack Samba"? Wouldn't it be easier to use one of the
database-as-a-filesystem FUSE bindings and run stock samba over that?

The perl Fuse::DBI module's example sounds pretty similar to the system
you described, where the "file" seems to be a column in a table:
http://www.rot13.org/~dpavlin/fuse_dbi.html

If that doesn't suit your needs there are more elaborate ones
(libferris) that seem more complex and more flexible, and simpler ones
(dumbofs) that seem to be meant more as example code you could hack for
your purposes:
http://lwn.net/Articles/306860/
http://yiannnos.com/dumbofs

And then you could use unmodified samba out-of-the-box exporting that to
whatever the heck speaks SMB/CIFS these days.
Re: [GENERAL] generic modelling of data models; enforcing constraints dynamically...
Drifting off topic, so I'm no longer cc'ing the lists.

Sam Mason wrote:
>> The perl Fuse::DBI module's example sounds pretty similar to the
>> system you described, where the "file" seems to be a column in a table:
>> http://www.rot13.org/~dpavlin/fuse_dbi.html
>
> FUSE looks pretty easy to get going and I think I'd want more control
> over how files were presented than this gives so I'd probably end up
> rolling my own code. Thanks for pointing out that FUSE though, not sure

FUSE is really more the framework that wraps around your code. There are
applications using fuse that expose gmail as a filesystem. Here's a
simple example that uses FUSE to expose a perl HASH and a few
hello-world-like perl functions:
http://cpansearch.perl.org/src/NOSEYNICK/Fuse-Simple-1.00/README

> why I'd not thought of it before. I'll probably still never get around
> to it, but maybe I will!

It's actually easy enough that I wouldn't be surprised if you try it,
and get it working just for fun even if no one uses it.
Re: [GENERAL] Absolute value of intervals
On Mon, Nov 02, 2009 at 10:52:40AM +0000, Jasen Betts wrote:
> what's the absolute value of '1month -30 days'::interval

Curious what a use case for taking the absolute value of such mixed
intervals might be.

I could imagine such intervals being used for stuff like "XXX is due in
Y months but needs to be submitted Z days earlier". But I can't really
imagine worrying about whether it's positive or negative until it's
evaluated at a particular date (i.e. reporting "XXX has to be submitted
in N days" or "XXX was supposed to be submitted N days ago").
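For example, the sign of that particular interval only materializes once
it's applied to a date:

  -- In February, adding '1 month -30 days' lands *before* the start...
  SELECT '2010-02-01'::timestamp + interval '1 month -30 days';
  --  2010-01-30

  -- ...but in March the same interval lands after it:
  SELECT '2010-03-01'::timestamp + interval '1 month -30 days';
  --  2010-03-02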
Re: [GENERAL] Postgresql 8.3 statistic stadistinct value dead wrong even with default_statistics_target=1000
Tom Lane wrote:
> Maxim Boguk <[EMAIL PROTECTED]> writes:
>> [ ndistinct estimates way off ]
>
> Estimating the number of distinct values in a distribution with a long
> tail is just a really hard problem :-(

If I have a table where I know it has this property, is there any way I
can tell autovacuum's ANALYZE to go ahead and scan the whole table? ISTM
a vacuum_cost_delay-throttled full table scan might be less harmful than
a bad ndistinct guess.
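(The closest knob I know of isn't a full-table scan, but cranking the
per-column sample size up to 8.3's maximum - table and column names here
are made up:)

  ALTER TABLE mytable ALTER COLUMN mycol SET STATISTICS 1000;
  ANALYZE VERBOSE mytable;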
Re: [GENERAL] PostgreSQL TPC-H test result?
Tom Lane wrote:
> interval '1' year. ...is SQL spec syntax, but it's not fully
> implemented in Postgres... Or someone could try to make it work, but
> given that no one has taken the slightest interest since Tom Lockhart
> left the project, I wouldn't hold my breath waiting for that.

I have interest. For 5 years I've been maintaining a patch for a client
that allows the input of ISO-8601 intervals (like 'P1YT1M') rather than
the nonstandard shorthand ('1Y1M') that postgresql supports [1]. I'd be
interested in working on this.

Especially if supporting SQL standard interval syntax could improve the
chances of getting my ISO-8601-interval-syntax patch (replacing the
nonstandard postgres shorthand intervals) accepted again, I'd be quite
happy to work on it. Tom in 2003 said my code looked cleaner than the
current code [2], and the patch was accepted [3] for a while before
being rejected - I believe because Peter said he'd like to see the SQL
standard intervals first. I see it's still a TODO, though.

> the grammar supports it but the info doesn't get propagated to
> interval_in, and interval_in wouldn't know what to do even if it did
> have the information that there was a YEAR qualifier after the literal.

Any hints on how best to propagate the needed info from the grammar? Or
should it be obvious to me from reading the code?

[1] http://archives.postgresql.org/pgsql-patches/2003-09/msg00119.php
[2] http://archives.postgresql.org/pgsql-patches/2003-09/msg00121.php
[3] http://archives.postgresql.org/pgsql-patches/2003-12/msg00253.php

   Ron Mayer
   (formerly [EMAIL PROTECTED] who posted those ISO-8601 interval patches)
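For reference, a side-by-side of the interval input spellings in play
here (the ISO 8601 and SQL standard forms are what the 8.4 work ended up
enabling):

  SELECT interval '1 year 1 minute';    -- traditional PostgreSQL shorthand
  SELECT interval 'P1YT1M';             -- ISO 8601 format with designators
  SELECT interval '1-2' YEAR TO MONTH;  -- SQL standard literal (1 yr 2 mon)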
Re: [GENERAL] PostgreSQL TPC-H test result?
Ron Mayer wrote:
> Tom Lane wrote:
>> Or someone could try to make it work, but given that no one has taken
>> the slightest interest since Tom Lockhart left the project, I wouldn't
>> hold my breath waiting for that.
>
> I have interest. For 5 years I've been maintaining a patch for a client

Doh. Now that I catch up on emails I see Tom has a patch in a different
thread. I'll follow up there...
Re: [GENERAL] Are there plans to add data compression feature to postgresql?
You might want to try using a file system (ZFS, NTFS) that does
compression, depending on what you're trying to compress.
Re: [GENERAL] Are there plans to add data compression feature to postgresql?
Grant Allen wrote:
> ...warehouse...DB2...IBM is seeing typical storage savings in the
> 40-60% range

Sounds about the same as what compressing file systems claim:
http://opensolaris.org/os/community/zfs/whatis/
  "ZFS provides built-in compression. In addition to reducing space
  usage by 2-3x, compression also reduces the amount of I/O by 2-3x.
  For this reason, enabling compression actually makes some workloads
  go faster."

I do note that Netezza got a lot of PR around their compression release,
claiming it doubled performance. Wonder if they added that at the file
system level or higher in the DB.
Re: [GENERAL] Are there plans to add data compression feature to postgresql?
Chris Browne wrote:
> There's a way that compressed filesystems might *help* with a risk
> factor, here... By reducing the number of disk drives required to hold
> the data, you may be reducing the risk of enough of them failing to
> invalidate the RAID array.

And one more way. If neither your database nor filesystem do checksums
on blocks (the compressing filesystems mostly do checksums, tho), a
one-bit error may go undetected, corrupting your data without you
knowing it. With filesystem compression, that one-bit error is likely to
grow into something big enough to be detected immediately.
Re: [GENERAL] Postgres mail list traffic over time
Bruce Momjian wrote:
> Tom Lane wrote:
>> Bruce Momjian <[EMAIL PROTECTED]> writes:
>>> I also was confused by its flatness. I am finding the email traffic
>>> almost impossible to continue tracking, so something different is
>>> happening, but it seems it is not volume-related.
>>
>> Yes, my perception also is that it's getting harder and harder to keep
>> up with the list traffic; so something is happening that a simple
>> volume count doesn't capture.

If measured in "bytes of the gzipped mbox" it looks like there's a
*huge* increase of volume on Hackers in the past 3 months - well over
twice the historical levels; and maybe 4X 2002-2006. Graphs of this
metric can be seen here:
http://0ape.com/postgres_mailinglist_size/

In some ways I think compressed mbox sizes are a more fair way of
measuring the bandwidth for these lists, since it (correctly) counts a
large gzipped patch as requiring more mental effort than people
top-posting brief messages on top of old threads.

(Data from commands like
  HEAD http://archives.postgresql.org/pgsql-hackers/mbox/pgsql-hackers.2008-09.gz | grep Content-Length
)
Re: [GENERAL] Postgres mail list traffic over time
Joshua D. Drake wrote:
> On Fri, 2008-11-21 at 08:18 -0800, Ron Mayer wrote:
>> Bruce Momjian wrote:
>>> Tom Lane wrote:
>>>> ... harder to keep up with the list traffic; so something is
>>>> happening that a simple volume count doesn't capture.
>> If measured in "bytes of the gzipped mbox" it ...
>
> Its because we eliminated the -patches mailing list.

That's part of it. I've added -patches to the graph at
http://0ape.com/postgres_mailinglist_size/
as well as a graph of hackers+patches combined; and it still looks like
hackers+patches is quite high in the past 3 months.

With hackers+patches it looks like 2002-08 was the biggest month; but
the past 3 months still look roughly twice late 2007's numbers.
Re: [GENERAL] Postgres mail list traffic over time
Tom Lane wrote:
> Ron Mayer <[EMAIL PROTECTED]> writes:
>> Joshua D. Drake wrote:
>>> Its because we eliminated the -patches mailing list.
>> That's part of it. I've added -patches to the graph at
>> http://0ape.com/postgres_mailinglist_size/
>> as well as a graph of hackers+patches combined; and it still looks
>> like hackers+patches is quite high in the past 3 months.
>
> One of the reasons we got rid of -patches was the frequency of
> cross-posting to both -hackers and -patches. Are you double-counting
> cross-posted messages?

For the combined graph I just summed the output of:
  HEAD http://archives.postgresql.org/pgsql-hackers/mbox/pgsql-hackers.2008-09.gz | grep Content-Length
  HEAD http://archives.postgresql.org/pgsql-hackers/mbox/pgsql-patches.2008-09.gz | grep Content-Length
so I didn't look to see if the downloadable mboxes had duplicate
messages.

If people want the raw data, here's the script I used to get it:

#!/usr/bin/env ruby
%W{rubygems hpricot open-uri gruff}.each{|l| require l}

def chart(url)
  h = Hpricot.parse(open(url){|f| f.read})
  mboxes = (h / "//a").map{|x| x.attributes['href']}.
                       select{|x| x =~ /\.gz/}
  mboxes.sort.each{|x|
    y = `HEAD #{url}/#{x}` =~ /Content-Length: (\d+)/ && $1
    puts "#{x} #{y}"
  }
end

patches = chart('http://archives.postgresql.org/pgsql-patches')
general = chart('http://archives.postgresql.org/pgsql-general')
hackers = chart('http://archives.postgresql.org/pgsql-hackers')

Perhaps some of the extra burden on the experienced hackers is a larger
volume of newer people trying to contribute that are needing more
handholding (and thus more re-posted updated patches, etc)?
Re: [GENERAL] Interval Format
Zagato wrote:
> I have som SQL that in 8.0.3 do:
>
> # SELECT '32 hours'::INTERVAL;
>     interval
> -----------------
>  @ 1 day 8 hours
>
> And in 8.3.5 do:
>
> seg_veh2=# SELECT '@ 32 hours'::INTERVAL;
>  interval
> ------------
>  @ 32 hours
>
> Why i unable to get my old style of interval, i really need to see the
> day part im using datestyle german in both im try with "SET
> IntervalStyle TO..." but i get this error:

Even IntervalStyle won't help you here, since recent releases consider
the value "1 day 8 hours" as distinct from "32 hours" (because they are
distinct at certain times of the year in some timezones where days are
25 hours, etc).

> So.. how can i recover my "days" part...

If you tell it what time of year you are thinking of, this trick can
work:

dish=# set datestyle to german;
SET
dish=# select '2001-01-01'::timestamp + (interval '32 hours')
              - '2001-01-01'::timestamp;
    ?column?
-----------------
 @ 1 day 8 hours
(1 row)
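Alternatively, if all you want is 24-hour chunks folded back into days
(ignoring the timezone subtleties above), justify_hours() does that:

dish=# select justify_hours(interval '32 hours');
  justify_hours
-----------------
 @ 1 day 8 hours
(1 row)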
Re: [GENERAL] Very large tables
Grzegorz Jaśkiewicz wrote:
> true, if you don't want to search on values too much, or at all - use
> float[]. But otherwise, keep stuff in a tables as such. It might be
> humongous in size, but at the end of the day - prime thing when
> designing a db is speed of queries.

If he's worried about speed, he might look into porting contrib/intarray
to arrays of floats.  I've had good luck improving performance by
converting big tables of ints into intarray with indexes, along the
lines of the sketch below.
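In case it helps, the int flavor of that conversion looks roughly like
this (untested sketch - the table and column names are made up, and it
assumes both contrib/intagg and contrib/intarray are installed):

  -- before: one row per (document, tag) pair
  create table doc_tags (doc_id int, tag int);

  -- after: one row per document, with the tags packed into an int[]
  create table doc_tags_packed as
    select doc_id, int_array_aggregate(tag) as tags
      from doc_tags
     group by doc_id;

  create index doc_tags_packed_idx
      on doc_tags_packed using gist (tags gist__int_ops);

  -- "which documents have tag 42" now hits a single index
  select doc_id from doc_tags_packed where tags @@ '42'::query_int;

The float version would need an intarray-like opclass for float4[] or
float8[], which is the porting work mentioned above.

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general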
Re: [GENERAL] Excessive (and slow) fsync() within single transaction
Scott Marlowe wrote:
> Actually, it's usually the drives that lie about fsync, especially
> consumer grade (and some server grade) SATA / PATA drives are known
> for this.

I'm still looking for any evidence of any drive that lies.  Is there
actually a drive which claims to support the IDE "FLUSH_CACHE_EXT"
feature (IDE command 0xEA) that doesn't?!?

You can see if your drive claims to - most drives made this century do -
in Linux with "hdparm -I /dev/sdX", which should show FLUSH_CACHE_EXT
if the disk supports this extension.  And if your disk is using a
last-century IDE spec that predated that feature, I think your kernel
probably told you in the startup messages that your drive is not so
much lying as obsolete and missing a feature.

Software, on the other hand, sucks at sending these commands.  Ext3
will only send the right IDE commands if the inode was touched[1].
VirtualBox doesn't send them correctly[2].  I hear LVM and MD raid can
lose those events too.

Since so many layers don't send the right IDE commands to flush,
userspace utilities exist to send those commands[3].  I sometimes
wonder if postgres should add something like that.

But I still haven't found any drive that lies -- and only a couple
whose firmware openly admits to not supporting the feature.

[1] http://www.mail-archive.com/linux-ker...@vger.kernel.org/msg272253.html
[2] http://openquery.com/blog/host-flush
[3] http://forum.soft32.com/linux/flush-disk-write-cache-userspace-ftopict338477.html

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Possible causes for database corruption and solutions
Craig Ringer wrote:
> On 16/12/2009 9:07 AM, Scott Marlowe wrote:
>> I'd also recommend moving off of OSX as you're using a minority OS as
>> far as databases are concerned, and you won't have a very large
>> community to help out when things do go wrong
>
> It sounds like PostgreSQL is being used as a DB bundled with an app -
> not quite embedded, but as close as Pg gets. ...
> For what its worth, there are clearly a fair few Mac OS X users of Pg
> out there...the things they ask about don't suggest to me that
> Mac OS X is a particularly untrustworthy platform for running Pg on.

Not to mention that Apple seems to use postgres bundled in some of
their high-end software:
  http://support.apple.com/kb/TS1648
  http://www.apple.com/finalcutserver/
  http://www.devworld.apple.com/appleapplications/ardsql.html

If postgres had any significant trouble running on OSX, I'm confident
Apple would love to hear about it and do something about it very
quickly.

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Justifying a PG over MySQL approach to a project
Lincoln Yeoh wrote:
> Ten or so years ago MySQL was better than Postgres95, and it would have
> been easy to justify using MySQL over Postgres95 (which was really slow
> and had a fair number of bugs). But Postgresql is much better than MySQL
> now. That's just my opinion of course.

Really?!?

MySQL development started in '94, and their first internal release was
May '95.[1]  At that time Postgres's SQL language support was new, but
didn't the underlying database already have a half decade of history
behind it, and wasn't it surely more mature than MySQL at the time?

I thought the main justification for MySQL back then was that they had
better Win95 support (and a quality control philosophy that better
matched the old pre-NT windows, favoring time-to-market over
correctness).

[1] http://en.wikipedia.org/wiki/MySQL#cite_note-21

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Justifying a PG over MySQL approach to a project
Gauthier, Dave wrote:
> The arguments against PG are not technical.

A few more points that I didn't see in this thread yet that might help
answer the non-technical questions:

* There seem to be more commercial vendors providing support for
  Postgres than MySQL - because most mysql support came from that one
  company.  http://www.postgresql.org/support/professional_support

* There are bigger companies supporting Postgres than mysql.  And yes,
  that'll still be true even if Oracle supports MySQL.
  http://postgresql.fastware.com/

* There are a number of extremely scalable commercial solutions based
  on postgres forks (greenplum, enterprisedb, aster, whatever yahoo
  uses, etc).  These run many of the largest databases in the world.
  If you expect your app to grow to that scale, it might make your
  migration easier.

* There are specialty commercial companies that support specific
  postgres features very well - such as Refractions
  (http://www.refractions.net/), who provide great postgis support.

* There are enough large companies that depend entirely on each of the
  databases (Skype, for example) to make either one a safe choice from
  that point of view.  And the way Apple and Cisco use it for a number
  of their programs (google "cisco postgresql" or "apple final cut
  postgresql" for links) are other nice datapoints of companies most
  managers would have heard of.

> Dear Santa, All I want for Christmas is to be able to keep my DB.

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
[GENERAL] postgres vs mysql conventional wisdom
On this first day of the month, I thought it might be interesting to
re-visit the conventional wisdom about postgres vs mysql.  Do these
seem like fair observations?

Storage engines - Advantage Postgres for having far more available.
Postgres has such a wide range of storage engines to choose from --
ranging from whatever proprietary backend Yahoo uses, to Fujitsu's
proprietary storage engine; to PowerGres Plus's proprietary storage
engine; to Netezza's hardware accelerated storage engine.  In contrast,
MySQL only has a few to choose from.

Broad Developer base - Advantage MySQL for having a more distributed
team.  Postgres developers seem to be concentrated in a handful of
companies.  After various acquisitions, MySQL developers are scattered
to the four winds.

Large company support - Advantage Postgres for having larger companies
support it.  Fujitsu ($46 billion 2009 revenue; 185,000 employees) is
much bigger than that little company that bought whomever bought MySQL
($23 billion revenue; 73,000 employees).

Any more?

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Pet Peeves?
Gregory Stark wrote:
> I'm putting together a talk on "PostgreSQL Pet Peeves" for discussion
> at FOSDEM 2009 this year.  I have a pretty good idea what some of them
> are of course,

* The capitalization that makes everyone (customers, execs, etc) I
  introduce it to parse the name as Postgre-SQL.

* Last I checked, postgres keeps my laptop's hard drive spinning all
  the time by doing some I/O even when "totally idle".  For that
  matter, waking up the CPUs a few times a second too.

* pgfoundry - which was once described on the mailinglists as the
  "kiss of death" for projects.  Yes, I understand why it is what it
  is; but it's still a pet-peeve.  I can't even put my finger on
  what's annoying about it; but it is.

* Upgrades involving databases with extra modules like postgis.
    $ sh utils/postgis_restore.pl lwpostgis.sql newdb olddb.dump -E=UNICODE

* Finding extensions I might want.  For example:
  - Surely someone wrote a good count(*)-replacement-trigger before
    (something like the sketch below).  Now where can I find one?
    Searching for "count" on pgfoundry doesn't help me.  Searching
    varlena's General Bits shows me a simple one, but IIRC it's lacking
    when it comes to concurrency.  Googling just gets me lots of "read
    the archives" postings in the archives.
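For concreteness, the simple flavor of that trigger looks roughly like
this (untested sketch; the table names are made up) - and note that it
has exactly the concurrency problem mentioned above, since every writer
contends on one counter row:

  create table row_counts (tablename text primary key, cnt bigint not null);

  create or replace function count_trig() returns trigger as $$
  begin
      if TG_OP = 'INSERT' then
          update row_counts set cnt = cnt + 1 where tablename = TG_TABLE_NAME;
      else  -- DELETE
          update row_counts set cnt = cnt - 1 where tablename = TG_TABLE_NAME;
      end if;
      return null;  -- AFTER trigger, so the return value is ignored
  end
  $$ language plpgsql;

  -- per table: seed the counter, then attach the trigger
  insert into row_counts select 'mytable', count(*) from mytable;
  create trigger mytable_count after insert or delete on mytable
      for each row execute procedure count_trig();

  -- cheap replacement for "select count(*) from mytable":
  select cnt from row_counts where tablename = 'mytable';

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general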
Re: [GENERAL] Pet Peeves?
Gregory Stark wrote:
> One thing which has *not* been mentioned which i find positively
> shocking is VACUUM. This was once our single biggest source of user
> complaints. Between Autovacuum improvements and HOT previously and the
> free space map in 8.4 the situation will be much improved.

The other 2 features that made VACUUM much less painful for me were
vacuum_cost_delay (8.0?) and allowing concurrent GiST indexes (8.1?).
Before those features were in, VACUUM could choke even a very lightly
loaded, moderately large database for arbitrarily long times.

I guess I'd still like some more convenient tuning of autovacuum
(perhaps specifying X mbps of disk I/O, instead of the translation
sketched below); but I'd say vacuum fell off my pet-peeve list around
the 8.1 timeframe.
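For anyone curious, here's that translation with the default cost
parameters (a rough back-of-the-envelope sketch - it assumes every page
vacuum touches is a cache miss):

  -- with vacuum_cost_limit = 200 and vacuum_cost_page_miss = 10,
  -- vacuum sleeps after every 200/10 = 20 pages read
  SET vacuum_cost_delay = 20;  -- sleep 20ms each time that budget is hit
  -- => ~20 pages per 20ms = ~1000 pages/s = ~8 MB/s with 8kB pages

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general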
Re: [GENERAL] where to divide application and database
Ivan Sergio Borgonovo wrote:
> On Fri, 20 Feb 2009 06:50:22 -0800
> David Fetter wrote:
>>> ... moving some of the checks
>>> into the database and away from the application.
>> Since a useful database has *many* applications instead of "the"
>> application, I think this is an excellent move.
> I was wondering if "checks" may have an impact
> on performances and if pg does some optimisation over them.

Are you suggesting there would be a positive or a negative impact on
performance?

Moving some checks into the database should *improve* performance by
giving the planner improved information.  For example, unique
constraints indicate when only 0-1 rows may come out of a query; and
range constraints can let the database know when a partition doesn't
even need to be visited (see the sketch below).  No doubt other checks
(say, spellchecking a column) would have performance costs.

I'm with David Fetter's perspective of considering multiple applications
that can run on top of a database.  If a particular check should apply
to all conceivable applications that run on a database (say, foreign
key constraints), it seems to me it belongs in the database.  If a
particular check should apply to just one application, but other
applications might have reasons not to enforce such a check (say, one
app might do spell checking in English; another in a different
language) - that belongs in the app.
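Here's a sketch of the range-constraint case (made-up table names;
constraint exclusion over inherited partitions works in 8.1+):

  create table events (id int, ts timestamp);
  create table events_2009q1
      (check (ts >= '2009-01-01' and ts < '2009-04-01'))
      inherits (events);
  create table events_2009q2
      (check (ts >= '2009-04-01' and ts < '2009-07-01'))
      inherits (events);

  set constraint_exclusion = on;

  -- the plan should only scan events_2009q1; the q2 partition is
  -- skipped using nothing but its CHECK constraint
  explain select * from events where ts = '2009-02-14';

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general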
Re: [GENERAL] High cpu usage after many inserts
Joshua D. Drake wrote:
> On Wed, 2009-02-25 at 09:21 +0900, Jordan Tomkinson wrote:
>> On Wed, Feb 25, 2009 at 12:05 AM, Aidan Van Dyk wrote:
>>> * Greg Smith [090201 00:00]:
>>>> Shouldn't someone have ranted about RAID-5 by this point in the thread?
>>> You mean someone's actually still using RAID-5?
>>> ;-)
>>
>> What exactly is wrong with RAID5 and what should we have gone with?

On top of the stuff Joshua wrote, there's also the "RAID 5 Write Hole".
Quoting Wikipedia:

  "In the event of a system failure while there are active writes, the
  parity of a stripe may become inconsistent with the data. If this is
  not detected and repaired before a disk or block fails, data loss may
  ensue as incorrect parity will be used to reconstruct the missing
  block in that stripe. This potential vulnerability is sometimes known
  as the write hole. Battery-backed cache and similar techniques are
  commonly used to reduce the window of opportunity for this to occur."

And in more detail from http://blogs.sun.com/bonwick/entry/raid_z
"RAID-5 write hole... What's worse, it will do so silently -- it has no
idea that it's giving you corrupt data."

I sometimes wonder if postgres should refuse to start up on RAID-5 in
the same way it does on VFAT or when running as root. :-)

> RAID5 outside of RAID 0 is the worst possible RAID level to run with a
> database (of the commonly used raid levels, that is).
>
> It is very, very slow on random writes which is what databases do.
> Switch to RAID 10.
>
> Sincerely,
>
> Joshua D. Drake

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] speaking of 8.4...
Joshua D. Drake wrote:
> On Fri, 2009-02-27 at 10:19 +1300, Tim Uckun wrote:
>> [according to some page on the web site...]
>> 8.4 was scheduled to be released march 1. Do we know what the
> All schedules are subject to change within the community :)
>> tentative date of release is?
> When it is done of course.

Perhaps that should be the official position communicated in the
various places on the web site.

I do notice that the Press FAQ, with its Q4 2008 guess, is even more
optimistic than the other page on the website people seem to be finding
that implies March.

http://www.postgresql.org/about/press/faq
  Q: When will 8.4 come out?
  A: Historically, PostgreSQL has released approximately every 12
     months and there is no desire in the community to change from
     that pattern. So expect 8.4 sometime in the fourth quarter of
     2008.

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] speaking of 8.4...
Joshua D. Drake wrote:
> On Thu, 2009-02-26 at 15:27 -0800, Ron Mayer wrote:
>> Joshua D. Drake wrote:
>>> On Fri, 2009-02-27 at 10:19 +1300, Tim Uckun wrote:
>>>> 8.4 was scheduled to be released march 1. ...
>> I do notice that the Press FAQ with its Q4 2008 guess
>> is even more optimistic ...
> Wow that must have been written some time ago I don't think anyone
> thought we would hit that date in any recent (say last 6-8 months).

That press FAQ page is often sadly amusing.  Back in Jan 2008 that same
page stated that 8.3 "would" come out in July 2007:
http://archives.postgresql.org/pgsql-advocacy/2008-01/msg00235.php

But that's not nearly as sad as the Chinese FAQs that state that the
latest version of Postgres is 8.2.1 or 8.2.3, depending on whether you
prefer traditional or simplified writing styles.
http://www.postgresql.org/docs/faqs.FAQ_chinese_simp.html
http://www.postgresql.org/docs/faqs.FAQ_chinese_trad.html

And for german speakers, their newest is 8.2.5:
http://www.postgresql.org/docs/faqs.FAQ_german.html

Basically for each language you get a different "latest" release.

Surely these FAQ entries are doing more harm than good.  Can we please
just update all of these to link to some page where the actual latest
version is listed instead, and replace the forward-looking expectations
with something that is less misleading?

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Maximum transaction rate
Greg Smith wrote:
> There are some known limitations to Linux fsync that I remain somewhat
> concerned about, independantly of LVM, like "ext3 fsync() only does a
> journal commit when the inode has changed" (see
> http://kerneltrap.org/mailarchive/linux-kernel/2008/2/26/990504 ). The
> way files are preallocated, the PostgreSQL WAL is supposed to function
> just fine even if you're using fdatasync after WAL writes, which also
> wouldn't touch the journal (last time I checked fdatasync was
> implemented as a full fsync on Linux). Since the new ext4 is more

Indeed it does.  I wonder if there should be an optional fsync mode
in postgres that turns fsync() into
   fchmod (fd, 0644); fchmod (fd, 0664);
to work around this issue.

For example, the program below will show one write per disk revolution
if you leave the fchmod() in there, and run many times faster
(i.e. lying) if you remove it.  This with ext3 on a standard IDE drive
with the write cache enabled, and no LVM or anything between them.

==========================================================
/*
** based on http://article.gmane.org/gmane.linux.file-systems/21373
**          http://thread.gmane.org/gmane.linux.kernel/646040
*/
#include <stdio.h>
#include <stdlib.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/stat.h>

int main(int argc,char *argv[])
{
  if (argc<2) {
     printf("usage: fs <filename>\n");
     exit(1);
  }
  int fd = open (argv[1], O_RDWR | O_CREAT | O_TRUNC, 0666);
  int i;
  for (i=0;i<100;i++) {
     char byte;
     pwrite (fd, &byte, 1, 0);
     fchmod (fd, 0644); fchmod (fd, 0664);
     fsync (fd);
  }
}
==========================================================

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Maximum transaction rate
Marco Colombo wrote:
> Ron Mayer wrote:
>> Greg Smith wrote:
>>> There are some known limitations to Linux fsync that I remain somewhat
>>> concerned about, independantly of LVM, like "ext3 fsync() only does a
>>> journal commit when the inode has changed" (see
>>> http://kerneltrap.org/mailarchive/linux-kernel/2008/2/26/990504 )
>> I wonder if there should be an optional fsync mode
>> in postgres that turns fsync() into
>> fchmod (fd, 0644); fchmod (fd, 0664);

'course I meant: "fchmod (fd, 0644); fchmod (fd, 0664); fsync(fd);"

>> to work around this issue.
> Question is... why do you care if the journal is not flushed on fsync?
> Only the file data blocks need to be, if the inode is unchanged.

You don't - but ext3 fsync won't even push the file data blocks through
a disk cache unless the inode was changed.  The point is that ext3 only
does the "write barrier" processing that issues the FLUSH CACHE (IDE)
or SYNCHRONIZE CACHE (SCSI) commands on inode changes, not data
changes.  And with no FLUSH CACHE or SYNCHRONIZE CACHE, the data blocks
may sit in the disk's cache after the fsync() as well.

PS: not sure if this is still true - last time I tested it was Nov 2006.

   Ron

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] postgreSQL & amazon ec2 cloud
Tom Lane wrote:
> Adrian Klaver writes:
>> Nothing. I have created a Postgres instance on an EC2 virtual machine
>> with attached EBS (Elastic Block Storage)..[...]
> ... I wonder whether you have any guarantees about database consistency
> in that situation?  PG has some pretty strong requirements about fsync

While I agree it shouldn't be taken on faith, their documentation does
take the time to point out that syncing of I/O is a pretty expensive
operation (even literally - they charge extra for individual I/O
operations, and point out that those increase with syncs).
http://aws.amazon.com/ebs/

I have a couple databases there, and based on the rather high I/O-wait
times on their cheapest ($0.10/hr) instances, I'm wildly guessing that
they're doing something reasonable for sync :-).  Their higher priced
instances supposedly have better I/O performance.

> behavior etc, and I'd not want to take it on faith that a cloud
> environment will meet those requirements.

That said, even apart from any SLA of fsync itself, I imagine there's a
concern that an entire hosted cloud might vanish for any number of
reasons.

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Maximum transaction rate
Marco Colombo wrote:
> Yes, but we knew it already, didn't we? It's always been like
> that, with IDE disks and write-back cache enabled, fsync just
> waits for the disk reporting completion and disks lie about

I've looked hard, and I have yet to see a disk that lies.  ext3, OTOH,
seems to lie.

IDE drives happily report whether they support write barriers or not,
which you can see with the command:

  %hdparm -I /dev/hdf | grep FLUSH_CACHE_EXT

I've tested about a dozen drives, and I've never seen one that claims
to support flushing that doesn't.  And I haven't seen one that doesn't
support it that was made less than half a decade ago.  IIRC, the ATA-5
specs from 2000 made supporting this mandatory.

Linux kernels since 2005 or so check for this feature.  It'll happily
tell you which of your devices don't support it:

  %dmesg | grep 'disabling barriers'
  JBD: barrier-based sync failed on md1 - disabling barriers

And for devices that do, it will happily send IDE FLUSH CACHE commands
to IDE drives that support the feature.  At the same time, Linux
kernels started sending the very similar SCSI SYNCHRONIZE CACHE
commands.

> Anyway, it's the block device job to control disk caches. A
> filesystem is just a client to the block device, it posts a
> flush request, what happens depends on the block device code.
> The FS doesn't talk to disks directly. And a write barrier is
> not a flush request, is a "please do not reorder" request.
> On fsync(), ext3 issues a flush request to the block device,
> that's all it's expected to do.

But AFAICT ext3 fsync() only tells the block device to flush disk
caches if the inode was changed.  Or, at least empirically, if I modify
a file and do fsync(fd); on ext3 it does not wait until the disk has
spun to where it's supposed to spin.  But if I put a couple fchmod()'s
right before the fsync() it does.

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Are there performance advantages in storing bulky field in separate table?
Sam Mason wrote:
> On Wed, Apr 08, 2009 at 04:56:35PM +0100, Ian Mayo wrote:
>> One more thing: hey, did you hear? I just got some advice from Tom Lane!
> Statistically speaking; he's the person most likely to answer you by

Even so, this might be the #1 advantage of Postgres over Oracle (cost
being #2).  Unless you're one of their ten biggest customers, I imagine
it'd take quite some time to get similar support from the core team's
counterparts at the other big databases.

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Are there performance advantages in storing bulky field in separate table?
Robert Treat wrote:
> You can be sure that discussion of this topic in this forum will soon be
> visited by religious zealots, but the short answer is "nulls are bad,
> mmkay".
>
> A slightly longer answer would be that, as a general rule, attributes
> of your relations that only apply to 1% of the rows are better
> represented as a one

To fulfill your prophecy of zealotry, I've got a number of tables with
columns that are mostly null that I can't think of that nice a way of
refactoring.  I'd love ideas to improve the design, though.

One example's an address table.  Most addresses have a few fields that
are typically present (building number, city, state, etc).  Others, as
described in various governments' address standards, are fields that
are typically absent.  For example, in US addressing rules, the
"Urbanization Name" line:
http://www.usps.com/ncsc/addressstds/addressformats.htm

  MRS MARIA SUAREZ          Name
  URB LAS GLADIOLAS         Urbanization name
  150 CALLE A               House no. and st. name
  SAN JUAN PR 00926-3232    City, state, and ZIP+4

Similarly sparse columns in my address tables are titles,
division/department names, and mailstop codes (described here:
http://pe.usps.gov/text/pub28/pub28c3_011.htm).

While I realize I could stick some string (an empty string, or some
other magic string like "urbanization name doesn't apply to this
address") into such a column, it sure is convenient to put nulls in
those columns.  A rough sketch of what I mean is below.  I'm quite
curious what you'd suggest a well-designed address table would look
like without nulls.
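Just to make the question concrete (obviously simplified; the column
names are made up):

  -- the convenient design: one table, sparse columns nullable
  create table address (
      address_id   serial primary key,
      name         text not null,
      urbanization text,   -- null for the vast majority of addresses
      street       text,
      city         text,
      state        text,
      zip          text
  );

  -- the null-free alternative: peel each sparse attribute off into
  -- its own table, and join whenever you need it
  create table address_urbanization (
      address_id   int primary key references address,
      urbanization text not null
  );

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general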
Re: [GENERAL] Full text search strategy for names
Rick Schumeyer wrote:
> I want to be able to search a list of articles for title words as well
> as author names.  I'm not sure the best strategy for the names.  The
> full text parser "parses" the names, giving undesirable results.
>
> For example,
>
> select to_tsvector('claude Jones');
>     to_tsvector
> --------------------
>  'jone':2 'claud':1
>
> Is there a way to tell the parser to index the words in a column
> without trying to determine word roots?
> Or what is the best way to index names for efficient searching?

I've got a similar question, but would love a dictionary that could
give me an efficient index that considers that Bill/William and
Bob/Rob/Robert and Khadaffi/Qaddafi might share the same roots.
Ideally it'd return exact matches first, followed by the similar terms.

I kludged up some many-step queries to try to do this; but wonder if
this would work better as a tsearch dictionary, and wonder even more if
I'm re-inventing something that's already out there.

Anyone know of such a tsearch dictionary that is aware of the roots of
names?
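The closest existing mechanism I know of is a synonym dictionary.  A
sketch, using 8.4-era syntax (the names_syn dictionary, the name_search
configuration, and the names.syn file contents are all made up):

  -- $SHAREDIR/tsearch_data/names.syn would contain lines like:
  --   bill  william
  --   bob   robert
  --   rob   robert
  CREATE TEXT SEARCH DICTIONARY names_syn
      (TEMPLATE = synonym, SYNONYMS = names);
  CREATE TEXT SEARCH CONFIGURATION name_search (COPY = simple);
  ALTER TEXT SEARCH CONFIGURATION name_search
      ALTER MAPPING FOR asciiword WITH names_syn, simple;

  -- 'Bill' and 'William' should now produce the same lexeme:
  SELECT to_tsvector('name_search', 'Bill Jones');

That handles nicknames from a hand-built word list, but not the
Khadaffi/Qaddafi transliteration case - for that, something like
soundex() or dmetaphone() from contrib/fuzzystrmatch seems closer.

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general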
Re: [GENERAL] Difference between array column type and separate table
Alban Hertroys wrote:
> On May 2, 2009, at 9:33 AM, Mike Christensen wrote:
>> ...
>> create table Threads ( ... Tags int2[], ...);
>>
>> To me this seems cleaner, but I'm wondering about performance.  If I
>> had millions of threads, is a JOIN going to be faster? ...
> ...I don't think array values are indexable either. ...

Of course they are indexable, if you construct the right index:
http://www.postgresql.org/docs/current/static/intarray.html

For certain queries, the array will probably be *much* faster than the
extra tables.  For example, a query like this:

  SELECT * FROM Threads WHERE Threads.tags @@ '1&(2|3)'::query_int;

does a single indexscan, and will be far far faster than the
extra-table approach, where the query would look something like:

  select * from Threads T
   where id in (select threadid from threadtags where id = 1)
     and id in (select threadid from threadtags where id = 2 or id = 3);

requiring 3 indexscans on tables with many more rows, and some
potentially rather expensive joins.
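(For completeness, the index behind the first query would presumably be
something like the following; untested, it needs contrib/intarray
installed, and note that intarray works on int[] rather than int2[], so
the Tags column would need to be int[]:)

  CREATE INDEX threads_tags_idx ON Threads USING gist (Tags gist__int_ops);

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general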
Re: [GENERAL] INTERVAL data type and libpq - what format?
Sam Mason wrote:
> You get an error because " 123 11" isn't a valid literal of an
> (undecorated) INTERVAL type.

Hmm.  Should it be?  Skimming the spec makes me think it might be a
valid day-time interval.  Quoting the spec:

  <unquoted interval string> ::=
      [ <sign> ] { <year-month literal> | <day-time literal> }
  ...
  <day-time literal> ::=
      <day-time interval> | <time interval>
  <day-time interval> ::=
      <days value> [ <space> <hours value>
          [ <colon> <minutes value> [ <colon> <seconds value> ] ] ]

which would read " 123 11" as <days value> <space> <hours value>,
i.e. 123 days 11 hours.

I can send a patch if this interpretation is right...

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
[GENERAL] A way to let Vacuum warn if FSM settings are low.
Short summary:

  I find this tiny (9-line) patch useful to help my clients
  know when FSM settings may need updating.

Some of the more frequently asked questions here are in regards to FSM
settings.  One hint I've seen is to run "vacuum verbose;".  At the end
of thousands of lines of INFO and DETAIL messages, vacuum verbose has 2
separate lines with some numbers to compare ("total pages needed" and
"FSM size...pages") that help indicate too-low fsm settings.

I've gotten into the habit of always installing the following patch
(below) that automatically does this comparison for me, and if
max_fsm_pages is too small, it logs a warning as shown here:

  patched=# vacuum;
  WARNING:  max_fsm_pages(1601) is smaller than total pages needed(2832)
  VACUUM

I find this much nicer than the existing output (

  clean=# vacuum verbose;
  [... thousands of lines of INFO and DETAIL messages ...]
  INFO:  free space map: 77 relations, 470 pages stored; 2832 total pages needed
  DETAIL:  Allocated FSM size: 100 relations + 1601 pages = 19 kB shared memory.

) for many reasons:

 * First, because it's a warning, lots of people will notice it before
   they ask the FAQ again.
 * Second, because all the information is on a single line and actually
   contains the string "max_fsm_relations", it gives people a clue what
   to do about it.  (note that vacuum verbose uses similar phrases, but
   from the number of questions here, it must not be obvious)
 * Third, I don't need the 'verbose' setting.
 * And most importantly, our clients let us know about WARNINGs, but
   not about INFOs or DETAILs in their log page; so it gives us a
   chance to respond before their system drags to a halt.

If a patch like this could get into the standard distro, that'd be
awesome - just let me know what additional work is needed (I didn't
look at docs or internationalization yet).  If not, I'd like to post it
here to patches just in case anyone else will benefit from the same
thing.

==========================================================
% diff -u postgresql-8.0.1/src/backend/storage/freespace/freespace.c postgresql-patched/src/backend/storage/freespace/freespace.c
--- postgresql-8.0.1/src/backend/storage/freespace/freespace.c 2004-12-31 14:00:54.0 -0800
+++ postgresql-patched/src/backend/storage/freespace/freespace.c 2005-02-23 14:58:50.638745744 -0800
@@ -704,6 +704,15 @@
 	/* Convert stats to actual number of page slots needed */
 	needed = (sumRequests + numRels) * CHUNKPAGES;
+
+	if (needed > MaxFSMPages)
+		ereport(WARNING,
+				(errmsg("max_fsm_pages(%d) is smaller than total pages needed(%.0f)",
+						MaxFSMPages, needed)));
+	if (numRels > MaxFSMRelations)
+		ereport(WARNING,
+				(errmsg("max_fsm_relations(%d) is smaller than the number of relations (%d)",
+						MaxFSMRelations, numRels)));
 	ereport(elevel,
 			(errmsg("free space map: %d relations, %d pages stored; %.0f total pages needed",
==========================================================

Thoughts?
   Ron

---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
       subscribe-nomail command to [EMAIL PROTECTED] so that your
       message can get through to the mailing list cleanly
Re: [GENERAL] [PATCHES] A way to let Vacuum warn if FSM settings are low.
On Wed, 23 Feb 2005, Tom Lane wrote:
> Ron Mayer <[EMAIL PROTECTED]> writes:
>> +    if (needed > MaxFSMPages)... ereport(WARNING, ...
> An unconditional WARNING seems a bit strong to me for a case that is
> not necessarily wrong.

How about a HINT, not unlike the "checkpoints are occurring too
frequently" one?

> Depending on the needs of the installation, this might be a perfectly
> acceptable situation --- for example if you have lots of large
> read-mostly tables.

OTOH, for other installations, it's nice to have some quick way for our
customers to know when to call us.  For a system we install somewhere
and may not have access to, would this be a safe, overly-conservative
test that could be put in place?  Unfortunately we really don't know
exactly how fast their tables will be growing, and I'm hoping some sort
of monitoring like this - even if it's too conservative - can catch
things before they cause problems.

I'm basing the comparison on this older thread:
http://archives.postgresql.org/pgsql-www/2004-11/msg00078.php
If better comparisons can be made, I'd be interested in looking into
them if someone points me in the right direction.

> On the other side of the coin, the test could pass (ie no warning) in
> situations where in fact MaxFSMPages is too small ...

I was hoping that second message would have caught this - but I clearly
didn't understand what comparison to use there either. :(  Would the
fixed (with == instead of >) test on MaxFSMRelations catch most cases
where it gives a false negative?

>> +    if (numRels > MaxFSMRelations) ...
> This part is just plain dead code, since it's not possible for numRels
> to exceed MaxFSMRelations.

Oops.  Sorry.

> I think it might be useful to warn when numRels == MaxFSMRelations,
> since if you don't have even one spare fsmrel slot then you probably
> have too few (it's unlikely you got it on the nose).  But I don't know
> how to produce a warning about MaxFSMPages that's worth anything.

I'm not sure I understood the false positive with many large
read-mostly tables you mentioned above.  Even in that case, how would I
know that the limited FSM space I had was used on the tables that
needed it?

The database I have matches that description - 90 GB of GIS data that
changes rarely (when cities close streets, etc) - and a few tables of
quickly changing data (recent crimes in the cities being analyzed).  I
want to make sure the quickly changing tables don't get starved for fsm
space by the large tables.  Would this comparison insure against that
risk?  (or am I misunderstanding completely, and I'll drop the subject)

   Ron

---(end of broadcast)---
TIP 9: the planner will ignore your desire to choose an index scan if
       your joining column's datatypes do not match
Re: [GENERAL] [PATCHES] A way to let Vacuum warn if FSM settings are low.
Thanks everyone for the feedback on my patch.

Objections I've heard (both online and in email) included:

 * WARNING is too strong for possibly OK behavior.
 * It's similar to "checkpoints occuring too frequently... consider
   increasing...checkpoint_segments", which is a LOG, not a WARNING.
 * The end user can't do anything anyway; so the LOG file would be a
   better place.
 * My comparison for numRels was broken.
 * If we're hinting the user to do something (change settings) I should
   make it put a HINT in the log file.

Praise I've heard included:

 * Even if it's too conservative, some admins want to know.
 * Unlike the current VACUUM VERBOSE info, all info is on one line, so
   automated log monitoring software can more easily catch it.
 * Unlike the current VACUUM VERBOSE info, this one points the user in
   the right direction.

Would the updated patch below address most of the concerns?

The output now looks like:

  LOG:  max_fsm_pages(1601) is smaller than the actual number of page slots needed(2832)
  HINT:  You may want to increase max_fsm_pages to be larger than 2832

and only goes in the log file (like the "checkpoints" hint).

I think Tom's outstanding comment that "Depending on the installation,
this might be a perfectly acceptable situation ... I don't know how to
produce a warning about MaxFSMPages that's worth anything" is the only
objection left unaddressed.  I guess my defense to that statement would
be that I think for some installations this does provide value, so by
making it a LOG instead of a WARNING are both needs met?

Thanks,
   Ron

% diff -U 10 postgresql-8.0.1/src/backend/storage/freespace/freespace.c postgresql-patched/src/backend/storage/freespace/freespace.c
--- postgresql-8.0.1/src/backend/storage/freespace/freespace.c 2004-12-31 14:00:54.0 -0800
+++ postgresql-patched/src/backend/storage/freespace/freespace.c 2005-02-24 13:44:52.361669928 -0800
@@ -704,20 +704,32 @@
 	/* Convert stats to actual number of page slots needed */
 	needed = (sumRequests + numRels) * CHUNKPAGES;

 	ereport(elevel,
 			(errmsg("free space map: %d relations, %d pages stored; %.0f total pages needed",
 					numRels, storedPages, needed),
 			 errdetail("Allocated FSM size: %d relations + %d pages = %.0f kB shared memory.",
 					   MaxFSMRelations, MaxFSMPages,
 					   (double) FreeSpaceShmemSize() / 1024.0)));
+
+	if (needed > MaxFSMPages)
+		ereport(LOG,
+				(errmsg("max_fsm_pages(%d) is smaller than the actual number of page slots needed(%.0f)",
+						MaxFSMPages, needed),
+				 errhint("You may want to increase max_fsm_pages to be larger than %.0f",needed)));
+	if (numRels == MaxFSMRelations)
+		ereport(LOG,
+				(errmsg("max_fsm_relations(%d) is equal than the number of relations vacuum checked (%d)",
+						MaxFSMRelations, numRels),
+				 errhint("You probably have more than %d relations. You should increase max_fsm_relations. Pages needed for max_fsm_pages may have been underestimated as well. ",numRels)));
 }

 /*
  * DumpFreeSpaceMap - dump contents of FSM into a disk file for later reload
  *
  * This is expected to be called during database shutdown, after updates to
  * the FSM have stopped.  We lock the FreeSpaceLock but that's purely pro
  * forma --- if anyone else is still accessing FSM, there's a problem.
  */
 void

---(end of broadcast)---
TIP 8: explain analyze is your friend
Re: [GENERAL] [PATCHES] A way to let Vacuum warn if FSM settings are low.
On Thu, 24 Feb 2005, Tom Lane wrote:
> I preferred Simon's idea of not trying to produce a warning for pages
> when we've detected relation overflow.

Sounds good.  I'll make that update.  Should the relation overflow be a
WARNING or a LOG?  It sounds like if you have that problem it's almost
certainly a problem, right?

> Making it a LOG rather than WARNING does address the issue of being
> too much in-your-face for an uncertain condition, though.

Great.

---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
       (send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])
Re: [GENERAL] [PATCHES] A way to let Vacuum warn if FSM settings are low.
On Thu, 24 Feb 2005, Ron Mayer wrote:
> Should the relation overflow be a WARNING or a LOG?  It sounds like
> if you have that problem it's almost certainly a problem, right?

And while I'm at it... what's the convention for INFOs vs LOGs?
The "checkpoint...too frequent" one seemed similar, and is a LOG.

And do people think the HINTs I added add value or just noise?

---(end of broadcast)---
TIP 6: Have you searched our list archives?
       http://archives.postgresql.org
Re: [GENERAL] [PATCHES] A way to let Vacuum warn if FSM settings are low.
On Thu, 24 Feb 2005, Tom Lane wrote:
> I'd go for making them both LOG, I think.  More consistent.

Ok, here's another try :)  With a couple more questions...

1. If I read Simon's email correctly, it implied that he wanted to see
   the "free space map" message for a VACUUM even when VERBOSE is
   turned off.  I could just tweak it in PrintFreeSpaceMapStatistics()
   as shown here... but now elevel (which depended on VACUUM VERBOSE or
   not) is no longer needed by PrintFreeSpaceMapStatistics.

   1a. Is it desired to always show this line as an INFO instead of a
       DEBUG2 (which it currently is when VERBOSE is not selected)?
   1b. Should I tweak vacuum.c (feels cleaner) or just freespace.c
       (minimal changes)?

2. If I read Simon's email correctly, it implied that he wanted to see
   these new lines when you type VACUUM.  This would suggest making
   them INFOs.  Making them INFOs would slightly contradict another
   goal of wanting to see them in the LOG for automated log grepping
   scripts to find, since that would require turning on INFOs, which I
   think commonly aren't logged.  Also, making them LOGs is consistent
   with the checkpoint hint.  I suppose they could be made NOTICEs; but
   that isn't consistent with either.

diff -u postgresql-8.0.1/src/backend/storage/freespace/freespace.c postgresql-patched/src/backend/storage/freespace/freespace.c
--- postgresql-8.0.1/src/backend/storage/freespace/freespace.c 2004-12-31 14:00:54.0 -0800
+++ postgresql-patched/src/backend/storage/freespace/freespace.c 2005-02-24 14:54:36.619566040 -0800
@@ -705,12 +705,25 @@
 	/* Convert stats to actual number of page slots needed */
 	needed = (sumRequests + numRels) * CHUNKPAGES;

-	ereport(elevel,
+	ereport(INFO,
 			(errmsg("free space map: %d relations, %d pages stored; %.0f total pages needed",
 					numRels, storedPages, needed),
 			 errdetail("Allocated FSM size: %d relations + %d pages = %.0f kB shared memory.",
 					   MaxFSMRelations, MaxFSMPages,
 					   (double) FreeSpaceShmemSize() / 1024.0)));
+
+	if (numRels == MaxFSMRelations)
+		ereport(LOG,
+				(errmsg("max_fsm_relations(%d) is equal than the number of relations vacuum checked (%d)",
+						MaxFSMRelations, numRels),
+				 errhint("You probably have more than %d relations. You should increase max_fsm_relations. Pages needed for max_fsm_pages may have been underestimated. ",numRels)));
+	else
+	if (needed > MaxFSMPages)
+		ereport(LOG,
+				(errmsg("max_fsm_pages(%d) is smaller than the actual number of page slots needed(%.0f)",
+						MaxFSMPages, needed),
+				 errhint("You may want to increase max_fsm_pages to be larger than %.0f",needed)));
+
 }

---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster
Re: [GENERAL] [PATCHES] A way to let Vacuum warn if FSM settings are low.
On Fri, 25 Feb 2005, Bruce Momjian wrote:
> Tom Lane wrote:
>> Ron Mayer <[EMAIL PROTECTED]> writes:
>>> Should the relation overflow be a WARNING or a LOG? ...
>> I'd go for making them both LOG, I think.  More consistent.
>
> Can we also update this wording:
>
>   INFO:  free space map: 52 relations, 61 pages stored; 848 total pages needed
>   DETAIL:  Allocated FSM size: 1000 relations + 2 pages = 182 kB shared memory.
>
> The "pages needed" is confusing.  In fact it is the total pages used or
> allocated.  I looked in the code and got confused.  It needs clarity.

Any preference?  To me, "allocated" has some risk of sounding like it
refers to the total free space map (memory allocated for fsm) instead
of just the used ones.  "Allocated" is actually used for that other
meaning on the next line.  I guess it's confusing there too, so that
line should be changed as well.

How about if I go for "used" in that first line, and simply remove the
word "Allocated" in the DETAIL line?  So instead of:

  INFO:  free space map: 52 relations, 61 pages stored; 848 total pages needed
  DETAIL:  Allocated FSM size: 1000 relations + 2 pages = 182 kB shared memory.

it'll say:

  INFO:  free space map: 52 relations, 61 pages stored; 848 total pages used
  DETAIL:  FSM size: 1000 relations + 2 pages = 182 kB shared memory.

With those changes, the patch now looks like this...

==========================================================
% diff -u postgresql-8.0.1/src/backend/storage/freespace/freespace.c postgresql-patched/src/backend/storage/freespace/freespace.c
--- postgresql-8.0.1/src/backend/storage/freespace/freespace.c 2004-12-31 14:00:54.0 -0800
+++ postgresql-patched/src/backend/storage/freespace/freespace.c 2005-02-25 16:45:26.773792440 -0800
@@ -705,12 +705,25 @@
 	/* Convert stats to actual number of page slots needed */
 	needed = (sumRequests + numRels) * CHUNKPAGES;

-	ereport(elevel,
-			(errmsg("free space map: %d relations, %d pages stored; %.0f total pages needed",
+	ereport(INFO,
+			(errmsg("free space map: %d relations, %d pages stored; %.0f total pages used",
 					numRels, storedPages, needed),
-			 errdetail("Allocated FSM size: %d relations + %d pages = %.0f kB shared memory.",
+			 errdetail("FSM size: %d relations + %d pages = %.0f kB shared memory.",
 					   MaxFSMRelations, MaxFSMPages,
 					   (double) FreeSpaceShmemSize() / 1024.0)));
+
+	if (numRels == MaxFSMRelations)
+		ereport(LOG,
+				(errmsg("max_fsm_relations(%d) is equal than the number of relations vacuum checked (%d)",
+						MaxFSMRelations, numRels),
+				 errhint("You probably have more than %d relations. You should increase max_fsm_relations. Pages needed for max_fsm_pages may have been underestimated. ",numRels)));
+	else
+	if (needed > MaxFSMPages)
+		ereport(LOG,
+				(errmsg("max_fsm_pages(%d) is smaller than the actual number of page slots needed(%.0f)",
+						MaxFSMPages, needed),
+				 errhint("You may want to increase max_fsm_pages to be larger than %.0f",needed)));
+
 }
==========================================================

Getting closer?

---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
       (send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])
[GENERAL] multicolumn GIST index question
Did anyone get multi-column GIST indexes working using both the gist_btree and postgis modules? It seems to work fine for me on small test cases (shown at the bottom), but seems to crash my database for large ones. Any advice is welcome - including pointers to better lists to ask questions like this. Output showing a crash (on a large database) and a successful run on a tiny one follow. Ron fl=# -- sessionid is a text; the_geom is a GEOMETRY fl=# create index testidx2 on user_point_features using gist (sessionid,the_geom); server closed the connection unexpectedly This probably means the server terminated abnormally before or while processing the request. The connection to the server was lost. Attempting reset: Succeeded. fl=# fl=# fl=# fl=# create table test3(a text, b text, c text); CREATE TABLE fl=# select addgeometrycolumn ('','test3','the_geom','-1','POINT',2); addgeometrycolumn -- public.test3.the_geom SRID:-1 TYPE:POINT DIMS:2 geometry_column fixed:0 (1 row) fl=# create index idx_text_text on test3 using gist (a,b); CREATE INDEX fl=# create index idx_text_geom on test3 using gist (a,the_geom); CREATE INDEX fl=# ---(end of broadcast)--- TIP 7: don't forget to increase your free space map settings
Re: [GENERAL] [PATCHES] A way to let Vacuum warn if FSM settings are low.
On Sun, 27 Feb 2005, Simon Riggs wrote:
> On Fri, 2005-02-25 at 16:48 -0800, Ron Mayer wrote:
>> Getting closer?
> For me, yes. [...]
> The not-warnings seem a little wordy for me, but they happen when and
> how I would hope for.
>
> So, for me, it looks like a polish of final wording and commit.

Thanks for the feedback.  How about I replace the grammatically poor:

  LOG:  max_fsm_relations(%d) is equal than the number of relations vacuum checked (%d)
  HINT:  You probably have more than %d relations. You should increase max_fsm_relations. Pages needed for max_fsm_pages may have been underestimated.

with this:

  LOG:  max_fsm_relations(100) equals the number of relations checked
  HINT:  You have >= 100 relations. You should increase max_fsm_relations.

and replace this:

  LOG:  max_fsm_pages(%d) is smaller than the actual number of page slots needed(%.0f)
  HINT:  You may want to increase max_fsm_pages to be larger than %.0f

with the slightly smaller:

  LOG:  the number of page slots needed (2832) exceeds max_fsm_pages (1601)
  HINT:  You may want to increase max_fsm_pages to a value over 2832.

These updated messages would fit on an 80-column display if the numbers
aren't too big.  Here's 80 characters for a quick reference:
01234567890123456789012345678901234567890123456789012345678901234567890123456789

The "pages needed...underestimated" note in the first message was no
longer useful anyway, since it's no longer logging fsm_pages stuff when
the max_fsm_relations condition occurred.

   Ron

The patch now looks like:

% diff -u postgresql-8.0.1/src/backend/storage/freespace/freespace.c postgresql-patched/src/backend/storage/freespace/freespace.c
--- postgresql-8.0.1/src/backend/storage/freespace/freespace.c 2004-12-31 14:00:54.0 -0800
+++ postgresql-patched/src/backend/storage/freespace/freespace.c 2005-02-27 11:54:39.776546200 -0800
@@ -705,12 +705,25 @@
 	/* Convert stats to actual number of page slots needed */
 	needed = (sumRequests + numRels) * CHUNKPAGES;

-	ereport(elevel,
-			(errmsg("free space map: %d relations, %d pages stored; %.0f total pages needed",
+	ereport(INFO,
+			(errmsg("free space map: %d relations, %d pages stored; %.0f total pages used",
 					numRels, storedPages, needed),
-			 errdetail("Allocated FSM size: %d relations + %d pages = %.0f kB shared memory.",
+			 errdetail("FSM size: %d relations + %d pages = %.0f kB shared memory.",
 					   MaxFSMRelations, MaxFSMPages,
 					   (double) FreeSpaceShmemSize() / 1024.0)));
+
+	if (numRels == MaxFSMRelations)
+		ereport(LOG,
+				(errmsg("max_fsm_relations(%d) equals the number of relations checked",
+						MaxFSMRelations),
+				 errhint("You have >= %d relations. You should increase max_fsm_relations.",numRels)));
+	else
+	if (needed > MaxFSMPages)
+		ereport(LOG,
+				(errmsg("the number of page slots needed (%.0f) exceeds max_fsm_pages (%d)",
+						needed,MaxFSMPages),
+				 errhint("You may want to increase max_fsm_pages to a value over %.0f.",needed)));
+
 }
%

---(end of broadcast)---
TIP 9: the planner will ignore your desire to choose an index scan if
       your joining column's datatypes do not match
[GENERAL] vacuum_cost_delay & VACUUM holding locks on GIST indexes
When you VACUUM a table with postgis indexes (perhaps GIST indexes in
general?), it seems a lock is held on the table.  Setting
vacuum_cost_delay seems to make vacuum hold this lock much longer.

Is this true?  If so, I assume that's not desirable behavior, right?
It makes autovacuum harder to use on tables that have these indexes.
Any clever workarounds?

   Ron

fli-lin1 /home/pg> while (1)
while? echo " explain analyze SELECT * from lines2 WHERE the_geom && setSRID('BOX3D(-84.31043 30.44341,-84.2954 30.45372)'::BOX3D, -1 );" | psql fli fli | grep runtime
while? sleep 5
while? end
 Total runtime: 23.355 ms
 Total runtime: 32.276 ms        [ "vacuum verbose" starts ]
 Total runtime: 36.080 ms
 Total runtime: 28.373 ms
 Total runtime: 114679.281 ms    [ bad but not horrible ]
 Total runtime: 30.823 ms
 [...]
 Total runtime: 22.867 ms        [ "set vacuum_cost_delay=20" ]
 Total runtime: 22.808 ms
 Total runtime: 23.288 ms        [ "vacuum verbose" again ]
 Total runtime: 23.366 ms
 [ dozens of lines ]
 Total runtime: 23.337 ms
 Total runtime: 764133.163 ms    [ YIPES ]
 Total runtime: 23.722 ms

fli=# select * from pg_locks;
 relation | database | transaction |  pid  |           mode           | granted
----------+----------+-------------+-------+--------------------------+---------
 36677268 |    17230 |             | 29039 | AccessShareLock          | t
 36677268 |    17230 |             | 29039 | ShareUpdateExclusiveLock | t
 36677268 |    17230 |             | 29039 | AccessExclusiveLock      | t
 33620188 |    17230 |             | 29039 | ShareUpdateExclusiveLock | t
          |          |       66414 | 30758 | ExclusiveLock            | t
 36677268 |    17230 |             | 30731 | AccessShareLock          | f
 33620188 |    17230 |             | 29039 | ShareUpdateExclusiveLock | t
 36677269 |    17230 |             | 29039 | ShareUpdateExclusiveLock | t
    16839 |    17230 |             | 30758 | AccessShareLock          | t
          |          |       66412 | 30731 | ExclusiveLock            | t
 33620188 |    17230 |             | 30731 | AccessShareLock          | t
          |          |       66372 | 29039 | ExclusiveLock            | t
(12 rows)

fli=#
fli=# select * from pg_class where oid=36677268;
     relname      | relnamespace | reltype | relowner | relam | relfilenode | reltablespace | relpages | reltuples  | reltoastrelid | reltoastidxid | relhasindex | relisshared | relkind | relnatts | relchecks | reltriggers | relukeys | relfkeys | relrefs | relhasoids | relhaspkey | relhasrules | relhassubclass | relacl
------------------+--------------+---------+----------+-------+-------------+---------------+----------+------------+---------------+---------------+-------------+-------------+---------+----------+-----------+-------------+----------+----------+---------+------------+------------+-------------+----------------+--------
 tmp_lines2__gist |      2758256 |       0 |      100 |   783 |    36677268 |             0 |    34623 | 6.1128e+06 |             0 |             0 | f           | f           | i       |        1 |         0 |           0 |        0 |        0 |       0 | f          | f          | f           | f              |
(1 row)

---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster
Re: [GENERAL] vacuum_cost_delay & VACUUM holding locks on GIST
On Mon, 28 Feb 2005, Tom Lane wrote:
> You could shorten the intervals for which the lock is held by reducing
> vacuum_mem, but this might be counterproductive overall.

Does this work?  I just tried:

  setting vacuum_mem=1024
  setting vacuum_cost_delay=10

and ran a while loop that repeatedly executes a simple select statement
that usually takes 0.03 seconds; and it still gave me a single
extremely slow (7 minutes long instead of 25 milliseconds) query that
lasted until after the ...GIST... line was shown on vacuum verbose
output.

It feels like even with the minimal vacuum_mem it spent a very long
time (430 seconds) without releasing the lock on the GIST index for my
pretty large (relpages=94371 pages) table.  Or do I have something else
broken?

Shown below is the output of a while loop of a simple query using this
index, and the \d output for the table.

   Ron

==========================================================
== Loop of small select()s
== with vacuum_mem=1024 and vacuum_cost_delay=10
==========================================================

%while (1)
while? echo " explain analyze SELECT * from lines2 WHERE the_geom && setSRID('BOX3D(-84.31043 30.44341,-84.2954 30.45372)'::BOX3D, -1 );" | psql fli fli | grep runtime
while? sleep 5
while? end
 Total runtime: 24.375 ms       /* set vacuum_mem=1024 */
 Total runtime: 24.303 ms
 Total runtime: 25.370 ms       /* vacuum verbose */
 Total runtime: 27.332 ms
 Total runtime: 26.628 ms
 Total runtime: 26.001 ms
 [many more like this]
 Total runtime: 27.437 ms
 Total runtime: 24.679 ms
 Total runtime: 26.628 ms
 Total runtime: 431265.868 ms
 Total runtime: 24.419 ms       /* INFO: index "tmp_lines2__gist" ... */
 Total runtime: 24.375 ms
 Total runtime: 24.303 ms
 Total runtime: 24.294 ms
 Total runtime: 24.235 ms

==========================================================
== \d for the table.
==========================================================

fli=# \d lines2;
           Table "tmp.lines2"
  Column   |         Type         | Modifiers
-----------+----------------------+-----------
 tigerfile | integer              |
 tlid      | integer              |
 cfcc      | character varying(3) |
 name      | text                 |
 the_geom  | geometry             |
Indexes:
    "lines2__tlid" btree (tlid)
    "tmp_lines2__gist2" gist (the_geom)
Check constraints:
    "enforce_geotype_the_geom" CHECK (geometrytype(the_geom) = 'LINESTRING'::text OR the_geom IS NULL)
    "enforce_dims_the_geom" CHECK (ndims(the_geom) = 2)
    "enforce_srid_the_geom" CHECK (srid(the_geom) = -1)

fli=#

---(end of broadcast)---
TIP 9: the planner will ignore your desire to choose an index scan if
       your joining column's datatypes do not match
Re: [GENERAL] [PATCHES] A way to let Vacuum warn if FSM settings are low. [final?]
Well, I was really hoping something would end up in the log file.

The situation is that our clients - sometimes not that computer savvy -
go perhaps a year without us being involved (unless the log monitoring
scripts show something abnormal, or the system breaks).  The primary
motivation for tweaking this file in the first place was so that the
log file would catch the situation where their database outgrows the
FSM settings before it causes a problem.

What about at least sending the output to a log file if VERBOSE or some
GUC variable is set?

   Ron

Bruce Momjian wrote:
> I have applied your patch with minor modifications.  Applied version
> attached.  I think the "pages" message:
>
>   INFO:  free space map: 44 relations, 28 pages stored; 704 total pages used
>   DETAIL:  FSM size: 1000 relations + 2 pages = 182 kB shared memory.
>
> should remain DEBUG2 for non-VERBOSE, and INFO for VERBOSE.  The
> information is pretty complex and probably of little interest to a
> typical vacuum user.  In fact, the new messages make the information
> even less important because problems are now flagged.  I adjusted your
> output levels for the new messages.
>
> I realize the "checkpoint warning" is a LOG message, but it has to be
> because there is no active session when a checkpoint is being run.  In
> the case of VACUUM, there is an active session so I think the messages
> should be sent to that session.  Sending them to the logs and not to
> the user seems unusual because they are the ones who asked for the
> VACUUM.  I realize they might not be able to change the server
> settings.
>
> These new messages:
>
>   NOTICE:  max_fsm_relations(1000) equals the number of relations checked
>   HINT:  You have >= 44 relations.  Consider increasing the configuration
>   parameter "max_fsm_relations".
>
>   NOTICE:  the number of page slots needed (704) exceeds max_fsm_pages (2)
>   HINT:  Consider increasing the configuration parameter "max_fsm_pages"
>   to a value over 704.
>   VACUUM
>
> should be NOTICE.  NOTICE is for unusual events that are not warnings,
> and that fits these cases.  If the administrator wants those in the
> logs, he can set log_min_messages to NOTICE.  I also adjusted your
> output strings to more closely match our checkpoint warning message.
>
> Another idea would be to send the output to both the client and the
> logs by default.
Re: [GENERAL] [PATCHES] A way to let Vacuum warn if FSM settings are low. [final?]
Bruce Momjian wrote:
> You didn't like server_min_messages = 'notice'?

I merely don't have a feeling for how much additional stuff that would put in the log files. If it's good practice for production systems to log NOTICEs, I'm happy with the change.

My reasoning for thinking the log file was more useful was that only an admin with access to the log files could really do anything about the message anyway. Also, since the message happening occasionally is probably OK, yet if it happens a lot it's more likely worth looking into, I think the historical record of when it happened is more interesting than the one-time occurrence that is all you'd see in the active session.

   Ron

PS: I'm fine either way; and perhaps it's a good idea for me to be logging NOTICEs anyway -- I just thought I'd explain my reasoning above. I'm sure you guys know a lot better than me what best practices would be.

---(end of broadcast)---
TIP 8: explain analyze is your friend
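Since the discussion above settled on NOTICE-level messages, here is a minimal sketch of how an administrator could get those FSM warnings into the server log as well; the settings shown are illustrative, not taken from the thread:

  -- In postgresql.conf (or per-session, as a superuser):
  --   log_min_messages = notice
  SET log_min_messages = 'notice';
  -- Any max_fsm_relations / max_fsm_pages NOTICEs raised here are now
  -- written to the server log in addition to the client session:
  VACUUM VERBOSE;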
[GENERAL] contrib module intagg crashing the backend
If one of the contrib modules (int_array_aggregate in contrib/intagg) is crashing my backend, where's a good place to report a bug or get hints where to look further?

It seems to work fine on small arrays, but crashes on large ones. The second query below would have put about 500 rows in the aggregate.

Thanks,
   Ron

% psql fli fli
Welcome to psql 8.0.1, the PostgreSQL interactive terminal.
[...]
fli=# select str_name,int_array_aggregate(tlid) from tlid_streetnames where str_name='civic center' and tigerfile=6001 group by str_name;
   str_name   |            int_array_aggregate
--------------+--------------------------------------------
 civic center | {125030363,125030026,125030039,125050707}
(1 row)

fli=# select str_name,int_array_aggregate(tlid) from tlid_streetnames where str_name='main' group by str_name;
server closed the connection unexpectedly
        This probably means the server terminated abnormally
        before or while processing the request.
The connection to the server was lost. Attempting reset: Failed.
!>

---(end of broadcast)---
TIP 7: don't forget to increase your free space map settings
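For what it's worth, a self-contained script along these lines could help narrow down whether array size alone triggers the crash; the table here is hypothetical (it only assumes contrib/intagg is installed), and 500 matches the size of the failing aggregate above:

  -- Build one group whose aggregate collects ~500 integers.
  CREATE TABLE intagg_test (grp text, val int);
  INSERT INTO intagg_test
      SELECT 'main', n FROM generate_series(1, 500) AS g(n);
  -- If int_array_aggregate() itself is at fault, this should also crash:
  SELECT grp, int_array_aggregate(val) FROM intagg_test GROUP BY grp;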
Re: [GENERAL] Pgpool questions
Tatsuo Ishii wrote:
>> Is there any way to load balance selects to more than 2 servers?
>> e.g. 1 master with 2 slaves? Run multiple instances?
>
> It's on my TODO but will not be a trivial change.

Could I set up 2 pairs of pgpool-balanced servers, and use a third pgpool to load balance across the two pools? Or is that just silly?

---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]
Re: [GENERAL] Performance difference between ANY and IN, also array syntax
Tom Lane wrote:
> Bart Grantham <[EMAIL PROTECTED]> writes:
>> # SELECT * FROM connections WHERE connectee_node_id = ANY (ARRAY[28543,28542] );
>> [ performance sucks ]
>
> Yeah :-(. The = ANY (array) notation is new in the last release or so,
> and is completely without optimization of any kind. Turn it into IN if
> you can.

You could also use the module in contrib/intagg to treat your array as a table, so you can JOIN against it like this:

  select *
    from connections
    join int_array_enum('{28543,28542}'::int[]) as iae(x)
      on (connectee_node_id = x);

This works very well for me. I show a real-life example with a table about the same size as Bart's (about 200K rows) below.

It has the advantage that the planner can choose among all the different join types whenever it guesses one would be appropriate. I've never seen "IN" clauses use some of those tricks (though it might be possible that IN clauses can do that as well).

The planner seems to always assume that int_array_enum() will return a 1000-row table (I wonder if there should eventually be some way to tweak that), so when joining against very large tables it happily does index scans, and when joining against smaller tables it'll do merge joins or whatever.

===
== an example going against a larger table:
== the int_array_enum trick reasonably chooses nested loops and index scans
===

fli=# explain select * from lines where tlid = ANY ('{125060436,125060437}'::int[]);
                           QUERY PLAN
----------------------------------------------------------------
 Seq Scan on lines  (cost=0.00..26636.03 rows=254229 width=102)
   Filter: (tlid = ANY ('{125060436,125060437}'::integer[]))
(2 rows)

fli=# explain select * from lines join int_array_enum('{125060436,125060437}'::int[]) as iae(x) on (tlid=x);
                                      QUERY PLAN
--------------------------------------------------------------------------------------
 Nested Loop  (cost=0.00..4228.99 rows=1000 width=106)
   ->  Function Scan on int_array_enum iae  (cost=0.00..12.50 rows=1000 width=4)
   ->  Index Scan using rtgr_lines__tlid on lines  (cost=0.00..4.20 rows=1 width=102)
         Index Cond: (lines.tlid = "outer".x)
(4 rows)

===
== an example going against a smaller table:
== the int_array_enum trick reasonably chooses a merge join
===

fli=# explain analyze select * from rt6 join int_array_enum('{125060436,125060437}'::int[]) as iae(x) on (tlid=x);
                                      QUERY PLAN
--------------------------------------------------------------------------------------
 Merge Join  (cost=62.33..1226.41 rows=2848 width=82) (actual time=36.669..36.669 rows=0 loops=1)
   Merge Cond: ("outer".tlid = "inner".x)
   ->  Index Scan using rt6__tlid on rt6  (cost=0.00..1066.65 rows=21881 width=78) (actual time=0.020..21.982 rows=10544 loops=1)
   ->  Sort  (cost=62.33..64.83 rows=1000 width=4) (actual time=0.042..0.046 rows=2 loops=1)
         Sort Key: iae.x
         ->  Function Scan on int_array_enum iae  (cost=0.00..12.50 rows=1000 width=4) (actual time=0.020..0.023 rows=2 loops=1)
 Total runtime: 36.770 ms
(7 rows)

---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
    (send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])
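For completeness, Tom's "turn it into IN" suggestion applied to Bart's original query would look something like this (same table and IDs as the example above):

  -- An explicit IN list, which the planner can turn into an index scan,
  -- unlike the unoptimized = ANY (array) form:
  SELECT * FROM connections WHERE connectee_node_id IN (28543, 28542);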
Re: [GENERAL] PRIMARY KEY on a *group* of columns imply that each column is
Marco Colombo wrote:
> That is, given a variable A, you can always write a boolean expression
> that evaluates True or False to test if A is _equal_ to
> None/undef/NULL (in C):

And of course you can do this in SQL as well.

> ('a', NULL) is neither the same _nor different_ from ('a', 2).

Uh, I'm not sure what you mean by "the same"; but certainly ('a',NULL) is distinct from ('a',2).

> The result of comparison is NULL, no matter how you're testing it:

Unless you're testing with the IS DISTINCT FROM operator.

fli=# select ('a',NULL) IS DISTINCT FROM ('a',2);
 ?column?
----------
 t
(1 row)

> marco=# select ('a', 2) = ('a', NULL);
>  ?column?
> ----------
>
> (1 row)
>
> marco=# select ('a', 2) <> ('a', NULL);
>  ?column?
> ----------
>
> (1 row)

fli=# select ('a',NULL) IS DISTINCT FROM ('a',NULL);
 ?column?
----------
 f
(1 row)

> That's why NULLs are not allowed in primary keys...

But in general I agree with most of the rest of what you said.

---(end of broadcast)---
TIP 8: explain analyze is your friend
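On the primary key remark, a quick sketch of why NULLs and uniqueness don't mix (the table name is made up):

  -- Two rows with NULL in a uniquely-constrained column can coexist,
  -- because NULL = NULL is unknown rather than true:
  CREATE TABLE pk_demo (a text, b int, UNIQUE (a, b));
  INSERT INTO pk_demo VALUES ('a', NULL);
  INSERT INTO pk_demo VALUES ('a', NULL);  -- also succeeds
  -- which is why a primary key requires NOT NULL on every key column.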
Re: [GENERAL] Array manipulation/syntax question
Bart Grantham wrote:
> Hello again. I had a problem a few weeks ago with using IN ( some_array )
> having really rough performance. Special thanks to Ron Mayer for the
> suggestion of using int_array_enum(some_array) to join against. I had
> to upgrade to PG8 but that technique works really well. Now I have a
> question about array manipulation. Specifically, how can I build an
> array from the result of a query without resorting to array_append()?

Well, the same contrib module that has int_array_enum() also has an aggregate called int_array_aggregate() that turns a column into an array.

> -- my_array is an INT[]
> _my_array := select some_column from some_table;

So this:

  SELECT int_array_aggregate(some_column) FROM some_table;

might be what you want.

> ... in plpgsql?

Well... the function from the contrib module is written in C, so I guess it doesn't strictly meet what you asked, but it works very well for me. ;-)

   Ron Mayer

PS: the stuff in contrib/intarray may be useful too, for indexing, merging, and many other tricks on integer arrays.

---(end of broadcast)---
TIP 8: explain analyze is your friend
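To tie it back to the plpgsql part of the question, a minimal sketch of capturing the aggregate into a variable (every name here is hypothetical):

  CREATE OR REPLACE FUNCTION collect_ids() RETURNS int[] AS '
  DECLARE
      _my_array int[];
  BEGIN
      -- int_array_aggregate() collapses the column into one array value,
      -- which SELECT ... INTO can store in a plpgsql variable.
      SELECT int_array_aggregate(some_column) INTO _my_array
        FROM some_table;
      RETURN _my_array;
  END;
  ' LANGUAGE plpgsql;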
Re: [GENERAL] Postgres in government
Mark Steckel wrote:
> I have also scrounged the Internet looking for examples of Postgres
> being used in government, preferably in 24x7 capacities.

I find that googling for vendors known to use PostgreSQL, and searching people's resumes, is a good way to find descriptions of government projects along with contacts for references. Here are a couple that look interesting.

US Navy, Northrop Grumman: PostgreSQL is used in the Navy Enterprise Portal and the Fleet Numerical Meteorology and Oceanography Center (FNMOC) Portal:
http://kennethbowen.com/kbresume.html
  "Develop J2EE application to store user profiles for the Navy
  Enterprise Portal and the Fleet Numerical Meteorology and Oceanography
  Center(FNMOC) Portal using JBoss application server and PostgreSQL
  database."

DOD, USGS, USDA, Army Corp of Engineers, Navy, through a company called Sanz:
http://postgis.refractions.net/pipermail/postgis-users/2005-March/007399.html
  Sanz manages "tens of terrabyte datasets of raster and vector data for
  the DOD, USGS, USDA, Army Corp of Engineers, Navy, etc." using
  postgresql and postgis.

---(end of broadcast)---
TIP 6: Have you searched our list archives?
    http://archives.postgresql.org
Re: [GENERAL] Converting MySQL tinyint to PostgreSQL
Martijn van Oosterhout wrote:
> Well, you get another issue, alignment. If you squeeze your string
> down, the next field, if it is an int or string, will get padded to a
> multiple of 4, negating most of the gains. Like in C structures, there
> is padding to optimise access.

Anecdotally, I hear at least as many people say that their database is I/O bound as CPU bound; and adding padding bytes for alignment trades extra disk I/O for less CPU. Is unaligned access really so expensive that it makes up for the extra I/O?

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?
    http://www.postgresql.org/docs/faq
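One visible consequence of those alignment rules can be measured from SQL; this is a sketch with invented table names, and pg_relation_size() assumes version 8.1 or later:

  -- Same data, two column orders. int4 wants 4-byte alignment, so
  -- interleaving 1-byte "char" columns with int4 columns adds padding.
  CREATE TABLE padded (a "char", b int4, c "char", d int4, e "char", f int4);
  CREATE TABLE packed (b int4, d int4, f int4, a "char", c "char", e "char");
  INSERT INTO padded SELECT 'x', n, 'y', n, 'z', n
      FROM generate_series(1, 100000) AS g(n);
  INSERT INTO packed SELECT n, n, n, 'x', 'y', 'z'
      FROM generate_series(1, 100000) AS g(n);
  -- The interleaved layout should occupy measurably more disk:
  SELECT pg_relation_size('padded') AS padded_bytes,
         pg_relation_size('packed') AS packed_bytes;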
Re: [GENERAL] Converting MySQL tinyint to PostgreSQL
Jim C. Nasby wrote:
> On Thu, Jul 14, 2005 at 11:29:23PM +0200, Martijn van Oosterhout wrote:
>> On Thu, Jul 14, 2005 at 11:30:36AM -0500, Jim C. Nasby wrote:
>>> On Wed, Jul 13, 2005 at 07:52:04PM -0400, Bruce Momjian wrote:
>>>> This is a good point. We have always stored data on disk that
>>>> exactly matches its layout in memory. We could change that, but no
>>>> one has shown it would be a win. [...]
>
> My thought was to convert as pages were read and written. That should
> minimize the code impact.

If that were practical, even more radical I/O-saving tricks might be possible beyond removing alignment bytes, such as some compression algorithm.

Jim C. Nasby wrote:
> Or maybe as an alternative, would it be possible to determine how much
> space in a given relation was being wasted due to padding? That could
> be used to figure out how much IO could be saved on different
> transactions.

Well, I do notice that if I gzip my larger tables' data files they tend to compress by 80-95%, so it seems there's a fair amount of redundancy in at least some tables.

Has anyone tried running PostgreSQL on a filesystem that does compression? That seems like an easy way to estimate the ultimate potential I/O savings of separating the in-memory layout from the on-disk layout.

   Ron

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster
Re: [GENERAL] How to create unique constraint on NULL columns
Andrus wrote:
>> Then redesign this as a many to many relation. ...
>
> This means adding a separate row for each department into the
> permission table.

Not really. You could store an array of department IDs in that same table. That would probably also cause the minimum impact on your queries.

---(end of broadcast)---
TIP 4: Have you searched our list archives?
    http://archives.postgresql.org
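A minimal sketch of that array-column approach (all names invented for illustration):

  -- One permission row per user, with granted departments as an array.
  CREATE TABLE permission (
      user_id        int PRIMARY KEY,
      department_ids int[] NOT NULL
  );
  INSERT INTO permission VALUES (1, '{10,20,30}');
  -- Is user 1 permitted for department 20?
  SELECT * FROM permission
   WHERE user_id = 1 AND 20 = ANY (department_ids);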
Re: [GENERAL] GUID for postgreSQL
Dann Corbit wrote:
> There is a "privacy hole" from using the MAC address. (Read it in the
> WIKI article someone else posted). Probably, it would be better to use
> a one way hash of the MAC address.

The chance of MAC addresses colliding (through some low-end network card vendor's sloppy manufacturing process, or through the not uncommon ability to edit your MAC address) is surely much higher than the chance of big random numbers colliding.

Sure, you won't hurt anything if you add the MAC address to your entropy sources when generating a GUID; but thinking a MAC address is more unique than the bits you'd get from any less human-error-prone source is almost certainly wrong.

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?
    http://www.postgresql.org/docs/faq
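As a sketch of the one-way-hash idea quoted above, the built-in md5() function would do; the MAC address shown is made up:

  -- Fold the MAC through a hash before mixing it into GUID material,
  -- closing the privacy hole while keeping whatever entropy it adds:
  SELECT md5('00:0c:29:3e:5f:10' || now()::text);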
Re: [GENERAL] table clustering brings joy
Greg Stark wrote:
> clustering... That will only help if you're often retrieving moderately
> large result sets by one particular index. If you normally only
> retrieve one record at a time, or from lots of different indexes, then
> it probably won't really make much difference.

It'll also help for columns whose values are correlated in some way. For example, clustering a table of addresses on "zip code" will help lookups by city or county or state, presumably because all the disk pages for a given city end up grouped together within the disk pages for that city's zip codes.

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings
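For reference, a sketch of that setup using the 8.0-era CLUSTER syntax (the table and index names are invented):

  -- Physically reorder an address table along its zip code index:
  CREATE INDEX addresses_zip_idx ON addresses (zip);
  CLUSTER addresses_zip_idx ON addresses;
  -- CLUSTER is a one-time reordering; rows added later are not kept in
  -- order, so it needs re-running periodically.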
Re: [GENERAL] Finding nearest numeric value
Richard Huxton wrote:
> While the other answers all do their job, and in one go too, I'd be
> surprised if you found anything faster than:
>
>   SELECT myval FROM mytable WHERE myval > 1234 ORDER BY myval LIMIT 1

Really? Aren't most things with ORDER BY O(n*log(n))? Or is the optimizer smart enough to find an index on myval and stop after the first one (assuming the index returns things sequentially)?

If not, it seems this could do things in O(n) time:

  select min(abs(value - CONSTANT)) from tablename;

followed by

  select * from tablename where abs(value - CONSTANT) = [result];

though I'm sure someone could roll that up into a single statement.

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?
    http://www.postgresql.org/docs/faq
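One way to roll it into a single statement, while still letting each half use an index on the value column (a sketch reusing Richard's mytable/myval names and the constant 1234):

  -- Take the nearest value from above and from below the target,
  -- then keep whichever of the two candidates is closer:
  SELECT myval FROM (
      (SELECT myval FROM mytable WHERE myval >= 1234
        ORDER BY myval ASC LIMIT 1)
      UNION ALL
      (SELECT myval FROM mytable WHERE myval < 1234
        ORDER BY myval DESC LIMIT 1)
  ) AS candidates
  ORDER BY abs(myval - 1234)
  LIMIT 1;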
Re: [GENERAL] ctid access is slow
Tom Lane wrote:
> It wouldn't be hard that I can see (just build hash and btree opclasses
> for tid), but I'm pretty unclear on why bother. There's no use-case for
> cross-table joins involving ctid, since you couldn't usefully store a
> ctid referencing another table. The example Ilja showed was quite
> artificial and should not convince anyone to expend effort on this.
> Perhaps there are more convincing examples, but let's see one.

Would it be useful for extremely static (read-only) data? The largest tables in my database are read-only for many months at a time (geospatial data which the vendor updates annually). I've occasionally wondered if storing ctids in the tables that link to these tables, rather than the traditional id column, would help.

(I never really bothered, though, since normal index scans were fast enough; and any future performance optimization will probably cache this data in memcached instead.)

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings