[GENERAL] compat-postgresql-libs rpm bug in 64bit mode
I'm using compat-postgresql-libs-4-1PGDG.rhel5.x86_64.rpm on CentOS 5 in 64bit mode. The package did not create a libpq.so.4 link in /usr/lib64/, which caused Apache to fail. Regards, Kevin Murphy
Re: [GENERAL] compat-postgresql-libs rpm bug in 64bit mode
Devrim GÜNDÜZ wrote: Hi, On Wed, 2008-08-06 at 11:19 -0400, Kevin Murphy wrote: I'm using compat-postgresql-libs-4-1PGDG.rhel5.x86_64.rpm on CentOS 5 in 64bit mode. The package did not create a libpq.so.4 link in /usr/lib64/, which caused Apache to fail. http://yum.pgsqlrpms.org/8.3/redhat/rhel-5-x86_64/repoview/compat-postgresql-libs.html We have both 32- and 64-bit packages. Please make sure that you have the 64-bit one installed. Yes, it's installed:

compat-postgresql-libs.x86_64  4-1PGDG.rhel5  installed
compat-postgresql-libs.i686    4-1PGDG.rhel5  installed

The package creates /usr/lib64/libpq.so.4.1, but it does not create the /usr/lib64/libpq.so.4 -> /usr/lib64/libpq.so.4.1 symlink. -Kevin
[GENERAL] How to use postgresql-jdbc rpm with Sun JDK
What's the easiest RPM-based way to install postgresql-jdbc on CentOS 5? I just became aware of the JPackage project, which seems appealing, but it doesn't have an up-to-date version of postgresql-jdbc. I'm using Devrim GÜNDÜZ's very nice PostgreSQL repository (pgsqlrpms.org) with yum. When yum installs the PGDG postgresql-jdbc-8.3.3 RPM on CentOS 5, it appears to want to drag in GCJ dependencies, but I want to use a Sun JDK. The JDK is pre-installed by the Rocks V cluster distribution, which is based on CentOS 5. Thanks, Kevin Murphy
Re: [GENERAL] How to use postgresql-jdbc rpm with Sun JDK
Devrim GÜNDÜZ wrote: On Thu, 2008-08-07 at 09:57 -0400, Kevin Murphy wrote: When yum installs the PGDG postgresql-jdbc-8.3.3 RPM on CentOS 5, it appears to want to drag in GCJ dependencies, but I want to use a Sun JDK. The JDK is pre-installed by the Rocks V cluster distribution, which is based on CentOS 5. Upstream (I mean, Tom) has been building the -jdbc package with open source components for a long time, and I am following that, too. We used to install the binary jar files directly under /usr/share/java without compilation. So you will need to install the jar files, I think: http://jdbc.postgresql.org/download.html#supported Speaking as a near-ignoramus, would a simple RPM that wraps the binary jar file make sense? I'm not sure what dependencies it should have, though: simply 'java'? My issue is that a Rocks cluster likes to have all software packaged as RPMs; compute nodes can be automatically rebuilt from scratch when required simply by dumping RPMs in a certain directory on the head node and adding the RPM name to an XML file. I guess it can't be that hard for me to custom-build this sort of RPM, although I've never built one before. -Kevin Murphy
Re: [GENERAL] How to use postgresql-jdbc rpm with Sun JDK
Tom Lane wrote: Kevin Murphy <[EMAIL PROTECTED]> writes: Speaking as a near-ignoramus, would a simple RPM that wraps the binary jar file make sense? Sure, if you want to do it that way. We did in fact do it that way up till about 8.0. We (or at least I) moved away from it because of Red Hat's policy that source RPMs should contain only, well, source. But if you roll your own you certainly need not be bound by that ideology. After looking in more detail at what the gcj packages install, I've decided it's no big deal to just use Devrim's postgresql-jdbc-8.3.603-1PGDG.rhel5.x86_64.rpm package as is. While the java-1.4.2-gcj-compat and libgcj RPMs are needlessly installed, they don't interfere with Rocks' Java metapackage (roll), as I had feared. Thanks for the responses, Kevin Murphy
Re: [GENERAL] Drupal and PostgreSQL - performance issues?
Greg Smith wrote: On Tue, 14 Oct 2008, Mikkel Høgh wrote: You are targeting DBAs using servers with less than 512 MB RAM. Is PostgreSQL supposed to be used by professional DBAs on enterprise systems or is it supposed to run out of the box on my old Pentium 3? you'll discover that the Linux default for how much memory an application like PostgreSQL can allocate is 32MB. This is true even if you install the OS on a system with 128GB of RAM. One thing that might help people swallow the off-putting default "toy mode" performance of PostgreSQL would be an explanation of why PostgreSQL uses its shared memory architecture in the first place. How much of a performance or stability advantage does it confer under what database usage and hardware scenarios? How can any such claims be proven except by writing a bare-bones database server from scratch that can use multiple memory models? -Kevin Murphy
[GENERAL] perl-DBD-Pg package for CentOS 5?
Hi Devrim, Thanks for the awesome resource that is yum.pgsqlrpms.org. My life would be complete if it offered perl-DBD-Pg for CentOS 5! I'll look around for a src rpm. -Kevin
Re: [GENERAL] perl-DBD-Pg package for CentOS 5?
Joao Ferreira wrote: Have you considered installing directly from CPAN? # perl -MCPAN -e 'install DBD::Pg;' On Fri, 2008-10-31 at 09:20 -0400, Kevin Murphy wrote: My life would be complete if it offered perl-DBD-Pg for CentOS 5! Yes, but I prefer a package in this situation because the code runs on a Rocks Linux cluster, and customizations to the compute nodes are conventionally made via packages; when a new node is added, the suite of local RPMs is automatically post-installed, et voilà. I've worked around the lack of a package in the past, and I'll do it again. Luckily our nodes have access to the world. Now I'm fantasizing about making an RPM that just customizes CPAN/Config.pm and installs DBD::Pg from CPAN. -Kevin
[GENERAL] Question about inscrutable cast syntax
Two versions of an application differ in the way a particular column is defined.

OLD: lastupdated timestamp without time zone DEFAULT ('now'::text)::timestamp(6) with time zone NOT NULL
NEW: lastupdated timestamp without time zone DEFAULT now() NOT NULL

I believe these are both the result of an automatic mysql-to-postgresql conversion process rather than hand-coded. For my personal edification, I'd like to understand what's going on with that first one! Note that the column type is timestamp without time zone, so I'm also curious about the 'with time zone' modifier in the first example. Thanks, Kevin Murphy
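For the archive, a likely explanation: 'now'::text is an ordinary text constant, so the text-to-timestamp cast has to be re-evaluated at each INSERT, which makes the default behave like now(). (A bare DEFAULT 'now' directly cast to a timestamp type would instead be resolved once, at table-creation time; the double cast is the classic idiom for deferring it.) The trailing "with time zone" is harmless because the resulting timestamptz value is implicitly cast to the column's type on assignment. A sketch with a hypothetical table t:

-- Both columns should end up with the insertion time, not the
-- table-creation time; the double cast defers evaluation of 'now'.
CREATE TABLE t (
    old_style timestamp without time zone
        DEFAULT ('now'::text)::timestamp(6) with time zone NOT NULL,
    new_style timestamp without time zone DEFAULT now() NOT NULL
);
INSERT INTO t DEFAULT VALUES;  -- old_style and new_style should match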
Re: [GENERAL] Indices types, what to use. Btree, Hash, Gin or Gist
Gregory Stark wrote: Mohamed writes: I want to match against a boolean field, that is, only true or false is possible. I am thinking Btree but not sure.. correct? No index is going to be particularly effective for boolean columns unless they're very heavily skewed. You might find it useful to build separate partial indexes on other keys for each value though. Gregory, Just for the edification of the masses, can you show an example that illustrates your point about partial indexes? -Kevin
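For the archive, a minimal sketch of what I take Gregory to mean (table and column names are made up): rather than indexing the boolean column itself, build partial indexes on some other useful key, one per boolean value, so each index covers only the matching fraction of the table.

-- Hypothetical orders(is_active boolean, created_at timestamp)
CREATE INDEX orders_active_idx   ON orders (created_at) WHERE is_active;
CREATE INDEX orders_inactive_idx ON orders (created_at) WHERE NOT is_active;

-- A query constraining both columns can use the smaller partial index:
SELECT * FROM orders
 WHERE is_active AND created_at >= now() - interval '7 days';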
[GENERAL] Case-sensitive connect in psql is perplexing
I've noticed that the argument to the \c (connect) meta-command is case-sensitive. This doesn't seem to be consistent with other meta-commands or the SQL standard of case-insensitive identifiers. Would it hurt to change the behavior? Regards, Kevin Murphy
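For the archive, an illustration of the behavior (assuming a database created with a quoted mixed-case name): \c passes its argument through literally rather than folding it to lowercase the way unquoted SQL identifiers are folded.

CREATE DATABASE "MyDB";  -- quoted, so the mixed case is preserved
-- \c mydb    fails: FATAL: database "mydb" does not exist
-- \c MyDB    works: the meta-command argument is taken as-is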
[GENERAL] Aliasing syntax question
I've now seen the 'unnest' function defined in a few different ways with regard to aliases. Substitute the following pairs of values for X and Y in the query below, and all of them work (in PG 8.3.6, at least):

X     Y
gg    i
g(i)  gg(i)
g.i   g(i)

create or replace function unnest(anyarray) returns setof anyelement as $$
select $1[X] from generate_series(array_lower($1,1),array_upper($1,1)) Y;
$$ language sql;

Please enlighten the unworthy! Thanks, Kevin Murphy
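For the archive, a hedged illustration of what the alias forms mean (the names g, gg, and i are arbitrary): a function in FROM can be given a table alias, optionally with a column alias list, and for a function returning a scalar set a bare table alias also becomes the name of its single output column.

SELECT i   FROM generate_series(1,3) AS g(i);  -- table alias g, column alias i
SELECT g.i FROM generate_series(1,3) AS g(i);  -- same, qualified
SELECT i   FROM generate_series(1,3) AS i;     -- bare alias i also names the column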
[GENERAL] Mac OS X 10.3 and SYSV shared memory settings
Because I was just burned by this, I'd like to remind OS X users that OS 10.3 (Panther) may be a less desirable platform than 10.2 for running PostgreSQL and other applications that benefit from customizing SYSV shared memory settings. The problem is that in 10.3, there unbelievably seems to be no reliable way to customize the SYSV shared memory settings such that the settings are preserved across OS updates (10.3.6 to 10.3.7, e.g.)! The following reminder from the PG 8.0 manual (thanks to whoever put this in) also applies to PG 7.4.X, and it is incredibly important: "In OS X 10.3, these commands [shared memory sysctl commands] have been moved to /etc/rc and must be edited there. You'll need to reboot to make changes take effect. Note that /etc/rc is usually overwritten by OS X updates (such as 10.3.6 to 10.3.7), so you should expect to have to redo your editing after each update." If someone has a solution to this problem, please let me know. In the meantime we'll have to implement a cron-driven alert and a policy change, hope that transitioning to a new sysadmin at a later date does not screw us, and pray that 10.4 provides a solution to this problem. A simple change to /etc/rc on Apple's part seems to be all that would be required: it already executes sysctl commands from a user-defined /etc/sysctl.conf file, but the shared memory settings can only be set *once*, and /etc/rc happens to set the defaults before reading /etc/sysctl.conf. I can have a cron job remove the shared memory sysctls in /etc/rc if they reappear, but that would be a decidedly imperfect kludge. Kevin Murphy
[GENERAL] mailing list archives not responding?
Hi all, My searches at: http://archives.postgresql.org/pgsql-general/ haven't been working for the last couple of hours; the query times out. Is this my problem or a real one? -Kevin
Re: [GENERAL] LWM 2004 Readers' Choice Nomination
On Mar 28, 2005, at 1:40 PM, Joshua D. Drake wrote: I don't see PostgreSQL either. That seems very odd, as we are the second most widely used OSS database on Linux. Yeah, they might as well just put up a graph of advertising dollars by database instead of these poll results ;-) Kevin Murphy
Re: [GENERAL] Strange interaction of union and expressions
On Apr 20, 2005, at 1:24 PM, Aaron Bingham wrote:

create table a (foo varchar);
insert into a (foo) values ('baz');
create table b (foo varchar);
insert into b (foo) values ('woof');
select '"' || foo || '"' as foo from (select foo from a) as bar union select foo from b;

No, it's doing what you asked. You mean:

select '"' || foo || '"' from (select foo from a union select foo from b) as subq;

Right? -Kevin
[GENERAL] subqueries and qualification of table names
I have a query which didn't work properly until I fully qualified the columns used in a subquery with the appropriate table names. The reason is that both tables have a column named 'chromosome' used in the subquery. In the following query, PG treats the phrase "and chromosome = chromosome" as "and genetic.chromosome = genetic.chromosome". I.e. it treats:

select chromosome, layer, rank, (select refsnp_id from genetic where extended_frame = True and chromosome = chromosome and gl_left = rank) from framework where name = 'D3S3610' and layer = 'GL';

as:

select chromosome, layer, rank, (select refsnp_id from genetic where genetic.extended_frame = True and genetic.chromosome = genetic.chromosome and genetic.gl_left = framework.rank) from framework where name = 'D3S3610' and layer = 'GL';

Is that the appropriate SQL behavior? Personally I don't care; I'm just curious. Thanks, Kevin Murphy
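For the archive, the usual fix is to give the outer table an alias and qualify the correlated column explicitly, so the intended comparison survives:

select f.chromosome, f.layer, f.rank,
       (select g.refsnp_id
          from genetic g
         where g.extended_frame = true
           and g.chromosome = f.chromosome  -- now unambiguously correlated
           and g.gl_left = f.rank)
  from framework f
 where f.name = 'D3S3610' and f.layer = 'GL';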
Re: [GENERAL] MacOSX, fink, missing readline/readline.h
Teunis Peters wrote: This has to do with building postgresql-8.0.3. I'm installing on a MacOSX system (10.3) and configure's failing to find the readline installation. Installing the vanilla readline library from source now works great on 10.3. If you first install this using the standard ./configure && make && sudo make install : ftp://ftp.gnu.org/gnu/readline/readline-5.0.tar.gz then your postgresql compile should work fine. -Kevin Murphy
[GENERAL] can't drop sequence even though table is gone
I'm using PG 8.0.3 on Mac OS X 10.4.1. I've dropped some tables that had associated sequences, but I am unable to drop the sequences. I haven't noticed any errors or crashes. It will be easy to rebuild the whole database; I'm just reporting this because it seemed unusual.

=> \ds
                List of relations
 Schema |         Name          |   Type   | Owner
--------+-----------------------+----------+-------
 public | seq_SML_sources_id    | sequence | fable
 public | seq_SML_symbols_id    | sequence | fable
 public | seq_TaggedEntities_id | sequence | fable

=> drop sequence seq_SML_sources_id;
ERROR: sequence "seq_sml_sources_id" does not exist

=> select * from pg_class where relname = 'seq_SML_sources_id';
 relname        | seq_SML_sources_id
 relnamespace   | 2200
 reltype        | 12237018
 relowner       | 100
 relam          | 0
 relfilenode    | 12237017
 reltablespace  | 0
 relpages       | 1
 reltuples      | 1
 reltoastrelid  | 0
 reltoastidxid  | 0
 relhasindex    | f
 relisshared    | f
 relkind        | S
 relnatts       | 9
 relchecks      | 0
 reltriggers    | 0
 relukeys       | 0
 relfkeys       | 0
 relrefs        | 0
 relhasoids     | f
 relhaspkey     | f
 relhasrules    | f
 relhassubclass | f
 relacl         |

Thanks, Kevin Murphy
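For the archive, the likely cause: the sequences were created with double-quoted mixed-case names, and an unquoted name in DROP SEQUENCE is folded to lowercase before lookup (hence the error about "seq_sml_sources_id"). Quoting the name should let the drop succeed:

DROP SEQUENCE "seq_SML_sources_id";  -- quoted, so the case is preserved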
Re: [GENERAL] What's Popular for CMS and RAD with PHP/PostgreSQL?
John DeSoi wrote: Using PHP and PostgreSQL only, what do you feel are the most popular CMS and RAD tools out there? Drupal is a very nice CMS for PHP and PostgreSQL. I'm using it for some consulting projects now and it has saved me a great deal of time. I too like Drupal and use it with postgresql, but some of the Drupal contrib module authors are still very mysql-centric and fond of writing code that breaks when using postgresql (or doesn't support it at all). Luckily, it's usually not too hard to patch it yourself or find a patch. A lot of highly desirable functionality does not exist in the Drupal core but instead is provided by these contributed modules. Drupal still recommends mysql and doesn't treat postgresql and mysql equally. It would be nice if Drupal kept track of which modules had been tested against postgresql, but it doesn't. If there were a pool of drupal/postgresql users willing to be testers, and module authors were aware of this, that would help also. I've been too busy to dive in and suggest/coordinate this, though. -Kevin Murphy
[GENERAL] speeding up a query on a large table
I'm trying to speed up a query on a text column of a 14M-row table. Uncached query times vary between 1-20 seconds (maybe more), depending on the search term. In between time trials I've been trying to flush the disk buffer cache by selecting count(*) from a separate 4GB table, and times are pretty consistent for a given search term. If a search term hasn't been seen in a while, or my full table scan has purged memory, a query may take 20 seconds, whereas if the relevant pages are cached, it may take 8.5 milliseconds. I'd really like to avoid that 20 second turn-off for users. I'm guessing I need to put lots of RAM in this machine (currently only 2GB; max 8GB) and somehow pre-load the entire index? The production version of this database will be read-only, btw. Because this index will be key to the whole application, and because the number of possible search terms is so large, it would be nice if the entire index could somehow be preloaded into memory and encouraged to stay there. Alternatively, we could prime the disk buffers by doing searches on what we guess will be the most common terms. I wasn't paying attention to the recent thread about ram disks, but maybe this is a situation that might call for one? The rest of this message contains details about the situation. Thanks for the advice, as usual! -Kevin Murphy

Sample query:

explain analyze select * from tagged_genes where mention = 'bcl2';

 Index Scan using tagged_genes_mention_idx on tagged_genes (cost=0.00..327.64 rows=81 width=67) (actual time=28.694..5544.779 rows=848 loops=1)
   Index Cond: (mention = 'bcl2'::text)
 Total runtime: 5545.434 ms

The index of the query column (mention) is 226,229 pages (= 1.7 GB?). There are 1.3M unique values in that column. I've run 'vacuum analyze' on the table. I also tried setting the statistics target to 1000, and it did speed up searches for some search terms. Just out of curiosity, and because I am interested in possibly using tsearch2 in the future, I created a tsvector column indexing the mention column (and added a GIST index and vacuum-analyzed the table again). tsearch2 is a lot slower, presumably because it's doing a lot more (although in this case, it doesn't return all that much more). Here is a typical result of a tsearch2 search on my data:

explain analyze select * from tagged_genes where vector @@ to_tsquery('bcl2');

 Index Scan using vector_idx on tagged_genes (cost=0.00..56092.98 rows=13983 width=67) (actual time=202.078..43122.688 rows=980 loops=1)
   Index Cond: (vector @@ '\'bcl2\''::tsquery)
 Total runtime: 43124.215 ms

Blech. I'd love to use tsearch2, but it's hard to see it being useful with my puny hardware and not so puny data. I'm using PG 8.0.3 on Mac OS X 10.4.2 on a dual 2.5GHz G5 currently with 2GB RAM. The data is physically stored on an Xserve RAID array (seven 7200rpm ultra-ATA drives, RAID 5) connected via fibre channel directly to the G5. I did some tests of this array's performance for a single user. Using a large block size (256K), this array can do sequential reads at 134 MB/sec, but the performance drops to 12MB/sec for 4K sequential reads. Random 4K reads are 5MB/sec, and random 256K reads are 121MB/sec (all these according to "XBench" - not sure if it's a good benchmark). I'm using these shared memory settings:

kern.sysv.shmmax=134217728
kern.sysv.shmall=32768
kern.sysv.shmmin=1
kern.sysv.shmmni=32
kern.sysv.shmseg=8

and from postgresql.conf:

shared_buffers = 15000
work_mem = 32768

Everything else in postgresql.conf is default.
Here is the table description. The tagged_genes table has 13,982,464 rows and 422,028 pages (although about half of that is the experimental tsvector column!).

create table tagged_genes (
    id          bigint NOT NULL PRIMARY KEY,  -- artificial primary key
    mention     text,      -- a gene name or description
    pmid        bigint,    -- identifies the document that the mention occurs in
    create_date timestamp NOT NULL,
    vector      tsvector   -- experimental tsearch2 index of mention column
);
create index tg_mention_idx on tagged_genes(mention);
create index tg_pmid_idx    on tagged_genes(pmid);
create index tg_vector_idx  on tagged_genes(vector);

Some trials with different search terms: mycn: Index Scan usi
[GENERAL] CLUSTER equivalent
Are the two following options equivalent?

OPTION A (ordered insert):
CREATE TABLE table1 (cluster_col TEXT, col2 INTEGER);
CREATE INDEX idx1 ON table1(cluster_col);
INSERT INTO table1 (cluster_col, col2) SELECT cluster_col, col2 FROM table1 ORDER BY cluster_col;

OPTION B (unordered insert followed by CLUSTER):
CREATE TABLE table1 (cluster_col TEXT, col2 INTEGER);
CREATE INDEX idx1 ON table1(cluster_col);
INSERT INTO table1 (cluster_col, col2) SELECT cluster_col, col2 FROM table1;
CLUSTER idx1 ON table1;

Thanks, Kevin Murphy

P.S. On another topic, did I gather correctly from a recent thread that it would be more efficient to define the above table (if it were really only two columns) as: create table clustered_tagged_genes (pmid integer, mention text); i.e., with the integer field before the text field?
[GENERAL] Note on scalar subquery syntax
I thought this might be helpful in the future to other duffers such as myself. The following is my big contribution to the documentation of the use of scalar subqueries ;-): You have to wrap a scalar subquery in its own parentheses even where you might think it to be unnecessary, such as when the subquery is the sole argument to a function. As an example, I wrote a function to explode, or unpack, the elements of an array onto separate rows (its return type is SETOF whatever), but it took me a while to figure out how to use it effectively in queries. You have to use it like this:

RIGHT--> select * from array_explode((select array_col from table1 where col2 = 'something'));

Note the "extra" set of parentheses. These are crucial: the syntax is invalid without them, as in:

WRONG--> select * from array_explode(select array_col from table1 where col2 = 'something');

And no, as mentioned in many archived list messages, you can NOT do the following, which is what a lot of people (including me) seem to try first:

WRONG--> select array_explode(array_col) from table1 where col2 = 'something';

(The previous command results in the error message: "set-valued function called in context that cannot accept a set".) -Kevin Murphy
Re: [GENERAL] Note on scalar subquery syntax
Martijn van Oosterhout wrote: On Wed, Aug 03, 2005 at 09:40:26AM -0400, Kevin Murphy wrote: You have to wrap a scalar subquery in its own parentheses even where you might think it to be unnecessary, such as when the subquery is the sole argument to a function. At first guess, I imagine it is because the syntax becomes ambiguous, especially if you have multiple arguments to the function. Thanks, Martijn. Yes, I don't consider it to be entirely unreasonable; it was just surprising and strange-looking to me. -Kevin
Re: [GENERAL] Note on scalar subquery syntax
Peter Fein wrote: Kevin Murphy wrote: As an example, I wrote a function to explode, or unpack, the elements of an array onto separate rows (its return type is SETOF whatever), but it took me a while to figure out how to use it effectively in queries. Mind posting it? I know I've had need of such a thing, and IIRC others have asked as well... I'm no expert, but per Peter's request, here is a generic array-unpacking function that works in PostgreSQL 8.0. It can't be invoked if the argument doesn't have an explicit type, i.e. you would have to use it as: select * from array_explode('{apple,banana,cherry}'::text[]); or select * from array_explode('{1,2,3}'::integer[]);

CREATE OR REPLACE FUNCTION array_explode(an_array anyarray)
RETURNS SETOF anyelement AS $$
DECLARE
    idx integer;
BEGIN
    FOR idx IN 1 .. ARRAY_UPPER(an_array, 1) LOOP
        RETURN NEXT an_array[idx];
    END LOOP;
    RETURN;
END;
$$ LANGUAGE plpgsql;

I would imagine that a type-specific version would be faster. For that, replace "anyarray" with, e.g., "integer[]", and "anyelement" with, e.g., "integer". -Kevin Murphy
[GENERAL] How to explode an array into multiple rows
I'd like to do something like this: select array_explode(array_col) from table1 where col2 = 'something'; where array_explode returns the elements of array_col unpacked onto separate rows. I tried creating a function returning a setof type, but postgresql produces a "set-valued function called in context that cannot accept a set" error. I've seen this error in the list archives, but I'm not sure how to translate the simple cases discussed into this situation. I'm sure it's something simple, but it's been eluding me. Thanks, Kevin Murphy
[GENERAL] table clustering brings joy
1  intersect-query/clustered    brca1            2822  105.74     0.04     1
2  group-by-query/clustered     brca1            2822  161.95     0.06     1.53
3  array-query/array            brca1            2822  198.72     0.07     1.88
4  group-by-query/nonclustered  brca1            2822  12113.8    4.29     114.56

1  intersect-query/clustered    eps15,tax4,irs4  1     102.9      102.90   1
2  group-by-query/clustered     eps15,tax4,irs4  1     112.58     112.58   1.09
3  array-query/array            eps15,tax4,irs4  1     124.78     124.78   1.21
4  group-by-query/nonclustered  eps15,tax4,irs4  1     1024.09    1024.09  9.95

-Kevin Murphy
Re: [GENERAL] table clustering brings joy
Greg Stark wrote: All that said clustering is indeed often quite effective. Especially if it makes an index scan efficient enough to win over sequential scans you can see some huge effects. It's most useful for tables that aren't undergoing lots of updates and don't need to be reclustered often. Which is my situation; most tables are read-only. Thanks for the comments. -Kevin
Re: [GENERAL] speeding up a query on a large table
Mike Rylander wrote: On 8/17/05, Manfred Koizar <[EMAIL PROTECTED]> wrote: On Mon, 25 Jul 2005 17:50:55 -0400, Kevin Murphy <[EMAIL PROTECTED]> wrote: and because the number of possible search terms is so large, it would be nice if the entire index could somehow be preloaded into memory and encouraged to stay there. You could try to copy the relevant index file(s) to /dev/null to populate the OS cache ... That actually works fine. When I had big problems with a large GiST index I just used cat to dump it at /dev/null and the OS grabbed it. Of course, that was on linux so YMMV. Thanks, Manfred & Mike. That is a very nice solution. And just for the sake of the archive ... I can find the file name(s) of the relevant index or table by finding pg_class.relfilenode where pg_class.relname is the name of the entity, then doing, e.g.: sudo -u postgres find /usr/local/pgsql/data -name "somerelfilenode*". -Kevin Murphy
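A sketch of that lookup (index name taken from the plan output in the earlier post):

SELECT relfilenode FROM pg_class
 WHERE relname = 'tagged_genes_mention_idx';
-- then: sudo -u postgres find /usr/local/pgsql/data -name "<relfilenode>*"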
[GENERAL] CLUSTER equivalent
I just wanted to confirm that the COPY command always stores data in the table in the order in which it appears in the import file. I.e., if the import file is sorted ahead of time, am I correct in assuming that the COPY command can have the same effect as CLUSTER'ing (or as creating a new table as an ordered select on the loaded table)? (Indexes would of course be applied after the data load.) Thanks, Kevin Murphy
Re: [GENERAL] CLUSTER equivalent
Neil Conway wrote: Kevin Murphy wrote: I just wanted to confirm that the COPY command always stores data in the table in the order in which it appears in the import file. This is not the case -- depending on the content of the FSM, the newly added rows might be distributed throughout the table. How about for a freshly created, empty table -- I should have qualified my original statement. Thanks, Kevin
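For the archive, a sketch of the fresh-table case (file path hypothetical): with a newly created table, and therefore nothing in the free space map, a single COPY appends rows in input order, so loading a pre-sorted file approximates a CLUSTERed layout.

CREATE TABLE table1 (cluster_col text, col2 integer);
COPY table1 FROM '/tmp/sorted_by_cluster_col.tsv';  -- rows land in file order
CREATE INDEX idx1 ON table1 (cluster_col);          -- index built afterwards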
[GENERAL] optimizing common subqueries
My understanding is that PG does not eliminate common subqueries within complex queries. I just wanted to confirm this. I also tried wrapping subqueries in STABLE or IMMUTABLE functions returning SETOF, but from the PG 8.0.3 log, I see that the function is invoked redundantly anyway. I know that disk buffering has the effect of optimizing this sort of thing to some extent, but I was wondering if I can do better. Any other possibilities? The code below shows what I am trying to do. I created functions to make it easy for you to see where the common subqueries occur (note that one of the functions invokes the other function, which increases the redundancy).

CREATE FUNCTION pmids_by_mention(text) RETURNS SETOF integer AS $$
    SELECT pmid FROM normalized_genes WHERE symbol IN (
        SELECT DISTINCT symbol FROM normalized_genes WHERE mention = $1
    );
$$ LANGUAGE SQL STRICT IMMUTABLE;

CREATE FUNCTION common_pmids(text,text) RETURNS SETOF integer AS $$
    SELECT * FROM pmids_by_mention($1)
    INTERSECT
    SELECT * FROM pmids_by_mention($2)
$$ LANGUAGE SQL STRICT IMMUTABLE;

SELECT pmid, COUNT(*) AS total FROM (
    SELECT pmid FROM pmids_by_mention('mycn') as pmid
     WHERE pmid IN ( SELECT * FROM common_pmids('mycn','trka') )
    UNION ALL
    SELECT pmid FROM pmids_by_mention('trka') as pmid
     WHERE pmid IN ( SELECT * FROM common_pmids('mycn','trka') )
) AS subq
GROUP BY pmid ORDER BY total desc;

I doubt anybody cares, but ... I am doing an article lookup by genes which are mentioned in articles (pmid), where each gene may be referred to indirectly via an alias (mention). Each gene symbol has many aliases/mentions. (Unfortunately, it is also possible but rare for an alias/mention to map to more than one gene symbol). The query logic is as follows. For each mention/alias supplied by the user, find all articles connected to the gene (or, rarely, genes), even indirectly. Take the intersection of these sets to find the articles/pmid's containing all the specified genes. Unfortunately, in order to rank the articles by relevance, it is further necessary to come up with a result set containing one row for each qualifying gene mention in the set of matching articles. There can be any number of search terms (up to some limit), so the actual query has to be built dynamically on the fly by the application. Thanks, Kevin Murphy
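For the archive: nothing in 8.0 factors these out automatically, but PostgreSQL 8.4 later added common table expressions, and a WITH subquery is evaluated only once per query (unconditionally so in releases before 12). A hedged sketch of the same query in that style:

WITH common AS (
    SELECT * FROM common_pmids('mycn','trka')
)
SELECT pmid, count(*) AS total FROM (
    SELECT pmid FROM pmids_by_mention('mycn') AS pmid
     WHERE pmid IN (SELECT * FROM common)
    UNION ALL
    SELECT pmid FROM pmids_by_mention('trka') AS pmid
     WHERE pmid IN (SELECT * FROM common)
) AS subq
GROUP BY pmid ORDER BY total DESC;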
Re: [GENERAL] STL problem in stored procedures
Tom Lane wrote: I think he needs to rewrite in C :-(. The backend is not C++ and I fear it's unlikely that libc++ will play nicely as a dynamic add-on. Grrr. I also have a C++ library that I'd like to use from PostgreSQL on Mac OS X. I have succeeded in calling a tiny toy C++ shared library from PG, but I don't know enough to understand what problems might be associated with using more complicated C++ code. It would be great if some C++/C guru could make a thorough analysis of C++ integration issues. Thanks, Kevin Murphy
[GENERAL] aix build question re: duplicate symbol warning
Hi all, I'm trying to build PG 8.1 beta on an AIX server. The 'make' finishes without errors, but I'm getting lots of duplicate symbol warnings like the following one. What am I to make of these?

gcc -O2 -Wall -Wmissing-prototypes -Wpointer-arith -Winline -fno-strict-aliasing -Wl,-bnoentry -Wl,-H512 -Wl,-bM:SRE -o libpq.so libpq.a -L../../../src/port -L../../../src/port -lnsl -Wl,-bI:../../../src/backend/postgres.imp -Wl,-bE:libpq.exp
ld: 0711-224 WARNING: Duplicate symbol: .pqStrerror
ld: 0711-224 WARNING: Duplicate symbol: .pqGetpwuid
...

I used a vanilla configure except for --without-readline. Tools that seem to be in use: AIX 5.1, GNU Make 3.80, gcc 3.2.2, ld - AIX version. This is an 8-CPU machine that recently became disused in our research department and looks like a nice potential PG host. Thanks, Kevin Murphy
Re: [GENERAL] SQL injection
Can some knowledgeable person set the record straight on SQL injection, please? I thought that the simple answer was to use prepared statements with bind variables (except when you are letting the user specify whole chunks of SQL, ugh), but there are many people posting who either don't know about prepared statements or know something I don't. Thanks, Kevin Murphy P.S. I don't use PHP, but google informs me that PHP definitely has prepared statement options: PEAR::DB, PDO in 5.X+, etc.
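For the archive, the SQL-level version of the idea (table name hypothetical): the parameter is transmitted as data, so its content is matched literally and never parsed as SQL.

PREPARE find_user(text) AS
    SELECT * FROM users WHERE name = $1;
EXECUTE find_user('x''; DROP TABLE users; --');  -- harmless: just an odd name
DEALLOCATE find_user;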
[GENERAL] database owner does not own public schema
Why, when I create a new database owned by a specified user (createdb --owner=somebody), would the public schema in that database not be owned by the user and moreover not be writable by the user? I'm using a fresh install of 8.1rc1, but the same thing seems to happen with an unfresh 8.0. Thanks, Kevin Murphy
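For the archive, a workaround sketch: the public schema is copied from the template database and keeps its original owner (postgres), so a superuser has to hand it over or grant rights after createdb.

ALTER SCHEMA public OWNER TO somebody;
-- or, keeping postgres as the owner:
GRANT ALL ON SCHEMA public TO somebody;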
[GENERAL] Getting prepared statement parameters in log when using JDBC with PG 8
I'm hoping to see the parameters of prepared statements in the postgresql log when using a JDBC client against postgresql 8.1. I saw a post about using the V2 protocol. Is there any downside to this? Thanks, Kevin P.S. Looking at the driver documentation, I see that I can append the protocol specification to the jdbc connect string, e.g.: jdbc:postgresql://host:port?protocolVersion=2 I'll give that a try.
[GENERAL] function accepting and returning rows; how to avoid parentheses
I'd like to have a function that acts as a row filter (that can optionally expand each row into multiple rows), but I don't know how to wangle this such that the output is not enclosed in parentheses, i.e. what I'm getting now is a single column of a composite type instead of multiple columns matching the original table layout. Example:

CREATE TABLE sometable (key text, value real);
INSERT INTO sometable VALUES ('A', 1);
INSERT INTO sometable VALUES ('B', 2);

-- unrealistic demo filter function
CREATE OR REPLACE FUNCTION foo(arow sometable) RETURNS SETOF sometable AS $$
BEGIN
    RETURN NEXT arow;
    RETURN NEXT arow;
    RETURN;
END;
$$ LANGUAGE plpgsql;

-- SQL front-end for filter function
CREATE OR REPLACE FUNCTION explode(sometable) RETURNS SETOF sometable AS $$
    SELECT * FROM foo($1) AS t;
$$ LANGUAGE SQL STRICT IMMUTABLE;

select explode(sometable.*) from sometable;
 explode
---------
 (A,1)
 (A,1)
 (B,2)
 (B,2)

Thanks, Kevin
Re: [GENERAL] function accepting and returning rows; how to avoid
Merlin Moncure wrote: On 12/13/06, Kevin Murphy <[EMAIL PROTECTED]> wrote: I'd like to have a function that acts as a row filter (that can optionally expand each row into multiple rows), but I don't know how to wangle this such that the output is not enclosed in parentheses, i.e. what I'm getting now is a single column of a composite type instead of multiple columns matching the original table layout. Functions defined in the sql language (as opposed to pl/pgsql) allow you to call them without explicitly using FROM... if you want to expand, just select from your result as you would expand any row variable. Basically, have you tried: select (explode(sometable.*)).* from sometable; Thanks a lot. You must have been puzzled that I got so close and failed to figure it out. I hadn't really understood the {row}.* notation. In the meantime I did an end run with an insert trigger, but at least I understand this now. -Kevin
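With the tiny table from the original post, the dot-star form yields real columns rather than one composite column:

select (explode(sometable.*)).* from sometable;
 key | value
-----+-------
 A   |     1
 A   |     1
 B   |     2
 B   |     2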
[GENERAL] Optimization of unnecessary GROUP BY columns
When columns are required in a GROUP BY clause even though some of them are fully dependent on others, is there any plan for making it possible to do the GROUP BY only on the necessary columns? The 8.X documentation made me curious. Section 7.2.3 ("The GROUP BY and HAVING Clauses") gives this example:

SELECT product_id, p.name, (sum(s.units) * p.price) AS sales
  FROM products p LEFT JOIN sales s USING (product_id)
 GROUP BY product_id, p.name, p.price;

"... Depending on how exactly the products table is set up, name and price may be fully dependent on the product ID, so the additional groupings could theoretically be unnecessary, but this is not implemented yet." -Kevin Murphy
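For the archive: this was eventually implemented, in PostgreSQL 9.1, for the primary-key case. If product_id is the primary key of products, the functionally dependent columns can be omitted:

SELECT product_id, p.name, (sum(s.units) * p.price) AS sales
  FROM products p LEFT JOIN sales s USING (product_id)
 GROUP BY product_id;  -- p.name and p.price are dependent on the key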
[GENERAL] selecting random rows
Here is Josh Berkus' solution for randomly picking a single row from a query. I think the FAQ (www.postgresql.org/docs/faqs.FAQ.html#item4.1) could be updated with a link to this solution, which is more practical for large queries. www.powerpostgresql.com/Random_Aggregate Here is a discussion by Greg Sabino Mullane about getting random subsets of table rows. Greg's approach involves modifying the table you want to query from and is not very general-purpose (although it is a nice read). people.planetpostgresql.org/greg/index.php?/archives/40-Getting-random-rows-from-a-database-table.html I've seen Josh's approach extended to the multiple-row-subset case. I think the perl pseudo-code would look like this:

# $n is the desired number of rows
my $n = shift || 25;
my @lines;
while (<>) {
    # n/$. is the probability that the current line should be in the
    # output if it were the last line of the input ($. is the line number)
    if (rand($.) < $n) {
        # remove (at random) one of the current selections...
        splice(@lines, rand(@lines), 1) if @lines == $n;
        # ...and add the latest selection at the end
        push @lines, $_;
    }
}
print @lines;

Would it be possible to implement this as a function in PG? (See the sketch below.) Aside: I'm fantasizing about a postgresql archive of user-submitted functions. Is the pgfoundry the closest thing to this? -Kevin Murphy
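For the archive, a hedged plpgsql sketch of the same reservoir algorithm (table and column names are placeholders); this version samples n integer ids from a query in a single pass:

CREATE OR REPLACE FUNCTION sample_ids(n integer) RETURNS SETOF integer AS $$
DECLARE
    v    integer;
    seen integer := 0;
    pool integer[] := '{}';
    i    integer;
BEGIN
    FOR v IN SELECT id FROM some_table LOOP
        seen := seen + 1;
        IF random() * seen < n THEN               -- keep with probability n/seen
            IF coalesce(array_upper(pool, 1), 0) >= n THEN
                pool[1 + floor(random() * n)::int] := v;  -- evict a random pick
            ELSE
                pool := array_append(pool, v);
            END IF;
        END IF;
    END LOOP;
    FOR i IN 1 .. coalesce(array_upper(pool, 1), 0) LOOP
        RETURN NEXT pool[i];
    END LOOP;
    RETURN;
END;
$$ LANGUAGE plpgsql;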
Re: [GENERAL] postgresql vs mysql (OT: perl)
Randal L. Schwartz wrote: Russ> Take perl for example. I have still yet to see readable Perl code. You can't read it if you're not familiar with it. Seconded. Perl is like the churkendoose -- hybrid strength, ugly as hell, only poultry known that can scare off a fox every time, whole barnyard loves having it around. The better I know it, the better I like it. A language's appeal always depends on your prior language and operating system experience, not to mention what you're trying to use it for, how you learn it, the support infrastructure around you while learning it, how much you're getting paid (in fame, fortune, and/or fun) to program in it, how long you've used it, how often you use it, what other languages you use alongside of it, who your friends are, how much of a sucker you are for what you read in the tech press, how patient you are, how creative you are, your tolerance for abstraction, the number and quality of programmers you want to be able to potentially contribute to or maintain your code, etc, etc. It's a big barnyard, and it all stinks. So let's just roll around in the mud and have some fun. -Kevin Murphy
Re: [GENERAL] Q re installing Pg on OS X?
Seneca Cunningham wrote: On Sun, Apr 15, 2007 at 06:18:18PM -0400, Kynn Jones wrote: Still, it would be nice to have more up-to-date instructions on how to install PostgreSQL on OS X. Any pointers would be much appreciated! Try the main documentation. And just for the record, PostgreSQL compiles and installs smoothly from source code when you have Xcode (developer tools) installed, using the instructions in the INSTALL file in the tarball. Notes: *) Use 'make' instead of 'gmake' as indicated (OS X make is GNU make) *) OS X doesn't have an 'adduser' command. Either use the GUI or create a postgres user on the command line using niload, e.g.: sudo niload group .
[GENERAL] Building PG 8.2.3 for x86_64 on Mac OS X 10.4.9
Has anybody tried making a 64-bit PostgreSQL on an Apple XServe w/ Intel Woodcrest CPUs? My compile works, but the 'make check' fails because of failure to allocate shared memory. There's plenty of SYSV memory available. I compiled using:

./configure --without-readline --prefix=/usr/local/pgsql-8.2.3 CFLAGS='-arch x86_64'

log/initdb.log contains:

creating template1 database in /usr/local/src/postgresql-8.2.3/src/test/regress/./tmp_check/data/base/1 ... FATAL: could not create shared memory segment: Cannot allocate memory
DETAIL: Failed system call was shmget(key=1, size=1810432, 03600).
HINT: This error usually means that PostgreSQL's request for a shared memory segment exceeded available memory or swap space. To reduce the request size (currently 1810432 bytes), reduce PostgreSQL's shared_buffers parameter (currently 50) and/or its max_connections parameter (currently 10).

Output of sudo ipcs -m -b:

Shared Memory:
T  ID  KEY  MODE  OWNER  GROUP  SEGSZ
(no shared memory segments in use)

/etc/sysctl.conf contains:

kern.sysv.shmmax=268435456
kern.sysv.shmmin=1
kern.sysv.shmmni=32
kern.sysv.shmseg=8
kern.sysv.shmall=65536

I just started a 32-bit instance of PostgreSQL with no problems on this box, and it showed:

sudo ipcs -m -b
Shared Memory:
T  ID      KEY      MODE         OWNER     GROUP  SEGSZ
m  131071  5432001  --rw-------  postgres  staff  126763008

Thanks, Kevin Murphy
Re: [GENERAL] Building PG 8.2.3 for x86_64 on Mac OS X 10.4.9
On Apr 19, 2007, at 3:21 PM, Kevin Murphy wrote: Has anybody tried making a 64-bit PostgreSQL on an Apple XServe w/ Intel Woodcrest CPUs? My compile works, but the 'make check' fails because of failure to allocate shared memory. There's plenty of SYSV memory available. The call to shmget() is failing with ENOMEM. A standalone SYSV shared memory test program gives the same result when compiled with -arch x86_64. Harrumph. -Kevin Murphy
Re: [GENERAL] Building PG 8.2.3 for x86_64 on Mac OS X 10.4.9
A.M. wrote: On Apr 19, 2007, at 3:21 PM, Kevin Murphy wrote: Has anybody tried making a 64-bit PostgreSQL on an Apple XServe w/ Intel Woodcrest CPUs? My compile works, but the 'make check' fails because of failure to allocate shared memory. There's plenty of SYSV memory available. The configure script will likely still configure for the 32-bit headers (so size_t will be 32 bits incorrectly). It will take a little more autoconf hacking to get a 64-bit build. sizeof(size_t) reports as 8. -Kevin
Re: [GENERAL] Building PG 8.2.3 for x86_64 on Mac OS X 10.4.9
Tom Lane wrote: Kevin Murphy <[EMAIL PROTECTED]> writes: Has anybody tried making a 64-bit PostgreSQL on an Apple XServe w/ Intel Woodcrest CPUs? creating template1 database in /usr/local/src/postgresql-8.2.3/src/test/regress/./tmp_check/data/base/1 ... FATAL: could not create shared memory segment: Cannot allocate memory DETAIL: Failed system call was shmget(key=1, size=1810432, 03600). Hmm ... not sure if this is related, but there's something mighty fishy about that "key" parameter. I'd expect to see key=5432001, or something close to that depending on what port number you're using. But is this the case when doing 'make check'? I see this in postinit.c:

/*
 * We're running a postgres bootstrap process or a standalone backend.
 * Create private "shmem" and semaphores.
 */
CreateSharedMemoryAndSemaphores(true, 0);

where 0 is the port. Since the ENOMEM problem occurs in a simple shared memory test program, I'm inquiring on an Apple dev list. -Kevin Murphy
Re: [GENERAL] Building PG 8.2.3 for x86_64 on Mac OS X 10.4.9
On Apr 19, 2007, at 3:21 PM, Kevin Murphy wrote: Has anybody tried making a 64-bit PostgreSQL on an Apple XServe w/ Intel Woodcrest CPUs? My compile works, but the 'make check' fails because of failure to allocate shared memory. There's plenty of SYSV memory available. The call to shmget() is failing with ENOMEM. A standalone SYSV shared memory test program gives the same result when compiled with -arch x86_64. OK, I finally found someone (Terry Lambert at Apple) to answer my question. Here's the thread: http://lists.apple.com/archives/darwin-kernel/2007/Apr/msg00021.html Excerpt: "It's trying to allocate the memory in the lower 4G of the process virtual address space, and unless you turn off the 4G page 0 mapping (or munmap() it after you're running but before you attempt to allocate shared memory), it's not going to see any free address space available." Regards, Kevin Murphy
[GENERAL] Permission denied to create database
Sleep deprived and surely doing something stupid here; I can't seem to confer the ability to create databases on a regular user. I always get "permission denied to create database". One note: template1 has had some C functions added to it. Could that be related to the problem?

$ createdb -U joe joejunkdb
createdb: database creation failed: ERROR: permission denied to create database
$ sudo -u postgres psql -c "grant all on tablespace pg_default to joe"
Password:
GRANT
$ createdb -U joe joejunkdb
createdb: database creation failed: ERROR: permission denied to create database

joe=> \db+
                            List of tablespaces
    Name    |  Owner   | Location |          Access privileges           | Description
------------+----------+----------+--------------------------------------+-------------
 pg_default | postgres |          | {postgres=C/postgres,joe=C/postgres} |
 pg_global  | postgres |          |                                      |

joe=> \l+
                         List of databases
    Name   |  Owner   | Encoding | Tablespace |        Description
-----------+----------+----------+------------+---------------------------
 ...
 template1 | postgres | UTF8     | pg_default | Default template database

For some reason, I haven't succeeded by reading the docs and googling. Thanks for jarring my memory, Kevin Murphy (PostgreSQL 8.2.4)
Re: [GENERAL] Permission denied to create database
Richard Huxton wrote: Kevin Murphy wrote: Sleep deprived and surely doing something stupid here; I can't seem to confer the ability to create databases on a regular user. As a superuser: ALTER USER joe CREATEDB Thanks, Richard and others who replied. I don't have to deal with permissions very often. -Kevin
Re: [GENERAL] PostgreSQL on Linux PC vs MacOS X
On Nov 3, 2004, at 1:33 PM, Jeff Bohmer wrote: We use PostgreSQL 7.x on both OS X and Linux. We used to run OS X in production, but due to numerous problems we switched to Linux. OS X was not stable at all, especially under load. It was also a poor performer, under load or not. Did you (or anyone) ever compare performance of PostgreSQL under PPC Linux running on the G4 or G5? -Kevin Murphy
Re: [GENERAL] out of memory during query execution
I'm certainly not an AIX expert, but I remember my 32-bit AIX programs being limited to 256MB of heap by default. When I linked, I think I had to ask for more maximum data page space using something like: -bmaxdata:0x40000000 (which asks for 1GB, I believe) -Kevin Murphy
[GENERAL] sqlite speed comparison
A Windows PostgreSQL guru who cares (;-)) might help this guy with his benchmark of mysql, firebird, sqlite, and postgresql: http://www.sqlite.org/cvstrac/wiki?p=SpeedComparison Obviously I don't care (and I don't use Windows). -Kevin Murphy
[GENERAL] How I changed the encoding of template1 after the fact
I wanted to change the encoding of the template1 database, but when I tried to drop template1, I got the message, "ERROR: cannot drop a template database". The docs (http://www.postgresql.org/docs/8.1/interactive/manage-ag-templatedbs.html) say that this is possible, but a user comment on that page says you can't. Actually, you *can* drop a template database, if you first convert it into a non-template database, as per http://techdocs.postgresql.org/techdocs/pgsqladventuresep1.php Here's a condensed example, in which template1 is recreated to change its default encoding:

-- Connect as the postgres superuser, e.g.:
--   psql -U postgres template1
-- Then run:
UPDATE pg_database SET datallowconn = TRUE WHERE datname = 'template0';
\c template0
UPDATE pg_database SET datistemplate = FALSE WHERE datname = 'template1';
DROP DATABASE template1;
CREATE DATABASE template1 WITH TEMPLATE = template0 ENCODING = 'UNICODE';
UPDATE pg_database SET datistemplate = TRUE WHERE datname = 'template1';
\c template1
UPDATE pg_database SET datallowconn = FALSE WHERE datname = 'template0';

If you just wanted to clone template0, you would leave out the "ENCODING = 'UNICODE'" clause. Gurus, any corrections or safety advisories you care to make? Kevin Murphy
[GENERAL] index scan backward plan question
I have a table for which PG 8.3 is guessing wrong about a plan when the result set gets large. For these large result sets, it uses an Index Scan Backward/Filter (slower) instead of Sort/Bitmap Heap Scan/Bitmap Index Scan (faster). See below. I fooled around with various planner variables, but the only thing I found that worked was: set enable_indexscan = off; BTW, without turning index scans off, the largest queries take many minutes to return (in fact, I've never had the patience to wait for the result). Is there some other way I can persuade PG to not ever use the Index Scan Backward approach? Thanks, Kevin Murphy

Limit (cost=104804.79..110320.84 rows=25 width=229) (actual time=1653.686..10381.264 rows=25 loops=1)
  -> Index Scan Backward using merged_weight_date_idx on merged (cost=0.00..31295593.98 rows=141839 width=229) (actual time=3.888..10380.783 rows=500 loops=1)
       Filter: (symbol = 'ERVK6'::text)
Total runtime: 10381.552 ms

Limit (cost=278766.87..278766.93 rows=25 width=229) (actual time=5140.604..5140.639 rows=25 loops=1)
  -> Sort (cost=278765.81..279120.41 rows=141839 width=229) (actual time=5140.098..5140.571 rows=450 loops=1)
       Sort Key: weight, date
       -> Bitmap Heap Scan on merged (cost=920.44..245618.72 rows=141839 width=229) (actual time=61.265..354.795 rows=142814 loops=1)
            Recheck Cond: (symbol = 'ERVK6'::text)
            -> Bitmap Index Scan on merged_symbol_idx (cost=0.00..920.44 rows=141839 width=0) (actual time=58.846..58.846 rows=142814 loops=1)
                 Index Cond: (symbol = 'ERVK6'::text)
Total runtime: 5903.179 ms
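For the archive, one per-query trick (rather than disabling index scans session-wide): perturb the ORDER BY expression so it no longer matches merged_weight_date_idx, which takes the backward index scan off the table for just this query. A sketch using the query shape from the follow-up message:

SELECT pmid, authors, title, journal, date, display_date,
       volume, issue, pages
  FROM merged
 WHERE symbol = 'ERVK6'
 ORDER BY weight + 0 DESC, date DESC   -- the no-op +0 defeats index matching
 OFFSET 0 LIMIT 25;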
Re: [GENERAL] index scan backward plan question
> Neil Conway wrote: There is no such thing as "PG 8.3".

I meant 8.1.3.

> please provide the queries that trigger the problem and the relevant schema definitions.

Sorry about not posting more details initially; I was running out the door and was hasty.

> Try re-running ANALYZE and retrying EXPLAIN ANALYZE.

It was analyzed (and unmodified since analysis). I also rebuilt the table from scratch - same results. I'm beginning to think my performance problem is mostly related to tight memory. The same query that was insanely slow on one machine (1GB RAM) was reasonably fast on another with more memory (8GB RAM). Also, when I freed a bit more memory on the original machine, the performance wall went away. PostgreSQL seems to be making OK decisions about when to use the 'index scan backward' plan, but under tight memory conditions, the index scan would take forever while the alternative would be fine. My curiosity has sunk below my need to get real work done, so I don't require any further response. If anybody has too much time on his or her hands, you can read what I wrote (below) before I decided it wasn't worth pursuing this any further. Thanks, Kevin Murphy

\d merged
           Table "public.merged"
    Column    |  Type   | Modifiers
--------------+---------+-----------
 pmid         | integer |
 first_author | text    |
 authors      | text    |
 title        | text    |
 journal      | text    |
 date         | date    |
 display_date | text    |
 volume       | text    |
 issue        | text    |
 pages        | text    |
 symbol       | text    |
 weight       | real    |
Indexes:
    "merged_data_idx" btree (date, first_author)
    "merged_first_author_idx" btree (first_author)
    "merged_pmid_idx" btree (pmid)
    "merged_symbol_idx" btree (symbol)
    "merged_weight_date_idx" btree (weight, date)

The query in question searches on 'symbol', orders by the combination of 'weight' and 'date', and constrains by a limit and offset. The table has 7,839,274 rows; 16,247 distinct symbol values; 17,279 distinct weight values; 831,007 distinct weight-date combinations. Depending on the query search term and offset value, one of two different plans is chosen. For search terms matching large numbers of rows, and with a low enough offset, an 'index scan backward' plan is chosen, which is sometimes extremely fast (faster than the other plan). However, for the search terms matching very large numbers of rows, the 'index scan backward' plan can be molasses-slow (as much as 2 and 3 orders of magnitude). During such slow queries, the CPU is not doing much: postgres is consuming <=3% of CPU, and nothing else is happening on the box ('top' is using the most CPU). Disk I/O is consistent and low at around 1MB/sec while this is going on. Top doesn't show any swapping, although free physical memory is very low; on the other hand there is lots of inactive memory (720MB). During these very long queries, I see a lock on merged_weight_date_idx (which is 168 MB in size, BTW) the whole time. There are no messages in the system or postgresql logs. Can it really be taking PG 490 seconds (see below) to read some fraction of 168MB into memory?

* * * * * * * *

postgresql.conf settings:

shared_buffers = 15000
work_mem = 32768
maintenance_work_mem = 65536
default_statistics_target = 1000

I just noticed that default_statistics_target is set at one extreme. Since many of the columns have large numbers of values, highly skewed distributions, and the oddballs are often important, it was my understanding that this was not unreasonable (and early tests seemed to bear this out). I stand ready to be corrected.
* * * * * * * *

OS: OS X 10.4, 1 GB RAM, dual 1.8 GHz PPC

* * * * * * * *

The following plan is produced for almost all values of 'symbol' except the values with the most records. It is also produced for the high-frequency symbols if the OFFSET is large enough. The following is a search for a less common symbol than the problematic cases ('CFTR' matches 8920 rows):

EXPLAIN ANALYZE SELECT pmid, authors, title, journal, date, display_date, volume, issue, pages FROM merged WHERE symbol = 'CFTR' ORDER BY weight DESC, date DESC OFFSET 0 LIMIT 25;

Limit (cost=29364.77..29364.83 rows=25 width=229) (actual time=1351.026..1351.040 rows=25 loops=1)
  -> Sort (cost=29364.77..29385.54 rows=8310 width=229) (actual time=1351.023..1351.028 rows=25 loops=1)
       Sort Key: weight, date
       ->
Re: [GENERAL] How can I known the size of a database, table by table ?
Bruno Baguette wrote:
> I works on a PostgreSQL 8.0.4 database, and I would like to known the
> size (in bytes, Mbytes or anything else) of that database, for each table.

You've already got several replies. Know that the PG documentation is excellent. If you visit www.postgresql.org, you will quickly find:

http://www.postgresql.org/docs/8.0/interactive/diskusage.html

The following won't help you, but just for completeness, in PG 8.1, dbsize functionality is built in, so you can just do:

select tablename,
       pg_size_pretty(pg_total_relation_size(tablename)) as size
from pg_tables
where schemaname in ('public')
order by pg_total_relation_size(tablename) desc;

select pg_size_pretty(pg_database_size(current_database())) AS TOTAL;

Also, replace or add to 'public' as required by the schemas you use. The sizes reported include the indexes for each table. The total includes not just your tables but the system tables also. Unless you have collected lots of statistics on the database, the system tables (pg_*) will probably only take up a few MB of space. Note that pg_size_pretty() _rounds_ the raw byte numbers, so the pretty results may understate the space requirements by half a unit.

If you want to exclude the system tables from the total database size, then instead of pg_database_size(), you could use:

select pg_size_pretty(cast(sum(pg_total_relation_size(tablename)) as bigint)) as size
from pg_tables
where schemaname in ('public');

Kevin Murphy

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
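(A related sketch for 8.1, since pg_total_relation_size() lumps heap, indexes, and toast together: pg_relation_size() counts only the table proper, so the per-table overhead can be split out like this:)

select tablename,
       pg_size_pretty(pg_relation_size(tablename)) as table_only,
       pg_size_pretty(pg_total_relation_size(tablename)
                      - pg_relation_size(tablename)) as indexes_and_toast
from pg_tables
where schemaname in ('public')
order by pg_total_relation_size(tablename) desc;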
[GENERAL] Please explain the gin index
PG tsearch2 users,

I notice there is a backport to 8.1 of the 8.2 tsearch2 (http://www.sai.msu.su/~megera/oddmuse/index.cgi/Tsearch2WhatsNew). Has anybody used this? What are the performance differences between the new GIN index and GiST?

Thanks,
Kevin Murphy

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

http://www.postgresql.org/docs/faq
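(For reference, the two index types are created with the same syntax. A minimal sketch, assuming a hypothetical table docs with a tsvector column fti:)

CREATE INDEX docs_fti_gist ON docs USING gist (fti);  -- GiST: faster to build and update
CREATE INDEX docs_fti_gin  ON docs USING gin  (fti);  -- GIN: slower to build, faster to search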
[GENERAL] number of distinct values in tsearch2 gist index
In the output of gist_stat() in the gevel contrib module, is the number of tuples the number of distinct values stored in the index? Is the number (6M) so large because the positions are part of the values? I'm guessing I can't determine the number of distinct lexemes from the output of gist_stat().

select gist_stat('idxfti_idex');
 gist_stat
-----------
Number of levels:          5
Number of pages:           244881
Number of leaf pages:      234515
Number of tuples:          6017003
Number of invalid tuples:  0
Number of leaf tuples:     5772123
Total size of tuples:      1846434636 bytes
Total size of leaf tuples: 1779845516 bytes
Total size of index:       2006065152 bytes
(1 row)

Time: 193023.012 ms

Thanks,
Kevin Murphy

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings
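(gist_stat() describes the physical index structure, not the vocabulary. Distinct lexemes can be counted with tsearch2's stat() function instead; a sketch, assuming the indexed tsvector column is named idxfti on a hypothetical table mytable. Note that stat() scans the whole table, so it is slow:)

select count(*) from stat('select idxfti from mytable');  -- one row per distinct lexeme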
[GENERAL] Messages to pgsql-general list not being posted
Yesterday I sent two messages to pgsql-general@postgresql.org, and neither one posted, as far as I can tell.

"Please explain the gin index" - 7/18/06 10:44 AM EDT
"number of distinct values in tsearch2 gist index" - 7/18/06 1:24 PM EDT

This has also happened to me on numerous previous occasions. Is there heavy-handed moderation or filtering going on here?

Thanks,
Kevin Murphy

---(end of broadcast)---
TIP 4: Have you searched our list archives?

http://archives.postgresql.org
Re: [GENERAL] Messages to pgsql-general list not being posted
Tom Lane wrote:
> Kevin Murphy <[EMAIL PROTECTED]> writes:
>> Yesterday I sent two messages to pgsql-general@postgresql.org, and
>> neither one posted, as far as I can tell.
>
> Perhaps your own incoming spam filtering is dropping the messages?

Hmm, I checked the spam filter folder, to no avail. I have no explanation. I'll check the archives in the future.

-Kevin

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings
[GENERAL] access method "gin" does not exist
I'm trying to test the 8.1 backport of the 8.2 GIN index and tsearch2 functionality. The patch applied successfully (to 8.1.4, on OS X 10.4.7 w/ Xcode 2.3), the build and install went well, and the postmaster was stopped and restarted, but initializing tsearch2 fails:

psql -U postgres minitest < /usr/local/src/postgresql-8.1.4/contrib/tsearch2/tsearch2.sql
...
ERROR: access method "gin" does not exist

This error occurs at the following statement:

CREATE OPERATOR CLASS gin_tsvector_ops
DEFAULT FOR TYPE tsvector USING gin
AS
    OPERATOR 1 @@ (tsvector, tsquery),
    OPERATOR 2 @@@ (tsvector, tsquery) RECHECK,
    FUNCTION 1 bttextcmp(text, text),
    FUNCTION 2 gin_extract_tsvector(tsvector, internal),
    FUNCTION 3 gin_extract_tsquery(tsquery, internal, internal),
    FUNCTION 4 gin_ts_consistent(internal, internal, tsquery),
    STORAGE text;

Sanity checks:

1) In psql:

select amname from pg_am where amname like '%gin%';
 amname
--------
(0 rows)

Hmm. Should I expect this query to return something?

2)

show gin_fuzzy_search_limit;
 gin_fuzzy_search_limit
------------------------
 0
(1 row)

Thanks for any tips,
Kevin

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
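(A hedged pointer rather than a verified diagnosis: access methods are rows in pg_am that are created at initdb time, so a patched binary alone would not add gin to an existing cluster; it should only appear in a cluster initialized with the patched build. On such a cluster, the sanity check would be expected to show:)

select amname from pg_am order by amname;
-- expected to include gin alongside btree, gist, hash, and rtree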
Re: [GENERAL] COLLATE
Filip Rembiałkowski wrote:
> Greg Stark wrote:
>> But the closest partial solution suggested so far is the pg_xfrm (sic)
>> function that has been implemented and posted at least three times by
>> three different posters to the postgres mailing lists. In the interest
>> of avoiding a fourth independent implementation I'll attach the one I
>> use below, it's not big.
>
> But I have no idea how to compile/install it. I tried compiling but it failed:
>
> pgdba ( at ) sulaco:~/src/postgresql-8.1.3/src$ gcc -I ./include -fPIC -c pg_strxfrm.c
> pg_strxfrm.c: In function 'pg_strxfrm':
> pg_strxfrm.c:98: error: 'Warn_restart' undeclared (first use in this function)
> pg_strxfrm.c:98: error: (Each undeclared identifier is reported only once
> pg_strxfrm.c:98: error: for each function it appears in.)

What is the answer to Filip's question? I didn't see an answer in the list archives. I've seen several copies of Joe Conway's pg_strxfrm.c code on the web, and it always refers to the Warn_restart variable, which doesn't seem to exist in the 8.1.4 code that I'm using.

I am working with a database in UTF-8 encoding using "C" collation - but I'd occasionally like to ORDER BY columns containing real UTF-8 data. Would the pg_strxfrm() function get used in a new operator class function? I'll read up on operator classes in chapter 32.14 of the docs, but if someone has a simple example, it might help other searchers of the archives.

Thanks,
Kevin Murphy

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster
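(For context on how the function is meant to be used once it compiles: in the copies of the code I have seen, pg_strxfrm(text, text) takes a string and a locale name and returns a locale-aware sort key, so it can go directly in an ORDER BY. Both the table and the locale string below are assumptions for illustration:)

SELECT title
FROM books  -- hypothetical table
ORDER BY pg_strxfrm(title, 'en_US.UTF-8');  -- locale string is an assumption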
Re: [GENERAL] COLLATE
Tom Lane wrote:
> Kevin Murphy <[EMAIL PROTECTED]> writes:
>> What is the answer to Filip's question? I didn't see an answer in the
>> list archives. I've seen several copies of Joe Conway's pg_strxfrm.c
>> code on the web, and it always refers to the Warn_restart variable,
>> which doesn't seem to exist in the 8.1.4 code that I'm using.
>
> Warn_restart hasn't existed since PG 7.4. I would imagine that the code
> needs to be tweaked to use a PG_TRY construct instead of direct setjmp
> hacking.

Yes, I'm a user, not a hacker. I was hoping that someone had done this already. Anyway, I gave PG_TRY a try, and the code superficially works. I have no idea what I'm doing; you can see what I did below.

Confirm that instead of:

memcpy(&save_restart, &Warn_restart, sizeof(save_restart));
if (sigsetjmp(Warn_restart, 1) != 0)
{
    memcpy(&Warn_restart, &save_restart, sizeof(Warn_restart));
    newlocale = setlocale(LC_COLLATE, oldlocale);
    if (!newlocale)
        elog(PANIC, "setlocale failed to reset locale: %s", localestr);
    siglongjmp(Warn_restart, 1);
}
... code here ...
memcpy(&Warn_restart, &save_restart, sizeof(Warn_restart));

it should be:

PG_TRY();
{
    ... code here ...
}
PG_CATCH();
{
    /* restore the original locale before the error propagates */
    newlocale = setlocale(LC_COLLATE, oldlocale);
    if (!newlocale)
        elog(PANIC, "setlocale failed to reset locale: %s", localestr);
    /* note: the usual PG_TRY idiom also calls PG_RE_THROW() here,
       so that the caught error is not silently swallowed */
}
PG_END_TRY();

Thanks,
Kevin Murphy

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster
Re: [GENERAL] syntax error but command executes anyway?
Scott,

>> egenome_dev=# \i /Users/murphy/cvs/egora/sql/data_port/port_identifiers.sql
>> INSERT 0 1672036
>
> This INSERT 0 part tells you it didn't actually insert anything.

Actually, that's not true. I didn't know what this number was until just now, but I looked it up: for a single-row insert, the first number is the OID of the new row; otherwise, it's 0. The 1672036, on the other hand, means that 1,672,036 rows were inserted.

-Kevin Murphy

---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?

http://www.postgresql.org/docs/faqs/FAQ.html
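(A minimal illustration of the command tag; the table here is hypothetical, and whether a real OID appears in place of the first 0 depends on the table having OIDs, which was the default in releases of that era:)

CREATE TABLE tag_demo (x integer);
INSERT INTO tag_demo VALUES (1);                   -- psql prints: INSERT <oid> 1
INSERT INTO tag_demo SELECT 2 UNION ALL SELECT 3;  -- psql prints: INSERT 0 2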
Re: [GENERAL] syntax error but command executes anyway?
On Saturday 19 June 2004 06:57 pm, Tom Lane wrote:
>> egenome_dev=# \i /Users/murphy/cvs/egora/sql/data_port/port_identifiers.sql
>> INSERT 0 1672036
>> psql:/Users/murphy/cvs/egora/sql/data_port/port_identifiers.sql:15: ERROR: syntax error at or near "sourcecode" at character 2
>
> It seems mighty suspicious that psql is reporting a syntax error at line
> 15 of the file when cat is only showing 9 lines. I suspect that the
> insert you are showing us did execute, but then something further on
> in the file is producing the syntax error.
>
> I am wondering whether cat on OS X stops at embedded nulls, or something
> stupid like that. It sure looks like there must be garbage in the
> port_identifiers.sql file beyond what cat has printed here. What do you
> see when you examine the file with other tools? (Try "od -c" if nothing
> else springs to mind.)
>
> It is possible that the problem is not entirely cat's fault but has
> something to do with the way that psql's \! command invokes cat.
> Does cat at the shell prompt produce the same output?
>
> regards, tom lane

It is indeed very weird, since the script is not that long. I piped it through 'od -a' to confirm. However, I retract my complaint, since I can't reproduce it! I had to take a bad DIMM out of this machine a few days ago; maybe I need to run that memory test again. Still, it would seem remarkable if a memory error could produce my initial results rather than a crash or hang. I am also inclined to compile PG again, since it was initially compiled with that bad DIMM (but again, what would the odds be?).

Tom, as always, thanks for your mellow and rapid response to questions. I'll let you know if this crops up again in a reproducible way.

-Kevin Murphy

---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
[GENERAL] zombie primary key lurches out of database to devour the brains of the unwary
Using PG 7.4.5 on Mac OS X 10.3 ... I have a primary key that I can't destroy and can't create. One weird symptom is that when I use \d in psql to attempt to display the constraint, there is no output at all! Normally, psql either shows the constraint or reports that the constraint doesn't exist.

(BTW, I know that the default clauses below are kind of weird, but I am using PG to process data for a database that doesn't use NULL and uses 0 and empty strings as default values. The DDL is generated automatically, so the defaults don't all make sense, but they should be harmless. Don't worry; the app is almost ported to PG ;-)

In the transcript below, snp_main_chr22 is the table, and there is supposed to be a primary key snp_main_chr22_pk on the refsnp_id column:

egenome_test=# \d snp_main_chr22
          Table "build.snp_main_chr22"
   Column    |         Type          |           Modifiers
-------------+-----------------------+-------------------------------
 refsnp_id   | integer               | default 0
 variation   | character varying(10) | default ''::character varying
 het         | character varying(20) | default ''::character varying
 validated   | character varying(5)  | default ''::character varying
 chr         | character varying(2)  | default ''::character varying
 assay_size  | integer               | default 0
 pop_size    | integer               | default 0
 seq_pos     | integer               | default 0
 transcribed | character varying(1)  | default ''::character varying

egenome_test=# alter table snp_main_chr22 drop constraint snp_main_chr22_pk;
ERROR: constraint "snp_main_chr22_pk" does not exist
egenome_test=# drop table snp_main_chr22 cascade;
DROP TABLE
egenome_test=# \d snp_main_chr22_pk
egenome_test=# \d snp_main_chr22_pk_gibberish
Did not find any relation named "snp_main_chr22_pk_gibberish".
egenome_test=# CREATE TABLE snp_main_chr22 (
    refsnp_id   integer     DEFAULT 0,
    variation   varchar(10) DEFAULT '',
    het         varchar(20) DEFAULT '',
    validated   varchar(5)  DEFAULT '',
    chr         varchar(2)  DEFAULT '',
    assay_size  integer     DEFAULT 0,
    pop_size    integer     DEFAULT 0,
    seq_pos     integer     DEFAULT 0,
    transcribed varchar(1)  DEFAULT ''
);
CREATE TABLE
egenome_test=# ALTER TABLE snp_main_chr22 ADD CONSTRAINT snp_main_chr22_pk PRIMARY KEY (refsnp_id);
NOTICE: ALTER TABLE / ADD PRIMARY KEY will create implicit index "snp_main_chr22_pk" for table "snp_main_chr22"
ERROR: relation "snp_main_chr22_pk" already exists
egenome_test=# \d snp_main_chr22
          Table "build.snp_main_chr22"
   Column    |         Type          |           Modifiers
-------------+-----------------------+-------------------------------
 refsnp_id   | integer               | default 0
 variation   | character varying(10) | default ''::character varying
 het         | character varying(20) | default ''::character varying
 validated   | character varying(5)  | default ''::character varying
 chr         | character varying(2)  | default ''::character varying
 assay_size  | integer               | default 0
 pop_size    | integer               | default 0
 seq_pos     | integer               | default 0
 transcribed | character varying(1)  | default ''::character varying

Thanks for any advice.
Kevin Murphy

---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
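(A hedged diagnostic sketch, not a verified fix: if the implicit index from the dropped table somehow survived as an orphaned relation, pg_class should still show it even though \d prints nothing:)

select relname, relkind from pg_class where relname = 'snp_main_chr22_pk';
-- relkind 'i' would mean an index by that name still exists and is
-- occupying the name the new primary key wants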
[GENERAL] how to constrain a query to return 1 or 0 rows (or >1 or 0 rows)
This is probably a stupid question, but ... I'd like to be able to take an existing query and modify it to return a single row if that's what the base query returns, and 0 rows if the base query returns multiple rows. Similarly, I'd also like to modify it to return multiple rows if that's what the base query returns, and 0 rows if the base query returns a single row. What's a good way to do this?

Thanks,
Kevin Murphy

---(end of broadcast)---
TIP 8: explain analyze is your friend
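(One possible shape for the first variant, sketched with a hypothetical table t; the base query appears twice, once to produce rows and once to count them. Using > 1 instead of = 1 in the count test gives the second variant:)

SELECT *
FROM (SELECT name FROM t WHERE flag) AS base
WHERE (SELECT count(*) FROM (SELECT name FROM t WHERE flag) AS chk) = 1;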
[GENERAL] the current scoop on ilike and indexes
I am pretty sure the answer is no, but ... is there any way to get 'ilike' to use an index? It seems like something that a lot of people would want to do. Otherwise, should I just create redundant case-mapped columns and use 'like'?

Thanks,
Kevin Murphy

---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?

http://www.postgresql.org/docs/faqs/FAQ.html
Re: [GENERAL] the current scoop on ilike and indexes
On Sep 21, 2004, at 4:52 PM, Thomas F.O'Connell wrote:
> You can use an index on an expression like "lower( col ) LIKE ..." as
> long as the LIKE expression is left-anchored. See

Yes, I know that already. I wasn't talking about LIKE; I was talking about ILIKE. The data in the column is mixed-case.

-Kevin

---(end of broadcast)---
TIP 8: explain analyze is your friend
[GENERAL] mailing list archive search form broken?
I receive the error message "Try to compose a less restrictive search query or check spelling" regardless of search term when attempting to search the list "PgSQL - General" via the mailing list archive search form at:

http://archives.postgresql.org/pgsql-general/

---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command (send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])
[GENERAL] using COPY table FROM STDIN within script run as psql -f file.sql
This is a tip for the record in case it helps somebody else in the future. I have an import script that relies on a stored procedure that runs as a trigger on inserts into a temporary table. The script looks like this:

-- create table
-- ...
-- define procedure and trigger
-- ...
-- import data via COPY command:
COPY temp_table FROM STDIN WITH NULL AS '';

However, this did not work when the script was run as "psql -f import.sql". The solution is to use the psql "\COPY" command instead (and remove the trailing semicolon, which cannot be used with psql backslash commands). I.e., this command will work:

\COPY temp_table FROM STDIN WITH NULL AS ''

-Kevin Murphy

---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]
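(For completeness, a minimal sketch of the working script shape; the table, columns, and data here are hypothetical, and a comma delimiter is used so the inline rows read clearly. The \. line terminates the inline data:)

CREATE TEMPORARY TABLE temp_table (id integer, val text);
-- procedure and trigger definitions would go here
\copy temp_table from stdin with delimiter as ',' null as ''
1,alpha
2,
\.
-- the empty second field in row 2 loads as NULL because of the null option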