[PERFORM] The right SHMMAX and FILE_MAX

2011-04-30 Thread Phoenix Kiula
Hi. I'm on a 64 Bit CentOS 5 system, quadcore processor, 8GB RAM and tons of data storage (1 TB SATAII disks). The current SHMMAX and SHMMIN are (commas added for legibility) -- kernel.shmmax = 68,719,476,736 kernel.shmall = 4,294,967,296 Now, according to my reading in the PG manual and this li

Re: [PERFORM] stored proc and inserting hundreds of thousands of rows

2011-04-30 Thread Samuel Gendler
On Sat, Apr 30, 2011 at 5:12 PM, Jeff Janes wrote: > > > gist indices are designed to make this type of thing fast, by using > techniques to rule out most of those comparisons without actually > performing them. I don't know enough about the > guts of either your distance function or the gist in

Re: [PERFORM] stored proc and inserting hundreds of thousands of rows

2011-04-30 Thread Jeff Janes
On Sat, Apr 30, 2011 at 3:29 PM, Joel Reymont wrote: > > On Apr 30, 2011, at 11:11 PM, Jeff Janes wrote: > >> But what exactly are you inserting?  The queries you reported below >> are not the same as the ones you originally described. > > I posted the wrong query initially. The only difference is

Re: [PERFORM] stored proc and inserting hundreds of thousands of rows

2011-04-30 Thread Joel Reymont
On Apr 30, 2011, at 11:11 PM, Jeff Janes wrote: > But what exactly are you inserting? The queries you reported below > are not the same as the ones you originally described. I posted the wrong query initially. The only difference is in the table that holds the probability array. I'm inserting

Re: [PERFORM] stored proc and inserting hundreds of thousands of rows

2011-04-30 Thread Jeff Janes
On Sat, Apr 30, 2011 at 2:15 PM, Joel Reymont wrote: > > On Apr 30, 2011, at 7:24 PM, Kevin Grittner wrote: > >> If this is where most of the time is, the next thing is to run it >> with EXPLAIN ANALYZE, and post the output. > > I was absolutely wrong about the calculation taking < 1s, it actually

Re: [PERFORM] stored proc and inserting hundreds of thousands of rows

2011-04-30 Thread Joel Reymont
On Apr 30, 2011, at 7:36 PM, Kevin Grittner wrote: > It may even be amenable to knnGiST indexing (a new feature coming in > 9.1), which would let you do your select with an ORDER BY on the > distance. I don't think I can wait for 9.1, need to go live in a month, with PostgreSQL or without. > P

Re: [PERFORM] stored proc and inserting hundreds of thousands of rows

2011-04-30 Thread Joel Reymont
On Apr 30, 2011, at 7:24 PM, Kevin Grittner wrote: > If this is where most of the time is, the next thing is to run it > with EXPLAIN ANALYZE, and post the output. I was absolutely wrong about the calculation taking < 1s, it actually takes about 30s for 2 million rows. Still, the difference be

Re: [PERFORM] stored proc and inserting hundreds of thousands of rows

2011-04-30 Thread Kevin Grittner
Joel Reymont wrote: > I'm calculating distance between probability vectors, e.g. topics > that a document belongs to and the topics of an ad. > > The distance function is already a C function. Topics are > float8[150]. > > Distance is calculated against all documents in the database There's

Re: [PERFORM] stored proc and inserting hundreds of thousands of rows

2011-04-30 Thread Kevin Grittner
[rearranging to correct for top-posting] Joel Reymont wrote: > Kevin Grittner wrote: >> Joel Reymont wrote: >> >>> We have 2 million documents now and linking an ad to all of them >>> takes 5 minutes on my top-of-the-line SSD MacBook Pro. >> >> How long does it take to run just the SELECT part

Re: [PERFORM] stored proc and inserting hundreds of thousands of rows

2011-04-30 Thread Joel Reymont
I'm calculating distance between probability vectors, e.g. topics that a document belongs to and the topics of an ad. The distance function is already a C function. Topics are float8[150]. Distance is calculated against all documents in the database so it's arable scan. Sent from my comfortable

Re: [PERFORM] stored proc and inserting hundreds of thousands of rows

2011-04-30 Thread Pierre C
If you want to search by geographical coordinates, you could use a gist index which can optimize that sort of things (like retrieving all rows which fit in a box). -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.

Re: [PERFORM] stored proc and inserting hundreds of thousands of rows

2011-04-30 Thread Joel Reymont
Calculating distance involves giving an array of 150 float8 to a pgsql function, then calling a C function 2 million times (at the moment), giving it two arrays of 150 float8. Just calculating distance for 2 million rows and extracting the distance takes less than a second. I think that includes s

Re: [PERFORM] stored proc and inserting hundreds of thousands of rows

2011-04-30 Thread Kevin Grittner
Joel Reymont wrote: > We have 2 million documents now and linking an ad to all of them > takes 5 minutes on my top-of-the-line SSD MacBook Pro. How long does it take to run just the SELECT part of the INSERT by itself? -Kevin -- Sent via pgsql-performance mailing list (pgsql-performance@po

[PERFORM] stored proc and inserting hundreds of thousands of rows

2011-04-30 Thread Joel Reymont
I have a stored proc that potentially inserts hundreds of thousands, potentially millions, of rows (below). This stored proc is part of the the sequence of creating an ad campaign and links an ad to documents it should be displayed with. A few of these stored procs can run concurrently as users

Re: [PERFORM] REINDEX takes half a day (and still not complete!)

2011-04-30 Thread Kenneth Marshall
On Sat, Apr 30, 2011 at 05:26:36PM +0800, Phoenix Kiula wrote: > On Sat, Apr 30, 2011 at 4:07 PM, Greg Smith wrote: > > On 04/23/2011 03:44 PM, Robert Haas wrote: > >> > >> On Apr 17, 2011, at 11:30 AM, Phoenix Kiula > >> ?wrote: > >> > >>> > >>> Postgres is 8.2.9. > >>> > >>> > >> > >> An upgrade

Re: [PERFORM] REINDEX takes half a day (and still not complete!)

2011-04-30 Thread Phoenix Kiula
On Sat, Apr 30, 2011 at 4:07 PM, Greg Smith wrote: > On 04/23/2011 03:44 PM, Robert Haas wrote: >> >> On Apr 17, 2011, at 11:30 AM, Phoenix Kiula >>  wrote: >> >>> >>> Postgres is 8.2.9. >>> >>> >> >> An upgrade would probably help you a lot, and as others have said it >> sounds like your hardware

Re: [PERFORM] REINDEX takes half a day (and still not complete!)

2011-04-30 Thread Greg Smith
On 04/23/2011 03:44 PM, Robert Haas wrote: On Apr 17, 2011, at 11:30 AM, Phoenix Kiula wrote: Postgres is 8.2.9. An upgrade would probably help you a lot, and as others have said it sounds like your hardware is failing, so you probably want to deal with that first. I am a bit surp