[PERFORM] Index creation
Any tips for speeding up index creation?

I need to bulk load a large table with 100M rows and several indexes, some of
which span two columns.

By dropping all indexes prior to issuing the 'copy from' command, the
operation completes 10x as fast (1.5h vs 15h). Unfortunately, recreating a
single index takes nearly as long as loading all of the data into the table;
this more or less eliminates the time gained by dropping the index in the
first place.

Also, there doesn't seem to be a simple way to disable/recreate all indexes
for a specific table, short of explicitly dropping and later recreating each
index?

--
Eric Jain
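For reference, the drop/load/recreate pattern described above would look
roughly like the following in SQL; the table, index, and file names here are
invented for illustration, not taken from the post:

    -- Hypothetical names, for illustration only.
    DROP INDEX big_table_a_idx;
    DROP INDEX big_table_a_b_idx;

    COPY big_table FROM '/path/to/data';      -- fast: no per-row index maintenance

    CREATE INDEX big_table_a_idx   ON big_table (a);
    CREATE INDEX big_table_a_b_idx ON big_table (a, b);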
Re: [PERFORM] Index creation
On Wed, 7 Jan 2004 18:08:06 +0100 "Eric Jain" <[EMAIL PROTECTED]> wrote:

> Any tips for speeding up index creation?
>
> I need to bulk load a large table with 100M rows and several indexes,
> some of which span two columns.
>
> By dropping all indexes prior to issuing the 'copy from' command, the
> operation completes 10x as fast (1.5h vs 15h).
>
> Unfortunately, recreating a single index takes nearly as long as
> loading all of the data into the table; this more or less eliminates
> the time gained by dropping the index in the first place.
>
> Also, there doesn't seem to be a simple way to disable/recreate all
> indexes for a specific table short of explicitly dropping and later
> recreating each index?

Before creating your index, bump your sort_mem up high:

    set sort_mem = 64000;
    create index foo on baz(a, b);

BIG increases. [This also helps on FK creation.]

--
Jeff Trout <[EMAIL PROTECTED]>
http://www.jefftrout.com/
http://www.stuarthamm.net/
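As a minimal sketch of the session-scoped version of that advice (using the
same example index as above; sort_mem is specified in kilobytes, and the
value shown is only illustrative):

    SHOW sort_mem;                        -- check the current setting first
    SET sort_mem = 64000;                 -- applies to this session only
    CREATE INDEX foo ON baz (a, b);       -- the index build's sort gets the extra memory
    RESET sort_mem;                       -- back to the configured default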
[PERFORM] RAID array stripe sizes
Does anyone have any data to support arguing for a particular stripe size in
RAID-0? Do large stripe sizes allow drives to stream data more efficiently,
or defeat read-ahead?

--
greg
Re: [PERFORM] Index creation
On Wed, 7 Jan 2004, Eric Jain wrote:

> Any tips for speeding up index creation?
>
> I need to bulk load a large table with 100M rows and several indexes,
> some of which span two columns.
>
> By dropping all indexes prior to issuing the 'copy from' command, the
> operation completes 10x as fast (1.5h vs 15h).
>
> Unfortunately, recreating a single index takes nearly as long as loading
> all of the data into the table; this more or less eliminates the time
> gained by dropping the index in the first place.
>
> Also, there doesn't seem to be a simple way to disable/recreate all
> indexes for a specific table short of explicitly dropping and later
> recreating each index?

Note that you can issue the following command to see all the index
definitions for a table:

    select * from pg_indexes where tablename='sometable';

Store those definitions elsewhere so they can be reused when you need to
recreate the indexes.
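A hypothetical capture-and-replay example of that approach (all table and
index names are invented): save the indexdef text before dropping, then run
it again after the load.

    -- Invented names, for illustration only.
    SELECT indexname, indexdef
      FROM pg_indexes
     WHERE tablename = 'big_table';
    --  big_table_a_idx | CREATE INDEX big_table_a_idx ON big_table USING btree (a)

    DROP INDEX big_table_a_idx;

    -- ...bulk load the table, then replay the saved definition:
    CREATE INDEX big_table_a_idx ON big_table USING btree (a);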
[PERFORM] Find original number of rows before LIMIT/OFFSET is applied?
I need to know the original number of rows that WOULD have been returned by a
SELECT statement if the LIMIT / OFFSET were not present in the statement. Is
there a way to get this data from PG?

    SELECT ... ;
    --> returns 100,000 rows

but

    SELECT ... LIMIT x OFFSET y;
    --> returns at most x rows

In order to build a list pager on a web site, I want to select 'pages' of a
result set at a time. However, I need to know the original select result set
size because I still have to draw the 'page numbers' to display what pages
are available.

I've done this TWO ways in the past:

1) TWO queries. The first query performs a SELECT COUNT(*) ...; and the
   second query performs the actual SELECT ... LIMIT x OFFSET y;

2) Using PHP row seek and only selecting the number of rows I need.

Here is an example of method number 2 in PHP:

    //--
    function query_assoc_paged ($sql, $limit=0, $offset=0) {
        $this->num_rows = false;

        // open a result set for this query...
        $result = $this->query($sql);
        if (! $result) return (false);

        // save the number of rows we are working with
        $this->num_rows = @pg_num_rows($result);

        // move the internal row pointer of the result to point to our
        // desired offset. The next call to pg_fetch_assoc() will return
        // that row.
        if (! empty($offset)) {
            if (! @pg_result_seek($result, $offset)) {
                return (array());
            }
        }

        // gather the results together in an array of arrays...
        $data = array();
        while (($row = pg_fetch_assoc($result)) !== false) {
            $data[] = $row;

            // after reading N rows from this result set, free our memory
            // and return the rows we fetched...
            if (! empty($limit) && count($data) >= $limit) {
                pg_free_result($result);
                return ($data);
            }
        }

        pg_free_result($result);
        return ($data);
    }
    //--

In this approach, I am 'emulating' the LIMIT / OFFSET features of PostgreSQL
by just seeking forward in the result set (OFFSET) and only fetching the
number of rows that match my needs (LIMIT).

QUESTION: Is this the best way to do this, or is there a more efficient way
to get at the data I want? Is there a variable set in PG that tells me the
original number of rows in the query? Something like:

    SELECT ORIG_RESULT_SIZE, ... ... LIMIT x OFFSET y;

Or can I run another select right afterwards, like:

    SELECT ... ... LIMIT x OFFSET y;
    SELECT unfiltered_size_of_last_query();

Any thoughts? Sure, the PHP function I'm using above 'works', but is it the
most efficient? I hope I'm not actually pulling all 100,000 records across
the wire when I only intend to show 10 at a time. See what I'm getting at?

TIA,
Dante

--
D. Dante Lorenso
[EMAIL PROTECTED]
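As a concrete sketch of approach 1 above (two queries), with an invented
table, filter, and page size: one query returns the total for drawing the
page links, the other returns only the visible page.

    -- Invented names: 'items' table, 10 rows per page, showing page 4.
    SELECT count(*) FROM items WHERE category_id = 42;

    SELECT id, title, created
      FROM items
     WHERE category_id = 42
     ORDER BY created DESC
     LIMIT 10 OFFSET 30;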
Re: [PERFORM] Find original number of rows before LIMIT/OFFSET is applied?
"D. Dante Lorenso" <[EMAIL PROTECTED]> writes: > Any thoughts? Sure, the PHP function I'm using above 'works', but is it > the most efficient? I hope I'm not actually pulling all 100,000 records > across the wire when I only intend to show 10 at a time. See what I'm > getting at? I tend to do it using a separate select count(*). My thinking is that the count(*) query can be simplified and exclude things like the ORDER BY clause and any select list entries that require extra work. It can often even exclude whole joins. By doing a separate query I can do that extra work only for the rows that i actually need for display. Hopefully using an index to pull up those rows. And do the count(*) in the most efficient way possible, probably a sequential scan with no joins for foreign keys etc. But I suspect the two methods both work out to suck about equally. -- greg ---(end of broadcast)--- TIP 4: Don't 'kill -9' the postmaster
[PERFORM] failures on machines using jfs
Hi all,

Chris Browne (one of my colleagues here) has posted some tests in the past
indicating that jfs may be the fastest filesystem for Postgres use on Linux.

We have lately had a couple of cases where machines either locked up, slowed
down to the point of complete unusability, or died completely while using
jfs. We are _not_ sure that jfs is in fact the culprit. In one case, a kernel
panic appeared to be referring to the jfs kernel module, but I can't be sure
as I lost the output immediately thereafter. Yesterday, we had a problem of
data corruption on a failed jfs volume.

None of this is to say that jfs is in fact to blame, nor even that, if it is,
it does not have something to do with the age of our installations, &c.
(these are all RH 8). In fact, I suspect hardware in both cases. But I
thought I'd mention it just in case other people are seeing strange
behaviour, on the principle of "better safe than sorry."

A

--
Andrew Sullivan                         204-4141 Yonge Street
Afilias Canada                          Toronto, Ontario Canada
<[EMAIL PROTECTED]>                     M2P 2A8
                                        +1 416 646 3304 x110