[PERFORM] Index creation

2004-01-07 Thread Eric Jain
Any tips for speeding up index creation?

I need to bulk load a large table with 100M rows and several indexes,
some of which span two columns.

By dropping all indexes prior to issuing the 'copy from' command, the
operation completes 10x as fast (1.5h vs 15h).

Unfortunately, recreating a single index takes nearly as long as loading
all of the data into the table; this more or less eliminates the time
gained by dropping the index in the first place.

Also, there doesn't seem to be a simple way to disable/recreate all
indexes for a specific table short of explicitly dropping and later
recreating each index?

--
Eric Jain




Re: [PERFORM] Index creation

2004-01-07 Thread Jeff
On Wed, 7 Jan 2004 18:08:06 +0100
"Eric Jain" <[EMAIL PROTECTED]> wrote:

> Any tips for speeding up index creation?
> 
> I need to bulk load a large table with 100M rows and several indexes,
> some of which span two columns.
> 
> By dropping all indexes prior to issuing the 'copy from' command, the
> operation completes 10x as fast (1.5h vs 15h).
> 
> Unfortunately, recreating a single index takes nearly as long as
> loading all of the data into the table; this more or less eliminates
> the time gained by dropping the index in the first place.
> 
> Also, there doesn't seem to be a simple way to disable/recreate all
> indexes for a specific table short of explicitly dropping and later
> recreating each index?

Before creating your index, bump your sort_mem up high:

set sort_mem = 64000;
create index foo on baz (a, b);

BIG speed increases.
[This also helps on FK creation.]
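
The same session-level bump can cover a foreign key added after a bulk
load; a minimal sketch, with hypothetical table and column names (in 7.x,
sort_mem is measured in KB, so 64000 is roughly 64 MB):

   -- raise sort_mem for this session only
   SET sort_mem = 64000;

   -- adding the FK after the load checks all existing rows at once,
   -- which benefits from the extra sort memory
   ALTER TABLE detail
       ADD CONSTRAINT detail_master_fk
       FOREIGN KEY (master_id) REFERENCES master (id);

   -- return to the server default
   RESET sort_mem;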


-- 
Jeff Trout <[EMAIL PROTECTED]>
http://www.jefftrout.com/
http://www.stuarthamm.net/



[PERFORM] RAID array stripe sizes

2004-01-07 Thread Greg Stark

Does anyone have any data to support arguing for a particular stripe size in
RAID-0? Do large stripe sizes allow drives to stream data more efficiently or
defeat read-ahead?

-- 
greg




Re: [PERFORM] Index creation

2004-01-07 Thread scott.marlowe
On Wed, 7 Jan 2004, Eric Jain wrote:

> Any tips for speeding up index creation?
> 
> I need to bulk load a large table with 100M rows and several indexes,
> some of which span two columns.
> 
> By dropping all indexes prior to issuing the 'copy from' command, the
> operation completes 10x as fast (1.5h vs 15h).
> 
> Unfortunately, recreating a single index takes nearly as long as loading
> all of the data into the table; this more or less eliminates the time
> gained by dropping the index in the first place.
> 
> Also, there doesn't seem to be a simple way to disable/recreate all
> indexes for a specific table short of explicitly dropping and later
> recreating each index?

Note that you can issue the following command to see all the index 
definitions for a table:

select * from pg_indexes where tablename='sometable';

And store those definitions elsewhere so they can be replayed when you
need to recreate the indexes.
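
For example, a minimal sketch of the full drop, load, and recreate cycle
(the table, index, and file names are hypothetical):

   -- capture the index definitions before dropping anything
   SELECT indexdef FROM pg_indexes WHERE tablename = 'sometable';

   -- drop the indexes, then bulk load
   DROP INDEX sometable_a_idx;
   DROP INDEX sometable_a_b_idx;
   COPY sometable FROM '/path/to/sometable.dat';

   -- replay the saved definitions
   CREATE INDEX sometable_a_idx ON sometable (a);
   CREATE INDEX sometable_a_b_idx ON sometable (a, b);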




[PERFORM] Find original number of rows before applied LIMIT/OFFSET?

2004-01-07 Thread D. Dante Lorenso
I need to know the original number of rows that WOULD have been returned
by a SELECT statement if the LIMIT / OFFSET were not present in the
statement.  Is there a way to get this data from PG?

   SELECT
   ... ;
   -- returns 100,000 rows

but,

   SELECT
   ...
   LIMIT x
   OFFSET y;
   -- returns at most x rows

In order to build a list pager on a web site, I want to select 'pages' of
a result set at a time.  However, I need to know the original result set
size, because I still have to draw the 'page numbers' to display what
pages are available.

I've done this TWO ways in the past:

   1) TWO queries.  The first query performs a SELECT COUNT(*) ...; and
   the second query performs the actual SELECT ... LIMIT x OFFSET y (a
   sketch of this follows below);
   2) Using PHP row seek and only selecting the number of rows I need.
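
Here is a sketch of method number 1 in SQL (the table, column, and values
are made up for illustration):

   -- query 1: total number of matching rows, used to draw page numbers
   SELECT count(*) FROM items WHERE category_id = 42;

   -- query 2: fetch only the page being displayed (page 4, 10 per page)
   SELECT *
   FROM items
   WHERE category_id = 42
   ORDER BY name
   LIMIT 10 OFFSET 30;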

Here is an example of method number 2 in PHP:

   //------------------------------------------------------------------
   function query_assoc_paged ($sql, $limit=0, $offset=0) {
       $this->num_rows = false;

       // open a result set for this query...
       $result = $this->query($sql);
       if (! $result) return (false);

       // save the number of rows we are working with
       $this->num_rows = @pg_num_rows($result);

       // move the internal row pointer of the result to our desired
       // offset; the next call to pg_fetch_assoc() will return that row
       if (! empty($offset)) {
           if (! @pg_result_seek($result, $offset)) {
               return (array());
           }
       }

       // gather the results together in an array of arrays...
       $data = array();
       while (($row = pg_fetch_assoc($result)) !== false) {
           $data[] = $row;

           // after reading $limit rows from this result set, free our
           // memory and return the rows we fetched...
           if (! empty($limit) && count($data) >= $limit) {
               pg_free_result($result);
               return ($data);
           }
       }

       pg_free_result($result);
       return ($data);
   }
   //------------------------------------------------------------------

In this approach, I am 'emulating' the LIMIT / OFFSET features of PostgreSQL
by seeking forward in the result set (OFFSET) and fetching only the number
of rows that match my needs (LIMIT).

QUESTION: Is this the best way to do this, or is there a more efficient way
to get at the data I want?  Is there a variable set in PG that tells me the
original number of rows in the query?  Something like:
   SELECT ORIG_RESULT_SIZE, ...
   ...
   LIMIT x
   OFFSET y;
Or can I run another select right afterwards...like:

   SELECT ...
   ...
   LIMIT x
   OFFSET y;
   SELECT unfiltered_size_of_last_query();

Any thoughts?  Sure, the PHP function I'm using above 'works', but is it
the most efficient?  I hope I'm not actually pulling all 100,000 records
across the wire when I only intend to show 10 at a time.  See what I'm
getting at?
TIA,

Dante

--
D. Dante Lorenso
[EMAIL PROTECTED]


Re: [PERFORM] Find original number of rows before applied LIMIT/OFFSET?

2004-01-07 Thread Greg Stark

"D. Dante Lorenso" <[EMAIL PROTECTED]> writes:

> Any thoughts?  Sure, the PHP function I'm using above 'works', but is it
> the most efficient?  I hope I'm not actually pulling all 100,000 records
> across the wire when I only intend to show 10 at a time.  See what I'm
> getting at?

I tend to do it using a separate select count(*). My thinking is that the
count(*) query can be simplified to exclude things like the ORDER BY clause
and any select-list entries that require extra work. It can often even
exclude whole joins.

By doing a separate query I can do that extra work only for the rows I
actually need for display, hopefully using an index to pull up those rows,
and do the count(*) in the most efficient way possible, probably a
sequential scan with no joins for foreign keys and the like.
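
For example, a sketch of the pattern with hypothetical tables, where the
count query drops the join and the ORDER BY that only matter for display:

   -- display query: does the expensive work, but only for the rows shown
   SELECT o.id, o.placed_at, c.name
   FROM orders o
   JOIN customers c ON c.id = o.customer_id
   ORDER BY o.placed_at DESC
   LIMIT 10 OFFSET 30;

   -- count query: no ORDER BY, and the join can be dropped because it
   -- only supplied a display column (the FK guarantees a matching row)
   SELECT count(*) FROM orders;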

But I suspect the two methods both work out to suck about equally.

-- 
greg




[PERFORM] failures on machines using jfs

2004-01-07 Thread Andrew Sullivan
Hi all,

Chris Browne (one of my colleagues here) has posted some tests in the
past indicating that jfs may be the fastest filesystem for Postgres
use on Linux.

We have lately had a couple of cases where machines either locked up,
slowed down to the point of complete unusability, or died completely
while using jfs.  We are _not_ sure that jfs is in fact the culprit. 
In one case, a kernel panic appeared to be referring to the jfs
kernel module, but I can't be sure as I lost the output immediately
thereafter.  Yesterday, we had a problem of data corruption on a
failed jfs volume.

None of this is to say that jfs is in fact to blame, nor, even if it is,
that the problem isn't related to the age of our installations (these are
all RH 8).  In fact, I suspect hardware in both cases.  But I thought I'd
mention it just in case other people are seeing strange behaviour, on the
principle of "better safe than sorry."

A
-- 

Andrew Sullivan
Afilias Canada
204-4141 Yonge Street
Toronto, Ontario, Canada  M2P 2A8
<[EMAIL PROTECTED]>
+1 416 646 3304 x110

