Re: [PERFORM] Benchmark Data requested

2008-02-05 Thread NikhilS
Hi, On Feb 6, 2008 9:05 AM, Greg Smith <[EMAIL PROTECTED]> wrote: > On Tue, 5 Feb 2008, Simon Riggs wrote: > > > On Tue, 2008-02-05 at 15:50 -0500, Jignesh K. Shah wrote: > >> > >> Even if it is a single core, the mere fact that the loading process > will > >> eventually wait for a read from the

Re: [PERFORM] Benchmark Data requested

2008-02-05 Thread Greg Smith
On Tue, 5 Feb 2008, Simon Riggs wrote: On Tue, 2008-02-05 at 15:50 -0500, Jignesh K. Shah wrote: Even if it is a single core, the mere fact that the loading process will eventually wait for a read from the input file which cannot be non-blocking, the OS can timeslice it well for the second pro

Re: [PERFORM] Benchmark Data requested

2008-02-05 Thread Simon Riggs
On Tue, 2008-02-05 at 15:50 -0500, Jignesh K. Shah wrote: > Is there a way such an operation can be spawned as a worker process? > Generally during such loading - which most people will do during > "offpeak" hours I expect additional CPU resources available. By > delegating such additional work

Re: [PERFORM] Benchmark Data requested

2008-02-05 Thread Heikki Linnakangas
Jignesh K. Shah wrote: Is there a way such an operation can be spawned as a worker process? Generally during such loading - which most people will do during "offpeak" hours I expect additional CPU resources available. By delegating such additional work to worker processes, we should be able to

Re: [PERFORM] Benchmark Data requested

2008-02-05 Thread Jignesh K. Shah
Commercial DB bulk loaders work the same way: they give you a fast-loader option, provided that in case of error the whole table is truncated. I think this also has real-life advantages where PostgreSQL is used for datamarts, which are recreated every now and then from other systems, and they w
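The truncate-and-reload pattern described above can be sketched in SQL (table and file names are hypothetical). As an aside, PostgreSQL 8.3 can skip WAL-logging a COPY into a table truncated in the same transaction, provided WAL archiving is disabled:

```sql
-- Hypothetical datamart rebuild: truncate and reload in one transaction.
BEGIN;
TRUNCATE TABLE datamart_facts;                     -- table is rebuilt from scratch
COPY datamart_facts FROM '/staging/facts.csv' CSV; -- bulk load the fresh data
COMMIT;
```

On error the transaction rolls back and the load can simply be rerun, which is exactly the trade-off the fast-loader mode makes.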

Re: [PERFORM] Benchmark Data requested

2008-02-05 Thread Jignesh K. Shah
Hi Heikki, Is there a way such an operation can be spawned as a worker process? Generally during such loading - which most people will do during "offpeak" hours I expect additional CPU resources available. By delegating such additional work to worker processes, we should be able to capitalize

Re: [PERFORM] Benchmark Data requested

2008-02-05 Thread Heikki Linnakangas
Dimitri Fontaine wrote: On Tuesday, 5 February 2008, Simon Riggs wrote: I'll look at COPY FROM internals to make this faster. I'm looking at this now to refresh my memory; I already had some plans on the shelf. Maybe stealing some ideas from pg_bulkload could help somewhat here? http://pgfou

Re: [PERFORM] Benchmark Data requested

2008-02-05 Thread Simon Riggs
On Tue, 2008-02-05 at 13:47 -0500, Jignesh K. Shah wrote: > That sounds cool to me too.. > > How much work is it to make pg_bulkload work on 8.3? An integrated > version is certainly more beneficial. > Especially I think it will also help for other setups like TPC-E too > where this is a problem

Re: [PERFORM] Benchmark Data requested

2008-02-05 Thread Jignesh K. Shah
That sounds cool to me too.. How much work is it to make pg_bulkload work on 8.3? An integrated version is certainly more beneficial. Especially I think it will also help for other setups like TPC-E too, where this is a problem. Regards, Jignesh Simon Riggs wrote: On Tue, 2008-02-05 at 1

Re: [PERFORM] Benchmark Data requested

2008-02-05 Thread Simon Riggs
On Tue, 2008-02-05 at 18:15 +0100, Dimitri Fontaine wrote: > On Tuesday, 5 February 2008, Simon Riggs wrote: > > I'll look at COPY FROM internals to make this faster. I'm looking at > > this now to refresh my memory; I already had some plans on the shelf. > > Maybe stealing some ideas from pg_bulk

Re: [PERFORM] Benchmark Data requested

2008-02-05 Thread Dimitri Fontaine
On Tuesday, 5 February 2008, Simon Riggs wrote: > I'll look at COPY FROM internals to make this faster. I'm looking at > this now to refresh my memory; I already had some plans on the shelf. Maybe stealing some ideas from pg_bulkload could help somewhat here? http://pgfoundry.org/docman/view.php

Re: [PERFORM] Benchmark Data requested

2008-02-05 Thread Dimitri Fontaine
On Tuesday, 5 February 2008, Simon Riggs wrote: > It runs a stream of COPY statements, so only the first would be optimized > with the "empty table optimization". The number of rows per COPY statement is configurable, so provided you have an estimation of the volume to import (wc -l), you could tweak

Re: [PERFORM] Benchmark Data requested

2008-02-05 Thread Matthew
On Tue, 5 Feb 2008, Richard Huxton wrote: So what's wrong with "reserving" the space using the WAL, then everyone else will know. After all, when you write the data to the WAL, you must have an idea of where it is meant to end up. My suggestion is that you go through all the motions of writing

Re: [PERFORM] Benchmark Data requested

2008-02-05 Thread Jignesh K. Shah
One of the problems with the "empty table optimization" is that if indexes have been created, the table is no longer considered empty. Commercial databases have options like an "IRRECOVERABLE" clause, along with DISK PARTITIONS and CPU partitions, for their bulk loaders. So one option turns off loggi
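Since an existing index defeats the empty-table optimization, the usual workaround is to build indexes only after the bulk load. A hedged sketch of that ordering (table, column, and file names are hypothetical):

```sql
-- Load into the bare table first, while it still qualifies as "empty".
BEGIN;
TRUNCATE TABLE facts;
COPY facts FROM '/staging/facts.dat';
COMMIT;
-- Then build indexes in a single pass over the loaded data, and refresh stats.
CREATE INDEX facts_load_date_idx ON facts (load_date);
ANALYZE facts;
```

Building the index once at the end is also generally faster than maintaining it row by row during the load.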

Re: [PERFORM] Benchmark Data requested

2008-02-05 Thread Richard Huxton
Matthew wrote: On Tue, 5 Feb 2008, Richard Huxton wrote: Why would you need to lock the table? Because you're not really writing the WAL, which means you can't let anyone else get their data into any of the blocks you are writing into. You'd basically want to write the disk blocks then "atta

Re: [PERFORM] Benchmark Data requested

2008-02-05 Thread Richard Huxton
Simon Riggs wrote: On Tue, 2008-02-05 at 15:05 +, Richard Huxton wrote: Only by locking the table, which serializes access, which then slows you down or at least restricts other options. Plus if you use pg_loader then you'll find only the first few rows optimized and all the rest not. Hmm -

Re: [PERFORM] Benchmark Data requested

2008-02-05 Thread Matthew
On Tue, 5 Feb 2008, Richard Huxton wrote: Why would you need to lock the table? Because you're not really writing the WAL, which means you can't let anyone else get their data into any of the blocks you are writing into. You'd basically want to write the disk blocks then "attach" them in some

Re: [PERFORM] Benchmark Data requested

2008-02-05 Thread Richard Huxton
Matthew wrote: On Tue, 5 Feb 2008, Simon Riggs wrote: In the case of a bulk upload to an empty table (or partition?) could you not optimise the WAL away? That is, shouldn't the WAL basically be a simple transformation of the on-disk blocks? You'd have to explicitly sync the file(s) for the table

Re: [PERFORM] Benchmark Data requested

2008-02-05 Thread Matthew
On Tue, 5 Feb 2008, Simon Riggs wrote: In the case of a bulk upload to an empty table (or partition?) could you not optimise the WAL away? That is, shouldn't the WAL basically be a simple transformation of the on-disk blocks? You'd have to explicitly sync the file(s) for the table/indexes of cour

Re: [PERFORM] Benchmark Data requested

2008-02-05 Thread Simon Riggs
On Tue, 2008-02-05 at 15:05 +, Richard Huxton wrote: > > Only by locking the table, which serializes access, which then slows you > > down or at least restricts other options. Plus if you use pg_loader then > > you'll find only the first few rows optimized and all the rest not. > > Hmm - the

Re: [PERFORM] Performance issue using Tsearch2

2008-02-05 Thread Viviane Lestic
2008/2/5, Ansgar -59cobalt- Wiechers wrote: > Your planner estimates are way off. Try increasing the statistics target > for the columns used in this query and re-analyze the tables after doing > so. I first set STATISTICS to 1000 for column zoneindex_test and saw no significant improvement (with
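For reference, raising the statistics target for one column and re-analyzing looks like this (the column name is from the thread; the table name is a placeholder):

```sql
ALTER TABLE zones ALTER COLUMN zoneindex_test SET STATISTICS 1000;
ANALYZE zones;  -- the new target only takes effect after re-analyzing
```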

Re: [PERFORM] Benchmark Data requested

2008-02-05 Thread Richard Huxton
Simon Riggs wrote: On Tue, 2008-02-05 at 14:43 +, Richard Huxton wrote: Simon Riggs wrote: On Tue, 2008-02-05 at 15:06 +0100, Dimitri Fontaine wrote: On Monday, 4 February 2008, Jignesh K. Shah wrote: Multiple table loads (1 per table) spawned via script is a bit better but hits wal prob

Re: [PERFORM] Benchmark Data requested

2008-02-05 Thread Simon Riggs
On Tue, 2008-02-05 at 14:43 +, Richard Huxton wrote: > Simon Riggs wrote: > > On Tue, 2008-02-05 at 15:06 +0100, Dimitri Fontaine wrote: > >> > >> On Monday, 4 February 2008, Jignesh K. Shah wrote: > > >>> Multiple table loads (1 per table) spawned via script is a bit better > >>> but hits wa

Re: [PERFORM] Benchmark Data requested

2008-02-05 Thread Matthew
On Tue, 5 Feb 2008, Richard Huxton wrote: In the case of a bulk upload to an empty table (or partition?) could you not optimise the WAL away? Argh. If I hadn't had to retype my email, I would have suggested that before you. ;) Matthew -- Unfortunately, university regulations probably prohi

Re: [PERFORM] Benchmark Data requested

2008-02-05 Thread Matthew
Apologies for the blank email - mailer problems. I lost all my nicely typed stuff, too. On Tue, 5 Feb 2008, Dimitri Fontaine wrote: Multiple table loads (1 per table) spawned via script is a bit better but hits wal problems. pgloader will hit the WAL problem too, but it still may have its ben

Re: [PERFORM] Benchmark Data requested

2008-02-05 Thread Richard Huxton
Simon Riggs wrote: On Tue, 2008-02-05 at 15:06 +0100, Dimitri Fontaine wrote: On Monday, 4 February 2008, Jignesh K. Shah wrote: Multiple table loads (1 per table) spawned via script is a bit better but hits wal problems. pgloader will hit the WAL problem too, but it still may have its bene

Re: [PERFORM] Benchmark Data requested

2008-02-05 Thread Matthew

Re: [PERFORM] Benchmark Data requested

2008-02-05 Thread Simon Riggs
On Tue, 2008-02-05 at 15:06 +0100, Dimitri Fontaine wrote: > Hi, > > On Monday, 4 February 2008, Jignesh K. Shah wrote: > > Single stream loader of PostgreSQL takes hours to load data. (Single > > stream load... wasting all the extra cores out there) > > I wanted to work on this at the pgloader

Re: [PERFORM] Benchmark Data requested

2008-02-05 Thread Dimitri Fontaine
Hi, On Monday, 4 February 2008, Jignesh K. Shah wrote: > Single stream loader of PostgreSQL takes hours to load data. (Single > stream load... wasting all the extra cores out there) I wanted to work on this at the pgloader level, so the CVS version of pgloader is now able to load data in parallel,

Re: [PERFORM] Performance issue using Tsearch2

2008-02-05 Thread Guillaume Smet
On Feb 5, 2008 12:47 PM, Viviane Lestic <[EMAIL PROTECTED]> wrote: > Could someone help me analyze this problem? > I don't manage to see if the problem comes from bad tsearch tuning, > postgresql configuration, or something else... Can you try to replace zoneindex_test @@ q with zoneindex_test @@

Re: [PERFORM] Performance issue using Tsearch2

2008-02-05 Thread Ansgar -59cobalt- Wiechers
On 2008-02-05 Viviane Lestic wrote:
> QUERY PLAN
> -
> Sort (cost=2345.54..2345.58 rows=16 width=308) (actual time=270638.774..270643.142 row

[PERFORM] Performance issue using Tsearch2

2008-02-05 Thread Viviane Lestic
Hi, I'm having a performance problem with a query using Tsearch2: the query strangely takes several minutes. I've tried to follow the Tsearch tuning recommendations, and I've searched through the archives, but I can't seem to find a solution to my problem. The ts_vector field was created using d
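A typical tsearch2-era setup for this kind of query, sketched under the assumption that the table is named `zones` (the column name is from the thread): a GIN index on the tsvector column usually lets `@@` searches avoid a sequential scan.

```sql
-- Index the tsvector column so full-text matches can use an index scan.
CREATE INDEX zones_zoneindex_test_idx ON zones USING gin (zoneindex_test);
-- Hypothetical query shape against that column:
SELECT * FROM zones WHERE zoneindex_test @@ to_tsquery('word1 & word2');
```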

Re: [PERFORM] Benchmark Data requested

2008-02-05 Thread Simon Riggs
On Mon, 2008-02-04 at 17:55 -0500, Jignesh K. Shah wrote: > Doing it at low scales is not attractive. > > Commercial databases are publishing at scale factors of 1000 (about 1TB) > to 10000 (10TB), with one in 30TB space. So ideally right now tuning > should start at the 1000 scale factor. I don't unde

Re: [PERFORM] Performance problems inside a stored procedure.

2008-02-05 Thread Matthew Lunnon
Thanks for your help, Андрей; your English is easily understandable and much better than my ... (Russian?). I managed to get the results of an analyze, and this showed that an index was not being used correctly. It seems that I was passing in a varchar and not casting it to an int, and this stopped
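The fix described amounts to matching the parameter's type to the indexed column's type; a minimal sketch with hypothetical table, column, and parameter names:

```sql
-- p_id was declared varchar, so "WHERE id_col = p_id" could not use the
-- integer index on id_col. An explicit cast restores index usage:
SELECT * FROM orders WHERE id_col = p_id::integer;
```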

Re: [PERFORM] Benchmark Data requested

2008-02-05 Thread Simon Riggs
On Mon, 2008-02-04 at 17:33 -0500, Jignesh K. Shah wrote: > First of all I think it is a violation of TPC rules to publish numbers > without auditing them first. So even if I do the test to show the > better performance of PostgreSQL 8.3, I cannot post it here or any > public forum without doi