Re: [PERFORM] planner/optimizer question

2004-04-29 Thread Gary Doades
On 29 Apr 2004 at 19:17, Tom Lane wrote: > Josh Berkus <[EMAIL PROTECTED]> writes: > > Certainly the fact that MSSQL is essentially a single-user database makes > > things easier for them. > > Our recent testing (cf the "Xeon" thread) says that the interlocking we > do to make the world safe for

Re: [PERFORM] Insert only tables and vacuum performance

2004-04-29 Thread Joseph Shraibman
Tom Lane wrote: Joseph Shraibman <[EMAIL PROTECTED]> writes: I have a table that is never updated, only INSERTED into. Is there a way I can prevent vacuum wasting time on this table What makes you think vacuum is wasting much time on this table? AFAICS it will only update any unfixed hint bits .

Re: [PERFORM] planner/optimizer question

2004-04-29 Thread Dennis Bjorklund
On Fri, 30 Apr 2004, Gary Doades wrote: > I should have also pointed out that MSSQL reported that same index scan > as taking 65% of the overall query time. It was just "faster". The > overall query took 103ms in MSSQL. Are your results based on a single client accessing the database and no conc

Re: [PERFORM] Insert only tables and vacuum performance

2004-04-29 Thread Tom Lane
Joseph Shraibman <[EMAIL PROTECTED]> writes: > I have a table that is never updated, only INSERTED into. Is there a way > I can prevent vacuum wasting time on this table What makes you think vacuum is wasting much time on this table? AFAICS it will only update any unfixed hint bits ...

Re: [PERFORM] why can't 2 indexes be used at once?

2004-04-29 Thread Tom Lane
Joseph Shraibman <[EMAIL PROTECTED]> writes: > My question: why can't it go through the first index, get a list of > pages in the table, then go through the second index, union the result > with the results from first index, and then go into the table? See TODO list ...

Re: [PERFORM] Insert only tables and vacuum performance

2004-04-29 Thread Joseph Shraibman
Rod Taylor wrote: Or even better an offset into the datatable for the earliest deleted row, so if you have a table where you update the row shortly after insert and then never touch it vacuum can skip most of the table (inserts are done at the end of the table, right?) Inserts are done at the e

Re: [PERFORM] Insert only tables and vacuum performance

2004-04-29 Thread Rod Taylor
> Or even better an offset into the datatable for the earliest deleted > row, so if you have a table where you update the row shortly after > insert and then never touch it vacuum can skip most of the table > (inserts are done at the end of the table, right?) Inserts are done at the end of the

Re: [PERFORM] Insert only tables and vacuum performance

2004-04-29 Thread Joseph Shraibman
Joseph Shraibman wrote: I have a table that is never updated, only INSERTED into. Is there a way I can prevent vacuum wasting time on this table besides vacuuming each table in the db by itself and omitting this table? How feasable would it be to have a marker somewhere in pg that is "updated s

[PERFORM] why can't 2 indexes be used at once?

2004-04-29 Thread Joseph Shraibman
Lets say I have two columns, A and B. They are each indexed seperately. If I do a query like: SELECT * FROM table WHERE A = 1 AND B = 2; postgres can only use one index. I assume that postgres uses the index data to narrow down pages in the table to visit when doing its search. Then it goes th

[PERFORM] Insert only tables and vacuum performance

2004-04-29 Thread Joseph Shraibman
I have a table that is never updated, only INSERTED into. Is there a way I can prevent vacuum wasting time on this table besides vacuuming each table in the db by itself and omitting this table? How feasable would it be to have a marker somewhere in pg that is "updated since last vacuum" that w

Re: [PERFORM] planner/optimizer question

2004-04-29 Thread Tom Lane
Josh Berkus <[EMAIL PROTECTED]> writes: > Certainly the fact that MSSQL is essentially a single-user database makes > things easier for them. Our recent testing (cf the "Xeon" thread) says that the interlocking we do to make the world safe for multiple backends has a fairly high cost (at least on

[PERFORM] analyzer/planner and clustered rows

2004-04-29 Thread Joseph Shraibman
How does the analyzer/planner deal with rows clustered together? Does it just assume that if this col is clustered on then the actual data will be clustered? What if the data in the table happens to be close together because it was inserted together originally? ---(end

Re: [PERFORM] planner/optimizer question

2004-04-29 Thread Gary Doades
> > Having picked out an index scan as being the highest time user I > concentrated on that in this case and compared the same index scan on > MSSQL. At least MSSQL reported it as an index scan on the same index > for the same number of rows. > I should have also pointed out that MSSQL rep

Re: [PERFORM] planner/optimizer question

2004-04-29 Thread Gary Doades
On 29 Apr 2004 at 17:54, Tom Lane wrote: > "Gary Doades" <[EMAIL PROTECTED]> writes: > > In this example the statistics don't matter. > > Don't they? > > A prior poster mentioned that he thought MSSQL tries to keep all its > indexes in memory.  I wonder whether you are giving Postgres a fair

Re: [PERFORM] Simply join in PostrgeSQL takes too long

2004-04-29 Thread Josh Berkus
Vitaly, > I am in the middle of going through them all, till now disabling the > enable_mergejoin really helped. In that case, your random_page_cost is probably too low. Check the ratio of per-tuple times on index vs. seqscan seeks. -- -Josh Berkus Aglio Database Solutions San Francisco

Re: [PERFORM] planner/optimizer question

2004-04-29 Thread Tom Lane
"Gary Doades" <[EMAIL PROTECTED]> writes: > In this example the statistics don't matter. Don't they? A prior poster mentioned that he thought MSSQL tries to keep all its indexes in memory. I wonder whether you are giving Postgres a fair chance to do the same. What postgresql.conf settings are y

Re: [PERFORM] planner/optimizer question

2004-04-29 Thread Gary Doades
On 29 Apr 2004 at 13:54, Josh Berkus wrote: > Gary, > > > It's also quite possble the MSSQL simply has more efficient index scanning > implementation that we do.They've certainly had incentive; their storage > system sucks big time for random lookups and they need those fast indexes. > (

Re: [PERFORM] Simply join in PostrgeSQL takes too long

2004-04-29 Thread Vitaly Belman
Hello Manfred, I thank everyone for helping me on this - So many tips. I am in the middle of going through them all, till now disabling the enable_mergejoin really helped. Also, I agree that the design might be flawed (I could use triggers and stuff like that) but for now I am just comparing how

Re: [PERFORM] planner/optimizer question

2004-04-29 Thread Josh Berkus
Gary, > In this example the statistics don't matter. The plans used were the same for > MSSQL and Postgres. I was trying to eliminate the difference in plans > between the two, which obviously does make a difference, sometimes in > MSSQL favour and sometimes the other way round. Both systems,

Re: [PERFORM] planner/optimizer question

2004-04-29 Thread Gary Doades
On 29 Apr 2004 at 15:35, Kenneth Marshall wrote: > Did you try to cluster based on the index? > > --Ken Yes, This speeds up the index scan a little (12%). This to me just reinforces the overhead that subsequently having to go and fetch the data tuple actually has on the performance. Cheers, G

Re: [PERFORM] planner/optimizer question

2004-04-29 Thread Gary Doades
> It's also entirely possible your indices are using inaccurate > statistical information. Have you ANALYZEd recently? > In this example the statistics don't matter. The plans used were the same for MSSQL and Postgres. I was trying to eliminate the difference in plans between the two, which o

Re: [PERFORM] planner/optimizer question

2004-04-29 Thread Rosser Schwarz
while you weren't looking, Gary Doades wrote: > Recently I have been looking at raw performance (CPU, IO) > rather than the plans. I have some test queries that (as far > as I can determine) use the same access plans on PostgreSQL > and SQLServer. Getting to the detail, an index scan of an > ind

Re: [PERFORM] planner/optimizer question

2004-04-29 Thread Rod Taylor
> I would be nice to get a feel for how much performance loss would be incurred in > maintaining the index flags against possible performance gains for getting the data > back > out again. I guess the real question is, why maintain index flags and not simply drop the index entry altogether? A

Re: [PERFORM] planner/optimizer question

2004-04-29 Thread Gary Doades
> > I guess the real question is, why maintain index flags and not simply > drop the index entry altogether? > > A more interesting case would be to have the backend process record > index tuples that it would invalidate (if committed), then on commit > send that list to a garbage collection proc

Re: [PERFORM] planner/optimizer question

2004-04-29 Thread Gary Doades
On 29 Apr 2004 at 19:03, Manfred Koizar wrote: > While the storage overhead could be reduced to 1 bit (not a joke) we'd > still have the I/O overhead of locating and updating index tuples for > every heap tuple deleted/updated. But this is what a lot of DBMSs do and seem to do well enough. I can

Re: [PERFORM] Wierd context-switching issue on Xeon

2004-04-29 Thread Josh Berkus
Rob, > I would like to see the same, as I have a system that exhibits the same behavior > on a production db that's running 7.4.1. If you checked the thread follow-ups, you'd see that *decreasing* spins_per_delay was not beneficial. Instead, try increasing them, one step at a time: (take b

Re: [PERFORM] Simply join in PostrgeSQL takes too long

2004-04-29 Thread Manfred Koizar
On Thu, 29 Apr 2004 13:36:47 -0400, Rod Taylor <[EMAIL PROTECTED]> wrote: >The reason for the function is that the sort routines (hash aggregation >included) will not stop in mid-sort Good point. Servus Manfred ---(end of broadcast)--- TIP 9: the

Re: [PERFORM] Simply join in PostrgeSQL takes too long

2004-04-29 Thread Rod Taylor
On Thu, 2004-04-29 at 13:13, Manfred Koizar wrote: > On Wed, 28 Apr 2004 08:23:35 -0400, Rod Taylor <[EMAIL PROTECTED]> wrote: > >The below plpgsql forces the kind of algorithm we wish the planner could > >choose. It should be fairly quick irregardless of dataset. > > That reminds me of hash aggre

Re: [PERFORM] Simply join in PostrgeSQL takes too long

2004-04-29 Thread Manfred Koizar
On Wed, 28 Apr 2004 08:23:35 -0400, Rod Taylor <[EMAIL PROTECTED]> wrote: >The below plpgsql forces the kind of algorithm we wish the planner could >choose. It should be fairly quick irregardless of dataset. That reminds me of hash aggregation. So here's another idea for Vitaly: SELECT b

Re: [PERFORM] planner/optimizer question

2004-04-29 Thread Manfred Koizar
On Wed, 28 Apr 2004 09:05:04 -0400, Tom Lane <[EMAIL PROTECTED]> wrote: >> [ ... visibility information in index tuples ... ] >Storing that information would at least double the overhead space used >for each index tuple. The resulting index bloat would significantly >slow index operations by requ

Re: [PERFORM] [HACKERS] Number of pages in a random sample

2004-04-29 Thread Manfred Koizar
On Mon, 26 Apr 2004 08:08:16 -0700, Sailesh Krishnamurthy <[EMAIL PROTECTED]> wrote: > "A Bi-Level Bernoulli Scheme for Database Sampling" > Peter Haas, Christian Koenig (SIGMOD 2004) Does this apply to our problem? AFAIK with Bernoulli sampling you don't know the sample size in advance. Anyway

[PERFORM] Use arrays or not?

2004-04-29 Thread Roelant Ossewaarde
Hi, I am building an application using postgresql to store XML-records. There is a debate within the group of developers about the best way to store our data. I hope you can help us make a decision. The data consists of XML-records, with a lot of XML-fields. I want to store the XML as it is, so t

Re: [PERFORM] Wierd context-switching issue on Xeon

2004-04-29 Thread ohp
Hi I'd LOVE to contribute on this but I don't have vmstat and I'm not running linux. How can I help? Regards On Wed, 28 Apr 2004, Robert Creager wrote: > Date: Wed, 28 Apr 2004 18:57:53 -0600 > From: Robert Creager <[EMAIL PROTECTED]> > To: Josh Berkus <[EMAIL PROTECTED]> > Cc: [EMAIL PROTECTED]