Re: Ending EXPLAIN ANALYZE early (was Re: [HACKERS] That EXPLAIN ANALYZE patch still needs work)

2006-06-12 Thread Zeugswetter Andreas DCP SD
> >> This bothers me a bit, because in > >> fact the effects if any of the tested query would have been rolled > >> back. Not sure we have any choice though. If we expose the error > >> then we'll have problems with clients not showing the EXPLAIN > >> results. > > > I think we should leave

Re: [HACKERS] That EXPLAIN ANALYZE patch still needs work

2006-06-10 Thread Benny Amorsen
> "MvO" == Martijn van Oosterhout writes: MvO> What we want is just a monotonically increasing counter that can MvO> be read quickly and consistantly, we're not majorly fussed if it MvO> doesn't match real time. This puts us back to CPU cycle counters, MvO> but they have drawbacks of their ow

Re: [HACKERS] That EXPLAIN ANALYZE patch still needs work

2006-06-10 Thread Tom Lane
Greg Stark <[EMAIL PROTECTED]> writes: > There seem to be two types of overhead going on. There's the amount of time > spent in gettimeofday itself which is pretty consistent. That is a fact not in evidence. The impression I had from that linux-kernel discussion was that the problematic kernel co

Re: [HACKERS] That EXPLAIN ANALYZE patch still needs work

2006-06-10 Thread Tom Lane
Martijn van Oosterhout writes: > The interesting thing about this is that they obviously are gearing > gettimeofday() to be accurate, rather than just considering it a > counter that is somewhat close to real time. At the expense of speed. Not sure that that's an accurate description. What I thi

Re: [HACKERS] That EXPLAIN ANALYZE patch still needs work

2006-06-10 Thread Tom Lane
Martijn van Oosterhout writes: > Right now I'm confused though. I was under the impression the changes > were going to be ripped out because it was decided to be unworkable. I > think improvements can be made but I'm unsure if there's any > interest... I've reverted the current patch because it c

Re: [HACKERS] That EXPLAIN ANALYZE patch still needs work

2006-06-09 Thread Martijn van Oosterhout
On Fri, Jun 09, 2006 at 03:55:28PM -0400, Tom Lane wrote: > Greg Stark <[EMAIL PROTECTED]> writes: > > There seem to be two types of overhead going on. There's the amount of time > > spent in gettimeofday itself which is pretty consistent. > > That is a fact not in evidence. The impression I had

Re: [HACKERS] That EXPLAIN ANALYZE patch still needs work

2006-06-09 Thread Greg Stark
Tom Lane <[EMAIL PROTECTED]> writes: > Greg Stark <[EMAIL PROTECTED]> writes: > > To avoid user confusion it would reasonable to print out a line at the > > bottom: > > > Explain analyze profiling overhead removed: xxx ms > > > That also gives the user feedback on how precise their explain an

Re: [HACKERS] That EXPLAIN ANALYZE patch still needs work

2006-06-09 Thread Tom Lane
Greg Stark <[EMAIL PROTECTED]> writes: > To avoid user confusion it would reasonable to print out a line at the bottom: > Explain analyze profiling overhead removed: xxx ms > That also gives the user feedback on how precise their explain analyze results > are. If they see that the overhead bein

Re: [HACKERS] That EXPLAIN ANALYZE patch still needs work

2006-06-09 Thread Greg Stark
Tom Lane <[EMAIL PROTECTED]> writes: > Greg Stark <[EMAIL PROTECTED]> writes: > > Are we still going to subtract out the gettimeofday overhead? > > > I was always more excited about that than the sampling aspect. I've run into > > queries where EXPLAIN ANALYZE results were deceptive due to the ge

Re: [HACKERS] That EXPLAIN ANALYZE patch still needs work

2006-06-09 Thread Tom Lane
Greg Stark <[EMAIL PROTECTED]> writes: > Are we still going to subtract out the gettimeofday overhead? > I was always more excited about that than the sampling aspect. I've run into > queries where EXPLAIN ANALYZE results were deceptive due to the gettimeofday > overhead but I've never run into a

Re: [HACKERS] That EXPLAIN ANALYZE patch still needs work

2006-06-09 Thread Greg Stark
Tom Lane <[EMAIL PROTECTED]> writes: > I was intending to push forward with the idea of being able to get > numbers out of a canceled EXPLAIN. That will allow you to get some > information even when the underlying query runs longer than you're > willing to tolerate. I still say that the number o

Re: [HACKERS] That EXPLAIN ANALYZE patch still needs work

2006-06-09 Thread Martijn van Oosterhout
On Fri, Jun 09, 2006 at 11:22:11AM -0400, A.M. wrote: > It would be nice to keep the gettimeofday()s wherever they are most useful > on hardware/software where they are cheap. Perhaps a compile-time option? I think making it optional at compile time is worse, because then at support time you might

Re: Ending EXPLAIN ANALYZE early (was Re: [HACKERS] That EXPLAIN ANALYZE patch still needs work)

2006-06-09 Thread Tom Lane
"Zeugswetter Andreas DCP SD" <[EMAIL PROTECTED]> writes: >> This bothers me a bit, because in >> fact the effects if any of the tested query would have been >> rolled back. Not sure we have any choice though. If we >> expose the error then we'll have problems with clients not >> showing the E

Re: [HACKERS] That EXPLAIN ANALYZE patch still needs work

2006-06-09 Thread Simon Riggs
On Fri, 2006-06-09 at 10:56 -0400, Tom Lane wrote: > Simon Riggs <[EMAIL PROTECTED]> writes: > > I propose we revert the sampling patch (sorry Martijn) > > yah ... > > > and go with the > > patch to have an explain_analyze_timing parameter (default=on). > > This I'm unexcited about. EXPLAIN out

Re: [HACKERS] That EXPLAIN ANALYZE patch still needs work

2006-06-09 Thread Tom Lane
Martijn van Oosterhout writes: > On Fri, Jun 09, 2006 at 10:00:20AM -0400, Tom Lane wrote: >> I had thought we were applying an order-of-ten-percent correction by >> subtracting SampleOverhead, not an order-of-10x correction :-( > Eh? The whole point is to call gettimeofday() much less often. If

Re: [HACKERS] That EXPLAIN ANALYZE patch still needs work

2006-06-09 Thread A.M.
It would be nice to keep the gettimeofday()s wherever they are most useful on hardware/software where they are cheap. Perhaps a compile-time option? On Fri, June 9, 2006 11:18 am, Martijn van Oosterhout wrote: > On Fri, Jun 09, 2006 at 10:00:20AM -0400, Tom Lane wrote: > >> To tell you the truth,

Re: [HACKERS] That EXPLAIN ANALYZE patch still needs work

2006-06-09 Thread Martijn van Oosterhout
On Fri, Jun 09, 2006 at 10:00:20AM -0400, Tom Lane wrote: > To tell you the truth, this information makes me even less pleased with > the sampling-gettimeofday patch than I was before. If gettimeofday() in > itself increases the runtime of a node by a factor of 10, then just > trying to subtract o

Re: [HACKERS] That EXPLAIN ANALYZE patch still needs work

2006-06-09 Thread Tom Lane
Simon Riggs <[EMAIL PROTECTED]> writes: > I propose we revert the sampling patch (sorry Martijn) yah ... > and go with the > patch to have an explain_analyze_timing parameter (default=on). This I'm unexcited about. EXPLAIN output isn't all that transparent anyway, and losing the extra cue of se

Re: [HACKERS] That EXPLAIN ANALYZE patch still needs work

2006-06-09 Thread Simon Riggs
On Fri, 2006-06-09 at 10:00 -0400, Tom Lane wrote: > I think the bottom line here is that there are some machines out there > where gettimeofday() is fast enough for our purposes, and some where > it is nowhere near fast enough. And that kernel-level fixes may be > possible for some of the latter

Re: [HACKERS] That EXPLAIN ANALYZE patch still needs work

2006-06-09 Thread Tom Lane
"Larry Rosenman" writes: > Does my post from yesterday (39ms without explain analyze, 280+ with explain > analyze) on FreeBSD/amd64, Dual Xeon's in HTT mode help? Well, it confirms that FreeBSD is subject to the same problem ;-). I think the bottom line here is that there are some machines out t

Re: [HACKERS] That EXPLAIN ANALYZE patch still needs work

2006-06-09 Thread Alvaro Herrera
Tom Lane wrote: > Alvaro Herrera <[EMAIL PROTECTED]> writes: > > Hmm, I take that back. I updated to the current CVS tip on this machine > > and now I don't see the problem. Without EXPLAIN ANALYZE, the times are > > around 115-117 ms. With EXPLAIN ANALYZE, 149-152ms. > > At the moment, CVS HEA

Re: [HACKERS] That EXPLAIN ANALYZE patch still needs work

2006-06-09 Thread Larry Rosenman
Tom Lane wrote: > Alvaro Herrera <[EMAIL PROTECTED]> writes: >> Hmm, I take that back. I updated to the current CVS tip on this >> machine and now I don't see the problem. Without EXPLAIN ANALYZE, >> the times are around 115-117 ms. With EXPLAIN ANALYZE, 149-152ms. > > At the moment, CVS HEAD s

Re: [HACKERS] That EXPLAIN ANALYZE patch still needs work

2006-06-09 Thread Tom Lane
Alvaro Herrera <[EMAIL PROTECTED]> writes: > Hmm, I take that back. I updated to the current CVS tip on this machine > and now I don't see the problem. Without EXPLAIN ANALYZE, the times are > around 115-117 ms. With EXPLAIN ANALYZE, 149-152ms. At the moment, CVS HEAD still has the sampling pat

Re: [HACKERS] That EXPLAIN ANALYZE patch still needs work

2006-06-09 Thread Alvaro Herrera
Simon Riggs wrote: > On Thu, 2006-06-08 at 17:21 -0400, Tom Lane wrote: > > Alvaro Herrera <[EMAIL PROTECTED]> writes: > > > Tried on two machines. The first (Turion64 laptop) gives 44-45 ms for > > > the SELECT, and 50-51 ms for the EXPLAIN ANALYZE. > > > > > The second machine, desktop Celeron

Re: [HACKERS] That EXPLAIN ANALYZE patch still needs work

2006-06-09 Thread Alvaro Herrera
Alvaro Herrera wrote: > The second machine, desktop Celeron 533, gives 197-200 ms for the SELECT > and 788-790 for the EXPLAIN ANALYZE. I guess this is the reproduction > you were looking for. Hmm, I take that back. I updated to the current CVS tip on this machine and now I don't see the proble

Re: [HACKERS] That EXPLAIN ANALYZE patch still needs work

2006-06-09 Thread Simon Riggs
On Thu, 2006-06-08 at 17:21 -0400, Tom Lane wrote: > Alvaro Herrera <[EMAIL PROTECTED]> writes: > > Tried on two machines. The first (Turion64 laptop) gives 44-45 ms for > > the SELECT, and 50-51 ms for the EXPLAIN ANALYZE. > > > The second machine, desktop Celeron 533, gives 197-200 ms for the S

Re: Ending EXPLAIN ANALYZE early (was Re: [HACKERS] That EXPLAIN ANALYZE patch still needs work)

2006-06-09 Thread Zeugswetter Andreas DCP SD
> This bothers me a bit, because in > fact the effects if any of the tested query would have been > rolled back. Not sure we have any choice though. If we > expose the error then we'll have problems with clients not > showing the EXPLAIN results. I think we should leave it in top level, thro

Re: [HACKERS] That EXPLAIN ANALYZE patch still needs work

2006-06-08 Thread Agent M
It's worth noting that on Darwin (on Apple hardware) gettimeofday is never a syscall whereas on Linux (AFAIK), it always is. On Jun 8, 2006, at 7:58 PM, Mark Kirkwood wrote: Tom Lane wrote: Alvaro Herrera <[EMAIL PROTECTED]> writes: Wow, that is slow. Maybe a problem in the kernel? Perhaps

Re: [HACKERS] That EXPLAIN ANALYZE patch still needs work

2006-06-08 Thread Mark Kirkwood
Tom Lane wrote: Alvaro Herrera <[EMAIL PROTECTED]> writes: Wow, that is slow. Maybe a problem in the kernel? Perhaps something similar to this: http://www.ussg.iu.edu/hypermail/linux/kernel/0603.2/index.html#1282 Yeah, that's a pretty interesting thread. I came across something similar on a

Re: [HACKERS] That EXPLAIN ANALYZE patch still needs work

2006-06-08 Thread Tom Lane
Alvaro Herrera <[EMAIL PROTECTED]> writes: > Wow, that is slow. Maybe a problem in the kernel? Perhaps something > similar to this: > http://www.ussg.iu.edu/hypermail/linux/kernel/0603.2/index.html#1282 Yeah, that's a pretty interesting thread. I came across something similar on a Red Hat inter

Re: [HACKERS] That EXPLAIN ANALYZE patch still needs work

2006-06-08 Thread Jim C. Nasby
On Thu, Jun 08, 2006 at 05:32:36PM -0400, Tom Lane wrote: > "Jim C. Nasby" <[EMAIL PROTECTED]> writes: > > The powerbook tests were not very repeatable at 100,000 rows, so I > > bumped up to 1M. The results still aren't very repeatable... > > Hmm, I suspect you pushed it into the realm where it's

Re: [HACKERS] That EXPLAIN ANALYZE patch still needs work

2006-06-08 Thread Jim C. Nasby
Just tested 2 windows boxes, both running 8.1.3 and XP SP2. P4 2.8G desktop 47ms297ms Pentium M 2G laptop 40ms240ms -- Jim C. Nasby, Sr. Engineering Consultant [EMAIL PROTECTED] Pervasive Software http://pervasive.comwork: 512-231-6117 vcard: http://jim.nasby.net/pervasi

Re: [HACKERS] That EXPLAIN ANALYZE patch still needs work

2006-06-08 Thread Tom Lane
"Jim C. Nasby" <[EMAIL PROTECTED]> writes: > The powerbook tests were not very repeatable at 100,000 rows, so I > bumped up to 1M. The results still aren't very repeatable... Hmm, I suspect you pushed it into the realm where it's doing actual I/O ... which we don't want for this particular test.

Re: [HACKERS] That EXPLAIN ANALYZE patch still needs work

2006-06-08 Thread Alvaro Herrera
Tom Lane wrote: > Alvaro Herrera <[EMAIL PROTECTED]> writes: > > Tried on two machines. The first (Turion64 laptop) gives 44-45 ms for > > the SELECT, and 50-51 ms for the EXPLAIN ANALYZE. > > > The second machine, desktop Celeron 533, gives 197-200 ms for the SELECT > > and 788-790 for the EXPLA

Re: [HACKERS] That EXPLAIN ANALYZE patch still needs work

2006-06-08 Thread Jim C. Nasby
On Thu, Jun 08, 2006 at 04:58:07PM -0400, Tom Lane wrote: > Simon Riggs <[EMAIL PROTECTED]> writes: > > So the timing is clearly responsible for the additional time I'm > > personally experiencing and very likely to be that for others also. > > Well, that's certainly unsurprising, but the question

Re: [HACKERS] That EXPLAIN ANALYZE patch still needs work

2006-06-08 Thread Tom Lane
Alvaro Herrera <[EMAIL PROTECTED]> writes: > Tried on two machines. The first (Turion64 laptop) gives 44-45 ms for > the SELECT, and 50-51 ms for the EXPLAIN ANALYZE. > The second machine, desktop Celeron 533, gives 197-200 ms for the SELECT > and 788-790 for the EXPLAIN ANALYZE. I guess this is

Re: [HACKERS] That EXPLAIN ANALYZE patch still needs work

2006-06-08 Thread Alvaro Herrera
Tom Lane wrote: > If anyone else is still following this discussion, could you try the > test case Simon gave a bit earlier (select count(*) from 10-row > table, check \timing with and without explain analyze)? If we could get > a clearer picture of which platforms show the problem and which

Re: [HACKERS] That EXPLAIN ANALYZE patch still needs work

2006-06-08 Thread Larry Rosenman
Tom Lane wrote: > Simon Riggs <[EMAIL PROTECTED]> writes: >> So the timing is clearly responsible for the additional time I'm >> personally experiencing and very likely to be that for others also. > > Well, that's certainly unsurprising, but the question is why it's such > a large overhead for you

Re: [HACKERS] That EXPLAIN ANALYZE patch still needs work

2006-06-08 Thread Alvaro Herrera
Simon Riggs wrote: > postgres=# explain analyze select count(*) from accounts; >QUERY PLAN > > Aggregate (cost=2890.00

Re: [HACKERS] That EXPLAIN ANALYZE patch still needs work

2006-06-08 Thread Tom Lane
Simon Riggs <[EMAIL PROTECTED]> writes: > So the timing is clearly responsible for the additional time I'm > personally experiencing and very likely to be that for others also. Well, that's certainly unsurprising, but the question is why it's such a large overhead for you when it's not on other ap

Re: [HACKERS] That EXPLAIN ANALYZE patch still needs work

2006-06-08 Thread Simon Riggs
On Thu, 2006-06-08 at 12:56 -0400, Tom Lane wrote: > Simon Riggs <[EMAIL PROTECTED]> writes: > > So far we have myself, Kevin, Martijn and Luke all saying there is a > > distortion or a massive overhead caused by EXPLAIN ANALYZE. > > http://archives.postgresql.org/pgsql-hackers/2006-03/msg00954.php

Re: Ending EXPLAIN ANALYZE early (was Re: [HACKERS] That EXPLAIN ANALYZE patch still needs work)

2006-06-08 Thread Alvaro Herrera
Tom Lane wrote: > A possible objection to this is that running a query inside a > subtransaction might have different/worse performance than running it > at top level. I don't recall any severe bottlenecks of that kind but > that doesn't mean there aren't any (Alvaro, any comments?) Nope, nothin

Re: Ending EXPLAIN ANALYZE early (was Re: [HACKERS] That EXPLAIN ANALYZE patch still needs work)

2006-06-08 Thread Greg Stark
Tom Lane <[EMAIL PROTECTED]> writes: > Gregory Stark <[EMAIL PROTECTED]> writes: > > Tom Lane <[EMAIL PROTECTED]> writes: > > Would it be possible to make a whole new protocol message for EXPLAIN > > results? > > I'm really unwilling to get into that. For one thing, that would > absolutely posi

Re: Ending EXPLAIN ANALYZE early (was Re: [HACKERS] That EXPLAIN ANALYZE patch still needs work)

2006-06-08 Thread Tom Lane
Gregory Stark <[EMAIL PROTECTED]> writes: > Tom Lane <[EMAIL PROTECTED]> writes: > Would it be possible to make a whole new protocol message for EXPLAIN results? I'm really unwilling to get into that. For one thing, that would absolutely positively break *all* use of EXPLAIN from un-fixed clients

Re: Ending EXPLAIN ANALYZE early (was Re: [HACKERS] That EXPLAIN ANALYZE patch still needs work)

2006-06-08 Thread Gregory Stark
Tom Lane <[EMAIL PROTECTED]> writes: > After we've printed the results, we have a bit of a problem: if > ExplainCancelPending is set, we now need to abort the transaction. It > would not do at all to allow an incompletely executed UPDATE to commit. > I experimented with throwing an elog() at the

Ending EXPLAIN ANALYZE early (was Re: [HACKERS] That EXPLAIN ANALYZE patch still needs work)

2006-06-08 Thread Tom Lane
I wrote: > Simon Riggs <[EMAIL PROTECTED]> writes: >> A full EXPLAIN ANALYZE is always desirable - we agree on that. The >> question is what we do when one is not available. > The least bad alternative I've heard is to let EXPLAIN ANALYZE print > out stats-so-far if the query is canceled by contro

Re: [HACKERS] That EXPLAIN ANALYZE patch still needs work

2006-06-08 Thread Tom Lane
Simon Riggs <[EMAIL PROTECTED]> writes: > So far we have myself, Kevin, Martijn and Luke all saying there is a > distortion or a massive overhead caused by EXPLAIN ANALYZE. > http://archives.postgresql.org/pgsql-hackers/2006-03/msg00954.php > http://archives.postgresql.org/pgsql-patches/2006-05/msg

Re: [HACKERS] That EXPLAIN ANALYZE patch still needs work

2006-06-08 Thread Simon Riggs
On Thu, 2006-06-08 at 10:27 -0400, Tom Lane wrote: > Simon Riggs <[EMAIL PROTECTED]> writes: > > On Wed, 2006-06-07 at 17:28 -0400, Tom Lane wrote: > >> The overhead seems to be on the order of a couple tens of percent usually. > >> I don't see how that makes the difference between an EXPLAIN ANALY

Re: [HACKERS] That EXPLAIN ANALYZE patch still needs work

2006-06-08 Thread Tom Lane
Simon Riggs <[EMAIL PROTECTED]> writes: > On Wed, 2006-06-07 at 17:28 -0400, Tom Lane wrote: >> The overhead seems to be on the order of a couple tens of percent usually. >> I don't see how that makes the difference between an EXPLAIN ANALYZE you >> can run and one you can't. > Well, thats not my

Re: [HACKERS] That EXPLAIN ANALYZE patch still needs work

2006-06-08 Thread Simon Riggs
On Wed, 2006-06-07 at 17:28 -0400, Tom Lane wrote: > Simon Riggs <[EMAIL PROTECTED]> writes: > > On Wed, 2006-06-07 at 16:56 -0400, Tom Lane wrote: > >> Certainly the removal of timing > >> is not going to convert an intolerable EXPLAIN ANALYZE runtime into an > >> acceptable one; > > > I disagre

Re: [HACKERS] That EXPLAIN ANALYZE patch still needs work

2006-06-07 Thread Jim C. Nasby
On Wed, Jun 07, 2006 at 05:05:23PM -0400, Tom Lane wrote: > The other thing that was bothering me was whether disk I/O might be > undercounted by an interrupt-driven method. I kinda doubt that any > kernel will save up N interrupts that occur while the process is blocked > on a slow read() ... you

Re: [HACKERS] That EXPLAIN ANALYZE patch still needs work

2006-06-07 Thread Tom Lane
Simon Riggs <[EMAIL PROTECTED]> writes: > On Wed, 2006-06-07 at 16:56 -0400, Tom Lane wrote: >> Certainly the removal of timing >> is not going to convert an intolerable EXPLAIN ANALYZE runtime into an >> acceptable one; > I disagree, as have others. The overhead seems to be on the order of a co

Re: [HACKERS] That EXPLAIN ANALYZE patch still needs work

2006-06-07 Thread Simon Riggs
On Wed, 2006-06-07 at 16:56 -0400, Tom Lane wrote: > Simon Riggs <[EMAIL PROTECTED]> writes: > > On Wed, 2006-06-07 at 15:32 -0400, Tom Lane wrote: > >> I propose we revert this patch and think about an interrupt-driven > >> sampling method instead. > > > I don't have much more faith in crazy sche

Re: [HACKERS] That EXPLAIN ANALYZE patch still needs work

2006-06-07 Thread Martijn van Oosterhout
On Wed, Jun 07, 2006 at 04:04:33PM -0400, Tom Lane wrote: > Martijn van Oosterhout writes: > > Anyway, as a test, if you take the approach that the measurement at > > item X only applies to the tuples immediately preceding it, for the > > data you posted you get a result of 0.681148 seconds. How l

Re: [HACKERS] That EXPLAIN ANALYZE patch still needs work

2006-06-07 Thread Tom Lane
"Jim C. Nasby" <[EMAIL PROTECTED]> writes: > On Wed, Jun 07, 2006 at 11:34:30AM -0400, Tom Lane wrote: >> timer interrupt routine does this once every few milliseconds: > One issue is that on very fast queries, you'd get absolutely no data > this way. Yeah. Of course, on "very fast" queries you

Re: [HACKERS] That EXPLAIN ANALYZE patch still needs work

2006-06-07 Thread Tom Lane
Simon Riggs <[EMAIL PROTECTED]> writes: > On Wed, 2006-06-07 at 15:32 -0400, Tom Lane wrote: >> I propose we revert this patch and think about an interrupt-driven >> sampling method instead. > I don't have much more faith in crazy scheme No.2 either. (Mine or > yours...) > Can we just have an opt

Re: [HACKERS] That EXPLAIN ANALYZE patch still needs work

2006-06-07 Thread Simon Riggs
On Wed, 2006-06-07 at 15:32 -0400, Tom Lane wrote: > Ah-hah, I've sussed it. > so I'm > afraid this approach to sampling EXPLAIN ANALYZE is a failure. Hmmm, I thought we already got that bit...sorry to sound negative. > I propose we revert this patch and think about an interrupt-driven > sampli

Re: [HACKERS] That EXPLAIN ANALYZE patch still needs work

2006-06-07 Thread Jim C. Nasby
On Wed, Jun 07, 2006 at 11:34:30AM -0400, Tom Lane wrote: > timer interrupt routine does this once every few milliseconds: One issue is that on very fast queries, you'd get absolutely no data this way. A possible solution would be to start with an extremely short timer interval, and ramp it up if

Re: [HACKERS] That EXPLAIN ANALYZE patch still needs work

2006-06-07 Thread Tom Lane
Martijn van Oosterhout writes: > Anyway, as a test, if you take the approach that the measurement at > item X only applies to the tuples immediately preceding it, for the > data you posted you get a result of 0.681148 seconds. How long did that > query run that produced that data? I didn't save t

Re: [HACKERS] That EXPLAIN ANALYZE patch still needs work

2006-06-07 Thread Martijn van Oosterhout
On Wed, Jun 07, 2006 at 03:32:35PM -0400, Tom Lane wrote: > On reflection it's easy to imagine other cases where some iterations > take much longer than others in a not-very-predictable way. For > instance, a join where only a subset of the outer tuples have a match > is going to act that way. I

Re: [HACKERS] That EXPLAIN ANALYZE patch still needs work

2006-06-07 Thread Tom Lane
Ah-hah, I've sussed it. The faulty assumption can actually be stated as: "all the executions, except maybe the first, will take approximately the same amount of time". The failing test case I've been looking at is one where the system decides to use a "batched" hash join, and in this plan type so

Re: [HACKERS] That EXPLAIN ANALYZE patch still needs work

2006-06-07 Thread Tom Lane
Greg Stark <[EMAIL PROTECTED]> writes: > Tom Lane <[EMAIL PROTECTED]> writes: >> Nope, that is certainly not the explanation, because the hash table is >> loaded in the (single) call of the Hash node at the start of the query. >> It is static all through the sampled-and-not executions of the Hash J

Re: [HACKERS] That EXPLAIN ANALYZE patch still needs work

2006-06-07 Thread Greg Stark
Tom Lane <[EMAIL PROTECTED]> writes: > Nope, that is certainly not the explanation, because the hash table is > loaded in the (single) call of the Hash node at the start of the query. > It is static all through the sampled-and-not executions of the Hash Join > node, which is where our problem is.

Re: [HACKERS] That EXPLAIN ANALYZE patch still needs work

2006-06-07 Thread Tom Lane
Greg Stark <[EMAIL PROTECTED]> writes: > Martijn van Oosterhout writes: >> This would mean that we wouldn't be assuming that tuples near the end >> take as long as tuples near the beginning. Except we're now dealing >> will smaller numbers, so I'm worried about error accumlation. > Hm, that would

Re: [HACKERS] That EXPLAIN ANALYZE patch still needs work

2006-06-07 Thread Greg Stark
Martijn van Oosterhout writes: > This would mean that we wouldn't be assuming that tuples near the end > take as long as tuples near the beginning. Except we're now dealing > will smaller numbers, so I'm worried about error accumlation. Hm, that would explain why Hash joins suffer from this esp

Re: [HACKERS] That EXPLAIN ANALYZE patch still needs work

2006-06-07 Thread Tom Lane
Martijn van Oosterhout writes: > On Wed, Jun 07, 2006 at 09:53:32AM -0400, Tom Lane wrote: >> If we do have to revert, I'd propose that we pursue the notion of >> interrupt-driven sampling like gprof uses. > How would that work? You could then estimate how much time was spent in > each node, but

Re: [HACKERS] That EXPLAIN ANALYZE patch still needs work

2006-06-07 Thread Martijn van Oosterhout
On Wed, Jun 07, 2006 at 09:53:32AM -0400, Tom Lane wrote: > Simon Riggs <[EMAIL PROTECTED]> writes: > > Do we agree the idea can't go further? What next? > > It still needs investigation; I'm no longer convinced that the TLB-flush > theory is correct. See rest of thread. We may well have to reve

Re: [HACKERS] That EXPLAIN ANALYZE patch still needs work

2006-06-07 Thread Tom Lane
Simon Riggs <[EMAIL PROTECTED]> writes: > Do we agree the idea can't go further? What next? It still needs investigation; I'm no longer convinced that the TLB-flush theory is correct. See rest of thread. We may well have to revert the current patch, but I'd like to be sure we understand why. If

Re: [HACKERS] That EXPLAIN ANALYZE patch still needs work

2006-06-07 Thread Simon Riggs
On Tue, 2006-06-06 at 16:50 -0400, Tom Lane wrote: > I have a theory about this, and it's not pleasant at all. What I > think is that we have a Heisenberg problem here: the act of invoking > gettimeofday() actually changes what is measured. > If this theory is correct, then the entire notion of

Re: [HACKERS] That EXPLAIN ANALYZE patch still needs work

2006-06-06 Thread Tom Lane
Greg Stark <[EMAIL PROTECTED]> writes: > Tom Lane <[EMAIL PROTECTED]> writes: >> And two, that upper plan nodes seem much more affected than lower >> ones. That makes sense because the execution cycle of an upper node >> will involve touching more userspace data than a lower node, and >> therefore

Re: [HACKERS] That EXPLAIN ANALYZE patch still needs work

2006-06-06 Thread Greg Stark
Tom Lane <[EMAIL PROTECTED]> writes: > And two, that upper plan nodes seem much more affected than lower > ones. That makes sense because the execution cycle of an upper node > will involve touching more userspace data than a lower node, and > therefore more of the flushed TLB entries will need

Re: [HACKERS] That EXPLAIN ANALYZE patch still needs work

2006-06-06 Thread Tom Lane
Martijn van Oosterhout writes: > This doesn't make any sense at all. How can a sampling run that only > sampled 7% of the actual tuples, end up with a actual measured time > that's more than 50% of the actual final runtime? AFAICS, the only conclusion is that the sampled executions are in fact ta

Re: [HACKERS] That EXPLAIN ANALYZE patch still needs work

2006-06-06 Thread Martijn van Oosterhout
On Tue, Jun 06, 2006 at 04:50:28PM -0400, Tom Lane wrote: > But that still leaves me with a problem because my machine is clearly > overestimating the correction needed. I added some printouts and got > > raw totaltime = 0.370937 > per_iter = 0.000156913, SampleOverhead = 3.28e-06 > adj totaltime

Re: [HACKERS] That EXPLAIN ANALYZE patch still needs work

2006-06-06 Thread Tom Lane
"Jim C. Nasby" <[EMAIL PROTECTED]> writes: > If that's the case, then maybe a more sopdisticated method of measuring > the overhead would work. I think you missed the point: the time spent in gettimeofday() itself is not the major overhead of EXPLAIN ANALYZE. At least it appears that this is the

Re: [HACKERS] That EXPLAIN ANALYZE patch still needs work

2006-06-06 Thread Jim C. Nasby
On Tue, Jun 06, 2006 at 04:50:28PM -0400, Tom Lane wrote: > I have a theory about this, and it's not pleasant at all. What I > think is that we have a Heisenberg problem here: the act of invoking > gettimeofday() actually changes what is measured. That is, the > runtime of the "second part" of Ex

Re: [HACKERS] That EXPLAIN ANALYZE patch still needs work

2006-06-06 Thread Tom Lane
I wrote: > The problem looks to be an underestimation of SampleOverhead, and on > reflection it's clear why: what CalculateSampleOverhead is measuring > isn't the total overhead, but the time between the two gettimeofday > calls. Which is probably about half the true overhead. On further thought,

Re: [HACKERS] That EXPLAIN ANALYZE patch still needs work

2006-06-06 Thread Tom Lane
Martijn van Oosterhout writes: > Wierd. Can you get the output of *instr in each call of > InstrEndLoop? Preferably after it does the calculation but before it > clears the values. So we get an idea of number of samples and what it > guesses. SampleOverhead would be good too. The problem looks to

Re: [HACKERS] That EXPLAIN ANALYZE patch still needs work

2006-06-06 Thread Martijn van Oosterhout
On Mon, Jun 05, 2006 at 11:02:33PM -0400, Tom Lane wrote: > Just got this rather surprising result: > The "Total runtime" is correct AFAICT, which puts the top node's "actual > time" rather far out in left field. This is pretty repeatable on my old > slow HPPA machine. A new Xeon shows less of

[HACKERS] That EXPLAIN ANALYZE patch still needs work

2006-06-05 Thread Tom Lane
Just got this rather surprising result: regression=# \timing Timing is on. regression=# explain analyze select * from tenk1 a join tenk1 b on a.unique1 = b.unique2; QUERY PLAN