On 24.01.2013 22:35, Tom Lane wrote:
Alexander Korotkov<aekorot...@gmail.com> writes:
There is another cause of overhead when use randomization in gistchoose:
extra penalty calls. It could be significant when index fits to cache. In
order evade it I especially change behaviour of my patch from "look
sequentially and choose random" to "look in random order". I think we need
to include comparison of CPU time.
Hmm ... actually, isn't that an argument in favor of Heikki's method?
The way he's doing it, we can exit without making additional penalty
calls once we've found a zero-penalty tuple and decided not to look
further (which is something with a pretty high probability).
No, I think Alexander is right, although I believe the difference is
minimal in practice.
If we assume that the there are no zero-penalty tuples on the page, with
both approaches you have to call penalty on every tuple to know which is
best. If there are zero-penalty tuples, then there is a small
difference. With Alexander's method, you can stop looking as soon as you
find a zero-penalty tuple (the order you check the tuples is random).
With my method, you can stop looking as soon as you find a zero-penalty
tuple, *and* you decide to not look further. With the 1/2 probability to
stop looking further, you give up on average after 2 tuples.
So if I'm doing my math right, my patch does on average between 1x - 2x
as many penalty calls as Alexander's, depending on how many zero-penalty
tuples there are. The 2x case is when the page is full of zero-penalty
tuples, in which case the difference isn't big in absolute terms; 2
penalty calls per page versus 1.
BTW, one thing that I wondered about this: How expensive is random()?
I'm assuming not very, but I don't really know. Alexander's patch called
random() for every tuple on the page, while I call it only once for each
equal-penalty tuple. If it's really cheap, I think my/Tom's patch could
be slightly simplified by always initializing keep_current_best with
random() at top of the function, and eliminating the -1 "haven't decided
yet" state. OTOH, if random() is expensive, I note that we only need one
bit of randomness, so we could avoid calling random() so often if we
utilized all the bits from each random() call.
- Heikki
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers