On 24.01.2013 22:35, Tom Lane wrote:
Alexander Korotkov<aekorot...@gmail.com>  writes:
There is another cause of overhead when use randomization in gistchoose:
extra penalty calls. It could be significant when index fits to cache. In
order evade it I especially change behaviour of my patch from "look
sequentially and choose random" to "look in random order". I think we need
to include comparison of CPU time.

Hmm ... actually, isn't that an argument in favor of Heikki's method?
The way he's doing it, we can exit without making additional penalty
calls once we've found a zero-penalty tuple and decided not to look
further (which is something with a pretty high probability).

No, I think Alexander is right, although I believe the difference is minimal in practice.

If we assume that the there are no zero-penalty tuples on the page, with both approaches you have to call penalty on every tuple to know which is best. If there are zero-penalty tuples, then there is a small difference. With Alexander's method, you can stop looking as soon as you find a zero-penalty tuple (the order you check the tuples is random). With my method, you can stop looking as soon as you find a zero-penalty tuple, *and* you decide to not look further. With the 1/2 probability to stop looking further, you give up on average after 2 tuples.

So if I'm doing my math right, my patch does on average between 1x - 2x as many penalty calls as Alexander's, depending on how many zero-penalty tuples there are. The 2x case is when the page is full of zero-penalty tuples, in which case the difference isn't big in absolute terms; 2 penalty calls per page versus 1.

BTW, one thing that I wondered about this: How expensive is random()? I'm assuming not very, but I don't really know. Alexander's patch called random() for every tuple on the page, while I call it only once for each equal-penalty tuple. If it's really cheap, I think my/Tom's patch could be slightly simplified by always initializing keep_current_best with random() at top of the function, and eliminating the -1 "haven't decided yet" state. OTOH, if random() is expensive, I note that we only need one bit of randomness, so we could avoid calling random() so often if we utilized all the bits from each random() call.

- Heikki


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Reply via email to