Re: [HACKERS] gistchoose vs. bloat

Heikki Linnakangas Thu, 24 Jan 2013 12:53:38 -0800

On 24.01.2013 22:35, Tom Lane wrote:

> Alexander Korotkov <aekorot...@gmail.com> writes:
>> There is another cause of overhead when using randomization in gistchoose:
>> extra penalty calls. It could be significant when the index fits in cache. To
>> avoid it, I deliberately changed the behaviour of my patch from "look
>> sequentially and choose random" to "look in random order". I think we need
>> to include a comparison of CPU time.
>
> Hmm ... actually, isn't that an argument in favor of Heikki's method?
> The way he's doing it, we can exit without making additional penalty
> calls once we've found a zero-penalty tuple and decided not to look
> further (which is something with a pretty high probability).

No, I think Alexander is right, although I believe the difference is minimal in practice.

If we assume that there are no zero-penalty tuples on the page, with both approaches you have to call penalty on every tuple to know which is best. If there are zero-penalty tuples, then there is a small difference. With Alexander's method, you can stop looking as soon as you find a zero-penalty tuple (the order you check the tuples is random). With my method, you can stop looking as soon as you find a zero-penalty tuple, *and* you decide to not look further. With the 1/2 probability to stop looking further, you give up on average after 2 tuples.

So if I'm doing my math right, my patch does on average between 1x and 2x as many penalty calls as Alexander's, depending on how many zero-penalty tuples there are. The 2x case is when the page is full of zero-penalty tuples, in which case the difference isn't big in absolute terms; 2 penalty calls per page versus 1.
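The 2x figure can be checked with a small calculation. The following is only a sketch under a simplified model, not code from either patch: it assumes a page on which every tuple has zero penalty, and that the sequential scan stops with probability 1/2 after each zero-penalty tuple it examines. The function names are hypothetical.

```c
/*
 * Expected number of penalty calls for the sequential scan on a page of
 * ntuples tuples that ALL have zero penalty (simplified model: after each
 * zero-penalty tuple, stop with probability 1/2).
 */
double
expected_calls_sequential(int ntuples)
{
    double expected = 0.0;
    double p_reach = 1.0;       /* probability that tuple k is examined */
    int    k;

    for (k = 1; k <= ntuples; k++)
    {
        /* E[calls] is the sum over k of P(at least k calls are made) */
        expected += p_reach;
        p_reach *= 0.5;
    }
    return expected;
}

/*
 * Expected number of penalty calls for the random-order scan on the same
 * page: the first tuple examined already has zero penalty, so it stops.
 */
double
expected_calls_random_order(int ntuples)
{
    return (ntuples > 0) ? 1.0 : 0.0;
}
```

Under these assumptions the sequential figure is 1.5 for a two-tuple page and approaches 2 as the page grows, against a constant 1 for the random-order scan.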

BTW, one thing that I wondered about this: How expensive is random()? I'm assuming not very, but I don't really know. Alexander's patch called random() for every tuple on the page, while I call it only once for each equal-penalty tuple. If it's really cheap, I think my/Tom's patch could be slightly simplified by always initializing keep_current_best with random() at top of the function, and eliminating the -1 "haven't decided yet" state. OTOH, if random() is expensive, I note that we only need one bit of randomness, so we could avoid calling random() so often if we utilized all the bits from each random() call.
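To illustrate the last idea, here is a minimal sketch of handing out one bit at a time from a single random() call. It is not taken from any patch; the RandomBits struct and next_random_bit() are hypothetical names, and it assumes the POSIX random(), which returns a value in the range 0 to 2^31 - 1, so 31 usable bits per call.

```c
#include <stdlib.h>

/* Hypothetical helper state: leftover bits from the last random() call */
typedef struct RandomBits
{
    long bits;      /* bits not yet handed out */
    int  nleft;     /* how many of them remain */
} RandomBits;

/*
 * Return one random bit (0 or 1), calling random() only once per 31 bits.
 * The caller initializes the struct to {0, 0}.
 */
int
next_random_bit(RandomBits *rb)
{
    int bit;

    if (rb->nleft == 0)
    {
        rb->bits = random();    /* POSIX: 31 random bits */
        rb->nleft = 31;
    }
    bit = (int) (rb->bits & 1);
    rb->bits >>= 1;
    rb->nleft--;
    return bit;
}
```

Whether this is worth the extra state depends on the answer to the question above, that is, on how expensive random() turns out to be relative to a penalty call.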


- Heikki


