On Thu, Jan 24, 2013 at 11:26 PM, Heikki Linnakangas <hlinnakan...@vmware.com> wrote:
> On 21.01.2013 15:19, Heikki Linnakangas wrote:
>> On 21.01.2013 15:06, Tom Lane wrote:
>>> Jeff Davis <pg...@j-davis.com> writes:
>>>> On Mon, 2013-01-21 at 00:48 -0500, Tom Lane wrote:
>>>>> I looked at this patch. ISTM we should not have the option at all but
>>>>> just do it always. I cannot believe that always-go-left is ever a
>>>>> preferable strategy in the long run; the resulting imbalance in the
>>>>> index will surely kill any possible benefit. Even if there are some
>>>>> cases where it miraculously fails to lose, how many users are going to
>>>>> realize that applies to their case and make use of the option?
>>>>
>>>> Sounds good to me.
>>>>
>>>> If I remember correctly, there was also an argument that it may be
>>>> useful for repeatable test results. That's a little questionable for
>>>> performance (except in those cases where few penalties are identical
>>>> anyway), but could plausibly be useful for a crash report or something.
>>>
>>> Meh. There's already a random decision, in the equivalent place and for
>>> a comparable reason, in btree (cf _bt_findinsertloc). Nobody's ever
>>> complained about that being indeterminate, so I'm unconvinced that
>>> there's a market for it with gist.
>>
>> I wonder if it would work for gist to do something similar to
>> _bt_findinsertloc, and have a bias towards the left page, but sometimes
>> descend to one of the pages to the right. You would get the cache
>> locality of usually descending down the same subtree, but also fill the
>> pages to the right. Jeff / Alexander, want to give that a shot?
>
> I did some experimenting with that. I used the same test case Alexander
> did, with geonames data, and compared the unpatched version, the original
> patch, and the attached patch that biases the first "best" tuple found, but
> still sometimes chooses the other equally good ones.
>     testname    | initsize | finalsize | idx_blks_read | idx_blks_hit
> ----------------+----------+-----------+---------------+--------------
>  patched-10-4mb | 75497472 |  90202112 |       5853604 |     10178331
>  unpatched-4mb  | 75145216 |  94863360 |       5880676 |     10185647
>  unpatched-4mb  | 75587584 |  97165312 |       5903107 |     10183759
>  patched-2-4mb  | 74768384 |  81403904 |       5768124 |     10193738
>  origpatch-4mb  | 74883072 |  82182144 |       5783412 |     10185373
>
> All these tests were performed with shared_buffers=4MB, so that the index
> won't fit completely in shared buffers, and I could use the
> idx_blks_read/hit ratio as a measure of cache-friendliness. The "origpatch"
> test was with a simplified version of Alexander's patch, see attached. It's
> the same as the original, but with the randomization option and gist-build
> related code removed. The patched-10 and patched-2 tests are two variants
> with my patch, with 1/10 and 1/2 chance of moving to the next equal tuple,
> respectively. The differences in cache hit ratio might be just a result of
> different index sizes. I included two unpatched runs above to show that
> there's a fair amount of noise in these tests. That's because of the random
> nature of the test case; it picks rows to delete and insert at random.
>
> I think the conclusion is that all of these patches are effective. The
> 1/10 variant is less effective, as expected, as it's closer in behavior to
> the unpatched behavior than the others. The 1/2 variant seems as good as
> the original patch.

Are the two unpatched-4mb lines different just by coincidence? If so, the variance is significant, and we need more experiments to actually compare the patched-2-4mb and origpatch-4mb lines.

There is another source of overhead when using randomization in gistchoose: extra penalty calls. That could be significant when the index fits in cache. To avoid it, I deliberately changed the behaviour of my patch from "look sequentially and choose at random" to "look in random order".
I think we also need to include a comparison of CPU time.

------
With best regards,
Alexander Korotkov.