On Thu, Jan 24, 2013 at 11:26 PM, Heikki Linnakangas <hlinnakan...@vmware.com> wrote:
> On 21.01.2013 15:19, Heikki Linnakangas wrote:
>> On 21.01.2013 15:06, Tom Lane wrote:
>>> Jeff Davis <pg...@j-davis.com> writes:
>>>> On Mon, 2013-01-21 at 00:48 -0500, Tom Lane wrote:
>>>>> I looked at this patch. ISTM we should not have the option at all but
>>>>> just do it always. I cannot believe that always-go-left is ever a
>>>>> preferable strategy in the long run; the resulting imbalance in the
>>>>> index will surely kill any possible benefit. Even if there are some
>>>>> cases where it miraculously fails to lose, how many users are going to
>>>>> realize that applies to their case and make use of the option?
>>>>
>>>> Sounds good to me.
>>>>
>>>> If I remember correctly, there was also an argument that it may be
>>>> useful for repeatable test results. That's a little questionable for
>>>> performance (except in those cases where few penalties are identical
>>>> anyway), but could plausibly be useful for a crash report or something.
>>>
>>> Meh. There's already a random decision, in the equivalent place and for
>>> a comparable reason, in btree (cf _bt_findinsertloc). Nobody's ever
>>> complained about that being indeterminate, so I'm unconvinced that
>>> there's a market for it with gist.
>>
>> I wonder if it would work for gist to do something similar to
>> _bt_findinsertloc, and have a bias towards the left page, but sometimes
>> descend to one of the pages to the right. You would get the cache
>> locality of usually descending down the same subtree, but also fill the
>> pages to the right. Jeff / Alexander, want to give that a shot?
>
> I did some experimenting with that. I used the same test case Alexander
> did, with geonames data, and compared the unpatched version, the original
> patch, and the attached patch that biases the first "best" tuple found, but
> still sometimes chooses the other equally good ones.
>     testname    | initsize | finalsize | idx_blks_read | idx_blks_hit
> ----------------+----------+-----------+---------------+--------------
>  patched-10-4mb | 75497472 |  90202112 |       5853604 |     10178331
>  unpatched-4mb  | 75145216 |  94863360 |       5880676 |     10185647
>  unpatched-4mb  | 75587584 |  97165312 |       5903107 |     10183759
>  patched-2-4mb  | 74768384 |  81403904 |       5768124 |     10193738
>  origpatch-4mb  | 74883072 |  82182144 |       5783412 |     10185373
>
> All these tests were performed with shared_buffers=4MB, so that the index
> won't fit completely in shared buffers, and I could use the
> idx_blks_read/hit ratio as a measure of cache-friendliness. The "origpatch"
> test was with a simplified version of Alexander's patch, see attached. It's
> the same as the original, but with the randomization option and gist-build
> related code removed. The patched-10 and patched-2 tests are two variants
> with my patch, with 1/10 and 1/2 chance of moving to the next equal tuple,
> respectively. The differences in cache hit ratio might be just a result of
> different index sizes. I included two unpatched runs above to show that
> there's a fair amount of noise in these tests. That's because of the random
> nature of the test case; it picks rows to delete and insert at random.
>
> I think the conclusion is that all of these patches are effective. The
> 1/10 variant is less effective, as expected, as it's closer in behavior to
> the unpatched behavior than the others. The 1/2 variant seems as good as
> the original patch.

Are the two unpatched-4mb lines different just by coincidence? If so, the variance is significant, and we need more experiments to actually compare the patched-2-4mb and origpatch-4mb lines.

There is another source of overhead when using randomization in gistchoose: extra penalty calls. That could be significant when the index fits in cache. To avoid it, I deliberately changed the behaviour of my patch from "look sequentially and choose at random" to "look in random order".
I think we also need to include a comparison of CPU time.

------
With best regards,
Alexander Korotkov.