On 11.08.2011 23:30, Alexander Korotkov wrote:
On Thu, Aug 11, 2011 at 2:28 PM, Heikki Linnakangas <heikki.linnakan...@enterprisedb.com> wrote:
On 10.08.2011 22:44, Alexander Korotkov wrote:
Manual and readme updates.
Thanks, I'm reviewing these now.
Do we want to expose the level-step and buffersize parameters to users?
They've been useful during testing, but I'm thinking we should be able to
guess good enough values for them automatically, and just remove the
options. It's pretty much impossible for a user to tune them correctly; it
would require deep knowledge of the buffering algorithm.
I'm thinking that even when you explicitly turn buffering on, we should
still process the first 10000 or so tuples with simple inserts. That way we
always have a sample of tuples to calculate the average tuple size from
(roughly as in the sketch below). It's plausible that if the input data is
ordered, looking at the first N tuples will give a skewed sample, but I
don't think there's much danger of that in practice. Even if the data is
ordered, the length of GiST tuples shouldn't vary much.
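Just to illustrate, a minimal sketch in C of what I have in mind (hypothetical names, not actual patch code): count the tuples and their total size during the plain-insert phase, and switch to buffering once the sample is large enough:

#include <stdbool.h>
#include <stddef.h>

#define SAMPLE_TUPLES 10000		/* plain inserts before switching over */

typedef struct BuildSample
{
	bool	buffering;		/* have we switched to buffered build yet? */
	double	ntuples;		/* index tuples inserted so far */
	double	tuplesSize;		/* their total size, in bytes */
} BuildSample;

/* Called once per inserted index tuple. */
void
sample_tuple(BuildSample *s, size_t itupSize)
{
	s->ntuples += 1;
	s->tuplesSize += (double) itupSize;

	if (!s->buffering && s->ntuples >= SAMPLE_TUPLES)
	{
		double		avgTupleSize = s->tuplesSize / s->ntuples;

		/* derive buffersize (and sanity-check levelstep) from avgTupleSize */
		(void) avgTupleSize;
		s->buffering = true;
	}
}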
What happens if we get the levelstep and pagesPerBuffer estimates wrong?
How sensitive is the algorithm to that? Or will we run out of memory? Would
it be feasible to adjust those in the middle of the index build, if we e.g.
greatly exceed the estimated memory usage?
I see the following risks.
For levelstep:
Too small: not as great an IO benefit as we could get.
Too large:
1) If a subtree doesn't fit in effective_cache_size, much more IO than there
should be (because of cache misses during buffer emptying).
2) If the last pages of the buffers don't fit in maintenance_work_mem,
possible OOM.
Hmm, we could avoid running out of memory if we used an LRU cache
replacement policy on the buffer pages, instead of explicitly unloading
the buffers (see the toy sketch below). 1) would still apply, though.
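Only to illustrate the idea (nothing below is from the patch; the structs and the page budget are invented): in-memory buffer pages would sit on a recency-ordered list, and we'd write out and drop pages from the cold end whenever the memory budget is exceeded, instead of unloading whole buffers:

#include <stdlib.h>

typedef struct CachedPage
{
	struct CachedPage *moreRecent;	/* towards the most recently used end */
	struct CachedPage *lessRecent;	/* towards the least recently used end */
	/* ... page contents ... */
} CachedPage;

typedef struct PageCache
{
	CachedPage *hot;		/* most recently used page */
	CachedPage *cold;		/* least recently used page */
	int			nPages;		/* pages currently held in memory */
	int			maxPages;	/* budget derived from maintenance_work_mem */
} PageCache;

/*
 * Evict pages from the cold end until we are back within the memory budget.
 * (Touching a page would move it to the hot end, not shown here.)
 */
void
enforce_budget(PageCache *cache)
{
	while (cache->nPages > cache->maxPages && cache->cold != NULL)
	{
		CachedPage *victim = cache->cold;

		cache->cold = victim->moreRecent;
		if (cache->cold != NULL)
			cache->cold->lessRecent = NULL;
		else
			cache->hot = NULL;

		/* write 'victim' to the buffer's temporary file here, then drop it */
		free(victim);
		cache->nPages--;
	}
}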
For buffersize:
Too small: less IO benefit, because the buffer size is relatively small in
comparison with the sub-tree size.
Too large: greater CPU overhead (because of more penalty calls) than is
needed for the same IO benefit.
Therefore I propose the following (a rough sketch of the calculation follows
the two points below).
1) A too-large levelstep is the greatest risk. Let's use a pessimistic
estimate for it. The pessimistic estimate has the following logic:
largest sub-tree => maximal tuples per page => minimal tuple size
Thus, by always using the minimal tuple size in the levelstep calculation,
we exclude the greatest risks.
2) The risks for buffersize are comparable and not too critical. That's why
I propose to use the size of the first 10000 tuples for the estimate.
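In rough pseudo-C (hypothetical names and simplified formulas, not necessarily what the patch will do): levelstep comes from the pessimistic, minimal-tuple-size fanout and effective_cache_size; buffersize comes from the sampled average tuple size:

#include <math.h>
#include <stddef.h>

typedef struct BufferingParams
{
	int		levelStep;		/* place buffers on every levelStep'th level */
	int		pagesPerBuffer;	/* size of each node buffer, in pages */
} BufferingParams;

BufferingParams
estimate_buffering_params(size_t pageFreeSpace,			/* usable bytes per index page */
						  size_t minTupleSize,			/* from tuple header + attribute types */
						  double avgTupleSize,			/* sampled from the first ~10000 tuples */
						  double effectiveCachePages)	/* effective_cache_size, in pages */
{
	BufferingParams p;

	/* Pessimistic (largest possible) fanout: assume the minimal tuple size. */
	double		maxFanout = (double) pageFreeSpace / (double) minTupleSize;

	/* Expected fanout, from the sampled average tuple size. */
	double		avgFanout = (double) pageFreeSpace / avgTupleSize;

	if (maxFanout < 2.0)
		maxFanout = 2.0;

	/*
	 * Largest levelStep such that even the pessimistically sized subtree of
	 * levelStep levels still fits in effective_cache_size.
	 */
	p.levelStep = (int) floor(log(effectiveCachePages) / log(maxFanout));
	if (p.levelStep < 1)
		p.levelStep = 1;

	/*
	 * Size each node buffer to hold roughly half the expected number of
	 * tuples in the subtree below it, expressed in pages (using the average
	 * fanout here, since over-estimating is not critical).
	 */
	p.pagesPerBuffer = (int) ceil(pow(avgFanout, p.levelStep) / (2.0 * avgFanout));
	if (p.pagesPerBuffer < 1)
		p.pagesPerBuffer = 1;

	return p;
}

Using the minimal tuple size only on the levelstep side is what removes the worst risk; the buffersize side just needs the sampled average, so a somewhat skewed sample isn't critical.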
Yep, sounds reasonable.
I think it would also be fairly simple to decrease levelstep and/or
adjust buffersize on-the-fly. The trick would be in figuring out the
heuristics on when to do that.
Another thing occurred to me while looking at the buffer emptying
process: At the moment, we stop emptying after we've flushed 1/2 buffer
size worth of tuples. The point of that is to avoid overfilling a
lower-level buffer, in the case that the tuples we emptied all landed on
the same lower-level buffer. Wouldn't it be fairly simple to detect that
case explicitly, and stop the emptying process only if one of the
lower-level buffers really fills up? That should be more efficient, as
you would have to "swap" between different subtrees less often (roughly
along the lines of the sketch below).
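Roughly this control flow, with invented helper names rather than the actual patch API:

#include <stdbool.h>

typedef struct NodeBuffer NodeBuffer;

extern bool	buffer_is_empty(NodeBuffer *buf);
extern bool	buffer_is_full(NodeBuffer *buf);
extern void *buffer_pop_tuple(NodeBuffer *buf);
extern NodeBuffer *choose_child_buffer(NodeBuffer *buf, void *tuple);
extern void	buffer_push_tuple(NodeBuffer *child, void *tuple);

void
empty_buffer(NodeBuffer *buf)
{
	while (!buffer_is_empty(buf))
	{
		void	   *tuple = buffer_pop_tuple(buf);
		NodeBuffer *child = choose_child_buffer(buf, tuple);

		buffer_push_tuple(child, tuple);

		/*
		 * Stop only when a lower-level buffer genuinely overflows; emptying
		 * that buffer then becomes the next unit of work.
		 */
		if (buffer_is_full(child))
			break;
	}
}

Presumably the buffer that filled up would then be the next one to empty, so we'd keep working within the same subtree for as long as possible.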
--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com