On 11.08.2011 23:30, Alexander Korotkov wrote:
On Thu, Aug 11, 2011 at 2:28 PM, Heikki Linnakangas <heikki.linnakan...@enterprisedb.com> wrote:
On 10.08.2011 22:44, Alexander Korotkov wrote:
Manual and readme updates.
Thanks, I'm reviewing these now.
Do we want to expose the level-step and buffersize parameters to users?
They've been useful during testing, but I'm thinking we should be able to
guess good enough values for them automatically, and just remove the
options. It's pretty much impossible for a user to tune them correctly; it
would require deep knowledge of the buffering algorithm.
I'm thinking that even when you explicitly turn buffering on, we should
still process the first 10000 or so tuples with simple inserts. That way we
always have a sample of tuples to calculate the average tuple size from
(roughly as in the sketch below). It's plausible that if the input data is
ordered, looking at the first N tuples will give a skewed sample, but I
don't think there's much danger of that in practice. Even if the data is
ordered, the length of GiST tuples shouldn't vary much.
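Just to illustrate, a minimal sketch in C of what I have in mind (hypothetical names, not actual patch code): count the tuples and their total size during the plain-insert phase, and switch to buffering once the sample is large enough:

#include <stdbool.h>
#include <stddef.h>

#define SAMPLE_TUPLES 10000		/* plain inserts before switching over */

typedef struct BuildSample
{
	bool	buffering;		/* have we switched to buffered build yet? */
	double	ntuples;		/* index tuples inserted so far */
	double	tuplesSize;		/* their total size, in bytes */
} BuildSample;

/* Called once per inserted index tuple. */
void
sample_tuple(BuildSample *s, size_t itupSize)
{
	s->ntuples += 1;
	s->tuplesSize += (double) itupSize;

	if (!s->buffering && s->ntuples >= SAMPLE_TUPLES)
	{
		double		avgTupleSize = s->tuplesSize / s->ntuples;

		/* derive buffersize (and sanity-check levelstep) from avgTupleSize */
		(void) avgTupleSize;
		s->buffering = true;
	}
}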
What happens if we get the levelstep and pagesPerBuffer estimates wrong?
How sensitive is the algorithm to that? Or will we run out of memory? Would
it be feasible to adjust those in the middle of the index build, if we e.g.
greatly exceed the estimated memory usage?
I see the following risks.
For levelstep:
Too small: not as great an IO benefit as we could get.
Too large:
1) If a subtree doesn't fit in effective_cache_size, much more IO than there
should be (because of cache misses during buffer emptying).
2) If the last pages of the buffers don't fit in maintenance_work_mem,
possible OOM.
Hmm, we could avoid running out of memory if we used an LRU cache
replacement policy on the buffer pages, instead of explicitly unloading
the buffers (see the toy sketch below). 1) would still apply, though.
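Only to illustrate the idea (nothing below is from the patch; the structs and the page budget are invented): in-memory buffer pages would sit on a recency-ordered list, and we'd write out and drop pages from the cold end whenever the memory budget is exceeded, instead of unloading whole buffers:

#include <stdlib.h>

typedef struct CachedPage
{
	struct CachedPage *moreRecent;	/* towards the most recently used end */
	struct CachedPage *lessRecent;	/* towards the least recently used end */
	/* ... page contents ... */
} CachedPage;

typedef struct PageCache
{
	CachedPage *hot;		/* most recently used page */
	CachedPage *cold;		/* least recently used page */
	int			nPages;		/* pages currently held in memory */
	int			maxPages;	/* budget derived from maintenance_work_mem */
} PageCache;

/*
 * Evict pages from the cold end until we are back within the memory budget.
 * (Touching a page would move it to the hot end, not shown here.)
 */
void
enforce_budget(PageCache *cache)
{
	while (cache->nPages > cache->maxPages && cache->cold != NULL)
	{
		CachedPage *victim = cache->cold;

		cache->cold = victim->moreRecent;
		if (cache->cold != NULL)
			cache->cold->lessRecent = NULL;
		else
			cache->hot = NULL;

		/* write 'victim' to the buffer's temporary file here, then drop it */
		free(victim);
		cache->nPages--;
	}
}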
For buffersize:
Too small: less IO benefit, because the buffer size is relatively small in
comparison with the sub-tree size.
Too large: greater CPU overhead (because of more penalty calls) than is
needed for the same IO benefit.
Therefore I propose the following (a rough sketch of the calculation follows
the two points below).
1) A too-large levelstep is the greatest risk. Let's use a pessimistic
estimate for it. The pessimistic estimate has the following logic:
largest sub-tree => maximal tuples per page => minimal tuple size
Thus, by always using the minimal tuple size in the levelstep calculation,
we exclude the greatest risks.
2) The risks for buffersize are comparable and not too critical. That's why
I propose to use the size of the first 10000 tuples for the estimate.
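In rough pseudo-C (hypothetical names and simplified formulas, not necessarily what the patch will do): levelstep comes from the pessimistic, minimal-tuple-size fanout and effective_cache_size; buffersize comes from the sampled average tuple size:

#include <math.h>
#include <stddef.h>

typedef struct BufferingParams
{
	int		levelStep;		/* place buffers on every levelStep'th level */
	int		pagesPerBuffer;	/* size of each node buffer, in pages */
} BufferingParams;

BufferingParams
estimate_buffering_params(size_t pageFreeSpace,			/* usable bytes per index page */
						  size_t minTupleSize,			/* from tuple header + attribute types */
						  double avgTupleSize,			/* sampled from the first ~10000 tuples */
						  double effectiveCachePages)	/* effective_cache_size, in pages */
{
	BufferingParams p;

	/* Pessimistic (largest possible) fanout: assume the minimal tuple size. */
	double		maxFanout = (double) pageFreeSpace / (double) minTupleSize;

	/* Expected fanout, from the sampled average tuple size. */
	double		avgFanout = (double) pageFreeSpace / avgTupleSize;

	if (maxFanout < 2.0)
		maxFanout = 2.0;

	/*
	 * Largest levelStep such that even the pessimistically sized subtree of
	 * levelStep levels still fits in effective_cache_size.
	 */
	p.levelStep = (int) floor(log(effectiveCachePages) / log(maxFanout));
	if (p.levelStep < 1)
		p.levelStep = 1;

	/*
	 * Size each node buffer to hold roughly half the expected number of
	 * tuples in the subtree below it, expressed in pages (using the average
	 * fanout here, since over-estimating is not critical).
	 */
	p.pagesPerBuffer = (int) ceil(pow(avgFanout, p.levelStep) / (2.0 * avgFanout));
	if (p.pagesPerBuffer < 1)
		p.pagesPerBuffer = 1;

	return p;
}

Using the minimal tuple size only on the levelstep side is what removes the worst risk; the buffersize side just needs the sampled average, so a somewhat skewed sample isn't critical.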
Yep, sounds reasonable.
I think it would also be fairly simple to decrease levelstep and/or
adjust buffersize on-the-fly. The trick would be in figuring out the
heuristics on when to do that.
Another thing occurred to me while looking at the buffer emptying
process: At the moment, we stop emptying after we've flushed 1/2 buffer
size worth of tuples. The point of that is to avoid overfilling a
lower-level buffer, in the case that the tuples we emptied all landed on
the same lower-level buffer. Wouldn't it be fairly simple to detect that
case explicitly, and stop the emptying process only if one of the
lower-level buffers really fills up? That should be more efficient, as
you would have to "swap" between different subtrees less often (roughly
along the lines of the sketch below).
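Roughly this control flow, with invented helper names rather than the actual patch API:

#include <stdbool.h>

typedef struct NodeBuffer NodeBuffer;

extern bool	buffer_is_empty(NodeBuffer *buf);
extern bool	buffer_is_full(NodeBuffer *buf);
extern void *buffer_pop_tuple(NodeBuffer *buf);
extern NodeBuffer *choose_child_buffer(NodeBuffer *buf, void *tuple);
extern void	buffer_push_tuple(NodeBuffer *child, void *tuple);

void
empty_buffer(NodeBuffer *buf)
{
	while (!buffer_is_empty(buf))
	{
		void	   *tuple = buffer_pop_tuple(buf);
		NodeBuffer *child = choose_child_buffer(buf, tuple);

		buffer_push_tuple(child, tuple);

		/*
		 * Stop only when a lower-level buffer genuinely overflows; emptying
		 * that buffer then becomes the next unit of work.
		 */
		if (buffer_is_full(child))
			break;
	}
}

Presumably the buffer that filled up would then be the next one to empty, so we'd keep working within the same subtree for as long as possible.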
--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com