I'd like to base on paper "Generalizing "Search" in Generalized Search Trees" by Paul Aoki, section 4.1 "Multiple Key Support" (http://www.sai.msu.su/~megera/postgres/gist/papers/csd-97-950.pdf)

Proposed algorithm (without details about nulls etc):
1) n = 1  - set column's number to first one
2) form vector of keys of n-th column
3) set left and right union keys to NULL (see below)
4) call user-defined pickSplit method for n-th column
5) if it was a last column then return - page is splitted
6) try to find keys on left page with zero penalty with right union key and
   keys on right page with left union. Note, we check only keys in n-th
   column. Let M is a total number of keys with zero penalties with
   opposite unions. So we have M keys/tuples which can be freely
   distributed between left and right page. Penalty is calculated by call
   user-defined Penalty() method for n-th column.
7) if M == 0 then return - page is splitted
8) n++
9) form vector of keys of n-th column, but use only tuples, determined in
   step 6
10)set left and right union keys. Its forms from tuples which can't be
   freely distributed between page. These tuples are determined in step 6
11)go to step 4.


That algorithm requires to small change in interface of pickSplit() method.
It works with GIST_SPLITVEC structure. But step 6 requires that pickSplit()
should knows: are unions already formed? If not, pickSplit() should work
as now. If yes, pickSplit() must take in consideration formed left and right unions, and it can't to 'reduce' that unions.

I suggest add one boolean field to GIST_SPLITVEC: isSubSplit (suggest exact name, pls)
if isSubSplit == false then unions are not formed, pickSplit works as now.
if isSubSplit == true then unions are already formed, pickSplit should use
formed keys. So, after pickSplit() call isSubSplit should be always 'false', if it's not then GiST's core should say at least a warning about unsupported feature in opclass. BTW, in this case, GiST may compose old (with formed before) unions with suggested by pickSplit() by maximizing penalty between left and right keys.

Also, I plan to split GIST_SPLITVEC to 2 structures: one of its will be argument for pickSplit() and another will use internally in GiST core. Now GIST_SPLITVEC contains a lot of field that's needed only for core.

Thoughts, suggestions, objections?

--
Teodor Sigaev                                   E-mail: [EMAIL PROTECTED]
                                                   WWW: http://www.sigaev.ru/

---------------------------(end of broadcast)---------------------------
TIP 5: don't forget to increase your free space map settings

Reply via email to