On 2005-11-21, at 19:32, Martijn van Oosterhout wrote:

On Mon, Nov 21, 2005 at 04:58:25PM +0100, Grzegorz Jaskiewicz wrote:
my conquers with Gist index for custom type are nearly finished. It
is working as it is now, but there are few problems here and there.
One of em, being amount of disc space index it self takes. The type
stucture it self takes 160bytes. Adding 100.000 rows into table -
CREATE TABLE blah (a serial, b customType);

Let's see, 160bytes means you'll get aboud 50 keys per page. So you
would expect 2000 leaf page, 40 level 1 pages. This should be less than
20-30MB

yep;

You mean you sometimes put the same elements in the two halves? You
shouldn't do that. The whole point is that the search will descend any
node that matches consistant, but any single key should only appear
once in each index.

picksplit should *split* the set, not return two sets about the same
size as you started...

Nope, I mean that 'masks' created to match either 'half' sometimes match elements in the other one. This shouldn't be a big deal, just one level to go down on query to much more specific result set.
I have fixed that with, somewhat hack.
Here's some pseudo code

sort_data(input);
find_split_point(input);
mask1 = generate_two_masks(input[0]);
mask2 = generate_two_masks(input[1]);

foreach(input) {
        bool a = matches1(input);
        bool b = matches2(input);
        if ( a && b ) {
                if ( left_index == 0 ) {
                        left[left_index++] = input;     
                }
                else {
                        if ( right_index == 0 ) {
                                right[right_index++] = input;
                                continue;
                        }
/* this part is new code, and helped a lot, now gist index takes much less space
and is much faster, because of lower I/O consumption*/
                        if ( loop_index % 2 ) {
                                right[right_index++] = input;
                        }
                        else {
                                left[left_index++] = input;
                        }
                }
        }
        else {
                if ( a) left[left_index++] = input;
                if ( b) right[right_index++] = input;
        }
}


mask1 = generate(left );
mask2 = generate(right);

return (left, right, blah, blih, others);

Ok, so the part with i%2 helped a lot, it distributes elements matching both masks evenly.

Thanks guys.

I will play with k-means, and see if they will work better with no hacks. Either way, I have to have some code that will handle "matches both" case.

Thanks again.

--
GJ

"If we knew what we were doing, it wouldn't be called Research, would it?" - AE




---------------------------(end of broadcast)---------------------------
TIP 6: explain analyze is your friend

Reply via email to