Re: [HACKERS] why is gist index taking so much space on the disc

Grzegorz Jaskiewicz Mon, 21 Nov 2005 11:18:22 -0800


On 2005-11-21, at 19:32, Martijn van Oosterhout wrote:

On Mon, Nov 21, 2005 at 04:58:25PM +0100, Grzegorz Jaskiewicz wrote:

my conquers with Gist index for custom type are nearly finished. It
is working as it is now, but there are few problems here and there.
One of em, being amount of disc space index it self takes. The type
stucture it self takes 160bytes. Adding 100.000 rows into table -
CREATE TABLE blah (a serial, b customType);


Let's see, 160bytes means you'll get aboud 50 keys per page. So you

would expect 2000 leaf page, 40 level 1 pages. This should be lessthan

20-30MB

yep;

You mean you sometimes put the same elements in the two halves? You
shouldn't do that. The whole point is that the search will descend any
node that matches consistant, but any single key should only appear
once in each index.

picksplit should *split* the set, not return two sets about the same
size as you started...

Nope, I mean that 'masks' created to match either 'half' sometimesmatch elements in the other one.This shouldn't be a big deal, just one level to go down on query tomuch more specific result set.

I have fixed that with, somewhat hack.
Here's some pseudo code

sort_data(input);
find_split_point(input);
mask1 = generate_two_masks(input[0]);
mask2 = generate_two_masks(input[1]);

foreach(input) {
        bool a = matches1(input);
        bool b = matches2(input);
        if ( a && b ) {
                if ( left_index == 0 ) {
                        left[left_index++] = input;     
                }
                else {
                        if ( right_index == 0 ) {
                                right[right_index++] = input;
                                continue;
                        }

/* this part is new code, and helped a lot, now gist index takes muchless space

and is much faster, because of lower I/O consumption*/
                        if ( loop_index % 2 ) {
                                right[right_index++] = input;
                        }
                        else {
                                left[left_index++] = input;
                        }
                }
        }
        else {
                if ( a) left[left_index++] = input;
                if ( b) right[right_index++] = input;
        }
}


mask1 = generate(left );
mask2 = generate(right);

return (left, right, blah, blih, others);

Ok, so the part with i%2 helped a lot, it distributes elementsmatching both masks evenly.


Thanks guys.

I will play with k-means, and see if they will work better with nohacks. Either way, I have to have some code that will handle "matchesboth" case.


Thanks again.

--
GJ

"If we knew what we were doing, it wouldn't be called Research, wouldit?" - AE





---------------------------(end of broadcast)---------------------------
TIP 6: explain analyze is your friend

Re: [HACKERS] why is gist index taking so much space on the disc

Reply via email to