Re: [HACKERS] gsoc, text search selectivity and dllist enhancments

2008-07-14 Thread Oleg Bartunov
On Mon, 14 Jul 2008, Rudolf Lippan wrote: <[EMAIL PROTECTED]> <[EMAIL PROTECTED]> <[EMAIL PROTECTED]> <[EMAIL PROTECTED]> <[EMAIL PROTECTED]> <[EMAIL PROTECTED]> <[EMAIL PROTECTED]> <[EMAIL PROTECTED]> <[EMAIL PROTECTED]> <[EMAIL PROTECTED]> <[EMAIL PROTECTED]> Message-ID: <[EMAIL PROTECTED]> X-

Re: [HACKERS] gsoc, text search selectivity and dllist enhancments

2008-07-14 Thread Rudolf Lippan
<[EMAIL PROTECTED]> <[EMAIL PROTECTED]> <[EMAIL PROTECTED]> <[EMAIL PROTECTED]> <[EMAIL PROTECTED]> <[EMAIL PROTECTED]> <[EMAIL PROTECTED]> <[EMAIL PROTECTED]> <[EMAIL PROTECTED]> <[EMAIL PROTECTED]> <[EMAIL PROTECTED]> Message-ID: <[EMAIL PROTECTED]> X-Sender: [EMAIL PROTECTED] Received: from 74-9

Re: [HACKERS] gsoc, text search selectivity and dllist enhancments

2008-07-14 Thread Oleg Bartunov
On Mon, 14 Jul 2008, Tom Lane wrote: Alvaro Herrera <[EMAIL PROTECTED]> writes: Tom Lane wrote: Isn't the vacuum_delay_point() good enough? But that's in the outer loop ... I mean here: You'd need one heckuva lot of lexemes in a tsvector to make that Tom, I like the 'heckuva' word (in r

Re: [HACKERS] gsoc, text search selectivity and dllist enhancments

2008-07-13 Thread Tom Lane
Alvaro Herrera <[EMAIL PROTECTED]> writes: > Tom Lane wrote: >> Isn't the vacuum_delay_point() good enough? > But that's in the outer loop ... I mean here: You'd need one heckuva lot of lexemes in a tsvector to make that important. Do we have CHECK_FOR_INTERRUPTS() in any other loops over tsvect

Re: [HACKERS] gsoc, text search selectivity and dllist enhancments

2008-07-13 Thread Alvaro Herrera
Tom Lane wrote: > Alvaro Herrera <[EMAIL PROTECTED]> writes: > > Should we have a CHECK_FOR_INTERRUPTS() call in the inner loop of > > compute_tsvector_stats? > > Isn't the vacuum_delay_point() good enough? But that's in the outer loop ... I mean here: Index: src/backend/tsearch/ts_typanalyze.c

Re: [HACKERS] gsoc, text search selectivity and dllist enhancments

2008-07-13 Thread Tom Lane
Alvaro Herrera <[EMAIL PROTECTED]> writes: > Should we have a CHECK_FOR_INTERRUPTS() call in the inner loop of > compute_tsvector_stats? Isn't the vacuum_delay_point() good enough? regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To

Re: [HACKERS] gsoc, text search selectivity and dllist enhancments

2008-07-13 Thread Alvaro Herrera
Tom Lane wrote: > =?UTF-8?B?SmFuIFVyYmHFhHNraQ==?= <[EMAIL PROTECTED]> writes: > > OK, here's the (hopefully final) version of the typanalyze function for > > tsvectors. It applies to HEAD and passes regression tests. > > > I now plan to move towards a selectivity function that'll use the > > ga

Re: [HACKERS] gsoc, text search selectivity and dllist enhancments

2008-07-13 Thread Tom Lane
=?UTF-8?B?SmFuIFVyYmHFhHNraQ==?= <[EMAIL PROTECTED]> writes: > OK, here's the (hopefully final) version of the typanalyze function for > tsvectors. It applies to HEAD and passes regression tests. > I now plan to move towards a selectivity function that'll use the > gathered statistics. Applied

Re: [HACKERS] gsoc, text search selectivity and dllist enhancments

2008-07-12 Thread Jan Urbański
OK, here's the (hopefully final) version of the typanalyze function for tsvectors. It applies to HEAD and passes regression tests. I now plan to move towards a selectivity function that'll use the gathered statistics. Thanks to everyone for their reviews and comments, Cheers, Jan -- Jan Urb

Re: [HACKERS] gsoc, text search selectivity and dllist enhancments

2008-07-11 Thread Jan Urbański
Tom Lane wrote: =?UTF-8?B?SmFuIFVyYmHFhHNraQ==?= <[EMAIL PROTECTED]> writes: Come to think of it, the current code is in a way a variant of Lossy Counting, it's just doing the pruning after each and every new element, isn't it? Interesting comment. In LC's terms we have w=1 therefore e=1 the

Re: [HACKERS] gsoc, text search selectivity and dllist enhancments

2008-07-10 Thread Tom Lane
=?UTF-8?B?SmFuIFVyYmHFhHNraQ==?= <[EMAIL PROTECTED]> writes: > Come to think of it, the current code is in a way a variant of Lossy > Counting, it's just doing the pruning after each and every new element, > isn't it? Interesting comment. In LC's terms we have w=1 therefore e=1 therefore the ma

Re: [HACKERS] gsoc, text search selectivity and dllist enhancments

2008-07-10 Thread Jan Urbański
Oleg Bartunov wrote: On Wed, 9 Jul 2008, Jan Urbaski wrote: Jan Urbaski wrote: Do you think it's worthwhile to implement the LC algorithm in C and send it out, so others could try it out? Heck, maybe it's worthwhile to replace the current compute_minimal_stats() algorithm with LC and see how

Re: [HACKERS] gsoc, text search selectivity and dllist enhancments

2008-07-10 Thread Jan Urbański
Tom Lane wrote: =?UTF-8?B?SmFuIFVyYmHFhHNraQ==?= <[EMAIL PROTECTED]> writes: Tom Lane wrote: Well, (1) the normal measure would be statistics_target *tsvectors*, and we'd have to translate that to lexemes somehow; my proposal is just to use a fixed constant instead of tsvector width as in your

Re: [HACKERS] gsoc, text search selectivity and dllist enhancments

2008-07-10 Thread Oleg Bartunov
On Wed, 9 Jul 2008, Jan Urbaski wrote: Jan Urbaski wrote: Do you think it's worthwhile to implement the LC algorithm in C and send it out, so others could try it out? Heck, maybe it's worthwhile to replace the current compute_minimal_stats() algorithm with LC and see how that compares? I

Re: [HACKERS] gsoc, text search selectivity and dllist enhancments

2008-07-10 Thread Tom Lane
=?UTF-8?B?SmFuIFVyYmHFhHNraQ==?= <[EMAIL PROTECTED]> writes: > Tom Lane wrote: >> The way I think it ought to work is that the number of lexemes stored in >> the final pg_statistic entry is statistics_target times a constant >> (perhaps 100). I don't like having it vary depending on tsvector width

Re: [HACKERS] gsoc, text search selectivity and dllist enhancments

2008-07-10 Thread Jan Urbański
Tom Lane wrote: The way I think it ought to work is that the number of lexemes stored in the final pg_statistic entry is statistics_target times a constant (perhaps 100). I don't like having it vary depending on tsvector width I think the existing code puts at most statistics_target elements i

Re: [HACKERS] gsoc, text search selectivity and dllist enhancments

2008-07-10 Thread Tom Lane
=?UTF-8?B?SmFuIFVyYmHFhHNraQ==?= <[EMAIL PROTECTED]> writes: > Still, there's a decision to be made: after how many lexemes should the > pruning occur? The way I think it ought to work is that the number of lexemes stored in the final pg_statistic entry is statistics_target times a constant (perh

Re: [HACKERS] gsoc, text search selectivity and dllist enhancments

2008-07-10 Thread Tom Lane
Alvaro Herrera <[EMAIL PROTECTED]> writes: > Jan Urbański wrote: >> Oh, one important thing. You need to choose a bucket width for the LC >> algorithm, that is decide after how many elements will you prune your >> data structure. I chose to prune after every twenty tsvectors. > Do you prune a

Re: [HACKERS] gsoc, text search selectivity and dllist enhancments

2008-07-10 Thread Jan Urbański
Alvaro Herrera wrote: Jan Urbański wrote: Oh, one important thing. You need to choose a bucket width for the LC algorithm, that is decide after how many elements will you prune your data structure. I chose to prune after every twenty tsvectors. Do you prune after X tsvectors regardless of

Re: [HACKERS] gsoc, text search selectivity and dllist enhancments

2008-07-10 Thread Alvaro Herrera
Jan Urbański wrote: > Oh, one important thing. You need to choose a bucket width for the LC > algorithm, that is decide after how many elements will you prune your > data structure. I chose to prune after every twenty tsvectors. Do you prune after X tsvectors regardless of the numbers of lexe

Re: [HACKERS] gsoc, text search selectivity and dllist enhancments

2008-07-10 Thread Jan Urbański
Tom Lane wrote: =?UTF-8?B?SmFuIFVyYmHFhHNraQ==?= <[EMAIL PROTECTED]> writes: Do you think it's worthwhile to implement the LC algorithm in C and send it out, so others could try it out? Heck, maybe it's worthwhile to replace the current compute_minimal_stats() algorithm with LC and see how tha

Re: [HACKERS] gsoc, text search selectivity and dllist enhancments

2008-07-08 Thread Tom Lane
=?UTF-8?B?SmFuIFVyYmHFhHNraQ==?= <[EMAIL PROTECTED]> writes: > Do you think it's worthwhile to implement the LC algorithm in C and send > it out, so others could try it out? Heck, maybe it's worthwhile to > replace the current compute_minimal_stats() algorithm with LC and see > how that compares

Re: [HACKERS] gsoc, text search selectivity and dllist enhancments

2008-07-08 Thread Jan Urbański
Jan Urbański wrote: If you think the Lossy Counting method has potential, I could test it somehow. Using my current work I could extract a stream of lexemes as ANALYZE sees it and run it through a python implementation of the algorithm to see if the result makes sense. I hacked together a sim

Re: [HACKERS] gsoc, text search selectivity and dllist enhancments

2008-07-07 Thread Jan Urbański
Tom Lane wrote: target would typically be around 10 or so. It really wasn't designed to be industrial-strength code. (It was still a big improvement over what we'd had before :-(.) So I'm not very comfortable with taking it as the design base for something that does need to be industrial-stren

Re: [HACKERS] gsoc, text search selectivity and dllist enhancments

2008-07-07 Thread Tom Lane
=?UTF-8?B?SmFuIFVyYmHFhHNraQ==?= <[EMAIL PROTECTED]> writes: > Once again: the whole algorithm is a ripoff from analyze.c, with the > dllist instead of an array because you don't know how big tracking list > will you need and with a hashtable, because the tracking list will > probably be large a

Re: [HACKERS] gsoc, text search selectivity and dllist enhancments

2008-07-06 Thread Jan Urbański
Heikki Linnakangas wrote: Tom Lane wrote: "Heikki Linnakangas" <[EMAIL PROTECTED]> writes: Tom Lane wrote: Hmm, I had just assumed without looking too closely that it was stats target times a fudge factor. What is the rationale for doing it as above? I don't think I like the idea of the limi

Re: [HACKERS] gsoc, text search selectivity and dllist enhancments

2008-07-04 Thread Heikki Linnakangas
Tom Lane wrote: "Heikki Linnakangas" <[EMAIL PROTECTED]> writes: Tom Lane wrote: The data structure I'd suggest is a simple array of pointers to the underlying hash table entries. Since you have a predetermined maximum number of lexemes to track, you can just palloc the array once --- you don'

Re: [HACKERS] gsoc, text search selectivity and dllist enhancments

2008-07-04 Thread Tom Lane
"Heikki Linnakangas" <[EMAIL PROTECTED]> writes: > Tom Lane wrote: >> The data structure I'd suggest is a simple array of pointers >> to the underlying hash table entries. Since you have a predetermined >> maximum number of lexemes to track, you can just palloc the array once >> --- you don't need

Re: [HACKERS] gsoc, text search selectivity and dllist enhancments

2008-07-04 Thread Heikki Linnakangas
Tom Lane wrote: The data structure I'd suggest is a simple array of pointers to the underlying hash table entries. Since you have a predetermined maximum number of lexemes to track, you can just palloc the array once --- you don't need the expansibility properties of a list. The number of lex

Re: [HACKERS] gsoc, text search selectivity and dllist enhancments

2008-07-03 Thread Tom Lane
=?UTF-8?B?SmFuIFVyYmHFhHNraQ==?= <[EMAIL PROTECTED]> writes: > attached are two patches against HEAD. The smaller one is meant to be > commited - it adds some functions that manipulate double-linked lists, > namely inserting a new cell after or before another cell and swapping > two adjacent cel

Re: [HACKERS] gsoc, text search selectivity and dllist enhancments

2008-06-27 Thread Josh Berkus
Jan, > Hm... someone apparently added this mail to the wiki pag independently > of me, so there were two duplicate entries. I found the second > description better, so I removed my original entry and left the other > one. Yeah, I've been going through -hackers and -patches and adding stuff to th

Re: [HACKERS] gsoc, text search selectivity and dllist enhancments

2008-06-27 Thread Jan Urbański
Jan Urbański wrote: I'll add the first one to the commit fest page, and I'm sending it to -hackers with congratulations on the decision to ditch -patches ;) Hm... someone apparently added this mail to the wiki pag independently of me, so there were two duplicate entries. I found the second d

[HACKERS] gsoc, text search selectivity and dllist enhancments

2008-06-27 Thread Jan Urbański
Hi, attached are two patches against HEAD. The smaller one is meant to be commited - it adds some functions that manipulate double-linked lists, namely inserting a new cell after or before another cell and swapping two adjacent cells. It felt like being back in the first year of studies. I ho