On Mon, 14 Jul 2008, Rudolf Lippan wrote:
On Mon, 14 Jul 2008, Tom Lane wrote:
Alvaro Herrera <[EMAIL PROTECTED]> writes:
Tom Lane wrote:
Isn't the vacuum_delay_point() good enough?
But that's in the outer loop ... I mean here:
You'd need one heckuva lot of lexemes in a tsvector to make that
Tom, I like the 'heckuva' word (in r…
Alvaro Herrera <[EMAIL PROTECTED]> writes:
> Tom Lane wrote:
>> Isn't the vacuum_delay_point() good enough?
> But that's in the outer loop ... I mean here:
You'd need one heckuva lot of lexemes in a tsvector to make that
important. Do we have CHECK_FOR_INTERRUPTS() in any other loops
over tsvect…
Tom Lane wrote:
> Alvaro Herrera <[EMAIL PROTECTED]> writes:
> > Should we have a CHECK_FOR_INTERRUPTS() call in the inner loop of
> > compute_tsvector_stats?
>
> Isn't the vacuum_delay_point() good enough?
But that's in the outer loop ... I mean here:
Index: src/backend/tsearch/ts_typanalyze.c
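For context, a sketch of the loop structure under discussion. The function and variable names here are illustrative, not the actual ts_typanalyze.c code; vacuum_delay_point() and CHECK_FOR_INTERRUPTS() are the real backend calls being debated.

/* Illustrative only; in the backend this would need postgres.h,
 * tsearch/ts_type.h, miscadmin.h and commands/vacuum.h. */
static void
scan_sample_rows(Datum *values, int samplerows)
{
    int i, j;

    for (i = 0; i < samplerows; i++)        /* outer loop: one sampled row */
    {
        TSVector    vector = DatumGetTSVector(values[i]);

        vacuum_delay_point();               /* cost-based sleep; also checks interrupts */

        for (j = 0; j < vector->size; j++)  /* inner loop: once per lexeme */
        {
            CHECK_FOR_INTERRUPTS();         /* the call Alvaro proposes adding */
            /* ... update per-lexeme statistics ... */
        }
    }
}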
Alvaro Herrera <[EMAIL PROTECTED]> writes:
> Should we have a CHECK_FOR_INTERRUPTS() call in the inner loop of
> compute_tsvector_stats?
Isn't the vacuum_delay_point() good enough?
regards, tom lane
Tom Lane wrote:
> Jan Urbański <[EMAIL PROTECTED]> writes:
> > OK, here's the (hopefully final) version of the typanalyze function for
> > tsvectors. It applies to HEAD and passes regression tests.
>
> > I now plan to move towards a selectivity function that'll use the
> gathered statistics.
Jan Urbański <[EMAIL PROTECTED]> writes:
> OK, here's the (hopefully final) version of the typanalyze function for
> tsvectors. It applies to HEAD and passes regression tests.
> I now plan to move towards a selectivity function that'll use the
> gathered statistics.
Applied…
OK, here's the (hopefully final) version of the typanalyze function for
tsvectors. It applies to HEAD and passes regression tests.
I now plan to move towards a selectivity function that'll use the
gathered statistics.
Thanks to everyone for their reviews and comments,
Cheers,
Jan
--
Jan Urbański
Tom Lane wrote:
Jan Urbański <[EMAIL PROTECTED]> writes:
Come to think of it, the current code is in a way a variant of Lossy
Counting, it's just doing the pruning after each and every new element,
isn't it?
Interesting comment. In LC's terms we have w=1 therefore e=1 therefore the…
Jan Urbański <[EMAIL PROTECTED]> writes:
> Come to think of it, the current code is in a way a variant of Lossy
> Counting, it's just doing the pruning after each and every new element,
> isn't it?
Interesting comment. In LC's terms we have w=1 therefore e=1 therefore
the ma…
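For readers without the Manku & Motwani paper at hand: in Lossy Counting the stream is split into buckets of width w = ceil(1/e), so pruning after every element corresponds to w=1 and hence e=1, as Tom notes. Below is a minimal, self-contained sketch of the algorithm over a toy stream; the fixed-size table and all names are illustrative only.

#include <stdio.h>
#include <string.h>

#define MAXTRACK 1024

/* One tracked element: estimated count f and maximum undercount delta. */
typedef struct
{
    const char *key;
    int         f;
    int         delta;
} Entry;

static Entry entries[MAXTRACK];
static int   ntracked = 0;

/* Process one stream element; b_current is the current bucket number. */
static void
lc_add(const char *key, int b_current)
{
    int i;

    for (i = 0; i < ntracked; i++)
    {
        if (strcmp(entries[i].key, key) == 0)
        {
            entries[i].f++;
            return;
        }
    }
    if (ntracked < MAXTRACK)
    {
        entries[ntracked].key = key;
        entries[ntracked].f = 1;
        entries[ntracked].delta = b_current - 1;    /* may have been pruned before */
        ntracked++;
    }
}

/* At each bucket boundary, drop entries whose f + delta <= b_current. */
static void
lc_prune(int b_current)
{
    int i, kept = 0;

    for (i = 0; i < ntracked; i++)
        if (entries[i].f + entries[i].delta > b_current)
            entries[kept++] = entries[i];
    ntracked = kept;
}

int
main(void)
{
    const char *stream[] = {"cat", "dog", "cat", "fish", "cat", "dog"};
    int n = 6;
    int w = 3;      /* bucket width; w = 1 would prune after every element */
    int i;

    for (i = 0; i < n; i++)
    {
        int b_current = i / w + 1;

        lc_add(stream[i], b_current);
        if ((i + 1) % w == 0)
            lc_prune(b_current);
    }
    for (i = 0; i < ntracked; i++)
        printf("%s: f=%d, delta=%d\n", entries[i].key, entries[i].f, entries[i].delta);
    return 0;
}

Each surviving entry's true count lies between f and f + delta, which is what makes the structure usable for most-common-lexeme estimates.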
Oleg Bartunov wrote:
On Wed, 9 Jul 2008, Jan Urbański wrote:
Jan Urbański wrote:
Do you think it's worthwhile to implement the LC algorithm in C and
send it out, so others could try it out? Heck, maybe it's worthwhile
to replace the current compute_minimal_stats() algorithm with LC and
see how that compares?
Tom Lane wrote:
Jan Urbański <[EMAIL PROTECTED]> writes:
Tom Lane wrote:
Well, (1) the normal measure would be statistics_target *tsvectors*,
and we'd have to translate that to lexemes somehow; my proposal is just
to use a fixed constant instead of tsvector width as in your…
On Wed, 9 Jul 2008, Jan Urbański wrote:
Jan Urbański wrote:
Do you think it's worthwhile to implement the LC algorithm in C and send it
out, so others could try it out? Heck, maybe it's worthwhile to replace the
current compute_minimal_stats() algorithm with LC and see how that compares?
I…
Jan Urbański <[EMAIL PROTECTED]> writes:
> Tom Lane wrote:
>> The way I think it ought to work is that the number of lexemes stored in
>> the final pg_statistic entry is statistics_target times a constant
>> (perhaps 100). I don't like having it vary depending on tsvector width
Tom Lane wrote:
The way I think it ought to work is that the number of lexemes stored in
the final pg_statistic entry is statistics_target times a constant
(perhaps 100). I don't like having it vary depending on tsvector width
I think the existing code puts at most statistics_target elements i…
Jan Urbański <[EMAIL PROTECTED]> writes:
> Still, there's a decision to be made: after how many lexemes should the
> pruning occur?
The way I think it ought to work is that the number of lexemes stored in
the final pg_statistic entry is statistics_target times a constant
(perhaps 100)…
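In code form, the sizing rule proposed here amounts to something like the following; stats_target stands for the column's statistics target, and the names are illustrative:

/* Hypothetical sketch: number of most-common lexemes to store in
 * pg_statistic, proportional to the statistics target and independent
 * of how wide the sampled tsvectors happen to be. */
int num_mcelem = stats_target * 100;    /* the constant is Tom's "perhaps 100" */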
Alvaro Herrera <[EMAIL PROTECTED]> writes:
> Jan Urbański wrote:
>> Oh, one important thing. You need to choose a bucket width for the LC
>> algorithm, that is, decide after how many elements you will prune your
>> data structure. I chose to prune after every twenty tsvectors.
> Do you prune after X tsvectors regardless of the number of lexemes…
Alvaro Herrera wrote:
Jan Urbański wrote:
Oh, one important thing. You need to choose a bucket width for the LC
algorithm, that is, decide after how many elements you will prune your
data structure. I chose to prune after every twenty tsvectors.
Do you prune after X tsvectors regardless of the number of lexemes…
Jan Urbański wrote:
> Oh, one important thing. You need to choose a bucket width for the LC
> algorithm, that is, decide after how many elements you will prune your
> data structure. I chose to prune after every twenty tsvectors.
Do you prune after X tsvectors regardless of the number of lexemes…
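One way to answer that question in code is to advance the prune counter by lexemes rather than by tsvectors, so a row with a huge tsvector moves the bucket along faster than a short one. A hypothetical sketch; prune_tracked_lexemes() and the variable names are invented for illustration:

/* Inside the per-row loop: count lexemes, not tsvectors. */
lexemes_seen += vector->size;
if (lexemes_seen >= bucket_width)
{
    prune_tracked_lexemes(lexeme_tab);  /* drop entries below the LC threshold */
    lexemes_seen -= bucket_width;
}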
Tom Lane wrote:
Jan Urbański <[EMAIL PROTECTED]> writes:
Do you think it's worthwhile to implement the LC algorithm in C and send
it out, so others could try it out? Heck, maybe it's worthwhile to
replace the current compute_minimal_stats() algorithm with LC and see
how that compares?
Jan Urbański <[EMAIL PROTECTED]> writes:
> Do you think it's worthwhile to implement the LC algorithm in C and send
> it out, so others could try it out? Heck, maybe it's worthwhile to
> replace the current compute_minimal_stats() algorithm with LC and see
> how that compares?
Jan Urbański wrote:
If you think the Lossy Counting method has potential, I could test it
somehow. Using my current work I could extract a stream of lexemes as
ANALYZE sees it and run it through a python implementation of the
algorithm to see if the result makes sense.
I hacked together a sim…
Tom Lane wrote:
…target would typically be around 10 or so. It really wasn't designed to
be industrial-strength code. (It was still a big improvement over what
we'd had before :-(.) So I'm not very comfortable with taking it as the
design base for something that does need to be industrial-strength.
Jan Urbański <[EMAIL PROTECTED]> writes:
> Once again: the whole algorithm is a ripoff from analyze.c, with the
> dllist instead of an array because you don't know how big a tracking list
> you will need, and with a hashtable, because the tracking list will
> probably be large and…
Heikki Linnakangas wrote:
Tom Lane wrote:
"Heikki Linnakangas" <[EMAIL PROTECTED]> writes:
Tom Lane wrote:
Hmm, I had just assumed without looking too closely that it was stats
target times a fudge factor. What is the rationale for doing it as
above? I don't think I like the idea of the limit…
Tom Lane wrote:
"Heikki Linnakangas" <[EMAIL PROTECTED]> writes:
Tom Lane wrote:
The data structure I'd suggest is a simple array of pointers
to the underlying hash table entries. Since you have a predetermined
maximum number of lexemes to track, you can just palloc the array once
--- you don't need the expansibility properties of a list.
"Heikki Linnakangas" <[EMAIL PROTECTED]> writes:
> Tom Lane wrote:
>> The data structure I'd suggest is a simple array of pointers
>> to the underlying hash table entries. Since you have a predetermined
>> maximum number of lexemes to track, you can just palloc the array once
>> --- you don't need the expansibility properties of a list.
Tom Lane wrote:
The data structure I'd suggest is a simple array of pointers
to the underlying hash table entries. Since you have a predetermined
maximum number of lexemes to track, you can just palloc the array once
--- you don't need the expansibility properties of a list.
The number of lexemes…
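A sketch of the structure Tom describes above, assuming a hash table whose entries do not move once entered (as with the backend's dynahash); the entry type and names are illustrative:

typedef struct LexemeEntry
{
    const char *lexeme;         /* the tracked lexeme */
    int         frequency;      /* occurrences seen so far */
} LexemeEntry;

/* qsort comparator: sort entry pointers by descending frequency */
static int
entry_cmp(const void *a, const void *b)
{
    const LexemeEntry *ea = *(const LexemeEntry *const *) a;
    const LexemeEntry *eb = *(const LexemeEntry *const *) b;

    return eb->frequency - ea->frequency;
}

/*
 * Because the maximum number of tracked lexemes is predetermined, the
 * pointer array is palloc'd exactly once -- no expansible list needed:
 *
 *     LexemeEntry **sorted = palloc(max_tracked * sizeof(LexemeEntry *));
 *     ... fill sorted[0..n-1] by walking the hash table ...
 *     qsort(sorted, n, sizeof(LexemeEntry *), entry_cmp);
 *     ... the first num_mcelem entries are the most common lexemes ...
 */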
Jan Urbański <[EMAIL PROTECTED]> writes:
> attached are two patches against HEAD. The smaller one is meant to be
> committed - it adds some functions that manipulate double-linked lists,
> namely inserting a new cell after or before another cell and swapping
> two adjacent cells.
Jan,
> Hm... someone apparently added this mail to the wiki page independently
> of me, so there were two duplicate entries. I found the second
> description better, so I removed my original entry and left the other
> one.
Yeah, I've been going through -hackers and -patches and adding stuff to the…
Jan Urbański wrote:
I'll add the first one to the commit fest page, and I'm sending it to
-hackers with congratulations on the decision to ditch -patches ;)
Hm... someone apparently added this mail to the wiki page independently
of me, so there were two duplicate entries. I found the second
description better, so I removed my original entry and left the other one.
Hi,
attached are two patches against HEAD. The smaller one is meant to be
committed - it adds some functions that manipulate double-linked lists,
namely inserting a new cell after or before another cell and swapping
two adjacent cells. It felt like being back in the first year of
studies. I ho…
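For reference, the two list helpers Jan describes might look like this on a plain doubly-linked list; the node type and function names are illustrative, not the actual patch:

typedef struct Cell
{
    struct Cell *prev;
    struct Cell *next;
    void        *value;
} Cell;

/* Insert newc immediately after pos. */
static void
insert_after(Cell *pos, Cell *newc)
{
    newc->prev = pos;
    newc->next = pos->next;
    if (pos->next)
        pos->next->prev = newc;
    pos->next = newc;
}

/* Swap a cell with its immediate successor by relinking the neighbours.
 * A caller that keeps a head pointer must update it when a was first. */
static void
swap_with_next(Cell *a)
{
    Cell *b = a->next;

    if (b == NULL)
        return;                 /* nothing to swap with */
    a->next = b->next;
    if (b->next)
        b->next->prev = a;
    b->prev = a->prev;
    if (a->prev)
        a->prev->next = b;
    b->next = a;
    a->prev = b;
}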