Alexander Korotkov writes:
> On Tue, Apr 5, 2011 at 5:05 PM, Robert Haas wrote:
>> I am probably being stupid here, but doesn't the number of links to
>> rows grow proportionately to the number of n-grams?
> Number of links to rows grow proportionally to total number of extracted
> q-grams, but
On Tue, Apr 5, 2011 at 5:05 PM, Robert Haas wrote:
> I am probably being stupid here, but doesn't the number of links to
> rows grow proportionately to the number of n-grams?
Number of links to rows grow proportionally to total number of extracted
q-grams, but not proportionally to number of uni
On Tue, Apr 5, 2011 at 8:41 AM, Alexander Korotkov wrote:
> For example, here is distribution of q-grams count in 120 Mb of dblp paper
> titles (pretty large dataset).
> q count
> 2 7218
> 3 115107
> 4 589428
> 5 1648453
> 6 3336685
> Number of 5-grams if about 15x larger than number of 3-g
For example, here is distribution of q-grams count in 120 Mb of dblp paper
titles (pretty large dataset).
q count
27218
3 115107
4 589428
5 1648453
6 3336685
Number of 5-grams if about 15x larger than number of 3-grams. But most part
of index space will be occupied by links to the rows(abou
On Tue, Apr 5, 2011 at 3:56 PM, Robert Haas wrote:
> So with q=5, the index will be approximately 10x larger than with q=3.
> Maybe that's OK, I'm not sure. But it is a big difference.
Not whole index will be approximately 10x larger, but only entries pages
number (which contains btree on gin
On Tue, Apr 5, 2011 at 4:52 AM, Alexander Korotkov wrote:
> On Mon, Apr 4, 2011 at 9:01 PM, Robert Haas wrote:
>>
>> On Mon, Apr 4, 2011 at 12:38 PM, Alexander Korotkov
>> wrote:
>> > relatively small when q <= 5. Accordingly, I think we should expect
>> > indexes
>> > to be usable with at least
On Mon, Apr 4, 2011 at 9:01 PM, Robert Haas wrote:
> On Mon, Apr 4, 2011 at 12:38 PM, Alexander Korotkov
> wrote:
> > relatively small when q <= 5. Accordingly, I think we should expect
> indexes
> > to be usable with at least with q = 5.
>
> I defer to your opinion on this, since you know more
On Mon, Apr 4, 2011 at 12:38 PM, Alexander Korotkov
wrote:
> relatively small when q <= 5. Accordingly, I think we should expect indexes
> to be usable with at least with q = 5.
I defer to your opinion on this, since you know more about it than I
do. But I think it would still be worthwhile to w
On Mon, Apr 4, 2011 at 6:56 PM, Robert Haas wrote:
> On Mon, Apr 4, 2011 at 7:35 AM, Alexander Korotkov
> wrote:
> > I would like to propose a q-gram module which would have following
> > differences in comparison with pg_trgm:
> > 1) Focus on acceleration of edit distance (e.g. levenshtein dist
On Mon, Apr 4, 2011 at 7:35 AM, Alexander Korotkov wrote:
> I would like to propose a q-gram module which would have following
> differences in comparison with pg_trgm:
> 1) Focus on acceleration of edit distance (e.g. levenshtein distance)
> queries and LIKE/ILIKE queries
> 2) Support of various
Here is text of my GSoC proposal. Given details probably makes essence of my
proposal clear. Any comments are welcome.
*Name of project*
Q-gram indexing module
*
*
*Synopsis*
Currently PostgreSQL has support for trigram-based string collection
indexing in pg_trgm module. Indexes in pg_trgm was o
On Mon, Mar 28, 2011 at 7:27 PM, Tom Lane wrote:
> Robert Haas writes:
> > I'm afraid I don't know this code well enough to give you any
> > meaningful feedback, but I hope someone will.
>
> Really Oleg and Teodor are the only people well-qualified to comment on
> such stuff. (It sounds reasona
Robert Haas writes:
> On Fri, Mar 25, 2011 at 1:32 PM, Alexander Korotkov
> wrote:
>> I would like to ask you about currency of the work above. I propose to
>> develop functionality of GIN and GiST q-gram indexes with following
>> features:
> I'm afraid I don't know this code well enough to give
On Fri, Mar 25, 2011 at 1:32 PM, Alexander Korotkov
wrote:
> I would like to ask you about currency of the work above. I propose to
> develop functionality of GIN and GiST q-gram indexes with following
> features:
> 1) Handle edit distance (e.g. levenshtein distance) and LIKE/ILIKE
> queries(using
On Fri, Mar 25, 2011 at 8:32 PM, Alexander Korotkov
wrote:
> I would like to ask you about currency of the work above.
This seems to be a mess of words. Sorry for my bad english. Actually,
I meant that I need a appraisal of my proposal.
With best regards,
Alexander Korotkov.
--
Sent via pg
Hackers,
I would like to ask you about currency of the work above. I propose to
develop functionality of GIN and GiST q-gram indexes with following
features:
1) Handle edit distance (e.g. levenshtein distance) and LIKE/ILIKE
queries(using GIN partial match if no full q-grams can be extracted
from
16 matches
Mail list logo