Re: [HACKERS] text search: restricting the number of parsed words in headline generation

2014-08-06 Thread Bruce Momjian
FYI, I have kept this email from 2011 about poor performance of parsed words in headline generation. If someone wants to research it, please do so:
http://www.postgresql.org/message-id/1314117620.3700.12.camel@dragflick

Re: [HACKERS] text search: restricting the number of parsed words in headline generation

2013-01-24 Thread Bruce Momjian
On Wed, Aug 15, 2012 at 11:09:18PM +0530, Sushant Sinha wrote:
> I will do the profiling and present the results.
Sushant, do you have any profiling results on this issue from August?
---
>
> On Wed, 2012-08-15 at 12:45 -0

Re: [HACKERS] text search: restricting the number of parsed words in headline generation

2012-08-15 Thread Sushant Sinha
I will do the profiling and present the results.
On Wed, 2012-08-15 at 12:45 -0400, Tom Lane wrote:
> Bruce Momjian writes:
> > Is this a TODO?
>
> AFAIR nothing's been done about the speed issue, so yes. I didn't
> like the idea of creating a user-visible knob when the speed issue
> might be f

Re: [HACKERS] text search: restricting the number of parsed words in headline generation

2012-08-15 Thread Tom Lane
Bruce Momjian writes:
> Is this a TODO?
AFAIR nothing's been done about the speed issue, so yes. I didn't like the idea of creating a user-visible knob when the speed issue might be fixable with internal algorithm improvements, but we never followed up on this in either fashion.

Re: [HACKERS] text search: restricting the number of parsed words in headline generation

2012-08-15 Thread Bruce Momjian
This might indicate that the hlCover() item is resolved.
---
On Wed, Aug 24, 2011 at 10:08:11AM +0530, Sushant Sinha wrote:
> > > Actually, this code seems probably flat-out wrong: won't every
> successful call of

Re: [HACKERS] text search: restricting the number of parsed words in headline generation

2012-08-15 Thread Bruce Momjian
Is this a TODO?
---
On Tue, Aug 23, 2011 at 10:31:42PM -0400, Tom Lane wrote:
> Sushant Sinha writes:
> >> Doesn't this force the headline to be taken from the first N words of
> >> the document, independent of where the ma

Re: [HACKERS] text search: restricting the number of parsed words in headline generation

2011-08-23 Thread Sushant Sinha
>
> Actually, this code seems probably flat-out wrong: won't every
> successful call of hlCover() on a given document return exactly the same
> q value (end position), namely the last token occurrence in the
> document? How is that helpful?
>
> regards, tom lane
>
There is

Re: [HACKERS] text search: restricting the number of parsed words in headline generation

2011-08-23 Thread Tom Lane
Sushant Sinha writes:
>> Doesn't this force the headline to be taken from the first N words of
>> the document, independent of where the match was? That seems rather
>> unworkable, or at least unhelpful.
> In headline generation function, we don't have any index or knowledge of
> where the match

Re: [HACKERS] text search: restricting the number of parsed words in headline generation

2011-08-23 Thread Sushant Sinha
> > Here is a simple patch that limits the number of words during the
> > tokenization phase and puts an upper-bound on the headline generation.
>
> Doesn't this force the headline to be taken from the first N words of
> the document, independent of where the match was? That seems rather
> unwor

Re: [HACKERS] text search: restricting the number of parsed words in headline generation

2011-08-23 Thread Alvaro Herrera
Excerpts from Tom Lane's message of Tue Aug 23 15:59:18 -0300 2011:
> Sushant Sinha writes:
> > Given a document and a query, the goal of headline generation is to
> > produce text excerpts in which the query appears.
>
> ... right ...
>
> > Here is a simple patch that limits the number of words

Re: [HACKERS] text search: restricting the number of parsed words in headline generation

2011-08-23 Thread Tom Lane
Sushant Sinha writes:
> Given a document and a query, the goal of headline generation is to
> produce text excerpts in which the query appears.
... right ...
> Here is a simple patch that limits the number of words during the
> tokenization phase and puts an upper-bound on the headline generatio

[HACKERS] text search: restricting the number of parsed words in headline generation

2011-08-23 Thread Sushant Sinha
Given a document and a query, the goal of headline generation is to produce text excerpts in which the query appears. Currently the headline generation in postgres follows the following steps:
1. Tokenize the documents and obtain the lexemes
2. Decide on lexemes that should be the part of the head
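
For readers following the thread, here is a rough Python sketch of the tokenize-then-select flow described in this message, with an optional cap on the number of parsed words as proposed in the thread. It is an illustration only and not the PostgreSQL implementation (which is written in C and works on dictionary-normalized lexemes): the names `tokenize`, `headline`, `window`, and `max_words` are hypothetical, and lexemes are approximated by lowercased words.

```python
import re


def tokenize(document, max_words=None):
    """Split a document into lowercased word tokens.

    If max_words is given, stop parsing once that many tokens have been
    collected (the hypothetical cap discussed in the thread).
    """
    tokens = []
    for match in re.finditer(r"\w+", document):
        if max_words is not None and len(tokens) >= max_words:
            break
        tokens.append(match.group(0).lower())
    return tokens


def headline(document, query_words, window=5, max_words=None):
    """Return (excerpt, hits) for the first window with the most query hits.

    Only the tokens that were actually parsed are considered, so a cap on
    parsing also caps where the excerpt can come from.
    """
    tokens = tokenize(document, max_words)
    wanted = {w.lower() for w in query_words}
    best_start, best_hits = 0, 0
    for start in range(max(1, len(tokens) - window + 1)):
        hits = sum(1 for t in tokens[start:start + window] if t in wanted)
        if hits > best_hits:
            best_start, best_hits = start, hits
    return " ".join(tokens[best_start:best_start + window]), best_hits


# Assumed sample document: the only match sits after the first 20 words.
doc = "filler " * 20 + "the quick brown fox jumps"
full = headline(doc, ["fox"])
capped = headline(doc, ["fox"], max_words=10)
```

In this toy setup the uncapped call finds an excerpt containing the match, while the capped call can only return text from the first 10 words. That is the behaviour questioned in the replies: with a parse limit, the headline comes from the start of the document regardless of where the match is.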