Re: [HACKERS] [GENERAL] Fragments in tsearch2 headline

2009-04-13 Thread Sushant Sinha
Headline generation uses hlCover to get fragments in text with *all* query items. In case there is no such fragment, it does not return anything. What you are asking will either require returning *maximally* matching covers or handling it as a separate case. -Sushant. On Mon, 2009-04-13 at 20:5
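
A *cover*, as the term is used in this thread, is a span of the document that satisfies the query. The following is a minimal sketch of the "no cover, no headline" behavior described above. It assumes an AND-only query and gives each word the index of the query item it matches, with -1 standing in for a word whose item is NULL. `find_cover()` is a hypothetical name, and returning the shortest such span is this sketch's own simplification. PostgreSQL's hlCover evaluates the full tsquery rather than a plain conjunction.

```c
#include <assert.h>
#include <stdlib.h>

/*
 * Hypothetical sketch (not PostgreSQL's hlCover): items[i] is the index of
 * the query item matched by word i, or -1 when the word is not in the query
 * (the role played by a NULL HeadlineWordEntry->item). Assumes an AND-only
 * query with items numbered 0..nitems-1. Finds the shortest span
 * [*start, *end] containing every item and returns 1, or returns 0 when no
 * span contains them all, in which case there is nothing to headline.
 */
int find_cover(const int *items, int nwords, int nitems, int *start, int *end)
{
    int *count;
    int  have = 0;      /* distinct items present in the current window */
    int  best_len = -1;
    int  left = 0;

    assert(nitems > 0);
    count = calloc((size_t) nitems, sizeof(int));
    if (count == NULL)
        return 0;

    for (int right = 0; right < nwords; right++)
    {
        int it = items[right];

        if (it >= 0 && count[it]++ == 0)
            have++;

        /* Shrink from the left while the window still holds every item. */
        while (have == nitems)
        {
            if (best_len < 0 || right - left + 1 < best_len)
            {
                best_len = right - left + 1;
                *start = left;
                *end = right;
            }
            it = items[left++];
            if (it >= 0 && --count[it] == 0)
                have--;
        }
    }
    free(count);
    return best_len >= 0;
}
```

For the words of '1 2 3 4 5 1 2 3 1' and the query '1&3', the item array is {0,-1,1,-1,-1,0,-1,1,0} and the shortest cover is words 7..8 ("3 1"). If either item is absent, the function returns 0 and no fragment can be built, which is the case Sushant describes.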

Re: [HACKERS] [GENERAL] Fragments in tsearch2 headline

2009-04-13 Thread Tom Lane
Sushant Sinha writes:
> Headline generation uses hlCover to get fragments in text with *all*
> query items. In case there is no such fragment, it does not return
> anything.
> What you are asking will either require returning *maximally* matching
> covers or handling it as a separate case.

Effic

Re: [HACKERS] [GENERAL] Fragments in tsearch2 headline

2009-04-13 Thread Tom Lane
Sushant Sinha writes:
> Sorry for the delay. Here is the patch with FragmentDelimiter option.
> It requires an extra option in HeadlineParsedText and uses that option
> during generateHeadline.

I did some editing of the documentation for this patch and noticed that the explanation of the fragmen

Re: [HACKERS] [GENERAL] Fragments in tsearch2 headline

2008-08-02 Thread Sushant Sinha
Sorry for the delay. Here is the patch with FragmentDelimiter option. It requires an extra option in HeadlineParsedText and uses that option during generateHeadline. Implementing the notion of fragments in HeadlineParsedText and a separate function to join them seems more complicated. So for the time

Re: [HACKERS] [GENERAL] Fragments in tsearch2 headline

2008-07-23 Thread Oleg Bartunov
On Wed, 23 Jul 2008, Sushant Sinha wrote:
> I guess it is more readable to add cover separator at the end of a
> fragment than in the front. Let me know what you think and I can update
> it.

FragmentsDelimiter should *separate* fragments and that says all. Not a very difficult algorithmic problem, it
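
Oleg's point that the delimiter should *separate* fragments amounts to writing it only between consecutive fragments, never before the first or after the last. This is a minimal sketch that assumes the fragments are already rendered as C strings. `join_fragments()` is a hypothetical helper, not the patch's generateHeadline code.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: joins nfrag fragments, writing the delimiter only
 * *between* consecutive fragments (never before the first or after the
 * last). Returns a malloc'd string the caller must free, or NULL.
 */
char *join_fragments(const char *const *frags, int nfrag, const char *delim)
{
    size_t dlen;
    size_t total = 1;                   /* room for the terminating NUL */
    char  *out;
    char  *p;

    assert(nfrag >= 0 && delim != NULL);
    dlen = strlen(delim);
    for (int i = 0; i < nfrag; i++)
        total += strlen(frags[i]) + (i > 0 ? dlen : 0);

    out = malloc(total);
    if (out == NULL)
        return NULL;

    p = out;
    for (int i = 0; i < nfrag; i++)
    {
        size_t len = strlen(frags[i]);

        if (i > 0)                      /* separator goes between fragments */
        {
            memcpy(p, delim, dlen);
            p += dlen;
        }
        memcpy(p, frags[i], len);
        p += len;
    }
    *p = '\0';
    return out;
}
```

For example, joining "4 5 1" and "3 1" with the illustrative delimiter " ... " gives "4 5 1 ... 3 1". A single fragment comes back unchanged, with no delimiter at all.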

Re: [HACKERS] [GENERAL] Fragments in tsearch2 headline

2008-07-23 Thread Sushant Sinha
I guess it is more readable to add cover separator at the end of a fragment than in the front. Let me know what you think and I can update it. I think the right place for cover separator is in the structure HeadlineParsedText just like startsel and stopsel. This will enable users to specify their

Re: [HACKERS] [GENERAL] Fragments in tsearch2 headline

2008-07-23 Thread Oleg Bartunov
btw, is it intentional to have '' in headline ?

=# select ts_headline('1 2 3 4 5 1 2 3 1','1&4'::tsquery,'MaxFragments=1');
 ts_headline
-------------
 ... 4 5 1

Oleg

On Wed, 23 Jul 2008, Teodor Sigaev wrote:
>> Let me know of any other changes that are needed.
> Looks li

Re: [HACKERS] [GENERAL] Fragments in tsearch2 headline

2008-07-23 Thread Teodor Sigaev
> Let me know of any other changes that are needed.

Looks like it's ready to commit, but documentation is needed.

--
Teodor Sigaev E-mail: [EMAIL PROTECTED] WWW: http://www.sigaev.ru/

Re: [HACKERS] [GENERAL] Fragments in tsearch2 headline

2008-07-17 Thread Sushant Sinha
Fixed some off-by-one errors pointed out by Oleg, as well as errors in excluding overlapping fragments. Also added test queries and updated the regression tests. Let me know of any other changes that are needed. -Sushant.

On Thu, 2008-07-17 at 03:28 +0400, Oleg Bartunov wrote:
> On Wed, 16 Jul 2008, Susha
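
Excluding overlapping fragments can be sketched as a greedy pass. Candidates are taken in order of preference, and each one is kept only if it shares no word with a fragment already kept, until MaxFragments have been chosen. This sketch rests on assumptions: the best-first candidate ordering and the names `Frag` and `pick_fragments()` are hypothetical, and the thread does not say how the patch ranks candidates.

```c
#include <assert.h>

typedef struct
{
    int start;   /* index of first word, inclusive */
    int end;     /* index of last word, inclusive */
} Frag;

/* Two inclusive word ranges overlap unless one ends before the other starts. */
static int overlaps(Frag a, Frag b)
{
    return !(a.end < b.start || b.end < a.start);
}

/*
 * Hypothetical sketch: cands[] is assumed to be ordered best-first by some
 * score. Greedily keeps up to max_frags candidates that do not overlap any
 * fragment already kept; writes them to out[] and returns how many were kept.
 */
int pick_fragments(const Frag *cands, int ncand, int max_frags, Frag *out)
{
    int nout = 0;

    assert(max_frags >= 0);
    for (int i = 0; i < ncand && nout < max_frags; i++)
    {
        int clash = 0;

        for (int j = 0; j < nout && !clash; j++)
            clash = overlaps(cands[i], out[j]);
        if (!clash)
            out[nout++] = cands[i];
    }
    return nout;
}
```

Ranges are inclusive, so words 0..2 and 2..5 overlap on word 2, and the second range is rejected.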

Re: [HACKERS] [GENERAL] Fragments in tsearch2 headline

2008-07-16 Thread Oleg Bartunov
On Wed, 16 Jul 2008, Sushant Sinha wrote:
> I will add test queries and their results for the corner cases in a
> separate file. I guess the only thing I am confused about is what should
> be the behavior of headline generation when Query items have words of
> size less than ShortWord. I guess the answe

Re: [HACKERS] [GENERAL] Fragments in tsearch2 headline

2008-07-16 Thread Sushant Sinha
I will add test queries and their results for the corner cases in a separate file. I guess the only thing I am confused about is what should be the behavior of headline generation when Query items have words of size less than ShortWord. I guess the answer is to ignore ShortWord parameter but let me
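
The ts_headline documentation in released PostgreSQL versions describes ShortWord this way: words of this length or less are dropped at the start and end of a headline unless they are query terms. The following is a minimal sketch of that rule, which is one way to "ignore ShortWord" for query items as suggested above. `trim_short_words()` and its parallel-array inputs are hypothetical.

```c
#include <assert.h>
#include <string.h>

/*
 * Hypothetical sketch: trims words of length <= short_word from both ends of
 * the word range [*start, *end], but never trims a query word, so a query
 * term shorter than ShortWord still survives. At least one word is kept.
 */
void trim_short_words(const char *const *words, const int *is_query,
                      int short_word, int *start, int *end)
{
    assert(*start <= *end);
    while (*start < *end && !is_query[*start] &&
           (int) strlen(words[*start]) <= short_word)
        (*start)++;
    while (*end > *start && !is_query[*end] &&
           (int) strlen(words[*end]) <= short_word)
        (*end)--;
}
```

With ShortWord = 3, the range "of an is big to" with "is" as the only query word shrinks to just "is". The query term is kept even though it is shorter than ShortWord.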

Re: [HACKERS] [GENERAL] Fragments in tsearch2 headline

2008-07-16 Thread Oleg Bartunov
Sushant, first, please provide simple test queries that demonstrate correct behavior in the corner cases. This will help reviewers test your patch and help you make sure your new version is OK. For example:

=# select ts_headline('1 2 3 4 5 1 2 3 1','1&3'::tsquery);

Re: [HACKERS] [GENERAL] Fragments in tsearch2 headline

2008-07-15 Thread Sushant Sinha
Attached are two patches:
1. documentation
2. regression tests for headline with fragments.
-Sushant.

On Tue, 2008-07-15 at 13:29 +0400, Teodor Sigaev wrote:
> > Attached a new patch that:
> >
> > 1. fixes previous bug
> > 2. better handles the case when cover size is greater than the MaxWords

Re: [HACKERS] [GENERAL] Fragments in tsearch2 headline

2008-07-15 Thread Teodor Sigaev
> Attached a new patch that:
> 1. fixes previous bug
> 2. better handles the case when cover size is greater than the MaxWords.

Looks good, I'll run some tests with a real-world application.

I have not yet added the regression tests. The regression test suite seemed to be only ensuring that the fu

Re: [HACKERS] [GENERAL] Fragments in tsearch2 headline

2008-07-14 Thread Sushant Sinha
Attached a new patch that:
1. fixes previous bug
2. better handles the case when cover size is greater than the MaxWords.

Basically it divides a cover greater than MaxWords into fragments of MaxWords, resizes each such fragment so that each end of the fragment contains a query word and then evalua
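
The splitting step described above can be sketched as follows, assuming each word is flagged according to whether it matches a query item. The name `split_cover()` is hypothetical. Because the description breaks off at "then evalua", the sketch stops at producing the shrunken fragments and does not guess how they are evaluated or ranked.

```c
#include <assert.h>

/*
 * Hypothetical sketch of the splitting step. is_query[i] is nonzero when
 * word i matches a query item. The cover [cstart, cend] is cut into
 * consecutive chunks of at most max_words words; each chunk is then shrunk
 * until both of its ends are query words, and chunks with no query word are
 * dropped. out_start/out_end must hold ceil((cend - cstart + 1) / max_words)
 * entries. Returns the number of fragments written (inclusive ranges).
 */
int split_cover(const int *is_query, int cstart, int cend, int max_words,
                int *out_start, int *out_end)
{
    int n = 0;

    assert(max_words > 0);
    for (int s = cstart; s <= cend; s += max_words)
    {
        int lo = s;
        int hi = s + max_words - 1;

        if (hi > cend)
            hi = cend;
        while (lo <= hi && !is_query[lo])   /* shrink the left end */
            lo++;
        while (hi >= lo && !is_query[hi])   /* shrink the right end */
            hi--;
        if (lo <= hi)
        {
            out_start[n] = lo;
            out_end[n] = hi;
            n++;
        }
    }
    return n;
}
```

The example uses query flags {1,0,0,1,0,1,0,0,0,1}, a cover over words 0..9 and MaxWords = 4. The chunks 0..3, 4..7 and 8..9 shrink to 0..3, 5..5 and 9..9.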

Re: [HACKERS] [GENERAL] Fragments in tsearch2 headline

2008-06-30 Thread Teodor Sigaev
> 1. Respects ShortWord and MinWords
> 2. Uses hlCover instead of Cover
> 3. Does not store norm (or lexeme) for headline marking
> 4. Removes ts_rank.h
> 5. Earlier it was counting even NONWORDTOKEN in the headline. Now it
> only counts the actual words and excludes spaces etc.
> I have also changed NumFrag

Re: [HACKERS] [GENERAL] Fragments in tsearch2 headline

2008-06-21 Thread Sushant Sinha
I have attached an updated patch with the following changes:
1. Respects ShortWord and MinWords
2. Uses hlCover instead of Cover
3. Does not store norm (or lexeme) for headline marking
4. Removes ts_rank.h
5. Earlier it was counting even NONWORDTOKEN in the headline. Now it only counts the actual w
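
Item 5 amounts to measuring fragment length in words rather than in tokens. This is a minimal sketch that assumes a hypothetical three-way token classification, whereas the real parser distinguishes many more token types. `TokKind` and `count_words()` are illustrative names only.

```c
#include <assert.h>

/* Hypothetical token kinds; the real parser has many more token types. */
typedef enum { TOK_WORD, TOK_SPACE, TOK_PUNCT } TokKind;

/*
 * Counts the words in tokens [start, end], skipping spaces and other
 * non-word tokens, so limits such as MinWords and MaxWords are measured in
 * words rather than in raw tokens.
 */
int count_words(const TokKind *toks, int start, int end)
{
    int n = 0;

    assert(start >= 0);
    for (int i = start; i <= end; i++)
        if (toks[i] == TOK_WORD)
            n++;
    return n;
}
```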

Re: [HACKERS] [GENERAL] Fragments in tsearch2 headline

2008-06-05 Thread Teodor Sigaev
> A couple of caveats:
> 1. ts_headline testing was done with current cvs head whereas
> headline_with_fragments was done with postgres 8.3.1.
> 2. For headline_with_fragments, TSVector for the document was obtained
> by joining with another table.
> Are these differences understandable?

That is possible

Re: [HACKERS] [GENERAL] Fragments in tsearch2 headline

2008-06-03 Thread Sushant Sinha
My main argument for using Cover instead of hlCover was that Cover will be faster. I tested the default headline generation that uses hlCover with the current patch that uses Cover. There was not much difference. So I think you are right in that we do not need norms and we can just use hlCover. I

Re: [HACKERS] [GENERAL] Fragments in tsearch2 headline

2008-06-03 Thread Teodor Sigaev
Why do we need norms? We don't need norms at all: every matched HeadlineWordEntry is already marked by HeadlineWordEntry->item! If it is NULL, then this word isn't contained in the tsquery. hlCover does exactly what Cover in tsrank does, which is to find the cover that contains the query

Re: [HACKERS] [GENERAL] Fragments in tsearch2 headline

2008-06-02 Thread Sushant Sinha
Efficiency: I realized that we do not need to store all norms. We only need to store norms that are in the query. So I moved the addition of norms from addHLParsedLex to hlfinditem. This should add very little memory overhead to existing headline generation. If this is still not acceptable f

Re: [HACKERS] [GENERAL] Fragments in tsearch2 headline

2008-06-02 Thread Teodor Sigaev
> I have attached a new patch with respect to the current cvs head. This
> produces headline in a document for a given query. Basically it
> identifies fragments of text that contain the query and displays them.

The new variant is much better, but...

> HeadlineParsedText contains an array of actual words

Re: [HACKERS] [GENERAL] Fragments in tsearch2 headline

2008-05-31 Thread Sushant Sinha
I have attached a new patch with respect to the current cvs head. This produces headline in a document for a given query. Basically it identifies fragments of text that contain the query and displays them.

DESCRIPTION

HeadlineParsedText contains an array of actual words but not information abou

Re: [HACKERS] [GENERAL] Fragments in tsearch2 headline

2008-05-24 Thread Pierre-Yves Strub
On Sat, May 24, 2008 at 11:18 PM, Sushant Sinha <[EMAIL PROTECTED]> wrote:
> Does this mean we want a unified function ts_headline and we trigger the
> fragments if NumFragments is specified? It seems that introducing a new
> function which can take configuration OID, or name is complex as there
>

Re: [HACKERS] [GENERAL] Fragments in tsearch2 headline

2008-05-24 Thread Sushant Sinha
Now I understand the code much better. A few more questions on headline generation that I was not able to answer from the code:
1. Why is hlparsetext used to parse the document rather than the parsetext function? Since words to be included in the headline will be marked afterwards, it seems more rea

Re: [HACKERS] [GENERAL] Fragments in tsearch2 headline

2008-05-23 Thread Teodor Sigaev
> stuck with the function LexizeExec which I do not totally understand
> (... and is not well documented too :) )

Sorry for that. LexizeExec() exists to support the thesaurus dictionary, which is designed to replace a phrase with another phrase. So, if it sees the first matched word, then it asks the parse

Re: [HACKERS] [GENERAL] Fragments in tsearch2 headline

2008-05-23 Thread Teodor Sigaev
[moved to -hackers, because the talk is about implementation details]

> I've ported the patch of Sushant Sinha for fragmented headlines to
> pg8.3.1 (http://archives.postgresql.org/pgsql-general/2007-11/msg00508.php)

Thank you.

1
> diff -Nrub postgresql-8.3.1-orig/contrib/tsearch2/tsearch2.c
now contr