Headline generation uses hlCover to get fragments in text with *all*
query items. In case there is no such fragment, it does not return
anything.
What you are asking will either require returning *maximally* matching
covers or handling it as a separate case.
-Sushant.
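The "all query items or nothing" behaviour described above can be illustrated with a small sketch. It is not the hlCover code from PostgreSQL: it assumes the query is a plain AND of terms, treats the document as a list of already-normalized words, and the name find_covers is hypothetical.

```python
def find_covers(words, query_terms):
    """For each query word in the text, return the shortest span starting
    there that contains every query term, as (start, end) index pairs.

    Simplified illustration only: the query is assumed to be an AND of
    plain terms, which is not how a real tsquery is evaluated.
    """
    needed = set(query_terms)
    covers = []
    for start, word in enumerate(words):
        if word not in needed:
            continue  # a cover must begin on a query word
        seen = set()
        for end in range(start, len(words)):
            if words[end] in needed:
                seen.add(words[end])
            if seen == needed:
                covers.append((start, end))
                break
    return covers


words = "1 2 3 4 5 1 2 3 1".split()
print(find_covers(words, ["1", "4"]))
print(find_covers(words, ["1", "9"]))  # no span holds both terms, so nothing is returned
```

Returning *maximally* matching covers would mean relaxing the `seen == needed` test, which is the separate case mentioned above.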
On Mon, 2009-04-13 at 20:5
Sushant Sinha writes:
> Headline generation uses hlCover to get fragments in text with *all*
> query items. In case there is no such fragment, it does not return
> anything.
> What you are asking will either require returning *maximally* matching
> covers or handling it as a separate case.
Effic
Sushant Sinha writes:
> Sorry for the delay. Here is the patch with FragmentDelimiter option.
> It requires an extra option in HeadlineParsedText and uses that option
> during generateHeadline.
I did some editing of the documentation for this patch and noticed that
the explanation of the fragmen
Sorry for the delay. Here is the patch with FragmentDelimiter option.
It requires an extra option in HeadlineParsedText and uses that option
during generateHeadline.
Implementing a notion of fragments in HeadlineParsedText and a separate
function to join them seems more complicated. So for the time
On Wed, 23 Jul 2008, Sushant Sinha wrote:
I guess it is more readable to add cover separator at the end of a fragment
than in the front. Let me know what you think and I can update it.
FragmentsDelimiter should *separate* fragments and that says all.
Not very difficult algorithmic problem, it
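A minimal sketch of the "separate, not prefix or suffix" point, assuming fragments are already rendered as strings. The function name join_fragments and the default delimiter are illustrative, not taken from the patch.

```python
def join_fragments(fragments, delimiter=" ... "):
    """Place the delimiter only between fragments, never before the first
    or after the last one."""
    return delimiter.join(fragments)


print(join_fragments(["1 2 3", "1 2 3 1"]))
print(join_fragments(["4 5 1"]))  # a single fragment gets no delimiter at all
```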
I guess it is more readable to add cover separator at the end of a fragment
than in the front. Let me know what you think and I can update it.
I think the right place for cover separator is in the structure
HeadlineParsedText just like startsel and stopsel. This will enable users to
specify their
btw, is it intentional to have '' in headline ?
=# select ts_headline('1 2 3 4 5 1 2 3 1','1&4'::tsquery,'MaxFragments=1');
 ts_headline
-------------
 ... 4 5 1
Oleg
On Wed, 23 Jul 2008, Teodor Sigaev wrote:
Let me know of any other changes that are needed.
Looks li
Let me know of any other changes that are needed.
Looks like ready to commit, but documentation is needed.
--
Teodor Sigaev E-mail: [EMAIL PROTECTED]
WWW: http://www.sigaev.ru/
--
Sent via pgsql-hackers mailin
Fixed some off-by-one errors pointed out by Oleg and errors in excluding
overlapping fragments.
Also adding test queries and updating regression tests.
Let me know of any other changes that are needed.
-Sushant.
On Thu, 2008-07-17 at 03:28 +0400, Oleg Bartunov wrote:
> On Wed, 16 Jul 2008, Susha
On Wed, 16 Jul 2008, Sushant Sinha wrote:
I will add test queries and their results for the corner cases in a
separate file. I guess the only thing I am confused about is what should
be the behavior of headline generation when Query items have words of
size less than ShortWord. I guess the answe
I will add test queries and their results for the corner cases in a
separate file. I guess the only thing I am confused about is what should
be the behavior of headline generation when Query items have words of
size less than ShortWord. I guess the answer is to ignore ShortWord
parameter but let me
Sushant,
first, please provide simple test queries that demonstrate correct behavior
in the corner cases. This will help reviewers test your patch and
help you make sure your new version is ok. For example:
=# select ts_headline('1 2 3 4 5 1 2 3 1','1&3'::tsquery);
attached are two patches:
1. documentation
2. regression tests
for headline with fragments.
-Sushant.
On Tue, 2008-07-15 at 13:29 +0400, Teodor Sigaev wrote:
> > Attached a new patch that:
> >
> > 1. fixes previous bug
> > 2. better handles the case when cover size is greater than the MaxWords
Attached a new patch that:
1. fixes previous bug
2. better handles the case when cover size is greater than the MaxWords.
Looks good, I'll make some tests with real-world application.
I have not yet added the regression tests. The regression test suite
seemed to be only ensuring that the fu
Attached a new patch that:
1. fixes previous bug
2. better handles the case when cover size is greater than the MaxWords.
Basically it divides a cover greater than MaxWords into fragments of
MaxWords, resizes each such fragment so that each end of the fragment
contains a query word and then evalua
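The splitting and resizing steps can be sketched as follows. This is an illustration under assumptions, not the patch itself: it counts every token as a word, treats query matching as simple membership, and the name split_cover is hypothetical. The evaluation step is left out because the description above is cut off.

```python
def split_cover(words, start, end, query_terms, max_words):
    """Split the cover words[start..end] into chunks of at most max_words,
    then shrink each chunk so that both of its ends sit on a query word.

    Chunks that contain no query word are dropped.
    """
    terms = set(query_terms)
    fragments = []
    for chunk_start in range(start, end + 1, max_words):
        lo = chunk_start
        hi = min(chunk_start + max_words - 1, end)
        # move the left edge forward to the first query word
        while lo <= hi and words[lo] not in terms:
            lo += 1
        # move the right edge back to the last query word
        while hi >= lo and words[hi] not in terms:
            hi -= 1
        if lo <= hi:
            fragments.append((lo, hi))
    return fragments


words = "1 2 3 4 5 1 2 3 1".split()
print(split_cover(words, 0, 8, ["1", "3"], max_words=4))
```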
1. Respects ShortWord and MinWords
2. Uses hlCover instead of Cover
3. Does not store norm (or lexeme) for headline marking
4. Removes ts_rank.h
5. Earlier it was counting even NONWORDTOKEN in the headline. Now it
only counts the actual words and excludes spaces etc.
I have also changed NumFrag
I have attached an updated patch with the following changes:
1. Respects ShortWord and MinWords
2. Uses hlCover instead of Cover
3. Does not store norm (or lexeme) for headline marking
4. Removes ts_rank.h
5. Earlier it was counting even NONWORDTOKEN in the headline. Now it
only counts the actual w
A couple of caveats:
1. ts_headline testing was done with current cvs head whereas
headline_with_fragments was done with postgres 8.3.1.
2. For headline_with_fragments, TSVector for the document was obtained
by joining with another table.
Are these differences understandable?
That is possible
My main argument for using Cover instead of hlCover was that Cover will
be faster. I tested the default headline generation that uses hlCover
with the current patch that uses Cover. There was not much difference.
So I think you are right in that we do not need norms and we can just
use hlCover.
I
Why do we need norms?
We don't need norms at all - all matched HeadlineWordEntry are already marked by
HeadlineWordEntry->item! If it equals NULL then this word isn't contained in
the tsquery.
hlCover does the exact thing that Cover in tsrank does which is to find
the cover that contains the query
Efficiency: I realized that we do not need to store all norms. We need
to only store norms that are in the query. So I moved the addition
of norms from addHLParsedLex to hlfinditem. This should add very little
memory overhead to existing headline generation.
If this is still not acceptable f
I have attached a new patch with respect to the current cvs head. This
produces headline in a document for a given query. Basically it
identifies fragments of text that contain the query and displays them.
New variant is much better, but...
HeadlineParsedText contains an array of actual words
I have attached a new patch with respect to the current cvs head. This
produces headline in a document for a given query. Basically it
identifies fragments of text that contain the query and displays them.
DESCRIPTION
HeadlineParsedText contains an array of actual words but not
information abou
On Sat, May 24, 2008 at 11:18 PM, Sushant Sinha <[EMAIL PROTECTED]> wrote:
> Does this mean we want a unified function ts_headline and we trigger the
> fragments if NumFragments is specified? It seems that introducing a new
> function which can take configuration OID, or name is complex as there
>
Now I understand the code much better. A few more questions on headline
generation that I was not able to get from the code:
1. Why is hlparsetext used to parse the document rather than the
parsetext function? Since words to be included in the headline will be
marked afterwards, it seems more rea
stuck with the function LexizeExec which I do not totally understand
(... and is not well documented either :) )
Sorry for that. LexizeExec() is a play around supporting thesaurus dictionary,
which is designed to replace phrase by phrase. So, if it sees the first matched word
then it asks the parse
[moved to -hackers, because talk is about implementation details]
I've ported the patch of Sushant Sinha for fragmented headlines to pg8.3.1
(http://archives.postgresql.org/pgsql-general/2007-11/msg00508.php)
Thank you.
1 > diff -Nrub postgresql-8.3.1-orig/contrib/tsearch2/tsearch2.c
now contr