Hi Karl,
Where can I find an introduction to the algorithm below? Thanks.
"Very simple algorithmic solutions usually involve ranking top sentences
by looking at distribution of terms in sentences, paragraphs and the
whole document. I implemented something like this a couple of years back
that worked fairly w
Compared with the classical VSM, Lucene simply ignores the denominator
(|Q|*|D|) of the similarity formula,
but it adds norm(t,d) and coord(q,d) to account for the fraction of terms
shared by the query and the doc,
so in practice it's a modified implementation of VSM.
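
To make the difference concrete, here is a rough sketch in plain Java, with no Lucene dependency. The class and method names (VsmSketch, cosine, luceneStyle) are made up for illustration. The square-root tf and the 1/sqrt(length) norm are assumptions modeled on Lucene's default similarity; idf, boosts and queryNorm are left out, so this is not Lucene's actual scoring code.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration only: not part of Lucene.
class VsmSketch {

    /** Counts how often each whitespace-separated term occurs in the text. */
    static Map<String, Integer> termFreqs(String text) {
        Map<String, Integer> tf = new HashMap<>();
        for (String t : text.toLowerCase().trim().split("\\s+")) {
            if (!t.isEmpty()) {
                tf.merge(t, 1, Integer::sum);
            }
        }
        return tf;
    }

    /** Euclidean length of a raw term-frequency vector. */
    static double length(Map<String, Integer> v) {
        double sum = 0;
        for (int f : v.values()) {
            sum += (double) f * f;
        }
        return Math.sqrt(sum);
    }

    /** Classical VSM: dot product divided by |Q|*|D|. */
    static double cosine(Map<String, Integer> q, Map<String, Integer> d) {
        double dot = 0;
        for (Map.Entry<String, Integer> e : q.entrySet()) {
            dot += e.getValue() * d.getOrDefault(e.getKey(), 0);
        }
        double denom = length(q) * length(d);
        return denom == 0 ? 0 : dot / denom;
    }

    /**
     * Simplified Lucene-style score (assumption): no |Q|*|D| denominator,
     * but a per-document length norm and a coord factor instead.
     */
    static double luceneStyle(Map<String, Integer> q, Map<String, Integer> d) {
        int docLen = 0;
        for (int f : d.values()) {
            docLen += f;
        }
        if (docLen == 0 || q.isEmpty()) {
            return 0;
        }
        double norm = 1.0 / Math.sqrt(docLen); // stands in for norm(t,d)
        int overlap = 0;
        double sum = 0;
        for (String term : q.keySet()) {
            Integer f = d.get(term);
            if (f != null) {
                overlap++;
                sum += Math.sqrt(f) * norm;    // tf * norm for a matching term
            }
        }
        double coord = (double) overlap / q.size(); // fraction of query terms matched
        return coord * sum;
    }
}
```

Calling both methods on termFreqs("apple pie") and termFreqs("apple apple tart") shows how the two scores diverge: the cosine depends on both vector lengths, while the simplified score depends only on the doc length and on how many query terms matched.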
Do you just want to verify which implementation of VSM in "
50.67 186.41 38936841 143240688
>
> See attached for hardware info and the CPU call tree (taken from YourKit).
>
> I would appreciate your recommendations.
>
>
> Jamie
>
>
> h t wrote:
> Hi Michael,
> I guess the hotspot of lucene is
> org.apach
I guess you can implement createBitSet() more efficiently by using a
Filter rather than a BooleanQuery.
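
A rough sketch of the idea, in plain Java so it runs without Lucene. The postings map is a hypothetical stand-in for what the index holds (term value to matching doc ids), and this createBitSet() signature is my assumption, not your actual method. In a real Filter you would walk the term's postings from the IndexReader instead of a map.

```java
import java.util.BitSet;
import java.util.Collection;
import java.util.Map;

// Hypothetical illustration only: a map stands in for the index postings.
class TermsBitSetSketch {

    /**
     * Sets one bit per document that contains any of the wanted values.
     * No query clauses are built, so the number of values is not limited
     * by a clause count.
     */
    static BitSet createBitSet(Map<String, int[]> postings,
                               Collection<String> values,
                               int maxDoc) {
        BitSet bits = new BitSet(maxDoc);
        for (String value : values) {
            int[] docs = postings.get(value);
            if (docs == null) {
                continue; // value does not occur in the index
            }
            for (int doc : docs) {
                bits.set(doc);
            }
        }
        return bits;
    }
}
```

Since each value only ORs its doc ids into one BitSet, there is no need to split the Collection into chunks first.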
2008/2/25, Gabriel Landais <[EMAIL PROTECTED]>:
>
> Gabriel Landais a écrit :
>
> > How to create a Filter for a field in Collection?
> > First, split Collection in Collection with
> > BooleanQuery.maxCl
Did you use the same keywords in both calls?
2008/2/27, fangz <[EMAIL PROTECTED]>:
>
>
> Hi,
>
> I am using a simple java program to test the search speed. The index file is
> about 1.93G in size. I initiated an indexsearcher and built a query using
> the query parser: parser.parse("entity:fail"). The
Hi Michael,
I guess the hotspot in Lucene is
org.apache.lucene.search.IndexSearcher.search()
Hi Jamie,
What's the original text size of the million emails?
I estimate that one email is around 100k; is that right?
When you search, what kind of keywords do you enter: single words or short
sentences?
http://www.shifttab.cn:8001/wiki
2007/10/31, Marco <[EMAIL PROTECTED]>:
>
> It seems that the problem occurs when I add the token created by
> EdgeNGramTokenizer to the index.
> If the token contains a space (for example apple com) I have to add to
> the index with Field.Index.TOKENIZED otherwise t