Re: How do i get a text summary

2008-02-28 Thread h t
Hi Karl, where can I find a description of the algorithm below? Thanks. "Very simple algorithmic solutions usually involve ranking top sentences by looking at distribution of terms in sentences, paragraphs and the whole document. I implemented something like this a couple of years back that worked fairly w
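The quoted description does not give Karl's actual method, so here is only a minimal sketch of the general idea: rank sentences by how frequent their terms are across the whole document. It assumes sentences end in '.', '!' or '?', scores each sentence by the average document-wide count of its terms (so long sentences are not automatically favored), and ignores the paragraph level and stop words. The class and method names are hypothetical.

```java
import java.util.*;

class SentenceRanker {
    // Crude assumption: a sentence ends at '.', '!' or '?' followed by whitespace.
    static List<String> splitSentences(String text) {
        List<String> out = new ArrayList<>();
        for (String s : text.split("(?<=[.!?])\\s+")) {
            String t = s.trim();
            if (!t.isEmpty()) out.add(t);
        }
        return out;
    }

    // Lower-cases and splits on anything that is not a letter or digit.
    static List<String> terms(String s) {
        List<String> out = new ArrayList<>();
        for (String w : s.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!w.isEmpty()) out.add(w);
        }
        return out;
    }

    // Returns the k highest-scoring sentences, kept in their original order.
    static List<String> summarize(String text, int k) {
        List<String> sentences = splitSentences(text);

        // Count how often each term occurs in the whole document.
        Map<String, Integer> termCounts = new HashMap<>();
        for (String s : sentences)
            for (String w : terms(s)) termCounts.merge(w, 1, Integer::sum);

        // Score = average document-wide count of the sentence's terms.
        double[] scores = new double[sentences.size()];
        for (int i = 0; i < sentences.size(); i++) {
            List<String> ts = terms(sentences.get(i));
            double sum = 0;
            for (String w : ts) sum += termCounts.get(w);
            scores[i] = ts.isEmpty() ? 0 : sum / ts.size();
        }

        // Sort indices by descending score (stable, so ties keep document order),
        // take the top k, then restore document order for readability.
        Integer[] idx = new Integer[sentences.size()];
        for (int i = 0; i < idx.length; i++) idx[i] = i;
        Arrays.sort(idx, (a, b) -> Double.compare(scores[b], scores[a]));
        List<Integer> chosen = new ArrayList<>(Arrays.asList(idx).subList(0, Math.min(k, idx.length)));
        Collections.sort(chosen);

        List<String> summary = new ArrayList<>();
        for (int i : chosen) summary.add(sentences.get(i));
        return summary;
    }
}
```

On real text, common words such as "the" would dominate these counts, so a stop-word list (or an idf-style weight) would be the first thing to add.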

Re: Vector Space Model: New Similarity Implementation Issues

2008-02-28 Thread h t
Compared with the classical VSM, Lucene simply ignores the denominator (|Q|*|D|) of the similarity formula, but it adds norm(t,d) and coord(q,d) to account for the fraction of terms in the query and the document, so in practice it is a modified implementation of VSM. Do you just want to verify which implementation of VSM in "
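To make the contrast concrete, here is a small sketch comparing the classical cosine score with a modified score in the spirit described above: no |Q|*|D| denominator, but a coord factor (fraction of query terms present in the document) and a per-document norm. The norm of 1/sqrt(document length) and the class name are assumptions for illustration; this is not Lucene's actual scoring code, which also involves idf, boosts and query normalization.

```java
import java.util.*;

class VsmScores {
    // Dot product over the terms shared by two sparse term-weight vectors.
    static double dot(Map<String, Double> q, Map<String, Double> d) {
        double sum = 0;
        for (Map.Entry<String, Double> e : q.entrySet()) {
            Double w = d.get(e.getKey());
            if (w != null) sum += e.getValue() * w;
        }
        return sum;
    }

    // Euclidean length |V| of a term-weight vector.
    static double length(Map<String, Double> v) {
        double sum = 0;
        for (double w : v.values()) sum += w * w;
        return Math.sqrt(sum);
    }

    // Classical VSM: cosine = (Q . D) / (|Q| * |D|).
    static double cosine(Map<String, Double> q, Map<String, Double> d) {
        double denom = length(q) * length(d);
        return denom == 0 ? 0 : dot(q, d) / denom;
    }

    // Modified score: drops |Q| * |D|, multiplies by coord and a per-document norm.
    // Assumption: norm = 1 / sqrt(number of terms in the document).
    static double modified(Map<String, Double> q, Map<String, Double> d, int docLength) {
        int matched = 0;
        for (String t : q.keySet()) if (d.containsKey(t)) matched++;
        double coord = q.isEmpty() ? 0 : (double) matched / q.size();
        double norm = docLength == 0 ? 0 : 1.0 / Math.sqrt(docLength);
        return coord * norm * dot(q, d);
    }
}
```

For a query {a:1, b:1} and a 9-term document with weights {a:2, c:2}, the cosine is 2 / (sqrt(2) * sqrt(8)) = 0.5, while the modified score is coord 1/2 times norm 1/3 times dot 2, about 0.333.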

Re: Lucene Search Performance

2008-02-27 Thread h t
50.67 186.41 38936841 143240688 > > See attached for hardware info and the CPU call tree (taken from YourKit). > > I would appreciate your recommendations. > > > Jamie > > > h t wrote: > Hi Michael, > I guess the hotspot of lucene is > org.apach

Re: Security filtering from external DB

2008-02-26 Thread h t
I guess you can implement createBitSet() more efficiently by using a Filter rather than a BooleanQuery. 2008/2/25, Gabriel Landais <[EMAIL PROTECTED]>: > > Gabriel Landais wrote: > > > How to create a Filter for a field in Collection? > > First, split Collection in Collection with > > BooleanQuery.maxCl
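A plain-Java sketch of the bit-set idea, leaving out the Lucene wiring: assume an array that maps each document number to its external key (for example an ID stored in the index) and the set of keys the external DB allows. One pass sets a bit per permitted document, instead of building query clauses per key, and the result can be ANDed with a set of hits. The class name, method names and the docKeys array are hypothetical.

```java
import java.util.*;

class SecurityBits {
    // docKeys[i] is the external key of document number i (assumed available);
    // allowedKeys is what the external DB says the current user may see.
    static BitSet createBitSet(String[] docKeys, Set<String> allowedKeys) {
        BitSet bits = new BitSet(docKeys.length);
        for (int doc = 0; doc < docKeys.length; doc++) {
            // One set lookup per document; no per-key query clauses needed.
            if (docKeys[doc] != null && allowedKeys.contains(docKeys[doc])) bits.set(doc);
        }
        return bits;
    }

    // Keeps only the hits the user is allowed to see, without modifying the input.
    static BitSet restrict(BitSet hits, BitSet allowed) {
        BitSet result = (BitSet) hits.clone();
        result.and(allowed);
        return result;
    }
}
```

The cost is linear in the number of documents regardless of how many keys the DB returns, which is why it avoids splitting the key collection into query-sized chunks.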

Re: Inconsistent Search Speed

2008-02-26 Thread h t
Did you use the same keywords in both calls? 2008/2/27, fangz <[EMAIL PROTECTED]>: > > > Hi, > > I am using a simple java program to test the search speed. The index file > is > about 1.93G in size. I initiated an indexsearcher and built a query using > the query parser: parser.parse("entity:fail"). The

Re: Lucene Search Performance

2008-02-26 Thread h t
Hi Michael, I guess the hotspot of Lucene is org.apache.lucene.search.IndexSearcher.search(). Hi Jamie, what is the original text size of a million emails? I estimate the size of an email at around 100k; is that right? When you search, what kind of keywords do you input: words or short sentences?

Re: EdgeNGramTokenizer

2007-11-04 Thread h t
http://www.shifttab.cn:8001/wiki 2007/10/31, Marco <[EMAIL PROTECTED]>: > > It seems that the problem is when I add the token created by > EdgeNGramTokenizer in the index. > If the token contains a space (for example apple com) I have to add to > the index with Field.Index.TOKENIZED otherwise t
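For readers unfamiliar with edge n-grams, here is a plain-Java sketch (not the Lucene EdgeNGramTokenizer class itself) of the front-edge grams produced for a token. It assumes minGram is at least 1 and shows why a token like "apple com" yields grams that themselves contain a space, which is the situation Marco describes. The class and method names are hypothetical.

```java
import java.util.*;

class EdgeNGrams {
    // Front-edge n-grams of one token: prefixes of length minGram..maxGram.
    // Assumes minGram >= 1; prefixes longer than the token are skipped.
    static List<String> frontGrams(String token, int minGram, int maxGram) {
        List<String> grams = new ArrayList<>();
        for (int n = minGram; n <= Math.min(maxGram, token.length()); n++) {
            grams.add(token.substring(0, n));
        }
        return grams;
    }
}
```

frontGrams("apple", 1, 3) gives a, ap, app, while frontGrams("apple com", 1, 7) ends with "apple " and "apple c", grams that include the space.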