RE: Extracting all documents for a given search

2011-09-18 Thread Uwe Schindler
In recent Lucene versions there is an implementation of the mentioned collector to count hits, so there is no need to implement it yourself: http://lucene.apache.org/java/3_4_0/api/core/org/apache/lucene/search/TotalHitCountCollector.html

Uwe

-
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
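A minimal sketch of the collector Uwe links to, assuming a Lucene 3.4 `IndexSearcher` and `Query` supplied by the caller (class and method names here are illustrative, not from the thread):

```java
import java.io.IOException;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.TotalHitCountCollector;

public final class HitCountSketch {
    // Counts all matches without ranking or collecting documents.
    static int countHits(IndexSearcher searcher, Query query) throws IOException {
        TotalHitCountCollector collector = new TotalHitCountCollector();
        searcher.search(query, collector);
        return collector.getTotalHits();
    }
}
```

Because the collector only increments a counter, this avoids the priority-queue work that `search(query, n)` does for top-N results.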

Re: Extracting all documents for a given search

2011-09-18 Thread Trejkaz
On Mon, Sep 19, 2011 at 3:50 AM, Charlie Hubbard wrote:
> Here was the prior API I was calling:
>
>        Hits hits = getSearcher().search( query, filter, sort );
>
> The new API:
>
>        TopDocs hits = getSearcher().search( query, filter, startDoc + length, sort );
>
> So the question is wh

lukeall.jar for lucene version3.4

2011-09-18 Thread janwen
Where is the Luke release that goes with Lucene 3.4? I cannot browse an index generated by Lucene 3.4. Thanks.

2011-09-19
janwen | China
website: http://www.qianpin.com/

From: Simon Willnauer
Sent:

Re: use QueryTermExtractor for RangeQueries

2011-09-18 Thread Simon Willnauer
On Sat, Sep 17, 2011 at 7:19 AM, S Eslamian wrote:
> Of course I do this. This is my sample code to get terms:
>
> PrefixQuery pq = new PrefixQuery(new Term("field","hell*"));
> rewritenQuery = indexSearcher.rewrite(pq);
> QueryTermExtractor qte = new QueryTermExtractor();
> WeightedTerm[] wt = qt
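A sketch of that rewrite-then-extract flow against the Lucene 3.x highlighter API, assuming an `IndexSearcher` named `searcher` (the field name and prefix are placeholders). Two details differ from the quoted snippet: `PrefixQuery` takes the prefix text without a trailing `*`, and the rewrite method may need to be forced to a term-expanding one, since the default constant-score rewrite does not expose individual terms:

```java
import org.apache.lucene.index.Term;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.MultiTermQuery;
import org.apache.lucene.search.PrefixQuery;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.highlight.QueryTermExtractor;
import org.apache.lucene.search.highlight.WeightedTerm;

public final class TermExtractSketch {
    static void printTerms(IndexSearcher searcher) throws Exception {
        PrefixQuery pq = new PrefixQuery(new Term("field", "hell")); // prefix only, no '*'
        // Expand into a BooleanQuery of real terms instead of a constant-score filter.
        pq.setRewriteMethod(MultiTermQuery.SCORING_BOOLEAN_QUERY_REWRITE);
        Query rewritten = searcher.rewrite(pq);
        for (WeightedTerm t : QueryTermExtractor.getTerms(rewritten)) { // static method
            System.out.println(t.getTerm() + " " + t.getWeight());
        }
    }
}
```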

Re: Extracting all documents for a given search

2011-09-18 Thread Charlie Hubbard
Here was the prior API I was calling:

       Hits hits = getSearcher().search( query, filter, sort );

The new API:

       TopDocs hits = getSearcher().search( query, filter, startDoc + length, sort );

So the question is what new API can I use that allows me to extract all documents matching t
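One way to bridge the two APIs, following the collector Uwe mentions elsewhere in this digest: count the matches first, then request exactly that many sorted hits. A sketch assuming Lucene 3.x objects passed in by the caller (the class and method names are illustrative):

```java
import java.io.IOException;
import org.apache.lucene.document.Document;
import org.apache.lucene.search.Filter;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.Sort;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.search.TotalHitCountCollector;

public final class FetchAllSketch {
    // Two passes: count the matches, then ask for exactly that many sorted results.
    static void processAll(IndexSearcher searcher, Query query, Filter filter, Sort sort)
            throws IOException {
        TotalHitCountCollector counter = new TotalHitCountCollector();
        searcher.search(query, filter, counter);
        int total = counter.getTotalHits();
        if (total == 0) return;

        TopDocs hits = searcher.search(query, filter, total, sort);
        for (ScoreDoc sd : hits.scoreDocs) {
            Document doc = searcher.doc(sd.doc);
            // ... use doc ...
        }
    }
}
```

Note that loading every matching document this way can be expensive on large result sets; a custom `Collector` that streams doc IDs avoids building the full `TopDocs` in memory.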

Re: Size of lucene norm file

2011-09-18 Thread Erick Erickson
Here's a useful link as well: http://lucene.apache.org/java/3_0_2/fileformats.html#file-names

Erick

On Sun, Sep 18, 2011 at 1:17 AM, roz dev wrote:
> Norms (*.nrm)
>
> Norms are an index time normalization factor that can be factored into
> scoring. Document and field boosts as well as length n

Re: Size of lucene norm file

2011-09-18 Thread Li Li
docNum * indexedFieldsNum * 1 byte. You should disable norms on indexed fields that are not used for relevance ranking.

On Sun, Sep 18, 2011 at 5:20 AM, roz dev wrote:
> Hi,
>
> I want to estimate the size of NORM file that lucene will generate for a 20
> Gb index which has 2.5 Million Docs and 50 fields

RE: How to estimate the size of lucene .nrm file

2011-09-18 Thread Uwe Schindler
Hi,

The size is easy to calculate (it needs one byte per document and field):

    [number of documents] * 1 byte * [number of indexed fields with norms enabled]

Uwe
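Plugging the numbers from this thread (2.5 million documents, 50 fields, assuming norms are enabled on all of them) into that formula, as a small self-contained calculation:

```java
public final class NormSizeEstimate {
    // One byte per document per indexed field with norms enabled.
    static long nrmBytes(long numDocs, long fieldsWithNorms) {
        return numDocs * fieldsWithNorms;
    }

    public static void main(String[] args) {
        // The index described in the thread: 2.5 million docs, 50 fields.
        long bytes = nrmBytes(2_500_000L, 50L);
        System.out.println(bytes + " bytes"); // 125000000 bytes, roughly 119 MB
    }
}
```

The same figure applies to RAM, since norms are loaded into memory one byte per document per normed field.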

RE: Size of lucene norm file

2011-09-18 Thread Uwe Schindler
Hi,

You can disable norms for fields where you don't need them. For standard full-text searches they are important, but for e.g. primary key lookups, numeric fields, or fields that are only used for sorting, they are useless.

Uwe
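A sketch of how norms are disabled at indexing time in Lucene 3.x (the field names and values here are placeholders, not from the thread):

```java
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;

public final class OmitNormsSketch {
    static Document build() {
        Document doc = new Document();
        // Primary-key style field: not analyzed, norms omitted.
        doc.add(new Field("id", "42",
                Field.Store.YES, Field.Index.NOT_ANALYZED_NO_NORMS));
        // Tokenized field that is searched but should not contribute length norms.
        doc.add(new Field("category", "books fiction",
                Field.Store.NO, Field.Index.ANALYZED_NO_NORMS));
        // Alternatively, on an existing field instance: field.setOmitNorms(true);
        return doc;
    }
}
```

One caveat worth knowing: if any document in a segment has norms enabled for a field, the norms file still allocates a byte for every document in that segment for that field, so omitting norms only pays off when it is done consistently.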

Re: Size of lucene norm file

2011-09-18 Thread roz dev
Norms (*.nrm)

Norms are an index time normalization factor that can be factored into scoring. Document and field boosts as well as length normalization are applied with norms. When in memory, norms occupy one byte per document for each field with norms on, even if only one document has norms on fo

Re: Size of lucene norm file

2011-09-18 Thread janwen
What is a NORM file?

On 2011-9-18 5:20, roz dev wrote:
> Hi,
>
> I want to estimate the size of NORM file that lucene will generate for a 20
> Gb index which has 2.5 Million Docs and 50 fields in each document.
> Is there any formula to predict it? And, what is the RAM cost of this nrm file?
>
> Thanks
> S