Is indexing much slower in 3.5.0 than in 2.4.1 for Wikipedia data?

2011-12-11 Thread Sean Tong
Hi, We plan to upgrade the Lucene library in our application from 2.4.1 to 3.5.0. I have been running the benchmark tests that come with Lucene. To my surprise, I found that indexing in 3.5.0 is significantly slower than in 2.4.1 for the Wikipedia data. Attached is the algorithm for the tests.

Re: tokenizing text using language analyzer but preserving stopwords if possible

2011-12-11 Thread KARTHIK SHIVAKUMAR
Hi, >> tokenize the original foreign text into words You need to identify the appropriate analyzer (for the foreign language, before indexing ...). With regards, Karthik On Wed, Dec 7, 2011 at 4:57 PM, Avi Rosenschein wrote: > On Wed, Dec 7, 2011 at 00:41, Ilya Zavorin wrote: > > > I need to implement a "

Re: Lucene bangalore chapter

2011-12-11 Thread KARTHIK SHIVAKUMAR
Hi I definitely think there is NONE.. ;) with regards karthik On Tue, Dec 6, 2011 at 11:41 AM, Vinaya Kumar Thimmappa < vthimma...@ariba.com> wrote: > is there a lucene Bangalore chapter ? > > > -Vinaya > > > - > To unsu

Re: Score per position

2011-12-11 Thread arnon ma
Lisheng, thanks for the response. If I understand correctly, both IndexDocValues and FieldCache are based on a single value per Document per Field, which can be taken into account for scoring. Is that correct? We need a value per Document per Field *per Term*, like term frequency. Can this be