Re: Lucene tokenization

2012-03-27 Thread Paul Libbrecht
Nilesh, the StandardAnalyzer is full of generally useful special cases, including email and number detection. I suppose you hit one such special case, which has a justification of some sort. I can't tell you why, but I can tell you it's really hard to change because others rely on this somehow

Re: NumericField exception java.lang.IllegalStateException: call set???Value() before usage in lucene 3.5

2012-03-27 Thread jianwen lou
Thanks so much, Brandon Mintern. My mistake, sorry everyone. On Wed, Mar 28, 2012 at 3:12 AM, Brandon Mintern wrote: > On Tue, Mar 27, 2012 at 12:21 AM, jianwen lou wrote: > > I want to store the long type value to my index files like following: > > > >NumericField priceField = ne

Re: boolean score calculation

2012-03-27 Thread Pavel Goncharik
Dear Lucene users and developers, sorry for getting back to this old subject, but we are now re-evaluating our current implementation, which uses a recompiled version of Lucene 3 with boolean scorers multiplying sub-scores. I was hoping that "flexible ranking" in Lucene 4 will provid

Re: Document-Ids and Merges

2012-03-27 Thread Shai Erera
Or ... move to use a per-segment array. Then you don't need to rely on doc IDs changing. You will need to build the array from the documents that are in that segment only. It's like FieldCache in a way. The array is relevant as long as the segment exists (i.e. not merged away). Hope this helps.
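A minimal sketch of the per-segment idea, in plain Java with no Lucene dependency. Everything named here is hypothetical: PerSegmentScores, Loader, and the opaque segmentKey object are illustrations, not Lucene API. The assumption is that some object identifies each segment (much as FieldCache is keyed per reader) and that scores can be loaded for just the documents in that segment.

```java
import java.util.Map;
import java.util.WeakHashMap;

/** Hypothetical cache: one float[] per segment, indexed by segment-local doc ID. */
final class PerSegmentScores {

    /** Hypothetical callback that builds the score array for one segment only. */
    interface Loader {
        float[] load(Object segmentKey);
    }

    // Weak keys: once a segment is merged away and its key is unreachable,
    // the corresponding array becomes eligible for garbage collection.
    private final Map<Object, float[]> cache = new WeakHashMap<Object, float[]>();
    private final Loader loader;

    PerSegmentScores(Loader loader) {
        this.loader = loader;
    }

    /** Returns the score for a document addressed by its ID within the segment. */
    synchronized float score(Object segmentKey, int localDocId) {
        float[] scores = cache.get(segmentKey);
        if (scores == null) {
            // First request for this segment: build its array once.
            scores = loader.load(segmentKey);
            cache.put(segmentKey, scores);
        }
        return scores[localDocId];
    }

    /** Number of segments that currently have a cached array. */
    synchronized int cachedSegments() {
        return cache.size();
    }
}
```

Because each array is addressed by segment-local IDs, a merge only means building one new array for the merged segment; arrays for untouched segments stay valid.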

Re: NumericField exception java.lang.IllegalStateException: call set???Value() before usage in lucene 3.5

2012-03-27 Thread Brandon Mintern
On Tue, Mar 27, 2012 at 12:21 AM, jianwen lou wrote: > I want to store the long type value to my index files like following: > > NumericField priceField = new NumericField("price"); > priceField.setDoubleValue(temp.getCurrentprice()); > document.add(pric

Re: delete entries from posting list Lucene 4.0

2012-03-27 Thread Zeynep P.
While using the pruning package, I realised that ridf is calculated in RIDFTermPruningPolicy as follows: Math.log(1 - Math.pow(Math.E, termPositions.freq() / maxDoc)) - df However, according to the original paper (Blanco et al.) for residual idf, it should be -log(df/D) + log (1 - e^(*-*tf/D)). T
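For reference, a small sketch of the residual idf formula as quoted above from the paper, -log(df/D) + log(1 - e^(-tf/D)). This is not the pruning package's code; the class and method names are made up for illustration, and tf, df and numDocs are assumed to be plain counts.

```java
/** Hypothetical helper: residual idf as quoted in the message above. */
final class ResidualIdf {

    private ResidualIdf() {}

    /**
     * Computes -log(df/D) + log(1 - e^(-tf/D)).
     * tf = term frequency, df = document frequency, numDocs = D.
     */
    static double residualIdf(long tf, long df, long numDocs) {
        double d = (double) numDocs;         // force floating-point division
        double observedIdf = -Math.log(df / d);
        // 1 - e^(-x) is written as -expm1(-x), which stays accurate for tiny x.
        double logOneMinusExp = Math.log(-Math.expm1(-tf / d));
        return observedIdf + logOneMinusExp;
    }
}
```

For example, ResidualIdf.residualIdf(3, 10, 1000) evaluates the formula for tf = 3, df = 10, D = 1000. Note that with a positive exponent, 1 - e^(tf/D) is negative for any tf > 0, so its logarithm is NaN; the sign of the exponent matters. Whether the quoted expression also suffers from integer division depends on the operand types, which the excerpt does not show.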

RE: TVD, TVX and TVF files

2012-03-27 Thread Uwe Schindler
Maybe you only see CFS files? If this is the case, your index is in compound file format. In that case (the default), to get the raw files, disable compound files in the merge policy! - Uwe Schindler H.-H.-Meier-Allee 63, D-28213 Bremen http://www.thetaphi.de eMail: u...@thetaphi.de > -Or

Re: TVD, TVX and TVF files

2012-03-27 Thread Michael McCandless
The code seems OK on quick glance... Are you closing the writer? Are you hitting any exceptions? Mike McCandless http://blog.mikemccandless.com On Tue, Mar 27, 2012 at 12:19 PM, Luis Paiva wrote: > Hey all, > > i'm in my first steps in Lucene. > I was trying to index some txt files, and my pr

RE: Lucene tokenization

2012-03-27 Thread Steven A Rowe
Hi Nilesh, Which version of Lucene are you using? StandardTokenizer behavior changed in v3.1. Steve -Original Message- From: Nilesh Vijaywargiay [mailto:nilesh.vi...@gmail.com] Sent: Tuesday, March 27, 2012 2:04 PM To: java-user@lucene.apache.org Subject: Lucene tokenization I have a

TVD, TVX and TVF files

2012-03-27 Thread Luis Paiva
Hey all, I'm taking my first steps in Lucene. I was trying to index some txt files, and my program doesn't create the term vector files (.tvd, .tvx, .tvf), which I need. I'm attaching my code so anyone can help me. Thank you all in advance! Sorry if I'm repeating the question, but

Re: Document-Ids and Merges

2012-03-27 Thread Michael McCandless
In general how Lucene assigns docIDs is a volatile implementation detail: it's free to change from release to release. Eg, the default merge policy (TieredMergePolicy) merges out-of-order segments. Another eg: at one point, IndexSearcher re-ordered the segments on init. Another: because Concurre

Re: NumericField exception java.lang.IllegalStateException: call set???Value() before usage in lucene 3.5

2012-03-27 Thread Erick Erickson
I'll, of course, defer to Uwe for technical Lucene issues, but it looks like you've got a copy/paste error. I doubt it's the root of your problem, but this code reuses priceField; it seems you intended the second to use salesField NumericField priceField = new NumericField("price");

Re: NumericField exception java.lang.IllegalStateException: call set???Value() before usage in lucene 3.5

2012-03-27 Thread jianwen lou
It seems that the Analyzer I used in my project is the problem. I use CJKAnalyzer, and I do not fully understand the Lucene analysis and tokenizer process. Is there another way to do this? I want to store numbers and date-times in Lucene fields and use those fields to filter and range-search. Thanks

RE: NumericField exception java.lang.IllegalStateException: call set???Value() before usage in lucene 3.5

2012-03-27 Thread Uwe Schindler
Hi, > I don't fully understand the precisionStep arg. Do I need to add the arg? RTFM: http://goo.gl/PlhhO > On Tue, Mar 27, 2012 at 3:48 PM, jianwen lou wrote: > > > No, there is no multi-threaded index building at the same time. I googled and > > got this result. I use a 64-bit JVM. Does it matter? > > > >

RE: NumericField exception java.lang.IllegalStateException: call set???Value() before usage in lucene 3.5

2012-03-27 Thread Uwe Schindler
The bug mentioned in this link was a multithreading bug (which is what I asked you about). If you reuse Documents and Fields this can happen, otherwise not. This code is heavily tested and the code you sent cannot fail. Maybe it's different from the one you actually use? - Uwe Schindler H.-H.-Meier-Allee 63, D-282

Re: NumericField exception java.lang.IllegalStateException: call set???Value() before usage in lucene 3.5

2012-03-27 Thread jianwen lou
I don't fully understand the precisionStep arg. Do I need to add the arg? On Tue, Mar 27, 2012 at 3:48 PM, jianwen lou wrote: > No, there is no multi-threaded index building at the same time. > I googled and got this result. I use a 64-bit JVM. Does it matter? > > > http://lucene.472066.n3.nabble.com/Lucene-3-

Re: NumericField exception java.lang.IllegalStateException: call set???Value() before usage in lucene 3.5

2012-03-27 Thread jianwen lou
No, there is no multi-threaded index building at the same time. I googled and got this result. I use a 64-bit JVM. Does it matter? http://lucene.472066.n3.nabble.com/Lucene-3-4-shift-bug-in-possibly-invalid-use-of-NumericTokenStream-td3592962.html F:\Java\open-source\lucene>java -version java version "1.6.0_25"

Document-Ids and Merges

2012-03-27 Thread Christoph Kaser
Hi all, I have a search application with 16 million documents that uses custom scores per document using a ValueSource. These values are updated a lot (and sometimes all at once), so I can't really write them into the index for performance reasons. Instead, I simply have a huge array of float

RE: NumericField exception java.lang.IllegalStateException: call set???Value() before usage in lucene 3.5

2012-03-27 Thread Uwe Schindler
Hi, Are you sure that you are not reusing the same NumericField instances across different threads? - Uwe Schindler H.-H.-Meier-Allee 63, D-28213 Bremen http://www.thetaphi.de eMail: u...@thetaphi.de > -Original Message- > From: jianwen lou [mailto:loujan...@gmail.com] > Sent: Tuesd