Re: Am I correctly parsing the strings ? Terms or Phrases ?

2011-03-21 Thread Ahmet Arslan
> Date: Monday, March 21, 2011, 7:39 PM
> One more thing: It is actually not clear to me how to use PhraseQuery... I
> thought I can just pass a phrase to it, but I see only add(Term) method...
> should I parse the string by myself to single terms ?

Yes, you need to do it. QueryParser transf
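A minimal sketch of the "parse it yourself" step confirmed above, assuming the field was indexed with a whitespace-based analyzer (as mentioned elsewhere in this thread). The class `PhraseTerms` and the method `splitOnWhitespace` are hypothetical names, not part of Lucene; the only Lucene call referenced is `PhraseQuery.add(Term)`, and it appears in a comment only so that the block runs without Lucene on the classpath.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: splits a phrase into the single tokens a PhraseQuery expects. */
final class PhraseTerms {

    /** Splits on runs of whitespace, roughly what a whitespace-based analyzer would do. */
    static List<String> splitOnWhitespace(String phrase) {
        List<String> tokens = new ArrayList<>();
        for (String token : phrase.trim().split("\\s+")) {
            if (!token.isEmpty()) { // splitting an empty string yields one empty token
                tokens.add(token);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Each token would then become one term of the phrase, for example:
        //   phraseQuery.add(new Term("description", token));
        for (String token : splitOnWhitespace("my string")) {
            System.out.println(token);
        }
    }
}
```

If the index was built with a different analyzer, the query text should probably be run through that same analyzer instead, so that the query terms match the indexed tokens.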

Re: Performance problems with lazily loaded fields

2011-03-21 Thread Sanne Grinovero
2011/3/21 Brian Hurt :
> I'm having a problem with the performance of lazily-loaded fields with
> lucene. The basic structure of the code is that I get a set of documents
> back from a query, then iterate through them, reading most fields to collect
> fragments. This is taking an excessively long

Re: Append Codec random testing

2011-03-21 Thread Simon Willnauer
Jason, please mail to dev@l.a.o

simon

On Mon, Mar 21, 2011 at 6:09 PM, Jason Rutherglen wrote:
> I'm seeing an error when using the misc Append codec.
>
> java.lang.AssertionError
> at org.apache.lucene.store.ByteArrayDataInput.readBytes(ByteArrayDataInput.java:107)
> at org.apache.lucene.

Re: termIndexInterval, CheckIndex, size of tis file and Lucene index compression

2011-03-21 Thread Michael McCandless
Your math is right -- looks like it really is ~9 bytes per term (assuming no bugs in CheckIndex!). How long did this CheckIndex take to run...? On the file format, one correction: if the docFreq is < skipInterval (default 16) then there is no skip data and we don't write the SkipDelta. The vast

Performance problems with lazily loaded fields

2011-03-21 Thread Brian Hurt
I'm having a problem with the performance of lazily-loaded fields with Lucene. The basic structure of the code is that I get a set of documents back from a query, then iterate through them, reading most fields to collect fragments. This is taking an excessively long amount of time, mostly in my c

Building a query of single terms...

2011-03-21 Thread Patrick Diviacco
I'm new to Lucene and I would like to know what's the difference (if there is any) between

PhraseQuery.add(Term1)
PhraseQuery.add(Term2)
PhraseQuery.add(Term3)

and

term1 = new TermQuery(new Term(...));
booleanQuery.add(term1, BooleanClause.Occur.SHOULD);
term2 = new TermQuery(new Term(...));
bo
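As a rough illustration of the two matching behaviours being asked about: a PhraseQuery with default slop matches only when its terms occur next to each other and in order, while a BooleanQuery made only of SHOULD clauses matches when at least one term occurs anywhere in the field. The sketch below is a simplified plain-Java model of those semantics over a token list, not Lucene's implementation; `QuerySemantics`, `matchesPhrase` and `matchesAnyTerm` are hypothetical names, and scoring is ignored.

```java
import java.util.Arrays;
import java.util.List;

/** Simplified model of phrase matching versus SHOULD-only boolean matching. */
final class QuerySemantics {

    /** Phrase semantics: all terms must appear consecutively and in the given order. */
    static boolean matchesPhrase(List<String> docTokens, List<String> phrase) {
        if (phrase.isEmpty()) {
            return false;
        }
        for (int start = 0; start + phrase.size() <= docTokens.size(); start++) {
            if (docTokens.subList(start, start + phrase.size()).equals(phrase)) {
                return true;
            }
        }
        return false;
    }

    /** SHOULD-only boolean semantics: at least one term must appear somewhere. */
    static boolean matchesAnyTerm(List<String> docTokens, List<String> terms) {
        for (String term : terms) {
            if (docTokens.contains(term)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<String> doc = Arrays.asList("the", "quick", "brown", "fox");
        List<String> reversed = Arrays.asList("brown", "quick");
        System.out.println(matchesPhrase(doc, reversed));  // wrong order, no phrase match
        System.out.println(matchesAnyTerm(doc, reversed)); // both terms occur, boolean match
    }
}
```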

Re: Am I correctly parsing the strings ? Terms or Phrases ?

2011-03-21 Thread Patrick Diviacco
One more thing: It is actually not clear to me how to use PhraseQuery... I thought I can just pass a phrase to it, but I see only add(Term) method... should I parse the string by myself to single terms ?

On 21 March 2011 18:05, Patrick Diviacco wrote:
>> If description field is tokenized/ana

termIndexInterval, CheckIndex, size of tis file and Lucene index compression

2011-03-21 Thread Burton-West, Tom
I'm trying to get a feel for the impact of changing the termIndexInterval from the default of 128 to 1024 (8 * 128). This reduces the size of the tii file to roughly 1/8th but in the worst case requires doing a linear scan of 1024 terms instead of 128 in memory. I'm not so concerned about the perform
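The trade-off described above can be put into numbers with a small sketch. It assumes that one term out of every termIndexInterval terms is placed in the in-memory term index, so the number of index entries shrinks in proportion to the interval while the worst-case linear scan grows to the interval itself. The term count used here is purely illustrative and does not come from the thread; `TermIndexEstimate` and its methods are hypothetical names.

```java
/** Back-of-the-envelope estimate of term index size for a given interval. */
final class TermIndexEstimate {

    /** Number of index entries when every interval-th term is indexed (rounded up). */
    static long indexedTerms(long totalTerms, int interval) {
        return (totalTerms + interval - 1) / interval;
    }

    /** Worst case: the wanted term is the last one before the next index entry. */
    static int worstCaseScan(int interval) {
        return interval;
    }

    public static void main(String[] args) {
        long totalTerms = 1_048_576L; // illustrative value only
        for (int interval : new int[] {128, 1024}) {
            System.out.println("interval=" + interval
                    + " indexEntries=" + indexedTerms(totalTerms, interval)
                    + " worstCaseScan=" + worstCaseScan(interval));
        }
    }
}
```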

Append Codec random testing

2011-03-21 Thread Jason Rutherglen
I'm seeing an error when using the misc Append codec.

java.lang.AssertionError
at org.apache.lucene.store.ByteArrayDataInput.readBytes(ByteArrayDataInput.java:107)
at org.apache.lucene.index.codecs.BlockTermsReader$FieldReader$SegmentTermsEnum._next(BlockTermsReader.java:661)
at org.apache.luce

Re: Am I correctly parsing the strings ? Terms or Phrases ?

2011-03-21 Thread Patrick Diviacco
> If description field is tokenized/analyzed during indexing you need to use
> PhraseQuery.

Uhm yeah I'm using a WhitespaceAnalyzer. This is the code I'm using for indexing:

writer = new IndexWriter(FSDirectory.open(INDEX_DIR), new IndexWriterConfig(org.apache.lucene.util.Version.LUCENE_40, new

Re: Am I correctly parsing the strings ? Terms or Phrases ?

2011-03-21 Thread Ahmet Arslan
> description = new TermQuery(new Term("description", "my string"));
>
> I ask Lucene to consider "my string" as unique word, right?

Correct.

> I actually need to consider each word, should I use
> PhraseQuery instead ?

If description field is tokenized/analyzed during indexing you need

How to normalize Lucene scores... (over all queries)

2011-03-21 Thread Patrick Diviacco
I'm combining several scores for my queries performed with Lucene and other software. My issue is that I have Lucene scores + other scores (not related to Lucene) for each query result. The other scores are all normalized between 0 and 1. I need to normalize Lucene scores (over all queries) beca
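One common way to bring a set of raw scores into the 0 to 1 range is min-max scaling over the pooled scores. The sketch below shows only that arithmetic in plain Java; it is an assumption about what might fit here, not something proposed in the thread, and `ScoreNormalizer` and `minMax` are hypothetical names. Whether raw Lucene scores from different queries are comparable enough to pool this way is a separate question this sketch does not settle.

```java
/** Min-max scaling of raw scores into the range [0, 1]. */
final class ScoreNormalizer {

    /** Returns a new array where the smallest input maps to 0 and the largest to 1. */
    static double[] minMax(double[] scores) {
        if (scores.length == 0) {
            return new double[0];
        }
        double min = scores[0];
        double max = scores[0];
        for (double s : scores) {
            min = Math.min(min, s);
            max = Math.max(max, s);
        }
        double range = max - min;
        double[] normalized = new double[scores.length];
        for (int i = 0; i < scores.length; i++) {
            // If all scores are equal there is no spread; map them all to 1.0.
            normalized[i] = (range == 0.0) ? 1.0 : (scores[i] - min) / range;
        }
        return normalized;
    }

    public static void main(String[] args) {
        double[] pooled = {2.0, 4.0, 6.0}; // illustrative raw scores
        for (double s : minMax(pooled)) {
            System.out.println(s);
        }
    }
}
```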

Am I correctly parsing the strings ? Terms or Phrases ?

2011-03-21 Thread Patrick Diviacco
I'm new to Lucene. If I use description = new TermQuery(new Term("description", "my string")); I ask Lucene to consider "my string" as unique word, right ? I actually need to consider each word, should I use PhraseQuery instead ? Or is it correct ? thanks

Re: Urgent! Forgot to close IndexWriter after adding Documents to the index.

2011-03-21 Thread Michael McCandless
Unfortunately, you can't easily recover from this (except by reindexing your docs again). Failing to call IW.commit() or IW.close() means no segments file was written... It is theoretically possible to reconstruct a segments file by "listing" all files and figuring out which segments there are, d