> Date: Monday, March 21, 2011, 7:39 PM
> One more thing: it is actually not clear to me how to use PhraseQuery...
> I thought I could just pass a phrase to it, but I see only the add(Term)
> method... should I parse the string into single terms myself?
Yes, you need to do that yourself.
QueryParser transforms a quoted phrase into a PhraseQuery for you,
tokenizing it with the analyzer you give it.
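In case it helps, here is a minimal sketch of both routes (untested;
assuming the Lucene 3.1 API, a field named "description", and a
WhitespaceAnalyzer -- adjust the Version constant and packages to your
build):

    import org.apache.lucene.analysis.WhitespaceAnalyzer;
    import org.apache.lucene.index.Term;
    import org.apache.lucene.queryParser.ParseException;
    import org.apache.lucene.queryParser.QueryParser;
    import org.apache.lucene.search.PhraseQuery;
    import org.apache.lucene.search.Query;
    import org.apache.lucene.util.Version;

    public class PhraseQueryExample {
        public static void main(String[] args) throws ParseException {
            // Route 1: split the phrase yourself and add each term
            // in order.
            PhraseQuery pq = new PhraseQuery();
            for (String word : "my string".split("\\s+")) {
                pq.add(new Term("description", word));
            }

            // Route 2: let QueryParser tokenize; a quoted string is
            // turned into a PhraseQuery by the analyzer you pass in.
            QueryParser parser = new QueryParser(Version.LUCENE_31,
                "description", new WhitespaceAnalyzer(Version.LUCENE_31));
            Query q = parser.parse("\"my string\"");
        }
    }

Note that QueryParser only builds a PhraseQuery when the analyzer splits
the quoted text into more than one token.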
2011/3/21 Brian Hurt:
> I'm having a problem with the performance of lazily-loaded fields with
> Lucene. The basic structure of the code is that I get a set of documents
> back from a query, then iterate through them, reading most fields to
> collect fragments. This is taking an excessively long amount of time...
Jason, please mail to dev@l.a.o
simon
On Mon, Mar 21, 2011 at 6:09 PM, Jason Rutherglen wrote:
> I'm seeing an error when using the misc Append codec.
>
> java.lang.AssertionError
>   at org.apache.lucene.store.ByteArrayDataInput.readBytes(ByteArrayDataInput.java:107)
>   at org.apache.lucene.index.codecs.BlockTermsReader$FieldReader$SegmentTermsEnum._next(BlockTermsReader.java:661)
Your math is right -- looks like it really is ~9 bytes per term
(assuming no bugs in CheckIndex!).

How long did this CheckIndex take to run...?

On the file format, one correction: if the docFreq is < skipInterval
(default 16) then there is no skip data and we don't write the
SkipDelta; e.g., a term that appears in only 10 documents stores no
skip pointer at all. The vast majority of terms in a typical index
fall below that threshold.
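For reference, CheckIndex can also be run straight from the command
line; the jar name and index path below are placeholders:

    java -cp lucene-core.jar org.apache.lucene.index.CheckIndex /path/to/index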
I'm having a problem with the performance of lazily-loaded fields with
Lucene. The basic structure of the code is that I get a set of documents
back from a query, then iterate through them, reading most fields to
collect fragments. This is taking an excessively long amount of time --
mostly in my code that reads the fields.
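One thing worth checking is whether the cost is the extra seek each lazy
field does when it is first read. A sketch of the alternative, against
the 3.x FieldSelector API ("title" and "body" are hypothetical field
names -- substitute the fields you actually read):

    import java.io.IOException;
    import java.util.Arrays;
    import java.util.Collections;
    import java.util.HashSet;
    import java.util.Set;
    import org.apache.lucene.document.Document;
    import org.apache.lucene.document.FieldSelector;
    import org.apache.lucene.document.SetBasedFieldSelector;
    import org.apache.lucene.index.IndexReader;

    public class EagerLoad {
        // Eagerly load just the listed fields in a single pass over
        // the stored document, instead of one seek per lazy field.
        static Document load(IndexReader reader, int docId)
                throws IOException {
            Set<String> needed = new HashSet<String>(
                Arrays.asList("title", "body"));
            FieldSelector selector = new SetBasedFieldSelector(
                needed, Collections.<String>emptySet());
            return reader.document(docId, selector);
        }
    }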
I'm new to Lucene and I would like to know what the difference is (if
there is any) between

phraseQuery.add(term1);
phraseQuery.add(term2);
phraseQuery.add(term3);

and

term1 = new TermQuery(new Term(...));
booleanQuery.add(term1, BooleanClause.Occur.SHOULD);
term2 = new TermQuery(new Term(...));
booleanQuery.add(term2, BooleanClause.Occur.SHOULD);
One more thing: it is actually not clear to me how to use PhraseQuery...
I thought I could just pass a phrase to it, but I see only the add(Term)
method... should I parse the string into single terms myself?
On 21 March 2011 18:05, Patrick Diviacco wrote:
>
>> If the description field is tokenized/analyzed during indexing you need
>> to use PhraseQuery.
I'm trying to get a feel for the impact of changing the termIndexInterval
from the default of 128 to 1024 (8 * 128). This reduces the size of the
tii file to about 1/8th of its current size, but in the worst case it
requires a linear scan of 1024 terms in memory instead of 128. I'm not so
concerned about the performance...
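For what it's worth, the writer-side setting is a one-liner; a sketch
(assuming an existing Analyzer, and LUCENE_40 as elsewhere in this
digest):

    import org.apache.lucene.analysis.Analyzer;
    import org.apache.lucene.index.IndexWriterConfig;
    import org.apache.lucene.util.Version;

    public class SparseTermIndex {
        static IndexWriterConfig configFor(Analyzer analyzer) {
            IndexWriterConfig config = new IndexWriterConfig(
                Version.LUCENE_40, analyzer);
            // Index every 1024th term instead of every 128th: the
            // .tii file shrinks to ~1/8th, but locating a term may
            // scan up to 1024 terms in memory instead of 128.
            config.setTermIndexInterval(1024);
            return config;
        }
    }

If memory is the main concern, the reader-side terms index divisor
(the termInfosIndexDivisor argument to IndexReader.open) subsamples the
existing .tii at load time, which gets a similar trade-off without
reindexing.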
I'm seeing an error when using the misc Append codec.

java.lang.AssertionError
  at org.apache.lucene.store.ByteArrayDataInput.readBytes(ByteArrayDataInput.java:107)
  at org.apache.lucene.index.codecs.BlockTermsReader$FieldReader$SegmentTermsEnum._next(BlockTermsReader.java:661)
  at org.apache.luce...
>
> If the description field is tokenized/analyzed during indexing you need
> to use PhraseQuery.
>

Uhm, yeah, I'm using a WhitespaceAnalyzer. This is the code I'm using for
indexing:

writer = new IndexWriter(FSDirectory.open(INDEX_DIR),
    new IndexWriterConfig(org.apache.lucene.util.Version.LUCENE_40,
        new WhitespaceAnalyzer(Version.LUCENE_40)));
> description = new TermQuery(new Term("description", "my string"));
>
> I ask Lucene to consider "my string" as a single word, right?

Correct.

> I actually need each word to be considered; should I use PhraseQuery
> instead?

If the description field is tokenized/analyzed during indexing you need
to use PhraseQuery.
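To make the difference from your BooleanQuery variant concrete (a
sketch, reusing the "description" field and "my string" example from
earlier in the thread):

    import org.apache.lucene.index.Term;
    import org.apache.lucene.search.BooleanClause;
    import org.apache.lucene.search.BooleanQuery;
    import org.apache.lucene.search.PhraseQuery;
    import org.apache.lucene.search.TermQuery;

    // Matches only documents where "my" is immediately followed by
    // "string" (the default slop is 0).
    PhraseQuery phrase = new PhraseQuery();
    phrase.add(new Term("description", "my"));
    phrase.add(new Term("description", "string"));

    // Matches any document containing "my" or "string" anywhere in
    // the field; documents containing both simply score higher.
    BooleanQuery bool = new BooleanQuery();
    bool.add(new TermQuery(new Term("description", "my")),
        BooleanClause.Occur.SHOULD);
    bool.add(new TermQuery(new Term("description", "string")),
        BooleanClause.Occur.SHOULD);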
I'm combining several scores for my queries performed with Lucene and
other software.

My issue is that I have Lucene scores + other scores (not related to
Lucene) for each query result.

The other scores are all normalized between 0 and 1. I need to normalize
the Lucene scores (over all queries) because they are not normalized and
are not comparable across queries.
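One common option is per-query min-max normalization into [0, 1], which
matches the scale of your other scores. A sketch (one way among several;
raw Lucene scores are not comparable across queries, so normalize each
result list separately, or pool all results first if you need a single
global scale):

    import org.apache.lucene.search.ScoreDoc;
    import org.apache.lucene.search.TopDocs;

    public class ScoreNorm {
        // Min-max normalize one result list's scores into [0, 1].
        static float[] normalize(TopDocs hits) {
            float min = Float.MAX_VALUE, max = -Float.MAX_VALUE;
            for (ScoreDoc sd : hits.scoreDocs) {
                min = Math.min(min, sd.score);
                max = Math.max(max, sd.score);
            }
            float[] out = new float[hits.scoreDocs.length];
            for (int i = 0; i < out.length; i++) {
                // If every score is identical, map them all to 1.0.
                out[i] = (max == min) ? 1.0f
                    : (hits.scoreDocs[i].score - min) / (max - min);
            }
            return out;
        }
    }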
I'm new to Lucene. If I use

description = new TermQuery(new Term("description", "my string"));

I am asking Lucene to consider "my string" as a single word, right?

I actually need each word to be considered; should I use PhraseQuery
instead, or is this correct?

thanks
Unfortunately, you can't easily recover from this (except by
reindexing your docs).

Failing to call IW.commit() or IW.close() means no segments file was
written...

It is theoretically possible to reconstruct a segments file by
"listing" all files and figuring out which segments there are...
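For anyone who finds this later, the preventive fix is simply to make
sure the writer commits (a sketch; "writer" and "doc" stand for your own
IndexWriter and Document):

    // Without commit() or close(), no segments_N file is written and
    // the index cannot be opened afterwards.
    writer.addDocument(doc);
    writer.commit();  // make the changes durable and visible
    writer.close();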