Hi,
I've experimented a bit with MultiFieldQueryParser
(http://lucene.apache.org/core/4_2_0/queryparser/org/apache/lucene/queryparser/classic/MultiFieldQueryParser.html)
It seems to search for each of the query's terms in every field specified in
the constructor. So, as the doc says, if you q
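For what it's worth, here is a minimal sketch of that per-field expansion (the field names "title"/"body" and the analyzer choice are made up for illustration, not from this thread):

```java
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.queryparser.classic.MultiFieldQueryParser;
import org.apache.lucene.search.Query;
import org.apache.lucene.util.Version;

public class MultiFieldExample {
    public static void main(String[] args) throws Exception {
        StandardAnalyzer analyzer = new StandardAnalyzer(Version.LUCENE_42);
        MultiFieldQueryParser parser = new MultiFieldQueryParser(
                Version.LUCENE_42,
                new String[] {"title", "body"},
                analyzer);
        // Each term is expanded across all fields, roughly:
        // "lucene search" -> (title:lucene body:lucene) (title:search body:search)
        Query q = parser.parse("lucene search");
        System.out.println(q);
    }
}
```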
If you are using MMapDirectory (the default on 64-bit platforms), then the
index files are already in the filesystem cache and directly accessible to the
IndexReader, like RAM. There is no need to cache them separately.
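A minimal sketch of what "default on 64-bit platforms" means in code (the index path is a placeholder): FSDirectory.open picks MMapDirectory automatically on a 64-bit JVM, so reads go straight through the OS filesystem cache.

```java
import java.io.File;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;

// "/path/to/index" is a placeholder path, not from this thread.
Directory dir = FSDirectory.open(new File("/path/to/index"));
// On a 64-bit JVM this is an MMapDirectory: the index files are
// memory-mapped, so the filesystem cache backs reads directly.
System.out.println(dir.getClass().getSimpleName());
```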
-
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: u...@thetaphi.de
Maybe consider the data saved only after you have committed it.
Could you acknowledge new data in batches, after each commit?
2013/4/3 crocket
> Since I use NRT readers for Index and TaxonomyIndex, I don't have to commit
> to see the changes.
>
> Now, I don't know if indexes are ever committed.
>
> If they
Since I use NRT readers for Index and TaxonomyIndex, I don't have to commit
to see the changes.
Now, I don't know if indexes are ever committed.
If they don't commit automatically, I'd have to do it on a regular basis.
What should I do about committing?
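If it helps, here is the acknowledge-after-commit idea from earlier in the thread as a minimal sketch. This is plain Java: "Committer" merely stands in for IndexWriter.commit(), and all names are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of the acknowledge-after-commit pattern: NRT readers see
// changes immediately, but data is only durable after a commit, so
// upstream acknowledgement should wait for the commit.
class CommitBatcher {
    interface Committer { void commit(); } // stand-in for IndexWriter.commit()

    private final Committer committer;
    private final List<String> pending = new ArrayList<String>();
    private final List<String> acknowledged = new ArrayList<String>();

    CommitBatcher(Committer committer) { this.committer = committer; }

    void add(String id) { pending.add(id); }

    // Commit the index, then acknowledge everything indexed so far.
    void commitAndAck() {
        committer.commit();           // durability point
        acknowledged.addAll(pending); // safe to ack only after the commit
        pending.clear();
    }

    List<String> acknowledged() { return acknowledged; }
    List<String> pending() { return pending; }
}
```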
These are not document hits but text hits (to be more specific, spans).
For the search result it is necessary to have the precise number of document
and text hits and a relatively small number of matched text snippets.
I've tried several approaches to optimize the search algorithm but they didn't
Hi,
I have the same question related to the LMJelinekMercerSimilarity class.
protected float score(BasicStats stats, float freq, float docLen) {
    return stats.getTotalBoost() *
        (float) Math.log(1 + ((1 - lambda) * freq / docLen)
            / (lambda * ((LMStats) stats).getCollectionProbability()));
}
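For reference, the snippet above is Jelinek-Mercer smoothing. Here is a standalone version of the same formula in plain Java; the constants in the example are chosen only for illustration, not taken from this thread.

```java
// score = boost * log(1 + ((1 - lambda) * freq / docLen)
//                         / (lambda * P(term | collection)))
public class JelinekMercer {
    static float score(float boost, float lambda, float freq,
                       float docLen, float collectionProb) {
        return boost * (float) Math.log(
                1 + ((1 - lambda) * freq / docLen)
                        / (lambda * collectionProb));
    }

    public static void main(String[] args) {
        // Illustrative inputs: lambda = 0.5, the term occurs 2 times in a
        // 10-token document, collection probability 0.01.
        System.out.println(score(1f, 0.5f, 2f, 10f, 0.01f));
    }
}
```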
On Tue, Apr 2, 2013 at 4:10 PM, Sharon W Tam wrote:
> Are there any other ideas?
Since scoring seems to be what you are interested in, you could have a
look at payloads: they can store arbitrary data and can be used to
score matches.
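A hedged sketch of the search-time side in Lucene 4.x (the field name and payload function here are illustrative choices; at index time, a filter such as DelimitedPayloadTokenFilter can attach a payload, e.g. a float weight, to each token):

```java
import org.apache.lucene.index.Term;
import org.apache.lucene.search.payloads.AveragePayloadFunction;
import org.apache.lucene.search.payloads.PayloadTermQuery;

// PayloadTermQuery folds the per-position payloads of matching terms
// into the score; AveragePayloadFunction averages them.
PayloadTermQuery query = new PayloadTermQuery(
        new Term("body", "lucene"),       // illustrative field/term
        new AveragePayloadFunction());
```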
--
Adrien
On Tue, Apr 2, 2013 at 4:39 PM, Igor Shalyminov
wrote:
> Yes, the number of documents is not too large (about 90 000), but the queries
> are very hard. Although they're just boolean, a typical query can produce a
> result with tens of millions of hits.
How can there be tens of millions of hits
Yes, the number of documents is not too large (about 90 000), but the queries
are very hard. Although they're just boolean, a typical query can produce a
result with tens of millions of hits.
Run single-threaded, such a query takes ~20 seconds, which is too slow. Therefore,
multithreading is vital f
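Note that IndexSearcher has a constructor taking an ExecutorService, which fans the search out across segments; with very uneven segment sizes, the largest segment tends to dominate the wall-clock time. A minimal sketch of that fan-out pattern in plain Java ("segments" here are stand-in arrays, not Lucene API; the matching predicate is invented for illustration):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Future;

// Sketch: run one search task per segment and merge the hit counts.
public class PerSegmentSearch {
    static long countHits(List<long[]> segments, ExecutorService pool)
            throws Exception {
        List<Future<Long>> futures = new ArrayList<Future<Long>>();
        for (final long[] segment : segments) {
            futures.add(pool.submit(new Callable<Long>() {
                public Long call() {
                    long hits = 0;
                    for (long doc : segment) {
                        if (doc % 2 == 0) hits++; // stand-in for "doc matches"
                    }
                    return hits;
                }
            }));
        }
        long total = 0;
        for (Future<Long> f : futures) total += f.get(); // merge per-segment counts
        return total;
    }
}
```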
Hi,
Thanks for the reply ;)
>
> this is all not public to the code because it is also subject to change!
>
> With Lucene 4.x, you can assume:
> directoryReader.leaves().get(i) corresponds to segmentsinfos.info(i)
>
> WARNING: But this is only true if:
> - the reader is instanceof DirectoryR
Thanks for your help, Adrien. But unfortunately, my term frequencies will
be partial counts, so they won't be integers. And finding a common
denominator and scaling the rest of the frequencies accordingly will affect
the relative lengths of the documents which will affect the Lucene scoring
becaus
On Tue, Apr 2, 2013 at 2:29 PM, Igor Shalyminov
wrote:
> Hello!
Hi Igor,
> I have a ~20GB index and try to make a concurrent search over it.
> The index has 16 segments, I run SpanQuery.getSpans() on each segment
> concurrently.
> I see really small performance improvement of searching concurre
Hi,
this is all not public to the code because it is also subject to change!
With Lucene 4.x, you can assume:
directoryReader.leaves().get(i) corresponds to segmentsinfos.info(i)
WARNING: But this is only true if:
- the reader is instanceof DirectoryReader
- the segmentinfos were opened on the e
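To make the correspondence concrete, here is a hedged sketch (Lucene 4.x; it assumes directoryReader and segmentInfos were obtained from the same commit, per the caveats above):

```java
import org.apache.lucene.index.AtomicReaderContext;

// Unchecked assumption: leaf i corresponds to segmentInfos.info(i).
for (int i = 0; i < directoryReader.leaves().size(); i++) {
    AtomicReaderContext leaf = directoryReader.leaves().get(i);
    System.out.println(leaf.reader() + " <-> " + segmentInfos.info(i));
}
```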
Hi,
I have a question about the Index Readers in Lucene.
As far as I understand from the documentation, with Lucene 4 we can create
an IndexReader via DirectoryReader.open(directory);
From the code of DirectoryReader, I have seen that it uses
SegmentReader to create the reader.
On Tue, Apr 2, 2013 at 12:45 PM, andi rexha wrote:
> Hi Adrien,
> Thank you very much for the reply.
>
> I have two other small questions about this:
> 1) Is "final int freq = docsAndPositions.freq();" the same with
> "iterator.totalTermFreq()" ? In my tests it returns the same result and from
>
Hello!
I have a ~20GB index and am trying to run a concurrent search over it.
The index has 16 segments, I run SpanQuery.getSpans() on each segment
concurrently.
I see only a really small performance improvement from searching concurrently. I
suppose the reason is that the sizes of the segments are very non-
Hi Adrien,
Thank you very much for the reply.
I have two other small questions about this:
1) Is "final int freq = docsAndPositions.freq();" the same as
"iterator.totalTermFreq()"? In my tests it returns the same result, and from
the documentation it seems that the result should be the same.
Hi Andi,
Here is how you could retrieve positions from your document:
Terms termVector = indexReader.getTermVector(docId, fieldName);
TermsEnum reuse = null;
TermsEnum iterator = termVector.iterator(reuse);
BytesRef ref = null;
DocsAndPositionsEnum docsAndPositions = null;
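A possible continuation of that setup (Lucene 4.x API; a sketch, so treat the details as a starting point): iterate the terms of the vector and read back each position.

```java
while ((ref = iterator.next()) != null) {
    docsAndPositions = iterator.docsAndPositions(null, docsAndPositions);
    if (docsAndPositions == null) continue; // positions were not indexed
    docsAndPositions.nextDoc();             // a term vector holds one doc
    int freq = docsAndPositions.freq();
    for (int i = 0; i < freq; i++) {
        int position = docsAndPositions.nextPosition();
        System.out.println(ref.utf8ToString() + " @ " + position);
    }
}
```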
Hi,
I have a problem while trying to extract term vector attributes (e.g. the
positions of the terms). What I did was:
Terms termVector = indexReader.getTermVector(docId, fieldName);
TermsEnum reuse = null;
TermsEnum iterator = termVector.iterator(reuse);
PositionIncr