Hi Rakesh,
I've spent the afternoon and the evening playing around with your test
because I could not stand Hibernate Search to be significantly slower
than native Lucene ;)
I found several causes but as far as your test case is concerned, it
turns out you are reaching the scalability limit of a
Yes, I do close the old reader.
I have a large index, and my system is doing real-time updates: one thread
writes batches of updates to the index and, after each index update, it
updates the reader. I always have two readers open: one is serving the search
requests while the other updates, and the two flip
I get the following error trace -
java.io.FileNotFoundException: no segments* file found in
org.apache.lucene.store.FSDirectory@/Users/projects/workspace/project_name/web/file:/Users/.m2/repository/com/mycompany/project_name/2.1.0-internal-65-SNAPSHOT/suggesters-2.1.0-internal-65-SNAPSHOT.jar!/lu
On Wed, May 28, 2008 at 5:36 AM, stefano coppi <[EMAIL PROTECTED]> wrote:
> text: BB AA
> query: "AA BB"~0 why is the result false? Aren't BB AA contiguous?
> result: false
>
> text: BB AA
> query: "AA BB"~1
> result: false
>
> text: BB AA
> query: "AA BB"~2 why with proximity=2 the result is tru
As someone that has done a lot of reopens, I can vouch there is no leak
under simple, normal usage. Are you sure you're closing the original
reader after getting the reopened reference?
Michael Busch wrote:
Hi John,
hmm not good. I will take a look. It has probably to do with the
reference cou
Hi John,
hmm not good. I will take a look. It has probably to do with the
reference counting. Are you doing anything special? E.g. do you have your
own reader implementations that you call reopen() on? What kinds of
readers are you using?
Are you maybe able to provide a heapdump?
-Michael
John
LOL - I sure wish it was! :)
Sadly, that was a typo (Luke, for all its beauties, does not seem to grasp
the concept of a clipboard so the sample was a manual transcription).
A few more details - don't know if this will help or not.
Same query as before, when I do a rewrite of the query in Luke I
I think you could override all the Similarity factors except tf() with
1, such that the term frequency is the only factor in the scoring.
Then you just submit the term as a query. Note, I think you will need
to override the similarity during indexing, too, so that norm length
is turned of
You should consider keeping the PageRank (and any other more dynamic
data) in a separate index (with the documents in the same order as your
bigger, more static index) and then use a ParallelReader on both of
them. See:
http://lucene.apache.org/java/2_1_0/api/org/apache/lucene/index/ParallelReade
Hi All,
I am trying to figure out a quick way to find the top N documents sorted
by frequency of a term.
I found:
IndexReader.termDocs()
which provides an enumeration of doc() and freq() but it returns an
enumeration sorted by doc number. Is there a way to get the results
sorted by freq? Or is
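One way to approach the question above, sketched without any Lucene dependency: walk the (doc, freq) pairs once in doc order and keep only the N highest frequencies in a bounded min-heap. The `Posting` class and the `topN` helper are hypothetical stand-ins for illustration; in a real index the pairs would come from the term's doc()/freq() enumeration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

class TopByFreq {
    // Hypothetical stand-in for one (doc, freq) pair from the enumeration.
    static final class Posting {
        final int doc;
        final int freq;
        Posting(int doc, int freq) {
            this.doc = doc;
            this.freq = freq;
        }
    }

    // Returns up to n postings with the highest freq, highest first.
    // The heap never holds more than n entries, so memory stays O(n).
    static List<Posting> topN(Iterable<Posting> postings, int n) {
        List<Posting> result = new ArrayList<>();
        if (n <= 0) {
            return result;
        }
        Comparator<Posting> byFreq = Comparator.comparingInt((Posting p) -> p.freq);
        PriorityQueue<Posting> heap = new PriorityQueue<>(n, byFreq);
        for (Posting p : postings) {
            if (heap.size() < n) {
                heap.add(p);
            } else if (p.freq > heap.peek().freq) {
                heap.poll(); // evict the current smallest freq
                heap.add(p);
            }
        }
        result.addAll(heap);
        result.sort(byFreq.reversed());
        return result;
    }

    public static void main(String[] args) {
        List<Posting> postings = Arrays.asList(
                new Posting(1, 3), new Posting(2, 7), new Posting(5, 1), new Posting(9, 5));
        for (Posting p : topN(postings, 2)) {
            System.out.println("doc=" + p.doc + " freq=" + p.freq);
        }
    }
}
```

This still visits every posting for the term once; it only avoids sorting the full list.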
I think this is not suitable for my system, since the number of pages is very
large and reindexing would take a long time
2008/5/28, Ian Lea <[EMAIL PROTECTED]>:
>
> Yes. But you'd have to do that anyway if you are storing pagerank in the
> index.
>
> One point on your 20s response time for sorting -
It's unclear what you *should* expect. How is your data
distributed?
In other words, how many documents do you have? In this example,
for instance,
1. TTL:data AND TTL:store OR TTL:variable => 3,733 results
it considered the TTL:data part only.
it's perfectly reasonable if every document that had
Yes. But you'd have to do that anyway if you are storing pagerank in the index.
One point on your 20s response time for sorting - is that for the
first sort or subsequent ones?
I believe that the first one will usually be substantially slower.
But sorting is always likely to be slower than not so
Thanks Ian, but does this mean that I must reindex these pages whenever the
pagerank score changes?
On 08-5-28, Ian Lea <[EMAIL PROTECTED]> wrote:
>
> Hi
>
>
> Maybe you could use the pagerank score, possibly modified, as document
> boost at indexing time. From the javadocs for
> Document.setBoost(boost)
>
>
Hi,
I have an issue with boolean queries.
I am using Lucene-core-2.3.1.
I ran a test on a boolean query with 3 terms (data, store, variable) in my
TTL field. The TTL field is indexed and searched using StandardAnalyzer.
The three terms, when searched individually, gave the following results:
1
If you are using a more recent version of Lucene you might check out
https://issues.apache.org/jira/browse/LUCENE-1026 and try the
WarmingIndexAccessor. Even if you don't use it, it will serve as a
decent example.
Ian Lea wrote:
Hi
I think that you will need to close your reader objects.
Hi
I think that you will need to close your reader objects. Hanging on
to them may prevent files from being deleted and you are likely to hit
memory or open file limitations.
We generally use a low tech approach:
save reference to old reader/searcher
create new one and give that out to those
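The swap described above might look roughly like the following. This is a minimal sketch, not Lucene code: `ReaderHolder` is a hypothetical name, and any `Closeable` stands in for the reader/searcher. It also assumes no search is still running against the old reader at swap time; a real system would need to wait for in-flight searches or count references before closing.

```java
import java.io.Closeable;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical holder that hands out the current reader and closes the old one on swap.
class ReaderHolder<R extends Closeable> {
    private final AtomicReference<R> current;

    ReaderHolder(R initial) {
        this.current = new AtomicReference<>(initial);
    }

    // Search threads call this to get whichever reader is current.
    R get() {
        return current.get();
    }

    // Publish the new reader first, then close the one it replaced.
    // Assumption: nothing is still using the old reader at this point.
    void swap(R fresh) {
        R old = current.getAndSet(fresh);
        if (old != null && old != fresh) {
            try {
                old.close();
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }
    }

    public static void main(String[] args) {
        Closeable first = () -> System.out.println("closed first reader");
        Closeable second = () -> System.out.println("closed second reader");
        ReaderHolder<Closeable> holder = new ReaderHolder<>(first);
        holder.swap(second); // closes the first reader only
    }
}
```

Closing the old reader is the step that releases its files and memory, which is the concern raised in the message.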
Hi
Maybe you could use the pagerank score, possibly modified, as document
boost at indexing time. From the javadocs for
Document.setBoost(boost)
"Sets a boost factor for hits on any field of this document. This
value will be multiplied into the score of all hits on this document"
so will give
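To illustrate the "multiplied into the score" behaviour quoted above, here is a small stand-alone sketch that multiplies a relevance score by a pagerank-derived boost and sorts on the product. Everything in it is an assumption for illustration: the `Hit` class, the sample numbers, and in particular the `1 + log(1 + pagerank)` damping, which is just one possible reading of "possibly modified". It does not reproduce Lucene's actual scoring.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

class BoostedRanking {
    // Hypothetical hit: a relevance score R plus a known pagerank value.
    static final class Hit {
        final String id;
        final double relevance;
        final double pageRank;
        Hit(String id, double relevance, double pageRank) {
            this.id = id;
            this.relevance = relevance;
            this.pageRank = pageRank;
        }
    }

    // Assumed damping so that very large pagerank values do not swamp relevance.
    static double boost(double pageRank) {
        return 1.0 + Math.log1p(pageRank);
    }

    // The boost is multiplied into the relevance score.
    static double combined(Hit h) {
        return h.relevance * boost(h.pageRank);
    }

    // Returns a copy of the hits ordered by combined score, best first.
    static List<Hit> rank(List<Hit> hits) {
        List<Hit> sorted = new ArrayList<>(hits);
        sorted.sort(Comparator.comparingDouble((Hit h) -> combined(h)).reversed());
        return sorted;
    }

    public static void main(String[] args) {
        List<Hit> hits = Arrays.asList(new Hit("a", 1.0, 0.0), new Hit("b", 0.8, 9.0));
        for (Hit h : rank(hits)) {
            System.out.println(h.id + " " + combined(h));
        }
    }
}
```

Because the boost is applied at indexing time, a changed pagerank value means the document has to be indexed again.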
Hi all,
I have a problem: how to "combine" two scores to sort the search
result documents.
For example, I have 10 million pages in a Lucene index, and I know their
pagerank scores. I give it a query, and every doc returned has a
lucene-score; mark it as R (relevant score), and i al
Hello out there,
We have implemented an open source desktop search app based on Lucene
http://sourceforge.net/projects/dynaq
Development continues, and we are currently experimenting with the
file-lock based writer (/reader) synchronization capabilities of Lucene, in
order to waste
Hello everyone,
I'm testing the use of proximity search operator (~) in Lucene.
I noticed a strange behaviour when the terms in the text are not in the same
order as in the query.
Here are some examples:
text: AA BB
query: "AA BB"~0
result: true
text: AA ZZ BB
query: "AA BB"~0
result: false
tex
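The PhraseQuery javadoc describes slop as an edit distance counted in moves of query terms out of position, and notes that switching the order of two words takes two moves. A simplified model of that rule, which reproduces the results in these examples, is sketched below. It is not Lucene's implementation: `SlopModel` and `requiredSlop` are hypothetical names, and the model assumes each phrase term occurs at most once in the text.

```java
import java.util.Arrays;
import java.util.List;

// Simplified, hypothetical model of sloppy phrase matching; not Lucene's actual code.
class SlopModel {
    // Returns the smallest slop at which the phrase matches the text under this model,
    // or -1 if a phrase term does not occur in the text.
    static int requiredSlop(List<String> phrase, List<String> text) {
        int min = Integer.MAX_VALUE;
        int max = Integer.MIN_VALUE;
        for (int i = 0; i < phrase.size(); i++) {
            int pos = text.indexOf(phrase.get(i));
            if (pos < 0) {
                return -1;
            }
            // Shift each term back by its offset in the phrase; an exact
            // in-order match puts every term at the same shifted position.
            int shifted = pos - i;
            min = Math.min(min, shifted);
            max = Math.max(max, shifted);
        }
        return max - min;
    }

    public static void main(String[] args) {
        List<String> phrase = Arrays.asList("AA", "BB");
        System.out.println(requiredSlop(phrase, Arrays.asList("AA", "BB")));       // prints 0
        System.out.println(requiredSlop(phrase, Arrays.asList("AA", "ZZ", "BB"))); // prints 1
        System.out.println(requiredSlop(phrase, Arrays.asList("BB", "AA")));       // prints 2
    }
}
```

Under this model the reversed text "BB AA" needs a slop of 2, so "AA BB"~0 and "AA BB"~1 do not match it while "AA BB"~2 does.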