Phew, thanks for testing! It's all explainable...
When you have a reader open, it prevents the segments it had opened
from being deleted.
When you close that reader, the segments could be deleted, however,
that won't happen until the writer next tries to delete, which it does
only periodically (...
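The lifecycle described above can be modeled without Lucene at all. Below is a toy sketch of the idea (the class and method names are invented for illustration, not Lucene's internals): an open reader pins the files it uses, and the writer's deferred deletion pass skips pinned files, so a "deleted" segment only disappears after the reader closes and the writer next runs a deletion pass.

```java
import java.util.*;

// Toy model (not Lucene's real internals): segments marked deletable are only
// physically removed on the writer's next deletion pass, and never while an
// open reader still references them.
public class SegmentPinning {
    final Set<String> liveFiles = new HashSet<>();
    final Set<String> deletable = new HashSet<>();
    final Map<String, Integer> refCounts = new HashMap<>();

    void addSegment(String name) { liveFiles.add(name); }

    // A reader "opens" the current segments, pinning them.
    List<String> openReader() {
        List<String> snapshot = new ArrayList<>(liveFiles);
        for (String f : snapshot) refCounts.merge(f, 1, Integer::sum);
        return snapshot;
    }

    void closeReader(List<String> snapshot) {
        for (String f : snapshot) refCounts.merge(f, -1, Integer::sum);
    }

    // Writer marks a segment as replaced; actual deletion is deferred.
    void markDeletable(String name) { deletable.add(name); }

    // The writer's periodic deletion pass: only unpinned files go away.
    void runDeletionPass() {
        Iterator<String> it = deletable.iterator();
        while (it.hasNext()) {
            String f = it.next();
            if (refCounts.getOrDefault(f, 0) == 0) {
                liveFiles.remove(f);
                it.remove();
            }
        }
    }

    public static void main(String[] args) {
        SegmentPinning dir = new SegmentPinning();
        dir.addSegment("_1.cfs");
        List<String> reader = dir.openReader();
        dir.markDeletable("_1.cfs");
        dir.runDeletionPass();
        System.out.println(dir.liveFiles.contains("_1.cfs")); // true: pinned by reader
        dir.closeReader(reader);
        System.out.println(dir.liveFiles.contains("_1.cfs")); // true: no pass has run yet
        dir.runDeletionPass();
        System.out.println(dir.liveFiles.contains("_1.cfs")); // false: finally deleted
    }
}
```

This mirrors why disk usage can stay high after an optimize when a reader is left open, as the tests below show.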
Hi, I have done some testing that I would like to share with you.
I am starting my tests with an unoptimized 40 MB index. I have 3 test cases:
1) open a writer, optimize, commit, close
2) open a writer, open a reader from the writer, optimize, commit, close
3) same as 2) except the reader is opened...
http://nlp.stanford.edu/IR-book/information-retrieval-book.html gives
a good introduction to what happens under the hood of a search engine, and
you can download it for free. It does not explain Lucene directly, but
a lot of the IR algorithms that are used in Lucene (and any other search
engine) are explained...
On Friday 27 November 2009 14:49:07 Michael McCandless wrote:
>
> So the "don't care" equivalent here is to use IndexSearcher's normal
> search APIs (ie, we don't use Version to switch this on or off).
Thanks for the hint. For an unknown reason I once fell into
the "search(query, filter, collector)...
I'll check all these with my ops guy on Monday and report back.
Thanks for the interest.
On Fri, Nov 27, 2009 at 4:00 PM, Michael McCandless
wrote:
> Any Lucene-related exceptions hit in your env? What OS (looks like
> Windows, but which one?), filesystem are you on?
>
> And are you really certain...
Any Lucene-related exceptions hit in your env? What OS (looks like
Windows, but which one?), filesystem are you on?
And are you really certain about the java version being used in your
production env? Don't just trust which java your interactive shell
finds on its PATH -- double check how your a...
On Fri, Nov 27, 2009 at 6:23 AM, jm wrote:
> Ok, I got the index from the production machine, but I am having some
> problem to find the index..., our process deals with multiple indexes,
> in the current exception I cannot see any indication about the index
> having the issue. I opened all my indexes...
On Fri, Nov 27, 2009 at 8:13 AM, Stefan Trcek wrote:
> On Friday 27 November 2009 12:07:07 Michael McCandless wrote:
>> Re: What does "out of order" mean?
>>
>> It refers to the order in which the docIDs are delivered to your
>> Collector.
>>
>> "Normally" they are always delivered in increasing order...
On Friday 27 November 2009 12:07:07 Michael McCandless wrote:
> Re: What does "out of order" mean?
>
> It refers to the order in which the docIDs are delivered to your
> Collector.
>
> "Normally" they are always delivered in increasing order.
>
> However, some queries (well, currently only certain BooleanQuery...
I manually did CheckIndex in all indexes and found two with issues:
First:
Segments file=segments_42w numSegments=21 version=FORMAT_HAS_PROX [Lucene 2.4]
1 of 21: name=_109 docCount=10410
compound=true
hasProx=true
numFiles=1
size (MB)=55,789
no deletions
test: open reader
Additionally there is a whitepaper on
http://www.lucidimagination.com/How-We-Can-Help/whitepaper
What is new in Lucene 2.9
which gives you an overview of the new features - this is not on an
API level though.
simon
On Fri, Nov 27, 2009 at 12:42 PM, Helmut Jarausch
wrote:
> Hi,
>
> could anybody...
That's the way to go:
public TopDocs search(Query query, int n) throws IOException
Finds the top n hits for query.
-
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: u...@thetaphi.de
> -----Original Message-----
> From: Helmut Jarausch...
There is indeed no search(Query) method in 3.0.
Your best bet is to compile your application against 2.9 and fix any
deprecation warnings - see the javadocs for alternatives. If it
compiles cleanly against 2.9 it should also compile against 3.0.
--
Ian.
On Fri, Nov 27, 2009 at 11:42 AM, Helmut Jarausch wrote:
Hi,
could anybody please point me to some documentation with (more detailed)
information about the API change.
E.g. (in PyLucene)
Q=lucene.TermQuery(lucene.Term('@URI',BookNr))
FSDir= lucene.SimpleFSDirectory(lucene.File('/home/jarausch/Bib_Dev/DIR/'))
index_reader= lucene.IndexReader.open(FSDir)
Ok, I got the index from the production machine, but I am having some
problems finding the index..., our process deals with multiple indexes,
in the current exception I cannot see any indication about the index
having the issue. I opened all my indexes with Luke and all opened
successfully, some had...
Anyway, thanks for your suggestion, sir
--- On Fri, 27/11/09, Uwe Schindler wrote:
From: Uwe Schindler
Subject: RE: To exit the while loop if match is found
To: java-user@lucene.apache.org
Date: Friday, 27 November, 2009, 11:19 AM
This question is out of the scope of Lucene. Try using some AJAX...
This question is out of the scope of Lucene. Try using some AJAX frameworks
like YUI for the communication between your textbox and the server.
-
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: u...@thetaphi.de
> -----Original Message-----
> From: DHIVYA M [mailto:...
Sir, anyway I want this to happen in the keypress event of the text box.
Can you suggest a way to do this?
Thanks in advance,
Dhivya.M
--- On Fri, 27/11/09, Uwe Schindler wrote:
From: Uwe Schindler
Subject: RE: To exit the while loop if match is found
To: java-user@lucene.apache.org
Date: Friday, 27 November, 2009, 11:19 AM
Phew :) Thanks for bringing closure!
Mike
On Fri, Nov 27, 2009 at 6:02 AM, Michael McCandless
wrote:
> If in fact you are using CFS (it is the default), and your OS is
> letting you use 10240 descriptors, and you haven't changed the
> mergeFactor, then something is seriously wrong. I would triple check...
You were right, my bad...
I have an async reader closing on a scheduled basis (after the writer
refreshes the index, so as not to interrupt ongoing searches), but while
I've set up the scheduling for my first two indexes, I forgot it in
my third... oh dear...
Thanks anyway for the info, it was useful.
It refers to the order in which the docIDs are delivered to your Collector.
"Normally" they are always delivered in increasing order.
However, some queries (well, currently only certain BooleanQuery
cases) can achieve substantial search speedup if they are allowed to
deliver docIDs to your collector out of order...
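The contract Mike describes can be sketched with a self-contained stand-in (the names below are invented for illustration; Lucene's real API is org.apache.lucene.search.Collector with its acceptsDocsOutOfOrder() method): a collector declares whether it tolerates out-of-order docIDs, and the scoring side only takes the faster unordered path when the collector allows it.

```java
import java.util.*;

// Stand-in for the idea (not Lucene's real Collector class): the collector
// declares whether it can accept out-of-order docIDs; the "scorer" delivers
// hits unordered only if the collector permits it, otherwise in increasing
// docID order.
public class OrderDemo {
    interface SimpleCollector {
        boolean acceptsDocsOutOfOrder();
        void collect(int docID);
    }

    // Delivers hits out of order when allowed (cheaper for some BooleanQuery
    // cases in real Lucene), otherwise sorts them into increasing order first.
    static void score(int[] hits, SimpleCollector c) {
        int[] toDeliver = hits.clone();
        if (!c.acceptsDocsOutOfOrder()) Arrays.sort(toDeliver);
        for (int doc : toDeliver) c.collect(doc);
    }

    public static void main(String[] args) {
        List<Integer> seen = new ArrayList<>();
        score(new int[]{7, 2, 5}, new SimpleCollector() {
            public boolean acceptsDocsOutOfOrder() { return false; }
            public void collect(int docID) { seen.add(docID); }
        });
        System.out.println(seen); // [2, 5, 7] -- delivered in increasing order
    }
}
```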
If in fact you are using CFS (it is the default), and your OS is
letting you use 10240 descriptors, and you haven't changed the
mergeFactor, then something is seriously wrong. I would triple check
that all readers are being closed.
Or... if you list the index directory, how many files do you see?
Another, simpler approach is to use
http://lucene.apache.org/java/3_0_0/api/core/org/apache/lucene/search/PrefixTermEnum.html
It is a wrapper term enumeration that lists all terms with the supplied
prefix. You do not need to filter anything manually, just use a while-loop:
IndexReader reader...
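What a prefix term enumeration buys you can be shown with a self-contained sketch (a stand-in, not the real org.apache.lucene.search.PrefixTermEnum): because the term dictionary is sorted, you can seek to the prefix and stop as soon as a term no longer matches, instead of filtering every term yourself.

```java
import java.util.*;

// Toy version of the prefix-enumeration idea: terms are stored sorted, so
// seek to the prefix and break out at the first non-matching term.
public class PrefixEnumDemo {
    static List<String> termsWithPrefix(TreeSet<String> sortedTerms, String prefix) {
        List<String> out = new ArrayList<>();
        // tailSet seeks to the first term >= prefix, like a term enum's seek
        for (String term : sortedTerms.tailSet(prefix)) {
            if (!term.startsWith(prefix)) break; // past the prefix range: stop
            out.add(term);
        }
        return out;
    }

    public static void main(String[] args) {
        TreeSet<String> terms = new TreeSet<>(
            Arrays.asList("apple", "apply", "banana", "appendix", "band"));
        System.out.println(termsWithPrefix(terms, "app")); // [appendix, apple, apply]
    }
}
```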
Hi,
The documentation of org.apache.lucene.search.Collector uses the obscure
term "out of order". What does "order" mean? The natural order of
document IDs, a scoring order, or some other order?
--
Cheers,
Alex
On Fri, Nov 27, 2009 at 11:37 AM, Michael McCandless
wrote:
> Are you sure you're closing all readers that you're opening?
Absolutely. :) (okay, never say this, but I had bugz because of this
previously so I'm pretty sure that one is ok).
> It's surprising with normal usage of Lucene that you'd run out of...
Also, if you're able to reproduce this, can you call
writer.setInfoStream and capture & post the resulting output leading
up to the exception?
Mike
On Thu, Nov 26, 2009 at 7:12 AM, jm wrote:
> The process is still running and ops don't want to stop it. As soon as
> it stops I'll try CheckIndex.
>
>
Are you sure you're closing all readers that you're opening?
It's surprising that with normal usage of Lucene you'd run out of
descriptors with its default mergeFactor (have you increased the
mergeFactor?).
You can also enable the compound file format, which uses far fewer file
descriptors, at some cost to...
Try to open with a very large value (MAX_INT); it will load only the first
term, and look up the rest from disk.
On Fri, Nov 27, 2009 at 12:24, Michael McCandless
wrote:
> If you are absolutely certain you won't be doing any lookups by term.
>
> The only use case I know of is internal, when Lucene's SegmentMerger...
If you are absolutely certain you won't be doing any lookups by term.
The only use case I know of is internal, when Lucene's SegmentMerger
is merging the segment with other segments. In this case, the merger
does a linear iteration of all terms, and never a lookup by term, so
we save CPU/RAM by not loading the terms index...
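The trade-off behind the index divisor can be sketched with a self-contained toy model (invented names, not Lucene's actual code): only every N-th term is kept in RAM; a lookup binary-searches the sampled terms, then scans forward from that point in the full (on-disk) dictionary. A huge divisor loads almost nothing into RAM but still supports lookups, matching the MAX_INT suggestion above; a negative divisor would load no index at all, leaving only linear iteration (as in merging) possible.

```java
import java.util.*;

// Toy model of the term-index divisor: sample every N-th term into RAM,
// binary-search the sample, then scan the full sorted dictionary forward.
public class IndexDivisorDemo {
    final String[] allTerms;                  // stands in for the on-disk term dictionary
    final List<String> sampled = new ArrayList<>();   // the in-RAM term index
    final List<Integer> samplePos = new ArrayList<>();

    IndexDivisorDemo(String[] sortedTerms, int divisor) {
        this.allTerms = sortedTerms;
        for (int i = 0; i < sortedTerms.length; i += divisor) {
            sampled.add(sortedTerms[i]);
            samplePos.add(i);
        }
    }

    // Returns the term's position, or -1 if absent.
    int lookup(String term) {
        int idx = Collections.binarySearch(sampled, term);
        if (idx < 0) idx = -idx - 2;          // greatest sample <= term
        if (idx < 0) return -1;               // before the first sample
        for (int i = samplePos.get(idx); i < allTerms.length; i++) {
            int cmp = allTerms[i].compareTo(term);
            if (cmp == 0) return i;
            if (cmp > 0) break;               // dictionary is sorted: term is absent
        }
        return -1;
    }

    public static void main(String[] args) {
        String[] terms = {"a", "b", "c", "d", "e", "f", "g"};
        IndexDivisorDemo small = new IndexDivisorDemo(terms, 3);  // RAM holds a, d, g
        System.out.println(small.lookup("e")); // 4
        IndexDivisorDemo huge = new IndexDivisorDemo(terms, Integer.MAX_VALUE); // RAM holds only "a"
        System.out.println(huge.lookup("f")); // 5 -- slower scan, far less RAM
    }
}
```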
Have you considered a custom sort strategy using a ScoreDocComparator?
Inside your implementation you have access to individual doc scores, and you
could create a parallel (to your docs) array of floats which stores your
r1, r2, r3, etc. values.
Then use this array to implement your int compare(ScoreDoc...
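The parallel-array idea can be shown without Lucene (a sketch with hypothetical names, not the real ScoreDocComparator API): docIDs index into a float[] of external rank values, and the comparator orders hits by those values.

```java
import java.util.*;

// Sketch of the parallel-array sort: rank[doc] holds an external value
// (r1, r2, r3, ...) for each docID, and the comparator sorts hits by it.
public class ParallelArraySort {
    static Integer[] sortByRank(Integer[] hits, float[] rank) {
        Integer[] out = hits.clone();
        // descending external rank, analogous to a custom compare(ScoreDoc, ScoreDoc)
        Arrays.sort(out, (a, b) -> Float.compare(rank[b], rank[a]));
        return out;
    }

    public static void main(String[] args) {
        float[] rank = {0.2f, 0.9f, 0.5f};       // parallel to docIDs 0..2
        Integer[] hits = {0, 1, 2};              // docIDs returned by a search
        System.out.println(Arrays.toString(sortByRank(hits, rank))); // [1, 2, 0]
    }
}
```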
Thanks,
May I know the purpose of using a negative value?
Regards
Ganesh
----- Original Message -----
From: "Michael McCandless"
To:
Sent: Friday, November 27, 2009 3:17 PM
Subject: Re: IndexDivisor
> This is the expected behavior.
>
> If you intend to use the reader for searching, looking up doc freq...
This is the expected behavior.
If you intend to use the reader for searching, looking up doc freq,
deleting docs, etc., you must pass a non-negative value for
indexDivisor.
Mike
On Fri, Nov 27, 2009 at 12:00 AM, Ganesh wrote:
> Hello all,
>
> I am using Lucene v2.9.1. If I open my reader with posit...
Hi,
I've a requirement that involves frequent, batched updates of my Lucene
index. This is done with a memory queue and a process that periodically
wakes and processes that queue into the Lucene index.
If I do not optimize my index, I'll receive a "too many open files"
exception (yeah, right, I can get th...
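The batching pattern described (queue plus a periodically waking processor) can be sketched with a self-contained stand-in (hypothetical names; the "index" here is just a list, not a Lucene index):

```java
import java.util.*;
import java.util.concurrent.*;

// Toy version of the described architecture: updates accumulate in a queue,
// and a periodic task drains the whole batch in one pass instead of touching
// the index once per document.
public class BatchedUpdater {
    final BlockingQueue<String> pending = new LinkedBlockingQueue<>();
    final List<String> index = Collections.synchronizedList(new ArrayList<>());

    void enqueue(String doc) { pending.add(doc); }

    // One wake-up: drain everything queued so far as a single batch.
    int processBatch() {
        List<String> batch = new ArrayList<>();
        pending.drainTo(batch);
        index.addAll(batch);
        return batch.size();
    }

    public static void main(String[] args) {
        BatchedUpdater u = new BatchedUpdater();
        u.enqueue("doc1");
        u.enqueue("doc2");
        System.out.println(u.processBatch()); // 2
        // In the real process a scheduler would call processBatch() periodically:
        // Executors.newSingleThreadScheduledExecutor()
        //     .scheduleAtFixedRate(u::processBatch, 1, 1, TimeUnit.SECONDS);
    }
}
```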