Hi all,
I got an OutOfMemoryError at
org.apache.lucene.search.Searcher.search(Searcher.java:183)
My index is 43 GB. Is that too big for Lucene?
Luke can see that the index has over 1800M docs, but searching also runs
out of memory.
I use -Xmx1024M to specify a 1 GB Java heap.
One abnormal thing is
On Thu, Nov 12, 2009 at 5:28 PM, Uwe Schindler wrote:
> Mike: What was the reason for this change?
We first thought this (visiting segments from largest to smallest
size) improved performance, but then we decided a better optimization
was for Collectors to save tie breaking by knowing the docID
On Thu, Nov 12, 2009 at 5:45 PM, Jacob Rhoden wrote:
>> SearcherManager can work with a near real-time reader (via
>> IndexWriter.getReader), or with a standalone reader (via
>> IndexReader.open), so that's another source of more complexity vs your
>> use case.
>
> There can be quite a large number
Thanks Uwe and Mike!
On Thu, Nov 12, 2009 at 11:48 AM, Michael McCandless
wrote:
> Or, just run the junit test "directly", which doesn't try to buffer
> the output, so you can see it "live". Something like this:
>
> java -cp
> .:/usr/local/src/junit-4.4.jar:./build/classes/test:./build/classes/java:./build/classes/demo
On 13/11/2009, at 9:19 AM, Michael McCandless wrote:
On Wed, Nov 11, 2009 at 7:33 PM, Jacob Rhoden wrote:
The source code for SearcherManager is even downloadable for free:
http://www.manning.com/hatcher3/LIAsourcecode.zip
The example source code does some things that are beyond my level of understanding of lucene.
On Thu, Nov 12, 2009 at 09:20:30PM +0100, Uwe Schindler said:
> Which version of Lucene are you using and which Version constant do you pass
> to Analyzer and Query Parser? In 2.9.0 there was a bug/incorrect setting
> between the query parser and the Version.LUCENE_CURRENT / Version.LUCENE_29
> setting.
> By the way, the docStarts should be 5 and then 0, as IndexSearcher starts
> to search bigger segments first. Maybe this is your problem, that you have
> only looked at the second call?
Oh, that's no longer the case. Sorry. The docBases should be sorted upwards.
Mike: What was the reason for this change?
On Thu, Nov 12, 2009 at 4:44 PM, Jacob Rhoden wrote:
>
> On 12/11/2009, at 8:42 PM, Michael McCandless wrote:
>
>> On Wed, Nov 11, 2009 at 7:33 PM, Jacob Rhoden
>> wrote:
>>>
>>> The source code for SearcherManager is even downloadable for free:
>>> http://www.manning.com/hatcher3/LIAsourcecode.zip
Could it be that you are using the expert IndexSearcher ctor that takes the sub
reader array and docStarts?
Else it is impossible that all docBases are 0 (look into the code).
By the way, the docStarts should be 5 and then 0, as IndexSearcher starts to
search bigger segments first. Maybe this is your problem, that you have only
looked at the second call?
Yes, it should be 0 and 5.
I'm not sure what would cause 0 and 0, offhand.
Can you make a small standalone test case showing it?
Mike
On Thu, Nov 12, 2009 at 4:25 PM, Benjamin Heilbrunn wrote:
> Hello everyone,
>
> I'm a little bit confused about the docBase parameter of
> Collector.setNextReader.
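Not from the original mails, but for reference, a minimal sketch of how the
expert ctor's two extra arguments are normally derived, with docBases ascending
in index order as discussed above (class and method names are mine; Lucene 2.9
API):

import org.apache.lucene.index.IndexReader;
import org.apache.lucene.search.IndexSearcher;

class ExpertSearcherSketch {
  // Derive the subReaders/docStarts arrays the expert IndexSearcher
  // ctor expects. getSequentialSubReaders() returns the per-segment
  // readers of a composite reader (it returns null for atomic readers).
  static IndexSearcher open(IndexReader reader) {
    IndexReader[] subReaders = reader.getSequentialSubReaders();
    int[] docStarts = new int[subReaders.length];
    int maxDoc = 0;
    for (int i = 0; i < subReaders.length; i++) {
      docStarts[i] = maxDoc;            // docBase of segment i
      maxDoc += subReaders[i].maxDoc(); // segments stack in index order
    }
    return new IndexSearcher(reader, subReaders, docStarts);
  }
}

Passing arrays that don't line up this way is one way to end up with the odd
docBases discussed above.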
On 12/11/2009, at 8:42 PM, Michael McCandless wrote:
On Wed, Nov 11, 2009 at 7:33 PM, Jacob Rhoden wrote:
The source code for SearcherManager is even downloadable for free:
http://www.manning.com/hatcher3/LIAsourcecode.zip
The example source code does some things that are beyond my level of understanding of lucene.
Hello everyone,
I'm a little bit confused about the docBase parameter of
Collector.setNextReader.
Imagine the following:
- Create new Index
- Index 5 docs
- Call IndexWriter.commit()
- Index 7 docs
- Call IndexWriter.commit()
- close Writer
Now I have a 2-segment index, right?
I have
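To make the two-segment example concrete, here is a minimal Collector sketch
(my own illustration, Lucene 2.9 API). Run against the 5+7 doc index described
above, setNextReader should fire twice, with docBase 0 and then 5:

import java.io.IOException;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.search.Collector;
import org.apache.lucene.search.Scorer;

// Prints the docBase handed over for each segment during a search.
class DocBasePrinter extends Collector {
  private int docBase;

  public void setScorer(Scorer scorer) {}

  public void setNextReader(IndexReader reader, int docBase) throws IOException {
    this.docBase = docBase;
    System.out.println("segment maxDoc=" + reader.maxDoc()
        + " docBase=" + docBase);
  }

  public void collect(int doc) {
    // 'doc' is segment-relative; docBase + doc is the index-wide id
    int globalDoc = docBase + doc;
  }

  public boolean acceptsDocsOutOfOrder() { return true; }
}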
Which version of Lucene are you using and which Version constant do you pass
to Analyzer and Query Parser? In 2.9.0 there was a bug/incorrect setting
between the query parser and the Version.LUCENE_CURRENT / Version.LUCENE_29
setting. If you did not enable position increments in query parser, that
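A sketch of the fix being suggested, as I read it (the field name "body" and
the analyzer choice are my assumptions):

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.queryParser.ParseException;
import org.apache.lucene.queryParser.QueryParser;
import org.apache.lucene.search.Query;
import org.apache.lucene.util.Version;

class VersionedParserSketch {
  static Query parse(String text) throws ParseException {
    // Pass the same explicit Version to the analyzer and the parser...
    StandardAnalyzer analyzer = new StandardAnalyzer(Version.LUCENE_29);
    QueryParser parser = new QueryParser(Version.LUCENE_29, "body", analyzer);
    // ...and enable position increments so phrase queries line up with
    // the token positions written at index time.
    parser.setEnablePositionIncrements(true);
    return parser.parse(text);
  }
}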
Yes, you're doing something wrong. What, you may ask?
Well, it's kind of hard to say without knowing what analyzers
you use at index AND query time and what the query you're
submitting looks like...
But the very first thing I'd try is to get a copy of Luke and peek at your
index to see if what you
I have a document with the title "Here, there be dragons" and a body.
When I search for
Here, there be dragons
(no quotes)
with a title boost of 2.0 and a body boost of 0.8
I get the document as the first hit which is what I'd expect.
However, if I change the query to
"Here, there be dragons"
Or, just run the junit test "directly", which doesn't try to buffer
the output, so you can see it "live". Something like this:
java -cp
.:/usr/local/src/junit-4.4.jar:./build/classes/test:./build/classes/java:./build/classes/demo
-Dlucene.version=2.9-dev -DtempDir=build -ea
org.junit.runner.JUnitCore
Raise -Xmx; there is a setting in common-build.xml or build.xml.
-
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: u...@thetaphi.de
> -----Original Message-----
> From: Jason Rutherglen [mailto:jason.rutherg...@gmail.com]
> Sent: Thursday, November 12, 2009 8:3
Is there a setting to fix this?
[junit] Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
[junit]     at java.util.Arrays.copyOf(Arrays.java:2882)
[junit]     at java.lang.AbstractStringBuilder.expandCapacity(AbstractStringBuilder.java:100)
[junit]     at java.lang
Thanks again
I am not really sure why it is enough for you to sort the first 50 highest
ranking hits, but if you only want to do this, sorting afterwards is quite
straightforward.
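To illustrate "sorting afterwards" (my own sketch, assuming the sort key is a
stored field I'm calling "count"; for this approach the value must be stored,
not just indexed):

import java.io.IOException;
import java.util.Arrays;
import java.util.Comparator;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.TopDocs;

class SortAfterwardsSketch {
  // Fetch the top 50 by relevance, then re-order just those 50 by the
  // stored "count" field - no index-wide FieldCache is ever built.
  static ScoreDoc[] top50ByCount(final IndexSearcher searcher, Query query)
      throws IOException {
    TopDocs top = searcher.search(query, null, 50);
    ScoreDoc[] hits = top.scoreDocs;
    Arrays.sort(hits, new Comparator<ScoreDoc>() {
      public int compare(ScoreDoc a, ScoreDoc b) {
        try {
          long ca = Long.parseLong(searcher.doc(a.doc).get("count"));
          long cb = Long.parseLong(searcher.doc(b.doc).get("count"));
          return cb < ca ? -1 : (cb == ca ? 0 : 1); // highest count first
        } catch (IOException e) {
          throw new RuntimeException(e);
        }
      }
    });
    return hits;
  }
}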
Just to clarify... And I know it may seem strange...
But I'll mostly be conducting "long" phrase (3 or 4 word)
If there's a bug you're seeing, it's helpful to open an issue and post
code reproducing it.
On Wed, Nov 11, 2009 at 3:41 AM, Albert Juhe wrote:
>
> I think that this is the best way to proceed.
>
> thank you Mike
>
>
>
> Michael McCandless-2 wrote:
>>
>> Can you narrow the leak down to a small se
I am not really sure why it is enough for you to sort the first 50 highest
ranking hits, but if you only want to do this, sorting afterwards is quite
straightforward.
Another idea is to not index the count itself, but rather use the count as a
boost factor for each document. The ranking algorithm o
It is only sorting the top 50 hits, yes, but to do that, it needs to look at
the *value* of the field for each and every one of the billions of documents.
You can do this without using memory if you're willing to deal with disk
seeks, but doing billions of those is going to mean that this query most
Ok. Thanks.
The doc. says:
"Finds the top |n| hits for |query|, applying |filter| if non-null, and
sorting the hits by the criteria in |sort|."
I understood that only the hits (50 in this case) for the current search
would be sorted...
I'll just do the ordering afterwards. Thank you for clarifying.
I'm trying to use a precompiled Lucene index from within a WAR archive, and
was having difficulty, but found a possible solution:
http://mail-archives.apache.org/mod_mbox/lucene-java-user/200305.mbox/%3c20030524152100.28075.qm...@web12707.mail.yahoo.com%3e
The gotcha to the solution: it's written
Sorting utilizes a FieldCache: the forward lookup - the value a document has
for a particular field (as opposed to the usual "inverted" way of looking at
all documents which contain a given term) - which lives in memory, and takes
up as much space as 4 bytes * numDocs.
If you've indexed the en
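To put numbers on that (my own arithmetic, borrowing the ~1.8 billion-doc
figure from earlier in this digest): 1.8e9 docs * 4 bytes is roughly 7.2 GB
for a single int FieldCache, which can never fit in a -Xmx1024M heap.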
To sort on the count, the field must be indexed (but not tokenized); it does not
need to be stored. But in any case, sorting needs lots of memory. How many
documents do you have?
Uwe
-
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: u...@thetaphi.de
> -----Original Message-----
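A sketch of both suggestions (the "count" field name is a stand-in; Lucene
2.9 API):

import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;

class CountFieldSketch {
  static void addSortableCount(Document doc, long count) {
    // Indexed but not tokenized (and without norms) so it can back a
    // Sort, e.g. new SortField("count", SortField.LONG); Store.NO
    // because sorting never reads the stored value.
    doc.add(new Field("count", Long.toString(count),
        Field.Store.NO, Field.Index.NOT_ANALYZED_NO_NORMS));
  }

  static void foldCountIntoRanking(Document doc, long count) {
    // The alternative idea above: skip sorting entirely and let the
    // count lift the document's score via its boost.
    doc.setBoost((float) Math.log1p(count));
  }
}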
If you run the zoie test turned to nrt, you can replicate it rather easily:
While the test is running, do lsof on your process, e.g.
lsof -p <pid> | wc
-John
On Thu, Nov 12, 2009 at 8:24 AM, John Wang wrote:
> Well, I have code in the finally block to call IndexReader.close for every
> reader I
Well, I have code in the finally block to call IndexReader.close for every
reader I get from IndexWriter.getReader.
On Mon, Nov 9, 2009 at 2:43 AM, Michael McCandless <
luc...@mikemccandless.com> wrote:
> Does this look like a real leak John? You're definitely closing every
> reader you get back
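For readers following along, the pattern being described looks roughly like
this (a sketch, not John's actual code):

import java.io.IOException;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.MatchAllDocsQuery;
import org.apache.lucene.search.TopDocs;

class NrtCloseSketch {
  static int countAll(IndexWriter writer) throws IOException {
    IndexReader reader = writer.getReader(); // NRT reader, Lucene 2.9
    try {
      IndexSearcher searcher = new IndexSearcher(reader);
      TopDocs top = searcher.search(new MatchAllDocsQuery(), 10);
      return top.totalHits;
    } finally {
      reader.close(); // release the reader even if the search throws
    }
  }
}

If file descriptors still grow with this pattern, the lsof check above is a
quick way to confirm the leak.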
Hello List.
I'm having a problem when I add a Sort object to my searcher:
docs = searcher.search(parser.parse(search), null, 50, sort);
Every time I execute a query I get an OutOfMemoryError exception.
But if I execute the query without the Sort object, it works fine.
Let me briefly explain ho
Dear all,
I am pretty sure it's trivial and I apologize for raising this issue.
I wish to access the index in the order driven by:
Term+"Field name"+Frequency or
Frequency+Term+"Field Name".
I read the terms in the order driven by "Field name"+Term+Frequency as
follows:
Directory fsd =
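The snippet above is cut off; a minimal sketch of the field+term+frequency
walk (my own reconstruction; the literal field name "myField" and the path
argument are placeholders):

import java.io.File;
import java.io.IOException;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.Term;
import org.apache.lucene.index.TermEnum;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;

class TermWalkSketch {
  public static void main(String[] args) throws IOException {
    Directory fsd = FSDirectory.open(new File(args[0]));
    IndexReader reader = IndexReader.open(fsd, true); // read-only
    // TermEnum is ordered by field name, then term text; docFreq()
    // supplies the frequency, giving "Field name"+Term+Frequency order.
    TermEnum terms = reader.terms(new Term("myField", ""));
    try {
      do {
        Term t = terms.term();
        if (t == null || !t.field().equals("myField")) break;
        System.out.println(t.text() + "\t" + terms.docFreq());
      } while (terms.next());
    } finally {
      terms.close();
      reader.close();
    }
  }
}

For the Frequency-first orderings there is nothing pre-sorted in the index,
so you would collect the (term, docFreq) pairs and re-sort them yourself.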
On Wed, Nov 11, 2009 at 7:33 PM, Jacob Rhoden wrote:
> The source code for SearcherManager is even downloadable for free:
> http://www.manning.com/hatcher3/LIAsourcecode.zip
>
> The example source code does some things that are beyond my level of
> understanding of lucene, i.e.:
> 1) To me it loo