Thanks for the ideas.
We are testing the suggested methods and changes to see if they
work with our current setup, and are checking whether the disks are the
bottleneck in this case, but feel free to drop more hints. :)
At the moment we are copying the index at an off-peak hour, but we
would also
We ended up using String newquery = query.replace(...) with ": " - a colon
with a space after it, in quotes - as the string being replaced. It worked
great. Now results come back even if you use a colon in the query, and one
can still use ":" as a special operator if there is no space afterwards.
Great suggestion.
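For reference, a related approach (assuming Lucene 2.x, whose QueryParser
ships a static escape helper) is to escape user input wholesale before
parsing:

    // Escapes all QueryParser special characters, including ":".
    String safe = org.apache.lucene.queryParser.QueryParser.escape(userInput);

That neutralizes every operator, though, so it only fits when users are never
meant to type field queries themselves.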
Thanks!
Felix
--
Hi Ivan, Chris and all!
I'm that contributor of LUCENE-769 and I recommend it too :)
OutOfMemory error was one of the main reasons for me to make it.
Regards,
Artem Vasiliev
On 4/6/07, Chris Hostetter <[EMAIL PROTECTED]> wrote:
: The problem I suspect is the sorting. As I understand, Lucene
: builds internal caches for sorting and I suspect that this is the root
: of your problem.
Craig,
This just shows you that the JVM OOMed while running that particular method,
and does not necessarily mean that that method is what's consuming your RAM.
Run your app and, if you are using Java 1.5/1.6, run jmap against that Java
process and tell it to show you how much memory objects are consuming.
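For example (pid is your Java process id; jmap ships with the Sun JDK):

    jmap -histo <pid>

prints a per-class histogram of object counts and bytes, which usually points
straight at whatever is eating the heap.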
And it was as easy as all that...
Thanks.
- Original Message
From: Chris Hostetter <[EMAIL PROTECTED]>
To: java-user@lucene.apache.org
Sent: Friday, April 6, 2007 12:23:30 PM
Subject: Re: Explanation from FunctionQuery
: So we reach a problem at extractTerms. I get an explanation no problem
: The problem I suspect is the sorting. As I understand, Lucene
: builds internal caches for sorting and I suspect that this is the root
: of your problem. You can test this by trying your problem queries
: without sorting.
if Sorting really is the cause of your problems, you may want to try out
the patch in LUCENE-769.
: Would it be fair to say that you can expect OutOfMemory errors if you
: run complex queries? ie sorts, boosts, weights...
not intrinsically ... the amount of memory used has more to do with the size
of the index and the sorting done than it does with the number of clauses
in your query (of course
: So we reach a problem at extractTerms. I get an explanation no problem
...
: I'm using the version of FunctionQuery from the JIRA attachment.
that seems like the heart of the problem ... i haven't looked at the
version in Jira for a while, but the version committed into Solr does
provide it.
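A minimal way to test the no-sorting hypothesis from above (a sketch against
the Lucene 2.1-era API; the index path, field, and term here are made up):

    // If the sorted search OOMs but the unsorted one doesn't, the sort
    // caches (FieldCache) are the likely culprit.
    IndexSearcher searcher = new IndexSearcher("/tmp/myindex");
    Query query = new TermQuery(new Term("body", "foo"));
    Hits plain  = searcher.search(query);                    // no sort caches
    Hits sorted = searcher.search(query, new Sort("price")); // builds sort caches
    searcher.close();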
On 4/5/07, Leon <[EMAIL PROTECTED]> wrote:
I need to index and search real numbers in Lucene. I found the NumberUtils
class in the Solr project, which encodes doubles into strings so that
alphanumeric ordering corresponds correctly to the ordering on numbers.
When I use ConstantScoreRangeQuery
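The general trick (this is a sketch of the idea, not Solr's actual
NumberUtils implementation) is to remap the IEEE-754 bits so that
lexicographic order on the output equals numeric order on the input,
then print the bits fixed-width:

    // Hypothetical sketch: negative doubles get all bits flipped, positive
    // ones only the sign bit; fixed-width hex then sorts numerically.
    public static String encodeDouble(double d) {
        long bits = Double.doubleToRawLongBits(d);
        bits = (bits < 0) ? ~bits : (bits ^ 0x8000000000000000L);
        return String.format("%016x", bits);
    }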
Would it be fair to say that you can expect OutOfMemory errors if you run
complex queries? ie sorts, boosts, weights...
My query looks like this:
+(pathNodeId_2976569:1^5.0 pathNodeId_2976969:1 pathNodeId_2976255:1
pathNodeId_2976571:1) +(pathClassId:1 pathClassId:346 pathClassId:314) -id:369
Thanks Erick for your help.
Actually I was already using Luke!
The only thing I was missing was the possibility of using different
Analyzers at the same time, with PerFieldAnalyzerWrapper.
Thank you again.
Best,
Roberto
Erick Erickson wrote:
Really, really, really get a copy of Luke. Really.
On Thursday 05 April 2007 17:07, Paul Hermans wrote:
> I do receive the message
> "java.lang.ClassNotFoundException:
> net.sf.snowball.ext.GermanStemmer".
This class is not part of the lukeall-0.7.jar, but it's in
lucene-snowball-2.1.0.jar
(which you can find on the Luke homepage). You will then need to add it to
Luke's classpath.
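For example, to launch Luke with both jars on the classpath (assuming
org.getopt.luke.Luke is the main class and the jars are in the current
directory; use ";" instead of ":" as the separator on Windows):

    java -cp lukeall-0.7.jar:lucene-snowball-2.1.0.jar org.getopt.luke.Luke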
Hi Ratnesh,
1. There is no need to use that many question marks, really.
2. Use the java-user list, not java-dev.
3. You cannot delete using negative criteria. You can delete 1 Document using
its document ID, or you can delete 1 or more Documents using a Term where you
specify a field name and a value.
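A sketch of the Term-based variant (Lucene 2.x API; the field name, value,
and index path here are made up):

    // Deletes every document whose "id" field contains the exact term "12345".
    IndexReader reader = IndexReader.open("/tmp/myindex");
    int deleted = reader.deleteDocuments(new Term("id", "12345"));
    reader.close();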
Ivane,
Sorts will eat your memory, and how much they use depends on what you store in
them - ints, Strings, floats...
A profiler like JProfiler will tell you what's going on, who's eating your
memory.
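(As a rough, hypothetical back-of-the-envelope, assuming the sort cache holds
one entry per document: sorting 15 million docs on an int field costs about
15M x 4 bytes, roughly 60 MB per field, while a String sort field also
materializes the unique term values themselves, so it can easily run to
several hundred MB.)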
Otis
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
Simpy -- http://www.simpy.com
When we use IndexModifier's docCount() method, it calls either its
underlying IndexReader's numDocs() or the IndexWriter's docCount() method.
The problem is that IndexReader.numDocs() accounts for deleted
documents, but IndexWriter.docCount() ignores them.
So, I've made some modifications in IndexWriter.
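The same live-versus-total distinction is visible from the reader side
(a sketch against the Lucene 2.x API; the index path is made up):

    // numDocs() excludes deleted documents; maxDoc() still counts deletions
    // that have not been merged away - mirroring the numDocs()/docCount() gap.
    IndexReader reader = IndexReader.open("/tmp/myindex");
    System.out.println("live docs:   " + reader.numDocs());
    System.out.println("total slots: " + reader.maxDoc());
    reader.close();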
I can only shed a little light on a couple of points, see below.
On 4/6/07, Ivan Vasilev <[EMAIL PROTECTED]> wrote:
Hi All,
I have the following problem - we get an OutOfMemoryException when
searching on indexes that are 20 - 40 GB in size and contain 10 - 15
million docs.
When we make searches, we run a query that matches all the results, but we
DO NOT fetch all of them - we fetch 100 of them.
Really, really, really get a copy of Luke. Really. Use it to open
your index and run experimental queries, especially to see
how they get rewritten (but be sure to pick the appropriate
analyzer).
Google "lucene luke". Really, really get a copy. It'll help you
make MUCH faster progress than waitin
Thanks guys!
I really, really appreciate your feedback. I didn't know a "simple" problem
like people-name matching would be this complicated. I knew there would be
some unusual circumstances or rules, but I did not realize how much work has
been done to solve parts of the problem (string matching
Ok, glossing over some of the details was not the best idea. ms is a
MultiSearcher in the sense that it's something I wrote that extends
MultiSearcher. And, as I should have mentioned before, the explain method
being called is the one in org.apache.lucene.search.Searcher. So explain is
public Explanation
I agree, SecondString was helpful to me. Also have a look at William
Winkler's work at the US Census. We did similar things to come up
with blocking criteria to get an initial division into duplicates,
unique and undecided. Then we refined on the undecided set. No
approach is going to be perfect.
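As a purely illustrative example of a blocking key (the field choices are
made up; real criteria need tuning against your data):

    // Cheap key that buckets candidate records so the expensive pairwise
    // matching only runs within a bucket, never across the whole set.
    static String blockingKey(String lastName, int birthYear) {
        String prefix = lastName.toUpperCase().replaceAll("[^A-Z]", "");
        prefix = prefix.substring(0, Math.min(4, prefix.length()));
        return prefix + ":" + birthYear;
    }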
Hi All,
I have the following problem - we get an OutOfMemoryException when
searching on indexes that are 20 - 40 GB in size and contain 10 - 15
million docs.
When we make searches, we run a query that matches all the results, but we
DO NOT fetch all of them - we fetch 100 of them. We also
Hi All,
I'm indexing categories with this code:
for (Category category : item.getCategories()) {
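    // UN_TOKENIZED: the category name is indexed as one exact, un-analyzed term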
    lucene_doc.add(new Field(
        "CATEGORY",
        category.getName(),
        Field.Store.NO,
        Field.Index.UN_TOKENIZED));
}
And searching using the query:
Str
I've been doing this for the past couple of years, and yes, we use Lucene for
some key parts of the problem.
Basically, the problem you face is how to get extremely high recall without
compromising precision - hard!
The key problem is performance: imagine you have a DB with 10 million persons
you need to