I have improved date-sorted searching performance pretty dramatically by
replacing the two step "search then sort" operation with a one step "use the
date as the score" algorithm. The main gotcha was making sure to not affect
which results get counted as hits in boolean searches, but overall I onl
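
To illustrate the difference between the two approaches, here is a minimal sketch in plain Python rather than Lucene code. It assumes matches arrive as hypothetical (doc_id, date) pairs with dates stored as sortable integers; the function names are made up for illustration and are not part of any Lucene API.

```python
import heapq

def search_then_sort(matches, n):
    # Two steps: gather every match, sort all of them by date, keep the newest n.
    return sorted(matches, key=lambda m: m[1], reverse=True)[:n]

def date_as_score(matches, n):
    # One step: treat the date as the score and keep only the best n
    # in a bounded min-heap while the matches are being collected.
    heap = []
    for doc_id, date in matches:
        entry = (date, doc_id)
        if len(heap) < n:
            heapq.heappush(heap, entry)
        elif entry > heap[0]:
            heapq.heapreplace(heap, entry)
    return [(doc_id, date) for date, doc_id in sorted(heap, reverse=True)]

# Hypothetical matches: (doc_id, date as yyyymmdd)
matches = [(1, 20071201), (2, 20071211), (3, 20061105), (4, 20071130)]
print(search_then_sort(matches, 2))
print(date_as_score(matches, 2))
```

Both functions return the same newest-first results when dates are distinct, but the second never holds more than n entries at a time.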
I thought I understood phrases and slop until one of my coworkers
brought up the following example:
For a document that contains
    "quick brown fox"
the queries
    "quick brown fox"~0
    "quick fox brown"~2
    "fox quick brown"~3
all match.
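
One way to reproduce these numbers is to model slop as the spread between the largest and smallest displacement of each term, where displacement is the term's position in the document minus its position in the phrase. The sketch below is a simplified model under that assumption, not Lucene's actual scorer; `min_slop` is a hypothetical helper, and it assumes every term occurs at most once in the document.

```python
def min_slop(doc_terms, phrase_terms):
    # Simplified model (an assumption, not Lucene's implementation):
    # displacement = position in document - position in phrase,
    # and the slop needed is max(displacement) - min(displacement).
    doc_pos = {term: i for i, term in enumerate(doc_terms)}
    if any(term not in doc_pos for term in phrase_terms):
        return None  # a phrase term is missing, so no slop value helps
    offsets = [doc_pos[term] - q for q, term in enumerate(phrase_terms)]
    return max(offsets) - min(offsets)

doc = "quick brown fox".split()
print(min_slop(doc, ["quick", "brown", "fox"]))  # 0
print(min_slop(doc, ["quick", "fox", "brown"]))  # 2
print(min_slop(doc, ["fox", "quick", "brown"]))  # 3
```

Under this model "fox quick brown" needs 3 because "fox" is displaced by +2 while "quick" and "brown" are each displaced by -1.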
I would have expected "fox quick brown" to require a 4 instead of a 3,
two to
My firm uses a parser based on javax.xml.stream.XMLStreamReader to
break (English and non-English) Wikipedia XML dumps into Lucene-style
"documents and fields." We use Wikipedia to test our
language-specific code, so we've probably indexed 20 Wikipedia dumps.
- andy g
On Dec 11, 2007 9:35 PM, Oti
Currently our application implements a score cutoff by iterating through the
hits and then stopping once it reaches a hit whose score is below our
threshold. We'd like to optimize this (and avoid looking at the entire set of
hits when we don't need to) by having the score cutoff applied when the hits ar
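
The current approach described above (walk the hits in descending score order and stop at the first one below the threshold) can be sketched as follows. This is plain Python for illustration only; the (doc_id, score) pairs and the `hits_above_cutoff` name are hypothetical and not a Lucene API.

```python
from itertools import takewhile

def hits_above_cutoff(hits, threshold):
    # hits must already be ordered by descending score; iteration stops
    # at the first hit whose score falls below the threshold.
    return list(takewhile(lambda hit: hit[1] >= threshold, hits))

# Hypothetical hits: (doc_id, score), best first
hits = [("a", 0.9), ("b", 0.5), ("c", 0.2), ("d", 0.1)]
print(hits_above_cutoff(hits, 0.3))  # [('a', 0.9), ('b', 0.5)]
```

Because `takewhile` is lazy, hits after the first one below the cutoff are never examined when `hits` is a generator.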
My approach to dealing with these kinds of issues (which has worked well for
me thus far) is:
- Run java with -XX:+HeapDumpOnOutOfMemoryError command-line option
- Use jhat to inspect the heap dump, like so:
$ /usr/java/jdk1.6/bin/jhat ./java_pid1347.hprof
jhat will take a while to parse the hea
For my application we have several hundred indexes, different subsets
of which are searched depending on the situation. Aside from not
upgrading to Lucene 1.9, or making a big index for every possible
subset, do you have any ideas for how we can maintain fast
performance?
- andy g
On 4/26/06, Da
Hi,
In my project we've been using the Searcher.search(query, filter, sort)
method to gather results. But as it turns out, sometimes we just want all of
the documents that match the filter, sorted by the sort field. Does
anyone know a query that returns all the documents in the index, so that