Re: Scaling out/up or a mix

2009-06-30 Thread Andy Goodell
I have improved date-sorted searching performance pretty dramatically by replacing the two step "search then sort" operation with a one step "use the date as the score" algorithm. The main gotcha was making sure to not affect which results get counted as hits in boolean searches, but overall I onl
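For readers skimming the archive, here is a rough sketch of the general idea of ranking by date in a single pass instead of collecting every hit and sorting afterwards. It does not use Lucene and is not the code from the message: the class name NewestHits, the method newest, and the choice of a long value per hit date are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical illustration: the date plays the role of the score,
// so a bounded top-N queue yields date order with no separate sort
// over the full hit set.
final class NewestHits {
    /** Returns the n largest (newest) dates, newest first. */
    static List<Long> newest(long[] hitDates, int n) {
        // Min-heap: the oldest of the dates kept so far sits on top.
        PriorityQueue<Long> heap = new PriorityQueue<>();
        for (long date : hitDates) {
            if (heap.size() < n) {
                heap.add(date);
            } else if (n > 0 && date > heap.peek()) {
                heap.poll();      // drop the oldest kept date
                heap.add(date);   // keep the newer one
            }
        }
        List<Long> result = new ArrayList<>(heap);
        Collections.sort(result, Collections.reverseOrder());
        return result;
    }
}
```

Only n dates are held at any time, which is where the saving over "search then sort" would come from in this simplified picture.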

phrases and slop

2008-08-28 Thread Andy Goodell
I thought I understood phrases and slop until one of my coworkers brought by the following example. For a document that contains "quick brown fox", the queries "quick brown fox"~0, "quick fox brown"~2, and "fox quick brown"~3 all match. I would have expected "fox quick brown" to require a 4 instead of a 3, two to
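One simplified model that reproduces the numbers in this example: shift each term's document position by its offset in the query, then take the spread (max minus min) of the shifted positions as the slop needed. The sketch below is not Lucene code and assumes every query term occurs exactly once in the document; PhraseSlop and minSlop are hypothetical names.

```java
import java.util.List;

// Hypothetical, simplified model of sloppy phrase matching.
// Assumes each query term appears exactly once in the document.
final class PhraseSlop {
    /** Smallest slop at which the query phrase matches under this model. */
    static int minSlop(List<String> docTerms, List<String> queryTerms) {
        if (queryTerms.isEmpty()) {
            throw new IllegalArgumentException("empty phrase");
        }
        int min = Integer.MAX_VALUE;
        int max = Integer.MIN_VALUE;
        for (int offset = 0; offset < queryTerms.size(); offset++) {
            String term = queryTerms.get(offset);
            int position = docTerms.indexOf(term);
            if (position < 0) {
                throw new IllegalArgumentException("term not in document: " + term);
            }
            // Where the phrase would start if this term sat at its query offset.
            int shifted = position - offset;
            min = Math.min(min, shifted);
            max = Math.max(max, shifted);
        }
        // An exact phrase has every shifted position equal, so the spread is 0.
        return max - min;
    }
}
```

For "fox quick brown" against "quick brown fox" the shifted positions are 2, -1, and -1, a spread of 3, which is consistent with the ~3 match in the example.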

Re: Indexing Wikipedia dumps

2007-12-12 Thread Andy Goodell
My firm uses a parser based on javax.xml.stream.XMLStreamReader to break (English and non-English) Wikipedia XML dumps into Lucene-style "documents and fields." We use Wikipedia to test our language-specific code, so we've probably indexed 20 Wikipedia dumps. - andy g On Dec 11, 2007 9:35 PM, Oti
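A minimal sketch of what such a parser might look like, using only the JDK's StAX API. This is not the firm's parser: the class name WikiDumpParser, the choice to keep only the title and text elements, and the use of a Map per page as a stand-in for a Lucene document are assumptions for illustration.

```java
import java.io.Reader;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical sketch: one field map per <page>, holding "title" and "text".
final class WikiDumpParser {
    static List<Map<String, String>> parse(Reader xml) {
        List<Map<String, String>> pages = new ArrayList<>();
        try {
            XMLStreamReader reader = XMLInputFactory.newInstance().createXMLStreamReader(xml);
            Map<String, String> current = null;   // page being built, if any
            String field = null;                  // field being captured, if any
            StringBuilder value = new StringBuilder();
            while (reader.hasNext()) {
                switch (reader.next()) {
                    case XMLStreamConstants.START_ELEMENT: {
                        String name = reader.getLocalName();
                        if (name.equals("page")) {
                            current = new LinkedHashMap<>();
                        } else if (current != null && (name.equals("title") || name.equals("text"))) {
                            field = name;
                            value.setLength(0);
                        }
                        break;
                    }
                    case XMLStreamConstants.CHARACTERS:
                    case XMLStreamConstants.CDATA:
                        // Text may arrive in several chunks, so accumulate it.
                        if (field != null) value.append(reader.getText());
                        break;
                    case XMLStreamConstants.END_ELEMENT: {
                        String name = reader.getLocalName();
                        if (name.equals(field)) {
                            current.put(field, value.toString());
                            field = null;
                        } else if (name.equals("page") && current != null) {
                            pages.add(current);
                            current = null;
                        }
                        break;
                    }
                    default:
                        break;
                }
            }
            reader.close();
        } catch (XMLStreamException e) {
            throw new IllegalStateException("malformed dump", e);
        }
        return pages;
    }
}
```

Because the reader streams the input, a full dump never has to fit in memory; each map could be handed to the indexer as soon as its page closes rather than collected in a list as done here.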

Searching with a score cutoff

2007-06-04 Thread Andy Goodell
Currently our application implements a score cutoff by iterating through the hits and then stopping once it reaches a hit whose score is below our threshold. We'd like to optimize this (and avoid looking at the entire hit list when we don't need to) by having the score cutoff applied when the hits ar
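The iterate-and-stop approach described in this message can be sketched as follows. The sketch stands in a plain array of scores for the hit list and assumes the scores arrive in descending order; ScoreCutoff and countAboveCutoff are hypothetical names, not part of any library.

```java
// Hypothetical sketch of a client-side score cutoff over sorted hits.
final class ScoreCutoff {
    /**
     * Counts the leading hits whose score is at or above the cutoff.
     * Assumes scoresDescending is sorted from highest to lowest score.
     */
    static int countAboveCutoff(float[] scoresDescending, float cutoff) {
        int kept = 0;
        for (float score : scoresDescending) {
            if (score < cutoff) {
                break;  // every later hit scores lower still, so stop here
            }
            kept++;
        }
        return kept;
    }
}
```

The early break relies entirely on the descending order; with unsorted scores every hit would have to be examined.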

Re: How many Searches is a Searcher Worth?

2007-04-05 Thread Andy Goodell
My approach to dealing with these kinds of issues (which has worked well for me thus far) is: - Run java with -XX:+HeapDumpOnOutOfMemoryError command-line option - use jhat to inspect the heap dump, like so: $ /usr/java/jdk1.6/bin/jhat ./java_pid1347.hprof jhat will take a while to parse the hea

Re: performance differences between 1.4.3 and 1.9.1

2006-04-26 Thread Andy Goodell
For my application we have several hundred indexes, different subsets of which are searched depending on the situation. Aside from not upgrading to Lucene 1.9, or making a big index for every possible subset, do you have any ideas for how we can maintain fast performance? - andy g On 4/26/06, Da

Query to return all documents in the index

2005-10-05 Thread Andy Goodell
Hi, In my project we've been using the Searcher.search(query, filter, sort) method to gather results. But as it turns out, sometimes we just want all of the documents that match the filter, sorted by the sort field. Does anyone know a query that returns all the documents in the index, so that