Re: Sorting

2006-07-29 Thread Jason Calabrese
One way to make an alphabetic sort very fast is to presort your docs before adding them to the index. If you do this, you can then just sort by index order. We are using this for a large index (1 million+ docs) and it works very well; it even seems slightly faster than relevance sorting.
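
A minimal sketch of the presorting step, assuming documents receive ascending internal ids in the order they are added, so that index order matches the presorted order. The class and method names here are hypothetical, and plain strings stand in for the documents; nothing below is Lucene API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class PresortSketch {
    // Returns the titles in the order they should be added to the index.
    // Assumption: adding documents in this order yields ascending internal
    // ids, so sorting by index order later gives an alphabetic listing.
    static List<String> indexingOrder(List<String> titles) {
        List<String> sorted = new ArrayList<>(titles);
        sorted.sort(String.CASE_INSENSITIVE_ORDER);
        return sorted;
    }

    public static void main(String[] args) {
        List<String> order = indexingOrder(Arrays.asList("pear", "Apple", "banana"));
        // Each title would be handed to the index writer in this order.
        for (int position = 0; position < order.size(); position++) {
            System.out.println(position + " " + order.get(position));
        }
    }
}
```

The sort happens once at indexing time, which is why the search-time sort can fall back to index order.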

Re: Inserting a document into an index at a specified position

2006-07-07 Thread Jason Calabrese
We only display 10 hits at a time, so we don't need to iterate through all the hits. It feels like there should be a way to pull a document out of one index and stick it into another, bringing all the unstored fields along with it. On Friday 07 July 2006 12:52, Erick Erickson wrote: > Did you

Re: Inserting a document into an index at a specified position

2006-07-07 Thread Jason Calabrese
> When you say you keep your documents ordered alphabetically, it's confusing > to me. Are you saying that you pre-sort all your documents then insert them > one after another so that automatically-generated internal Lucene ID maps > exactly to the alphabetical ordering? That is, for any document I

Re: Inserting a document into an index at a specified position

2006-07-07 Thread Jason Calabrese
All, I sent this the other day, but didn't get any responses. I'm hoping that it was just missed, so I'm trying again. There has to be a better way to insert a document into an index than reindexing everything. --Jason On Wednesday 05 July 2006 5:06 pm, Jason Calabre

Inserting a document into an index at a specified position

2006-07-05 Thread Jason Calabrese
All, For performance reasons we keep our index of over a million documents ordered alphabetically. This way, for an alpha sort we can just use the index order. This works very well, but I'm now looking for a way to insert a single document into the index in the correct position. Is there any s

Re: Stemming terms in SpanQuery

2006-05-02 Thread Jason Calabrese
I think the best way to tokenize/stem is to use the analyzer directly. For example: TokenStream ts = analyzer.tokenStream(field, new StringReader(text)); Token token = null; while ((token = ts.next()) != null) { Term newTerm = new Term(field, token.termTe

Re: Getting count of documents matching a query?

2006-04-07 Thread Jason Calabrese
I just wrote some simple code to test this. For my test I used 3 queries: - A 3 term boolean - A single term query with over 5000 hits - A single term query with 0 hits For each query I ran 4 tests of 10,000 searches: 1) using hits.length to get the counts and the standard si

Re: De-duping MultiSearcher results

2005-11-14 Thread Jason Calabrese
Maybe I'm missing something simple, but I don't see how this will work. It looks like this filter will just filter out documents that don't have a guid field, but in my case every document has a guid. In a single index there are no duplicates. Duplicates are only a problem when I search multip

De-duping MultiSearcher results

2005-11-14 Thread Jason Calabrese
All, In the project I'm working on we have a separate index for each database. There are 12 databases now, but in the future there may be as many as 20. They all have their own release cycle so I don't want to merge the indexes. The databases all have some overlap between them. We manage thi
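
One way to picture the de-duplication problem in this thread is as a post-processing step over hits already merged from several indexes. The sketch below assumes each hit carries the guid mentioned above and a score, and keeps the best-scoring hit per guid. The `Hit` class and `dedupe` method are hypothetical names for illustration, not Lucene types, and this is not necessarily the approach the thread settled on.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class DedupeSketch {
    // Hypothetical holder for a merged hit: the shared guid plus its score.
    static final class Hit {
        final String guid;
        final float score;

        Hit(String guid, float score) {
            this.guid = guid;
            this.score = score;
        }
    }

    // Keeps one hit per guid, preferring the higher score, then orders the
    // survivors by descending score.
    static List<Hit> dedupe(List<Hit> merged) {
        Map<String, Hit> best = new LinkedHashMap<>();
        for (Hit hit : merged) {
            Hit seen = best.get(hit.guid);
            if (seen == null || hit.score > seen.score) {
                best.put(hit.guid, hit);
            }
        }
        List<Hit> result = new ArrayList<>(best.values());
        result.sort((a, b) -> Float.compare(b.score, a.score));
        return result;
    }
}
```

Because only 10 hits are shown at a time in the setup described elsewhere in these threads, a real implementation would likely de-duplicate lazily rather than over the full result set; the sketch ignores that for brevity.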