Re: multiple small indexes or one big index?

2011-06-02 Thread Alexander Rosemann
Alright. With all the changes you suggested I am down from 9s to <1s. Again, many thanks to both of you Erick and Shai! Regards, Alex On 02.06.2011 15:48, Alexander Rosemann wrote: No worries, I'll keep that in mind now. In addition I am going to switch to another collector as well. ATM I coll

Re: Federated relevance ranking

2011-06-02 Thread Erick Erickson
My gut feel is there isn't really a good solution to intermingling the results, since they come from different sources, index different kinds of data etc. The irreducible problem is that a hit in one index is not comparable to a hit in another, either from a Lucene scoring perspective or from the u

Re: Federated relevance ranking

2011-06-02 Thread Clint Gilbert
Thank you very much for your reply. Yeah, our indexes (indices?) contain different types and amounts of data. :( The data being indexed is all the same format - RDF - but it describes different numbers and kinds of things. What is your gut feeling on

Re: Federated relevance ranking

2011-06-02 Thread Erick Erickson
As you've found out, raw scores certainly aren't comparable across different indexes unless the documents are fairly distributed. You're talking large indexes here, so if the documents are balanced across all your indexes, the results should be pretty comparable. This presupposes that the indexe

Re: boosting fields

2011-06-02 Thread Erick Erickson
Have you tried using the explain method on a Searcher and examining the results? Best Erick On Thu, Jun 2, 2011 at 3:51 PM, Clemens Wyss wrote: > I have a minimal unit test in which I add three documents to an index. The > documents have two fields "year" and "description". > doc1(year = "2007"

Federated relevance ranking

2011-06-02 Thread Clint Gilbert
Hi everyone, I searched the list archives, but couldn't find a question that closely matches mine. The project I'm working on is designed to allow searching a distributed collection of data repositories. Currently, we index each repository to build

boosting fields

2011-06-02 Thread Clemens Wyss
I have a minimal unit test in which I add three documents to an index. The documents have two fields "year" and "description". doc1(year = "2007", desc = "text with 2007 and 2009") doc2(year = "2009", desc = "text with 2007 and 2009") doc3(year = "2008", desc = "text with 2007 and 2009") To searc

Re: multiple small indexes or one big index?

2011-06-02 Thread Alexander Rosemann
No worries, I'll keep that in mind now. In addition, I am going to switch to another collector as well. At the moment I collect the results and then sort them using the standard Collections.sort approach... I have to look at what Lucene offers and switch to something else. Thanks, Alex On 02.06.2011 15:36, Eri
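
The change described in this message, moving from "collect every hit, then sort the whole list" to keeping only the best hits during collection, can be sketched independently of Lucene. The Python below is an illustration only and does not use or reproduce any Lucene API; `Hit`, `top_n_full_sort`, and `top_n_heap` are hypothetical names introduced for this sketch.

```python
import heapq
from typing import Iterable, List, NamedTuple


class Hit(NamedTuple):
    """Hypothetical search hit: a document id plus its score."""
    doc_id: int
    score: float


def top_n_full_sort(hits: Iterable[Hit], n: int) -> List[Hit]:
    """Collect every hit, sort all of them by score, then keep the first n."""
    return sorted(hits, key=lambda h: h.score, reverse=True)[:n]


def top_n_heap(hits: Iterable[Hit], n: int) -> List[Hit]:
    """Keep only the n best hits while iterating, using a bounded heap."""
    return heapq.nlargest(n, hits, key=lambda h: h.score)
```

Both functions return the same hits in the same order; the heap variant never holds more than n candidates at once, which is the general idea behind a top-N collector.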

Re: multiple small indexes or one big index?

2011-06-02 Thread Erick Erickson
Sounds good, just be sure to keep your (now single) searcher open! Also, be sure to measure queries after a while. The first few queries will fill up caches etc, so the time should improve after the first few. Best Erick On Thu, Jun 2, 2011 at 9:28 AM, Alexander Rosemann wrote: > Hi Erick, cachi
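
The advice to measure only after the first few queries have filled the caches might be applied with a small timing helper like the one below. This is a sketch under assumptions: `time_queries` and its `run_query`, `runs`, and `warmup` parameters are hypothetical, and `run_query` stands in for whatever call actually executes a search.

```python
import time
from typing import Callable, List


def time_queries(run_query: Callable[[], object], runs: int, warmup: int) -> List[float]:
    """Run the query warmup + runs times; return timings (seconds) of the last `runs` only."""
    timings: List[float] = []
    for i in range(warmup + runs):
        start = time.perf_counter()
        run_query()
        elapsed = time.perf_counter() - start
        # Discard the warm-up iterations, which pay for cache filling.
        if i >= warmup:
            timings.append(elapsed)
    return timings
```

Comparing the discarded warm-up timings with the retained ones would show how much of the initial latency is a one-time cost.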

Re: multiple small indexes or one big index?

2011-06-02 Thread Alexander Rosemann
Hi Erick, caching the IndexSearchers didn't take much effort and has already cut search time by 30%! I am busy changing the code to use a single index as you suggested at the moment. Still a few things left to be done, but once I have it working I'll let you know how much faster it is for me. Thanks,
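
The idea of caching searchers rather than reopening one per query can be sketched as a small open-once cache. This is not Lucene code: `SearcherCache` and the `open_searcher` callback are hypothetical names, and the callback stands in for whatever expensive call opens a searcher on an index.

```python
from typing import Callable, Dict, Generic, TypeVar

S = TypeVar("S")


class SearcherCache(Generic[S]):
    """Open each index's searcher once and reuse it for later queries."""

    def __init__(self, open_searcher: Callable[[str], S]) -> None:
        self._open = open_searcher      # expensive: called once per index path
        self._cache: Dict[str, S] = {}

    def get(self, index_path: str) -> S:
        # Open lazily on first use; afterwards return the same instance.
        if index_path not in self._cache:
            self._cache[index_path] = self._open(index_path)
        return self._cache[index_path]
```

A real implementation would also need to decide when to reopen a searcher after the index changes and how to close searchers at shutdown; the sketch leaves both out.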

Re: multiple small indexes or one big index?

2011-06-02 Thread Erick Erickson
At this size, really consider going to a single index. The lack of administrative headaches alone is probably well worth the effort. I almost guarantee that the time you spend re-writing things to keep the searchers open (and finding the bugs!) will be far more than just putting all the data in

Re: multiple small indexes or one big index?

2011-06-02 Thread Alexander Rosemann
Many, many thanks for the input. I have applied the little change of not closing the searchers each time, and search times have already dropped by half! I'll try to merge all indexes into a single one next. I'll let you know how that went. On 02.06.2011 05:28, Shai Erera wrote: All indexes toget