Alright. With all the changes you suggested I am down from 9s to <1s.
Again, many thanks to both of you Erick and Shai!
Regards,
Alex
On 02.06.2011 15:48, Alexander Rosemann wrote:
No worries, I'll keep that in mind now.
In addition I am going to switch to another collector as well. ATM I
collect the results and then sort them using the std. Collections.sort
approach … I have to look at what Lucene offers and switch to something else.
My gut feel is there isn't really a good solution to intermingling the
results, since they come from different sources, index different kinds of
data, etc. The irreducible problem is that a hit in one index is not
comparable to a hit in another, either from a Lucene scoring perspective
or from the user's …
Thank you very much for your reply. Yeah, our indexes (indices?)
contain different types and amounts of data. :( The data being indexed
is all the same format - RDF - but it describes different numbers and
kinds of things.
What is your gut feeling on …
As you've found out, raw scores certainly aren't comparable across
different indexes #unless# the documents are fairly distributed. You're
talking large indexes here, so if the documents are balanced across all
your indexes, the results should be pretty comparable. This pre-supposes
that the indexes …
Have you tried using the explain method on a Searcher and examining the results?
Best
Erick
On Thu, Jun 2, 2011 at 3:51 PM, Clemens Wyss wrote:
> I have a minimal unit test in which I add three documents to an index. The
> documents have two fields "year" and "description".
> doc1(year = "2007", desc = "text with 2007 and 2009") …
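For anyone who wants to try that suggestion, here is a minimal sketch of the
explain() call, assuming a Lucene 3.x-style API; the index path and the
"desc" field are purely illustrative:

    import java.io.File;
    import org.apache.lucene.index.IndexReader;
    import org.apache.lucene.index.Term;
    import org.apache.lucene.search.Explanation;
    import org.apache.lucene.search.IndexSearcher;
    import org.apache.lucene.search.Query;
    import org.apache.lucene.search.ScoreDoc;
    import org.apache.lucene.search.TermQuery;
    import org.apache.lucene.search.TopDocs;
    import org.apache.lucene.store.Directory;
    import org.apache.lucene.store.FSDirectory;

    public class ExplainDemo {
        public static void main(String[] args) throws Exception {
            // Illustrative path; point this at one of your indexes.
            Directory dir = FSDirectory.open(new File("/path/to/index"));
            IndexSearcher searcher = new IndexSearcher(IndexReader.open(dir));
            Query query = new TermQuery(new Term("desc", "2007"));
            TopDocs hits = searcher.search(query, 10);
            for (ScoreDoc hit : hits.scoreDocs) {
                // Explanation breaks the score down into tf, idf, norms, boosts, etc.
                Explanation expl = searcher.explain(query, hit.doc);
                System.out.println(expl.toString());
            }
            searcher.close();
        }
    }

Comparing the explain output from two different indexes makes it easy to see
which scoring factors (document frequencies, norms) differ between them.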
Hi everyone,
I searched the list archives, but couldn't find a question that closely
matches mine.
The project I'm working on is designed to allow searching a distributed
collection of data repositories. Currently, we index each repository to
build …
I have a minimal unit test in which I add three documents to an index. The
documents have two fields "year" and "description".
doc1(year = "2007", desc = "text with 2007 and 2009")
doc2(year = "2009", desc = "text with 2007 and 2009")
doc3(year = "2008", desc = "text with 2007 and 2009")
To search …
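For context, roughly what that setup looks like in code. This is only a
sketch assuming the Lucene 3.x API; the analyzer and field options are my
own guesses, not necessarily what the original test uses:

    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.document.Document;
    import org.apache.lucene.document.Field;
    import org.apache.lucene.index.IndexWriter;
    import org.apache.lucene.index.IndexWriterConfig;
    import org.apache.lucene.store.Directory;
    import org.apache.lucene.store.RAMDirectory;
    import org.apache.lucene.util.Version;

    public class ThreeDocIndex {
        static void addDoc(IndexWriter w, String year, String desc) throws Exception {
            Document doc = new Document();
            // "year" kept as a single un-analyzed token so it can be sorted/filtered on.
            doc.add(new Field("year", year, Field.Store.YES, Field.Index.NOT_ANALYZED));
            // "desc" tokenized for full-text search.
            doc.add(new Field("desc", desc, Field.Store.YES, Field.Index.ANALYZED));
            w.addDocument(doc);
        }

        public static void main(String[] args) throws Exception {
            Directory dir = new RAMDirectory();
            IndexWriterConfig cfg = new IndexWriterConfig(
                Version.LUCENE_31, new StandardAnalyzer(Version.LUCENE_31));
            IndexWriter writer = new IndexWriter(dir, cfg);
            addDoc(writer, "2007", "text with 2007 and 2009");
            addDoc(writer, "2009", "text with 2007 and 2009");
            addDoc(writer, "2008", "text with 2007 and 2009");
            writer.close();
        }
    }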
No worries, I'll keep that in mind now.
In addition I am going to switch to another collector as well. ATM I
collect the results and then sort them using the std. Collections.sort
approach... I have to look at what Lucene offers and switch to something else.
Thanks,
Alex
On 02.06.2011 15:36, Erick wrote:
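On the collector/sort point: Lucene can sort hits while collecting them
instead of sorting afterwards with Collections.sort. A small sketch,
assuming a Lucene 3.x API and a not-analyzed "year" field; the query and
field names are illustrative:

    import org.apache.lucene.index.Term;
    import org.apache.lucene.search.IndexSearcher;
    import org.apache.lucene.search.Query;
    import org.apache.lucene.search.Sort;
    import org.apache.lucene.search.SortField;
    import org.apache.lucene.search.TermQuery;
    import org.apache.lucene.search.TopDocs;

    public class SortedSearch {
        // Returns the top hits already ordered by "year", descending,
        // so no post-hoc Collections.sort pass is needed.
        static TopDocs searchByYear(IndexSearcher searcher) throws Exception {
            Query query = new TermQuery(new Term("desc", "2007"));
            Sort byYear = new Sort(new SortField("year", SortField.STRING, true));
            // null filter; 10 = number of top hits to keep
            return searcher.search(query, null, 10, byYear);
        }
    }

Letting the searcher sort during collection only keeps the top N hits in a
priority queue, rather than materializing and sorting every result.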
Sounds good, just be sure to keep your (now single) searcher open! Also,
be sure to measure queries after a while. The first few queries will fill up
caches etc, so the time should improve after the first few.
Best
Erick
On Thu, Jun 2, 2011 at 9:28 AM, Alexander Rosemann wrote:
> Hi Erick, caching the IndexSearchers …
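One minimal way to keep a single searcher open across requests, as a sketch
only: the SharedSearcher class and its path are invented for illustration,
and later Lucene versions also ship a SearcherManager utility for this. The
key idea is simply to open the searcher once and reuse it.

    import java.io.File;
    import org.apache.lucene.index.IndexReader;
    import org.apache.lucene.search.IndexSearcher;
    import org.apache.lucene.store.Directory;
    import org.apache.lucene.store.FSDirectory;

    // Hypothetical holder: hand the same IndexSearcher instance to every
    // query instead of opening/closing one per request.
    public final class SharedSearcher {
        private static IndexSearcher searcher;

        public static synchronized IndexSearcher get() throws Exception {
            if (searcher == null) {
                // Illustrative path to the (now single) index.
                Directory dir = FSDirectory.open(new File("/path/to/single/index"));
                searcher = new IndexSearcher(IndexReader.open(dir));
            }
            return searcher;
        }

        // Call only on shutdown, or after swapping in a reopened reader
        // when the index has changed.
        public static synchronized void close() throws Exception {
            if (searcher != null) {
                searcher.close();
                searcher = null;
            }
        }
    }

IndexSearcher is thread-safe for concurrent searches, so sharing one
instance this way is safe and avoids re-warming caches on every query.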
Hi Erick, caching the IndexSearchers didn't take too much effort and has
already cut search times by 30%!
I am busy changing the code to use a single index as you suggested atm.
There are still a few things left to do, but once I have it working I'll
let you know how much faster it is for me.
Thanks,
At this size, really consider going to a single index. The lack of
administrative headaches alone is probably well worth the effort.
I almost guarantee that the time you spend re-writing things to keep
the searchers open (and finding the bugs!) will be far more than just
putting all the data into a single index.
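For the single-index route, the existing per-repository indexes can also be
merged rather than re-indexed from scratch, e.g. with IndexWriter.addIndexes.
A rough sketch, assuming Lucene 3.1+ and made-up directory paths:

    import java.io.File;
    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.index.IndexWriter;
    import org.apache.lucene.index.IndexWriterConfig;
    import org.apache.lucene.store.Directory;
    import org.apache.lucene.store.FSDirectory;
    import org.apache.lucene.util.Version;

    public class MergeIndexes {
        public static void main(String[] args) throws Exception {
            // Target index that will hold everything (path is illustrative).
            Directory target = FSDirectory.open(new File("/path/to/merged-index"));
            IndexWriterConfig cfg = new IndexWriterConfig(
                Version.LUCENE_31, new StandardAnalyzer(Version.LUCENE_31));
            IndexWriter writer = new IndexWriter(target, cfg);

            // Existing source indexes (paths are illustrative).
            Directory[] sources = {
                FSDirectory.open(new File("/path/to/index1")),
                FSDirectory.open(new File("/path/to/index2")),
            };
            writer.addIndexes(sources); // copies the segments of each source index
            writer.close();
        }
    }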
Many, many thanks for the input. I have applied the small change of not
closing the searchers each time, and search times have already dropped by half!
I'll try to merge all indexes into a single one next. I'll let you know
how that goes.
On 02.06.2011 05:28, Shai Erera wrote:
All indexes together …