RE: How to properly correlate relevance in a search across multiple collections

2014-09-08 Thread atawfik
Hi David, it seems that MultiSearcher is deprecated in favor of MultiReader. Have a look here. Regarding the meta search approach, you can normalize raw scores of documents. There are many ways to do that. Just search for "normalization scor
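
As one hedged illustration of the score normalization mentioned in this message, min-max scaling maps each collection's raw scores onto [0, 1] before the result lists are merged. The sketch below is plain Java, not a Lucene API; the class and method names (ScoreNormalizer, minMax) are hypothetical, and mapping a list of identical scores to 1.0 is an assumption made only for this example.

```java
// Hypothetical helper, not part of Lucene: min-max normalization of raw scores.
class ScoreNormalizer {

    // Returns a new array in which each score is rescaled to [0, 1]
    // relative to the smallest and largest score in the input.
    static float[] minMax(float[] scores) {
        float[] normalized = new float[scores.length];
        if (scores.length == 0) {
            return normalized; // nothing to normalize
        }

        // Find the range of the raw scores.
        float min = scores[0];
        float max = scores[0];
        for (float s : scores) {
            if (s < min) min = s;
            if (s > max) max = s;
        }

        float range = max - min;
        for (int i = 0; i < scores.length; i++) {
            // Assumption: when all scores are equal, treat each as 1.0
            // to avoid dividing by zero.
            normalized[i] = (range == 0.0f) ? 1.0f : (scores[i] - min) / range;
        }
        return normalized;
    }
}
```

For example, ScoreNormalizer.minMax(new float[]{2.0f, 4.0f, 6.0f}) yields 0.0, 0.5 and 1.0. Scaling of this kind only rescales the ranking inside one result list; it does not by itself make scores from different collections equivalent.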

Re: BlockTreeTermsReader consumes crazy amount of memory

2014-09-08 Thread Vitaly Funstein
I think I see the bug here, but maybe I'm wrong. Here's my theory: Suppose no segments at a particular commit point contain any deletes. Now, we also hold open an NRT reader into the index, which may end up with some deletes, after the commit occurred. Then, according to the following conditional

KeywordAnalyzer still getting tokenized on spaces

2014-09-08 Thread Milind
I thought I could use the KeywordTokenizer to prevent tokenizing on spaces, so I can treat some fields as a single term. But it's still tokenizing on spaces. In the code below, I'm storing a document with a serial number containing spaces. I want to treat it as a single term without having end u

RE: How to properly correlate relevance in a search across multiple collections

2014-09-08 Thread Baldwin, David
I am looking at the MultiSearcher, which seems to have been around for a while (at least since 3.0.3) and I am wondering if that will do what I want. I just looked at Lucene again and it states that it searches multiple indexes with merged results. I also see a lot of similar comments about sc

Re: Question regarding complex queries and long tail suggestions

2014-09-08 Thread Mirko Sertic
Hi all, thanks for the links. In the meantime I used a SpanQuery with some custom syntax highlighting to gather search suggestions. This works pretty well, except for some performance issues. I use IndexReader.getTermVector to get the terms for a single document to construct my suggestion spans a

RE: How to properly correlate relevance in a search across multiple collections

2014-09-08 Thread Baldwin, David
After my last question, I am now intrigued by the alternative suggested. Defining a 'Super-Corpus' (Collection). We are using Stock Lucene (not Solr or anything else). Is there a known method already to integrate the DF for multiple collections allowing such a cross-collection DF? I thin

Re: BlockTreeTermsReader consumes crazy amount of memory

2014-09-08 Thread Vitaly Funstein
UPDATE: After making the changes we discussed to enable sharing of SegmentReaders between the NRT reader and a commit point reader, specifically calling through to DirectoryReader.openIfChanged(DirectoryReader, IndexCommit), I am seeing this exception, sporadically: Caused by: java.lang.NullPoint

Re: How to properly correlate relevance in a search across multiple collections

2014-09-08 Thread Erick Erickson
I think the point got lost in the discussion. Raw scores are simply _not_ comparable from different collections. They aren't even comparable for different queries in the _same_ collection. They are _only_ relevant for ranking in the same collection within a single query. And even then raw scores d

RE: How to properly correlate relevance in a search across multiple collections

2014-09-08 Thread Baldwin, David
Would it be possible, or does anyone have any experience, in using the raw score from each separate collection to order and then after a merge come up with relevancy? -Original Message- From: atawfik [mailto:contact.txl...@gmail.com] Sent: Sunday, September 07, 2014 9:50 AM To: java-use

RE: IOExceptions during search

2014-09-08 Thread Uwe Schindler
Hi, NFS is not a filesystem that works reliably and correctly with Lucene. Please consider using another file system, preferably on local disks or (if it needs to be networked) using iSCSI or similar. MMap easily crashes the whole JVM if NFS connections drop. NIOFSDir works better, but the whol

IOExceptions during search

2014-09-08 Thread Shlomit Rosen
Hello :) We have a customer whose system keeps crashing at certain queries. By default we use MMapDirectory for search as it usually gives us the best performance. Although the heap usage seemed normal, we asked them to switch to NIOFSDirectory to make sure it's not a memory issue... The sys