Thanks, Erick, for your insight. I'd appreciate it if someone could throw more light on it.

Thanks
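The per-reader caching Erick guesses at below can be sketched roughly like this (a simplified illustration of the idea behind CachingWrapperFilter, not the actual Lucene source):

import java.io.IOException;
import java.util.Map;
import java.util.WeakHashMap;

import org.apache.lucene.index.IndexReader;
import org.apache.lucene.search.DocIdSet;
import org.apache.lucene.search.Filter;

// Sketch: cache one DocIdSet per IndexReader. Because the cache key is
// the reader (not the MultiSearcher), the cached bits stay valid no
// matter how the same readers are recombined for a given search.
public class PerReaderCachingFilter extends Filter {
    private final Filter wrapped;
    private final Map<IndexReader, DocIdSet> cache =
        new WeakHashMap<IndexReader, DocIdSet>();

    public PerReaderCachingFilter(Filter wrapped) {
        this.wrapped = wrapped;
    }

    public synchronized DocIdSet getDocIdSet(IndexReader reader) throws IOException {
        DocIdSet cached = cache.get(reader);
        if (cached == null) {
            cached = wrapped.getDocIdSet(reader); // doc IDs here are reader-local
            cache.put(reader, cached);
        }
        return cached;
    }
}

Since the doc IDs inside each cached DocIdSet are local to one reader, reshuffling the readers into different (Parallel)MultiSearcher combinations never reads the wrong bits.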
On Tue, Nov 9, 2010 at 11:27 PM, Erick Erickson <erickerick...@gmail.com> wrote:

I'm going to have to leave answering that to people with more familiarity with the underlying code than I have.

That said, I'd #guess# that you'll be OK, because I'd #guess# that filters are maintained on a per-reader basis and the results are synthesized when combined in a MultiSearcher.

But that's all a guess.

Best
Erick

On Tue, Nov 9, 2010 at 2:48 AM, Samarendra Pratap <samarz...@gmail.com> wrote:

Thanks Erick, you cleared up some of my confusion, but I still have a doubt.

As you can see in the previous example code, I am re-creating the ParallelMultiSearcher for each search (this is the actual scenario on our production servers). The ParallelMultiSearcher constructor takes a different combination of searchers each time, which means the same document may be assigned a different doc ID in the next search.

So my primary question is: will the cached results from a filter created with one multi-searcher work correctly with another multi-searcher? (The underlying IndexSearchers are opened only once; it is the combination of IndexSearchers that varies from search to search.)

I have tested this with my real code and sample indexes, and the results appear to be correct, but given the confusion above I am not able to understand why.

Out of curiosity, can you also suggest which option would be more efficient: 1. MultiSearchers (either recreated for each search or reused from a cache) over different combinations of searchers, or 2. a single index covering all last-update-date ranges, using filters for the different combinations of last update dates? As I wrote in my previous mail, we have different physical indexes based on different ranges of update dates, and we select the appropriate indexes based on the options the user chooses.

On Tue, Nov 9, 2010 at 4:25 AM, Erick Erickson <erickerick...@gmail.com> wrote:

Ignore my previous mail; I thought you were constructing your own filters. What you're doing should be OK.

Here's the source of my confusion. Each of your indexes has Lucene document IDs starting at 0, and in your example you have two docs per index. So, if you created a Filter via lower-level calls, it could not be applied across different indexes. See the discussion here: http://www.gossamer-threads.com/lists/lucene/java-user/106376. That is, the bit in your Filter for index0, doc0 would be the same bit as for index1, doc0.

But that's not what you are doing. The (Parallel)MultiSearcher takes care of mapping these doc IDs appropriately, so you don't have to worry about what I was thinking about. Here's a program that illustrates this. It creates three RAMDirectories, then dumps the Lucene doc ID from each. Then it creates a MultiSearcher from the same three dirs, walks that, and dumps the Lucene doc IDs. You'll see that the doc IDs change even though the contents are the same.

Again, though, this isn't a problem, because you are using a MultiSearcher, which takes care of this for you.

Which is yet another reason to never, never, never count on Lucene doc IDs outside their context!

Output at the end.
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.search.*;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.RAMDirectory;
import org.apache.lucene.util.Version;

import java.io.IOException;

import static org.apache.lucene.index.IndexWriter.*;

public class EoeTest {
    public static void main(String[] args) {
        EoeTest eoe = new EoeTest();
        eoe.doIt();
    }

    private void doIt() {
        try {
            populateIndexes();
            searchAndSpit();
            tryMulti();
        } catch (Exception e) {
            e.printStackTrace();
        }
    }

    private Searcher getMulti() throws IOException {
        IndexSearcher[] searchers = new IndexSearcher[3];
        searchers[0] = new IndexSearcher(_ram1, true);
        searchers[1] = new IndexSearcher(_ram2, true);
        searchers[2] = new IndexSearcher(_ram3, true);
        return new MultiSearcher(searchers);
    }

    private void tryMulti() throws IOException {
        searchOne("multi", getMulti());
    }

    private void searchAndSpit() throws IOException {
        searchOne("ram1", new IndexSearcher(_ram1, true));
        searchOne("ram2", new IndexSearcher(_ram2, true));
        searchOne("ram3", new IndexSearcher(_ram3, true));
    }

    private void searchOne(String which, Searcher is) throws IOException {
        log("dumping " + which);
        TopDocs hits = is.search(new MatchAllDocsQuery(), 100);
        for (int idx = 0; idx < hits.scoreDocs.length; ++idx) {
            ScoreDoc sd = hits.scoreDocs[idx];
            Document doc = is.doc(sd.doc);
            log(String.format("lid: %d, content: %s", sd.doc, doc.get("content")));
        }
        is.close();
    }

    private void log(String msg) {
        System.out.println(msg);
    }

    private void populateIndexes() throws IOException {
        popOne(_ram1);
        popOne(_ram2);
        popOne(_ram3);
    }

    private void popOne(Directory dir) throws IOException {
        IndexWriter iw = new IndexWriter(dir, _std, MaxFieldLength.LIMITED);
        Document doc = new Document();
        doc.add(new Field("content", "common " + Double.toString(Math.random()),
                Field.Store.YES, Field.Index.ANALYZED, Field.TermVector.YES));
        iw.addDocument(doc);

        doc = new Document();
        doc.add(new Field("content", "common " + Double.toString(Math.random()),
                Field.Store.YES, Field.Index.ANALYZED, Field.TermVector.YES));
        iw.addDocument(doc);

        iw.close();
    }

    Directory _ram1 = new RAMDirectory();
    Directory _ram2 = new RAMDirectory();
    Directory _ram3 = new RAMDirectory();
    Analyzer _std = new StandardAnalyzer(Version.LUCENE_29);
}

************************************output****************
where lid: ### is the Lucene doc ID returned in scoreDocs
***********************************************************

dumping ram1
lid: 0, content: common 0.11100571422470962
lid: 1, content: common 0.31555863707233567
dumping ram2
lid: 0, content: common 0.01235509997022377
lid: 1, content: common 0.7017712652104814
dumping ram3
lid: 0, content: common 0.9472403989314128
lid: 1, content: common 0.7105628402082196
dumping multi
lid: 0, content: common 0.11100571422470962
lid: 1, content: common 0.31555863707233567
lid: 2, content: common 0.01235509997022377
lid: 3, content: common 0.7017712652104814
lid: 4, content: common 0.9472403989314128
lid: 5, content: common 0.7105628402082196
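As an aside: if a global doc ID from the "multi" output ever needs to be mapped back to its sub-index, MultiSearcher provides subSearcher() and subDoc() for exactly that. A small sketch, written as a hypothetical extra method for the program above (not part of the original):

    private void tryMapping() throws IOException {
        MultiSearcher ms = (MultiSearcher) getMulti();
        TopDocs hits = ms.search(new MatchAllDocsQuery(), 100);
        for (ScoreDoc sd : hits.scoreDocs) {
            int which = ms.subSearcher(sd.doc); // which sub-searcher holds this doc
            int local = ms.subDoc(sd.doc);      // reader-local doc ID within that sub-index
            log(String.format("global: %d -> searcher %d, local doc %d",
                    sd.doc, which, local));
        }
        ms.close();
    }

For the output above, for example, global lid 3 would map to searcher 1, local doc 1.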
On Mon, Nov 8, 2010 at 3:33 AM, Samarendra Pratap <samarz...@gmail.com> wrote:

Hi Erick, thanks for the reply.

Your answer has puzzled me more, because what I actually see is not what you describe, or else I am not grasping your meaning. I have written a small program that captures exactly what my original question was. Here I create a CachingWrapperFilter on one index and reuse it on other indexes, and this single filter gives me the expected results from each of the indexes. I would appreciate it if you could throw some light on this.

I have given the output after the program.

////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////
// the following program compiles with Java 6
import org.apache.lucene.index.*;
import org.apache.lucene.analysis.*;
import org.apache.lucene.analysis.standard.*;
import org.apache.lucene.search.*;
import org.apache.lucene.search.spans.*;
import org.apache.lucene.store.*;
import org.apache.lucene.document.*;
import org.apache.lucene.queryParser.*;
import org.apache.lucene.util.*;

import java.util.*;

public class FilterTest
{
  protected Directory[] dirs;
  protected Analyzer a;
  protected Searcher[] searchers;
  protected QueryParser qp;
  protected Hashtable<String, Filter> filters;

  public FilterTest()
  {
    // create analyzer
    a = new StandardAnalyzer(Version.LUCENE_29);
    // create query parser
    qp = new QueryParser(Version.LUCENE_29, "content", a);
    // initialize "filters" Hashtable
    filters = new Hashtable<String, Filter>();
  }

  protected void createDirectories(int length)
  {
    // create the specified number of RAM directories
    dirs = new Directory[length];
    for (int i = 0; i < length; i++)
      dirs[i] = new RAMDirectory();
  }

  protected void createIndexes() throws Exception
  {
    /* create an index in each directory.
       each index contains two documents.
       every document contains one term unique across all indexes, one term
       unique within its own index, and one term common to all indexes */
    for (int i = 0; i < dirs.length; i++)
    {
      IndexWriter iw = new IndexWriter(dirs[i], a, true, IndexWriter.MaxFieldLength.LIMITED);

      Document d = new Document();
      // unique id across all indexes
      d.add(new Field("id", "" + (i * 2 + 1), Field.Store.YES, Field.Index.NOT_ANALYZED, Field.TermVector.YES));
      // unique id within a single index
      d.add(new Field("docnumber", "1", Field.Store.YES, Field.Index.NOT_ANALYZED, Field.TermVector.YES));
      // common word in all indexes
      d.add(new Field("content", "common", Field.Store.YES, Field.Index.ANALYZED, Field.TermVector.YES));
      iw.addDocument(d);

      d = new Document();
      // unique id across all indexes
      d.add(new Field("id", "" + (i * 2 + 2), Field.Store.YES, Field.Index.NOT_ANALYZED, Field.TermVector.YES));
      // unique id within a single index
      d.add(new Field("docnumber", "2", Field.Store.YES, Field.Index.NOT_ANALYZED, Field.TermVector.YES));
      // common word in all indexes
      d.add(new Field("content", "common", Field.Store.YES, Field.Index.ANALYZED, Field.TermVector.YES));
      iw.addDocument(d);

      iw.close();
    }
  }

  protected void openSearchers() throws Exception
  {
    // open a searcher for every directory and save them in an array
    searchers = new Searcher[dirs.length];
    for (int i = 0; i < dirs.length; i++)
      searchers[i] = new IndexSearcher(IndexReader.open(dirs[i], true));
  }

  protected Searcher getSearcher(int[] arr) throws Exception
  {
    // returns a ParallelMultiSearcher over the searchers at the
    // index positions given in the argument
    Searcher[] s = new Searcher[arr.length];
    for (int i = 0; i < arr.length; i++)
      s[i] = this.searchers[arr[i]];

    return new ParallelMultiSearcher(s);
  }

  protected ScoreDoc[] search(String query, String filter, Searcher s) throws Exception
  {
    Filter f = null;
    if (filter != null)
    {
      if (filters.containsKey(filter))
      {
        System.out.println("Reusing filter for - " + filter);
        f = filters.get(filter);
      }
      else
      {
        System.out.println("Creating new filter for - " + filter);
        f = new CachingWrapperFilter(new QueryWrapperFilter(qp.parse(filter)));
        filters.put(filter, f);
      }
    }
    System.out.println("Query:(" + query + "), Filter:(" + filter + ")");
    return s.search(qp.parse(query), f, 1000).scoreDocs;
  }

  public static void main(String[] args) throws Exception
  {
    FilterTest ft = new FilterTest();
    ft.startTest();
  }

  public void startTest()
  {
    try
    {
      createDirectories(3);
      createIndexes();
      openSearchers();
      Searcher s;
      ScoreDoc[] sd;

      System.out.println("===================================");
      System.out.println("Fields of all the documents");
      // create a searcher over all indexes
      s = getSearcher(new int[]{0, 1, 2});
      // list all documents and their ids
      sd = search("+content:common", null, s);
      for (int i = 0; i < sd.length; i++)
      {
        System.out.println("\tid:" + s.doc(sd[i].doc).get("id") + ", docnumber:" + s.doc(sd[i].doc).get("docnumber"));
      }
      System.out.println("\n\n");

      System.out.println("===================================");
      System.out.println("Searching for documents in a single index. Filter will be created and cached");
      s = getSearcher(new int[]{0});
      sd = search("+content:common", "docnumber:1", s);
      System.out.println("Hits:" + sd.length);
      for (int i = 0; i < sd.length; i++)
      {
        System.out.println("\tid:" + s.doc(sd[i].doc).get("id") + ", docnumber:" + s.doc(sd[i].doc).get("docnumber"));
      }
      System.out.println("\n\n");

      System.out.println("===================================");
      System.out.println("Searching for documents in indexes other than the previous search. Query and filter will be the same. Filter will be reused");
      s = getSearcher(new int[]{1, 2});
      sd = search("+content:common", "docnumber:1", s);
      System.out.println("Hits:" + sd.length);
      for (int i = 0; i < sd.length; i++)
      {
        System.out.println("\tid:" + s.doc(sd[i].doc).get("id") + ", docnumber:" + s.doc(sd[i].doc).get("docnumber"));
      }
    }
    catch (Exception e)
    {
      e.printStackTrace();
    }
  }
}

////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////
OUTPUT:

[sa...@myserver java]$ java FilterTest
===================================
Fields of all the documents
Query:(+content:common), Filter:(null)
	id:1, docnumber:1
	id:2, docnumber:2
	id:3, docnumber:1
	id:4, docnumber:2
	id:5, docnumber:1
	id:6, docnumber:2

===================================
Searching for documents in a single index. Filter will be created and cached
Creating new filter for - docnumber:1
Query:(+content:common), Filter:(docnumber:1)
Hits:1
	id:1, docnumber:1

===================================
Searching for documents in indexes other than the previous search. Query and filter will be the same. Filter will be reused
Reusing filter for - docnumber:1
Query:(+content:common), Filter:(docnumber:1)
Hits:2
	id:3, docnumber:1
	id:5, docnumber:1
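Relating back to the efficiency question above, option 2 (a single index plus cached date filters) could be sketched in the style of FilterTest as the hypothetical method below. The "lastupdate" field and its yyyyMMdd string format are illustrative assumptions, not part of the program above:

  protected ScoreDoc[] searchByLastUpdate(String query, String from, String to, Searcher s) throws Exception
  {
    // cache one filter per date range, keyed by "from-to";
    // "lastupdate" is a hypothetical field holding dates as yyyyMMdd strings
    String key = from + "-" + to;
    Filter f = filters.get(key);
    if (f == null)
    {
      f = new CachingWrapperFilter(new TermRangeFilter("lastupdate", from, to, true, true));
      filters.put(key, f);
    }
    return s.search(qp.parse(query), f, 1000).scoreDocs;
  }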
On Wed, Nov 3, 2010 at 7:04 PM, Erick Erickson <erickerick...@gmail.com> wrote:

I'm assuming you're down in Lucene land. Unless somehow you've gotten 63 separate filters when you think you only have one, I don't think what you're doing will work. Or I'm failing to understand what you're doing at all.

The problem is that I expect each of your indexes starts at document 0, so your Filter is really a bit set keyed by Lucene document ID. So applying filter 2 to index 54 will NOT do what you want. What I suspect you're seeing is that applying your filter produces enough results from index 54 (to continue my example) to fool you into thinking it's working.

Try running the query with and without the filter on each of your indexes, perhaps as a control including a restrictive clause in the query that does the same thing your filter is doing. Or construct the filter anew for comparison. If the numbers continue to be the same, I clearly don't understand something! <G>

Best
Erick
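A minimal sketch of that control test, expressed against the FilterTest program above (it reuses that program's qp, filters, and getSearcher(); the comparison itself is an illustration, not from the original thread):

      // Control: the same criterion folded into the query as a clause
      // should yield the same hit count as the cached filter.
      Searcher s = getSearcher(new int[]{1, 2});
      Filter cached = filters.get("docnumber:1"); // filter cached by search() above
      int withFilter = s.search(qp.parse("+content:common"), cached, 1000).totalHits;
      int withClause = s.search(qp.parse("+content:common +docnumber:1"), 1000).totalHits;
      System.out.println(withFilter == withClause ? "filter looks consistent" : "filter mismatch!");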
On Wed, Nov 3, 2010 at 6:05 AM, Samarendra Pratap <samarz...@gmail.com> wrote:

Hi. We have a large index (~28 GB) which is distributed across three directories, each representing a country. Each of these country-wise indexes is further split, on the basis of last update date, into 21 smaller indexes. The index is updated once a day.

A user can search in any one country and can choose a last update date plus some other criteria.

When the server application starts, index readers, and hence searchers, are created for each of the small indexes (21 x 3) and put in an array. Depending on the options (country and last update date) chosen by the user, we pick the searchers for the matching date range/country and create a new ParallelMultiSearcher instance.

Now my question is: can I use a single (caching) filter instance for every search, possibly on different searchers?

===================================================================================

E.g., for the first search I create a filter for experience = 4 years and save it.

If another search for a different country (and hence a different index) also has the same experience criterion, i.e. 4 years, can I use the same filter instance for the second search too?

I have tested this a little and, surprisingly, I get correct results. I was wondering if this is the correct approach, or do I need to create a different filter for each searcher (or index reader) instance?

Thanks in advance.

--
Regards,
Samar

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org