Hi! I am trying to extend the "mahout lucene.vector" driver so that it can be fed arbitrary key-value constraints on Solr schema fields and generate Mahout vectors for only the matching subset of documents, which seems to be a common use case.
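To make the goal concrete, here is a minimal sketch in plain Java of the behaviour I am after. It is not Lucene code: DocReader, InMemoryReader and SubsetReader are hypothetical stand-ins invented for illustration, and the sketch assumes that every constraint is an exact match on a stored field value. The wrapper simply reports each document that fails a constraint as deleted, so that a consumer which skips deleted documents only sees the subset.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for the small part of a reader that a consumer needs.
// This is NOT Lucene's IndexReader API.
interface DocReader {
    int maxDoc();                             // one greater than the largest doc id
    boolean isDeleted(int docId);             // true if the doc must be skipped
    Map<String, String> document(int docId);  // stored fields of the doc
}

// Simple in-memory reader, used only to make the sketch runnable.
class InMemoryReader implements DocReader {
    private final List<Map<String, String>> docs;

    InMemoryReader(List<Map<String, String>> docs) {
        this.docs = docs;
    }

    public int maxDoc() { return docs.size(); }
    public boolean isDeleted(int docId) { return false; }
    public Map<String, String> document(int docId) { return docs.get(docId); }
}

// Wrapper that hides every document failing the key-value constraints
// by reporting it as deleted. Doc ids are left unchanged.
class SubsetReader implements DocReader {
    private final DocReader in;
    private final Map<String, String> constraints;

    SubsetReader(DocReader in, Map<String, String> constraints) {
        this.in = in;
        this.constraints = constraints;
    }

    public int maxDoc() { return in.maxDoc(); }

    public boolean isDeleted(int docId) {
        if (in.isDeleted(docId)) {
            return true;
        }
        Map<String, String> doc = in.document(docId);
        for (Map.Entry<String, String> c : constraints.entrySet()) {
            // Assumption: a constraint is an exact match on one field value.
            if (!c.getValue().equals(doc.get(c.getKey()))) {
                return true;
            }
        }
        return false;
    }

    public Map<String, String> document(int docId) { return in.document(docId); }

    // Collects the ids a consumer would visit if it skips deleted documents.
    static List<Integer> liveDocIds(DocReader reader) {
        List<Integer> ids = new ArrayList<Integer>();
        for (int i = 0; i < reader.maxDoc(); i++) {
            if (!reader.isDeleted(i)) {
                ids.add(i);
            }
        }
        return ids;
    }
}
```

Hiding documents like this only covers code that walks document ids. Whether it is also enough for code that reads term statistics from the reader is part of what I am unsure about.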
The easiest way I can see is to create an IndexReader implementation that exposes only that subset. The problem is that I don't know the correct way to do this. Subclassing FilterIndexReader might solve it, but I don't know which methods to override to get a consistent view of the index.

The driver code includes the following:

    IndexReader reader = IndexReader.open(dir, true);

    Weight weight;
    if ("tf".equalsIgnoreCase(weightType)) {
      weight = new TF();
    } else if ("tfidf".equalsIgnoreCase(weightType)) {
      weight = new TFIDF();
    } else {
      throw new IllegalArgumentException("Weight type " + weightType + " is not supported");
    }

    TermInfo termInfo = new CachedTermInfo(reader, field, minDf, maxDFPercent);
    VectorMapper mapper = new TFDFMapper(reader, weight, termInfo);

    LuceneIterable iterable;
    if (norm == LuceneIterable.NO_NORMALIZING) {
      iterable = new LuceneIterable(reader, idField, field, mapper, LuceneIterable.NO_NORMALIZING, maxPercentErrorDocs);
    } else {
      iterable = new LuceneIterable(reader, idField, field, mapper, norm, maxPercentErrorDocs);
    }

It then creates a SequenceFile.Writer and writes the contents of the "iterable" variable to it.

Do you have any thoughts on the simplest way to inject this filtering?

--
View this message in context: http://lucene.472066.n3.nabble.com/Creating-an-IndexReader-for-a-subset-from-original-IndexReader-object-tp3663603p3663603.html
Sent from the Lucene - Java Users mailing list archive at Nabble.com.