Hi!

I am trying to extend the "mahout lucene.vector" driver so that it can be
fed arbitrary key-value constraints on Solr schema fields and generate
Mahout vectors for only the matching subset of documents, which seems to
be a common use case.
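
To make the goal concrete, here is a minimal plain-Java sketch of the
selection I have in mind. It uses no Lucene or Mahout classes: SubsetSketch,
matches, selectDocIds and doc are hypothetical names, and modeling a document
as a simple field-to-value map is an assumption made only for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; not part of Lucene or Mahout.
final class SubsetSketch {

    // True if the document has every constrained field with exactly the required value.
    static boolean matches(Map<String, String> doc, Map<String, String> constraints) {
        for (Map.Entry<String, String> c : constraints.entrySet()) {
            // A missing field yields null, which never equals the required value.
            if (!c.getValue().equals(doc.get(c.getKey()))) {
                return false;
            }
        }
        return true;
    }

    // Returns the positions (stand-ins for document ids) of all matching documents.
    static List<Integer> selectDocIds(List<Map<String, String>> docs,
                                      Map<String, String> constraints) {
        List<Integer> ids = new ArrayList<>();
        for (int i = 0; i < docs.size(); i++) {
            if (matches(docs.get(i), constraints)) {
                ids.add(i);
            }
        }
        return ids;
    }

    // Small helper: builds a map from alternating key/value arguments.
    static Map<String, String> doc(String... keyValues) {
        Map<String, String> m = new LinkedHashMap<>();
        for (int i = 0; i + 1 < keyValues.length; i += 2) {
            m.put(keyValues[i], keyValues[i + 1]);
        }
        return m;
    }

    public static void main(String[] args) {
        List<Map<String, String>> docs = Arrays.asList(
            doc("lang", "en", "type", "news"),
            doc("lang", "de", "type", "news"),
            doc("lang", "en", "type", "blog"));

        // Only documents 0 and 2 satisfy lang=en.
        System.out.println(selectDocIds(docs, doc("lang", "en"))); // prints [0, 2]
    }
}
```

Only the documents selected this way should end up as vectors; everything
else in the driver would stay unchanged.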

The easiest way I can see is to create an IndexReader implementation that
exposes only that subset.

The problem is that I don't know the correct way to do this.

Subclassing FilterIndexReader might solve the problem, but I don't know
which methods to override to get a consistent object representation.

The driver code includes the following:



    IndexReader reader = IndexReader.open(dir, true);

    Weight weight;
    if ("tf".equalsIgnoreCase(weightType)) {
      weight = new TF();
    } else if ("tfidf".equalsIgnoreCase(weightType)) {
      weight = new TFIDF();
    } else {
      throw new IllegalArgumentException("Weight type " + weightType + " is not supported");
    }

    TermInfo termInfo = new CachedTermInfo(reader, field, minDf, maxDFPercent);
    VectorMapper mapper = new TFDFMapper(reader, weight, termInfo);

    LuceneIterable iterable;

    if (norm == LuceneIterable.NO_NORMALIZING) {
      iterable = new LuceneIterable(reader, idField, field, mapper,
          LuceneIterable.NO_NORMALIZING, maxPercentErrorDocs);
    } else {
      iterable = new LuceneIterable(reader, idField, field, mapper, norm,
          maxPercentErrorDocs);
    }


It then creates a SequenceFile.Writer and writes the contents of the
"iterable" variable to it.



Do you have any thoughts on the simplest way to inject this filtering?

