Hi Erick, the index I am searching is a Lucene index. I am trying to perform some operations over ALL the documents in that index. I could rebuild it as a Solr index and then use the export functionality, but up to now I've been using a Lucene IndexSearcher with a custom collector. Would the code below be correct if I want to continue down the Lucene path?
thank you

public class DocIDCollector extends SimpleCollector {

    private final HashBiMap<Integer, Long> idSet = HashBiMap.create();
    private NumericDocValues ids;
    private int docBase;

    // SimpleCollector in Lucene 5 requires this; we don't need scores.
    @Override
    public boolean needsScores() {
        return false;
    }

    @Override
    public void doSetNextReader(LeafReaderContext context) throws IOException {
        // Remember the segment's offset so doc IDs can be made index-wide.
        docBase = context.docBase;
        ids = DocValues.getNumeric(context.reader(), "id");
    }

    @Override
    public void collect(int doc) throws IOException {
        long wid = ids.get(doc);
        // doc is segment-local; add docBase so keys don't collide across segments.
        idSet.put(docBase + doc, wid);
    }

    public void reset() {
        idSet.clear();
    }

    public HashBiMap<Integer, Long> getWikiIds() {
        return idSet;
    }
}

On Wed, Apr 29, 2015 at 11:32 AM, Erick Erickson <erickerick...@gmail.com> wrote:
> Hmmm, it's not clear to me whether you're using Solr or not, but if
> you are, have you considered using the export functionality? This is
> already built to stream large result sets back to the client. And
> lately (5.1), you can combine that with "streaming aggregation" to do
> some pretty cool stuff.
>
> Not sure it applies in your situation as you didn't state the use-case,
> but thought I'd at least mention it.
>
> Best,
> Erick
>
> On Wed, Apr 29, 2015 at 7:41 AM, Robust Links <pey...@robustlinks.com> wrote:
> > Hi
> >
> > I need help porting my Lucene code from 4 to 5. In particular, I need to
> > customize a collector (to collect all doc IDs in the index - which can be
> > >30MM docs..). Below is how I achieved this in Lucene 4. Are there some
> > guidelines on how to do this in Lucene 5, especially on semantic changes of
> > AtomicReaderContext (which seems deprecated) and the new LeafReaderContext?
> >
> > thank you in advance
> >
> > public class CustomCollector extends Collector {
> >
> >     private HashSet<String> data = new HashSet<String>();
> >     private Scorer scorer;
> >     private int docBase;
> >     private BinaryDocValues dataList;
> >
> >     public boolean acceptsDocsOutOfOrder() {
> >         return true;
> >     }
> >
> >     public void setScorer(Scorer scorer) {
> >         this.scorer = scorer;
> >     }
> >
> >     public void setNextReader(AtomicReaderContext ctx) throws IOException {
> >         this.docBase = ctx.docBase;
> >         dataList = FieldCache.DEFAULT.getTerms(ctx.reader(), "title", false);
> >     }
> >
> >     public void collect(int doc) throws IOException {
> >         BytesRef t = new BytesRef();
> >         dataList.get(doc, t);
> >         if (t.bytes != BytesRef.EMPTY_BYTES) {
> >             data.add(t.utf8ToString());
> >         }
> >     }
> >
> >     public void reset() {
> >         data.clear();
> >         dataList = null;
> >     }
> >
> >     public HashSet<String> getData() {
> >         return data;
> >     }
> > }

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org
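One subtlety worth noting about the collector above: `collect(int doc)` receives segment-local doc IDs, so to use them as keys across the whole index they must be offset by the segment's `docBase` (exposed on `LeafReaderContext`). A minimal, stdlib-only sketch of that bookkeeping (the class name and segment sizes are my own made-up illustration, not part of any Lucene API):

```java
// Illustration only: how segment-local doc IDs map to index-wide ones.
// Lucene itself tracks docBase per segment; this just shows the arithmetic.
public class DocBaseDemo {

    // Map a segment-local doc ID to an index-wide doc ID.
    public static int globalDocId(int docBase, int localDoc) {
        return docBase + localDoc;
    }

    public static void main(String[] args) {
        // Pretend the index has three segments of 100, 50 and 25 docs.
        int[] segmentSizes = {100, 50, 25};
        int docBase = 0;
        for (int size : segmentSizes) {
            // Index-wide ID of the last doc in this segment.
            System.out.println(globalDocId(docBase, size - 1));
            docBase += size; // advance to the next segment's offset
        }
    }
}
```

Without this offset, two segments would both report a local doc 0 and overwrite each other's entries in the `HashBiMap`.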