Hi, I am trying to understand PyLucene better and to see whether it is faster to retrieve result ids in Java than in Python. The use case is retrieving millions of recids: in Python, fetching 700K ids takes about 1.5s (even though the query itself takes just a fraction of that).
I wrote a simple Java class (it works in Java) which returns an array of ints. I have wrapped it with JCC and it is visible from inside Python, but calling the static method throws InvalidArgsError (an example Python session is below).

JCC is version 2.4, built in shared mode. DumpUtils is in a different package than lucene (i.e. not inside the lucene jars).

Can this problem be similar to passing jcc-wrapped objects between different jcc packages?
http://search-lucene.com/m/SPgeW1hDtAw1

The Java class is very simple:

    import org.apache.lucene.search.ScoreDoc;
    import org.apache.lucene.search.TopDocs;

    public class DumpUtils {
        public static int[] GetDocIds(TopDocs topdocs) {
            ScoreDoc[] hits = topdocs.scoreDocs;
            // scoreDocs holds only the hits that were actually collected,
            // which can be fewer than totalHits, so size by hits.length
            int[] out = new int[hits.length];
            for (int i = 0; i < hits.length; i++) {
                out[i] = hits[i].doc;
            }
            return out;
        }
    }

Thanks for any help/pointers,

roman

Here is an example Python session:

    In [1]: import pyjama

    In [2]: pyjama.initVM(pyjama.CLASSPATH)
    Out[2]: <jcc.JCCEnv object at 0x00C0E1F0>

    In [3]: import lucene as lu

    In [4]: pyjama.DumpUtils
    Out[4]: <type 'DumpUtils'>

    In [5]: pyjama.DumpUtils.GetDocIds
    Out[5]: <built-in method GetDocIds of type object at 0x0189E780>

    In [6]:

    In [7]: import newseman.pyjamic.slucene.searcher as se

    In [8]: s = se.Searcher();s.open('/tmp/whisper/')

    In [9]: hits = s._search(s._query('key:bo*',None), 50)

    In [10]: hits
    Out[10]: <TopDocs: org.apache.lucene.search.topd...@480457>

    In [11]:

    In [12]: pyjama.DumpUtils.GetDocIds(hits)
    ---------------------------------------------------------------------------
    InvalidArgsError                          Traceback (most recent call last)

    InvalidArgsError: (<type 'DumpUtils'>, 'GetDocIds', <TopDocs: org.apache.lucene.search.topd...@480457>)
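
A side note on the loop bound in GetDocIds, separate from the InvalidArgsError: totalHits counts every matching document, while scoreDocs contains only the hits that were collected (at most 50 in the session above), so the copy should be bounded by the length of scoreDocs. The following is only a minimal sketch of that point. FakeTopDocs and FakeScoreDoc are hypothetical stand-ins invented here so the example runs without Lucene on the classpath; they are not Lucene classes and mirror only the two fields the helper reads.

```java
import java.util.Arrays;

public class DocIdSketch {

    // Hypothetical stand-in for a single hit; only the doc id is modelled.
    static final class FakeScoreDoc {
        final int doc;
        FakeScoreDoc(int doc) { this.doc = doc; }
    }

    // Hypothetical stand-in for a result page: the total match count plus
    // the (possibly much shorter) array of collected hits.
    static final class FakeTopDocs {
        final int totalHits;
        final FakeScoreDoc[] scoreDocs;
        FakeTopDocs(int totalHits, FakeScoreDoc[] scoreDocs) {
            this.totalHits = totalHits;
            this.scoreDocs = scoreDocs;
        }
    }

    // Copies the doc ids of the collected hits. The bound is
    // scoreDocs.length, not totalHits, so a capped result page
    // cannot cause an ArrayIndexOutOfBoundsException.
    static int[] getDocIds(FakeTopDocs topDocs) {
        FakeScoreDoc[] hits = topDocs.scoreDocs;
        int[] out = new int[hits.length];
        for (int i = 0; i < hits.length; i++) {
            out[i] = hits[i].doc;
        }
        return out;
    }

    public static void main(String[] args) {
        // 700000 documents match, but only three hits were collected.
        FakeScoreDoc[] page = {
            new FakeScoreDoc(3), new FakeScoreDoc(7), new FakeScoreDoc(42)
        };
        FakeTopDocs topDocs = new FakeTopDocs(700000, page);
        System.out.println(Arrays.toString(getDocIds(topDocs))); // prints [3, 7, 42]
    }
}
```

With the bound taken from totalHits instead, the same input would read past the end of the three-element array on the fourth iteration.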