Hi,

I am trying to understand PyLucene better and to see whether it is faster to
retrieve result ids with Java instead of with Python. The use case is
to retrieve millions of recids. With Python, retrieving 700K ids takes
about 1.5s, even though the query itself takes just a fraction of that.

I wrote some simple Java code (it works in Java) which returns an array
of ints. I have wrapped it with JCC and it is visible from inside Python,
but calling the static method throws InvalidArgsError (an example Python
session is below).

JCC is version 2.4, built in shared mode. DumpUtils is in a
different package than lucene (i.e. not inside the lucene jars). Could this
problem be similar to passing JCC-wrapped objects between different
JCC packages? http://search-lucene.com/m/SPgeW1hDtAw1

The java class is very simple:

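import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.TopDocs;

public class DumpUtils {
        // Returns the doc ids of the hits collected in topdocs.
        public static int[] GetDocIds(TopDocs topdocs) {
                ScoreDoc[] hits = topdocs.scoreDocs;
                // scoreDocs holds only the top n hits that were requested,
                // which can be fewer than totalHits, so size by hits.length.
                int[] out = new int[hits.length];
                for (int i = 0; i < hits.length; i++) {
                        out[i] = hits[i].doc;
                }
                return out;
        }
}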
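
A note on sizing the result: totalHits counts every matching document,
while scoreDocs holds only the top n hits that were requested, so the two
can differ. Below is a minimal sketch of sizing the output by the array
length instead. It assumes nothing about Lucene beyond those two fields;
FakeTopDocs, FakeScoreDoc and DocIdSketch are hypothetical stand-ins (not
Lucene classes) so that it compiles without Lucene on the classpath.

```java
import java.util.Arrays;

class DocIdSketch {
    // Hypothetical stand-in for Lucene's ScoreDoc: only the doc id is modelled.
    static class FakeScoreDoc {
        final int doc;
        FakeScoreDoc(int doc) { this.doc = doc; }
    }

    // Hypothetical stand-in for Lucene's TopDocs.
    static class FakeTopDocs {
        final int totalHits;            // count of all matching documents
        final FakeScoreDoc[] scoreDocs; // only the top n hits that were collected
        FakeTopDocs(int totalHits, FakeScoreDoc[] scoreDocs) {
            this.totalHits = totalHits;
            this.scoreDocs = scoreDocs;
        }
    }

    // Size the result by the array actually returned, not by totalHits.
    static int[] getDocIds(FakeTopDocs topdocs) {
        FakeScoreDoc[] hits = topdocs.scoreDocs;
        int[] out = new int[hits.length];
        for (int i = 0; i < hits.length; i++) {
            out[i] = hits[i].doc;
        }
        return out;
    }

    public static void main(String[] args) {
        // 700 matches overall, but only the top 3 were collected.
        FakeTopDocs td = new FakeTopDocs(700, new FakeScoreDoc[] {
            new FakeScoreDoc(4), new FakeScoreDoc(9), new FakeScoreDoc(2)
        });
        System.out.println(Arrays.toString(getDocIds(td))); // prints [4, 9, 2]
    }
}
```

Looping up to totalHits in the same situation would index past the end of
scoreDocs and throw ArrayIndexOutOfBoundsException.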

Thanks for any help/pointers,

   roman


Here is an example Python session:

In [1]: import pyjama

In [2]: pyjama.initVM(pyjama.CLASSPATH)
Out[2]: <jcc.JCCEnv object at 0x00C0E1F0>

In [3]: import lucene as lu

In [4]: pyjama.DumpUtils
Out[4]: <type 'DumpUtils'>

In [5]: pyjama.DumpUtils.GetDocIds
Out[5]: <built-in method GetDocIds of type object at 0x0189E780>

In [6]:

In [7]: import newseman.pyjamic.slucene.searcher as se

In [8]: s = se.Searcher();s.open('/tmp/whisper/')

In [9]: hits = s._search(s._query('key:bo*',None), 50)

In [10]: hits
Out[10]: <TopDocs: org.apache.lucene.search.topd...@480457>

In [11]:

In [12]: pyjama.DumpUtils.GetDocIds(hits)
---------------------------------------------------------------------------
InvalidArgsError                          Traceback (most recent call last)

InvalidArgsError: (<type 'DumpUtils'>, 'GetDocIds', <TopDocs: org.apache.lucene.
search.topd...@480457>)
