On 05.09.2016 at 21:27, Andi Vajda <va...@apache.org> wrote:


On Mon, 5 Sep 2016, Dirk Rothe wrote:
A volunteer is requested to build and test PyLucene's trunk on Windows. If no one comes forward, I still intend to try to release PyLucene 6.2 in a few weeks.

Nice job!

I've successfully built PyLucene 6.2 on Windows. Most tests pass:
* skipped the three test_ICU* due to missing "import icu"

Yes, for this you need to install PyICU: https://github.com/ovalhub/pyicu

I'm going to assume this would work for now.
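When I get around to it, I'd expect the smoke test to be no more than this (assuming PyICU exposes the version attribute the way I remember):

import icu  # the module PyICU installs
print(icu.ICU_VERSION)  # the ICU library version PyICU was built against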

* fixed test_PyLucene.py by ignoring open file handles (os.error) in shutil.rmtree() in Test_PyLuceneWithFSStore.tearDown()

Do you have a patch for me to apply?

Yes, attached.
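In essence it just guards the rmtree() in Test_PyLuceneWithFSStore (the attached patch is authoritative; the attribute name here is from memory):

import os, shutil

def tearDown(self):
    try:
        shutil.rmtree(self.STORE_DIR)
    except os.error:
        pass  # Windows still holds some index files open at this point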

* then there's stuff like this in test_PythonDirectory.py:
[..]
Can't make sense of this one, sorry.

* and this one in test_PythonException.py
[..]
This one could be because you may not have built JCC in shared mode?
I vaguely remember there being a problem with proper cross-boundary exception propagation requiring JCC to be built in shared mode.

jcc.SHARED reports True, so seems OK.
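For the record, that check is just:

import jcc
print(jcc.SHARED)  # prints True here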

I don't think these Windows glitches are really problematic; our production code runs only in Linux environments anyway. I'm more interested in whether porting around 3k lines of Lucene interface code from v3.6 goes smoothly.

I've hit the first problematic case with a custom PythonAnalyzer/PythonTokenizer, where I don't see how to pass the input to the Tokenizer implementation. I thought it might work like this, but PythonTokenizer no longer accepts an INPUT argument (it did in v4.10 and v3.6):

class _Tokenizer(PythonTokenizer):
    def __init__(self, INPUT):
        # this constructor call is what 6.2 no longer accepts
        super(_Tokenizer, self).__init__(INPUT)
        # prepare INPUT
    def incrementToken(self):
        # stuff into termAtt/offsetAtt/posIncrAtt
        ...

class Analyzer6(PythonAnalyzer):
    def createComponents(self, fieldName):
        return Analyzer.TokenStreamComponents(_Tokenizer())

The PositionIncrementTestCase is pretty similar, but it is initialized with static input. It would be a nice place for an example with dynamic input, I think.
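From memory, the pattern there is roughly this (details approximate; the 6.2 test sources are authoritative):

import lucene
lucene.initVM()

from org.apache.pylucene.analysis import PythonTokenizer
from org.apache.lucene.analysis.tokenattributes import \
    CharTermAttribute, PositionIncrementAttribute

class _StaticTokenizer(PythonTokenizer):
    def __init__(self):
        super(_StaticTokenizer, self).__init__()
        # the "input" is baked in up front, no reader involved
        self.tokens = ["1", "2", "3", "4", "5"]
        self.increments = [1, 2, 1, 0, 1]
        self.i = 0
        self.termAtt = self.addAttribute(CharTermAttribute.class_)
        self.posIncrAtt = self.addAttribute(PositionIncrementAttribute.class_)

    def incrementToken(self):
        if self.i == len(self.tokens):
            return False
        self.clearAttributes()
        self.termAtt.append(self.tokens[self.i])
        self.posIncrAtt.setPositionIncrement(self.increments[self.i])
        self.i += 1
        return True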

This was our 3.6 approach:
class Analyzer3(PythonAnalyzer):
    def tokenStream(self, fieldName, reader):
        data = data_from_reader(reader)

        class _tokenStream(PythonTokenStream):
            def __init__(self):
                super(_tokenStream, self).__init__()
                # prepare termAtt/offsetAtt/posIncrAtt
            def incrementToken(self):
                # stuff from data into termAtt/offsetAtt/posIncrAtt
                ...

        return _tokenStream()
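
In 6.2 I would expect the equivalent to look something like the sketch below, wired into createComponents() as above. As far as I can tell, in Lucene 6 the analyzer hands the reader over via Tokenizer.setReader(), and it only becomes the tokenizer's input once reset() runs; whether that protected input field is reachable from a PythonTokenizer subclass (as self.input?), and how to chain up to Tokenizer.reset() so the pending reader gets promoted, is exactly what I can't figure out. Untested:

class _Tokenizer6(PythonTokenizer):
    def __init__(self):
        super(_Tokenizer6, self).__init__()
        self.termAtt = self.addAttribute(CharTermAttribute.class_)
        self.tokens = []
        self.i = 0

    def reset(self):
        # self.input is the part in question; data_from_reader() and
        # split_into_tokens() stand in for our own code
        data = data_from_reader(self.input)
        self.tokens = split_into_tokens(data)
        self.i = 0

    def incrementToken(self):
        if self.i == len(self.tokens):
            return False
        self.clearAttributes()
        self.termAtt.append(self.tokens[self.i])
        self.i += 1
        return True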

Any hints on how to get Analyzer6 working?

--dirk
