On 05.09.2016 at 21:27, Andi Vajda <va...@apache.org> wrote:


On Mon, 5 Sep 2016, Dirk Rothe wrote:
A volunteer is requested to build and test PyLucene's trunk on Windows. If no one comes forward, I still intend to try to release PyLucene 6.2 in a few weeks.

Nice job!

I've successfully built PyLucene 6.2 on Windows. Most tests pass:
* skipped the three test_ICU* due to missing "import icu"

Yes, for this you need to install PyICU: https://github.com/ovalhub/pyicu

I'm going to assume this would work for now.
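When I get around to it, I'd expect the smoke test to be no more than this (assuming PyICU exposes the version attribute the way I remember):

import icu  # the module PyICU installs
print(icu.ICU_VERSION)  # the ICU library version PyICU was built against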

* fixed test_PyLucene.py by ignoring open file handles (os.error) in shutil.rmtree() in Test_PyLuceneWithFSStore.tearDown()

Do you have a patch for me to apply?

Yes, attached.
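In essence it just guards the rmtree() in Test_PyLuceneWithFSStore (the attached patch is authoritative; the attribute name here is from memory):

import os, shutil

def tearDown(self):
    try:
        shutil.rmtree(self.STORE_DIR)
    except os.error:
        pass  # Windows still holds some index files open at this point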

* then there's stuff like this in test_PythonDirectory.py:
[..]
Can't make sense of this one, sorry.

* and this one in test_PythonException.py
[..]
This one could be because you may not have built JCC in shared mode?
I vaguely remember there being a problem with proper cross-boundary exception propagation requiring JCC to be built in shared mode.

jcc.SHARED reports True, so seems OK.
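For the record, that check is just:

import jcc
print(jcc.SHARED)  # prints True here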

I don't think these Windows glitches are really problematic; our production code runs only in Linux environments anyway. I'm more interested in whether porting around 3k lines of Lucene interface code from v3.6 goes smoothly.

I've hit the first problematic case with a custom PythonAnalyzer/PythonTokenizer, where I don't see how to pass the input to the Tokenizer implementation. I thought it might work like this, but PythonTokenizer no longer accepts an INPUT argument (it did in v4.10 and v3.6):

class _Tokenizer(PythonTokenizer):
    def __init__(self, INPUT):
        # this constructor call is what 6.2 no longer accepts
        super(_Tokenizer, self).__init__(INPUT)
        # prepare INPUT
    def incrementToken(self):
        # stuff into termAtt/offsetAtt/posIncrAtt
        ...

class Analyzer6(PythonAnalyzer):
    def createComponents(self, fieldName):
        return Analyzer.TokenStreamComponents(_Tokenizer())

The PositionIncrementTestCase is pretty similar, but it is initialized with static input. It would be a nice place for an example with dynamic input, I think.
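From memory, the pattern there is roughly this (details approximate; the 6.2 test sources are authoritative):

import lucene
lucene.initVM()

from org.apache.pylucene.analysis import PythonTokenizer
from org.apache.lucene.analysis.tokenattributes import \
    CharTermAttribute, PositionIncrementAttribute

class _StaticTokenizer(PythonTokenizer):
    def __init__(self):
        super(_StaticTokenizer, self).__init__()
        # the "input" is baked in up front, no reader involved
        self.tokens = ["1", "2", "3", "4", "5"]
        self.increments = [1, 2, 1, 0, 1]
        self.i = 0
        self.termAtt = self.addAttribute(CharTermAttribute.class_)
        self.posIncrAtt = self.addAttribute(PositionIncrementAttribute.class_)

    def incrementToken(self):
        if self.i == len(self.tokens):
            return False
        self.clearAttributes()
        self.termAtt.append(self.tokens[self.i])
        self.posIncrAtt.setPositionIncrement(self.increments[self.i])
        self.i += 1
        return True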

This was our 3.6 approach:
class Analyzer3(PythonAnalyzer):
    def tokenStream(self, fieldName, reader):
        data = data_from_reader(reader)

        class _tokenStream(PythonTokenStream):
            def __init__(self):
                super(_tokenStream, self).__init__()
                # prepare termAtt/offsetAtt/posIncrAtt
            def incrementToken(self):
                # stuff from data into termAtt/offsetAtt/posIncrAtt
                ...

        return _tokenStream()
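
In 6.2 I would expect the equivalent to look something like the sketch below, wired into createComponents() as above. As far as I can tell, in Lucene 6 the analyzer hands the reader over via Tokenizer.setReader(), and it only becomes the tokenizer's input once reset() runs; whether that protected input field is reachable from a PythonTokenizer subclass (as self.input?), and how to chain up to Tokenizer.reset() so the pending reader gets promoted, is exactly what I can't figure out. Untested:

class _Tokenizer6(PythonTokenizer):
    def __init__(self):
        super(_Tokenizer6, self).__init__()
        self.termAtt = self.addAttribute(CharTermAttribute.class_)
        self.tokens = []
        self.i = 0

    def reset(self):
        # self.input is the part in question; data_from_reader() and
        # split_into_tokens() stand in for our own code
        data = data_from_reader(self.input)
        self.tokens = split_into_tokens(data)
        self.i = 0

    def incrementToken(self):
        if self.i == len(self.tokens):
            return False
        self.clearAttributes()
        self.termAtt.append(self.tokens[self.i])
        self.i += 1
        return True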

Any hints on how to get Analyzer6 working?

--dirk
