On 05.09.2016, 21:27, Andi Vajda <va...@apache.org> wrote:
> On Mon, 5 Sep 2016, Dirk Rothe wrote:
>>> A volunteer is requested to build and test PyLucene's trunk on
>>> Windows. If no one comes forward, I still intend to try to release
>>> PyLucene 6.2 in a few weeks.
>> Nice job! I've successfully built PyLucene 6.2 on Windows. Most tests
>> pass:
>> * skipped the three test_ICU* tests due to a missing "import icu"
> Yes, for this you need to install PyICU: https://github.com/ovalhub/pyicu
I'm going to assume this would work for now.
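To double-check that install later, something like this should work (PyICU exposes its own version and the version of the underlying ICU library as module attributes):

import icu
print(icu.VERSION)      # PyICU version
print(icu.ICU_VERSION)  # version of the underlying ICU library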
>> * fixed test_PyLucene.py by ignoring open file handles (os.error) in
>> shutil.rmtree() in Test_PyLuceneWithFSStore.tearDown()
> Do you have a patch for me to apply?
Yes, attached.
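In essence it just swallows the error on Windows. A minimal sketch of the idea (the attached patch is authoritative; the STORE_DIR name here is from memory):

import os
import shutil

def tearDown(self):
    try:
        shutil.rmtree(self.STORE_DIR)
    except os.error:
        # Windows may still hold open handles on index files; ignore
        pass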
>> * then errors like these in test_PythonDirectory.py:
>> [..]
> Can't make sense of this one, sorry.
>> * and this one in test_PythonException.py:
>> [..]
> This one could be because you may not have built JCC in shared mode?
> I vaguely remember there being a problem with proper cross-boundary
> exception propagation requiring JCC to be built in shared mode.
jcc.SHARED reports True, so that seems OK.
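For reference, I checked it like this:

import jcc
print(jcc.SHARED)  # True when JCC was built with the --shared flag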
I don't think these Windows glitches are really problematic; our production
code runs only on Linux anyway. I'm more interested in whether porting our
roughly 3k lines of Lucene interface code from v3.6 goes smoothly.
I've hit the first problematic case with a custom
PythonAnalyzer/PythonTokenizer, where I don't see how to pass the input to
the Tokenizer implementation. I thought it might work like this, but
PythonTokenizer does not accept an INPUT argument anymore (it did in v3.6
and v4.10):
class _Tokenizer(PythonTokenizer):
    def __init__(self, INPUT):
        super(_Tokenizer, self).__init__(INPUT)
        # prepare INPUT

    def incrementToken(self):
        # stuff into termAtt/offsetAtt/posIncrAtt

class Analyzer6(PythonAnalyzer):
    def createComponents(self, fieldName):
        return Analyzer.TokenStreamComponents(_Tokenizer())
The PositionIncrementTestCase is pretty similar, but it is initialized with
static input. It would be a nice place for an example with dynamic input, I
think; the sketch below shows roughly what I am after.
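A hypothetical sketch, assuming the reader installed via setReader() ends up
reachable from Python as the inherited Tokenizer input field (self.input),
which is exactly the part I cannot find in the 6.2 API. It reuses the
data_from_reader() helper from our 3.6 code below:

class _Tokenizer(PythonTokenizer):
    def __init__(self):
        super(_Tokenizer, self).__init__()
        self.data = None

    def reset(self):
        super(_Tokenizer, self).reset()
        # consume the reader that setReader() installed, if it is exposed
        self.data = data_from_reader(self.input)

    def incrementToken(self):
        # stuff from self.data into termAtt/offsetAtt/posIncrAtt
        return False  # placeholder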
This was our 3.6 approach:
class Analyzer3(PythonAnalyzer):
    def tokenStream(self, fieldName, reader):
        data = data_from_reader(reader)

        class _tokenStream(PythonTokenStream):
            def __init__(self):
                super(_tokenStream, self).__init__()
                # prepare termAtt/offsetAtt/posIncrAtt

            def incrementToken(self):
                # stuff from data into termAtt/offsetAtt/posIncrAtt

        return _tokenStream()
Any hints on how to get Analyzer6 working?
--dirk