On Fri, 10 Jul 2015, Roxana Danger wrote:
Hello,
I am trying to construct a custom PythonTokenizer (see below), but I
get the error "attribute 'reader' of 'Tokenizer' objects is not
readable" when accessing it in the reset method.
reader is a protected member of Tokenizer. I assumed it would be
exposed through PythonTokenizer, and it is passed to the superclass in the
constructor. Am I wrong?
You're right, but there is no accessor for the reader object stored on the
Java side that would make it usable from the Python side.
You can either:
- add a getReader() method to the PythonTokenizer Java class that returns
it (and rebuild PyLucene after 'make clean')
- store the 'input' variable that is passed to your constructor on the
Python side, on your ComposerTokenizer instance. That 'input' is the
reader (at least, it's passed on to the Tokenizer Java class)
The first option is probably safer, as it doesn't assume that
Tokenizer(reader) leaves the reader unchanged before storing it.
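
The second option could look roughly like the sketch below. To keep it
runnable without PyLucene, it uses a hypothetical StubTokenizer in place of
PythonTokenizer and an io.StringIO in place of the Java reader; the
attribute name _input and the whitespace split are also just placeholders.
Note that a real java.io.Reader returns an int from read() and -1 at end of
stream, so the reading loop would differ there.

```python
import io


class StubTokenizer(object):
    """Hypothetical stand-in for PythonTokenizer (not part of PyLucene)."""

    def __init__(self, input):
        # The real base class passes 'input' on to the Java Tokenizer,
        # where Python code cannot read it back.
        pass


class ComposerTokenizer(StubTokenizer):
    def __init__(self, input):
        StubTokenizer.__init__(self, input)
        # Keep our own reference to the reader on the Python side.
        self._input = input
        self.reset()

    def reset(self):
        # Read from the stored reference instead of self.reader.
        s = self._input.read()
        self.index = 0
        # Placeholder processing: the real token extraction goes here.
        self.finaltokens = s.split()

    def incrementToken(self):
        if self.index < len(self.finaltokens):
            self.index = self.index + 1
            return True
        return False


# io.StringIO is only a stand-in for the reader passed by the analyzer.
tokenizer = ComposerTokenizer(io.StringIO(u"hello composer world"))
```

This relies on the assumption already mentioned: the object passed to the
constructor is the same reader the Java side ends up using.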
Andi..
Thanks, best regards,
Roxana
class ComposerTokenizer(PythonTokenizer):

    def __init__(self, input):
        PythonTokenizer.__init__(self, input)
        self.reset()

    def incrementToken(self):
        if self.index < len(self.finaltokens):
            self.clearAttributes()
            offsetAttr = OffsetAttributeImpl()
            offsetAttr.setOffset( ... )
            self.index = self.index + 1
            return True
        else:
            return False

    def reset(self):
        s = ''
        # This access raises: attribute 'reader' of 'Tokenizer' objects
        # is not readable
        ch = self.reader.read()
        while ch != -1:
            s = s + ch
            ch = self.reader.read()
        self.index = 0
        self.finaltokens = ...  # processing s to extract self.finaltokens