Hi Andi,

Thank you very much. I will use the first solution.

Best regards,
Roxana

On 10 July 2015 at 12:00, Andi Vajda <va...@apache.org> wrote:

> On Fri, 10 Jul 2015, Roxana Danger wrote:
>
>> Hello,
>> I am trying to construct a custom PythonTokenizer (see below), but I
>> am getting the error "attribute 'reader' of 'Tokenizer' objects is not
>> readable" when accessing it in the reset method.
>> reader is a protected member of Tokenizer. I was assuming it would be
>> exposed through PythonTokenizer, since it is passed to the superclass in
>> the constructor. Am I wrong?
>
> You're right, but there is no accessor for the reader object stored on the
> Java side that makes it usable from the Python side.
> You can either:
> - add a getReader() method to the PythonTokenizer Java class that returns
>   it (and rebuild PyLucene after 'make clean'), or
> - store the 'input' variable that is passed to your constructor on the
>   Python side, on your ComposerTokenizer instance. That 'input' is the
>   reader (at least, it's what is passed on to the Tokenizer Java class).
>
> The first option is probably safer, as it doesn't assume that
> Tokenizer(reader) stores the reader without changing it in some way.
>
> Andi..
>
>> Thanks, best regards,
>> Roxana
>>
>> class ComposerTokenizer(PythonTokenizer):
>>
>>     def __init__(self, input):
>>         PythonTokenizer.__init__(self, input)
>>         self.reset()
>>
>>     def incrementToken(self):
>>         if self.index < len(self.finaltokens):
>>             self.clearAttributes()
>>             offsetAttr = OffsetAttributeImpl()
>>             offsetAttr.setOffset( ... )
>>             self.index = self.index + 1
>>             return True
>>         else:
>>             return False
>>
>>     def reset(self):
>>         s = ''
>>         ch = self.reader.read()    # this is the line that fails
>>         while ch <> -1:
>>             s = s + unichr(ch)     # Reader.read() returns an int code
>>             ch = self.reader.read()
>>         self.index = 0
>>         self.finaltokens = ...     # processing s to extract self.finaltokens
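
A minimal sketch of the second option, runnable without PyLucene. FakeTokenizer and JavaStyleReader are hypothetical stand-ins for the Java Tokenizer and java.io.Reader (read() returns one character code as an int, or -1 at end of stream), and the whitespace split is a placeholder for the elided token processing. The only point it shows is that the subclass keeps its own reference to the 'input' it was given and drains that in reset(), converting each int to a character.

```python
import io


class JavaStyleReader:
    """Hypothetical stand-in for java.io.Reader: read() returns one
    character code as an int, or -1 at end of stream."""

    def __init__(self, text):
        self._buf = io.StringIO(text)

    def read(self):
        ch = self._buf.read(1)
        return ord(ch) if ch else -1


class FakeTokenizer:
    """Hypothetical stand-in for the Java-side Tokenizer: it receives the
    reader but exposes no public 'reader' attribute to Python code."""

    def __init__(self, reader):
        self.__hidden_reader = reader  # name-mangled, not reachable as .reader


class ComposerTokenizer(FakeTokenizer):
    def __init__(self, input):
        super().__init__(input)
        # Option 2 from the thread: keep our own reference to the reader.
        self.input = input
        self.reset()

    def reset(self):
        chars = []
        ch = self.input.read()
        while ch != -1:
            chars.append(chr(ch))  # read() returns an int, not a str
            ch = self.input.read()
        self.text = "".join(chars)
        self.index = 0
        # Placeholder processing (assumption): split on whitespace.
        self.finaltokens = self.text.split()

    def incrementToken(self):
        if self.index < len(self.finaltokens):
            self.current = self.finaltokens[self.index]
            self.index += 1
            return True
        return False
```

Because the sketch reads self.input rather than anything stored by the parent, it relies on the parent not wrapping or replacing the reader, which is exactly the assumption Andi flags as the weakness of this option.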