On Fri, 10 Jul 2015, Roxana Danger wrote:

Hello,
      I am trying to construct a custom PythonTokenizer (see below), but I
am getting the error "attribute 'reader' of 'Tokenizer' objects is not
readable" when accessing it in the reset method.
      reader is a protected member of Tokenizer; I was assuming it would be
exposed through PythonTokenizer, since it is passed to the superclass in the
constructor. Am I wrong?

You're right, but there is no accessor for the reader object stored on the Java side that makes it usable from the Python side.
You can either:
  - add a getReader() method to the PythonTokenizer Java class that returns
    it (and rebuild PyLucene after 'make clean')
  - on the Python side, store the 'input' argument passed to your
    constructor on your ComposerTokenizer instance. That 'input' is the
    reader (at least, it's what gets passed on to the Tokenizer Java class)

The first option is probably safer, as it doesn't assume that Tokenizer(reader) stores the reader without changing it in some way.
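A minimal sketch of the second option, in Python 3. PythonTokenizer and FakeReader below are hypothetical stand-ins, not PyLucene classes: the stub only mimics the symptom from the error message (a `reader` attribute that can be set but not read), and FakeReader imitates java.io.Reader.read(), which returns each character as an int and -1 at end of stream. The name `_reader` is an illustrative choice that avoids clashing with the unreadable `reader` attribute.

```python
class PythonTokenizer(object):
    """Hypothetical stand-in for PyLucene's PythonTokenizer, NOT the real class.

    It only mimics the symptom in the error message: the reader given to the
    constructor is kept out of reach, and `reader` can be set but not read.
    """

    def __init__(self, input):
        self._java_side_reader = input  # plays the role of the Java-side field

    def _set_reader(self, reader):
        self._java_side_reader = reader

    # Write-only attribute: reading tok.reader raises AttributeError.
    reader = property(fset=_set_reader)


class FakeReader(object):
    """Hypothetical stand-in for a java.io.Reader: read() returns the next
    character as an int code, or -1 at end of stream."""

    def __init__(self, text):
        self._text = text
        self._pos = 0

    def read(self):
        if self._pos >= len(self._text):
            return -1
        code = ord(self._text[self._pos])
        self._pos += 1
        return code


class ComposerTokenizer(PythonTokenizer):

    def __init__(self, input):
        PythonTokenizer.__init__(self, input)
        # Option 2: keep our own Python-side reference to the reader.
        # `_reader` is an illustrative name chosen to avoid `reader`.
        self._reader = input
        self.reset()

    def reset(self):
        # Drain the stored reader; read() yields int codes, so convert them.
        chars = []
        ch = self._reader.read()
        while ch != -1:
            chars.append(chr(ch))
            ch = self._reader.read()
        self.text = ''.join(chars)
        self.index = 0


tok = ComposerTokenizer(FakeReader("compose me"))
print(tok.text)  # compose me
```

With real PyLucene, the stubs would be replaced by the actual PythonTokenizer and a java.io.Reader such as StringReader; under Python 2, unichr() takes the place of chr().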

Andi..

      Thanks, best regards,
            Roxana

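from org.apache.pylucene.analysis import PythonTokenizer
from org.apache.lucene.analysis.tokenattributes import OffsetAttribute

class ComposerTokenizer(PythonTokenizer):

    def __init__(self, input):
        PythonTokenizer.__init__(self, input)
        # register the offset attribute once so it is attached to this stream
        self.offsetAttr = self.addAttribute(OffsetAttribute.class_)
        self.reset()

    def incrementToken(self):
        if self.index < len(self.finalTokens):
            self.clearAttributes()
            self.offsetAttr.setOffset( ... )
            self.index = self.index + 1
            return True
        else:
            return False

    def reset(self):
        # read the whole input; Reader.read() returns an int, -1 at end of stream
        s = ''
        ch = self.reader.read()   # this is where the 'not readable' error is raised
        while ch != -1:
            s = s + unichr(ch)
            ch = self.reader.read()
        self.index = 0
        self.finalTokens = ... # processing s to extract self.finalTokens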
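The reset()/incrementToken() pair in the code above is a cursor over a precomputed token list: reset() rebuilds the list and rewinds the index, and each incrementToken() call emits one token until it returns False. The pure-Python model below is hypothetical (no PyLucene involved, and tokens are assumed to be precomputed (start, end) offsets); it shows that cycle, including the `while incrementToken()` loop a consumer runs after reset().

```python
class CursorTokenizer(object):
    """Hypothetical pure-Python model of the reset()/incrementToken()
    cursor pattern; not a PyLucene class. Tokens are assumed to be
    precomputed (start, end) character offsets."""

    def __init__(self, offsets):
        self._offsets = offsets
        self.reset()

    def reset(self):
        # Rewind the cursor; a real tokenizer would rebuild the token list here.
        self.finalTokens = list(self._offsets)
        self.index = 0
        self.offset = None

    def incrementToken(self):
        # Emit one token per call; return False once the list is exhausted.
        if self.index < len(self.finalTokens):
            self.offset = self.finalTokens[self.index]
            self.index += 1
            return True
        return False


tok = CursorTokenizer([(0, 7), (8, 10)])
seen = []
while tok.incrementToken():
    seen.append(tok.offset)
print(seen)  # [(0, 7), (8, 10)]
```

Because the index lives on the instance, calling reset() again is what makes the same tokenizer reusable for another pass.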