Tokenizer

2020-03-19 Thread Marc Jeurissen
PyLucene version: 8.1.1

Hi all,

When you have a custom tokenizer (class CustomTokenizer(PythonTokenizer)), you 
don’t seem to be able to override any method besides incrementToken (so not 
end, reset, close).

Is this correct?

Thank you very much



Kind regards,
Marc Jeurissen

Bibliotheek UAntwerpen
Stadscampus – Ve35.303
Venusstraat 35 – 2000 Antwerpen
marc.jeuris...@uantwerpen.be
T +32 3 265 49 71





Re: Tokenizer

2020-03-19 Thread Andi Vajda



On Thu, 19 Mar 2020, Marc Jeurissen wrote:


PyLucene version: 8.1.1

Hi all,

When you have a custom tokenizer (class CustomTokenizer(PythonTokenizer)), 
you don't seem to be able to override any method besides incrementToken 
(so not end, reset, close).


Is this correct?


Correct. The only native method in PythonTokenizer.java meant to be 
implemented in Python is incrementToken(), since that is the method 
Tokenizer.java documents as the one to extend.


This doesn't mean that you can't add your own extension points. Just edit 
PythonTokenizer.java, add the native methods you wish to implement from 
Python, and rebuild extensions.jar and PyLucene. If you override reset() or 
close(), you probably still want to ensure that the parent versions are 
called from your own Python overrides by casting your instance to the parent 
class using its .cast_() method, using something like

  Tokenizer.cast_(mytok).reset()
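
To make the dispatch rule above concrete, here is a toy model in plain Python. It does not use PyLucene or a JVM: ToyJavaTokenizer, its parent() and call() methods, and trace() are hypothetical names invented for this sketch. It assumes that a call coming from the Java side reaches Python only when the method is declared native in the Java class, and that everything else runs the inherited Java implementation.

```python
class ToyJavaTokenizer:
    """Toy stand-in for the Java side; not a real Lucene or PyLucene class."""

    def __init__(self, python_impl, native_methods):
        self.python_impl = python_impl
        # Names the Java class declares `native`, i.e. forwards to Python.
        self.native_methods = frozenset(native_methods)
        self.log = []

    def parent(self, name):
        """Run the inherited Java implementation of `name` (only logged here)."""
        self.log.append("java:" + name)

    def call(self, name):
        """Model Java code invoking `name` on the tokenizer."""
        if name in self.native_methods:
            self.log.append("python:" + name)
            return getattr(self.python_impl, name)(self)
        return self.parent(name)


class CustomTokenizer:
    """Toy Python-side overrides; each receives the Java-side object."""

    def incrementToken(self, java):
        return False  # this toy produces no tokens

    def reset(self, java):
        # Keep the parent behaviour; custom reset logic would follow here.
        java.parent("reset")


def trace(native_methods):
    """Return the dispatch log for one reset/incrementToken/end/close cycle."""
    java = ToyJavaTokenizer(CustomTokenizer(), native_methods)
    for name in ("reset", "incrementToken", "end", "close"):
        java.call(name)
    return java.log


# Only incrementToken is native: the Python reset() is never reached.
print(trace({"incrementToken"}))
# reset is also declared native: the Python override runs and calls the parent.
print(trace({"incrementToken", "reset"}))
```

In the first trace the Python reset() is bypassed entirely. In the second, the override runs first and then hands off to the parent version, which is the pattern suggested above for reset() and close().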

Andi..


