Re: proposed change to CharTokenizer

2010-10-17 Thread Michael Sokolov
OK, no responses to this, but in case you were curious: the patch I suggested won't work, so please don't install it :) In the end I was able to get the behavior I wanted by fiddling with offsets in my CharFilter, but it requires detecting token boundaries in the CharFilter stage, which se

proposed change to CharTokenizer

2010-10-14 Thread Mike Sokolov
Background: I've been trying to enable hit highlighting of XML documents in such a way that the highlighting preserves the well-formedness of the XML. I thought I could get this to work by implementing a CharFilter that extracts text from XML (somewhat like HTMLStripCharFilter, except I am us
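
For readers unfamiliar with the idea of a tag-stripping filter that corrects offsets, here is a minimal sketch in plain Java. It is not the code from this thread and it does not use Lucene's CharFilter API; the class name XmlOffsetSketch and its methods are hypothetical, and the tag detection is deliberately naive (no comments, CDATA, entities, or attributes containing '>'). It only shows how offsets in the stripped text can be mapped back to offsets in the original XML.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative only: strips tags and remembers how to map offsets back. */
final class XmlOffsetSketch {
    private final StringBuilder text = new StringBuilder();
    // Each entry: {offset in stripped text, total markup chars removed so far}.
    private final List<int[]> corrections = new ArrayList<>();

    XmlOffsetSketch(String xml) {
        int removed = 0;
        boolean inTag = false;
        for (int i = 0; i < xml.length(); i++) {
            char c = xml.charAt(i);
            if (c == '<') {
                inTag = true;
            }
            if (inTag) {
                removed++;
                if (c == '>') {
                    inTag = false;
                    // Record the cumulative shift at the point where the tag ended.
                    corrections.add(new int[] { text.length(), removed });
                }
            } else {
                text.append(c);
            }
        }
    }

    /** The text with all tags removed. */
    String text() {
        return text.toString();
    }

    /** Maps an offset in the stripped text to an offset in the original XML. */
    int correctOffset(int off) {
        int diff = 0;
        for (int[] c : corrections) {
            if (c[0] <= off) {
                diff = c[1]; // last correction at or before this offset wins
            } else {
                break;
            }
        }
        return off + diff;
    }
}
```

With the input `<p>Hello <b>world</b></p>`, the stripped text is `Hello world`, and the start offset of `world` (6) maps back to 12 in the original. Note that in this sketch the end offset of `world` (11) maps to 25, past both closing tags, so a highlight built from corrected offsets alone would swallow `</b></p>`. That is one way offset correction can conflict with well-formed highlighting; the excerpt above does not say whether it is the exact problem the thread ran into.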