markharw00d wrote:
Swing supports HTML and will do the highlight for you.
String swingText = "<html>" + highlighter.getBestFragment(tokenStream, text) + "</html>";

If you don't like that approach and really just want to know the positions, plug in your own "Formatter" class which, instead of marking up the text, silently records the hit position information provided to it in the "TokenGroup" class and then returns the original string without adding any markup. TokenGroup handles the issue of identifying runs of overlapping tokens for you.
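
A minimal sketch of such a position-recording formatter follows. To keep it self-contained, the two small interfaces are stand-ins that only mirror the shape of Lucene's Formatter and TokenGroup as assumed here (a highlightTerm callback plus offset and score accessors); they are not the library's own classes, and PositionRecordingFormatter is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.List;

// Stand-in for Lucene's TokenGroup: only the accessors this sketch needs (assumed shape).
interface TokenGroup {
    int getStartOffset();
    int getEndOffset();
    float getTotalScore();
}

// Stand-in for Lucene's Formatter callback (assumed shape).
interface Formatter {
    String highlightTerm(String originalText, TokenGroup tokenGroup);
}

// Hypothetical formatter: records where the hits are and adds no markup.
class PositionRecordingFormatter implements Formatter {
    // Each entry is {startOffset, endOffset} in characters of the source text.
    private final List<int[]> hits = new ArrayList<int[]>();

    @Override
    public String highlightTerm(String originalText, TokenGroup tokenGroup) {
        // Assumption: a total score of zero means the group matched no query term.
        if (tokenGroup.getTotalScore() > 0) {
            hits.add(new int[] { tokenGroup.getStartOffset(), tokenGroup.getEndOffset() });
        }
        // Return the text untouched so the output equals the input.
        return originalText;
    }

    List<int[]> getHits() {
        return hits;
    }
}
```

With the real library, the same class would implement the library's Formatter instead of the stand-in, and getHits() would be read once the highlighter has finished with the text.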

Swing's HTML renderer is unfortunately too slow for our use: it took something like 10 seconds to load and display a 100kB document with highlights. It's pretty ugly, too. Maybe that will change in version 6.0, though.

The text renderer has the distinct advantage of being relatively fast at that size. On top of that, the highlighting can be done after the text is displayed, even in the background, which is a huge benefit.

To use the existing highlighter, I would probably write a custom Formatter that takes a Swing Highlighter object and does the highlighting from there, and then run the highlighter in a new thread after the text is already displayed.
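
As a rough illustration of that idea (not code from this thread), the sketch below finds the hit offsets on a worker thread and then hands them to the text component's Swing Highlighter on the event dispatch thread. BackgroundHighlighting and HitFinder are hypothetical names; HitFinder stands for whatever runs the search highlighter and returns {start, end} character offsets.

```java
import java.util.List;
import javax.swing.SwingUtilities;
import javax.swing.text.BadLocationException;
import javax.swing.text.DefaultHighlighter;
import javax.swing.text.Highlighter;
import javax.swing.text.JTextComponent;

class BackgroundHighlighting {
    // Hypothetical callback: returns {start, end} offsets of the hits in the text.
    interface HitFinder {
        List<int[]> findHits(String text);
    }

    // Adds one Swing highlight per recorded hit, using the default painter.
    static void applyHits(JTextComponent component, List<int[]> hits)
            throws BadLocationException {
        Highlighter highlighter = component.getHighlighter();
        Highlighter.HighlightPainter painter = DefaultHighlighter.DefaultPainter;
        for (int[] hit : hits) {
            highlighter.addHighlight(hit[0], hit[1], painter);
        }
    }

    // Runs the finder off the event dispatch thread; the text is assumed
    // to be on screen already and to stay unchanged while the worker runs.
    static Thread highlightInBackground(final JTextComponent component,
                                        final String text,
                                        final HitFinder finder) {
        Thread worker = new Thread(new Runnable() {
            public void run() {
                final List<int[]> hits = finder.findHits(text);
                // Swing components should only be touched on the event dispatch thread.
                SwingUtilities.invokeLater(new Runnable() {
                    public void run() {
                        try {
                            applyHits(component, hits);
                        } catch (BadLocationException e) {
                            // Offsets no longer fit the document; skip highlighting.
                        }
                    }
                });
            }
        }, "highlight-worker");
        worker.setDaemon(true);
        worker.start();
        return worker;
    }
}
```

The text is passed in as a String snapshot so the worker never reads the component directly.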

But...

Hoss, your pseudocode looked like a solution for identifying phrase queries. Lack of proper support for phrase queries is a known issue with the current highlighter, but I thought the primary issue in question here was speed?

Actually, we do need support for phrase queries (which pretty much rules out the existing highlighter code) but slop isn't as important.

Not sure this helps for non-phrase queries.

Indeed, the speed will be roughly the same for non-phrase queries, since a lookup in a HashMap costs about as much as a lookup in a HashSet.

Also, I don't think hitting the index to work out which terms were a hit for the doc in question, in order to shorten the list of terms to highlight, is likely to speed things up. If anything, the extra disk IO is likely to slow it down.

That's a good point, particularly in the case of small documents.

We actually limit ourselves to 100k display at the moment because even JTextArea gets pretty inefficient once you get up to 1MB text sizes. The nasty part is that it can't load the text itself in the background -- the text component remains completely white until the entire text is present. A custom Document implementation may be able to work around some of that, but the last time I tried it, there was some flicker each time more text was appended. A completely custom component is probably the way to go. :-)

Daniel


--
Daniel Noll

Nuix Australia Pty Ltd
Suite 79, 89 Jones St, Ultimo NSW 2007, Australia
Phone: (02) 9280 0699
Fax:   (02) 9212 6902

