Chris Hostetter wrote:
if you build a map whose keys are tokens which begin token lists for
queries, each of which is is mapped to a value which is a list of lists of
tokens, then you can make one pass over the tokens from the main text, and
"lookup" wether or not this is the potential start of something to
highlight, and if it is then check the rest of the tokens.
that proabbly didn't make a lot of sense did it?
Actually, that (and the following examples and pseudocode) puts into
words roughly what I've been churning over in my mind for a while now.
Ignoring the tokenisation which can largely be done outside of the
search algorithm, what I seem to have here is the need to for a list
utility method which would look something like...
<T> Iterable<Location> findAll(List<T> bigList,
List<List<T>> smallLists)
priority to longer or shorter matches *with the same start token* can be
done by carefully ordering the lists when you put them in the map ... but
if you don't want overlapping highlighting, and you want to give priority
to longer/shorter matches (instead of just the "first" match) then i think
you really have to do at least two passes ... in the first pass you
anotate each token with it's highlight phrase, and alow tokens to be in
multiple phrase; in the second pass you look for highlighted phrases
containing tokens which are in other highlighted phrases you that are
"better" and undo that highlighting.
Yeah. Actually, collisions like this are already a minor issue with our
existing routine. The interesting thing is that if I do highlight
something twice using the Swing APIs, it doesn't do anything terrible to
the display so I don't have to worry so much about it.
With HTML, of course, yes... I'd definitely need to filter to avoid
overlapping tags.
I will probably go with the simple HashMap<String,List<Token>> approach
first and see where that leaves me performance-wise. I think that
multiple multi-phrase queries should be relatively uncommon, so perhaps
the multi-level approach won't even be necessary.
Daniel
--
Daniel Noll
Nuix Australia Pty Ltd
Suite 79, 89 Jones St, Ultimo NSW 2007, Australia
Phone: (02) 9280 0699
Fax: (02) 9212 6902
This message is intended only for the named recipient. If you are not
the intended recipient you are notified that disclosing, copying,
distributing or taking any action in reliance on the contents of this
message or attachment is strictly prohibited.
---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]