Andreas Vox wrote:
On 08.02.2005 at 14:01, Helge Hafting wrote:
Andreas Vox wrote:
But removing index references from that list using the jump function from below
should be easier than the other way round.
Maybe I don't understand. We have to go through the entire document anyway,
don't we?
No, I'd use the jump function just to check the context. If I know the context already
That's the part I don't really get. How could you possibly know the context for any
word in that list? Do you plan on showing some context for every word in your list?
I can select the references in the index buffer I don't want and delete them. If I don't
like a word, I can delete the whole line with all references.
I have the impression that this thing will make the indexing job nicer, but the current suggestion
seems to imply that most of the work will lie in *removing* lots and lots of words that are not index-worthy,
and then lots and lots of references to index-worthy words, because I don't want to index anywhere near every
occurrence of the index-worthy words.
By all means, build that word list with frequencies and every word in the document. But I think
we'll be better off if the default is that the words are just listed, with no index entry added for
any of them. The user should explicitly mark (or click, or whatever) the words to go in the index,
because that will be less work than removing all the words that should *not* go in the index.
I am sure that even with a nice list of stopwords (and who should make those for all languages?), the
majority of words found will not go in the index. Some will, and having them in a list with
a nice jump function to look at context will be immensely helpful. But still, no words should actually
go into the index unless selected by the author.
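To make the opt-in idea concrete, here is a minimal Python sketch. It is only an illustration under my own assumptions, not taken from any existing implementation: the names build_word_list, context, IndexDraft and mark are hypothetical, and positions are plain character offsets into the text. The word list covers every non-stopword with all of its occurrences, but the index itself starts out empty and only grows when the author marks an occurrence.

```python
import re
from collections import defaultdict


def build_word_list(text, stopwords=frozenset()):
    """Map each lowercased word to the character offsets where it occurs.

    Hypothetical helper: the stopword set is supplied by the caller.
    """
    occurrences = defaultdict(list)
    # [^\W\d_]+ matches runs of letters, including non-ASCII ones.
    for match in re.finditer(r"[^\W\d_]+", text):
        word = match.group().lower()
        if word not in stopwords:
            occurrences[word].append(match.start())
    return dict(occurrences)


def context(text, offset, width=20):
    """Return the text around one occurrence, as a jump function might show it."""
    return text[max(0, offset - width):offset + width]


class IndexDraft:
    """Candidate word list plus an index that is empty until the author opts in."""

    def __init__(self, text, stopwords=frozenset()):
        self.text = text
        self.candidates = build_word_list(text, stopwords)
        self.selected = {}  # word -> set of offsets; nothing is indexed by default

    def frequency(self, word):
        """Number of occurrences of a candidate word."""
        return len(self.candidates.get(word, []))

    def mark(self, word, offset):
        """Explicitly add one occurrence of a word to the index."""
        if offset not in self.candidates.get(word, []):
            raise ValueError(f"{word!r} does not occur at offset {offset}")
        self.selected.setdefault(word, set()).add(offset)

    def index_entries(self):
        """Return only what the author selected, with sorted offsets."""
        return {word: sorted(offsets) for word, offsets in self.selected.items()}


text = "The index lists words. An index helps readers find words."
draft = IndexDraft(text, stopwords={"the", "an"})
print(draft.frequency("index"))              # how often the candidate occurs
print(context(text, draft.candidates["index"][1]))
draft.mark("index", draft.candidates["index"][1])
print(draft.index_entries())
```

With this shape, the author's work is proportional to the number of entries wanted in the index, not to the number of words that have to be thrown out.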
Helge Hafting