Re: [Patch] gimmick: show word frequencies in new buffer

Andreas Vox Mon, 07 Feb 2005 08:41:48 -0800

Hi Helge!

Thanks for your comments.

On 07.02.2005 at 13:37, Helge Hafting wrote:

This seems useful for several purposes, but perhaps a warning about automatic index generation is in order. The frequency is a useful statistic, but a high frequency has no bearing on whether to index the word. The obvious example here should be the top word on your list. :-) We index what "people might want to look up", not "all we have".

I know, I'm aiming at a kind of semi-automatic index generation. Creating an alphabetically sorted wordlist with frequencies as a hint and an easy way to navigate to the context and back would be a start. Using a stopword list would even remove all the most frequent words except "LyX" and "LaTeX" :-) So once the user prunes the list he/she would have a good start.
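A minimal sketch of such a wordlist in Python, for illustration only. It is not the code from the patch; the tokenizing regex, the tiny `STOPWORDS` set and the function name `word_frequencies` are all assumptions.

```python
import re
from collections import Counter

# Hypothetical stopword list; a real one would be much longer.
STOPWORDS = {"the", "a", "of", "and", "to", "in", "is"}

def word_frequencies(text, stopwords=STOPWORDS):
    """Return (word, count) pairs sorted alphabetically, stopwords removed."""
    # Assumption: a "word" is a run of ASCII letters.
    words = re.findall(r"[A-Za-z]+", text)
    # Stopwords are matched case-insensitively; counting is case-sensitive,
    # so "LyX" and "lyx" would be listed as separate entries.
    counts = Counter(w for w in words if w.lower() not in stopwords)
    # Sort case-insensitively so "LaTeX" and "LyX" land next to "index".
    return sorted(counts.items(), key=lambda item: item[0].lower())

for word, count in word_frequencies("The index of LyX and the LaTeX index"):
    print(f"{word}\t{count}")
```

The counts are only a hint for pruning; as noted above, a high count alone does not make a word worth indexing.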

Also, make sure this thing does not get in the way of indexing whole phrases, math expressions, images and other stuff that doesn't show up in a wordlist. Well, I guess it doesn't, but still.

Once there is index markup in place the system should handle it conservatively.

Finally, and most important: Autoindexing every occurrence of some index-worthy word often yields a useless index. Perhaps there are cases where such indexing is mandatory. But for an ordinary book the requirement is not to index every occurrence of some word, but the 1, 2 or 3 most important places where the word occurs. Few people want to mess around with "word, 1,2,6,8,12-16, 14, 18,19,22, 25-31,36" only to discover that "word" is thoroughly explained on pages 14 and 26-28, and all the other references merely mention "word" briefly.

But removing index references from that list using the jump function from below should be easier than the other way round.

I also think a function which jumps from one IndexInset to the next with the same key could be useful.

...

Perhaps the program shouldn't add the entries at all, just move from word to word and ask whether to add an entry at that point?

Good point. OTOH this would have to be repeated with every index word. Maybe an additional function which acts as a kind of search and replace, maybe even with regular expressions? The way of index editing I fancy right now would consist of the following functions:

1.) Create an initial index from all words minus stop words. Insert the index references and open an index buffer with the alphabetically sorted list.
2.) Edit the index buffer: delete entries, change the ordering, collate entries, create subitems, ...
3.) Have a jump function from an entry in the index buffer to the occurrence and between occurrences, and back.
4.) Update the text from the index buffer: delete unwanted index insets, change params of index insets, add new index insets.
5.) Regenerate the index buffer with existing entries, a la makeindex.
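To make the index buffer idea concrete, here is a rough Python model of steps 2 and 3. Everything in it is hypothetical: the class name `IndexBuffer`, the use of plain integers as text positions, and the wrap-around behaviour of the jump function are assumptions, not LyX code.

```python
from collections import defaultdict

class IndexBuffer:
    """Hypothetical model: index key -> sorted text positions of its index insets."""

    def __init__(self):
        self.entries = defaultdict(list)

    def add(self, key, position):
        """Record an index inset for `key` at `position`."""
        self.entries[key].append(position)
        self.entries[key].sort()

    def delete(self, key):
        """Drop an unwanted entry (step 2); a missing key is ignored."""
        self.entries.pop(key, None)

    def next_occurrence(self, key, position):
        """Jump target after `position` (step 3); wraps to the first occurrence.

        Returns None if the key has no occurrences.
        """
        positions = self.entries.get(key)
        if not positions:
            return None
        for p in positions:
            if p > position:
                return p
        return positions[0]

    def sorted_keys(self):
        """Keys in case-insensitive alphabetical order, as shown in the buffer."""
        return sorted(self.entries, key=str.lower)

buf = IndexBuffer()
buf.add("LyX", 40)
buf.add("LyX", 10)
buf.add("index", 25)
print(buf.sorted_keys())
print(buf.next_occurrence("LyX", 10))
```

Step 4 would then walk the remaining entries and reconcile them with the insets in the text, which this sketch leaves out.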

The index buffer would only be WYSIWYM of the true index: back references to index insets instead of page numbers, key and actual text both visible, word frequency as a hint, ...

You'll find that page ranges are partially supported already: an index entry that is repeated on several consecutive pages is automatically coalesced into a range. :-)
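The coalescing idea can be sketched in a few lines of Python. This only illustrates the principle and does not reproduce the exact rules of makeindex (for instance, how many consecutive pages are needed before a range is formed); the function names are made up for the example.

```python
def coalesce_pages(pages):
    """Collapse page numbers into (first, last) ranges of consecutive pages."""
    ranges = []
    for page in sorted(set(pages)):
        if ranges and page == ranges[-1][1] + 1:
            # Page directly follows the current range: extend it.
            ranges[-1] = (ranges[-1][0], page)
        else:
            # Gap found: start a new range.
            ranges.append((page, page))
    return ranges

def format_ranges(ranges):
    """Render ranges in index style, e.g. '6, 8, 12--16'."""
    return ", ".join(str(a) if a == b else f"{a}--{b}" for a, b in ranges)

print(format_ranges(coalesce_pages([1, 2, 6, 8, 12, 13, 14, 15, 16, 14])))
```

Note that this only merges pages on which the entry actually occurs; it cannot express a deliberately marked range spanning a whole chapter.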

Yeah, but what if you want to index a whole chapter, like "Algorithms, p132--211" ? ;-)

* special markup
Now that'd be something - the ability to use advanced indexing without having to type LaTeX, or watch out for specials like "_" and so on.

I thought of index specific charstyles like "see" or "def", maybe also "emph" and "bold".

Ciao
/Andreas
