Re: [Patch] gimmick: show word frequencies in new buffer

Helge Hafting Tue, 08 Feb 2005 05:10:02 -0800

Andreas Vox wrote:

Also, make sure this thing does not get in the way of indexing whole phrases, math expressions, images and other stuff that doesn't show up in a word list. Well, I guess it doesn't, but still.
Once there is index markup in place the system should handle it conservatively.

Nice. :-) I suspected this, but tried to be careful. Sometimes a cumbersome but working solution gets replaced by an easier one that misses a few points. I was merely making sure that didn't happen here.

Finally, and most important: Autoindexing every occurrence of some index-worthy word often yields a useless index. Perhaps there are cases where such indexing is mandatory. But for an ordinary book the requirement is not to index every occurrence of a word, but the one, two or three most important places where it occurs. Few people want to mess around with "word, 1,2,6,8,12-16, 14, 18,19,22, 25-31,36" only to discover that "word" is thoroughly explained on pages 14 and 26-28, and all the other references merely mention "word" briefly.
But removing index references from that list using the jump function described below should be easier than the other way round.

Maybe I don't understand. We have to go through the entire document anyway, don't we? Your way: We look at every occurrence of every index-worthy word, which has already been added to the index. Many of them get removed from the index, because we only want to index a given word in 2-3 places. My way: We look at every occurrence of every index-worthy word and decide whether to index it at this point. Each word gets indexed 2-3 times; the rest of the time we skip to the next occurrence.

Either way, the big job is looking at every occurrence of every index-worthy word. In my case, I believe there is some extra work in adding a few indices per word. In your case, I believe there will be more extra work, in removing quite a few indices per word.

Good point. OTOH this would have to be repeated with every index word.

Repeating over every word is necessary anyway, whether we add or remove indices.

Maybe an additional function which acts as a kind of search and replace, maybe even with regular expressions?


Sure, that would be nice.  Make that, and it'll be instantly popular. :-)

The way of index editing I fancy right now would consist of the following functions: 1.) Create an initial index from all words minus stop words. Insert the index references and open an index buffer with the alphabetically sorted list.

Suggestion: Make the initial wordlist. Have the user prune this (because the stop-word list cannot possibly contain every word we don't want to index. It can only contain common ones like "a", "the", ...) Then proceed to the next step.
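The first two steps discussed here (an alphabetical word list with frequencies, minus stop words, followed by a manual pruning pass) can be sketched in a few lines of Python. This is only an illustration, not the code from the patch: the tiny STOP_WORDS set and the helper names word_frequencies and prune are hypothetical, and words are assumed to be lower-cased runs of letters.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list; a real one would be
# language-specific and much longer.
STOP_WORDS = {"a", "an", "and", "of", "the", "to"}

def word_frequencies(text, stop_words=STOP_WORDS):
    """Return (word, count) pairs sorted alphabetically, minus stop words."""
    # [^\W\d_] matches letters only (Unicode-aware), so digits,
    # underscores and punctuation all split words.
    words = re.findall(r"[^\W\d_]+", text.lower())
    counts = Counter(w for w in words if w not in stop_words)
    return sorted(counts.items())

def prune(freqs, rejected):
    """Drop the words the user has rejected from the candidate list."""
    return [(w, n) for w, n in freqs if w not in rejected]
```

For example, word_frequencies("The cat and the hat. A cat!") gives [('cat', 2), ('hat', 1)], and pruning that list with {"hat"} leaves [('cat', 2)]. The pruning step is where the user removes the words no stop-word list could anticipate.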

2.) Edit the index buffer: Delete entries, change the ordering, collate entries, create subitems, ...

Are you proposing an index buffer that looks roughly like an index page, and doing the editing there? Such a thing is obviously very nice for adjusting the layout of the index. One or two columns? Font? A heading for each letter? Layout for that header?

But I am not so sure it will be useful for the actual indexing. A word that is indexed in several places may have one very important place, and we may want that page number set in italics, for example. I think that is better done by working on the index entry itself (the existing index entry box, which currently doesn't support the fancy stuff but could be extended to do so). The reason? Only by looking at the text can I see whether this is the _important_ entry (say, the definition) or merely some case that I also want to index.

3.) Have a jump function from an entry in the index buffer to the occurrence and between occurrences, and back.

Well, yes, certainly useful. Be aware that a word indexed multiple times on one page only gets one entry in the index.

4.) Update the text from the index buffer: delete unwanted index insets, change params of index insets, add new index insets.

How would you add a new one? Well, the "foo *see* bar" type entries would go here, but all other entries refer to some specific page. They do so because they are anchored to some part of the text; surely you know that the actual page number cannot be known inside LyX. Adding a new entry requires going into the text at the appropriate point, but then it is no longer done from your index editor. It is done in the text as always.

5.) Regenerate the index buffer with existing entries, a la makeindex.
The index buffer would only be WYSIWYM of the true index: back references to index insets instead of pagenumbers, key and actual text both visible, word frequency as a hint, ...

You'll find that page ranges are partially supported already, an index entry that is repeated on several consecutive pages is automatically coalesced to a range. :-)
Yeah, but what if you want to index a whole chapter, like "Algorithms, p132--211" ? ;-)

Sure thing! Nice to have the ability without resorting to knowledge of makeindex codes.
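The implicit page-list behaviour mentioned in this thread (several hits on one page collapse into a single reference, and consecutive pages merge into a range) can be illustrated with a small Python sketch. The function name format_pages and the min_run threshold are assumptions for illustration only; makeindex has its own rules and options for this, documented in its man page.

```python
def format_pages(pages, min_run=3):
    """Turn raw page hits for one index entry into a page list string.

    Duplicate pages collapse into one reference, and runs of at least
    `min_run` consecutive pages are written as a range "start-end".
    `min_run` is an assumed threshold, not taken from any real tool.
    """
    unique = sorted(set(pages))
    parts = []
    i = 0
    while i < len(unique):
        # Advance j to the last page of the consecutive run starting at i.
        j = i
        while j + 1 < len(unique) and unique[j + 1] == unique[j] + 1:
            j += 1
        run = unique[i:j + 1]
        if len(run) >= min_run:
            parts.append(f"{run[0]}-{run[-1]}")
        else:
            parts.extend(str(p) for p in run)
        i = j + 1
    return ", ".join(parts)
```

With this sketch the hits [14, 2, 1, 14, 6, 12, 13, 14, 15, 16] come out as "1, 2, 6, 12-16". Note that a range like "Algorithms, p132--211" cannot be produced this way: most pages of that chapter need not mention the word at all, which is why an explicit start/end marker is needed.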

* special markup
Now that'd be something - ability to use advanced indexing without having to type latex, or watch out for specials like "_" and so on.
I thought of index specific charstyles like "see" or "def", maybe also "emph" and "bold".

Interesting. It must be configurable though - publishers tend to have their own requirements.
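For reference, the markup discussed here maps onto standard makeindex syntax: "!" separates subitems, "|see{...}" gives a cross-reference, "|textit" (or any other command name) formats the page number, and "|(" / "|)" open and close an explicit page range such as the "Algorithms" chapter. The Python helper below, index_entry, is hypothetical; it only shows how a GUI could generate these commands so users never type them, quoting makeindex's own specials (", !, @, |) with its default quote character ". It assumes plain-text entries; LaTeX specials such as "_" would still need separate handling.

```python
import re

# makeindex's default special characters: '!' (subitem level), '@' (sort
# key vs. printed text), '|' (page-number encapsulator) and '"' (quote).
# Prefixing one of them with '"' makes makeindex treat it literally.
_SPECIALS = re.compile(r'(["!@|])')

def escape_entry(text):
    """Quote makeindex specials in a plain-text entry."""
    return _SPECIALS.sub(r'"\1', text)

def index_entry(levels, encap=None):
    """Build a \\index{...} command for LaTeX/makeindex.

    levels: main entry first, then any subitems (joined with '!').
    encap:  optional page-number encapsulator, e.g. 'textit' for the
            defining occurrence, 'see{bar}' for a cross-reference, or
            '(' / ')' to open / close an explicit page range.
    """
    key = "!".join(escape_entry(level) for level in levels)
    if encap:
        key += "|" + encap
    return "\\index{" + key + "}"
```

Placing index_entry(["Algorithms"], "(") at the start of the chapter and index_entry(["Algorithms"], ")") at its end yields \index{Algorithms|(} and \index{Algorithms|)}, which makeindex turns into one page range. Since encap is just a command name, mapping charstyles like "def" or "bold" to publisher-specific commands would be a matter of configuration.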

Helge Hafting
