On Wed, 03 Oct 2007 03:57:56 +0200
Dov Feldstern <[EMAIL PROTECTED]> wrote:

> Hi!
> 
> This is an email I started writing a couple of months ago, regarding the 
> ignore-spellcheck discussion; but it is even more relevant now with 
> reference to the questions being raised about character styles as insets.
> 
> I agree very much with what JMarc has been saying about this issue: 
> although I very much like the idea of character styles / logical markup,
> I don't think that insets are the right paradigm for implementing this.
> 
> I will try to articulate here when, in my opinion, an inset is 
> appropriate and when it is not. I can't provide hard and fast rules, but 
> here are a few questions which could be asked about a given piece of 
> text, and which I think could help us clarify whether or not an inset is 
> appropriate. (A lot of the questions may actually
> be asking the same thing in different forms; but then that's to be
> expected, since if I'm correct, then they are all describing 
> "inset-ness" versus "non-inset-ness"). Based on each question, I will 
> try to evaluate the status of various existing insets, as well as that 
> of Character Styles. And I will also try to explain for each question 
> why I think it describes "inset-ness".
> 
> 1) Would the sentence still make sense if the text in question were 
> replaced with a "black box"? Or put slightly differently: if a reader 
> were to read the sentence, but instead of seeing the text in question, 
> he would only know that "text of type X" belongs here, would the reader 
> still get the gist of the sentence --- perhaps missing some details, but 
> understanding the basic "template" of the sentence? If yes --- the text
> belongs in an inset. If no --- it does not.
> 
> Note that for virtually all the insets which currently exist in LyX, the 
> answer to this question is clearly "yes": almost all of the existing 
> insets (footnote, note / comment, reference, ...) are not a main part of 
> the sentence at all, and the sentence would be perfectly readable 
> without the text in question altogether. The only case which could 
> perhaps be borderline is a mathematical expression; but even in this 
> case, I contend that the omission of the contents of the inset would not 
> change the overall meaning of the sentence. OTOH, in the case of 
> character styles, replacing their contents with just an "emph text here"
> message would almost certainly leave us with a grammatically incorrect
> sentence, of which we could get no gist.
> 
> For example, from the following sentence I have omitted the contents of 
> the mathematical formulas and the references, leaving only the markers 
> ($ $ and \ref{}):
> 
> "An important difference in our case is that there exist measures $ $ 
> for which the set $ $ has \textit{no} largest element (see 
> Proposition~\ref{} in Section~\ref{})."
> 
> Clearly, this omission is of a whole different nature than the omission 
> of the \textit{} text would have been! With the current omissions, we're 
> still left with a more or less grammatically correct sentence; omitting 
> the \textit{} text would not have preserved this property! (Not to 
> mention the fact that in this particular case, the entire meaning of the 
> sentence would be reversed by such an omission...)
> 
> Why do I think that this question is related to "inset-ness"? Because of 
> the collapsible nature of most insets: collapsing an inset is basically 
> replacing the text in question with a black box of a known type. (Again, 
> math stands out, since it is not collapsible. And we're going to see 
> math standing out a lot. I think math is a special case, where a major
> reason for having it as an inset is that the input method is
> so very different from "normal" text.) The fact that we can set a 
> certain type of inset to be non-collapsible is quite beside the point: 
> it's just another indication of the fact that perhaps that type of inset 
> need not be an inset at all...

No, on the contrary: it is *the* point. A couple of months ago you
would have been correct here; not anymore. Charstyles are
'Conglomerate'-type insets, where the content is always visible, and
which collapse to something that takes almost no extra screen real
estate. (And yes, having a three-block geometry might be nice for
multi-line insets like branches. For charstyles, which typically contain
just a few words, it is nearly irrelevant.)

> 2) Does the text in question "belong to" the proposed inset / markup? If 
> the attribute which the markup is supposed to endow were to be deleted, 
> should the contents be deleted as well? If the answer is that "the 
> contents belong to the markup, and should be deleted along with it", 
> then this is an inset. If the contents exist independently of the
> markup, and should remain intact even if the markup is removed, then 
> this is *not* an inset.
> 
> In the case of virtually all existing insets, the answer is that the 
> contents belong to the inset: if a footnote is deleted, its text should 
> not remain intact --- this would be disruptive to the main text (which 
> is why it was placed in a footnote in the first place). (Dissolve is a 
> special case, which is extremely useful at times; but it's not the norm 
> of what deleting an inset means.) OTOH, in the case of character styles, 
> the text should never be deleted along with the markup; after all, it's 
> an integral part of the original sentence. So the contents do not belong 
> to the markup, but to the containing sentence.
> 
> Why do I think this measures "inset-ness"? Because one of the purposes
> of an inset is precisely to "encapsulate" its content. The implementation
> of insets in the buffer reflects this: the inset is represented by a 
> single character, which can be moved around or deleted, taking all of 
> its contents with it. If we don't want that to be the case --- if we're 
> always going to want to dissolve the inset rather than to delete it with 
> its contents --- then why make it an inset in the first place? We should 
> be placing the text directly where it actually belongs in the parent 
> paragraph, and only marking it up to reflect the special attribute which 
> we want to confer upon it.
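The buffer argument can be made concrete with a toy model. In this minimal sketch (hypothetical: `ToyParagraph`, `ToyInset` and the placeholder value are assumptions, not LyX's real classes), an inset occupies one placeholder position whose contents live elsewhere, so erasing that one position discards all of them, while removing markup from a range only clears a per-position flag and leaves the characters where they are.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <vector>

// Hypothetical toy model; not LyX's actual Paragraph or Inset classes.
// An inset occupies exactly one position in its parent paragraph,
// marked by a placeholder character (this value is an assumption).
constexpr char kInsetMarker = '\x01';

struct ToyInset {
    std::string contents;  // text owned by the inset, stored outside the paragraph
};

struct ToyParagraph {
    std::string text;                                // one char per position
    std::vector<bool> emph;                          // per-position markup flag
    std::vector<std::shared_ptr<ToyInset>> insetAt;  // non-null only at marker positions

    void append(const std::string& s, bool isEmph) {
        text += s;
        emph.insert(emph.end(), s.size(), isEmph);
        insetAt.insert(insetAt.end(), s.size(), nullptr);
    }

    void appendInset(const std::string& contents) {
        text += kInsetMarker;
        emph.push_back(false);
        insetAt.push_back(std::make_shared<ToyInset>(ToyInset{contents}));
    }

    // Deleting the inset's single position drops the inset and all of its contents.
    void eraseAt(std::size_t pos) {
        const auto off = static_cast<std::ptrdiff_t>(pos);
        text.erase(pos, 1);
        emph.erase(emph.begin() + off);
        insetAt.erase(insetAt.begin() + off);
    }

    // Removing markup only clears the flag; the characters stay in place.
    void clearEmph(std::size_t from, std::size_t to) {
        for (std::size_t i = from; i < to && i < emph.size(); ++i) emph[i] = false;
    }
};
```

Removing a footnote-like inset is a one-position operation, whereas "un-emphasizing" never touches the text itself, which is the asymmetry the argument relies on.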
> 
> 3) What "came first": the text, or the attribute being applied to it? If 
> the text came first, this is not an inset.
> 
> This is almost exactly the same question as (2), but I feel it's worth 
> presenting it in this formulation as well, since it highlights the fact 
> that for Character Styles, all we're doing is applying an attribute to 
> already existing text (even if we start typing, then turn on \emph and 
> continue typing what we want in \emph, conceptually we are marking off 
> part of a larger sentence and giving it a special attribute). I mean --- 
> the term Logical Markup, which is being used for this in the module code,
> makes it as clear as day: this is *markup* of existing text! So why are
> we not representing it that way internally?
> 
> 4) Is the attribute which the inset/markup is meant to endow necessarily
> supposed to extend to everything contained within it --- without even 
> knowing what's going to be contained in it? If yes, this should be an 
> inset; otherwise, it should not.
> 
> *Everything* inside a comment is expected to be commented out: graphics, 
> footnotes, ERT, ERT inside a caption inside a float inside the comment 
> --- everything. Same goes for a footnote: if I insert a graphic inside a 
> footnote, I expect it to appear in the footnote, not in the main text. 
> OTOH, when I mark off text as \emph, I'm not claiming that I necessarily 
> want the text inside a footnote appearing in the \emph text to itself be 
> \emph (maybe I do and maybe I don't, but I have control over that, and 
> can choose to have it either way). So the \emph-ness is not extending 
> automatically to everything contained "inside" it.

This does not argue one way or the other: we don't want the footnote to
inherit the main text's emph, whether it comes as a text attribute or
as an inset. This (the non-inheritance) is a property of the footnote,
and implemented there. Inset or not (_surrounding_ inset, that is) is
immaterial.

> Why do I think this describes "inset-ness"? Because both the GUI and the 
> internal buffer representation of an inset reflect the fact that 
> everything inside it is, well, inside it. If this is not what we mean to 
> represent --- i.e., if there may be text within a region marked off as 
> \emph which should itself not be \emph --- then we should be using an 
> internal representation which allows finer-grained control, such as that 
> provided by font attributes, and we should not be displaying it to the 
> user "inside" the \emph.
> 
> -------------------------------------------
> 
> Separate support for this position can be found, I think, in the fact
> --- which we all agree upon --- that we're going to have to make some 
> changes (or some have already been made) to insets, in order for them to 
> be able to provide a good, usable solution for Character Styles: 
> displaying or not displaying a label; 3-box-model; toggling on/off; etc. 
> But if we're going to need to do things which make the insets behave 
> less and less inset-like, doesn't this seem to indicate that perhaps we 
> shouldn't be using an inset for this in the first place?

These (the ones already made) are not 'hacks' by any measure. You have
a stereotyped image of what a (collapsible) inset should be, and it
doesn't correspond to what they are today ;-)

I mean, what is 'inset-like'? Carved in stone, where?

The reason Jean-Marc wants toggling (which would be fairly easy to
implement BTW, and some people might find useful) is because he's still
thinking inside the character attribute box. In the inset mindset, you
dissolve and re-apply.

> So, to be a little constructive, what do I think *is* the correct 
> paradigm for Character Styles?
> 
> I would like to see some generalization of the concept of per-position 
> attributes, such that it would be possible to define (in code, for 
> starters) a new attribute --- say, "AttributeEmph" --- which could then 
> be set for each and every position in the text.
> 
> The interface would be something like this:
> 
> GetAttribute([in] pos, [in] attribute_type, [out] attribute_value)
> SetAttribute([in] pos, [in] attribute_type, [in] attribute_value)
> 
> Where attribute_type is a subclass of some AbstractAttribute, and 
> attribute_value represents the values that the given attribute_type 
> accepts (I guess templates would be helpful for this kind of model).
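The GetAttribute/SetAttribute pair could look roughly like this in C++. This is a minimal sketch under stated assumptions: `AbstractAttribute`, `AttributeEmph`, `AttributeLanguage` and `AttributeStore` are hypothetical names, the [out] parameter is mapped to a return value, and the values each attribute type accepts are expressed through a nested `value_type` (one way of using templates as suggested). Each attribute type gets its own per-position column, created lazily and filled with that type's default.

```cpp
#include <any>
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <typeindex>
#include <typeinfo>
#include <utility>
#include <vector>

// Hypothetical sketch of the proposed interface; none of these names exist in LyX.
struct AbstractAttribute {};

// Each concrete attribute declares which values it accepts and its default.
struct AttributeEmph : AbstractAttribute {
    using value_type = bool;
    static value_type defaultValue() { return false; }
};

// A second, hypothetical attribute with a non-boolean value type.
struct AttributeLanguage : AbstractAttribute {
    using value_type = std::string;
    static value_type defaultValue() { return std::string(); }
};

// Stores one value per text position for every attribute type that has been set.
class AttributeStore {
public:
    explicit AttributeStore(std::size_t length) : length_(length) {}

    // GetAttribute(pos, attribute_type) -> attribute_value
    template <typename A>
    typename A::value_type getAttribute(std::size_t pos) const {
        auto it = columns_.find(std::type_index(typeid(A)));
        if (it == columns_.end())
            return A::defaultValue();  // never set: every position has the default
        const auto& col = std::any_cast<const Column<A>&>(it->second);
        return col.at(pos);
    }

    // SetAttribute(pos, attribute_type, attribute_value)
    template <typename A>
    void setAttribute(std::size_t pos, typename A::value_type value) {
        column<A>().at(pos) = std::move(value);
    }

private:
    template <typename A>
    using Column = std::vector<typename A::value_type>;

    // Returns the per-position column for A, creating it filled with defaults.
    template <typename A>
    Column<A>& column() {
        const std::type_index key(typeid(A));
        auto it = columns_.find(key);
        if (it == columns_.end())
            it = columns_.emplace(key, Column<A>(length_, A::defaultValue())).first;
        return std::any_cast<Column<A>&>(it->second);
    }

    std::size_t length_;
    std::map<std::type_index, std::any> columns_;
};
```

Usage would be along the lines of `store.setAttribute<AttributeEmph>(pos, true)`; adding a new attribute means declaring a new type, without touching the store.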
> 
> If these attributes are implemented on top of the existing font
> attributes (which I believe is the current thinking, and which I think is
> correct), then we need not change anything in the LaTeX output methods
> --- these would continue using the font attributes directly. OTOH, both
> the UI and the .lyx file would not access the font attributes directly
> anymore; rather, they would only access these "higher-level" attributes,
> and these in turn would set the actual font attributes.
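A toy illustration of this layering, assuming a single emph flag: the UI-facing attribute writes into low-level font records, and a LaTeX writer that only reads those records needs no change. `ToyFont`, `AttributeEmph::apply`, `setEmph` and `toLatex` are hypothetical and far simpler than LyX's real font and output code.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical low-level font record, standing in for the font attributes the
// LaTeX exporter already reads (not LyX's actual Font/FontInfo classes).
struct ToyFont {
    bool emph = false;
};

// Higher-level logical attribute: the only thing the UI and the .lyx
// reader/writer would touch. It translates itself into font settings.
struct AttributeEmph {
    static void apply(ToyFont& font, bool on) { font.emph = on; }
};

// Applies the logical attribute to positions [from, to).
void setEmph(std::vector<ToyFont>& fonts, std::size_t from, std::size_t to, bool on) {
    for (std::size_t i = from; i < to && i < fonts.size(); ++i)
        AttributeEmph::apply(fonts[i], on);
}

// The output side is unchanged: it keeps reading font attributes directly,
// opening and closing \emph{} whenever the flag changes.
std::string toLatex(const std::string& text, const std::vector<ToyFont>& fonts) {
    std::string out;
    bool open = false;
    for (std::size_t i = 0; i < text.size(); ++i) {
        if (fonts[i].emph != open) {
            out += open ? "}" : "\\emph{";
            open = !open;
        }
        out += text[i];
    }
    if (open) out += "}";
    return out;
}
```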
> 
> I can think of two possible implementations for storing these attributes
> in the memory buffer:
> 1) spans --- which is how font attributes work today;
> 2) have each position in the text be represented in the buffer not by a 
> char_type, but rather by a struct which would contain, in addition to 
> the char_type, also the attribute information belonging to that position 
> (and maybe also a pointer to the inset, if it's an inset; this would be 
> an extension of what I once suggested in this thread: 
> http://permalink.gmane.org/gmane.editors.lyx.devel/88025; but this is 
> really a separate issue); and perhaps other position-specific information.
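The two storage options could be sketched as follows, assuming a single boolean attribute. `Span`, `Cell`, `SpanStorage` and `CellStorage` are hypothetical names, and `char_type` is assumed here to be a 32-bit character type.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical sketch of the two storage options; none of these are LyX classes.
using char_type = char32_t;  // assumed 32-bit character type

struct Attributes {
    bool emph = false;  // a single attribute, for brevity
};

// Option 1: spans. Characters and attributes are stored separately;
// each span gives uniform attributes to the half-open range [begin, end).
struct Span {
    std::size_t begin, end;
    Attributes attr;
};

struct SpanStorage {
    std::vector<char_type> text;
    std::vector<Span> spans;  // sorted, non-overlapping

    Attributes at(std::size_t pos) const {
        for (const Span& s : spans)
            if (pos >= s.begin && pos < s.end) return s.attr;
        return Attributes{};
    }

    // Setting a range means splitting every span it cuts through.
    // (Adjacent spans with equal attributes are not merged here.)
    void set(std::size_t from, std::size_t to, Attributes a) {
        std::vector<Span> out;
        bool inserted = false;
        for (const Span& s : spans) {
            if (s.end <= from) { out.push_back(s); continue; }
            if (s.begin < from) out.push_back({s.begin, from, s.attr});  // left remainder
            if (!inserted) { out.push_back({from, to, a}); inserted = true; }
            if (s.end > to) out.push_back({s.begin > to ? s.begin : to, s.end, s.attr});  // right remainder
        }
        if (!inserted) out.push_back({from, to, a});
        spans = std::move(out);
    }
};

// Option 2: each position is a struct holding the character together with its
// attributes (and, possibly, an inset pointer or other per-position data).
struct Cell {
    char_type c;
    Attributes attr;
};

struct CellStorage {
    std::vector<Cell> cells;

    Attributes at(std::size_t pos) const { return cells.at(pos).attr; }

    void set(std::size_t from, std::size_t to, Attributes a) {
        for (std::size_t i = from; i < to && i < cells.size(); ++i) cells[i].attr = a;
    }
};
```

The trade-off is the usual one: spans use memory proportional to the number of attribute changes but must be searched and split, whereas per-position cells make lookup and update trivial at the cost of storing the attributes alongside every character.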
> 
> I'm not saying this is easy, I'm sure there are a million little details 
> that I haven't even considered. But (a) I *do* think that it may be 
> easier than some of the things we want to be able to do if we stick with 
> insets (toggling of character styles; 3-box-model); and (b) much more 
> importantly, I just think that the *concept* of inset is wrong; and 
> using the wrong concept is bound to cost a lot later on, because the 
> better the concepts used for coding match the "real concepts", the 
> easier it will be to handle new, currently unforeseen situations --- 
> just because the code will "behave" more closely to how the "real world" 
> it is trying to represent behaves.

This is certainly doable (shudder; I actually had a plan ready for this
years ago, never got around to doing it :). But why bother? It's
duplicating most of what we already have for insets. (And remember
saving / loading this info to / from layout / module files.)

- Martin
