Re: Ignoring text in spell-check

Dov Feldstern Tue, 14 Aug 2007 14:17:57 -0700

Mael Hilléreau wrote:

So inset-type would be a nice higher level, because it will allow me to easily do what I usually want; but we still need to account for exceptions, which inset-type can't do. (Don't say "we can have a special 'ignore spelling' inset": I think it will be hard to correctly implement the LaTeX output method for such an inset.
It would be simpler than for branches, no?

You almost convinced me here. However, I just tried this with branches now, and it turns out that exactly where I expected trouble, there are in fact bugs with the interaction of the branch inset and BiDi text, *because* the branch is in an inset. Nothing major, it's a convoluted scenario that I tried, I don't really think anyone will want to try it with branches. But nonetheless, I still contend that doing this with an inset is more complicated and error-prone than doing it with character attributes.

What should happen in terms of LaTeX output in this case is absolutely nothing: it should be as if the inset weren't there; but I think that it would be hard to achieve this "nothing" in the current architecture, because there are too many things which *do* happen when a new inset is started --- just look at the relevant code...)
Is that true for e.g. notes or comments as well?

I think I didn't explain myself clearly. What I meant is, I'm thinking of text which goes like "abc def ghi", and now someone comes and says: "def" is not a word, and so decides to mark it as ignore-spelling. So he puts an inset around it, and then we have "abc [def] ghi". But the output of "abc [def] ghi" should look *exactly* the same as the output of "abc def ghi". I think that so far we all agree on this.

The same thing, BTW, should hold for branches. Say that the 'def' is only in a branch. Well, the output of that branch should look as if the text were "abc def ghi", without the inset there. Right?

The problem is, this is not working --- even now with branches, as I just found out thanks to your question --- in certain cases which involve BiDi text (and maybe other kinds of transitions). In other words, in these situations, for "abc def ghi" I get one output, and for "abc [def] ghi" (in which the branch is activated, of course) I get *different* output. So I'm not saying that we should now go and implement branches as character attributes rather than insets (though that may actually not be a bad idea... ;) ); but if we're doing this again in another situation, I say we keep it simple this time.
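
To make the "output must look exactly the same" requirement concrete, here is a minimal sketch of the character-attribute idea. It is not LyX code: `Run`, `Paragraph` and `writeOutput` are hypothetical names, and a paragraph is assumed to be a plain list of text runs. The only point is that when the output routine never reads the flag, marking "def" cannot change what gets written.

```cpp
#include <string>
#include <vector>

// Hypothetical model: a paragraph is a list of runs; each run carries its
// text plus a flag that only the spell-checker would ever look at.
struct Run {
    std::string text;
    bool ignore_spelling;
};

using Paragraph = std::vector<Run>;

// The output routine deliberately never reads ignore_spelling, so splitting
// a run just to flag part of it cannot change what is written.
std::string writeOutput(Paragraph const & par)
{
    std::string out;
    for (Run const & run : par)
        out += run.text;
    return out;
}
```

With this model, "abc def ghi" stored as one unflagged run and as three runs (with only "def" flagged) produce the same string. An inset-based design would need the inset's own output method to reach the same guarantee.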

Regarding character-styles, I have two half-objections to using this: (a) I'm not really sure that character styles are where the concept of ignoring the spell-checker belongs. I see character styles as a tool for semantic markup, whereas ignore spelling is not, IMO --- although I agree this may be debatable --- semantic, but technical. And mixing concepts is a bad idea, even if today I can't point to a specific reason why.
Perhaps one could say that ignoring spellcheck shouldn't be confused with a font attribute...

Very true, I think I said as much myself last night. But I still think it is slightly more appropriate here than in an inset.

In the end, there's no one good reason why I prefer character attributes --- if there were such a reason, it would be easy to convince everyone of it ;) --- but it's just a feeling of "clunkiness" that surrounds the more complicated solutions, once we start dealing with more and more cases...

(b) I'm not familiar enough with character styles, so you'll have to help me here: is it possible to define a character style which leaves everything exactly as it is, and only changes a single attribute (in our case: the spelling)?
Of course.
If so, then OK (that's why this is only a half-objection ;) ); if not, though, then we'd need to theoretically double the number of character styles: for every existing style, we would now have one with spelling and one without...
Is embedding disallowed for them?

Right, that's actually the important question: whether or not more than one character style can be applied to the same text. If not, then we're still stuck...

If yes, forget about them, the special inset could be displayed in an inlined form. I mean something like that:
          ________
This is a | word | not spellchecked.
          --------
              ____________________________________________
The end of is | phrase is not spellchecked, and its length
_______________              _____________________________
makes it larger than screen. |
------------------------------
Screen redrawing doesn't need anything more, frame and background are already visible enough. You'd like a green underline? Then what happens for already blue underlined text, can we stack those lines? If no, further coding will be necessary.
This is really a non-issue, Mael. Attached is a patch which will deal with this part of the problem. (The patch assumes a Font attribute and method called ignore_spelling(), along the lines of what I suggested last night at http://permalink.gmane.org/gmane.editors.lyx.devel/91874. To tell the truth, I think I would prefer to have another new function, something like 'paintSpecialMarkings' which would call both paintForeignMark and paintIgnoreSpelling --- but this is just a proof-of-concept patch. And no, don't expect it to compile! ;) ).
How can overlapping underlines be managed?

Oh, you can't see enough of the context in the patch I submitted; but if you'll look at the context of the patch, you'll see that all I did was to copy the functions for language to our case; and then you can see that for the language, the line is drawn at: y = yo_ + 1 + desc, whereas in this patch it's drawn at: y = yo_ + 2 + desc. We have to see how it looks on screen, but really, this is not a major problem...
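
As a rough illustration of the two offsets and of the 'paintSpecialMarkings' idea, here is a self-contained sketch. It is not the attached patch and does not use LyX's painter classes: `FontMarks` and `specialMarkingLines` are hypothetical names, and the function only computes the y coordinates at which the two underlines would be drawn, using the offsets quoted above.

```cpp
#include <vector>

// Hypothetical stand-in for the two font properties discussed here.
struct FontMarks {
    bool foreign_language;
    bool ignore_spelling;
};

// Vertical offsets taken from the message above: the language mark is
// drawn at yo + 1 + desc, the proposed ignore-spelling mark at
// yo + 2 + desc.
int const foreign_mark_offset = 1;
int const ignore_spelling_offset = 2;

// Returns the y coordinate of every underline to draw for this font,
// in the spirit of the 'paintSpecialMarkings' idea.
std::vector<int> specialMarkingLines(int yo, int desc, FontMarks const & marks)
{
    std::vector<int> lines;
    if (marks.foreign_language)
        lines.push_back(yo + foreign_mark_offset + desc);
    if (marks.ignore_spelling)
        lines.push_back(yo + ignore_spelling_offset + desc);
    return lines;
}
```

Text carrying both marks gets two lines one pixel apart. Whether that separation is readable on screen depends on line thickness and colour, which this sketch does not model.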

Then this is applied at note creation time too.
What about my old document that I created without this option? Should I "remake" all notes?
It is usually ok that new features only are available after
you get the new LyX. :-)
Ok... but it's better if you can make new features available in new documents. In fact, this could be possible with a per-character approach by clicking on a document setting (find all notes in the doc and mark them as not spellchecked)...
Almost any way we deal with this will entail a format change, and together with that will be a lyx2lyx function (or XSLT?) to convert to the new format. At that point, *if* we decide (and this is a big if, I'm not at all sure I think this is correct) that every note should be marked as ignore-spelling, then we could do that at the time of the conversion.
Indeed. I see no difference between low and high level approaches here.
Settings applied at inset creation time keep
the spellchecker logic (and screen display logic) relatively simple.
Ok, but it makes creation less simple...
Indeed. It is all about "where to put the complexity/slowness".
Screen drawing should be kept fast and simple - it is slow enough
as it is already. At least on some machines.  Actually,
put an ERT in a table cell and it is usually slow enough anywhere.
We could just use insets and layouts in a way similar to the way they're used normally. The special inset wouldn't require much more processing, even though its display could differ a bit from normal insets (e.g. lines not broken).
The main window contents gets painted often - not the
place where you want any extra complexity.
I don't understand.
It's a question of adding complexity once (when the inset is created) versus added complexity at screen painting, which happens every time the cursor moves, i.e., all the time...
But the truth is, I'm much more worried about the next point than about the efficiency.
Ok.
If something extra happens at note creation time, then the delay
will be small because only that one note inset gets a treatment.
If this has to happen at spellcheck time, then we get all these delays
at the same time. Perhaps this doesn't matter much for efficiency,
but there is code maintainability too.
But what you propose is to replace one piece of type-level information (nospellcheck) by two processes (one at creation time, the other at click on a document setting), the spellchecker test being required in all cases (however, doing this test on a type base would require fewer computations than on a word, or character base). In addition, until now we just considered notes. What about comments, branches and layouts? The degree of abstraction is just too low to deal with all these things, and maintainability is clearly worse!! I'd prefer to maintain 1 piece of information, than n processes!
The point is, though, that the ignore spell check will have a very clear, simple interface: a character attribute. When we later decide that actually this or that situation should or should not be ignore spelling, we don't have to touch the core again (the core, in this case, being the spell-checker itself; the lyx buffer; the file format; the painting; etc.) --- we just have small methods here and there which are in a one-to-one relation with the new situation. That's much more maintainable than adding longer and longer lists of situations to what I called "the core".
Hence a low-level approach is more maintainable than a high-level approach. I learned something today ;)

In general, I think that one of the most complicated areas in software development, where the most errors creep in, is the interfaces between different (sub)components. So as a general rule, keeping the interfaces as simple as possible can usually help in this respect...
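
The "single simple interface" argument can be sketched as follows. This is an illustration only, not the LyX spell-checker: `Char`, `makeParagraph`, `setIgnoreSpelling` and `wordsToCheck` are hypothetical names, and the policy that a word is skipped as soon as any of its characters is flagged is an assumption made for the example. The spell-checker side reads one per-character flag and knows nothing about notes, branches or layouts. Higher-level features would only ever call `setIgnoreSpelling`.

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical per-character model: the text plus one boolean attribute.
struct Char {
    char c;
    bool ignore_spelling;
};

using Paragraph = std::vector<Char>;

Paragraph makeParagraph(std::string const & text)
{
    Paragraph par;
    for (char c : text)
        par.push_back(Char{c, false});
    return par;
}

// The one entry point a higher-level feature (a note, a branch, a menu
// action) would use; flags the half-open range [from, to).
void setIgnoreSpelling(Paragraph & par, std::size_t from, std::size_t to)
{
    for (std::size_t i = from; i < to && i < par.size(); ++i)
        par[i].ignore_spelling = true;
}

// The spell-checker side: collect the words that should be checked.
// Assumed policy: a word is skipped if any of its characters is flagged.
std::vector<std::string> wordsToCheck(Paragraph const & par)
{
    std::vector<std::string> words;
    std::string current;
    bool skip = false;
    for (Char const & ch : par) {
        if (std::isalpha(static_cast<unsigned char>(ch.c))) {
            current += ch.c;
            skip = skip || ch.ignore_spelling;
        } else {
            if (!current.empty() && !skip)
                words.push_back(current);
            current.clear();
            skip = false;
        }
    }
    if (!current.empty() && !skip)
        words.push_back(current);
    return words;
}
```

Deciding later that, say, every note should be ignored would then mean adding one call to `setIgnoreSpelling` where notes are created or converted, without touching `wordsToCheck`.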

Anyway, don't forget that a char-based approach won't address all needs; remember the 4 needs I mentioned in a previous post (http://permalink.gmane.org/gmane.editors.lyx.devel/91840). Only point 2. is fully addressed.

Certainly. All along I've said that I like your patches, and we *will* need some sort of higher level in order to make this *easy* to use. But once the basis is in, I think this should be possible, and not much more complicated than your original patch.


Mael.

Dov
