As suggested by Ben and Akkana, I plan to write a treatise on HTML vs
plain text emails to the mail-news and editor newsgroups.  I hope to do
this in a week or so.


The whitespace problems in the HTML editor (and therefore the plain text
editor, which is a special case of the HTML editor) are a can of worms. 
I found this out, at least partially, in November.  I had thought that
maybe there was a single problem, as I think there is with the N4.x
Composer: it adds line breaks to the written HTML file and then
mistakenly processes them into spaces, gobbles adjacent spaces, or adds
null characters (empty rectangles in Win9x).  These problems mainly
manifest with long hyperlinks separated from each other by only a
single space.
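To make the line-break point concrete, here is a minimal sketch in
Python.  It is only my illustration, not how N4.x or Mozilla actually
write files.  The function names are hypothetical, and the sketch assumes
ordinary HTML text outside <pre>, where a CR counts the same as a space.
Wrapping that only ever turns an existing space into a newline cannot
change what is displayed.  So, presumably, the trouble comes from doing
something else at the break point: inserting a break where there was no
whitespace, eating a neighbouring space, or emitting a stray character.

```python
import re

def collapse_whitespace(text: str) -> str:
    """How ordinary HTML text (outside <pre>) is displayed: any run of
    spaces, tabs, CRs and LFs shows as a single space."""
    return re.sub(r"[ \t\r\n]+", " ", text)

def wrap_at_spaces(text: str, width: int = 72) -> str:
    """Hypothetical writer-side wrapping: once a line grows past `width`,
    turn the most recent space on it into a newline.  Only space-to-newline
    swaps are made, so no whitespace is added, dropped or merged, and a
    long token such as a URL is never split."""
    chars = list(text)
    line_start = 0    # index of the first character on the current line
    last_space = -1   # index of the latest space on the current line
    for i, ch in enumerate(chars):
        if ch == "\n":
            # An existing newline starts a fresh line.
            line_start, last_space = i + 1, -1
            continue
        if ch == " ":
            last_space = i
        # Line is now longer than `width`: break at the last space, if any.
        if i - line_start >= width and last_space > line_start:
            chars[last_space] = "\n"
            line_start, last_space = last_space + 1, -1
    return "".join(chars)
```

For example, wrapping "see http://example.com/a/very/long/path and ..."
at 20 columns breaks only at the spaces between words, leaves each URL
whole even though it is longer than 20 characters, and
collapse_whitespace() gives the same result before and after.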

Something like this may be happening with Mozilla, but I found quite a
few more problems.  

BTW, I have been programming on and off in assembly since about 1978 and
C since about 1982 - but not yet with C++.  I write with detailed
comments.  I was mad keen to fix the whitespace problems - but gaining
the depth of understanding needed to solve them would probably have
taken weeks of work, so I bailed out.


Joe Francis is on the case.  Joe, I send good vibes and clarity-energy
your way!


Here is my understanding of the problem space:

1 - The HTML file is read by one piece of code and turned into a    
    fancy data structure in memory.  This presumably is quite solid
    since it is part of the browser.

2 - The editor works on this data structure, not on the HTML itself.
    The data structure is a series of linked smaller data structures.

3 - The data structure is written out to HTML by a third piece of code.
    (Also, at least in November when I last tried, View Source writes
    it out to HTML and displays it, and then step 1 above is run again
    to read that HTML back into the data structure in memory when you
    exit View Source - so this can change the data structure!)

4 - There are pesky problems with HTML on its own: 
   
    a - The equivalence of spaces and CRs (as I will call newlines).

    b - The rule that more than one space/CR will be rendered as 
        one space.

    c - Maybe uncertainty about how to add non-breaking spaces.

    d - The desire to break HTML into shorter lines with CRs when
        the file is written - and problems with interpreting
        the results.

5 - There are difficult questions of how to cope with runs of multiple
    whitespace characters in text which is pasted into the HTML editor.
    Ideally each run should be converted into one ordinary space and
    then the rest as &nbsp;.

6 - There are difficult questions about how to edit something where
    there were multiple spaces in the HTML file, the display code 
    shows them as a single space (quite rightly) and then, for instance,
    you add a space to that apparent one space.

    If the space was not added, should the file be written out with 
    all the spaces intact?  (Arguably yes, since editing a file 
    without changing it should not alter the saved file.)

    If the space was added, how should the existing series of spaces be 
    treated?  As if it was one space?  As if they were spaces now to be 
    rendered?  If the latter, then what was previously a space suddenly 
    expands to many spaces.

7 - There are questions when editing about inserting spaces and 
    perhaps other things in places where there are multiple whitespace
    characters: CR, space, &nbsp;, etc. - or whatever their 
    equivalents are in the data structure.

8 - There is also something called a "moz-break" which I think is some
    kind of Mozilla specific placeholder within the data structure 
    for various good purposes, but which is not written out to the
    HTML file.  So moz-breaks, spaces, non-breaking spaces, tabs, CRs
    . . . with potential insertions and deletions at almost any point 
    . . . . . . 
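Items 4a, 4b and 5 can be modelled in a few lines of Python.  This is a
rough sketch under my own assumptions, not the editor's code:
render_text() and paste_plain_text() are hypothetical names, only plain
spaces are converted (tabs and CRs in pasted text are left alone), and
<pre> and CSS white-space settings are ignored.

```python
import html
import re

def render_text(html_text: str) -> str:
    """Rough model of how ordinary HTML text (outside <pre>) is displayed:
    every run of spaces, tabs, CRs and LFs collapses to one space (items
    4a and 4b), then entities are decoded, so each &nbsp; survives as a
    non-breaking space (U+00A0)."""
    collapsed = re.sub(r"[ \t\r\n]+", " ", html_text)
    return html.unescape(collapsed)

def paste_plain_text(text: str) -> str:
    """Hypothetical paste conversion (item 5): escape markup characters,
    then turn each run of two or more spaces into one ordinary space
    followed by one &nbsp; per remaining space.  Tabs and newlines are
    deliberately not handled here."""
    escaped = html.escape(text, quote=False)
    return re.sub(r" {2,}",
                  lambda m: " " + "&nbsp;" * (len(m.group(0)) - 1),
                  escaped)
```

render_text("a   b") gives "a b", which is the root of item 6: once three
spaces are on screen as one, the editor cannot tell from the display
alone whether the user meant one space or three.  The pasted version,
"a &nbsp;&nbsp;b", displays all three.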

I don't understand the data structure.  I don't know if there is a way
of looking at it in a debugging situation.


I look forward to this daunting tangle of problems being solved!  


  - Robin
