Actually, I think Arno is correct, but it's a bit more complex than 
that:

The entity conversion depends strictly on the local character set.
That is, each character set *may* map a given character code to a
different character (as Arno just discovered for the "cent" character
between CP-1252 and CP-1251). There is no "universal" conversion,
because an entity names a character, not a character code: the same
code can stand for different characters in different character sets.
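To make the point concrete, here is a small Python sketch (not ICS code) that decodes the same byte, #162, under both Windows code pages. It relies only on Python's standard "cp1252" and "cp1251" codecs; the helper name decode_byte is made up for illustration.

```python
import unicodedata

RAW = b"\xa2"  # byte #162, the one hard-coded in the server as the cent sign

def decode_byte(raw: bytes, codepage: str) -> str:
    """Hypothetical helper: interpret raw bytes in the given code page."""
    return raw.decode(codepage)

for codepage in ("cp1252", "cp1251"):
    ch = decode_byte(RAW, codepage)
    # Show which Unicode character the same byte turns into.
    print(codepage, f"U+{ord(ch):04X}", unicodedata.name(ch))
```

Under CP-1252 the byte comes out as U+00A2 CENT SIGN; under CP-1251 it comes out as U+045E CYRILLIC SMALL LETTER SHORT U, so emitting &cent; for it is only right in the first case.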

For this reason, the best solution is usually to use Unicode (UTF-8) in
HTML output. If you declare UTF-8 as the content character set (in the
Content-Type header or the HTML <meta> tag), then the only characters
you need to encode as entities are the metacharacters: the ampersand,
the left and right angle brackets and, optionally, the non-breaking
space.
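A minimal Python sketch of that approach, assuming the response is sent with a header such as "Content-Type: text/html; charset=utf-8". The function name to_html_utf8 is hypothetical, and encoding the non-breaking space is kept as the optional step mentioned above.

```python
def to_html_utf8(text: str) -> bytes:
    """Hypothetical helper: escape only the metacharacters, then emit UTF-8."""
    # The ampersand must be replaced first, otherwise the "&" in the
    # entities inserted by the later replacements would be escaped again.
    replacements = [
        ("&", "&amp;"),
        ("<", "&lt;"),
        (">", "&gt;"),
        ("\u00a0", "&nbsp;"),  # optional: keep non-breaking spaces explicit
    ]
    for ch, entity in replacements:
        text = text.replace(ch, entity)
    # Every other character (cent sign, Cyrillic, ...) goes out as plain UTF-8.
    return text.encode("utf-8")

print(to_html_utf8("5\u00a2 < 1\u00a0\u045e"))
```

Because the browser is told the page is UTF-8, the cent sign and the Cyrillic letter need no entities at all; the server's own code page no longer matters.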

As for the HttpSrv.TextToHtmlText() method, it should take the content
character set into account. However, if the mappings differ too much
between character sets, maintaining a separate table for each one may
not be practical.
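One way to avoid per-charset tables, sketched here in Python as an assumption rather than a description of how ICS works: first decode the text from its code page into Unicode (on Windows that step could presumably be done with MultiByteToWideChar), then map Unicode code points to entities with a single table. The name text_to_html_text is hypothetical, and Python's html.entities.codepoint2name stands in for the entity table.

```python
from html.entities import codepoint2name

# Markup metacharacters that must always be escaped.
_META = {"&": "&amp;", "<": "&lt;", ">": "&gt;", '"': "&quot;"}

def text_to_html_text(raw: bytes, codepage: str) -> str:
    """Hypothetical charset-aware variant of TextToHtmlText."""
    # Let the codec do the byte -> Unicode mapping for this code page.
    text = raw.decode(codepage)
    out = []
    for ch in text:
        cp = ord(ch)
        if ch in _META:
            out.append(_META[ch])
        elif cp < 128:
            out.append(ch)  # plain ASCII is the same in every code page
        elif cp in codepoint2name:
            out.append(f"&{codepoint2name[cp]};")  # named entity, e.g. &cent;
        else:
            out.append(f"&#{cp};")  # no name available: numeric reference
    return "".join(out)

print(text_to_html_text(b"\xa2", "cp1252"))
print(text_to_html_text(b"\xa2", "cp1251"))
```

Only one table, keyed by Unicode code point, is needed; the code-page-specific knowledge lives entirely in the decoding step.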

        dZ.

On Oct 9, 2008, at 05:09, Arno Garrels wrote:

> Francois Piette wrote:
>>> Or am I missing something?
>>
>> I think so. Using "html entities" makes sure the correct character is
>> represented whatever character set or character code the browser
>> uses.
>
> That's correct, but the server maps the wrong HTML entities if it
> doesn't run in a locale that uses CP 1252!
>
> For example:
> Currently, char #162 is hard-coded to represent the cent sign:
>   HTML Entity: 'cent', { #162 cent sign }
>
> In Windows-1251, however, #162 maps to the small Cyrillic letter
> short U.
>

-- 
        DZ-Jay [TeamICS]
        http://www.overbyte.be/eng/overbyte/teamics.html

-- 
To unsubscribe or change your settings for TWSocket mailing list
please goto http://lists.elists.org/cgi-bin/mailman/listinfo/twsocket
Visit our website at http://www.overbyte.be
