On Tue, Feb 10, 2009 at 06:57:04PM +0100, Vincent Lefevre wrote:
> On 2009-02-10 18:16:43 +0200, Adrian Bunk wrote:
> > On Tue, Feb 10, 2009 at 04:33:23PM +0100, Vincent Lefevre wrote:
> > > FYI, I prefer the current one because iso-8859-1 takes less space
> > > than utf-8 (note that on the network, mail is not compressed),
> > 
> > It only makes a difference if you use non-ASCII characters AND
> > no characters outside iso-8859-1 (like the € sign) in an email.
> 
> So is the change of $send_charset. So, I suppose that these cases
> are important enough.

Having iso-8859-1 preferred over UTF-8 was a good choice back in 
2000 when $send_charset was set this way in init.h, since back then 
UTF-8 support in MUAs was not always good.

Now in 2009 that's no longer a problem.

> > And the size advantage in these cases would typically be something 
> > around 1%, so not really noticeable.
> 
> This depends on the language and the length of the message.
> There's much more than 1% of accented characters in French text,
> for instance.

But there are also the characters œ and Œ in French text, and they are 
not in iso-8859-1.
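
A quick way to check this is to try encoding the characters in
question. This is only a sketch in Python; the helper name fits_latin1
is made up for illustration and has nothing to do with mutt's code:

```python
# Hypothetical helper: report whether a string survives iso-8859-1.
def fits_latin1(text):
    """Return True if every character of text is encodable in iso-8859-1."""
    try:
        text.encode("iso-8859-1")
        return True
    except UnicodeEncodeError:
        return False

print(fits_latin1("é"))  # True: accented e is part of iso-8859-1
print(fits_latin1("œ"))  # False: the oe ligature is not
print(fits_latin1("€"))  # False: neither is the euro sign
```

So a French mail containing a single œ already ends up in UTF-8.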

> So, it can be noticeable.

"Noticeable" only if you count bytes manually.

Even if it were 10%, it wouldn't matter in practice: emails are big
when someone adds a 1MB attachment, and the few bytes saved in the
email body hardly make a difference you would notice.
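
Anyone who wants to measure it for their own mails can compare the
encoded sizes directly. A rough sketch in Python follows; the function
name body_sizes and the sample sentence are my own illustration, and
it ignores headers and any transfer encoding such as quoted-printable:

```python
# Hypothetical helper: compare the raw size of a body in both charsets.
def body_sizes(text):
    """Return (iso-8859-1 bytes, UTF-8 bytes, growth in percent)."""
    latin1 = len(text.encode("iso-8859-1"))
    utf8 = len(text.encode("utf-8"))
    # Avoid dividing by zero for an empty body.
    growth = 100.0 * (utf8 - latin1) / latin1 if latin1 else 0.0
    return latin1, utf8, growth

# Made-up sample; every character here exists in iso-8859-1.
sample = "Le café est déjà prêt, à bientôt."
print(body_sizes(sample))
```

Each non-ASCII iso-8859-1 character costs exactly one extra byte in
UTF-8, so the growth equals the share of accented characters in the
text.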

> > > Also, using "us-ascii:utf-8" will not affect received mail, so that
> > > if a user wants to deal with UTF-8 only, he must have some tools for
> > > charset conversion when receiving mail (and changing $send_charset
> > > would just be some minor configuration change for a specific usage).
> > 
> > As already discussed, having more charsets in the mix can cause problems 
> > when sending patches in the body of an email (e.g. when submitting 
> > patches to linux-kernel).
> 
> Well, your tools must cope with messages with different charsets in
> a mailbox (and encodings other than 7bit/8bit). If they don't, they
> are broken.
> 
> Also, this is for a specific usage. Other users may prefer iso-8859-1
> (when possible) for their specific usage. There's no default that
> would make everyone happy.

UTF-8 has the following advantages over iso-8859-1:
- it can handle all characters in one charset
  (iso-8859-1 won't work without the fallback to UTF-8) 
- it has already become more or less the standard charset
  under Linux
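
To illustrate the first point: $send_charset is a colon-separated list,
and the first charset that can represent the whole text is used. The
following Python sketch only models that selection; pick_charset is a
hypothetical name, not a function from mutt:

```python
# Hypothetical model of choosing a charset from a $send_charset list.
def pick_charset(text, send_charset):
    """Return the first charset in the list that encodes text exactly."""
    for name in send_charset.split(":"):
        try:
            text.encode(name)
            return name
        except UnicodeEncodeError:
            continue  # this charset cannot represent the text, try the next
    return None  # nothing in the list fits

current = "us-ascii:iso-8859-1:utf-8"
proposed = "us-ascii:utf-8"
for body in ("patch", "café", "cœur"):
    print(body, pick_charset(body, current), pick_charset(body, proposed))
```

With the current list, outgoing mails are spread over three charsets
depending on their content; with the proposed list there are only two,
and us-ascii is a subset of UTF-8 anyway.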

And globally, it's a huge improvement that everyone is moving away from 
a gazillion different charsets to UTF-8.

cu
Adrian

-- 

       "Is there not promise of rain?" Ling Tan asked suddenly out
        of the darkness. There had been need of rain for many days.
       "Only a promise," Lao Er said.
                                       Pearl S. Buck - Dragon Seed



