In most utilities utf-8 simply passes through as if it were regular,
valid ASCII. This is what makes utf-8 so nice.

The main problem(1) is with utilities such as ls and ed that use
isprint to decide whether they are allowed to print a character, and
print '?' or an octal escape sequence for non-printable chars. A hacked
libc with a utf-8 version of the multibyte functions, plus a few fixes
in the apps, solves most of these problems; gtk apps and scim will be
happy with just being able to set the locale(2).

However, advanced console applications will need full character
support, as well as support in the console driver, to work without
glitches. Your problem is likely 1 or 2.

2009/5/13 Toni Mueller <openbsd-m...@oeko.net>:
> Hi Otto,
>
> thanks for the quick answer.
>
> On Wed, 13.05.2009 at 10:50:37 +0200, Otto Moerbeek <o...@drijf.net> wrote:
>> On Wed, May 13, 2009 at 10:35:25AM +0200, Toni Mueller wrote:
>> > fd = open(filename_with_utf8_characters);
>> >
>> > succeed on a standard OpenBSD disk (FFS, if I'm not mistaken), using
>> > open(2) and fopen(3).
>>
>> OpenBSD does not restrict or interpret filenames in any way, apart
>> from the obvious: / and NUL are not allowed in filenames.
>
> I guess, but don't know, that NUL is not part of any UTF-8 character...
>
>> So we accept funny chars in filenames, but do nothing special with them.
>
> Ok, that sounds great for a start. It means that the user can do
> whatever he likes, in terms of weird filenames.
>
>> > I'm currently debugging a third-party application that happens to want
>> > to use UTF-8 filenames, but doesn't seem to find them, and, FWIW, the
>> > file names I get with "ls" are ISO-Latin-1 encoded, anyway.
>> I suppose what you are seeing depends on your terminal.
>
> Erm... I did:
>
> ls -al | od -c > ls-output.txt
>
> and looked at that to determine what was on the file system, because
> I've been bitten by weird encoding problems often enough already.
> This way I determined that the special chars were indeed Latin1
> encoded. Just saying 'ls -al' would only yield blanks in the offending
> places, and otherwise only tends to garble my display.
>
>> The kernel and base utilities encode nothing. Some utilities might
>> protect against funny chars being printed on a terminal (e.g. see ls -q).
>
> Thanks for the hint.
>
>> The kernel and libc do not do any encoding or decoding. What
>> third-party libs and applications do, who knows.
>
> B ;)
>
>
> Kind regards,
> --Toni++