Re: [debian-devel] Questions regarding utf-8

2003-05-17 Thread Rüdiger Kuhlmann
>--[Andreas Metzler]--<[EMAIL PROTECTED]>
> Bob Hilliard <[EMAIL PROTECTED]> wrote:
> > Andreas Metzler <[EMAIL PROTECTED]> writes:
> > glyphs iconv returns? My locale is C. What locale are you using?
> [...]
> de_AT (uses ISO-8859-1 as charset).
> LANG=de_AT, everything else is unset:
> *promp

Re: Questions regarding utf-8

2003-05-16 Thread Andreas Metzler
Bob Hilliard <[EMAIL PROTECTED]> wrote:
> Andreas Metzler <[EMAIL PROTECTED]> writes:
>> *prompt* echo ö§ | recode latin1..ascii
>> "oSS
>> *prompt* echo ö§ | iconv -f latin1 -t ascii//TRANSLIT ; echo $?
>> oe?
>> --
>> »oe« is much better than »"o« and »SS« is no usable replacement

Re: Questions regarding utf-8

2003-05-16 Thread Bob Hilliard
Andreas Metzler <[EMAIL PROTECTED]> writes:
> *prompt* echo ö§ | recode latin1..ascii
> "oSS
> *prompt* echo ö§ | iconv -f latin1 -t ascii//TRANSLIT ; echo $?
> oe?
> --
> »oe« is much better than »"o« and »SS« is no usable replacement for
> »§« (I do not think there is one), it wou
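The difference between the two outputs quoted in this message can be illustrated with a small table-driven sketch in Python. It is not how recode or glibc's iconv is implemented: `TRANSLIT_TABLE` and `to_ascii_translit` are hypothetical names, the table holds only the one mapping shown in the thread (»ö« to »oe«), and the "?" fallback simply mirrors the quoted iconv output for »§«. Real //TRANSLIT rules are locale-dependent and far larger.

```python
# Hypothetical replacement table, for illustration only. It contains just
# the mapping visible in the thread; glibc's real rules are much larger.
TRANSLIT_TABLE = {
    "ö": "oe",
}


def to_ascii_translit(text, table=TRANSLIT_TABLE, fallback="?"):
    """Return an ASCII-only approximation of text.

    ASCII characters pass through unchanged, characters found in the
    table are replaced by their (possibly longer) substitute, and
    everything else becomes the fallback marker.
    """
    out = []
    for ch in text:
        if ord(ch) < 128:
            out.append(ch)
        else:
            out.append(table.get(ch, fallback))
    return "".join(out)


if __name__ == "__main__":
    # Decode Latin-1 bytes first, then transliterate the resulting text.
    raw = "ö§".encode("latin-1")
    print(to_ascii_translit(raw.decode("latin-1")))
```

Note that the result can be longer than the input (two characters in, three out), which is the length change the thread warns about.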

Re: Questions regarding utf-8

2003-05-16 Thread Matthias Urlichs
Hi, John Darrington wrote:
> Given a text file, it will attempt to guess the natural language in
> which it was written. I'm sure it would be fairly simple to modify it to
> guess the charset. If you point me to a reasonably large set of example
> files, I'll see what I can do.
You could use you

Re: Questions regarding utf-8

2003-05-15 Thread era eriksson
On Fri, 09 May 2003 02:31:43 +0200, Martin v. Löwis wrote:
> Bob Hilliard wrote:
> > 1. How can I determine what character encoding is used in a
> > document without manually scanning the entire file?
First off, for the examples you mentioned (foldoc and the jargon file) the iso-8859-
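One partial answer to the question quoted here can be sketched in Python; it is an illustration, not something proposed in the thread. The function name `guess_encoding` and its three labels are assumptions. The check relies on the fact that text in an 8-bit charset that uses bytes above 0x7F is rarely valid UTF-8, so a strict decode separates pure ASCII, valid UTF-8, and "some other 8-bit encoding". It cannot tell the ISO-8859 variants apart, because every byte sequence decodes without error in each of them.

```python
def guess_encoding(data):
    """Classify a bytes object as 'ascii', 'utf-8', or 'unknown-8bit'.

    This is a validity check, not real charset detection: the
    'unknown-8bit' label only says the data is neither ASCII nor
    well-formed UTF-8.
    """
    try:
        data.decode("ascii")
        return "ascii"
    except UnicodeDecodeError:
        pass
    try:
        data.decode("utf-8")
        return "utf-8"
    except UnicodeDecodeError:
        return "unknown-8bit"


if __name__ == "__main__":
    samples = [b"plain text", "ö§".encode("utf-8"), "ö§".encode("latin-1")]
    for sample in samples:
        print(sample, guess_encoding(sample))
```

To apply this to a file, read it in binary mode and pass the bytes to the function. Telling one 8-bit charset from another needs a statistical approach such as the language-guessing one mentioned elsewhere in the thread.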

Re: Questions regarding utf-8

2003-05-15 Thread John Darrington
I have a neural net program ( http://www.nongnu.org/libann/doc/libann_6.html#SEC26 ) which does something similar: Given a text file, it will attempt to guess the natural language in which it was written. I'm sure it would be fairly simple to modify it to guess the charset. If you point me to a

Re: Questions regarding utf-8

2003-05-15 Thread Andreas Metzler
Bob Hilliard <[EMAIL PROTECTED]> wrote:
> Thanks to all who replied to my recent question on this subject.
> Andreas Metzler <[EMAIL PROTECTED]> wrote:
>> With glibc I'd use
>> iconv --from=SRC-ENCODING --to=DST-ENCODING//TRANSLIT
>> if it is acceptable to change the length of strings. This

Re: Questions regarding utf-8

2003-05-14 Thread Bob Hilliard
Thanks to all who replied to my recent question on this subject.
Andreas Metzler <[EMAIL PROTECTED]> wrote:
> With glibc I'd use
> iconv --from=SRC-ENCODING --to=DST-ENCODING//TRANSLIT
> if it is acceptable to change the length of strings. This will replace
> e.g. the Euro-Symbol with "