Re: Detect character encoding

2005-12-05 Thread The new guy
Michal wrote: > Hello, > is there any way to detect string encoding in Python? > > I need to process several files. Each of them could be encoded in > a different charset (iso-8859-2, cp1250, etc). I want to detect it, and > encode it to utf-8 (with string function encode). Well, about how to

Re: Detect character encoding

2005-12-05 Thread jepler
Perhaps this project's code or ideas could be of service: http://freshmeat.net/projects/enca/ Jeff -- http://mail.python.org/mailman/listinfo/python-list

Re: Detect character encoding

2005-12-05 Thread Kent Johnson
Martin P. Hellwig wrote: > I read or heard (can't remember the origin) that MS IE has a quite good > implementation of guessing the language and character encoding of web > pages when they are not or falsely specified. Yes, I think that's right. In my experience MS Word does a very good job of gues

Re: Detect character encoding

2005-12-05 Thread Michal
Thanks everybody for the helpful advice. Michal

Re: Detect character encoding

2005-12-04 Thread Martin v. Löwis
Diez B. Roggisch wrote: > So cp1250 doesn't have all codepoints defined - but the others have. > Sure, this helps you to eliminate 1 of the three choices the OP wanted > to choose between - but how many texts do you have that have a 129 in them? For the iso8859 ones, you should assume that the char
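The elimination argument quoted above can be tried directly: a byte with no assignment in a code page makes decoding with that code page fail, which rules it out. What follows is a minimal sketch, assuming Python 3 and its standard codecs. The function name `plausible_encodings` and the default candidate list are illustrative and do not come from the thread. As the quoted objection points out, few real texts contain byte 129, so this filter will often leave more than one survivor.

```python
def plausible_encodings(data, candidates=("iso-8859-2", "cp1250")):
    """Return the candidate encodings that decode `data` without error.

    `data` is a bytes object; `candidates` is a hypothetical default list.
    """
    survivors = []
    for name in candidates:
        try:
            data.decode(name)
        except UnicodeDecodeError:
            # At least one byte is unassigned in this encoding: rule it out.
            continue
        survivors.append(name)
    return survivors
```

For instance, `plausible_encodings(b"abc\x81")` keeps only iso-8859-2, because byte 0x81 (decimal 129) is unassigned in Python's cp1250 codec, while plain ASCII input keeps both candidates.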

Re: Detect character encoding

2005-12-04 Thread Martin v. Löwis
Martin P. Hellwig wrote: > From what I can remember, they used an algorithm to create some > statistics of the specific page and compared that with statistics about > all kinds of languages and encodings and just mapped the most likely. More hearsay: I believe language-based heuristics ar
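The general idea of scoring each candidate decoding and keeping the most likely one can be illustrated with a toy heuristic. This is not the algorithm used by any browser, and it uses no language statistics: it only counts how many decoded characters look like ordinary prose. It assumes Python 3; the names `guess_encoding`, `_score`, and `_PUNCT` are hypothetical, and the set of "expected" punctuation is an arbitrary choice made for this sketch.

```python
# Characters treated as ordinary prose punctuation (an assumption of this sketch).
_PUNCT = " \n\t.,;:!?-'\"()"


def _score(text):
    """Count characters that are letters or common punctuation."""
    return sum(1 for ch in text if ch.isalpha() or ch in _PUNCT)


def guess_encoding(data, candidates=("iso-8859-2", "cp1250")):
    """Return the candidate whose decoding looks most like prose, or None."""
    best_name, best_score = None, -1
    for name in candidates:
        try:
            text = data.decode(name)
        except UnicodeDecodeError:
            continue  # undecodable under this candidate, so rule it out
        score = _score(text)
        if score > best_score:  # on a tie, the earlier candidate is kept
            best_name, best_score = name, score
    return best_name
```

The sketch works on short Czech samples because letters such as ž and ť sit at different byte values in the two encodings: decoding with the wrong candidate turns them into control characters or punctuation, which lowers the score. On text that is mostly ASCII the scores tie and the result is just the first candidate, so the answer remains a guess.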

Re: Detect character encoding

2005-12-04 Thread François Pinard
[Diez B. Roggisch] >Michal wrote: >> is there any way to detect string encoding in Python? >Recode might be of help here, it has such heuristics built in AFAIK. If we are speaking about the same Recode ☺, there are some built-in tools that could help a human to discover a charset, but this

Re: Detect character encoding

2005-12-04 Thread Diez B. Roggisch
Mike Meyer wrote: > "Diez B. Roggisch" <[EMAIL PROTECTED]> writes: > >>Michal wrote: >> >>>is there any way to detect string encoding in Python? >>>I need to process several files. Each of them could be encoded in >>>a different charset (iso-8859-2, cp1250, etc). I want to detect it, >>>and enc

Re: Detect character encoding

2005-12-04 Thread skip
Martin> I read or heard (can't remember the origin) that MS IE has a Martin> quite good implementation of guessing the language and character Martin> encoding of web pages when they are not or falsely specified. Gee, that's nice. Too bad the source isn't available... <0.5 wink> Skip --

Re: Detect character encoding

2005-12-04 Thread Martin P. Hellwig
Mike Meyer wrote: > "Diez B. Roggisch" <[EMAIL PROTECTED]> writes: >> Michal wrote: >>> is there any way to detect string encoding in Python? >>> I need to process several files. Each of them could be encoded in >>> a different charset (iso-8859-2, cp1250, etc). I want to detect it, >>> and enco

Re: Detect character encoding

2005-12-04 Thread B Mahoney
You may want to look at some Python Cookbook recipes, such as http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/52257 "Auto-detect XML encoding" by Paul Prescod

Re: Detect character encoding

2005-12-04 Thread Nemesis
While I was thinking of a nice intro, "Michal" wrote: > Hello, > is there any way to detect string encoding in Python? > I need to process several files. Each of them could be encoded in > a different charset (iso-8859-2, cp1250, etc). I want to detect it, and > encode it to utf-8 (with

Re: Detect character encoding

2005-12-04 Thread Mike Meyer
"Diez B. Roggisch" <[EMAIL PROTECTED]> writes: > Michal wrote: >> is there any way how to detect string encoding in Python? >> I need to proccess several files. Each of them could be encoded in >> different charset (iso-8859-2, cp1250, etc). I want to detect it, >> and encode it to utf-8 (with stri

Re: Detect character encoding

2005-12-04 Thread Diez B. Roggisch
Michal wrote: > Hello, > is there any way to detect string encoding in Python? > > I need to process several files. Each of them could be encoded in > a different charset (iso-8859-2, cp1250, etc). I want to detect it, and > encode it to utf-8 (with string function encode). You can only gues

Re: Detect character encoding

2005-12-04 Thread Scott David Daniels
Michal wrote: > Hello, > is there any way to detect string encoding in Python? > > I need to process several files. Each of them could be encoded in > a different charset (iso-8859-2, cp1250, etc). I want to detect it, and > encode it to utf-8 (with string function encode). > > Thank you for

Detect character encoding

2005-12-04 Thread Michal
Hello, is there any way to detect string encoding in Python? I need to process several files. Each of them could be encoded in a different charset (iso-8859-2, cp1250, etc). I want to detect it, and encode it to utf-8 (with string function encode). Thank you for any answer Regards Michal --
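Once a source encoding has been chosen, by whatever means, the conversion asked about here is a decode followed by an encode. This is a minimal sketch, assuming Python 3, where bytes and text are separate types. The helper name `to_utf8` is hypothetical, and the source encoding is passed in by the caller rather than detected.

```python
def to_utf8(data, source_encoding):
    """Re-encode `data` (bytes in `source_encoding`) as UTF-8 bytes."""
    # bytes -> str; raises UnicodeDecodeError if a byte is invalid in the source encoding
    text = data.decode(source_encoding)
    # str -> UTF-8 bytes
    return text.encode("utf-8")
```

For example, `to_utf8(b"\x9e", "cp1250")` yields the two-byte UTF-8 form of ž. In the Python 2 of this thread's era, calling `encode` directly on a byte string first decoded it implicitly as ASCII, which fails on any non-ASCII byte, so the explicit `decode` step is the part that matters.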