Michal wrote:
> Hello,
> is there any way to detect string encoding in Python?
>
> I need to process several files. Each of them could be encoded in a
> different charset (iso-8859-2, cp1250, etc.). I want to detect it, and
> encode it to utf-8 (with the string function encode).
Well, about how to...
Perhaps this project's code or ideas could be of service:
http://freshmeat.net/projects/enca/
Jeff
--
http://mail.python.org/mailman/listinfo/python-list
Martin P. Hellwig wrote:
> I read or heard (can't remember where) that MS IE has a quite good
> implementation of guessing the language and character encoding of web
> pages when these are not specified or are falsely specified.
Yes, I think that's right. In my experience MS Word does a very good job
of guessing...
Thanks, everybody, for the helpful advice.
Michal
Diez B. Roggisch wrote:
> So cp1250 doesn't have all code points defined - but the others do.
> Sure, this helps you to eliminate 1 of the three choices the OP wanted
> to choose between - but how many texts do you have that contain a 129?
For the iso8859 ones, you should assume that the char...
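Diez's observation can be checked with Python's own codecs. The sketch below keeps only the encodings under which a byte string decodes without error; the candidate tuple is an assumption taken from the original question. In Python's cp1250 codec, byte 129 (0x81) is one of a handful of undefined values, while iso-8859-2 maps all 256 byte values, so this test can rule out cp1250 but never iso-8859-2.

```python
def decodable_encodings(data: bytes, candidates=("iso-8859-2", "cp1250")):
    """Return the candidate encodings under which `data` decodes cleanly."""
    ok = []
    for enc in candidates:
        try:
            data.decode(enc)  # strict error handling is the default
        except UnicodeDecodeError:
            continue  # some byte is undefined in this codec: rule it out
        ok.append(enc)
    return ok

print(decodable_encodings(b"abc"))      # ['iso-8859-2', 'cp1250']
print(decodable_encodings(b"abc\x81"))  # ['iso-8859-2']
```

As Diez points out, this elimination step rarely helps on its own, because most real text never contains one of cp1250's undefined bytes.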
Martin P. Hellwig wrote:
> From what I can remember, they used an algorithm to compute some
> statistics of the specific page, compared them with statistics about
> all kinds of languages and encodings, and just picked the most likely.
More hearsay: I believe language-based heuristics are...
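IE's actual algorithm is not public, so this is only a toy illustration of the statistics-and-compare idea described above, not a reconstruction of it. It assumes Czech input: the `EXPECTED` letter set is a hypothetical stand-in for a real language profile. Each decodable candidate is scored by rewarding expected accented letters and penalizing control characters, which is where cp1250 and iso-8859-2 diverge (for example, š is 0x9A in cp1250 but 0xB9 in iso-8859-2, and 0x9A is a C1 control code in iso-8859-2).

```python
import unicodedata
from typing import Optional

# Hypothetical "language profile": accented letters expected in Czech text.
EXPECTED = set("áčďéěíňóřšťúůýžÁČĎÉĚÍŇÓŘŠŤÚŮÝŽ")

def plausibility(text: str) -> int:
    """Score decoded text: +1 per expected letter, -5 per stray control char."""
    score = 0
    for ch in text:
        if ch in EXPECTED:
            score += 1
        elif unicodedata.category(ch) == "Cc" and ch not in "\t\r\n":
            score -= 5  # control codes (e.g. C1 0x80-0x9F) rarely occur in prose
    return score

def guess_encoding(data: bytes, candidates=("cp1250", "iso-8859-2")) -> Optional[str]:
    """Return the candidate whose decoding scores highest (first one wins ties)."""
    best, best_score = None, None
    for enc in candidates:
        try:
            text = data.decode(enc)
        except UnicodeDecodeError:
            continue  # undefined byte: this candidate is impossible
        score = plausibility(text)
        if best_score is None or score > best_score:
            best, best_score = enc, score
    return best

sample = "šťastný žluťoučký kůň"
print(guess_encoding(sample.encode("cp1250")))      # cp1250
print(guess_encoding(sample.encode("iso-8859-2")))  # iso-8859-2
```

A real detector would compare full character or byte-pair frequency tables against reference corpora. Short or purely ASCII input gives every candidate the same score, in which case the first candidate simply wins.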
[Diez B. Roggisch]
>Michal wrote:
>> is there any way how to detect string encoding in Python?
>Recode might be of help here, it has such heuristics built in AFAIK.
If we are speaking about the same Recode ☺, there are some built-in
tools that could help a human to discover a charset, but this...
Mike Meyer wrote:
> "Diez B. Roggisch" <[EMAIL PROTECTED]> writes:
>
>>Michal wrote:
>>
>>>is there any way to detect string encoding in Python?
>>>I need to process several files. Each of them could be encoded in a
>>>different charset (iso-8859-2, cp1250, etc.). I want to detect it,
>>>and encode it to utf-8 (with the string function encode).
Martin> I read or heard (can't remember where) that MS IE has a
Martin> quite good implementation of guessing the language and character
Martin> encoding of web pages when these are not specified or are falsely specified.
Gee, that's nice. Too bad the source isn't available... <0.5 wink>
Skip
Mike Meyer wrote:
> "Diez B. Roggisch" <[EMAIL PROTECTED]> writes:
>> Michal wrote:
>>> is there any way to detect string encoding in Python?
>>> I need to process several files. Each of them could be encoded in a
>>> different charset (iso-8859-2, cp1250, etc.). I want to detect it,
>>> and encode it to utf-8 (with the string function encode).
You may want to look at some Python Cookbook recipes, such as
http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/52257
"Auto-detect XML encoding" by Paul Prescod
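The recipe itself is not reproduced here. As a rough sketch of the general idea (not Prescod's code), an XML file can often announce its own encoding: check for a byte-order mark, then for an `encoding=` pseudo-attribute in the XML declaration, and otherwise fall back to UTF-8, the XML default. The function name `sniff_xml_encoding` is hypothetical, and this simplified version skips the BOM-less UTF-16/UTF-32 byte patterns that Appendix F of the XML 1.0 specification also describes.

```python
import codecs
import re

# BOMs checked longest-first so a UTF-32 LE BOM is not mistaken for UTF-16 LE.
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]

# The XML declaration must start the document; capture the encoding name.
_DECL = re.compile(rb'<\?xml[^>]*?encoding\s*=\s*["\']([A-Za-z][A-Za-z0-9._-]*)["\']')

def sniff_xml_encoding(data: bytes) -> str:
    """Guess an XML document's encoding from its BOM or XML declaration."""
    for bom, name in _BOMS:
        if data.startswith(bom):
            return name
    m = _DECL.match(data)
    if m:
        return m.group(1).decode("ascii").lower()
    return "utf-8"  # no BOM and no declaration: the XML default
```

This only helps with XML input; plain text files like the ones in the original question carry no such declaration, so they still need a heuristic guess.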
While I was thinking up a nice intro, "Michal" wrote:
> Hello,
> is there any way to detect string encoding in Python?
> I need to process several files. Each of them could be encoded in a
> different charset (iso-8859-2, cp1250, etc.). I want to detect it, and
> encode it to utf-8 (with the string function encode).
"Diez B. Roggisch" <[EMAIL PROTECTED]> writes:
> Michal wrote:
>> is there any way to detect string encoding in Python?
>> I need to process several files. Each of them could be encoded in a
>> different charset (iso-8859-2, cp1250, etc.). I want to detect it,
>> and encode it to utf-8 (with the string function encode).
Michal wrote:
> Hello,
> is there any way to detect string encoding in Python?
>
> I need to process several files. Each of them could be encoded in a
> different charset (iso-8859-2, cp1250, etc.). I want to detect it, and
> encode it to utf-8 (with the string function encode).
You can only guess...
Michal wrote:
> Hello,
> is there any way to detect string encoding in Python?
>
> I need to process several files. Each of them could be encoded in a
> different charset (iso-8859-2, cp1250, etc.). I want to detect it, and
> encode it to utf-8 (with the string function encode).
>
> Thank you for any answers
Hello,
is there any way to detect string encoding in Python?
I need to process several files. Each of them could be encoded in a
different charset (iso-8859-2, cp1250, etc.). I want to detect it, and
encode it to utf-8 (with the string function encode).
Thank you for any answers
Regards
Michal
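One detail in the question is worth making explicit: converting to UTF-8 takes two steps, decoding the bytes from their source charset into text and then encoding that text as UTF-8. The sketch below uses Python 3 `bytes`/`str`. `guess` is a hypothetical placeholder for whatever detector you pick, and treating input that already decodes as UTF-8 as final is an assumption (8-bit text containing accented letters rarely happens to be valid UTF-8; pure ASCII always is, which is harmless because ASCII is a subset of UTF-8).

```python
from pathlib import Path
from typing import Callable, Tuple

def recode_to_utf8(data: bytes, guess: Callable[[bytes], str]) -> Tuple[bytes, str]:
    """Decode `data` from its (guessed) charset and re-encode it as UTF-8."""
    try:
        data.decode("utf-8")  # already valid UTF-8: leave the bytes alone
        return data, "utf-8"
    except UnicodeDecodeError:
        pass
    enc = guess(data)  # hypothetical detector, e.g. a statistical scorer
    # Step 1: bytes -> str using the source charset.
    # Step 2: str -> bytes using UTF-8.
    return data.decode(enc).encode("utf-8"), enc

def convert_file(src: Path, dst: Path, guess: Callable[[bytes], str]) -> str:
    """Write a UTF-8 copy of `src` to `dst`; return the encoding that was used."""
    data, enc = recode_to_utf8(src.read_bytes(), guess)
    dst.write_bytes(data)
    return enc
```

For example, `recode_to_utf8("žluťoučký".encode("cp1250"), lambda b: "cp1250")` returns the UTF-8 bytes of "žluťoučký" together with `"cp1250"`. Keeping the detector as a parameter means the conversion step stays correct no matter which guessing strategy ends up being used.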