The following module was proposed for inclusion in the Module List:

  modid:       Encode::Guess::Educated
  DSLIP:       adpOp
  description: determine encoding based on language model
  userid:      TOMC (Tom Christiansen)
  chapterid:   13 (Internationalization_Locale)
  communities:
  similar:     Encode::Guess Encode::Detect
  rationale:

    Damian suggested E::Infer. Brian liked E::G::Educated, which has the
    advantage of being a three-level name. I don't much care, but Brian's
    seems cool.

    My approach differs from all existing approaches because it uses a
    language model trained against three different very large
    English-language corpora. It correctly determines the encoding among
    several possible 8-bit encodings where the other modules fail
    miserably.

    I had originally thought to put this under Lingua::EN:: somewhere, but
    Damian convinced me that this was wrong. It works on English-language
    text only because I use English-language models by default. There is
    no reason that the user could not supply their own training model for
    some other language and have it perform commensurately well on
    non-English text. I will make the mechanism for doing this clearer in
    the beta release.

  enteredby:   TOMC (Tom Christiansen)
  enteredon:   Mon Mar 5 19:42:29 2012 GMT

The resulting entry would be:

Encode::Guess:: ::Educated adpOp determine encoding based on language model TOMC

Thanks for registering,
-- 
The PAUSE