Bernhard,
To stop the exception, you can modify $CHECK in Catalyst::Plugin::Unicode::Encoding
by removing FB_CROAK. That way decode won't throw and the request goes
through. The content will not be decoded correctly, though: malformed bytes are
replaced with U+FFFD. Since the client is a spider, I don't know if that
matters to you.

You can just add this to your Catalyst application:
use Catalyst::Plugin::Unicode::Encoding;
$Catalyst::Plugin::Unicode::Encoding::CHECK = Encode::LEAVE_SRC;
Here is a more isolated test that illustrates the issue:
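use strict;
use warnings;
use utf8;    # the string literal below is UTF-8 source text
use Encode qw(decode encode);

# FB_CROAK: die on malformed input; LEAVE_SRC: do not modify $oct
our $CHECK = Encode::FB_CROAK | Encode::LEAVE_SRC;
my $str = '深入 so what';
my $oct = encode("euc-cn", $str);        # EUC-CN (GB2312) bytes, not valid UTF-8
my $obj = Encode::find_encoding('UTF-8');
my $res = $obj->decode($oct, $CHECK);    # dies: the bytes do not map to Unicode
warn $res;                               # never reached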
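
To make the trade-off concrete, here is a minimal sketch that decodes the same EUC-CN bytes twice, once with FB_CROAK and once without. The variable names are mine and nothing here is taken from the plugin itself. It assumes the script file is saved as UTF-8.

```perl
use strict;
use warnings;
use utf8;    # the string literal below is UTF-8 source text
use Encode ();

# Same input as the isolated test: Chinese text encoded as EUC-CN (GB2312).
my $octets = Encode::encode('euc-cn', '深入 so what');
my $utf8   = Encode::find_encoding('UTF-8');

# Strict mode: FB_CROAK makes decode() die on the first malformed byte.
my $strict = eval { $utf8->decode($octets, Encode::FB_CROAK | Encode::LEAVE_SRC) };
my $error  = $@;

# Lenient mode: without FB_CROAK, malformed bytes are replaced with U+FFFD.
my $lenient = $utf8->decode($octets, Encode::LEAVE_SRC);

binmode STDERR, ':encoding(UTF-8)';
warn defined $strict ? "strict decode succeeded\n" : "strict decode died: $error";
warn "lenient decode: $lenient\n";
```

The lenient result keeps the ASCII part ("so what") but the Chinese characters are lost, which is what I meant by "not decoded correctly".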
-roman
On Jul 22, 2014, at 7:31 AM, Mark Ellis <[email protected]> wrote:
> I don't think there's anything you can do. Your app wants UTF-8 and they're
> sending something else which doesn't map. Since you can't know what
> encoding it is in, all you can do is die if it doesn't map, which is what
> the plugin does.
>
> As far as I can tell, the Ruby middleware I found handles this by returning a
> 400 Bad Request, which Catalyst does as well. So there's no effect other
> than the noise in the logs.
>
>
> On 22 July 2014 11:21, Bernhard Bauch <[email protected]> wrote:
> Here's also a Perl script that does it:
>
> ------------------------------------------
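> use strict;
> use warnings;
> use utf8;    # the string literal below is UTF-8 source text
> use Encode qw(decode encode);
> use LWP::UserAgent;
>
> my $str = '深入 so what';
> my $oct = encode("gb2312", $str);    # GB2312 bytes, not valid UTF-8
> my $url = 'http://wbc-inco.net/object/event/past';
> my $ua = LWP::UserAgent->new();
> # post the raw GB2312 bytes as both the form field name and its value
> my $response = $ua->post( $url, { $oct => $oct } );
> my $content = $response->decoded_content();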
> ------------------------------------------
>
> On 22 Jul 2014, at 11:33, Bernhard Bauch <[email protected]> wrote:
>
>> hey all,
>>
>> This Python 3 script triggers the error:
>>
>> --------------------------------
>> import httplib2
>> import urllib.parse
>>
>> somestr = '深入 so what'
>> encodedstr = somestr.encode('gb2312')  # GB2312 bytes, not valid UTF-8
>> url = 'http://myappdomain.com/search'
>> body = {encodedstr: encodedstr}
>> headers = {
>>     'Content-type': 'application/x-www-form-urlencoded',
>>     'Accept': 'text/html, application/xml;q=0.9, application/xhtml+xml, '
>>               'image/png, image/jpeg, image/gif, image/x-xbitmap, */*;q=0.1',
>>     'Accept-Encoding': 'gzip, deflate',
>>     'Accept-Language': 'zh;q=0.9,en;q=0.8',
>> }
>> http = httplib2.Http()
>> # urlencode percent-encodes the raw bytes, so the server receives GB2312 data
>> response, content = http.request(url, 'POST', headers=headers,
>>                                  body=urllib.parse.urlencode(body))
>> ————————————————
>>
>> Now it's possible to reproduce the error :)
>>
>> Any ideas how to solve this?
>> The Ruby people handled this by adding a UTF-8 sanitizer in the middleware.
>>
>> bye, bernhard
>>
>>
>> On 21 Jul 2014, at 22:19, Bernhard Bauch <[email protected]> wrote:
>>
>>> More news.
>>>
>>> The crawler/search engine that triggers these errors is
>>> http://easou.com
>>>
>>> This search engine delivers its pages not in UTF-8 but in “gb2312”, which
>>> is Simplified Chinese.
>>> If I open the “wrong UTF-8” parameters from the faulty requests as
>>> “gb2312”, some readable characters appear.
>>> This leads me to: Catalyst does not handle requests with gb2312-encoded
>>> parameters (because they are not UTF-8), and the request does not declare
>>> that it is encoded in anything other than UTF-8.
>>>
>>> Any ideas what to do?
>>>
>>> bye, bernhard
>>>
>>>
>>>
>>> On 21 Jul 2014, at 14:36, Roman Winfinit <[email protected]> wrote:
>>>
>>>> Hello,
>>>>
>>>> How are you running your application? Ie: mod_perl, fcgi, fcgi +
>>>> httpd/nginx, plack + ... also what version of perl are you using and what
>>>> os?
>>>>
>>>> -roman
>>>>
>>>> On Jul 21, 2014 6:58 AM, "Bernhard Bauch" <[email protected]> wrote:
>>>> Hey all,
>>>>
>>>> On most of my websites running the latest Catalyst (5.90065) I keep getting
>>>> UTF-8 related errors.
>>>> They usually appear when a spider
>>>> Mozilla/5.0 (compatible; EasouSpider;
>>>> +http://www.easou.com/search/spider.html)
>>>> comes across.
>>>>
>>>> The error is:
>>>> Caught exception in engine "UTF8 Error: utf8 "\x98" does not map to
>>>> Unicode at /usr/local/…./lib/perl5/Catalyst/Plugin/Unicode/Encoding.pm
>>>> line 167.
>>>>
>>>> It took me a while to get the actual parameters the spider sends because
>>>> the debug messages of Catalyst do not tell that much:
>>>>
>>>> —————————————
>>>> [2014/07/16 15:08:47] [5.255.253.218] [INFO] vim
>>>> /usr/local/…./lib/perl5/Catalyst.pm +2016: *** Request 164 (0.032/s)
>>>> [10682] [Wed Jul 16 15:08:47 2014] ***
>>>> [2014/07/16 15:08:47] [5.255.253.218] [DEBUG] vim
>>>> /usr/local/…./lib/perl5/Catalyst.pm +2309: Response Code: 400;
>>>> Content-Type: text/plain; charset=UTF-8; Content-Length: unknown
>>>> [2014/07/16 15:08:47] [5.255.253.218] [INFO] vim
>>>> /usr/local/.../lib/perl5/Catalyst.pm +1880: Request took 0.006491s
>>>> (154.059/s)
>>>> .---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------.
>>>> | Action
>>>>
>>>> | Time |
>>>> +---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------+
>>>> '---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------'
>>>> —————————————
>>>>
>>>> I changed the Catalyst::Plugin::Unicode::Encoding plugin a bit to find out
>>>> what the client sends. The results are these:
>>>> UTF-8 trash arrives, and the module seems unable to deal with it.
>>>>
>>>> ————————————
>>>> Caught exception in engine "UTF8 Error: utf8 "\x98" does not map to
>>>> Unicode at /usr/local/…../lib/perl5/Catalyst/Plugin/Unicode/Encoding.pm
>>>> line 170.
>>>> -
>>>>
>>>> URL: notice/list
>>>>
>>>> PARAMS:$VAR1 = {
>>>> 'X*Ö^K^@^@^@^@¸®ä
>>>> ^@^@^@^@8<83>^H^K^@^@^@^@h¡ä
>>>> ^@^@^@^@Hµä
>>>> ^@^@^@^@X^Z^N^Q^@^@^@^@ø<91>^F^Q^@^@^@^@Ø^F^N^Q^@^@^@^@¸<92>^F^Q^@^@^@^@(^K^N^Q^@^@^@^@<88>^B^N^Q^@^@^@^@¸úÝ^P^@^@^@^@^X%q^G^@^@^@^@اñ^O^@^@^@^@ØøB.^@^@^@^@èâÝ^P^@^@^@^@XÛ_^L^@^@^@^@ÈíÝ^P^@^@^@^@¸~P^S^@^@^@^@èåÝ^P^@^@^@^@Øný^O^@^@^@^@<88>úÝ^P^@^@^@^@^Xá(
>>>> ^@^@^@^@ئÆ
>>>> ^@^@^@^@Øï*^Q^@^@^@^@^X' =>
>>>> '^F^L^@^@^@^@<98>Ûø^O^@^@^@^@Ø~^A^N^@^@^@^@<98>=H>^@^@^@^@ø<99>ó^K^@^@^@^@hÔu^R^@^@^@^@¸<8e>ó^K^@^@^@^@^Xä_^L^@^@^@^@Ø<90>a^G^@^@^@^@hðÉ^O^@^@^@^@8ã*^G^@^@^@^@ØØý^M^@^@^@^@Xùë^F^@^@^@^@^HÜý^M^@^@^@^@8W6^H^@^@^@^@øÐý^M^@^@^@^@xÿÃ^K^@^@^@^@X]i^O^@^@^@^@8^Mÿ^H^@^@^@^@Xû<98>^Q^@^@^@^@x¦h^H^@^@^@^@Xý<98>^Q^@^@^@^@^X=5^H^@^@^@^@^X¦ú^K^@^@^@^@^XVQ^P^@^@^@^@^H^Yû^N^@^@^@^@x¤h^H^@^@^@^@^Xå<98>^Q^@^@^@^@ø¤h^H^@^@^@^@Xé<98>^Q^@^@^@^@X¼h^H^@^@^@^@Ø¡h^H^@^@^@^@øf<82>^Q^@^@^@^@^X>éH^@^@^@^@xv<82>^Q^@^@^@^@X6éH^@^@^@^@xl<82>^Q^@^@^@^@83Ì^G^@^@^@^@Xl<82>^Q^@^@^@^@¸Ñý^M^@^@^@^@xr<82>^Q^@^@^@^@H[^H^Q^@^@^@^@^X|<82>^Q^@^@^@^@¸Ë¢^K^@^@^@^@¸u<82>^Q^@^@^@^@<98>Á¢^K^@^@^@^@Øp<82>^Q^@^@^@^@8Í¢^K^@^@^@^@Øl<82>^Q^@^@^@^@XË¢^K^@^@^@^@Xq<82>^Q^@^@^@^@^Xi^W^H^@^@^@^@Xc<82>^Q^@^@^@^@¸Å¢^K^@^@^@^@8h<82>^Q^@^@^@^@<98>Т^K^@^@^@^@¨fÐ^Q^@^@^@^@ØÉ=^R^@^@^@^@ÀC<95>^M^@^@^@^@°S<95>^M^@^@^@^@^PI<95>^M^@^@^@^@À\\<95>^M^@^@^@^@ðE<95>^M^@^@^@^@<80>B<95>^M^@^@^@^@@P<95>^M^@^@^@^@<80>Q<95>^M^@^@^@^@
>>>>
>>>> J<95>^M^@^@^@^@p\\<95>^M^@^@^@^@àU<95>^M^@^@^@^@àF<95>^M^@^@^@^@àA<95>^M^@^@^@^@^@<9e>ô^P^@^@^@^@°<9d>ô^P^@^@^@^@0<91>ô^P^@^@^@^@
>>>> <9e>ô^P^@^@^@^@^P<8e>ô^P^@^@^@^@ <88>ô^P^@^@^@^@Ð<82>ô^P^@^@^@^@
>>>> <8d>ô^P^@^@^@^@<90><95>ô^P^@^@^@^@à<90>ô^P^@^@^@^@@<95>ô^P^@^@^@^@P<8f>ô^P^@^@^@^@<90><81>ô^P^@^@^@^@
>>>>
>>>> <97>ô^P^@^@^@^@Ð<8c>ô^P^@^@^@^@p<88>ô^P^@^@^@^@P<99>ô^P^@^@^@^@<90><90>ô^P^@^@^@^@@<9a>ô^P^@^@^@^@0<9b>ô^P^@^@^@'
>>>> };
>>>>
>>>>
>>>> // value: $VAR1 =
>>>> '^F^L^@^@^@^@<98>Ûø^O^@^@^@^@Ø~^A^N^@^@^@^@<98>=H>^@^@^@^@ø<99>ó^K^@^@^@^@hÔu^R^@^@^@^@¸<8e>ó^K^@^@^@^@^Xä_^L^@^@^@^@Ø<90>a^G^@^@^@^@hðÉ^O^@^@^@^@8ã*^G^@^@^@^@ØØý^M^@^@^@^@Xùë^F^@^@^@^@^HÜý^M^@^@^@^@8W6^H^@^@^@^@øÐý^M^@^@^@^@xÿÃ^K^@^@^@^@X]i^O^@^@^@^@8^Mÿ^H^@^@^@^@Xû<98>^Q^@^@^@^@x¦h^H^@^@^@^@Xý<98>^Q^@^@^@^@^X=5^H^@^@^@^@^X¦ú^K^@^@^@^@^XVQ^P^@^@^@^@^H^Yû^N^@^@^@^@x¤h^H^@^@^@^@^Xå<98>^Q^@^@^@^@ø¤h^H^@^@^@^@Xé<98>^Q^@^@^@^@X¼h^H^@^@^@^@Ø¡h^H^@^@^@^@øf<82>^Q^@^@^@^@^X>éH^@^@^@^@xv<82>^Q^@^@^@^@X6éH^@^@^@^@xl<82>^Q^@^@^@^@83Ì^G^@^@^@^@Xl<82>^Q^@^@^@^@¸Ñý^M^@^@^@^@xr<82>^Q^@^@^@^@H[^H^Q^@^@^@^@^X|<82>^Q^@^@^@^@¸Ë¢^K^@^@^@^@¸u<82>^Q^@^@^@^@<98>Á¢^K^@^@^@^@Øp<82>^Q^@^@^@^@8Í¢^K^@^@^@^@Øl<82>^Q^@^@^@^@XË¢^K^@^@^@^@Xq<82>^Q^@^@^@^@^Xi^W^H^@^@^@^@Xc<82>^Q^@^@^@^@¸Å¢^K^@^@^@^@8h<82>^Q^@^@^@^@<98>Т^K^@^@^@^@¨fÐ^Q^@^@^@^@ØÉ=^R^@^@^@^@ÀC<95>^M^@^@^@^@°S<95>^M^@^@^@^@^PI<95>^M^@^@^@^@À\\<95>^M^@^@^@^@ðE<95>^M^@^@^@^@<80>B<95>^M^@^@^@^@@P<95>^M^@^@^@^@<80>Q<95>^M^@^@^@^@
>>>>
>>>> J<95>^M^@^@^@^@p\\<95>^M^@^@^@^@àU<95>^M^@^@^@^@àF<95>^M^@^@^@^@àA<95>^M^@^@^@^@^@<9e>ô^P^@^@^@^@°<9d>ô^P^@^@^@^@0<91>ô^P^@^@^@^@
>>>> <9e>ô^P^@^@^@^@^P<8e>ô^P^@^@^@^@ <88>ô^P^@^@^@^@Ð<82>ô^P^@^@^@^@
>>>> <8d>ô^P^@^@^@^@<90><95>ô^P^@^@^@^@à<90>ô^P^@^@^@^@@<95>ô^P^@^@^@^@P<8f>ô^P^@^@^@^@<90><81>ô^P^@^@^@^@
>>>>
>>>> <97>ô^P^@^@^@^@Ð<8c>ô^P^@^@^@^@p<88>ô^P^@^@^@^@P<99>ô^P^@^@^@^@<90><90>ô^P^@^@^@^@@<9a>ô^P^@^@^@^@0<9b>ô^P^@^@^@';
>>>>
>>>>
>>>> headers: Connection: close
>>>> Accept: text/html, application/xml;q=0.9, application/xhtml+xml,
>>>> image/png, image/jpeg, image/gif, image/x-xbitmap, */*;q=0.1
>>>> Accept-Encoding: gzip, deflate
>>>> Accept-Language: zh;q=0.9,en;q=0.8
>>>> Host: wbc-inco.net
>>>> User-Agent: Mozilla/5.0 (compatible; EasouSpider;
>>>> +http://www.easou.com/search/spider.html)
>>>> Content-Length: 927
>>>> Content-Type: application/x-www-form-urlencoded
>>>> REFER: http://b------.net/“
>>>>
>>>> ————————————
>>>>
>>>> To understand the logging above: this is what I added/changed in
>>>> Catalyst::Plugin::Unicode::Encoding
>>>>
>>>> ————————————————————
>>>> around line 168:
>>>>
>>>> my $val;
>>>> eval {
>>>> $val = Encode::is_utf8( $value ) ? $value : $enc->decode(
>>>> $value, $CHECK );
>>>> };
>>>> if ($@){
>>>> # UPS !
>>>> # get request infos
>>>> use Data::Dumper;
>>>> my $params = $self->req->parameters;
>>>> my $headers= $self->req->headers->as_string;
>>>> die "UTF8 Error: $@ - \n\nURL: " . $self->req->path . "\n\nPARAMS:" .
>>>> Dumper( $params ) . "\n\n // value: " . Dumper($value) . "\n\nheaders: " .
>>>> $headers;
>>>> ….
>>>> ————————————————————
>>>>
>>>> I guess my Catalyst apps are not the only ones with these errors?
>>>>
>>>>
>>>> about my App settings / config:
>>>>
>>>> app-config has
>>>> encoding UTF-8
>>>>
>>>> App.pm does not load Unicode::Encoding anymore (since this is not needed
>>>> when using the latest Catalyst: 5.90065)
>>>>
>>>> I am using Postgres with
>>>> pg_enable_utf8 1
>>>> (but the error above is far away from any DB-related problem, I guess)
>>>>
>>>> Using Catalyst::Plugin::Unicode::Encoding version 2.1 (coming with
>>>> Catalyst)
>>>>
>>>> I just checked the tracker for Catalyst on CPAN; there is a UTF-8
>>>> issue ticket
>>>> https://rt.cpan.org/Public/Bug/Display.html?id=94957
>>>> but it does not look like it is this problem.
>>>>
>>>> Any ideas what to do?
>>>> Add an issue/ticket?
>>>>
>>>> thanks for feedback,
>>>> bernhard bauch
>>>>
>>>>
>>>>
>>>> —
>>>> Bernhard Bauch, Webdevelopment
>>>> ZSI - Zentrum für soziale Innovation
>>>> [email protected]
>>>> Skype: berni-zsi
>>>>
>>>>
>>>> _______________________________________________
>>>> List: [email protected]
>>>> Listinfo: http://lists.scsys.co.uk/cgi-bin/mailman/listinfo/catalyst
>>>> Searchable archive: http://www.mail-archive.com/[email protected]/
>>>> Dev site: http://dev.catalyst.perl.org/
>>>>
>>>
>>
>
>