> >>search_words=%B7%F6%BA%7E
> >>select id from products where name like '??~'
> >>Query failed: ERROR: Invalid EUC_JP character sequence found (0xba7e)
> >
> > This is definitely a bad EUC_JP.
>
> According to a PHP developer in my bug report
> (http://bugs.php.net/bug.php?id=24309&edit=2):
>
> "URL decoded byte sequence of 'search_words=%B7%F6%BA%7E' is
> B7E6+BA7E, which is correct EUC-JP character sequence. [snip] But, I
> believe encoding detection of mbstring works fine in this case.
> B7E6+BA7E is not correct byte sequence of SJIS, UTF-8, ISO2022-JP. It is
> correct EUC-JP byte sequence."
>
> I see that he wrote B7E6 instead of the correct B7F6. I resubmitted my
> bug report to PHP and pointed this out. Hopefully the developer will see
> that this sequence is incorrect EUC-JP and that PHP failed to detect this :)

In the EUC_JP encoding there are some rules:

1) if the first byte is 0x8e, then the second byte is a JIS X 0201
   character and should be greater than 0x7f

2) else if the first byte is 0x8f, then the second and third bytes are a
   JIS X 0212 character and both should be greater than 0x7f

3) else if the first byte is greater than 0x7f, then the first and
   second bytes are a JIS X 0208 character and the second byte should
   also be greater than 0x7f

4) else the byte is ASCII and should be equal to or less than 0x7f

Apparently:

B7F6: this is ok. We can apply rule #3.
BA7E: this is not good, since it satisfies none of rules #1 to #4
      (the second byte, 0x7e, is not greater than 0x7f).

> Thanks!
>
> Jean-Christian Imbeault
>
> PS I posted to HACKERS a few weeks ago about another bug (a real one :)
> in the EUC-JP translation having to do with the WAVE DASH. I'll repost
> here on the BUGS list, could you let me know the status of that BUG? Thanks!

Sorry for the delay. In EUC-JP <--> Unicode translation, WAVE DASH is
always a problem since there are several different mappings among
different vendors/standards. I think I need more time to solve this.
--
Tatsuo Ishii
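
As a rough illustration of the four EUC_JP rules described above, here is
a minimal Python sketch that walks a byte string and reports where the
first violation occurs. It follows only the simplified rules from this
message (trailing bytes must be greater than 0x7f); it is not the check
PostgreSQL actually performs, and a stricter validator would also limit
bytes to the ranges the standard assigns. The function name
first_invalid_euc_jp is hypothetical.

```python
from typing import Optional
from urllib.parse import unquote_to_bytes


def first_invalid_euc_jp(data: bytes) -> Optional[int]:
    """Return the offset of the first sequence that breaks the rules, or None.

    Assumption: only the simplified rules from the message are applied.
    """
    i = 0
    while i < len(data):
        lead = data[i]
        if lead == 0x8E:
            length = 2  # rule 1: JIS X 0201, one trailing byte
        elif lead == 0x8F:
            length = 3  # rule 2: JIS X 0212, two trailing bytes
        elif lead > 0x7F:
            length = 2  # rule 3: JIS X 0208, one trailing byte
        else:
            i += 1      # rule 4: plain ASCII byte
            continue
        trailing = data[i + 1 : i + length]
        # Every trailing byte must be present and greater than 0x7f.
        if len(trailing) != length - 1 or any(b <= 0x7F for b in trailing):
            return i
        i += length
    return None


# The search string from the report: B7F6 passes, BA7E fails at offset 2.
raw = unquote_to_bytes("%B7%F6%BA%7E")
print(first_invalid_euc_jp(raw))  # prints 2
```

Returning the offset rather than a plain True/False makes it easy to show
which bytes are at fault, in the same way the error message above reports
0xba7e.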