Hi,

At Fri, 6 Jul 2001 18:53:57 +0200 (CEST), peter karlsson <[EMAIL PROTECTED]> wrote:
> I have committed a fix now. It seems to work on my local machine (I
> can't read Japanese, but I can see that there is no mis-encoding left).

Thanks. I checked, and I found many items that read only "Debian".
These pages have titles of the form "Debian <something Japanese>",
which in bytes is:

  "Debian <esc><JIS X 0208 specifier string><JIS X 0208 literal><esc><ASCII specifier string>"

Thus, the first <esc> matches the regexp and ends $title. (Note that
the second <esc> cannot end $title either. Well, <esc> cannot be an
end sign in any case.)

  $title =~ s/^#use .* title="(.+?)(" .*$|"$|\e.*$)/$1/;

I think it should be modified as:

  $title =~ s/^#use .* title="(.+?)("\s.*$|"$)/$1/;

I tested it locally (as an independent Perl script) and it works well
for such pages. (I also changed it to use \s instead of a 0x20 space,
because \s also matches a tab. This is not related to the problem we
are discussing now.)

---
Tomohiro KUBOTA <[EMAIL PROTECTED]>
http://www.debian.or.jp/~kubota/
"Introduction to I18N"
http://www.debian.org/doc/manuals/intro-i18n/
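
P.S. For anyone who wants to reproduce this, a minimal standalone sketch of such a test could look like the following. Only the two substitutions are taken from the message above. The sample #use line, its two-byte payload, and the helper names title_old, title_new and visible are made up for illustration, and the real script may well handle its input differently.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Regexp currently in use: the alternative \e.*$ lets an escape byte end the title.
sub title_old {
    my ($line) = @_;
    $line =~ s/^#use .* title="(.+?)(" .*$|"$|\e.*$)/$1/;
    return $line;
}

# Proposed regexp: only a closing double quote can end the title.
sub title_new {
    my ($line) = @_;
    $line =~ s/^#use .* title="(.+?)("\s.*$|"$)/$1/;
    return $line;
}

# Make escape bytes visible so the result is safe to print on a terminal.
sub visible {
    my ($s) = @_;
    $s =~ s/\e/<esc>/g;
    return $s;
}

# Hypothetical header line (an assumption, not copied from a real page).
# ESC $ B switches to JIS X 0208 and ESC ( B switches back to ASCII.
my $line = qq{#use wml::debian::template title="Debian \e\$B%F%9%H\e(B"};

print "old: [", visible(title_old($line)), "]\n";   # old: [Debian ]
print "new: [", visible(title_new($line)), "]\n";   # new: [Debian <esc>$B%F%9%H<esc>(B]
```

With the old regexp the title is cut at the first escape byte, leaving only "Debian " (with a trailing space). With the proposed one, the whole byte sequence up to the closing quote is kept.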