Tomohiro KUBOTA:
> Imagine an ISO-2022-JP string has a JIS X 0208 part and following
> ASCII part. When the JIS X 0208 part ends with 0x22, it matches "\e
> and thus the regexp will fail.
Yes, I am aware of that, but since regular expressions are not powerful
enough to parse all possible combinat
Hi,
Thank you again for submitting your fix. Now the page is good.
At Sat, 7 Jul 2001 09:29:02 +0200 (CEST),
peter karlsson <[EMAIL PROTECTED]> wrote:
> and those were not matched properly. However, I seem to have overlooked
> a missing quotation mark in the regexp; it should read:
>
>$title =
Tomohiro KUBOTA:
> I found many items read only "Debian".
I've put in a fix for this now.
--
\\//
peter - http://www.softwolves.pp.se/
Statement concerning unsolicited e-mail according to Swedish law:
http://www.softwolves.pp.se/peter/reklampost.html
Tomohiro KUBOTA:
> $title =~ s/^#use .* title="(.+?)(" .*$|"$|\e.*$)/$1/;
>
> I think it should be modified as:
>
> $title =~ s/^#use .* title="(.+?)("\s.*$|"$)/$1/;
That does not work (that was my first attempt), because there are some
Japanese pages that have
title="DBCS"
and those wer
Hi,
At Fri, 6 Jul 2001 18:53:57 +0200 (CEST),
peter karlsson <[EMAIL PROTECTED]> wrote:
> I have committed a fix now. It seems to work on my local machine (I
> can't read Japanese, but I can see that there is no mis-encoding left).
Thanks. I checked.
I found many items read only "Debian". The
Tomohiro KUBOTA:
> Could some CVS committer please implement this in
> webwml/english/sitemap.wml ?
I have committed a fix now. It seems to work on my local machine (I
can't read Japanese, but I can see that there is no mis-encoding left).
Hi,
At Fri, 6 Jul 2001 17:21:34 +0200,
Josip Rodin <[EMAIL PROTECTED]> wrote:
>> my $title = `egrep '^#use .* title=' $page `; chomp $title;
>> $title =~ s/^#use .* title="([^"]+)".*$/$1/;
> I suppose we could just change that regexp to match everything after the
> opening double quote up to
On Fri, Jul 06, 2001 at 09:36:50PM +0900, Tomohiro KUBOTA wrote:
> I checked webwml/english/sitemap.wml and found:
>
> my $title = `egrep '^#use .* title=' $page `; chomp $title;
> $title =~ s/^#use .* title="([^"]+)".*$/$1/;
>
> This seems to be the code to extract title for sitemap items.
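
To make the failure concrete, here is a minimal sketch in Python (not the Perl used in sitemap.wml) that applies the same pattern to raw ISO-2022-JP bytes. The sample title (HIRAGANA A followed by HIRAGANA I) and the helper name extract_title_naive are assumptions chosen for illustration; they do not come from the Debian pages. HIRAGANA A is JIS X 0208 code 0x2422, so its second byte is 0x22, the same byte as an ASCII double quote.

```python
import re

# Same pattern as the quoted Perl substitution, applied to raw bytes.
NAIVE = re.compile(rb'^#use .* title="([^"]+)".*$')

def extract_title_naive(line: bytes) -> bytes:
    """Hypothetical helper: mimic s/^#use .* title="([^"]+)".*$/$1/."""
    return NAIVE.sub(rb'\1', line)

# Illustrative title: HIRAGANA A (JIS 0x2422) + HIRAGANA I (JIS 0x2424).
title = "\u3042\u3044"
encoded = title.encode("iso2022_jp")

# A #use line as it would appear in an ISO-2022-JP encoded wml source.
line = b'#use wml::debian::template title="' + encoded + b'"'

extracted = extract_title_naive(line)

print("full title bytes:", encoded)
print("extracted bytes: ", extracted)
print("truncated:", extracted != encoded)
```

The character class [^"] stops at the 0x22 inside the first character, so the extracted title is cut off in the middle of a two-byte character and never reaches the escape sequence that switches back to ASCII.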
Hi,
At 06 Jul 2001 08:46:08 +0900,
Olaf Meeuwissen <[EMAIL PROTECTED]> wrote:
> But some sites screw up the charset :-( Claiming to use one encoding
> and using another.
Yes. We must not make such a poor mistake! :-)
> Hmm, that sounds like it could be an inconsistency in the parser rules
> (n
Tomohiro KUBOTA <[EMAIL PROTECTED]> writes:
> Note that new web browsers which understand will NOT be confused by
> any encodings.
But some sites screw up the charset :-( Claiming to use one encoding
and using another.
> UTF-8 is not popular yet and some browsers may fail to display,
> though
Hi,
At Thu, 5 Jul 2001 17:36:39 +0100,
David Starner <[EMAIL PROTECTED]> wrote:
> Doesn't ISO-2022-JP have a form that invokes JIS X 0208 into the upper half?
> Could SJIS be used instead?
No.
Some additional explanation about the real state of Japanese encodings:
There are three popular encodings fo
"David Starner" <[EMAIL PROTECTED]> writes:
> Doesn't ISO-2022-JP have a form that invokes JIS X 0208 into the
> upper half?
No, but you may have been thrown off by the fact that EUC-JP is a
proper ISO-2022 encoding. This is not the same as an ISO-2022-JP
encoding. See Ken Lunde's CJKV, Chap. 4
Tomohiro KUBOTA <[EMAIL PROTECTED]> writes:
> [encoding story zapped]
>
> When the corresponding Japanese wml page has a Japanese title
> (in #use wml::debian::template title="" line) which includes
> a Japanese character which includes 0x22 (DOUBLE QUOTE)
> in its pair of bytes, a pro
David Starner:
> Doesn't ISO-2022-JP have a form that invokes JIS X 0208 into the upper half?
You have EUC-JP, which encodes the JIS X 0208 at 0xA1-0xFE (it is the same
encoding as ISO-2022-JP, but with the high bit set, and no escape
sequences).
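
The relationship described above can be checked with a short Python sketch. It relies only on Python's built-in iso2022_jp and euc_jp codecs; the sample text (HIRAGANA A and HIRAGANA I) and the variable names are assumptions picked for illustration. It strips the escape sequences from the ISO-2022-JP form, sets the high bit on each remaining byte, and compares the result with the EUC-JP encoding.

```python
# Illustrative text: HIRAGANA A + HIRAGANA I, both in JIS X 0208.
text = "\u3042\u3044"

ESC_TO_JISX0208 = b"\x1b$B"  # switch to JIS X 0208
ESC_TO_ASCII = b"\x1b(B"     # switch back to ASCII

iso = text.encode("iso2022_jp")

# Drop the leading and trailing escape sequences, keeping the 7-bit pairs.
body = iso[len(ESC_TO_JISX0208):-len(ESC_TO_ASCII)]

# Set the high bit on every byte of the JIS X 0208 part.
high_bit_set = bytes(b | 0x80 for b in body)

euc = text.encode("euc_jp")

print("ISO-2022-JP body:", body)
print("high bit set:    ", high_bit_set)
print("EUC-JP:          ", euc)
print("identical:", high_bit_set == euc)
print("0x22 present in EUC-JP form:", 0x22 in euc)
```

Because every byte of a JIS X 0208 character is 0xA1 or above in EUC-JP, the byte 0x22 cannot occur inside such a character there, which is why a pattern that stops at a double quote behaves differently on the two encodings.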
> Could SJIS be used instead?
Shift-JIS is a hor
Writes Tomohiro KUBOTA <[EMAIL PROTECTED]>:
> Does anyone have any idea how to solve this problem?
It seems to me you have two options: pick an encoding that doesn't have this
problem, or change wml so it deals with ISO-2022-JP.
Doesn't ISO-2022-JP have a form that invokes JIS X 0208 into the upper h
Hi,
I found that some items on the Japanese version of the "Sitemap" page
are broken.
http://www.debian.org/sitemap.ja.html
I researched this problem and found the reason. However, before
explaining it, I will have to explain the encoding used for
Japanese web pages.
Japanese web pages (wml sources