On Mon, 2021-09-13 at 14:03 +0100, Jonathan Wakely via Gcc wrote:
> On Mon, 13 Sept 2021 at 14:01, Jonathan Wakely <jwakely....@gmail.com>
> wrote:
> > 
> > On Mon, 13 Sept 2021 at 13:53, Thomas Koenig via Gcc <
> > gcc@gcc.gnu.org> wrote:
> > > 
> > > Hi,
> > > 
> > > I just got an error when accessing the gcc git pages at
> > > https://gcc.gnu.org/git/gitweb.cgi?p=gcc.git , it is:
> > > 
> > > This page contains the following errors:
> > > error on line 91 at column 6: XML declaration allowed only at the
> > > start
> > > of the document
> > > Below is a rendering of the page up to the first error.
> > 
> > The web server seems to restart the page in the middle of the HTML,
> > the content contains:
> > 
> > </tr>
> > <tr class="light">
> > Content-type: text/html
> > 
> > <?xml version="1.0" encoding="utf-8"?>
> > <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
> > "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
> > <html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en-US" lang="en-US">
> 
> Ah, the "second" page it's trying to display (in the middle of the
> first) is an error:
> 
> <div class="page_body">
> <br /><br />
> 500 - Internal Server Error
> <br />
> <hr />
> Wide character in subroutine entry at /var/www/git/gitweb.cgi line
> 2208.
> 
> </div>

Summarizing some notes from IRC:

The last commit it manages to print successfully in that log seems to
be:
  c012297c9d5dfb177adf1423bdd05e5f4b87e5ec
so it appears that:
  f42e95a830ab48e59389065ce79a013a519646f1
is triggering the issue, and indeed
  https://gcc.gnu.org/git/?p=gcc.git;a=commit;h=f42e95a830ab48e59389065ce79a013a519646f1
fails in a similar way, whereas other commits work.

It appears to be due to the "ł" character in the email address of the
Author, in that:

commit c012297c9d5dfb177adf1423bdd05e5f4b87e5ec
Author: Jan-Benedict Glaw <jbg...@lug-owl.de>

works, whereas:

commit f42e95a830ab48e59389065ce79a013a519646f1
Author: Jan-Benedict Glaw <jbglaw@ług-owl.de>

doesn't.

git show f42e95a830ab48e59389065ce79a013a519646f1 | hexdump -C

shows:

00000030  41 75 74 68 6f 72 3a 20  4a 61 6e 2d 42 65 6e 65  |Author: Jan-Bene|
00000040  64 69 63 74 20 47 6c 61  77 20 3c 6a 62 67 6c 61  |dict Glaw <jbgla|
00000050  77 40 c5 82 75 67 2d 6f  77 6c 2e 64 65 3e 0a 44  |w@..ug-owl.de>.D|
00000060  61 74 65 3a 20 20 20 4d  6f 6e 20 53 65 70 20 31  |ate:   Mon Sep 1|

i.e. we have the two bytes 0xc5 0x82, which is the UTF-8 encoding of "ł".
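
For what it's worth, the byte-level claim is easy to double-check. Here is a
minimal Python sketch (not part of the original investigation; the variable
names are purely illustrative) that encodes U+0142 and decodes the bytes
copied from offset 0x50 of the hexdump above:

```python
# Bytes from offset 0x50 of the hexdump, up to and including the closing ">".
raw = bytes.fromhex("77 40 c5 82 75 67 2d 6f 77 6c 2e 64 65 3e")

# U+0142 is LATIN SMALL LETTER L WITH STROKE; UTF-8 needs two bytes for it.
encoded = "\u0142".encode("utf-8")

# Decoding the raw bytes as UTF-8 should give back the address fragment.
decoded = raw.decode("utf-8")

# Collect every character outside the ASCII range.
non_ascii = [ch for ch in decoded if ord(ch) > 127]

print(encoded.hex())       # c582
print(ascii(decoded))      # ascii() keeps the output safe on non-UTF-8 terminals
print(len(non_ascii))
```

If the two bytes were not valid UTF-8, the decode() call would raise
UnicodeDecodeError rather than return a string.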


$ git format-patch c012297c9d5dfb177adf1423bdd05e5f4b87e5ec^^..c012297c9d5dfb177adf1423bdd05e5f4b87e5ec
0001-Fix-multi-statment-macro.patch
0002-cr16-elf-is-now-obsoleted.patch
$ file *.patch
0001-Fix-multi-statment-macro.patch:  unified diff output, UTF-8 Unicode text
0002-cr16-elf-is-now-obsoleted.patch: unified diff output, ASCII text
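
The ASCII vs UTF-8 distinction reported above can be roughly approximated by
looking for bytes >= 0x80. The following is only a Python sketch under that
assumption: the function names classify_text and non_ascii_lines are
hypothetical, the "Subject: example" line is made up for illustration, and
the real file(1) heuristics are considerably more involved.

```python
def classify_text(data: bytes) -> str:
    """Roughly classify raw bytes as 'ASCII', 'UTF-8', or 'other'."""
    try:
        data.decode("ascii")
        return "ASCII"
    except UnicodeDecodeError:
        pass
    try:
        data.decode("utf-8")
        return "UTF-8"
    except UnicodeDecodeError:
        return "other"


def non_ascii_lines(data: bytes):
    """Return (line_number, line) pairs for lines holding any byte >= 0x80."""
    return [
        (number, line)
        for number, line in enumerate(data.splitlines(), start=1)
        if any(byte >= 0x80 for byte in line)
    ]


# Author line as it appears in the failing commit, plus a made-up ASCII line.
sample = (
    b"Author: Jan-Benedict Glaw <jbglaw@\xc5\x82ug-owl.de>\n"
    b"Subject: example\n"
)

print(classify_text(sample))
for number, line in non_ascii_lines(sample):
    print(number, ascii(line.decode("utf-8")))
```

Running something like this over the two patch files would be one way to
pinpoint which header line carries the non-ASCII bytes.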


Hope this is helpful
Dave
