Hey guys,

Probably nobody saw this because of the time of year (Happy New Year, incidentally!).
Just a quick ping to the list to see if anyone can give me some pointers.

Chris

> On 30 Dec 2015, at 12:15 PM, Chris Sherlock <chris.sherloc...@gmail.com> wrote:
>
> Hi guys,
>
> In bug 95217 - https://bugs.documentfoundation.org/show_bug.cgi?id=95217 - Persian text in a webpage encoded as UTF-8 is being corrupted.
>
> If I take the webpage and save it to an HTML file encoded as UTF-8, there are no problems and the Persian text comes through fine. However, when connecting to the webserver directly the text corrupts, even though the HTTP header correctly gives the content type as UTF-8.
>
> I did a test using Charles Proxy with its SSL interception feature turned on and pointed Safari to https://bugs.documentfoundation.org/attachment.cgi?id=119818
>
> The following headers are returned:
>
> HTTP/1.1 200 OK
> Server: nginx/1.2.1
> Date: Sat, 26 Dec 2015 01:41:30 GMT
> Content-Type: text/html; name="text.html"; charset=UTF-8
> Content-Length: 982
> Connection: keep-alive
> X-xss-protection: 1; mode=block
> Content-disposition: inline; filename="text.html"
> X-content-type-options: nosniff
>
> Some warnings are spat out that editeng's eehtml can't detect the encoding. I initially thought it was looking for a BOM, which makes no sense for a webpage, but that's wrong. Instead, for some reason the headers don't seem to be processed, and the HTML parser is falling back to ISO-8859-1 rather than UTF-8 as the character encoding.
>
> We seem to use Neon to make the GET request to the webserver. A few observations:
>
> 1. We treat the server's OK response as an error.
> 2. (Probably more to the point) even though the function being used suggests a PROPFIND verb, a normal GET is actually issued; either way, the response headers aren't being stored. This means that when the parser looks for the headers to find the encoding it finds nothing, and falls back to ISO-8859-1.
>
> One easy thing (it doesn't solve the root issue): wouldn't it be better to fall back to UTF-8 rather than ISO-8859-1, given UTF-8 is a superset of ASCII anyway?
>
> Any pointers on how to get to the bottom of this would be appreciated; I'm honestly not up on WebDAV or Neon.
>
> Chris Sherlock
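PS. In case it helps whoever picks this up (re observation 2 in the quoted mail), below is the sort of standalone probe I mean for checking what actually comes back over the wire. It's only a rough sketch rather than our actual webdav/UCB code: it talks to neon's C API directly, charsetFromContentType() is a made-up helper for illustration, and I'm assuming ne_get_response_header() is the right call for reading a header off a dispatched request. The point is simply that the charset is right there in Content-Type, so the fallback to ISO-8859-1 can only happen if that header never reaches the HTML parser.

// Minimal header probe (sketch): fetch the attachment with neon and see
// what Content-Type/charset actually comes back over the wire.
#include <ne_socket.h>
#include <ne_session.h>
#include <ne_request.h>

#include <iostream>
#include <string>

// Hypothetical helper: pull "UTF-8" out of
// 'text/html; name="text.html"; charset=UTF-8'.
static std::string charsetFromContentType(const std::string& rContentType)
{
    const std::string::size_type nPos = rContentType.find("charset=");
    if (nPos == std::string::npos)
        return std::string(); // no charset parameter at all
    std::string sCharset = rContentType.substr(nPos + 8);
    const std::string::size_type nEnd = sCharset.find(';');
    if (nEnd != std::string::npos)
        sCharset.erase(nEnd);
    return sCharset;
}

int main()
{
    ne_sock_init();

    ne_session* pSession = ne_session_create("https", "bugs.documentfoundation.org", 443);
    ne_ssl_trust_default_ca(pSession);

    ne_request* pRequest = ne_request_create(pSession, "GET", "/attachment.cgi?id=119818");

    // NE_OK only means the request/response exchange worked; the HTTP
    // status still needs checking separately.
    if (ne_request_dispatch(pRequest) != NE_OK)
    {
        std::cerr << "request failed: " << ne_get_error(pSession) << "\n";
        return 1;
    }

    std::cout << "HTTP status: " << ne_get_status(pRequest)->code << "\n";

    // This is the header the webdav layer would need to hand on to the
    // HTML import so it can pick the right encoding.
    const char* pContentType = ne_get_response_header(pRequest, "Content-Type");
    std::string sCharset;
    if (pContentType)
        sCharset = charsetFromContentType(pContentType);

    if (sCharset.empty())
        sCharset = "ISO-8859-1"; // today's fallback; arguably UTF-8 would be saner

    std::cout << "Content-Type: " << (pContentType ? pContentType : "(missing)") << "\n";
    std::cout << "charset to use: " << sCharset << "\n";

    ne_request_destroy(pRequest);
    ne_session_destroy(pSession);
    return 0;
}

Build with something like g++ probe.cxx $(pkg-config --cflags --libs neon). Against the attachment above it should print the same charset=UTF-8 parameter that shows up in the Charles capture, which is why I suspect the problem is in how (or whether) the headers get stored, not in the server response.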
_______________________________________________
LibreOffice mailing list
LibreOffice@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/libreoffice