On Mon, Aug 7, 2017 at 9:53 AM, Martin J. Dürst <due...@it.aoyama.ac.jp> wrote:
> I just had a look at http://www.unicode.org/L2/L2017/17197-utf8-retract.pdf
> to use the test data in there for Ruby.
> I was under the impression from previous looks at it that it contained a lot
> of test data.

It contains the test outputs with identical results (output exhibiting
the spec-following behavior and output exhibiting the one REPLACEMENT
CHARACTER per bogus byte behavior) shown only once. Since the input
doesn't make sense as a PDF, it only mentions where to find the input
(https://hsivonen.fi/broken-utf-8/test.html).

> However, when I looked at the test data more carefully (I had
> read the text before the test data carefully at least two times before, but
> not looked at the test data in that much detail), I discovered that there
> might be up to 7 copies of the same data. The first one starts on page 9,
> and then there's a new one about every 4 or 5 pages.
>
> Can you check/confirm? Any idea what might have caused this?

The test outputs are not identical. They should be the content of the
following files with a bit of introductory text before each:

https://hsivonen.fi/broken-utf-8/spec.html
https://hsivonen.fi/broken-utf-8/one-per-byte.html
https://hsivonen.fi/broken-utf-8/win32.html
https://hsivonen.fi/broken-utf-8/java.html
https://hsivonen.fi/broken-utf-8/python2.html with non-conforming
output replaced with italic text saying what the bytes were
https://hsivonen.fi/broken-utf-8/perl5.html
https://hsivonen.fi/broken-utf-8/icu.html

I inspected the PDF multiple times just now, and, as far as I can
tell, the content indeed matches what I described above (no
duplicates).

For reference, I tested the Ruby standard library with the following program:

data = IO.read("test.html", encoding: "UTF-8")
encoded = data.encode("UTF-16LE", :invalid=>:replace).encode("UTF-8")
IO.write("ruby.html", encoded)

...where test.html was the file available at
https://hsivonen.fi/broken-utf-8/test.html

-- 
Henri Sivonen
hsivo...@hsivonen.fi
https://hsivonen.fi/
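
The Ruby program quoted in the message can also be tried on a single in-memory string, without downloading test.html. The following is only a minimal sketch: the helper name `replace_invalid_utf8` and the sample bytes are assumptions made for illustration and are not part of the original test. It uses the same round trip through UTF-16LE as the program in the message.

```ruby
# Hypothetical helper (not from the original message): applies the same
# UTF-8 -> UTF-16LE -> UTF-8 round trip to a string held in memory.
def replace_invalid_utf8(bytes)
  # Tag a copy of the raw bytes as UTF-8 without changing them.
  tagged = bytes.dup.force_encoding(Encoding::UTF_8)
  # Transcoding to UTF-16LE makes Ruby decode the bytes; invalid
  # sequences are replaced instead of raising an exception.
  tagged.encode(Encoding::UTF_16LE, invalid: :replace).encode(Encoding::UTF_8)
end

# Illustrative input: the byte 0xFF can never appear in well-formed UTF-8.
sample = "ok \xFF end".b
result = replace_invalid_utf8(sample)

puts result.valid_encoding?
puts result.include?("\uFFFD")
```

How many replacement characters a longer malformed sequence produces is exactly what the test file is meant to reveal, so this sketch deliberately checks only that the result is valid UTF-8 and contains U+FFFD.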