I'm not a specialist on this, but looking at your HTML, I think you want those four characters, \xe4, to appear literally in your HTML document, whereas Sphinx treats them as the UTF-8 form of one (non-existent) Unicode character.
If that is correct, I would try some escaping to prevent this sequence from being parsed as UTF-8.

Lothar

From: [email protected] [mailto:[email protected]] On behalf of Conway M
Sent: Thursday, 18 April 2013 17:39
To: [email protected]
Cc: [email protected]
Subject: [sphinx-users] Re: SphinxError: Can't decode unicode within a doc

Günter, thanks for your response.

The conf.py did not have a source_encoding specified, so I assume it would just default to 'utf-8-sig'. Even explicitly specifying the encoding as 'utf-8-sig' produced the same error.

The snippet in the rst document that is causing the error (also given in the original post) is:

    data = 'word,length\nTr\xe4umen,7\nGr\xfc\xdfe,5'

The complete rst document can be found here <https://raw.github.com/pydata/pandas/master/doc/source/io.rst>. The resulting html should look like this <http://pandas.pydata.org/pandas-docs/dev/io.html#dealing-with-unicode-data>.

One thing that I just realized is that the other developers who have built the docs have done so exclusively on Linux boxes, whereas I am working on an Ubuntu 12.04 virtual machine running on Windows 7. So I'm not entirely convinced that the input file is broken; it might be a platform-dependent issue.

On Thursday, April 18, 2013 2:06:43 AM UTC-5, Guenter Milde wrote:

On 2013-04-17, Conway M wrote:

> I am trying to compile the docs of Pandas
> <https://github.com/pydata/pandas> but I am unable to get Sphinx to
> compile a document with some unicode. Is there some flag I need to
> specify to let Sphinx correctly build documents with unicode in them?

The default input encoding is 'utf8', so if your rst document is utf8-encoded, it should be OK. If not, please post more details (used encoding, docutils settings). A minimal example (the part of the input file that caused the error) may help further.

> In this case, I don't want Sphinx to decode the text.

Docutils/Sphinx will always decode the input into a "unicode" instance and encode the output.
All inner processing is done on "unicode" (or derived) objects.

...

>> File "/usr/local/lib/python2.7/dist-packages/sphinx/environment.py",
>> line 609, in read_doc
>>     raise SphinxError(str(err))
>> SphinxError: 'utf8' codec can't decode byte 0xe4 in position 36: invalid
>> continuation byte
>> > /usr/local/lib/python2.7/dist-packages/sphinx/environment.py(609)read_doc()
>> -> raise SphinxError(str(err))
>> (Pdb)

It looks like the input file is either broken or not in utf8 encoding (which one, then?).

It looks like the input decoding is not done by docutils.io but by the Sphinx "wrapper". This means you must tell Sphinx about the correct "source_encoding": http://sphinx-doc.org/config.html#confval-source_encoding

Setting the Docutils config setting "input-encoding" (http://docutils.sourceforge.net/docs/user/config.html#input-encoding) will not help.

Günter

--
You received this message because you are subscribed to the Google Groups "sphinx-users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to [email protected].
To post to this group, send email to [email protected].
Visit this group at http://groups.google.com/group/sphinx-users?hl=en.
For more options, visit https://groups.google.com/groups/opt_out.
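
For anyone reading this thread later, here is a minimal sketch of how the kind of error in the traceback ("can't decode byte 0xe4 ... invalid continuation byte") can arise. It assumes, purely for illustration, that the offending bytes are Latin-1 encoded; the thread does not establish which encoding the input actually used. The helper name try_decode is hypothetical, and the sketch is written for Python 3, whereas the traceback comes from Python 2.7.

```python
# Illustration only: the sample text from the thread, stored as Latin-1 bytes.
# That the real input was Latin-1 is an ASSUMPTION, not something the thread confirms.
raw = "word,length\nTräumen,7\nGrüße,5".encode("latin-1")


def try_decode(data, encoding):
    """Hypothetical helper: return the decoded text, or the error message."""
    try:
        return data.decode(encoding)
    except UnicodeDecodeError as err:
        return "error: {}".format(err)


# 0xe4 announces a multi-byte UTF-8 sequence, but the next byte ('u') is
# plain ASCII, so the UTF-8 decoder rejects it.
utf8_result = try_decode(raw, "utf-8")

# Latin-1 maps every byte to exactly one character, so this always succeeds.
latin1_result = try_decode(raw, "latin-1")

print(utf8_result)
print(latin1_result)
```

If a source file really does contain such bytes, the options discussed above are to re-save the file as UTF-8 or to set source_encoding in conf.py to the encoding the file actually uses.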
