Günter, thanks for your response. The conf.py did not specify a source_encoding, so I assume it simply defaulted to 'utf-8-sig'. Explicitly setting the encoding to 'utf-8-sig' produced the same error.
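For reference, here is a minimal sketch of the explicit setting described above, assuming it sits at module level in conf.py. The codecs.lookup() call is only an optional sanity check added for illustration and is not something Sphinx requires.

```python
# conf.py (fragment)
import codecs

# Sphinx decodes every source file with this encoding before parsing it.
# 'utf-8-sig' is also the documented default when the setting is absent.
source_encoding = 'utf-8-sig'

# Optional sanity check (illustrative only): raises LookupError if Python
# does not recognise the codec name.
codecs.lookup(source_encoding)
```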
The snippet in the rst document that is causing the error (also given in the original post) is:

    data = 'word,length\nTr\xe4umen,7\nGr\xfc\xdfe,5'

The complete rst document can be found here <https://raw.github.com/pydata/pandas/master/doc/source/io.rst>. The resulting html should look like this <http://pandas.pydata.org/pandas-docs/dev/io.html#dealing-with-unicode-data>.

One thing that I just realized is that the other developers who have built the docs did so exclusively on Linux boxes. I, however, am working on an Ubuntu 12.04 virtual machine running on Windows 7. So I'm not entirely convinced that the input file is broken; it might be a platform-dependent issue.

On Thursday, April 18, 2013 2:06:43 AM UTC-5, Guenter Milde wrote:
>
> On 2013-04-17, Conway M wrote:
> >
> > I am trying to compile the docs of Pandas
> > <https://github.com/pydata/pandas> but I am unable to get Sphinx to
> > compile a document with some unicode. Is there some flag I need to
> > specify to let Sphinx correctly build documents with unicode in them?
>
> The default input encoding is 'utf8', so if your rst document is
> utf8-encoded, it should be OK.
>
> If not, please post more details (used encoding, docutils settings).
> A minimal example (the part of the input file that caused the error) may
> help further.
>
> > In this case, I don't want Sphinx to decode the text.
>
> Docutils/Sphinx will always decode the input into an "unicode" instance
> and encode the output. All inner processing is done on "unicode" (or
> derived) objects.
>
> ...
>
> >> File "/usr/local/lib/python2.7/dist-packages/sphinx/environment.py",
> >> line 609, in read_doc
> >>     raise SphinxError(str(err))
> >> SphinxError: 'utf8' codec can't decode byte 0xe4 in position 36: invalid
> >> continuation byte
> >> > /usr/local/lib/python2.7/dist-packages/sphinx/environment.py(609)read_doc()
> >> -> raise SphinxError(str(err))
> >> (Pdb)
>
> It looks like the input file is either broken or not in utf8 encoding
> (which then?).
>
> It looks like the input decoding is not done by docutils.io, but by the
> Sphinx "wrapper" - this means you must tell Sphinx about the correct
> "source_encoding"
> http://sphinx-doc.org/config.html#confval-source_encoding.
> Setting the Docutils config setting "input-encoding"
> http://docutils.sourceforge.net/docs/user/config.html#input-encoding will
> not help.
>
> Günter
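To help tell "the input file is not valid UTF-8" apart from "the bad byte appears somewhere else during the build", a small check along these lines could be run against the raw bytes of the source file. This is only a sketch: first_invalid_utf8 is a hypothetical helper name, and the sample bytes assume the snippet's characters were stored as Latin-1, which nothing in this thread confirms.

```python
def first_invalid_utf8(data):
    """Return (offset, offending byte) for the first byte that breaks
    UTF-8 decoding of `data`, or None if `data` decodes cleanly."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError as err:
        return err.start, data[err.start:err.start + 1]
    return None


# Assumption for illustration: the snippet's text encoded as Latin-1, so that
# 'ä' becomes the single byte 0xe4 (the byte named in the traceback).
sample = u"word,length\nTr\xe4umen,7\nGr\xfc\xdfe,5".encode("latin-1")
print(first_invalid_utf8(sample))

# The same text encoded as UTF-8 passes the check.
print(first_invalid_utf8(u"Tr\xe4umen".encode("utf-8")))
```

To check the real document, the same helper can be given the file contents read in binary mode, for example first_invalid_utf8(open('doc/source/io.rst', 'rb').read()), assuming that relative path matches the local checkout. If it returns None, the file on disk is valid UTF-8 and the failing bytes most likely come from somewhere else.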
