I'm no specialist on this, but
judging from your HTML, I think you want those 4 characters
\xe4 to appear literally in your HTML document, whereas Sphinx sees this as the
UTF-8 form of one (non-existent) Unicode character.

If that is correct, I would try some escaping so that this sequence is not
parsed as UTF-8.
Lothar
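
To make the distinction above concrete, here is a small sketch. It assumes Python 3 (the traceback in this thread is from Python 2.7), and the variable names are purely illustrative. It compares the four ASCII characters backslash, x, e, 4 with the single raw byte 0xE4:

```python
# The four ASCII characters: backslash, x, e, 4 (as they would be typed in a .rst file).
escaped = rb'Tr\xe4umen'
# The single raw byte 0xE4 (what a Latin-1 encoded file would contain for "a-umlaut").
raw = b'Tr\xe4umen'

print(len(escaped))  # 10
print(len(raw))      # 7

# The escaped form is plain ASCII, so it is also valid UTF-8.
text = escaped.decode('utf-8')

# The raw byte is not valid UTF-8 in this position, so decoding fails.
try:
    raw.decode('utf-8')
    raw_ok = True
    reason = None
except UnicodeDecodeError as err:
    raw_ok = False
    reason = err.reason
```

If the source file holds the escaped form, a UTF-8 decoder should have nothing to complain about; the error would then point to a raw byte somewhere else in the input.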

From: [email protected] [mailto:[email protected]] On
Behalf Of Conway M
Sent: Thursday, April 18, 2013 17:39
To: [email protected]
Cc: [email protected]
Subject: [sphinx-users] Re: SphinxError: Can't decode unicode within a doc

Günter, thanks for your response.

The conf.py did not have a source_encoding specified.  So I assume it would 
just default to 'utf-8-sig'.  Even explicitly specifying the encoding as 
'utf-8-sig' produced the same error.

The snippet in the rst document that is causing the error is (also specified in 
the original post):

data = 'word,length\nTr\xe4umen,7\nGr\xfc\xdfe,5'

The complete rst document can be found here:
https://raw.github.com/pydata/pandas/master/doc/source/io.rst
The resulting HTML should look like this:
http://pandas.pydata.org/pandas-docs/dev/io.html#dealing-with-unicode-data

One thing that I just realized is that the other developers who have built the
docs have done so exclusively on a Linux box.  However, I am working on an
Ubuntu 12.04 virtual machine running on Windows 7.  So I'm not entirely
convinced that the input file is broken; it might be a platform-dependent issue.




On Thursday, April 18, 2013 2:06:43 AM UTC-5, Guenter Milde wrote:
On 2013-04-17, Conway M wrote:


> I am trying to compile the docs of Pandas
> <https://github.com/pydata/pandas>but I am unable to get Sphinx to
> compile a document with some unicode.  Is there some flag I need to
> specify to let Sphinx correctly build documents with unicode in them?

The default input encoding is 'utf8', so if your rst document is
utf8-encoded, it should be OK.

If not, please post more details (used encoding, docutils settings).
A minimal example (the part of the input file that caused the error) may
help further.
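
A minimal sketch for locating the offending part of an input file, assuming Python 3. The helper `first_bad_utf8` and the sample bytes are illustrative and not part of Sphinx or Docutils:

```python
def first_bad_utf8(data, context=10):
    """Return (offset, byte value, surrounding bytes) for the first byte that
    cannot be decoded as UTF-8, or None if the data decodes cleanly."""
    try:
        data.decode('utf-8')
    except UnicodeDecodeError as err:
        start = max(err.start - context, 0)
        return err.start, data[err.start], data[start:err.start + context]
    return None


# Illustrative sample: Latin-1 bytes similar to the data discussed in this thread.
sample = b'word,length\nTr\xe4umen,7\n'
hit = first_bad_utf8(sample)
if hit is not None:
    offset, value, nearby = hit
    print(offset, hex(value), nearby)
```

For a real file, the bytes would be read in binary mode (for example with `open(path, 'rb')`, where `path` is whichever source file triggers the error) and passed to the helper.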

> In this case, I don't want Sphinx to decode the text.

Docutils/Sphinx will always decode the input into a "unicode" instance
and encode the output. All inner processing is done on "unicode" (or
derived) objects.
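
The decode / process / encode pattern described above can be sketched as follows. This is a toy stand-in written for Python 3, not the actual Docutils or Sphinx code; `process` and its parameters are hypothetical names, and upper-casing merely takes the place of the real processing.

```python
def process(source_bytes, input_encoding='utf-8', output_encoding='utf-8'):
    # 1. Decode: bytes read from disk become a text (unicode) object.
    text = source_bytes.decode(input_encoding)
    # 2. Process: all inner work happens on text; upper-casing is a placeholder.
    result = text.upper()
    # 3. Encode: the text is turned back into bytes for output.
    return result.encode(output_encoding)


out = process('Tr\u00e4umen'.encode('utf-8'))
```

The point of the sketch is that step 1 cannot be skipped: if the bytes do not match `input_encoding`, the failure happens before any processing starts.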

...

>> *  File "/usr/local/lib/python2.7/dist-packages/sphinx/environment.py",
>> line 609, in read_doc
>>     raise SphinxError(str(err))
>> *SphinxError: 'utf8' codec can't decode byte 0xe4 in position 36: invalid
>> continuation byte
>> *>
>> /usr/local/lib/python2.7/dist-packages/sphinx/environment.py(609)read_doc()
>> -> raise SphinxError(str(err))
>> (Pdb)

It looks like the input file is either broken or not in utf8 encoding (which
one, then?).
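
A quick way to test a guess about the encoding is to try a few candidates on the raw bytes. The sketch below assumes Python 3 and assumes that the bytes written out here match what is actually stored in the file; `try_encodings` is an illustrative helper, not a Sphinx or Docutils function.

```python
# Assumed raw content, modelled on the snippet quoted in this thread.
data = b'word,length\nTr\xe4umen,7\nGr\xfc\xdfe,5'


def try_encodings(raw, candidates=('utf-8', 'latin-1')):
    """Map each candidate encoding to the decoded text, or None on failure."""
    results = {}
    for name in candidates:
        try:
            results[name] = raw.decode(name)
        except UnicodeDecodeError:
            results[name] = None
    return results


results = try_encodings(data)
for name in sorted(results):
    print(name, repr(results[name]))
```

Note that Latin-1 assigns a character to every byte value, so a successful Latin-1 decode is not proof of the encoding; the decoded text still has to be checked by eye.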

It looks like the input decoding is not done by docutils.io, but by the
Sphinx "wrapper". This means you must tell Sphinx about the correct
"source_encoding":
http://sphinx-doc.org/config.html#confval-source_encoding
Setting the Docutils config setting "input-encoding"
(http://docutils.sourceforge.net/docs/user/config.html#input-encoding) will
not help.
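
For illustration, a conf.py fragment that sets this option might look like the one below. The value 'latin-1' is only an assumption about the file's real encoding and should be replaced by whatever the sources actually use.

```python
# conf.py (fragment)
# Assumption: the .rst sources are Latin-1 encoded. Replace this value with
# the encoding the source files are really stored in.
source_encoding = 'latin-1'
```

If the sources are in fact UTF-8, the setting can be left at its default and the offending bytes in the input need to be fixed instead.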

Günter
--
You received this message because you are subscribed to the Google Groups 
"sphinx-users" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to 
[email protected]<mailto:[email protected]>.
To post to this group, send email to 
[email protected]<mailto:[email protected]>.
Visit this group at http://groups.google.com/group/sphinx-users?hl=en.
For more options, visit https://groups.google.com/groups/opt_out.

