Re: [Rd] Encoding errors in Rd files

Duncan Murdoch Wed, 25 Jul 2012 01:06:17 -0700

On 12-07-25 3:24 AM, steven mosher wrote:

Thanks, Dr. Ripley.


When I read the instructions:
"If the DESCRIPTION file is not entirely in ASCII it should contain an
‘Encoding’ field specifying an encoding. This is used as the encoding of the
DESCRIPTION file itself and of the R and NAMESPACE files, and as the
default encoding of .Rd files."

I assumed that I should specify an encoding of UTF-8 in the DESCRIPTION
file to handle the specific Rd files that were having problems.

After I removed the Encoding field (UTF-8) that I had added to the DESCRIPTION
file, the following error no longer occurred:

"Package inputenc Error: Keyboard character used is undefined
(inputenc)                in inputencoding `utf8'."

However, the Rd files still had errors.

Looking at the LaTeX versions of the Rd files, I was able to spot which
characters in the Rd files were offending. In my normal editor they simply
did not show up, so I had no idea which characters were causing the problem.

The only thing I am left with is the entries in a data frame. The package
came with some predefined .rda files in its data subdirectory. I used
Encoding() on the column of the data frame that contains the following items:

  checking data for non-ASCII characters ... WARNING
    Warning: found non-ASCII string(s)
    'Tourbihre de la Rivihre-aux-Feu' in object 'modpoll'
    'Lac ` la Fourche' in object 'modpoll'
    'Lac ` la Loutre' in object 'modpoll'
    'Lac Kinogami' in object 'modpoll'

I get "unknown" for all the items. So, if I understand you, I should take
this data frame and change the encoding to UTF-8.

The issue is almost certainly with accented characters in the text. For example, the 5th letter in Rivière is an e with a grave accent. It is displayed as "h" in your error message, because R was not told how to interpret the way it is stored in the source file, or was told something that turned out to be incorrect.
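A quick way to see the pattern (a sketch, assuming the strings were stored as latin1 bytes, which is only a guess): in latin1, è is byte 0xE8 and à is 0xE0. Dropping the high bit of each byte gives 0x68 ("h") and 0x60 ("`"), which matches 'Rivihre' and 'Lac ` la Fourche' in the check output. The helper strip_high_bit() is hypothetical and only mimics the mangling; it is not what R CMD check itself runs.

```r
# Hypothetical helper: clear bit 7 of every byte. For byte values 0-255,
# x %% 128 is the same as dropping the high bit.
strip_high_bit <- function(x) {
  bytes <- as.integer(charToRaw(x))
  rawToChar(as.raw(bytes %% 128L))
}

# "Rivière" as latin1 bytes (0xE8 is è in latin1), with no declared encoding.
riviere <- rawToChar(as.raw(c(0x52, 0x69, 0x76, 0x69, 0xE8, 0x72, 0x65)))
Encoding(riviere)            # "unknown"
strip_high_bit(riviere)      # "Rivihre"

# "à" is 0xE0 in latin1; without the high bit it becomes 0x60, a backquote.
strip_high_bit(rawToChar(as.raw(0xE0)))   # "`"

# Telling iconv() what the bytes are lets it produce proper UTF-8.
fixed <- iconv(riviere, from = "latin1", to = "UTF-8")
fixed == "Rivi\u00e8re"      # TRUE
```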

You need to change the source file so that it is stored in the UTF-8 encoding. That means you should read the file into an editor that displays it correctly (and that's sometimes hard when you don't know the original encoding; you may need to do some manual editing), then save it again, specifying that it should be saved using the UTF-8 encoding. How you do that depends on your editor.
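For a file that really is latin1, the conversion can also be done from R instead of an editor. A minimal sketch: convert_to_utf8() is a hypothetical helper, and from = "latin1" is an assumption you should confirm by looking at the converted text.

```r
# Hypothetical helper: re-encode a text file to UTF-8. `from = "latin1"`
# is an assumption; inspect the result to make sure it reads correctly.
convert_to_utf8 <- function(path_in, path_out = path_in, from = "latin1") {
  lines <- readLines(path_in, warn = FALSE)         # bytes kept as they are
  utf8 <- iconv(lines, from = from, to = "UTF-8")
  if (anyNA(utf8)) stop("some lines are not valid in encoding ", from)
  writeLines(utf8, path_out, useBytes = TRUE)       # write the UTF-8 bytes as-is
  invisible(path_out)
}

# Demo on a throwaway file containing "Rivière" in latin1.
f <- tempfile(fileext = ".Rd")
writeBin(as.raw(c(0x52, 0x69, 0x76, 0x69, 0xE8, 0x72, 0x65, 0x0A)), f)
convert_to_utf8(f)
readLines(f, encoding = "UTF-8") == "Rivi\u00e8re"   # TRUE
```

After converting, R still has to be told the file is UTF-8: an Encoding: UTF-8 field in DESCRIPTION (as in the manual text quoted above) or an \encoding{UTF-8} line in the Rd file.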

Then when you tell R that it is encoded in UTF-8, R will read it properly and won't complain.

The tools::showNonASCIIfile() function can help to find characters that may need fixing. R can recognize when things are not ASCII (those bytes have the high bit set), but it will be up to you to figure out what encoding was actually used. For French, latin1 is a good guess, but it is not necessarily right.
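R can at least rule out some possibilities. A rough sketch (guess_encoding() is a hypothetical helper; validUTF8() needs R 3.3.0 or later): a file whose bytes are all below 0x80 is ASCII, a file that passes validUTF8() is probably UTF-8, and anything else needs a human decision, with latin1 as the usual first guess for French.

```r
# Hypothetical helper: a rough first guess at a file's encoding. It can
# only say "ASCII", "UTF-8", or "neither"; for "neither", latin1 is a
# common guess for French text but must be checked by eye.
guess_encoding <- function(path) {
  bytes <- readBin(path, what = "raw", n = file.size(path))
  if (all(as.integer(bytes) < 128L)) return("ASCII")
  if (validUTF8(rawToChar(bytes))) return("UTF-8")
  "neither ASCII nor UTF-8"
}

# Demo: "Lac à la" written as latin1 bytes.
f <- tempfile()
writeBin(as.raw(c(0x4C, 0x61, 0x63, 0x20, 0xE0, 0x20, 0x6C, 0x61, 0x0A)), f)
guess_encoding(f)            # "neither ASCII nor UTF-8"
tools::showNonASCIIfile(f)   # shows the offending line, with the byte in hex
```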


Duncan Murdoch


Sorry for being so dense

On Tue, Jul 24, 2012 at 1:46 PM, Prof Brian Ripley <rip...@stats.ox.ac.uk> wrote:

On 24/07/2012 21:08, steven mosher wrote:

Well, I'm working on a project to bring an old package, last published for
R 1.9, back to life. I'm almost there, but I am getting killed by an encoding
error in the Rd files.

After reading the manual, I decided to try UTF-8, mostly because I could
spell it. Ha.

That got me a bit closer, but I still have these warnings:

* checking data for non-ASCII characters ... WARNING
    Warning: found non-ASCII string(s)
    'Tourbihre de la Rivihre-aux-Feu' in object 'modpoll'
    'Lac ` la Fourche' in object 'modpoll'
    'Lac ` la Loutre' in object 'modpoll'
    'Lac Kinogami' in object 'modpoll'


How to handle those is in 'Writing R Extensions': basically convert to
UTF-8 and mark them as UTF-8.
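A minimal sketch of that conversion, assuming the strings are latin1 bytes and, hypothetically, that they sit in a column called site (check names(modpoll) for the real one). iconv() with to = "UTF-8" both converts the bytes and marks non-ASCII results as UTF-8; pure ASCII strings are never marked. Old data frames may store strings as factors, so the helper converts the levels in that case.

```r
# Hypothetical helper: re-encode a character or factor column from latin1
# to UTF-8. iconv() marks non-ASCII results as "UTF-8" by default.
fix_latin1_column <- function(x) {
  if (is.factor(x)) {
    levels(x) <- iconv(levels(x), from = "latin1", to = "UTF-8")
    return(x)
  }
  iconv(x, from = "latin1", to = "UTF-8")
}

# Stand-in for the real column: latin1 bytes with no declared encoding,
# which is what Encoding() reporting "unknown" suggests.
site <- iconv(c("Lac \u00e0 la Loutre", "Lac \u00e0 la Fourche"),
              from = "UTF-8", to = "latin1")
Encoding(site) <- "unknown"
modpoll <- data.frame(site = site, stringsAsFactors = FALSE)
Encoding(modpoll$site)    # "unknown" "unknown", as in the report

modpoll$site <- fix_latin1_column(modpoll$site)
Encoding(modpoll$site)    # "UTF-8" "UTF-8"

# Re-save; in the package this would overwrite data/modpoll.rda.
out <- tempfile(fileext = ".rda")
save(modpoll, file = out)
```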


* checking data for ASCII and uncompressed saves ... OK

* checking examples ... OK
* checking PDF version of manual ... WARNING
LaTeX errors when creating PDF version.
This typically indicates Rd problems.
LaTeX errors found:
   ! Package inputenc Error: Keyboard character used is undefined
(inputenc)                in inputencoding `utf8'.

I'll keep searching the help list archives for a clue, but if somebody
could point me at some educational material, I'd be grateful; it's really
time that I learned this aspect.


Without the actual file we can do little.  The message means that
something in the manual inputs (and it could be the DESCRIPTION file or an
Rd file) contains a character not known to LaTeX.  Most likely it is simply
not a UTF-8 character, but it could also be outside LaTeX's gamut.

Normally the LaTeX log (which is in the check output) is more revealing:
you can also try this part alone with R CMD Rd2pdf (and R CMD Rd2pdf
--no-description often points the finger at the DESCRIPTION file).
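Before running R CMD Rd2pdf, it can help to list which of the manual inputs contain non-ASCII bytes at all. A sketch only: find_non_ascii_inputs() is a hypothetical helper, and it only looks at DESCRIPTION and man/*.Rd, the two kinds of input mentioned above.

```r
# Hypothetical helper: which manual inputs (DESCRIPTION, man/*.Rd) contain
# any byte >= 0x80? `pkg_dir` is the path to the package source.
find_non_ascii_inputs <- function(pkg_dir) {
  files <- c(file.path(pkg_dir, "DESCRIPTION"),
             list.files(file.path(pkg_dir, "man"), pattern = "\\.[Rr]d$",
                        full.names = TRUE))
  files <- files[file.exists(files)]
  has_non_ascii <- vapply(files, function(f) {
    any(as.integer(readBin(f, "raw", n = file.size(f))) > 127L)
  }, logical(1))
  files[has_non_ascii]
}

# Then inspect each reported file line by line, for example:
# for (f in find_non_ascii_inputs(".")) { cat("==", f, "\n"); tools::showNonASCIIfile(f) }
```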

I've read http://developer.r-project.org/Encodings_and_R.html

How do I figure out which encoding to use, given the error seen above?


Assuming this is not something esoteric, UTF-8 is the most comprehensive
choice, but LaTeX's UTF-8 coverage (and that of the fonts used) is heavily
biased to Western European scripts.  So for example for Lithuanian you may
want to choose something else (Latin-7?).
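To illustrate that the right 8-bit encoding depends on the script (a sketch, assuming the platform's iconv accepts the name "ISO-8859-13", the standard name for Latin-7): the Lithuanian letter ė (U+0117) has no code point in latin1 but does exist in Latin-7. This only shows the character can be represented; whether LaTeX and its fonts can typeset it is the separate question raised above.

```r
e_dot <- "\u0117"   # Lithuanian "ė" (LATIN SMALL LETTER E WITH DOT ABOVE)

# latin1 (ISO-8859-1) has no code point for it, so iconv() returns NA.
is.na(iconv(e_dot, from = "UTF-8", to = "latin1"))                    # TRUE

# Latin-7 (ISO-8859-13) does include it, as a single byte.
length(charToRaw(iconv(e_dot, from = "UTF-8", to = "ISO-8859-13")))   # 1
```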




--
Brian D. Ripley,                  rip...@stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595





______________________________________________
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

