On 12-07-25 3:24 AM, steven mosher wrote:
Thanks, Dr. Ripley.
When I read the instructions
" If the DESCRIPTION file is not entirely in ASCII it should contain an ‘
Encoding’ field specifying an encoding. This is used as the encoding of the
DESCRIPTION file itself and of the R and NAMESPACE files, and as the
default encoding of .Rd files. "
I assumed that I should specify an encoding of UTF-8 in the description
file to handle the specific Rd files that were having problems.
After I removed the encoding field (UTF-8) I had added to the DESCRIPTION
file, the following error no longer occurred
" Package inputenc Error: Keyboard character used is undefined
(inputenc) in inputencoding `utf8'. "
However, the Rd files still had errors.
Looking at the LaTeX versions of the Rd files, I was able to spot which
characters in the Rd files
were offending. In my normal editor they were simply not showing up, so I
had no idea which characters were
causing the problem.
The only thing I am left with is the entries in the data frame. The package
came with some predefined .rda files
in its data subdirectory. The check reports the following items, and when I
use Encoding() on the column of the data frame that contains them
checking data for non-ASCII characters ... WARNING
Warning: found non-ASCII string(s)
'Tourbihre de la Rivihre-aux-Feu' in object 'modpoll'
'Lac ` la Fourche' in object 'modpoll'
'Lac ` la Loutre' in object 'modpoll'
'Lac Kinogami' in object 'modpoll'
I get "unknown" for all the items. So, if I understand you, I should take
this data frame and change the encoding of that column
to UTF-8.
The issue is almost certainly with accented characters in the text. For
example, the 5th letter in Rivière is an e with a grave accent. It is
displayed as "h" in your error message, because R was not told how to
interpret the way it is stored in the source file, or was told something
that turned out to be incorrect.
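
As a rough illustration of why "h" shows up: the sketch below assumes the text was stored as latin1 and that the high bit of each byte was dropped somewhere before display. That is a guess that happens to match the check output, not something confirmed for this package.

```r
# Latin-1 bytes for "Rivi<e-grave>re"; 0xE8 is e-grave in Latin-1.
latin1_bytes <- as.raw(c(0x52, 0x69, 0x76, 0x69, 0xE8, 0x72, 0x65))

# Clearing the high bit of a byte is the same as taking it modulo 128.
stripped <- rawToChar(as.raw(as.integer(latin1_bytes) %% 128L))
stripped  # "Rivihre"

# The same treatment of e-grave, a-grave and e-acute (0xE8, 0xE0, 0xE9).
accents <- as.raw(c(0xE8, 0xE0, 0xE9))
rawToChar(as.raw(as.integer(accents) %% 128L))  # "h`i"

# Telling R the bytes are Latin-1 recovers the intended text as UTF-8.
intended <- iconv(rawToChar(latin1_bytes), from = "latin1", to = "UTF-8")
```

Under that assumption the other strings fit too: the backquote in 'Lac ` la Loutre' would be an a with a grave accent, and the first "i" in 'Lac Kinogami' an e with an acute accent.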
You need to change the source file so that it is stored in the UTF-8
encoding. That means you should read the file into an editor that
displays it correctly (and that's sometimes hard when you don't know the
original encoding; you may need to do some manual editing), then save it
again, specifying that it should be saved using the UTF-8 encoding. How
you do that depends on your editor.
Then when you tell R that it is encoded in UTF-8, R will read it
properly and won't complain.
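
If you would rather do the re-save from R than from an editor, something like the following should work. It is only a sketch: it assumes the original encoding really is latin1, and the temporary files are hypothetical stand-ins for your real source file.

```r
# Create a small Latin-1 file to stand in for the real source file.
src <- tempfile(fileext = ".Rd")
writeBin(as.raw(c(0x52, 0x69, 0x76, 0x69, 0xE8, 0x72, 0x65, 0x0A)), src)

# Read the lines as stored, then convert from the assumed encoding to UTF-8.
lines <- readLines(src, warn = FALSE)
utf8_lines <- iconv(lines, from = "latin1", to = "UTF-8")

# Write the converted bytes out without any further re-encoding.
dst <- tempfile(fileext = ".Rd")
writeLines(utf8_lines, dst, useBytes = TRUE)

# Check that every line of the new file is valid UTF-8.
validUTF8(readLines(dst, warn = FALSE))
```

If the guess about the original encoding is wrong, the conversion will still run but the accented letters will come out as the wrong characters, so look at the result before overwriting anything.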
The tools::showNonASCIIfile() function can help to find characters that
may need fixing. R can recognize when things are not ASCII (those bytes
have the high bit set), but it will be up to you to figure out what
encoding was actually used. For French, latin1 is a good guess but it
is not necessarily right.
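
For example, on a small stand-in file (the file name and contents below are made up for illustration), the lines that need attention can be found like this:

```r
# A stand-in file: line 2 contains a Latin-1 e-grave (byte 0xE8).
path <- tempfile(fileext = ".Rd")
writeBin(c(charToRaw("\\name{modpoll}\n"),
           as.raw(c(0x52, 0x69, 0x76, 0x69, 0xE8, 0x72, 0x65, 0x0A)),
           charToRaw("\\title{Lakes}\n")), path)

# Print each offending line, with the non-ASCII bytes shown as escapes.
tools::showNonASCIIfile(path)

# A similar test done by hand: conversion to ASCII gives NA for
# any line that contains a non-ASCII byte.
lines <- readLines(path, warn = FALSE)
bad <- which(is.na(iconv(lines, from = "latin1", to = "ASCII")))
bad  # 2
```

The second form is handy when you want the line numbers in a variable rather than printed on the console.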
Duncan Murdoch
Sorry for being so dense
On Tue, Jul 24, 2012 at 1:46 PM, Prof Brian Ripley <rip...@stats.ox.ac.uk> wrote:
On 24/07/2012 21:08, steven mosher wrote:
Well, I'm working on a project to bring an old package, last
published for R 1.9, back to life.
I'm almost there, but I am getting killed by an encoding error in the Rd
files.
After reading the manual, I decided to try UTF-8. Mostly because I could
spell it. ha.
That got me a bit closer but I still have these warnings
* checking data for non-ASCII characters ... WARNING
Warning: found non-ASCII string(s)
'Tourbihre de la Rivihre-aux-Feu' in object 'modpoll'
'Lac ` la Fourche' in object 'modpoll'
'Lac ` la Loutre' in object 'modpoll'
'Lac Kinogami' in object 'modpoll'
How to handle those is in 'Writing R Extensions': basically convert to
UTF-8 and mark them as UTF-8.
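
A minimal sketch of that conversion, assuming the strings are latin1 and sit in a plain character column (the column name `site` and the second value are hypothetical; only the object name 'modpoll' comes from the check output):

```r
# Stand-in for the packaged data: "Lac <a-grave> la Loutre" as Latin-1 bytes
# with no declared encoding, plus one pure-ASCII entry.
lake <- rawToChar(as.raw(c(0x4C, 0x61, 0x63, 0x20, 0xE0, 0x20, 0x6C, 0x61,
                           0x20, 0x4C, 0x6F, 0x75, 0x74, 0x72, 0x65)))
modpoll <- data.frame(site = c(lake, "Lac Noir"), stringsAsFactors = FALSE)
Encoding(modpoll$site)  # both "unknown" before conversion

# Convert the bytes to UTF-8; iconv() also marks non-ASCII results as UTF-8.
modpoll$site <- iconv(modpoll$site, from = "latin1", to = "UTF-8")
Encoding(modpoll$site)  # "UTF-8" "unknown" (ASCII strings are never marked)

# Re-save the object; in a package the file would go under data/.
rda <- file.path(tempdir(), "modpoll.rda")
save(modpoll, file = rda)
```

If the column is a factor rather than a character vector, the same iconv() call would be applied to levels(modpoll$site) instead.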
* checking data for ASCII and uncompressed saves ... OK
* checking examples ... OK
* checking PDF version of manual ... WARNING
LaTeX errors when creating PDF version.
This typically indicates Rd problems.
LaTeX errors found:
! Package inputenc Error: Keyboard character used is undefined
(inputenc) in inputencoding `utf8'.
I'll keep searching the help list archives for a clue, but if somebody
could point me at educational material I'd be grateful; it's really time
that I learned this aspect.
Without the actual file we can do little. The message means that
something in the manual inputs (and it could be the DESCRIPTION file or an
Rd file) contains a character not known to LaTeX. Most likely it is simply
not a UTF-8 character, but it could also be outside LaTeX's gamut.
Normally the LaTeX log (which is in the check output) is more revealing:
you can also try this part alone with R CMD Rd2pdf (and R CMD Rd2pdf
--no-description often points the finger at the DESCRIPTION file).
I've read
http://developer.r-project.org/Encodings_and_R.html
How do I figure out which encoding to use with the error seen above?
Assuming this is not something esoteric, UTF-8 is the most comprehensive
choice, but LaTeX's UTF-8 coverage (and that of the fonts used) is heavily
biased to Western European scripts. So for example for Lithuanian you may
want to choose something else (Latin-7?).
--
Brian D. Ripley, rip...@stats.ox.ac.uk
Professor of Applied Statistics,
http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel: +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UK Fax: +44 1865 272595
______________________________________________
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel