Re: [Rd] Error on Windows build: "unable to re-encode"

Duncan Murdoch Sat, 27 Feb 2010 08:08:40 -0800

Felix Schönbrodt wrote:

Thanks for your help, that was the solution (it was easy enough to remove these two
characters, since they were only in comments anyway).
Fortunately, the DESCRIPTION file accepts umlauts, as in my last name. The
problem was only in the source file.

I think comments in R code could also include umlauts, but they need to be encoded in a way that can be converted to Latin1 on Windows. I don't know why yours weren't. Did those characters look like u-umlaut on your system? What editor did you use to produce that file?

I'm not sure what the consequences would be of allowing unrepresentable characters to be mapped to question marks or hex codes (with a warning). I think it would slow down the processing a bit (because those lines would need to be processed twice: once to detect that they have some bad characters, a second time to replace them). I'm not sure if it would slow down processing of files that include no bad chars. I'll take a look.
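For comparison, base R's iconv() already offers a similar substitution at the user level through its sub argument. This is not the installer change being discussed here, just an illustration of what "question marks or hex codes" would look like. A minimal sketch, assuming UTF-8 input; the variable names are illustrative:

```r
# Two example lines: the first contains U+FFFD (as in the broken file),
# the second a correctly encoded u-umlaut
x <- c("# TODO: \u{FFFD}berpr\u{FFFD}fen!", "# \u{00FC}berpr\u{00FC}fen!")

# Default (sub = NA): an element that cannot be converted becomes NA
strict <- iconv(x, from = "UTF-8", to = "latin1")

# sub = "?": every non-convertible byte is replaced by "?"
with_qmark <- iconv(x, from = "UTF-8", to = "latin1", sub = "?")

# sub = "byte": every non-convertible byte is shown as its hex code, "<xx>"
with_hex <- iconv(x, from = "UTF-8", to = "latin1", sub = "byte")

strict
with_qmark
with_hex
```

Because the substitution is applied per byte, one multi-byte character may be replaced by more than one marker.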


Duncan Murdoch

Felix


On 26.02.2010 at 18:37, Duncan Murdoch wrote:

On 26/02/2010 11:05 AM, Felix Schönbrodt wrote:

Hi Duncan,

I have now declared the encoding in the DESCRIPTION file as UTF-8 (and all files are encoded
that way, too). As my last name is "Schönbrodt", I'd be happy to see it that
way in the package ;-)

However, it still doesn't build on Windows (but works on Mac and Linux). Unfortunately, I cannot build the Windows packages myself (I work on a Mac), but Uwe Ligges' win-builder still shows the same error...

If declaring the encoding in DESCRIPTION doesn't solve the problem, I'd be 
happy to take a look at the package.

That's a great offer! I'd be very happy if you could take a look.
You can find the source at http://r-forge.r-project.org/projects/tripler/ and a
tar.gz is attached as well.

I got the same error as you. It looks as though iconv has trouble with the way some
characters are encoded in your file. For example, on line 893, where you intended a
u-umlaut, the bytes are EF BF BD. According to the UTF-8 tables at
http://www.utf8-chartable.de/unicode-utf8-table.pl?start=65280, that sequence encodes a
question mark in a diamond, U+FFFD "REPLACEMENT CHARACTER". There's no corresponding character
in the standard Windows latin1 encoding, so conversion fails. Firefox can display the
funny question mark, but it doesn't display the u-umlaut as you intended, so I think this
is an error in your file.
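To check this diagnosis at the byte level, the following R sketch builds both byte sequences by hand and compares them. It uses only base R (rawToChar(), utf8ToInt(), iconv()); the variable names are illustrative:

```r
# Bytes reported at line 893: EF BF BD (where a u-umlaut was intended)
bad <- rawToChar(as.raw(c(0xef, 0xbf, 0xbd)))
Encoding(bad) <- "UTF-8"

# A correctly encoded u-umlaut: UTF-8 uses the two bytes C3 BC
good <- rawToChar(as.raw(c(0xc3, 0xbc)))
Encoding(good) <- "UTF-8"

# Decode each to its Unicode code point
sprintf("U+%04X", utf8ToInt(bad))   # "U+FFFD" (REPLACEMENT CHARACTER)
sprintf("U+%04X", utf8ToInt(good))  # "U+00FC" (u-umlaut)

# U+00FC exists in Latin-1, U+FFFD does not, so only the second call gives NA
iconv(good, from = "UTF-8", to = "latin1")
iconv(bad,  from = "UTF-8", to = "latin1")  # NA
```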

A way to find all such errors is as follows: read the file as UTF-8, then use
the iconv() function in R to convert it to latin1. When I do that, I get NA on
lines 893 and 953, which are displayed to me as

[1] "\t# im latenten Fall: die Error variance erst am Ende berechnen (d.h., alle error componenten ï¿½ber alle Gruppen mitteln, die unter NUll auf Null setzen, dann addieren)"

[2] "\t\t# TODO: ï¿½berprï¿½fen!"

We might be able to make the error message in the package installer more informative (e.g. giving the line number that failed). I'll look into that.
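The check described above can be wrapped in a small function that reports the failing line numbers directly. This is a sketch, not the installer's code: find_unconvertible() is a hypothetical helper name, and the test file is made up for illustration.

```r
# find_unconvertible() is a hypothetical helper (not part of R or R CMD INSTALL):
# it reports the lines of a UTF-8 file that cannot be re-encoded to Latin-1.
find_unconvertible <- function(path) {
  # Read the file, declaring its contents to be UTF-8
  lines <- readLines(path, encoding = "UTF-8", warn = FALSE)
  # iconv() returns NA for any element it cannot convert (sub = NA by default)
  converted <- iconv(lines, from = "UTF-8", to = "latin1")
  bad <- which(is.na(converted))
  data.frame(line = bad, text = lines[bad], stringsAsFactors = FALSE)
}

# Made-up test file: line 2 has EF BF BD where a u-umlaut was meant,
# line 3 has a correctly encoded u-umlaut (C3 BC)
tmp <- tempfile(fileext = ".R")
writeBin(c(
  charToRaw("x <- 1\n"),
  charToRaw("# TODO: "), as.raw(c(0xef, 0xbf, 0xbd)), charToRaw("berpr\n"),
  charToRaw("# "), as.raw(c(0xc3, 0xbc)), charToRaw("ber\n")
), tmp)

find_unconvertible(tmp)$line  # 2
```

Running it on a real package file would list every line that would trip the Windows conversion, not just the first one.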


Duncan Murdoch


______________________________________________
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

