The file.show() issue is now in the bug tracker. I used a slightly different example to demonstrate the problem.

https://bugs.r-project.org/bugzilla3/show_bug.cgi?id=16738

- Mikko

On 29.02.2016 20:30, Duncan Murdoch wrote:
> I have just committed your first patch (the strlen() replacement) to
> R-devel, and will soon put it in R-patched as well. I won't have time to
> look at this again before the 3.2.4 release, so your file.show() patch
> isn't going to make it unless someone else gets to it.
>
> There's still a faint chance that I'll do more in R-devel before 3.3.0,
> but I think it's best if there were bug reports about both of these
> problems so they don't get forgotten. Since the first one is mainly a
> Windows problem, I'll write that one up; I'd appreciate it if you could
> write up the file.show() issue, after checking against R-devel rev 70247
> or higher.
>
> Duncan Murdoch
>
> On 25/02/2016 5:54 AM, Mikko Korpela wrote:
>> On 25.02.2016 11:31, Mikko Korpela wrote:
>>> On 23.02.2016 14:06, Mikko Korpela wrote:
>>>> On 23.02.2016 11:37, Martin Maechler wrote:
>>>>>>>>>> nospam@altfeld-im de <nos...@altfeld-im.de>
>>>>>>>>>>     on Mon, 22 Feb 2016 18:45:59 +0100 writes:
>>>>>
>>>>> > Dear R developers
>>>>> > I think I have found a bug that can be reproduced with two lines of code
>>>>> > and I am very thankful to get your first assessment or feedback on my
>>>>> > report.
>>>>>
>>>>> > If this is the wrong mailing list or I did something wrong
>>>>> > (e. g. semi "anonymous" email address to protect my privacy and defend
>>>>> > unwanted spam) please let me know since I am new here.
>>>>>
>>>>> > Thank you very much :-)
>>>>>
>>>>> > J. Altfeld
>>>>>
>>>>> Dear J.,
>>>>> (yes, a bit less anonymity would be very welcomed here!),
>>>>>
>>>>> You are right, this is a bug, at least in the documentation, but
>>>>> probably "all real", indeed,
>>>>>
>>>>> but read on.
>>>>>
>>>>> > On Tue, 2016-02-16 at 18:25 +0100, nos...@altfeld-im.de wrote:
>>>>> >>
>>>>> >>
>>>>> >> If I execute the code from the "?write.table" examples section
>>>>> >>
>>>>> >> x <- data.frame(a = I("a \" quote"), b = pi)
>>>>> >> # (omitted code)
>>>>> >> write.csv(x, file = "foo.csv", fileEncoding = "UTF-16LE")
>>>>> >>
>>>>> >> the resulting CSV file has a size of 6 bytes which is too short
>>>>> >> (truncated):
>>>>> >>
>>>>> >> """,3
>>>>>
>>>>> reproducibly, yes.
>>>>> If you look at what write.csv does
>>>>> and then simplify, you can get a similar wrong result by
>>>>>
>>>>>   write.table(x, file = "foo.tab", fileEncoding = "UTF-16LE")
>>>>>
>>>>> which results in a file with one line
>>>>>
>>>>>   """ 3
>>>>>
>>>>> and if you debug write.table() you see that its building blocks
>>>>> here are
>>>>>   file <- file(........, encoding = fileEncoding)
>>>>>
>>>>> a writeLines(*, file=file) for the column headers,
>>>>>
>>>>> and then "deeper down" C code which I did not investigate.
>>>>
>>>> I took a look at connections.c. There is a call to strlen() that gets
>>>> confused by null characters. I think the obvious fix is to avoid the
>>>> call to strlen() as the size is already known:
>>>>
>>>> Index: src/main/connections.c
>>>> ===================================================================
>>>> --- src/main/connections.c (revision 70213)
>>>> +++ src/main/connections.c (working copy)
>>>> @@ -369,7 +369,7 @@
>>>>                 /* is this safe? */
>>>>                 warning(_("invalid char string in output conversion"));
>>>>             *ob = '\0';
>>>> -           con->write(outbuf, 1, strlen(outbuf), con);
>>>> +           con->write(outbuf, 1, ob - outbuf, con);
>>>>         } while(again && inb > 0);  /* it seems some iconv signal -1 on
>>>>                                        zero-length input */
>>>>     } else
>>>>
>>>>
>>>>>
>>>>> But just looking a bit at such a file() object with writeLines()
>>>>> seems slightly revealing, as e.g., 'eol' does not seem to
>>>>> "work" for this encoding:
>>>>>
>>>>> > fn <- tempfile("ffoo"); ff <- file(fn, open="w", encoding = "UTF-16LE")
>>>>> > writeLines(LETTERS[3:1], ff); writeLines("|", ff); writeLines(">a", ff)
>>>>> > close(ff)
>>>>> > file.show(fn)
>>>>> CBA|>
>>>>> > file.size(fn)
>>>>> [1] 5
>>>>> >
>>>>
>>>> With the patch applied:
>>>>
>>>> > readLines(fn, encoding="UTF-16LE", skipNul=TRUE)
>>>> [1] "C" "B" "A" "|" ">a"
>>>> > file.size(fn)
>>>> [1] 22
>>>
>>> I just realized that I was misusing the encoding argument of
>>> readLines(). The code above works by accident, but the following would
>>> be more appropriate:
>>>
>>> > ff <- file(fn, open="r", encoding="UTF-16LE")
>>> > readLines(ff)
>>> [1] "C" "B" "A" "|" ">a"
>>> > close(ff)
>>>
>>> Testing on Linux, with the patch applied. (As noted by Duncan Murdoch,
>>> the patch is incomplete on Windows.)
>>
>> Before inspecting the file with readLines() I tried file.show() but it
>> did not work as expected. On Linux using a UTF-8 locale, the result of
>> trying to show the truly UTF-16LE encoded file with
>>
>> > file.show(fn, encoding="UTF-16LE")
>>
>> was a pager showing "<43>" (quotes not included) followed by several
>> empty lines.
>>
>> With the following patch, the command works correctly (in this case, on
>> this platform, not tested comprehensively). The idea is to read the
>> input file "raw" in order to avoid problems with null characters. The
The >> input then needs to be split into lines after iconv(), or it could be >> written to the output file with cat() if the style of line termination >> characters does not matter. The 'perl = TRUE' is for assumed performance >> advantage only. It can be removed, or one might want to test if there is >> a significant difference one way or the other. >> >> - Mikko >> >> Index: src/library/base/R/files.R >> =================================================================== >> --- src/library/base/R/files.R (revision 70217) >> +++ src/library/base/R/files.R (working copy) >> @@ -50,10 +50,13 @@ >> for(i in seq_along(files)) { >> f <- files[i] >> tf <- tempfile() >> - tmp <- readLines(f, warn = FALSE) >> + tmp <- list(readBin(f, "raw", file.size(f))) >> tmp2 <- try(iconv(tmp, encoding, "", "byte")) >> if(inherits(tmp2, "try-error")) file.copy(f, tf) >> - else writeLines(tmp2, tf) >> + else { >> + tmp2 <- strsplit(tmp2, "\r\n?|\n", perl = TRUE)[[1L]] >> + writeLines(tmp2, tf) >> + } >> files[i] <- tf >> if(delete.file) unlink(f) >> } ______________________________________________ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel