Jeffrey Horner wrote:
> Jeffrey Horner wrote:
>> Duncan Murdoch wrote:
>>> On 6/14/2007 10:49 AM, Jeffrey Horner wrote:
>>>> Hi,
>>>>
>>>> Here's a patch to the readChar manual page (R-trunk as of today)
>>>> that better clarifies readChar's return value.
>>> Your update is not right. For example:
>>>
>>> x <- as.raw(32:96)
>>> readChar(x, nchars=rep(2,100))
>>>
>>> This returns a character vector of length 100, of which the first 32
>>> elements have 2 chars, the next one has 1, and the rest are "".
>>>
>>> So the length of nchars really does affect the length of the value.
>>>
>>> Now, I haven't looked at the code, but it's possible we could delete
>>> the "(which might be less than \code{length(nchars)})" remark, and if
>>> not, it would be useful to explain the situations in which the return
>>> value could be shorter than the nchars vector.
>>
>> Well, this is rather a misunderstanding on my part; I completely
>> forgot about vectorization. The manual page makes sense to me now.
>>
>> But the situation about the return value possibly being shorter than
>> length(nchars) isn't clear. Consider a 101 byte text file in a
>> non-multibyte character locale:
>>
>> f <- tempfile()
>> writeChar(paste(rep(seq(0,9),10),collapse=''),con=f)
>>
>> and calling readChar() to read 100 bytes with length(nchar)=10:
>>
>> > readChar(f,nchar=rep(10,10))
>> [1] "0123456789" "0123456789" "0123456789" "0123456789" "0123456789"
>> [6] "0123456789" "0123456789" "0123456789" "0123456789" "0123456789"
>>
>> and readChar() reading the entire file with length(nchar)=11:
>>
>> > readChar(f,nchar=rep(10,11))
>> [1] "0123456789" "0123456789" "0123456789" "0123456789" "0123456789"
>> [6] "0123456789" "0123456789" "0123456789" "0123456789" "0123456789"
>> [11] "\0"
>>
>> but the following two outputs are confusing. readChar() with
>> length(nchar)>=12 returns a character vector of length 12:
>>
>> > readChar(f,nchar=rep(10,12))
>> [1] "0123456789" "0123456789" "0123456789" "0123456789" "0123456789"
>> [6] "0123456789" "0123456789" "0123456789" "0123456789" "0123456789"
>> [11] "\0" ""
>> > readChar(f,nchar=rep(10,13))
>> [1] "0123456789" "0123456789" "0123456789" "0123456789" "0123456789"
>> [6] "0123456789" "0123456789" "0123456789" "0123456789" "0123456789"
>> [11] "\0" ""
>>
>> It seems that the first time EOF is encountered on a read operation,
>> an empty string is returned, but on subsequent reads nothing is
>> returned. Is this intended behavior?
>
> I believe this is an off-by-1 bug in do_readchar(). The following fix to
> R-trunk v41946 causes the above readChar() calls to cap the returned
> vector length at 11:
>
> Index: src/main/connections.c
> ===================================================================
> --- src/main/connections.c (revision 41946)
> +++ src/main/connections.c (working copy)
> @@ -3286,7 +3286,7 @@
>         if(!con->open(con)) error(_("cannot open the connection"));
>      }
>      PROTECT(ans = allocVector(STRSXP, n));
> -    for(i = 0, m = i+1; i < n; i++) {
> +    for(i = 0, m = 0; i < n; i++) {
>         len = INTEGER(nchars)[i];
>         if(len == NA_INTEGER || len < 0)
>             error(_("invalid value for '%s'"), "nchar");
>
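To make the effect of the initializer concrete, here is a rough model of the counting logic only. It is not the code in connections.c: result_length() and its parameters are hypothetical names, and the loop body (m incremented once per successful read, break on the first read that returns nothing, result truncated to m when m < n) is an assumption, chosen because it reproduces the outputs reported above for the file connection case.

```c
/* Hypothetical model of the length bookkeeping in do_readchar().
 * NOT the R source. ASSUMPTIONS: the loop body does m++ after each
 * successful read, breaks on the first read that returns nothing,
 * and the answer is shortened to m elements only when m < n. */
int result_length(int n, int successes, int m_init)
{
    int i, m;

    for (i = 0, m = m_init; i < n; i++) {
        if (i < successes)
            m++;        /* this read returned a string */
        else
            break;      /* first read past EOF: stop reading */
    }
    /* m counts the elements to keep; truncate only if it is below n */
    return (m < n) ? m : n;
}
```

For the 101 byte file there are 11 successful reads (ten 10-character strings plus "\0"). Under these assumptions, result_length(13, 11, 1) gives 12, matching the extra "" element seen with the current initializer, while result_length(13, 11, 0) gives 11, matching the patched behavior.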
This does look like an off-by-1 bug as do_readbin's for loops are coded
just like the above patch.

Jeff
--
http://biostat.mc.vanderbilt.edu/JeffreyHorner

______________________________________________
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel