Jeffrey Horner wrote:
> Duncan Murdoch wrote:
>> On 6/14/2007 10:49 AM, Jeffrey Horner wrote:
>>> Hi,
>>>
>>> Here's a patch to the readChar manual page (R-trunk as of today) that
>>> better clarifies readChar's return value.
>> Your update is not right. For example:
>>
>> x <- as.raw(32:96)
>> readChar(x, nchars=rep(2,100))
>>
>> This returns a character vector of length 100, of which the first 32
>> elements have 2 chars, the next one has 1, and the rest are "".
>>
>> So the length of nchars really does affect the length of the value.
>>
>> Now, I haven't looked at the code, but it's possible we could delete the
>> "(which might be less than \code{length(nchars)})" remark, and if not,
>> it would be useful to explain the situations in which the return value
>> could be shorter than the nchars vector.
>
> Well, this is rather a misunderstanding on my part; I completely forgot
> about vectorization. The manual page makes sense to me now.
>
> But the situation about the return value possibly being less than
> length(nchars) isn't clear. Consider a 101 byte text file in a
> non-multibyte character locale:
>
> f <- tempfile()
> writeChar(paste(rep(seq(0,9),10),collapse=''),con=f)
>
> and calling readChar() to read 100 bytes with length(nchar)=10:
>
> > readChar(f,nchar=rep(10,10))
> [1] "0123456789" "0123456789" "0123456789" "0123456789" "0123456789"
> [6] "0123456789" "0123456789" "0123456789" "0123456789" "0123456789"
>
> and readChar() reading the entire file with length(nchar)=11:
>
> > readChar(f,nchar=rep(10,11))
>  [1] "0123456789" "0123456789" "0123456789" "0123456789" "0123456789"
>  [6] "0123456789" "0123456789" "0123456789" "0123456789" "0123456789"
> [11] "\0"
>
> but the following two outputs are confusing. readchar() with
> length(nchar)>=12 returns a character vector length 12:
>
> > readChar(f,nchar=rep(10,12))
>  [1] "0123456789" "0123456789" "0123456789" "0123456789" "0123456789"
>  [6] "0123456789" "0123456789" "0123456789" "0123456789" "0123456789"
> [11] "\0"         ""
>
> > readChar(f,nchar=rep(10,13))
>  [1] "0123456789" "0123456789" "0123456789" "0123456789" "0123456789"
>  [6] "0123456789" "0123456789" "0123456789" "0123456789" "0123456789"
> [11] "\0"         ""
>
> It seems that the first time EOF is encountered on a read operation, an
> empty string is returned, but on subsequent reads nothing is returned.
> Is this intended behavior?

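To make the counting in the transcripts above concrete, here is a toy C
model of the read loop. It is only a sketch, not the do_readchar() source:
it assumes (beyond what the diff in this message shows) that m is
incremented once per successful read, that the loop stops at the first read
that returns nothing, and that the result is shortened to m only when
m < n. The name result_length() and its parameters are made up for
illustration.

```c
#include <assert.h>

/* Toy model, NOT the R sources. Returns the length of the result vector
 * after n_requests reads of `chunk` bytes each from a file holding
 * `file_bytes` bytes, with the element counter m starting at m_init. */
int result_length(int n_requests, int chunk, int file_bytes, int m_init)
{
    assert(n_requests >= 0 && chunk > 0 && file_bytes >= 0);

    int remaining = file_bytes;
    int m = m_init;

    for (int i = 0; i < n_requests; i++) {
        if (remaining == 0)
            break;                      /* read at EOF returns nothing */
        int got = remaining < chunk ? remaining : chunk;
        remaining -= got;               /* a short final read still counts */
        m++;                            /* one more element filled */
    }

    /* Assumed truncation rule: shrink the vector only when m < n. */
    return m < n_requests ? m : n_requests;
}
```

Under these assumptions, with 10-byte reads on a 101-byte file, starting m
at 1 gives result lengths 10, 11, 12, 12 for 10 to 13 requests, which
matches the transcripts above; starting m at 0 gives 10, 11, 11, 11.
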
I believe this is an off-by-1 bug in do_readchar(). The following fix to
R-trunk v41946 causes the above readChar() calls to cap the returned
vector length at 11:

Index: src/main/connections.c
===================================================================
--- src/main/connections.c	(revision 41946)
+++ src/main/connections.c	(working copy)
@@ -3286,7 +3286,7 @@
         if(!con->open(con)) error(_("cannot open the connection"));
     }
     PROTECT(ans = allocVector(STRSXP, n));
-    for(i = 0, m = i+1; i < n; i++) {
+    for(i = 0, m = 0; i < n; i++) {
         len = INTEGER(nchars)[i];
         if(len == NA_INTEGER || len < 0)
             error(_("invalid value for '%s'"), "nchar");

Jeff
>
> Jeff
>
>> Duncan Murdoch
>>
>>
>> It could use some work as I'd
>>> also like to add some text about using nchar() to find the length of
>>> the string that readchar() returns, but I'm unsure which of
>>> type="bytes" or type="chars" to mention. Is it type="chars"?
>>>
>>> Index: src/library/base/man/readChar.Rd
>>> ===================================================================
>>> --- src/library/base/man/readChar.Rd	(revision 41943)
>>> +++ src/library/base/man/readChar.Rd	(working copy)
>>> @@ -57,8 +57,8 @@
>>>  }
>>>
>>>  \value{
>>> -  For \code{readChar}, a character vector of length the number of
>>> -  items read (which might be less than \code{length(nchars)}).
>>> +  For \code{readChar}, a character vector of length 1 with the number
>>> +  of characters less than or equal to nchars.
>>>
>>>    For \code{writeChar}, a raw vector (if \code{con} is a raw vector) or
>>>    invisibly \code{NULL}.
>>>
>>>
>>> Jeff
>
>

-- 
http://biostat.mc.vanderbilt.edu/JeffreyHorner

______________________________________________
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel