On Mon, 05 Aug 2013, Qiang Wang <uns...@gmail.com> writes:

>> On Sat, Aug 3, 2013 at 3:49 PM, Enrico Schumann
>> <e...@enricoschumann.net>wrote:
>>
>>> On Fri, 02 Aug 2013, Qiang Wang <uns...@gmail.com> writes:
>>>
>>> > Hi,
>>> >
>>> > I'm struggling with encode/decode strings in R. Don't know why the second
>>> > example below would fail. Thanks in advance for your help.
>>> > succeed: s <- "saf" x <- base64encode(s) y <- base64decode(x, "character")
>>> > fail: s <- "safs" x <- base64encode(s) y <- base64decode(x, "character")
>>> >
>>>
>>> And the first example works for you?
>>>
>>> require("base64enc")
>>> s <- "saf"
>>> x <- base64encode(s)
>>>
>>> ## Error in file(what, "rb") : cannot open the connection
>>> ## In addition: Warning message:
>>> ## In file(what, "rb") : cannot open file 'saf': No such file or directory
>>>
>>> ?base64encode says that its first argument is
>>>
>>> "data to be encoded/decoded. For ‘base64encode’ it can be a raw
>>> vector, text connection or file name. For ‘base64decode’ it can be
>>> a string or a binary connection."
>>>
>>> Try this:
>>>
>>> rawToChar(base64decode(base64encode(charToRaw("saf"))))
>>>
>>> ## [1] "saf"
>>>
>>> --
>>> Enrico Schumann
>>> Lucerne, Switzerland
>> http://enricoschumann.net
>>
>
> Thanks for your reply!
>
> Sorry I did not clarify that I was using base64encode and base64decode
> functions provide from "caTools" package. It seems that if I convert the
> string to the raw type first, it still solves my problem.
>
> My original problem actually is that I have a string:
> secret <-
> '5Kwug+Byrq+ilULMz3IBD5tquNt5CcdYi3XPc8jnKwtXvIgHw/vcSGU1VCIo4b/OfcRDm7uH359syfhWzXFrNg=='
>
> It was claimed to be encoded in Base64. So I tried to decode it:
>
> require("base64enc")
> rawToChar(base64decode(secret))
>
> Then, I got
> "\xe4\xac.\x83\xe0r\xae\xaf\xa2\x95B\xcc\xcfr\001\017\x9bj\xb8\xdby\t\xc7X\x8bu\xcfs\xc8\xe7+\vW\xbc\x88\a\xc3\xfb\xdcHe5T\"(\xe1\xbf\xce}\xc4C\x9b\xbb\x87ߟl\xc9\xf8V\xcdqk6"
>
> But what I suppose to get is:
> '\xe4\xac.\x83\xe0r\xae\xaf\xa2\x95B\xcc\xcfr\x01\x0f\x9bj\xb8\xdby\t\xc7X\x8bu\xcfs\xc8\xe7+\x0bW\xbc\x88\x07\xc3\xfb\xdcHe5T"(\xe1\xbf\xce}\xc4C\x9b\xbb\x87\xdf\x9fl\xc9\xf8V\xcdqk6'
>
> Most part of the result is correct except several characters near the end.
> I don't know where the problem is.
>
See the help page of 'rawToChar': the function transforms raw bytes
into characters. But, depending on your locale, one character may
consist of more than one byte. On my computer, with a UTF-8 locale (see
my '?sessionInfo' below),

  rawToChar(base64decode(secret), TRUE)

gives me

  ##  [1] "\xe4" "\xac" "."    "\x83" "\xe0" "r"    "\xae"
  ##  [8] "\xaf" "\xa2" "\x95" "B"    "\xcc" "\xcf" "r"
  ## [15] "\001" "\017" "\x9b" "j"    "\xb8" "\xdb" "y"
  ## [22] "\t"   "\xc7" "X"    "\x8b" "u"    "\xcf" "s"
  ## [29] "\xc8" "\xe7" "+"    "\v"   "W"    "\xbc" "\x88"
  ## [36] "\a"   "\xc3" "\xfb" "\xdc" "H"    "e"    "5"
  ## [43] "T"    "\""   "("    "\xe1" "\xbf" "\xce" "}"
  ## [50] "\xc4" "C"    "\x9b" "\xbb" "\x87" "\xdf" "\x9f"
  ## [57] "l"    "\xc9" "\xf8" "V"    "\xcd" "q"    "k"
  ## [64] "6"

That is, every *single* byte is converted into a character. For
example:

  rawToChar(base64decode(secret), TRUE)[55:56]

gives

  ## [1] "\xdf" "\x9f"

which probably is what you expected. But if I paste those two
characters together,

  paste(rawToChar(base64decode(secret), TRUE)[55:56], collapse = "")

they will be shown like so:

  ## [1] "ߟ"

because this is how this byte pattern will be interpreted in UTF-8:
the two bytes together form one valid two-byte character. The decoded
bytes are the same in both cases; only the printed representation
differs.


Abbreviated 'sessionInfo':

R version 3.0.1 (2013-05-16)
Platform: x86_64-pc-linux-gnu (64-bit)

locale:
 [1] LC_CTYPE=en_GB.UTF-8       LC_NUMERIC=C
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_GB.UTF-8
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_GB.UTF-8
 [7] LC_PAPER=C                 LC_NAME=C
 [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

-- 
Enrico Schumann
Lucerne, Switzerland
http://enricoschumann.net

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
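
The byte-versus-character point in the reply can be checked with base R alone. The following is only a minimal sketch: it builds the two bytes by hand (the values 0xdf and 0x9f are copied from elements 55 and 56 of the output in the reply) instead of decoding 'secret', so it does not need base64enc. It assumes R 3.3.0 or later for 'validUTF8', and the variable names are made up for the illustration.

```r
# Bytes 55 and 56 of the decoded secret, typed in by hand (assumption:
# values copied from the printed output, not recomputed here).
bytes <- as.raw(c(0xdf, 0x9f))

# One string per byte: neither byte is valid UTF-8 on its own, which is
# why R prints them as "\xdf" and "\x9f".
single <- rawToChar(bytes, multiple = TRUE)
validUTF8(single)                 # FALSE FALSE

# Both bytes in one string: together they are one valid UTF-8 character.
joined <- rawToChar(bytes)
Encoding(joined) <- "UTF-8"       # mark explicitly, independent of locale
validUTF8(joined)                 # TRUE
utf8ToInt(joined)                 # 2015, i.e. code point U+07DF
nchar(joined, type = "bytes")     # 2
nchar(joined, type = "chars")     # 1

# The bytes themselves are unchanged; only the display differs.
identical(charToRaw(joined), bytes)   # TRUE
```

So when checking a decoded value against a reference, it is probably safer to compare the raw vectors (for instance with 'identical') than the printed strings, since printing depends on the locale.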