On Thu, Nov 28, 2013 at 4:48 PM, Simon Urbanek <simon.urba...@r-project.org> wrote:
> On Nov 27, 2013, at 8:30 PM, Murray Stokely <mur...@stokely.org> wrote:
>
>> I think none of these examples describe a zlib compressed data block inside
>> a binary file that the OP asked about, as all of your examples are e.g.
>> prepending gzip or zip headers.
>>
>> Greg, is memDecompress what you are looking for?
>>
>
> I think so.
>
> But this is interesting - I think the documentation of
> memCompress/memDecompress is not quite correct and the parameters are
> misleading. Although it does mention the gzip headers, it is incorrect since
> zlib format is not a subset of the gzip format (albeit they use the same
> compression method), so you cannot extract gzip content using zlib
> decompression - you’ll get internal error -3 in memDecompress(2) if you try
> it since it expects the zlib header which is different from the gzip one.
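To make the header difference described above concrete, here is a minimal C sketch that tells the two formats apart from the first two bytes of a buffer, following RFC 1950 (zlib) and RFC 1952 (gzip). It is not R's code: the names `sniff_format` and `stream_format` are made up for illustration, and a two-byte check is only a heuristic, since arbitrary binary data can happen to pass it.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical names for illustration; not part of R or zlib. */
typedef enum { FMT_UNKNOWN = 0, FMT_GZIP, FMT_ZLIB } stream_format;

/* Classify a buffer by its first two bytes.
 *
 * gzip (RFC 1952): fixed magic bytes 0x1f 0x8b.
 * zlib (RFC 1950): a CMF/FLG pair where
 *   - the low nibble of CMF is 8 (compression method "deflate"),
 *   - the high nibble of CMF (CINFO) is at most 7,
 *   - CMF * 256 + FLG is a multiple of 31.
 */
stream_format sniff_format(const uint8_t *p, size_t len)
{
    if (p == NULL || len < 2)
        return FMT_UNKNOWN;

    if (p[0] == 0x1f && p[1] == 0x8b)
        return FMT_GZIP;

    unsigned cmf = p[0];
    unsigned flg = p[1];
    if ((cmf & 0x0f) == 8 && (cmf >> 4) <= 7 && ((cmf << 8) | flg) % 31 == 0)
        return FMT_ZLIB;

    return FMT_UNKNOWN;
}
```

A caller would pass the start of the compressed block, e.g. `sniff_format(buf, n)`, and choose a decoder from the result. The gzip magic can never be mistaken for a zlib header here, because the low nibble of 0x1f is 15 rather than 8.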
Interesting. Just to make sure: are you 100% certain about this?
From http://svn.r-project.org/R/trunk/src/main/connections.c:

    case 2: /* gzip */
    {
        uLong inlen = LENGTH(from), outlen = 3*inlen;
        int res;
        Bytef *buf, *p = (Bytef *)RAW(from);
        /* we check for a file header */
        if (p[0] == 0x1f && p[1] == 0x8b) { p += 2; inlen -= 2; }
        while(1) {
            buf = (Bytef *) R_alloc(outlen, sizeof(Bytef));
            res = uncompress(buf, &outlen, p, inlen);
            if(res == Z_BUF_ERROR) { outlen *= 2; continue; }
            if(res == Z_OK) break;
            error("internal error %d in memDecompress(%d)", res, type);
        }
        ans = allocVector(RAWSXP, outlen);
        memcpy(RAW(ans), buf, outlen);
        break;
    }

That code looks for the 0x1F 0x8B magic number, which is the one for
gzip [http://www.gzip.org/zlib/rfc-gzip.html#header-trailer]. Or are
you saying that that if statement is incorrect? (Disclaimer: I don't
know much about gzip/zlib, but I happen to recognize that gzip magic
number.)

/Henrik

> So “gzip” in type is a misnomer - it should say “zlib” since it can neither
> read nor write the gzip format. Also the documentation should make it clear
> since it’s pointless to try to use this on gzip contents. The better
> alternative would be to support both gzip and zlib since R can deal with both
> - the issue is that it will break code that used type="gzip" explicitly to
> mean “zlib” so I’m not sure there is a good way out.
>
> Cheers,
> Simon
>
>
>>
>> On Wed, Nov 27, 2013 at 5:22 PM, Dirk Eddelbuettel <e...@debian.org> wrote:
>>
>>>
>>> On 27 November 2013 at 18:38, Dirk Eddelbuettel wrote:
>>> |
>>> | On 27 November 2013 at 23:49, Dr Gregory Jefferis wrote:
>>> | | I have a binary file type that includes a zlib compressed data block (ie
>>> | | not gzip). Is anyone aware of a way using base R or a CRAN package to
>>> | | decompress this kind of data (from disk or memory). So far I have found
>>> | | Rcompression::decompress on omegahat, but I would prefer to keep
>>> | | dependencies on CRAN (or bioconductor).
>>> | | I am also trying to avoid
>>> | | writing yet another C level interface to part of zlib.
>>> |
>>> | Unless I am missing something, this is in base R; see help(connections).
>>> |
>>> | Here is a quick demo:
>>> |
>>> | R> write.csv(trees, file="/tmp/trees.csv")   # data we all have
>>> | R> system("gzip -v /tmp/trees.csv")          # as I am lazy here
>>> | /tmp/trees.csv: 50.5% -- replaced with /tmp/trees.csv.gz
>>> | R> read.csv(gzfile("/tmp/trees.csv.gz"))     # works out of the box
>>>
>>> Oh, and in case you meant zip file containing a data file, that also works.
>>>
>>> First converting what I did last
>>>
>>> edd@max:/tmp$ gunzip trees.csv.gz
>>> edd@max:/tmp$ zip trees.zip trees.csv
>>>   adding: trees.csv (deflated 50%)
>>> edd@max:/tmp$
>>>
>>> Then reading the csv from inside the zip file:
>>>
>>> R> read.csv(unz("/tmp/trees.zip", "trees.csv"))
>>>     X Girth Height Volume
>>> 1   1   8.3     70   10.3
>>> 2   2   8.6     65   10.3
>>> 3   3   8.8     63   10.2
>>> 4   4  10.5     72   16.4
>>> 5   5  10.7     81   18.8
>>> 6   6  10.8     83   19.7
>>> 7   7  11.0     66   15.6
>>> 8   8  11.0     75   18.2
>>> 9   9  11.1     80   22.6
>>> 10 10  11.2     75   19.9
>>> 11 11  11.3     79   24.2
>>> 12 12  11.4     76   21.0
>>> 13 13  11.4     76   21.4
>>> 14 14  11.7     69   21.3
>>> 15 15  12.0     75   19.1
>>> 16 16  12.9     74   22.2
>>> 17 17  12.9     85   33.8
>>> 18 18  13.3     86   27.4
>>> 19 19  13.7     71   25.7
>>> 20 20  13.8     64   24.9
>>> 21 21  14.0     78   34.5
>>> 22 22  14.2     80   31.7
>>> 23 23  14.5     74   36.3
>>> 24 24  16.0     72   38.3
>>> 25 25  16.3     77   42.6
>>> 26 26  17.3     81   55.4
>>> 27 27  17.5     82   55.7
>>> 28 28  17.9     80   58.3
>>> 29 29  18.0     80   51.5
>>> 30 30  18.0     80   51.0
>>> 31 31  20.6     87   77.0
>>> R>
>>>
>>> Regards, Dirk
>>>
>>> --
>>> Dirk Eddelbuettel | e...@debian.org | http://dirk.eddelbuettel.com
>>>
>>> ______________________________________________
>>> R-devel@r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-devel
>>>
>>
>> [[alternative HTML version deleted]]
>>
>> ______________________________________________
>> R-devel@r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-devel