Re: [Rd] inflate zlib compressed data using base R or CRAN package?

Henrik Bengtsson Fri, 29 Nov 2013 01:38:58 -0800

On Thu, Nov 28, 2013 at 4:48 PM, Simon Urbanek
<simon.urba...@r-project.org> wrote:
> On Nov 27, 2013, at 8:30 PM, Murray Stokely <mur...@stokely.org> wrote:
>
>> I think none of these examples describe a zlib compressed data block inside 
>> a binary file that the OP asked about, as all of your examples are e.g. 
>> prepending gzip or zip headers.
>>
>> Greg, is memDecompress what you are looking for?
>>
>
> I think so.
>
> But this is interesting — I think the documentation of 
> memCompress/memDecompress is not quite correct and the parameters are 
> misleading. Although it does mention the gzip headers, it is incorrect since 
> zlib format is not a subset of the gzip format (albeit they use the same 
> compression method), so you cannot extract gzip content using zlib 
> decompression - you’ll get internal error -3 in memDecompress(2) if you try 
> it since it expects the zlib header which is different from the gzip one.


Interesting.  Just to make sure: are you 100% certain about this?
From http://svn.r-project.org/R/trunk/src/main/connections.c:

    case 2: /* gzip */
    {
        uLong inlen = LENGTH(from), outlen = 3*inlen;
        int res;
        Bytef *buf, *p = (Bytef *)RAW(from);
        /* we check for a file header */
        if (p[0] == 0x1f && p[1] == 0x8b) { p += 2; inlen -= 2; }
        while(1) {
            buf = (Bytef *) R_alloc(outlen, sizeof(Bytef));
            res = uncompress(buf, &outlen, p, inlen);
            if(res == Z_BUF_ERROR) { outlen *= 2; continue; }
            if(res == Z_OK) break;
            error("internal error %d in memDecompress(%d)", res, type);
        }
        ans = allocVector(RAWSXP, outlen);
        memcpy(RAW(ans), buf, outlen);
        break;
    }

That code looks for the 0x1F 0x8B magic number, which is the one for
gzip [http://www.gzip.org/zlib/rfc-gzip.html#header-trailer].  Or are
you saying that that if statement is incorrect?  (Disclaimer: I don't
know much about gzip/zlib, but I happen to recognize that gzip magic
number.)
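
To make the header difference concrete, here is a minimal C sketch (an
illustration only, not code from R or zlib; the names looks_like_zlib,
looks_like_gzip, and classify are hypothetical) that tells the two
framings apart from their first bytes, following RFC 1950 for zlib and
RFC 1952 for gzip. It also suggests why skipping only the two gzip magic
bytes is probably not enough: what follows is the rest of the gzip header
(CM, FLG, MTIME, ...), which normally does not pass the zlib header
check, and that would be consistent with the -3 (Z_DATA_ERROR) Simon
reports.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper names for illustration; not part of R or zlib. */
enum stream_kind { KIND_UNKNOWN = 0, KIND_ZLIB, KIND_GZIP };

/* RFC 1950 (zlib): byte 0 is CMF, byte 1 is FLG.
 *  - low nibble of CMF is the method, 8 = deflate
 *  - high nibble of CMF (CINFO) must be <= 7
 *  - CMF * 256 + FLG must be a multiple of 31 */
int looks_like_zlib(const unsigned char *p, size_t n)
{
    assert(p != NULL);
    if (n < 2) return 0;
    if ((p[0] & 0x0f) != 8) return 0;
    if ((p[0] >> 4) > 7) return 0;
    return ((unsigned)p[0] * 256u + p[1]) % 31u == 0;
}

/* RFC 1952 (gzip): ID1 = 0x1f, ID2 = 0x8b, CM = 8 (deflate). */
int looks_like_gzip(const unsigned char *p, size_t n)
{
    assert(p != NULL);
    return n >= 3 && p[0] == 0x1f && p[1] == 0x8b && p[2] == 8;
}

/* Classify a buffer by its leading bytes only; the payload is not inspected. */
enum stream_kind classify(const unsigned char *p, size_t n)
{
    if (looks_like_gzip(p, n)) return KIND_GZIP;
    if (looks_like_zlib(p, n)) return KIND_ZLIB;
    return KIND_UNKNOWN;
}
```

For example, the common zlib header bytes 0x78 0x9c classify as zlib, a
buffer starting 0x1f 0x8b 0x08 classifies as gzip, and the same gzip
bytes with the first two skipped (0x08 0x00 ...) classify as neither.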

/Henrik

> So “gzip” in type is a misnomer - it should say “zlib” since it can neither 
> read nor write the gzip format. Also the documentation should make it clear 
> since it’s pointless to try to use this on gzip contents. The better 
> alternative would be to support both gzip and zlib since R can deal with both 
> — the issue is that it will break code that used type=“gzip” explicitly to 
> mean “zlib” so I’m not sure there is a good way out.
>
> Cheers,
> Simon
>
>
>>
>> On Wed, Nov 27, 2013 at 5:22 PM, Dirk Eddelbuettel <e...@debian.org> wrote:
>>
>>>
>>> On 27 November 2013 at 18:38, Dirk Eddelbuettel wrote:
>>> |
>>> | On 27 November 2013 at 23:49, Dr Gregory Jefferis wrote:
>>> | | I have a binary file type that includes a zlib compressed data block
>>> | | (i.e. not gzip). Is anyone aware of a way using base R or a CRAN package to
>>> | | decompress this kind of data (from disk or memory). So far I have found
>>> | | Rcompression::decompress on omegahat, but I would prefer to keep
>>> | | dependencies on CRAN (or bioconductor). I am also trying to avoid
>>> | | writing yet another C level interface to part of zlib.
>>> |
>>> | Unless I am missing something, this is in base R; see help(connections).
>>> |
>>> | Here is a quick demo:
>>> |
>>> | R> write.csv(trees, file="/tmp/trees.csv")    # data we all have
>>> | R> system("gzip -v /tmp/trees.csv")           # as I am lazy here
>>> | /tmp/trees.csv:        50.5% -- replaced with /tmp/trees.csv.gz
>>> | R> read.csv(gzfile("/tmp/trees.csv.gz"))      # works out of the box
>>>
>>> Oh, and in case you meant zip file containing a data file, that also works.
>>>
>>> First converting what I did last
>>>
>>> edd@max:/tmp$ gunzip trees.csv.gz
>>> edd@max:/tmp$ zip trees.zip trees.csv
>>>  adding: trees.csv (deflated 50%)
>>> edd@max:/tmp$
>>>
>>> Then reading the csv from inside the zip file:
>>>
>>> R> read.csv(unz("/tmp/trees.zip", "trees.csv"))
>>>    X Girth Height Volume
>>> 1   1   8.3     70   10.3
>>> 2   2   8.6     65   10.3
>>> 3   3   8.8     63   10.2
>>> 4   4  10.5     72   16.4
>>> 5   5  10.7     81   18.8
>>> 6   6  10.8     83   19.7
>>> 7   7  11.0     66   15.6
>>> 8   8  11.0     75   18.2
>>> 9   9  11.1     80   22.6
>>> 10 10  11.2     75   19.9
>>> 11 11  11.3     79   24.2
>>> 12 12  11.4     76   21.0
>>> 13 13  11.4     76   21.4
>>> 14 14  11.7     69   21.3
>>> 15 15  12.0     75   19.1
>>> 16 16  12.9     74   22.2
>>> 17 17  12.9     85   33.8
>>> 18 18  13.3     86   27.4
>>> 19 19  13.7     71   25.7
>>> 20 20  13.8     64   24.9
>>> 21 21  14.0     78   34.5
>>> 22 22  14.2     80   31.7
>>> 23 23  14.5     74   36.3
>>> 24 24  16.0     72   38.3
>>> 25 25  16.3     77   42.6
>>> 26 26  17.3     81   55.4
>>> 27 27  17.5     82   55.7
>>> 28 28  17.9     80   58.3
>>> 29 29  18.0     80   51.5
>>> 30 30  18.0     80   51.0
>>> 31 31  20.6     87   77.0
>>> R>
>>>
>>> Regards, Dirk
>>>
>>> --
>>> Dirk Eddelbuettel | e...@debian.org | http://dirk.eddelbuettel.com
>>>
>>> ______________________________________________
>>> R-devel@r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-devel
>>>

