So I'm building a program that does the following:

* Data is stored either in a file (CSV or gzipped CSV) or in a Julia 
structure (Array{T,2}, or anything else that supports getindex(A,i,j))
* I need to do a POST request over HTTPS with "Content-Type: text/csv", 
ideally always with "Content-Encoding: gzip"
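For the request itself, this is roughly what I have in mind (just a sketch, assuming the Requests.jl package; `url` and `body` are placeholders):

```julia
# Sketch only: assumes Requests.jl is available.
using Requests

csv_gzip_headers() = Dict("Content-Type"     => "text/csv",
                          "Content-Encoding" => "gzip")

# `body` would be the already-gzipped CSV bytes
post_csv(url, body) = post(url; data = body, headers = csv_gzip_headers())
```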

Challenges:
* Sometimes the data (from files) might be too big to fit in memory, so I 
need some kind of chunked transfer
* For plain CSV files or Julia structures, the data obviously isn't 
gzipped yet
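For the chunked case, I was picturing something like the following (a sketch assuming Libz.jl's ZlibDeflateOutputStream, since GZip.jl only wraps files; `readbytes` reads at most CHUNK bytes, so only one chunk is ever in memory at a time):

```julia
# Sketch: stream a big plain-CSV file through a gzip compressor chunk by
# chunk, assuming Libz.jl (gzip = true makes it emit a gzip header and
# trailer rather than a bare zlib stream).
using Libz

const CHUNK = 1 << 20  # 1 MiB per read; tune as needed

function gzip_stream(src::IO, dest::IO)
    gz = ZlibDeflateOutputStream(dest, gzip = true)
    while !eof(src)
        write(gz, readbytes(src, CHUNK))  # readbytes returns at most CHUNK bytes
    end
    close(gz)  # flushes the gzip trailer into dest
    dest
end
```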


I'm thinking of setting it up as follows:

* create an IOBuffer of min(data/file size, MAX_BUF_SIZE) (MAX_BUF_SIZE 
would be configurable, probably defaulting to 1GB or so)
* if it's a Julia structure, probably use writedlm to get it into CSV 
format, then gzip the IOBuffer somehow?
* if it's a gzipped file, just readall(file) into the IOBuffer
* if it's a plain delimited file, maybe gzip the whole thing and then read 
it in? or read it in in chunks and gzip the chunks?
* call takebuf_string on the IOBuffer and use the result as the body of 
my HTTPS POST request
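For the Julia-structure branch, the whole pipeline could look roughly like this (again assuming Libz.jl for the in-memory gzipping; writedlm and takebuf_string are Base). Writing writedlm's output straight into the compressor should avoid an intermediate uncompressed copy:

```julia
# Sketch: matrix -> CSV -> gzip, all in memory, assuming Libz.jl.
using Libz

function gzipped_csv_body(A::AbstractMatrix)
    buf = IOBuffer()
    gz  = ZlibDeflateOutputStream(buf, gzip = true)
    writedlm(gz, A, ',')   # CSV text goes straight into the compressor
    close(gz)              # flush the gzip trailer into buf
    takebuf_string(buf)    # the raw gzipped bytes for the request body
end
```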


I'm mainly wondering about the soundness of this approach, in particular 
when/how to do the gzipping and, overall, how to avoid copying the data 
as much as possible.

I think it would be nice to be able to do `g = GZipIOBuffer()`, then 
`write(g, data)` and `takebuf_string(g)` to get the raw gzipped data to 
send, but it doesn't look like that's currently set up with GZip.jl (or 
even possible, or a sane idea).
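That said, wrapping an IOBuffer in a deflate stream seems to get pretty close to the GZipIOBuffer I'm imagining, if depending on Libz.jl is acceptable (an assumption, not something GZip.jl provides):

```julia
# Sketch of a GZipIOBuffer-ish pattern via Libz.jl (assumed dependency).
using Libz

buf = IOBuffer()
g   = ZlibDeflateOutputStream(buf, gzip = true)  # behaves like a "GZipIOBuffer"
write(g, "some,csv,data\n")
close(g)                    # finalize the gzip stream
body = takebuf_string(buf)  # raw gzipped bytes to send
```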
