Hello,

I have written a Go program that downloads a 5 GB gzip-compressed CSV from Amazon 
S3, decompresses it, and uploads the decompressed CSV (~20 GB) back to Amazon S3.

The AWS SDK for Go provides a concurrent uploader/downloader (s3manager), and 
I use a multithreaded approach to download files in parallel, decompress, and 
upload. The program seems to work fine, but I believe it could be optimized 
further. Not all cores are busy even though I parallelize across the number of 
available CPUs: CPU usage is only around 30-40%, and I see I/O wait of roughly 
30-40% as well.

The download is fast and the decompression takes 5-6 minutes, but the uploads, 
although they run in parallel, take almost an hour for a set of 8 files.

For decompression, I use:

reader, err := gzip.NewReader(gzipfile)
// handle err
writer, err := os.Create(outputFile)
// handle err
_, err = io.Copy(writer, reader)

I use a 16-CPU, 122 GB RAM, 500 GB SSD instance.

Are there any other approaches I could use to optimize the decompression part 
and the upload part?

I am pretty new to Go. Any guidance is very much appreciated.

Regards
Mukund