Something like this? This example uses ZLIB compression; you can replace the decompression logic with a GZIP one in your case.
compressedStream.map(x => {
  // assumes java.util.zip.Inflater is imported
  val inflater = new Inflater()
  inflater.setInput(x.getPayload)
  // temporary buffer; keep inflating until no more data is produced
  val decompressedData = new Array[Byte](x.getPayload.size * 2)
  var count = inflater.inflate(decompressedData)
  var finalData = decompressedData.take(count)
  while (count > 0) {
    count = inflater.inflate(decompressedData)
    finalData = finalData ++ decompressedData.take(count)
  }
  inflater.end()
  new String(finalData)
})

Thanks
Best Regards

On Wed, Dec 16, 2015 at 10:02 PM, Eran Witkon <eranwit...@gmail.com> wrote:
> Hi,
> I have a few JSON files in which one of the fields is a binary field - this
> field is the output of running GZIP on a JSON stream and compressing it into
> the binary field.
>
> Now I want to decompress the field and get the output JSON.
> I was thinking of running a map operation and passing a function to the map
> operation which will decompress each JSON file.
> The above function will find the right field in the outer JSON and then
> run GUNZIP on it.
>
> 1) Is this a valid practice for a Spark map job?
> 2) Any pointers on how to do that?
>
> Eran
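
For the GZIP field described in the quoted question, a rough, untested sketch of the same map using java.util.zip.GZIPInputStream could look like the following (it assumes the compressed bytes are exposed by the same x.getPayload accessor used in the ZLIB example above):

import java.io.ByteArrayInputStream
import java.util.zip.GZIPInputStream

compressedStream.map(x => {
  // wrap the GZIP-compressed bytes in a stream and read it out in chunks
  val gzipStream = new GZIPInputStream(new ByteArrayInputStream(x.getPayload))
  val buffer = new Array[Byte](4096)
  var finalData = Array.empty[Byte]
  var count = gzipStream.read(buffer)
  while (count > 0) {
    finalData = finalData ++ buffer.take(count)
    count = gzipStream.read(buffer)
  }
  gzipStream.close()
  new String(finalData)
})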