Something like this? This example uses ZLIB decompression (java.util.zip.Inflater);
in your case you can swap in the GZIP decompression logic instead.
import java.io.ByteArrayOutputStream
import java.util.zip.Inflater

compressedStream.map { x =>
  // Decompress the ZLIB-compressed payload of each record.
  val inflater = new Inflater()
  inflater.setInput(x.getPayload)
  val buffer = new Array[Byte](1024)
  val out = new ByteArrayOutputStream()
  // Keep inflating until the whole stream has been consumed;
  // a fixed-size output buffer of payload.length * 2 is not safe,
  // since the decompressed data can be arbitrarily larger.
  while (!inflater.finished()) {
    val count = inflater.inflate(buffer)
    out.write(buffer, 0, count)
  }
  inflater.end()
  new String(out.toByteArray, "UTF-8")
}
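For the GZIP case specifically, a minimal sketch using java.util.zip.GZIPInputStream (assuming the field arrives as a plain Array[Byte]; `getPayload` here stands in for whatever accessor your record type provides):

```scala
import java.io.ByteArrayInputStream
import java.util.zip.GZIPInputStream
import scala.io.Source

// Decompress a GZIP-compressed byte array back into a String.
def gunzip(payload: Array[Byte]): String = {
  val in = new GZIPInputStream(new ByteArrayInputStream(payload))
  try Source.fromInputStream(in, "UTF-8").mkString
  finally in.close()
}

// In the Spark job (hypothetical accessor):
// val jsonStream = compressedStream.map(x => gunzip(x.getPayload))
```

Since `gunzip` only touches serializable JDK classes, it can be shipped to the executors inside the map closure without any special handling.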
Thanks
Best Regards
On Wed, Dec 16, 2015 at 10:02 PM, Eran Witkon <[email protected]> wrote:
> Hi,
> I have a few JSON files in which one of the fields is a binary field: this
> field is the output of running GZIP on a JSON stream and storing the
> compressed result in the binary field.
>
> Now I want to decompress the field and get the output JSON.
> I was thinking of running a map operation and passing a function to it
> which will decompress each JSON file.
> The function will find the right field in the outer JSON and then
> run gunzip on it.
>
> 1) Is this a valid practice for a Spark map job?
> 2) Any pointers on how to do that?
>
> Eran
>