Something like this? This example uses ZLIB decompression (java.util.zip.Inflater);
in your case you can swap in the GZIP decompression logic instead.
import java.io.ByteArrayOutputStream
import java.util.zip.Inflater

compressedStream.map { x =>
  // Decompress the ZLIB-compressed payload of each record.
  val inflater = new Inflater()
  inflater.setInput(x.getPayload)
  val buffer = new Array[Byte](1024)
  val out = new ByteArrayOutputStream()
  // Keep inflating until the whole stream has been consumed;
  // a fixed-size output buffer of payload.length * 2 is not safe,
  // since the decompressed data can be arbitrarily larger.
  while (!inflater.finished()) {
    val count = inflater.inflate(buffer)
    out.write(buffer, 0, count)
  }
  inflater.end()
  new String(out.toByteArray, "UTF-8")
}
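For the GZIP case specifically, a minimal sketch using java.util.zip.GZIPInputStream (assuming the field arrives as a plain Array[Byte]; `getPayload` here stands in for whatever accessor your record type provides):

```scala
import java.io.ByteArrayInputStream
import java.util.zip.GZIPInputStream
import scala.io.Source

// Decompress a GZIP-compressed byte array back into a String.
def gunzip(payload: Array[Byte]): String = {
  val in = new GZIPInputStream(new ByteArrayInputStream(payload))
  try Source.fromInputStream(in, "UTF-8").mkString
  finally in.close()
}

// In the Spark job (hypothetical accessor):
// val jsonStream = compressedStream.map(x => gunzip(x.getPayload))
```

Since `gunzip` only touches serializable JDK classes, it can be shipped to the executors inside the map closure without any special handling.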
Thanks
Best Regards
On Wed, Dec 16, 2015 at 10:02 PM, Eran Witkon <[email protected]> wrote:
> Hi,
> I have a few JSON files in which one of the fields is a binary field: this
> field is the output of running GZIP on a JSON stream and storing the
> compressed result in the binary field.
>
> Now I want to decompress the field and get the output JSON.
> I was thinking of running a map operation and passing a function to it
> which will decompress each JSON file.
> The function will find the right field in the outer JSON and then
> run gunzip on it.
>
> 1) Is this a valid practice for a Spark map job?
> 2) Any pointers on how to do that?
>
> Eran
>