I'm seeing low performance while parsing JSON data. My cluster setup is
Spark 1.2.0 with 10 nodes, each having 15 GB of memory and 4 cores.
I tried both scala.util.parsing.json.JSON and FasterXML's Jackson
parser.
This is basically what I do:
//Approach 1: Jackson with the Scala module
import com.fasterxml.jackson.databind.ObjectMapper
import com.fasterxml.jackson.module.scala.{DefaultScalaModule, ScalaObjectMapper}

val jsonStream = myDStream.map { x =>
  // Note: this builds a new ObjectMapper for every record
  val mapper = new ObjectMapper() with ScalaObjectMapper
  mapper.registerModule(DefaultScalaModule)
  mapper.readValue[Map[String, Any]](x)
}
jsonStream.count().print()
//Approach 2: Scala's built-in JSON parser
import scala.util.parsing.json.JSON

val jsonStream2 = myDStream.map(
  JSON.parseFull(_).get.asInstanceOf[scala.collection.immutable.Map[String, Any]])
jsonStream2.count().print()
It takes around 15-20 seconds to process/parse the 35k JSON documents
(containing nested documents and arrays) that I put into the stream.
Is there a better approach/parser to process them faster? I also tried it
with mapPartitions, but it did not make any difference.
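
For reference, the mapPartitions variant I tried looked roughly like this,
creating one mapper per partition instead of one per record (jsonStream3 is
just a placeholder name):

//mapPartitions variant (sketch): reuse one mapper per partition
import com.fasterxml.jackson.databind.ObjectMapper
import com.fasterxml.jackson.module.scala.{DefaultScalaModule, ScalaObjectMapper}

val jsonStream3 = myDStream.mapPartitions { records =>
  // One mapper instance shared by all records in this partition
  val mapper = new ObjectMapper() with ScalaObjectMapper
  mapper.registerModule(DefaultScalaModule)
  records.map(record => mapper.readValue[Map[String, Any]](record))
}
jsonStream3.count().print()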
Thanks
Best Regards