Hi,

I have a Java parser built on GSON, packaged as a library (e.g. messageparserLib.jar). I use this library in Spark Streaming to parse incoming JSON messages, but it is very slow and introduces a lot of lag when parsing messages and inserting them into Cassandra. What is a fast way to parse JSON messages in Spark on the fly? My JSON messages are complex: I want to extract over 30 fields, wrap them in a case class, and store them in Cassandra in a structured format.

A few candidate solutions come to mind:

(1) Use Spark SQL to register a temp table, then select the fields I want and wrap them in the case class.
(2) Use Scala's standard library, e.g. "scala.util.parsing.json.JSON.parseFull", to parse the messages and extract the fields into the case class.
(3) Use a third-party library such as play-json or lift-json to parse the messages and extract the fields into the case class.

The JSON messages come from a Kafka consumer at over 1,500 messages per second, so the whole pipeline (parsing plus the write to Cassandra) also needs to keep up with that rate (1,500/second).
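To make option (2) concrete, here is a minimal sketch of what I mean, using only the Scala standard library's JSON.parseFull (available without extra dependencies in Scala 2.10; in later versions it lives in the scala-parser-combinators module). The case class and field names here are just placeholders, not my real 30+ field schema:

```scala
import scala.util.parsing.json.JSON

// Placeholder case class standing in for the real 30+ field schema
case class Message(id: String, user: String, score: Double)

// parseFull returns Option[Any]; numbers come back as Double,
// objects as Map[String, Any], so we pattern-match and cast.
def parse(json: String): Option[Message] =
  JSON.parseFull(json).collect {
    case m: Map[String @unchecked, Any @unchecked] =>
      Message(
        m("id").toString,
        m("user").toString,
        m("score").asInstanceOf[Double]
      )
  }

val sample = """{"id": "42", "user": "jerry", "score": 3.5}"""
println(parse(sample))
```

This would then be applied per record inside a map over the DStream before calling saveToCassandra. My worry is whether this parser is fast enough at 1,500 messages/second, since it builds an intermediate Map[String, Any] for every message.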
I would appreciate any help and advice. Thanks in advance.

Jerry

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Fast-way-to-parse-JSON-in-Spark-tp26306.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.