Hi All,
Processing streaming JSON files with Spark features (Spark Streaming and
Spark SQL) is very efficient and works like a charm.
Below is the code snippet I use to process the JSON files:
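windowDStream.foreachRDD((incomingFiles, time) => {
  // Infer a schema from the JSON records in this batch and expose them to SQL
  val incomingFilesTable = sqlContext.jsonRDD(incomingFiles)
  incomingFilesTable.registerAsTable("IncomingFilesTable")
  val result = sqlContext.sql("select text from IncomingFilesTable")
  // Save from the executors instead of collecting to the driver and
  // re-parallelizing; suffix the path with the batch time so each batch
  // writes a new directory (saveAsTextFile fails if the directory exists)
  result.saveAsTextFile("filepath-" + time.milliseconds)
})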
But I find it difficult to use Spark features efficiently with streaming
XML files (each compressed file is 4 MB).
What is the best approach for processing compressed XML files?
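One possible approach, sketched here under assumptions rather than as a tested solution: decompress each XML file as a whole, parse it with a DOM parser, and pull out the fields you need, much as the JSON query selects `text`. The sketch uses only the JDK (`GZIPInputStream`, `DocumentBuilderFactory`). The object `CompressedXmlSketch`, the helpers `extractText` and `gzip`, gzip as the compression format, and a `<text>` element are all hypothetical. In Spark it could be applied per file, for example `sc.binaryFiles(dir).flatMap { case (_, stream) => CompressedXmlSketch.extractText(stream.toArray) }`, assuming a Spark version that provides `SparkContext.binaryFiles` (added in 1.2) and a hypothetical input directory `dir`.

```scala
import java.io.{ByteArrayInputStream, ByteArrayOutputStream}
import java.util.zip.{GZIPInputStream, GZIPOutputStream}
import javax.xml.parsers.DocumentBuilderFactory

object CompressedXmlSketch {
  // Hypothetical helper: assumes each file is gzip-compressed XML and that the
  // wanted values live in <text> elements. Decompresses the whole file, parses
  // it into a DOM, and returns the text content of every <text> element.
  def extractText(gzippedXml: Array[Byte]): Seq[String] = {
    val in = new GZIPInputStream(new ByteArrayInputStream(gzippedXml))
    try {
      val doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().parse(in)
      val nodes = doc.getElementsByTagName("text")
      (0 until nodes.getLength).map(i => nodes.item(i).getTextContent)
    } finally in.close()
  }

  // Hypothetical helper for the usage example: gzip a string in memory.
  def gzip(s: String): Array[Byte] = {
    val bos = new ByteArrayOutputStream()
    val gz = new GZIPOutputStream(bos)
    gz.write(s.getBytes("UTF-8"))
    gz.close()
    bos.toByteArray
  }

  def main(args: Array[String]): Unit = {
    val xml = "<records><record><text>hello</text></record>" +
      "<record><text>world</text></record></records>"
    println(extractText(gzip(xml)).mkString(","))
  }
}
```

Running `main` prints `hello,world`. Parsing whole files into a DOM is reasonable at about 4 MB each; much larger files would favor a streaming parser such as StAX (`javax.xml.stream`).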
Regards
Vijay