fileStream is not designed to work with a continuously updating file. One of
the main design goals of Spark is immutability (to guarantee fault tolerance
by recomputation), and a file that is being appended to (mutated) defeats
that. Rather, fileStream is designed to pick up new files that are added
atomically to the monitored directory (using a move or rename, so each file
appears complete in a single operation).
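To illustrate that pattern, here is a minimal, self-contained sketch (not from the original thread): it monitors a directory for files that are moved in atomically and prints a line count per batch. The directory path /data/incoming, the class/app name, and the 10-second batch interval are illustrative assumptions.

import org.apache.spark.SparkConf;
import org.apache.spark.streaming.Duration;
import org.apache.spark.streaming.api.java.JavaDStream;
import org.apache.spark.streaming.api.java.JavaStreamingContext;

public class DirectoryWatch {
  public static void main(String[] args) throws Exception {
    SparkConf conf = new SparkConf().setAppName("DirectoryWatch").setMaster("local[2]");
    // 10-second batch interval (illustrative)
    JavaStreamingContext jssc = new JavaStreamingContext(conf, new Duration(10000));

    // Each batch picks up only files that were atomically moved/renamed into
    // this directory since the last batch; appends to existing files are not seen.
    JavaDStream<String> lines = jssc.textFileStream("/data/incoming");
    lines.count().print();

    jssc.start();
    jssc.awaitTermination();
  }
}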
Hi Akil,
It didn't work. Here is the code...
package com.paypal;
import org.apache.spark.SparkConf;
import org.apache.spark.storage.StorageLevel;
import org.apache.spark.streaming.api.java.JavaPairInputDStream;
import org.apache.spark.streaming.api.java.JavaStreamingContext;
import org.apache.sp...
Try this out:
JavaStreamingContext ctx = new JavaStreamingContext(...);

// textFileStream returns a JavaDStream<String> of the lines of new files
// that appear in the monitored directory
JavaDStream<String> lines = ctx.textFileStream("whatever");

JavaDStream<String> words = lines.flatMap(
    new FlatMapFunction<String, String>() {
      public Iterable<String> call(String s) {
        return Arrays.asList(s.split(" "));
      }
    });

JavaPairDStream<String, Integer> ones = words
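That last line is cut off in the archive. For completeness, here is a hedged sketch of how the standard streaming word count usually continues; the mapToPair/reduceByKey step below follows the stock Spark example rather than being quoted from the original reply.

// Hypothetical continuation following the standard word-count pattern.
// Additional imports needed: scala.Tuple2,
// org.apache.spark.api.java.function.PairFunction,
// org.apache.spark.api.java.function.Function2,
// org.apache.spark.streaming.api.java.JavaPairDStream
JavaPairDStream<String, Integer> ones = words.mapToPair(
    new PairFunction<String, String, Integer>() {
      public Tuple2<String, Integer> call(String s) {
        return new Tuple2<String, Integer>(s, 1);
      }
    });

JavaPairDStream<String, Integer> counts = ones.reduceByKey(
    new Function2<Integer, Integer, Integer>() {
      public Integer call(Integer a, Integer b) {
        return a + b;
      }
    });

counts.print();
ctx.start();
ctx.awaitTermination();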