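import org.apache.spark.streaming.{Minutes, StreamingContext}

// sc is an existing SparkContext; streamPaths is the collection of S3 directory paths
val ssc = new StreamingContext(sc, Minutes(10))

// 500 textFile streams watching S3 directories
val streams = streamPaths.par.map { path =>
  ssc.textFileStream(path)
}

streams.par.foreach { stream =>
  stream.foreachRDD { rdd =>
    // do something
  }
}

ssc.start()
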
Would something like this scale? What would be the limiting factor for
performance? What is the best way to parallelize this? Any other ideas on
the design?
