THanks. Sorry the last section was supposed be
streams.par.foreach { nameAndStream =>
nameAndStream._2.foreachRDD { rdd =>
df = sqlContext.jsonRDD(rdd)
df.insertInto(stream._1)
}
}
ssc.start()
On Fri, Jul 24, 2015 at 10:39 AM, Dean Wampler
wrote:
> You don't need the "par" (paral
You don't need the "par" (parallel) versions of the Scala collections,
actually, Recall that you are building a pipeline in the driver, but it
doesn't start running cluster tasks until ssc.start() is called, at which
point Spark will figure out the task parallelism. In fact, you might as
well do th
Hello,
So I have about 500 Spark Streams and I want to know the fastest and most
reliable way to process each of them. Right now, I am creating and process
them in a list:
val ssc = new StreamingContext(sc, Minutes(10))
val streams = paths.par.map { nameAndPath =>
(path._1, ssc.textFileStream