Re: Programmatically launch several hundred Spark Streams in parallel

2015-07-24 Thread Brandon White
Thanks. Sorry, the last section was supposed to be:

streams.par.foreach { nameAndStream =>
  nameAndStream._2.foreachRDD { rdd =>
    val df = sqlContext.jsonRDD(rdd)
    df.insertInto(nameAndStream._1)
  }
}
ssc.start()

Re: Programmatically launch several hundred Spark Streams in parallel

2015-07-24 Thread Dean Wampler
You don't need the "par" (parallel) versions of the Scala collections, actually. Recall that you are building a pipeline in the driver; it doesn't start running cluster tasks until ssc.start() is called, at which point Spark will figure out the task parallelism. In fact, you might as well do th
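A minimal sketch of the plain (non-parallel) setup described above, assuming a hypothetical `paths: Seq[(String, String)]` of (tableName, directory) pairs and a `sc`/`sqlContext` already available in the driver (the `jsonRDD`/`insertInto` calls follow the Spark 1.x API used in this thread):

import org.apache.spark.SparkContext
import org.apache.spark.sql.SQLContext
import org.apache.spark.streaming.{Minutes, StreamingContext}

object ManyStreams {
  def main(args: Array[String]): Unit = {
    val sc         = new SparkContext()        // configured via spark-submit
    val sqlContext = new SQLContext(sc)
    val ssc        = new StreamingContext(sc, Minutes(10))

    // Hypothetical (tableName, inputDirectory) pairs; in practice these
    // would come from configuration.
    val paths: Seq[(String, String)] = Seq(
      ("table_a", "/data/in/a"),
      ("table_b", "/data/in/b")
    )

    // A plain map is enough here: this only *defines* the pipeline in the
    // driver; no cluster work starts until ssc.start() below.
    val streams = paths.map { case (table, dir) =>
      (table, ssc.textFileStream(dir))
    }

    streams.foreach { case (table, stream) =>
      stream.foreachRDD { rdd =>
        val df = sqlContext.jsonRDD(rdd)      // Spark 1.x JSON ingestion
        df.insertInto(table)
      }
    }

    ssc.start()              // Spark schedules tasks across all streams
    ssc.awaitTermination()
  }
}

This is a sketch, not a tested job: the point is simply that registering 500 streams with a sequential `map`/`foreach` costs almost nothing, since the per-batch parallelism is decided by Spark's scheduler after `ssc.start()`.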

Programmatically launch several hundred Spark Streams in parallel

2015-07-24 Thread Brandon White
Hello, So I have about 500 Spark Streams and I want to know the fastest and most reliable way to process each of them. Right now, I am creating and processing them in a list:

val ssc = new StreamingContext(sc, Minutes(10))
val streams = paths.par.map { nameAndPath =>
  (nameAndPath._1, ssc.textFileStream