subject:"Programmatically launch several hundred Spark Streams in parallel"

Re: Programmatically launch several hundred Spark Streams in parallel

2015-07-24 Thread Brandon White

THanks. Sorry the last section was supposed be streams.par.foreach { nameAndStream => nameAndStream._2.foreachRDD { rdd => df = sqlContext.jsonRDD(rdd) df.insertInto(stream._1) } } ssc.start() On Fri, Jul 24, 2015 at 10:39 AM, Dean Wampler wrote: > You don't need the "par" (paral

Re: Programmatically launch several hundred Spark Streams in parallel

2015-07-24 Thread Dean Wampler

You don't need the "par" (parallel) versions of the Scala collections, actually, Recall that you are building a pipeline in the driver, but it doesn't start running cluster tasks until ssc.start() is called, at which point Spark will figure out the task parallelism. In fact, you might as well do th

Programmatically launch several hundred Spark Streams in parallel

2015-07-24 Thread Brandon White

Hello, So I have about 500 Spark Streams and I want to know the fastest and most reliable way to process each of them. Right now, I am creating and process them in a list: val ssc = new StreamingContext(sc, Minutes(10)) val streams = paths.par.map { nameAndPath => (path._1, ssc.textFileStream

Re: Programmatically launch several hundred Spark Streams in parallel

Re: Programmatically launch several hundred Spark Streams in parallel

Programmatically launch several hundred Spark Streams in parallel

3 matches

Site Navigation

Mail list logo

Footer information