In which deployment environment are you running the streaming programs? Standalone? In that case you have to specify the max cores for each application; otherwise all of the cluster's resources may get consumed by a single application. http://spark.apache.org/docs/latest/spark-standalone.html
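For example, in code (a minimal sketch, assuming a standalone cluster; the app name and the cap of 2 cores are illustrative, so size the cap to your cluster):

import org.apache.spark.SparkConf

// Cap the total cores this application may take on the standalone
// cluster, so the scheduler has cores left over for a second app.
// Without a cap, a standalone app grabs all available cores by default.
val conf = new SparkConf()
  .setAppName("WordCountsWriter") // hypothetical app name
  .set("spark.cores.max", "2")    // illustrative cap

The same cap can also be set at submit time with spark-submit's --total-executor-cores option on a standalone cluster.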
TD

On Thu, Jul 24, 2014 at 4:57 PM, Barnaby <bfa...@outlook.com> wrote:
> I have the streaming program writing sequence files. I can find one of the
> files and load it in the shell using:
>
> scala> val rdd = sc.sequenceFile[String,
> Int]("tachyon://localhost:19998/files/WordCounts/20140724-213930")
> 14/07/24 21:47:50 INFO storage.MemoryStore: ensureFreeSpace(32856) called
> with curMem=0, maxMem=309225062
> 14/07/24 21:47:50 INFO storage.MemoryStore: Block broadcast_0 stored as
> values to memory (estimated size 32.1 KB, free 294.9 MB)
> rdd: org.apache.spark.rdd.RDD[(String, Int)] = MappedRDD[1] at sequenceFile
> at <console>:12
>
> So I got some type information, seems good.
>
> It took a while to research but I got the following streaming code to
> compile and run:
>
> val wordCounts = ssc.fileStream[String, Int, SequenceFileInputFormat[String,
> Int]](args(0))
>
> It works now and I offer this for reference to anybody else who may be
> curious about saving sequence files and then streaming them back in.
>
> Question:
> When running both streaming programs at the same time using spark-submit I
> noticed that only one app would really run. To get the one app to continue I
> had to stop the other app. Is there a way to get these running
> simultaneously?
>
> --
> View this message in context:
> http://apache-spark-user-list.1001560.n3.nabble.com/streaming-sequence-files-tp10557p10620.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.
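For anyone following along, here is a minimal, self-contained sketch of the reading side. The fileStream call is taken from Barnaby's snippet above; the object name, 10-second batch interval, core cap, and print() output action are assumptions for illustration:

import org.apache.hadoop.mapreduce.lib.input.SequenceFileInputFormat
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

object SequenceFileStreamReader {
  def main(args: Array[String]): Unit = {
    // Cap cores so the writing app can run on the same standalone
    // cluster at the same time (see the reply above).
    val conf = new SparkConf()
      .setAppName("SequenceFileStreamReader") // hypothetical name
      .set("spark.cores.max", "2")            // illustrative cap
    val ssc = new StreamingContext(conf, Seconds(10)) // assumed interval

    // Watch the directory in args(0) for new sequence files of
    // (String, Int) pairs, as in the snippet quoted above.
    val wordCounts = ssc.fileStream[String, Int,
      SequenceFileInputFormat[String, Int]](args(0))

    wordCounts.print() // assumed output action, for demonstration
    ssc.start()
    ssc.awaitTermination()
  }
}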