Hi all,
I am running some benchmarks on a simple Spark application which consists of:
- textFileStream() to extract text records from HDFS files
- map() to parse records into JSON objects
- updateStateByKey() to calculate and store an in-memory state for each key.
The processing time per batch g[...]
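A minimal sketch of that pipeline, assuming the Spark Streaming Scala API (the input path, checkpoint directory, batch interval, and the record parser are all placeholders — the real application parses each record into a JSON object):

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

object StateBenchmark {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("StateBenchmark")
    val ssc  = new StreamingContext(conf, Seconds(10))
    // updateStateByKey requires a checkpoint directory
    ssc.checkpoint("hdfs:///tmp/checkpoints")

    // Picks up new text files appearing under the given HDFS directory
    val lines = ssc.textFileStream("hdfs:///data/incoming")

    // Placeholder parser: stands in for the real JSON parsing step,
    // producing (key, value) pairs
    val events = lines.map(line => (line.takeWhile(_ != ','), 1L))

    // Keep a running per-key total in memory across batches
    val state = events.updateStateByKey[Long] {
      (values: Seq[Long], prev: Option[Long]) =>
        Some(prev.getOrElse(0L) + values.sum)
    }
    state.print()

    ssc.start()
    ssc.awaitTermination()
  }
}
```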
For anybody who's interested in this, here's a link to a PR that addresses
this feature:
https://github.com/apache/spark/pull/2077
(thanks to Todd Nist for sending it to me)
--
View this message in context:
http://apache-spark-user-list.1001560.n3.nabble.com/Visualizing-the-DAG-of-a-Spark-app
Hi all,
We are looking for a tool that would let us visualize the DAG generated by a
Spark application as a simple graph.
This graph would represent the Spark Job, its stages and the tasks inside
the stages, with the dependencies between them (either narrow or shuffle
dependencies).
The Spark Re[...]
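Short of a dedicated visualization tool, one built-in starting point is `RDD.toDebugString`, which prints the lineage as indented text, with each new indentation level marking a shuffle dependency (i.e. a stage boundary). A minimal sketch, assuming an existing SparkContext `sc` and a placeholder input path:

```scala
// Word count whose lineage spans two stages: everything up to the
// shuffle introduced by reduceByKey, then the reduce side.
val counts = sc.textFile("hdfs:///data/input")   // placeholder path
  .flatMap(_.split("\\s+"))
  .map(word => (word, 1))
  .reduceByKey(_ + _)                            // shuffle (wide) dependency

// Prints the DAG as indented text: narrow dependencies stay at the
// same indentation level, shuffle dependencies start a new one.
println(counts.toDebugString)
```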
Hi all,
I am trying to create RDDs from within rdd.foreachPartition() so I can
save these RDDs to Elasticsearch on the fly:

stream.foreachRDD(rdd => {
  rdd.foreachPartition { iterator =>
    val sc = rdd.context
    iterator.foreach {
      case (cid, [...]
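One reason this pattern breaks is that rdd.context returns the SparkContext, which only exists on the driver; inside foreachPartition the closure runs on the executors, so new RDDs cannot be created there. A common alternative — sketched here assuming the elasticsearch-hadoop connector is on the classpath, with a placeholder index/type — is to let the connector write each partition directly from the executors:

```scala
import org.elasticsearch.spark._   // elasticsearch-hadoop Scala API

stream.foreachRDD { rdd =>
  // saveToEs ships each partition to Elasticsearch from the executors;
  // no SparkContext access (and no new RDDs) inside the closure.
  rdd.saveToEs("myindex/mytype")   // placeholder index/type
}
```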
Hi all,
Spark Streaming occasionally (not always) hangs indefinitely on my program
right after the first batch has been processed.
As you can see in the following screenshots of the Spark Streaming
monitoring UI, it hangs on the map stages that correspond (I assume) to the
second batch that is bei[...]
Here's what we've tried so far as a first example of a custom Mongo
receiver:

class MongoStreamReceiver(host: String)
  extends NetworkReceiver[String] {

  protected lazy val blocksGenerator: BlockGenerator =
    new BlockGenerator(StorageLevel.MEMORY_AND_DISK_SER_2)

  protected def onStart() [...]
Hello all,

Spark newbie here. We are trying to use Spark Streaming (unfortunately
stuck on version 0.9.1 of Spark) to stream data out of MongoDB.
ReactiveMongo (http://reactivemongo.org/) is a Scala driver that enables
you to stream a MongoDB capped collection (in our case, the Oplog).
Given that Mongo[...]