Re: Spark Streaming/Flume display all events

2015-03-30 Thread Nathan Marin
Hi, DStream.print() only prints the first 10 elements contained in the stream. You can call DStream.print(x) to print the first x elements, but if you don’t know the exact count you can call DStream.foreachRDD and apply a function that displays the content of every RDD. For example: stream.foreach
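
A minimal sketch of that foreachRDD approach (the socket source, host and port are placeholders for illustration, not from the original mail):

    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}

    val conf = new SparkConf().setAppName("PrintAllEvents")
    val ssc = new StreamingContext(conf, Seconds(5))

    // Any DStream works here; a socket stream keeps the example self-contained.
    val stream = ssc.socketTextStream("localhost", 9999)

    // Print every element of each batch instead of only the first 10.
    stream.foreachRDD { rdd =>
      // collect() pulls the whole batch back to the driver: fine for
      // debugging, risky for large batches.
      rdd.collect().foreach(println)
    }

    ssc.start()
    ssc.awaitTermination()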

Re: [Spark Streaming] Disk not being cleaned up during runtime after RDD being processed

2015-03-30 Thread Nathan Marin
>> ill=false (It might end up in OOM)
>> - Enable log rotation:
>>
>> sparkConf.set("spark.executor.logs.rolling.strategy", "size")
>>   .set("spark.executor.logs.rolling.size.maxBytes", "1024")
>>   .set("spark.executor.logs.rolling.max
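
For reference, a sketch of how those rolling-log settings can be applied to a SparkConf. The byte limit and retention count are illustrative values, and spark.executor.logs.rolling.maxRetainedFiles is added as a commonly used companion setting, not quoted from the mail:

    import org.apache.spark.SparkConf

    // Roll executor logs by size so old log files can be cleaned up.
    val sparkConf = new SparkConf()
      .setAppName("StreamingWithLogRotation")
      .set("spark.executor.logs.rolling.strategy", "size")
      .set("spark.executor.logs.rolling.size.maxBytes", "1024")
      .set("spark.executor.logs.rolling.maxRetainedFiles", "3")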

[Spark Streaming] Disk not being cleaned up during runtime after RDD being processed

2015-03-28 Thread Nathan Marin
Hi, I’ve been trying to use Spark Streaming for my real-time analysis application, using the Kafka Stream API on a YARN cluster of 6 executors with 4 dedicated cores and 8192 MB of dedicated RAM. The thing is, my application should run 24/7, but the disk usage is leaking. This le
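
A hedged sketch of the kind of setup being described, using the Spark 1.x receiver-based Kafka API; the ZooKeeper quorum, consumer group, topic name and batch interval are placeholders, and the resource settings simply mirror the numbers given above:

    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}
    import org.apache.spark.streaming.kafka.KafkaUtils

    val conf = new SparkConf()
      .setAppName("RealTimeAnalysis")
      // Mirrors the described cluster: 6 executors, 4 cores and 8192 MB each.
      .set("spark.executor.instances", "6")
      .set("spark.executor.cores", "4")
      .set("spark.executor.memory", "8192m")

    val ssc = new StreamingContext(conf, Seconds(10))

    // Receiver-based Kafka stream (Spark 1.x); yields (key, message) pairs.
    val kafkaStream = KafkaUtils.createStream(
      ssc,
      "zk-host:2181",       // ZooKeeper quorum (placeholder)
      "analysis-consumer",  // consumer group id (placeholder)
      Map("events" -> 4)    // topic -> receiver threads (placeholder)
    )

    kafkaStream.map(_._2).count().print()

    ssc.start()
    ssc.awaitTermination()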

[Spark Streaming] Disk not being cleaned up during runtime after RDD being processed

2015-03-26 Thread Nathan Marin
Hi, I’ve been trying to use Spark Streaming for my real-time analysis application, using the Kafka Stream API on a YARN cluster of 6 executors with 4 dedicated cores and 8192 MB of dedicated RAM. The thing is, my application should run 24/7, but the disk usage is leaking. This le