I ran the same Spark Streaming application, compiled separately against 1.5.1 and 1.6.0, which processes tweets in 5-second batches.
I noticed two things:

1. Throughput regressed by about 10% in 1.6.0 compared with 1.5.1:
   Spark v1.6.0: 1,564 tweets/s
   Spark v1.5.1: 1,747 tweets/s

2. Streaming in 1.6.0 appears to have a memory leak. In 1.6.0, the processing time per batch gradually increases and eventually exceeds the 5-second batch interval, so batches start to queue up. 1.5.1 shows no such slowdown. See the chart below for the increasing scheduling delay in 1.6.0:

   [Chart: scheduling delay over time, Spark 1.6.0]

I captured heap dumps in both versions and compared them. The byte array class ([B) uses about 50X more space in 1.6.0 than in 1.5.1. Here are the top classes in each heap histogram, along with their references.

Heap Histogram: All Classes (excluding platform)

1.6.0 Streaming
Class                           | Instance Count | Total Size
class [B                        | 8453           | 3,227,649,599
class [C                        | 44682          | 4,255,502
class java.lang.reflect.Method  | 9059           | 1,177,670

1.5.1 Streaming
Class                           | Instance Count | Total Size
class [B                        | 5095           | 62,938,466
class [C                        | 130482         | 12,844,182
class java.lang.String          | 130171         | 1,562,052

References by Type
1.6.0 Streaming: class [B [0x640039e38]
1.5.1 Streaming: class [B [0x6c020bb08]

Referrers by Type

1.6.0 Streaming
Class                              | Count
java.nio.HeapByteBuffer            | 3239
sun.security.util.DerInputBuffer   | 1233
sun.security.util.ObjectIdentifier | 620
[Ljava.lang.Object;                | 408

1.5.1 Streaming
Class                              | Count
sun.security.util.DerInputBuffer   | 1233
sun.security.util.ObjectIdentifier | 620
[[B                                | 397
java.lang.reflect.Method           | 326

----
The total size of class [B is about 3 GB in 1.6.0 but only about 60 MB in 1.5.1. The java.nio.HeapByteBuffer referrer does not appear among the top referrers in 1.5.1.

I have also placed the jstack output for 1.5.1 and 1.6.0 online. You can get it here:
https://ibm.box.com/sparkstreaming-jstack160
https://ibm.box.com/sparkstreaming-jstack151

Jesse
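For anyone who wants to double-check the ratios quoted in this report, here is a minimal sketch that recomputes them from the numbers above. The histogram figures are copied from the tables. The function names (regression_pct, size_ratios) are hypothetical helpers, not part of Spark or any JDK tool. Because only the top three rows of each histogram are available, the sketch compares only classes that appear in both lists, so it is a sanity check rather than a full heap diff.

```python
# Figures copied from the heap histograms and throughput numbers above.
# Each histogram maps class name -> (instance count, total bytes).
HIST_160 = {
    "[B": (8453, 3_227_649_599),
    "[C": (44682, 4_255_502),
    "java.lang.reflect.Method": (9059, 1_177_670),
}
HIST_151 = {
    "[B": (5095, 62_938_466),
    "[C": (130482, 12_844_182),
    "java.lang.String": (130171, 1_562_052),
}


def regression_pct(old: float, new: float) -> float:
    """Percentage drop from old to new (positive means new is slower)."""
    return (old - new) / old * 100.0


def size_ratios(new_hist, old_hist):
    """Return (class, new_bytes / old_bytes) for classes present in both
    histograms, sorted by largest growth first."""
    common = new_hist.keys() & old_hist.keys()
    ratios = [(cls, new_hist[cls][1] / old_hist[cls][1]) for cls in common]
    return sorted(ratios, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Throughput: 1.5.1 is the baseline, 1.6.0 is the new version.
    print(f"throughput regression: {regression_pct(1747, 1564):.1f}%")
    # Byte-size growth from 1.5.1 to 1.6.0 for classes in both top lists.
    for cls, ratio in size_ratios(HIST_160, HIST_151):
        print(f"{cls:>6}: {ratio:.1f}x")
```

Running it prints a throughput regression of 10.5% and a [B growth of roughly 51x, which matches the "10%" and "50X" figures above. The [C ratio comes out below 1, since char arrays actually use less space in the 1.6.0 dump.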