I ran the same streaming application (compiled individually for 1.5.1 and
1.6.0) that processes 5-second tweet batches.

I noticed two things:

1. Throughput is about 10% lower in 1.6.0 than in 1.5.1:

     Spark v1.6.0: 1,564 tweets/s
     Spark v1.5.1: 1,747 tweets/s

2. 1.6.0 streaming seems to have a memory leak.

In 1.6.0, processing time gradually increases and eventually exceeds the
5-second batch interval, so batches start to queue up. In 1.5.1 there is no
such slowdown. See the chart below for the increasing scheduling delay in
1.6.0:
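As a rough illustration of why batches queue once processing time passes the
batch interval, here is a minimal sketch. It assumes batches are processed
strictly one at a time, and the processing times are hypothetical values
chosen for the example, not measurements from my runs.

```python
BATCH_INTERVAL_S = 5.0  # batch interval used by the application in this post


def scheduling_delays(processing_times_s, batch_interval_s=BATCH_INTERVAL_S):
    """Return the scheduling delay seen by each batch.

    Assumes one batch arrives every batch_interval_s seconds and batches
    are processed one at a time, in order.
    """
    delays = []
    backlog = 0.0  # seconds of work still pending when the next batch arrives
    for processing_time in processing_times_s:
        delays.append(backlog)
        # Work beyond the interval carries over; spare time pays the backlog down.
        backlog = max(0.0, backlog + processing_time - batch_interval_s)
    return delays


# Hypothetical processing times (seconds) that grow by 0.5 s per batch.
hypothetical_times = [4.0 + 0.5 * i for i in range(6)]
delays = scheduling_delays(hypothetical_times)
print(delays)
```

In this model the delay stays at zero while processing time is at or below
5 seconds and keeps growing once it is above.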




I captured heap dumps in the two versions and compared them. I noticed that
the byte array class ([B) is using about 50X more space in 1.6.0.

Here are the top classes in the heap histogram and the top referrers of [B.

Heap Histogram

All Classes (excluding platform)

1.6.0 Streaming
Class                             Instance Count   Total Size (bytes)
class [B                                    8453        3,227,649,599
class [C                                   44682            4,255,502
class java.lang.reflect.Method              9059            1,177,670

1.5.1 Streaming
Class                             Instance Count   Total Size (bytes)
class [B                                    5095           62,938,466
class [C                                  130482           12,844,182
class java.lang.String                    130171            1,562,052


References by Type

1.6.0 Streaming: class [B [0x640039e38]
1.5.1 Streaming: class [B [0x6c020bb08]

Referrers by Type

1.6.0 Streaming
Class                                  Count
java.nio.HeapByteBuffer                 3239
sun.security.util.DerInputBuffer        1233
sun.security.util.ObjectIdentifier       620
[Ljava.lang.Object;                      408

1.5.1 Streaming
Class                                  Count
sun.security.util.DerInputBuffer        1233
sun.security.util.ObjectIdentifier       620
[[B                                      397
java.lang.reflect.Method                 326


----

The total size of class [B is about 3 GB in 1.6.0 and only about 60 MB in
1.5.1. The java.nio.HeapByteBuffer referrer does not show up among the top
referrers in 1.5.1.
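For anyone who wants to check the arithmetic, here is a small sketch that
recomputes the two ratios from the numbers in this post. It takes the figures
as the histogram columns label them; the function and variable names are just
for illustration.

```python
# Throughput figures from this post, in tweets per second.
tweets_per_s = {"1.6.0": 1564, "1.5.1": 1747}

# Total size of class [B from the heap histogram, in bytes.
byte_array_bytes = {"1.6.0": 3227649599, "1.5.1": 62938466}


def regression_pct(new_rate, old_rate):
    """Percentage drop in throughput going from old_rate to new_rate."""
    return (old_rate - new_rate) / old_rate * 100.0


def size_ratio(larger, smaller):
    """How many times larger one total size is than the other."""
    return larger / smaller


drop = regression_pct(tweets_per_s["1.6.0"], tweets_per_s["1.5.1"])
ratio = size_ratio(byte_array_bytes["1.6.0"], byte_array_bytes["1.5.1"])

print(f"throughput drop in 1.6.0: {drop:.1f}%")
print(f"[B total size, 1.6.0 vs 1.5.1: {ratio:.1f}x")
```

This works out to a drop of roughly 10.5% and a [B footprint roughly 51 times
larger in 1.6.0.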

I have also placed the jstack output for 1.5.1 and 1.6.0 online. You can get
them here:

https://ibm.box.com/sparkstreaming-jstack160
https://ibm.box.com/sparkstreaming-jstack151

Jesse




