After working around the seq + closure = death problem, I still had a 
severe memory leak in my code, which took many hours to find.

Holding a reference to a string returned by clojure.string/split somehow 
retains a reference to the original string. In my case I needed to hold 
the first column of each row of a TSV file that was 4 GB in size. This 
resulted in holding the entire 4 GB in memory.

Here's a demo. Function "data" returns a seq of lines that are about 1000 
bytes. The first column, however, is just a few bytes, and 10k of them 
should easily fit in 10M of heap space. But, no:

$ LEIN_JVM_OPTS=-Xmx10M lein repl
REPL started; server listening on localhost port 34955
user=> (defn data [] (for [i (range)] (str "row " i "\t" 
(clojure.string/join "" (repeat 1000 "x")))))
#'user/data
user=> (def x (vec (take 10000 (map #(first (clojure.string/split % #"\t")) 
(data)))))
java.lang.OutOfMemoryError: Java heap space (NO_SOURCE_FILE:4)
user=> 

If I copy the returned string with the String constructor, it's fine:

$ LEIN_JVM_OPTS=-Xmx10M lein repl
REPL started; server listening on localhost port 20587
user=> (defn data [] (for [i (range)] (str "row " i "\t" 
(clojure.string/join "" (repeat 1000 "x")))))
#'user/data
user=> (def x (vec (take 10000 (map #(String. (first (clojure.string/split 
% #"\t"))) (data)))))
#'user/x
user=> (x 10)
"row 10"
user=> 
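
To make the workaround harder to forget, the copy can be pulled into a small 
helper. This is only a sketch of the same String-constructor trick from the 
transcript above; first-column is a name made up for this post, and data is 
the same function as in the demo.

```clojure
(require '[clojure.string :as string])

;; Same shape as the demo data: a short key, a tab, then ~1000 bytes of padding.
(defn data []
  (for [i (range)]
    (str "row " i "\t" (string/join "" (repeat 1000 "x")))))

;; Hypothetical helper: split a line on tabs and return a fresh copy of the
;; first field, made with the String constructor as in the transcript above.
;; The ^String hints only avoid reflection on the constructor call.
(defn first-column [^String line]
  (String. ^String (first (string/split line #"\t"))))

;; Hold only the copied keys, not the full rows.
(def x (vec (take 10000 (map first-column (data)))))

(x 10) ;; => "row 10"
```

Anything that holds a field past the lifetime of its row would go through 
first-column (or an equivalent copy) instead of keeping the split result 
directly.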

Two observations about this.

First, this behavior is very unexpected to me. I don't understand if it is 
a property of strings, collections, or string/split specifically that is 
causing it. Is there something in the docs that I overlooked, that would 
have warned of this?

Second, for tracking down problems like this, the available tooling is 
pathetic, to put it as politely as possible. jhat would not trace the 
leaked strings: it consistently froze up when tracing them to GC roots. 
visualvm traced it back to CacheLRU, as in the screenshot I posted in the 
other thread, which was perfectly uninformative.

Without any usable tooling, the only workflow I found to narrow the problem 
was to iteratively stub out portions of code and re-run the program for 
several minutes to determine if the leak was active. Obviously, this is 
incredibly painful, slow, and tedious.
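
The "is the leak active" check in that loop can at least be scripted by 
logging used heap after a forced GC. A rough sketch, assuming System/gc 
actually triggers a collection (it is only a hint to the JVM, so the numbers 
are approximate); used-heap-mb and heap-growth are names made up for this 
post:

```clojure
;; Hypothetical helper: approximate used heap in MB after requesting a GC.
;; System/gc is only a hint, so treat the result as a rough figure.
(defn used-heap-mb []
  (System/gc)
  (let [rt (Runtime/getRuntime)]
    (/ (- (.totalMemory rt) (.freeMemory rt)) 1048576.0)))

;; Hypothetical wrapper: call the zero-argument function f, keep its result
;; reachable, and report how much the used heap changed across the call.
(defn heap-growth [f]
  (let [before (used-heap-mb)
        result (f)
        after  (used-heap-mb)]
    {:result result :growth-mb (- after before)}))

;; Example: growth attributable to retaining a 100k-element vector.
(:growth-mb (heap-growth #(vec (range 100000))))
```

This does not say where the retained memory comes from; it only shortens 
each stub-and-rerun cycle by making the retained size visible.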

I'm hoping someone can tell me there's a better way.

Note that the leak did not appear when exercising subsystems 
independently, because in that case no references were retained from one 
subsystem to the other. So, "try it in the repl" was not an effective 
strategy.

-- 
-- 
You received this message because you are subscribed to the Google
Groups "Clojure" group.
To post to this group, send email to clojure@googlegroups.com
Note that posts from new members are moderated - please be patient with your 
first post.
To unsubscribe from this group, send email to
clojure+unsubscr...@googlegroups.com
For more options, visit this group at
http://groups.google.com/group/clojure?hl=en