I have just added some discussion of this on ClojureDocs.org for the
function clojure.core/subs, and references to that discussion for several
other Clojure functions that I am pretty sure are affected, e.g. re-find,
re-seq, re-matches, clojure.string/split, replace, and replace-first:

    http://clojuredocs.org/clojure_core/clojure.core/subs
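
For anyone stuck on a JVM older than 7u6, here is a minimal sketch of the
copy-the-substring workaround from the quoted messages below. The names
copy-str and first-column are hypothetical helpers made up for this
illustration, not part of Clojure, and the sketch assumes tab-separated
lines like the ones in Brian's demo.

```clojure
(require '[clojure.string :as str])

(defn copy-str
  "Hypothetical helper: returns a new String with the same characters as s.
  On JVMs older than 7u6 this lets the new String drop the reference to a
  larger parent string that a substring would otherwise keep alive."
  ^String [^String s]
  (String. s))

(defn first-column
  "Hypothetical helper: returns a copy of the first tab-separated field of
  line, so that holding the result does not also hold the whole line."
  [^String line]
  (copy-str (first (str/split line #"\t"))))

(first-column "row 10\txxxxxxxx")
;; => "row 10"
```

On 7u6 and later the extra constructor call should be unnecessary, since
substring() already copies, but it is harmless to leave in.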

Andy


On Thu, Sep 12, 2013 at 9:59 AM, Brian Craft <craft.br...@gmail.com> wrote:

> Ouch. Thanks for the explanation.
>
>
> On Thursday, September 12, 2013 9:46:47 AM UTC-7, Andy Fingerhut wrote:
>
>> Clojure's subs, and many other functions that return substrings of a
>> larger string (e.g. re-find, re-seq, etc.), are based on the behavior of
>> Java's java.lang.String/substring() method.
>>
>> Before Java version 7u6 or thereabouts, this was implemented in O(1) time
>> by creating a String object that referred to an offset and length within
>> the original String object, thus retaining a reference to it as long as the
>> substrings were referenced.
>>
>> Around Java version 7u6, Java's substring() method behavior changed to
>> copy the desired substring into a new String object, so no references are
>> kept to the original.
>>
>>     http://www.javaadvent.com/2012/12/changes-to-stringsubstring-in-java-7.html
>>
>> Fun, eh?  And no, this was not obvious to me until I ran across the issue
>> some time back.  Mark Engelberg encountered this issue while doing
>> performance tuning on his Instaparse library:
>> https://github.com/Engelberg/instaparse
>>
>> If you know you are deploying on a Java version that is earlier than 7u6,
>> you can use the String constructor, e.g. (String. s) from Clojure, to
>> force the copying of the string.  You could even get fancier and write code
>> that depends upon the Java version you are running upon, if that interests
>> you.
>>
>> Andy
>>
>>
>> On Thu, Sep 12, 2013 at 9:08 AM, Brian Craft <craft...@gmail.com> wrote:
>>
>>> After working around the seq + closure = death problem, I still had a
>>> severe memory leak in my code, which took many hours to find.
>>>
>>> Holding a reference to a string returned by clojure.string/split is
>>> somehow retaining a reference to the original string. In my case I needed
>>> to hold the first column of each row in a tsv file that was 4G in size.
>>> This resulted in holding the entire 4G in memory.
>>>
>>> Here's a demo. Function "data" returns a seq of lines that are about
>>> 1000 bytes. The first column, however, is just a few bytes, and 10k of them
>>> should easily fit in 10M of heap space. But, no:
>>>
>>> $ LEIN_JVM_OPTS=-Xmx10M lein repl
>>> REPL started; server listening on localhost port 34955
>>> user=> (defn data [] (for [i (range)] (str "row " i "\t"
>>> (clojure.string/join "" (repeat 1000 "x")))))
>>> #'user/data
>>> user=> (def x (vec (take 10000 (map #(first (clojure.string/split %
>>> #"\t")) (data)))))
>>> java.lang.OutOfMemoryError: Java heap space (NO_SOURCE_FILE:4)
>>> user=>
>>>
>>> If I copy the returned string with the String constructor, it's fine:
>>>
>>> $ LEIN_JVM_OPTS=-Xmx10M lein repl
>>> REPL started; server listening on localhost port 20587
>>> user=> (defn data [] (for [i (range)] (str "row " i "\t"
>>> (clojure.string/join "" (repeat 1000 "x")))))
>>> #'user/data
>>> user=> (def x (vec (take 10000 (map #(String. (first
>>> (clojure.string/split % #"\t"))) (data)))))
>>> #'user/x
>>> user=> (x 10)
>>> "row 10"
>>> user=>
>>>
>>> Two observations about this.
>>>
>>> First, this behavior is very unexpected to me. I don't understand if it
>>> is a property of strings, collections, or string/split specifically that is
>>> causing it. Is there something in the docs that I overlooked, that would
>>> have warned of this?
>>>
>>> Second, for tracking down problems like this, the available tooling is
>>> pathetic, to put it as politely as possible. jhat would not trace the
>>> leaked strings. It consistently froze up when tracing them to GC roots.
>>> visualvm traced it back to CacheLRU, as in the screenshot I posted in the
>>> other thread, which was perfectly uninformative.
>>>
>>> Without any usable tooling, the only workflow I found to narrow the
>>> problem was to iteratively stub out portions of code and re-run the program
>>> for several minutes to determine if the leak was active. Obviously, this is
>>> incredibly painful, slow, and tedious.
>>>
>>> I'm hoping someone can tell me there's a better way.
>>>
>>> Note that the leak did not appear when exercising subsystems
>>> independently, because in that case no references were retained from one
>>> subsystem to the other. So, "try it in the repl" was not an effective
>>> strategy.
>>>
>>> --
>>> --
>>> You received this message because you are subscribed to the Google
>>> Groups "Clojure" group.
>>> To post to this group, send email to clo...@googlegroups.com
>>>
>>> Note that posts from new members are moderated - please be patient with
>>> your first post.
>>> To unsubscribe from this group, send email to
>>> clojure+u...@googlegroups.com
>>>
>>> For more options, visit this group at
>>> http://groups.google.com/group/clojure?hl=en
>>> ---
>>> You received this message because you are subscribed to the Google
>>> Groups "Clojure" group.
>>> To unsubscribe from this group and stop receiving emails from it, send
>>> an email to clojure+u...@googlegroups.com.
>>>
>>> For more options, visit https://groups.google.com/groups/opt_out.
>>>
>>
