I have just added some discussion of this on ClojureDocs.org for the function clojure.core/subs, and references to that discussion for several other Clojure functions that I am pretty sure are affected, e.g. re-find, re-seq, re-matches, clojure.string/split, clojure.string/replace, and clojure.string/replace-first.
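For the functions listed above, Andy's older message quoted below suggests forcing a copy with (String. s), and mentions making that depend on the Java version. Below is a minimal sketch of that version-dependent variant. It is not code from ClojureDocs. The names java-version, substring-shares-storage? and detach are hypothetical. The 7u6 cutoff is taken from the thread ("7u6 or thereabouts").

```clojure
(ns substring-detach-sketch)

;; Hypothetical helper: parse a "java.version" string into [major update].
;; Handles the pre-Java-9 scheme ("1.6.0_45", "1.7.0_06") and the newer one
;; ("9", "11.0.2"); update is 0 when there is no "_NN" suffix.
;; Returns nil when the string does not start with a number.
(defn java-version [^String v]
  (when-let [[_ a b upd] (re-find #"^(\d+)(?:\.(\d+))?(?:\.\d+)?(?:_(\d+))?" v)]
    [(Long/parseLong (if (and (= a "1") b) b a))
     (if upd (Long/parseLong upd) 0)]))

;; True when String.substring is expected to share the original string's
;; char array, i.e. on JVMs before 7u6 (the cutoff given in the thread).
(defn substring-shares-storage? [version-str]
  (boolean
   (when-let [[major upd] (java-version version-str)]
     (or (< major 7)
         (and (= major 7) (< upd 6))))))

;; Decided once, for the JVM this code is running on.
(def ^:private copy-needed?
  (substring-shares-storage? (System/getProperty "java.version")))

;; Pass s through the String(String) constructor where substrings share
;; storage (on those JVMs it copies just the needed characters), otherwise
;; return s unchanged. nil passes through.
(defn detach [^String s]
  (if (and s copy-needed?)
    (String. s)
    s))
```

Callers would then wrap results, e.g. `(mapv detach (clojure.string/split line #"\t"))` or `(detach (subs line 0 6))`. Note that re-find returns a vector when the pattern has groups, so detach would be applied to its elements in that case.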
http://clojuredocs.org/clojure_core/clojure.core/subs

Andy

On Thu, Sep 12, 2013 at 9:59 AM, Brian Craft <craft.br...@gmail.com> wrote:

> Ouch. Thanks for the explanation.
>
> On Thursday, September 12, 2013 9:46:47 AM UTC-7, Andy Fingerhut wrote:
>
>> Clojure's subs, and many other functions that return substrings of a larger one (e.g. re-find, re-seq, etc.), are based on the behavior of Java's java.lang.String/substring() method.
>>
>> Before Java version 7u6 or thereabouts, this was implemented in O(1) time by creating a String object that referred to an offset and length within the original String object's character array, thus retaining a reference to that array as long as the substrings were referenced.
>>
>> Around Java version 7u6, Java's substring() method behavior changed to copy the desired substring into a new String object, so no references are kept to the original.
>>
>> http://www.javaadvent.com/2012/12/changes-to-stringsubstring-in-java-7.html
>>
>> Fun, eh? And no, this was not obvious to me until I ran across the issue some time back. Mark Engelberg encountered this issue while doing performance tuning on his Instaparse library:
>> https://github.com/Engelberg/instaparse
>>
>> If you know you are deploying on a Java version earlier than 7u6, you can use the String constructor, e.g. (String. s) from Clojure, to force the copying of the string. You could even get fancier and write code that depends upon the Java version you are running on, if that interests you.
>>
>> Andy
>>
>> On Thu, Sep 12, 2013 at 9:08 AM, Brian Craft <craft...@gmail.com> wrote:
>>
>>> After working around the seq + closure = death problem, I still had a severe memory leak in my code, which took many hours to find.
>>>
>>> Holding a reference to a string returned by clojure.string/split is somehow retaining a reference to the original string. In my case I needed to hold the first column of each row in a tsv file that was 4G in size. This resulted in holding the entire 4G in memory.
>>>
>>> Here's a demo. Function "data" returns a seq of lines that are about 1000 bytes. The first column, however, is just a few bytes, and 10k of them should easily fit in 10M of heap space. But, no:
>>>
>>> $ LEIN_JVM_OPTS=-Xmx10M lein repl
>>> REPL started; server listening on localhost port 34955
>>> user=> (defn data [] (for [i (range)] (str "row " i "\t" (clojure.string/join "" (repeat 1000 "x")))))
>>> #'user/data
>>> user=> (def x (vec (take 10000 (map #(first (clojure.string/split % #"\t")) (data)))))
>>> java.lang.OutOfMemoryError: Java heap space (NO_SOURCE_FILE:4)
>>> user=>
>>>
>>> If I copy the returned string with the String constructor, it's fine:
>>>
>>> $ LEIN_JVM_OPTS=-Xmx10M lein repl
>>> REPL started; server listening on localhost port 20587
>>> user=> (defn data [] (for [i (range)] (str "row " i "\t" (clojure.string/join "" (repeat 1000 "x")))))
>>> #'user/data
>>> user=> (def x (vec (take 10000 (map #(String. (first (clojure.string/split % #"\t"))) (data)))))
>>> #'user/x
>>> user=> (x 10)
>>> "row 10"
>>> user=>
>>>
>>> Two observations about this.
>>>
>>> First, this behavior is very unexpected to me. I don't understand whether it is a property of strings, collections, or string/split specifically that is causing it. Is there something in the docs that I overlooked that would have warned of this?
>>>
>>> Second, for tracking down problems like this, the available tooling is pathetic, to put it as politely as possible. jhat would not trace the leaked strings; it consistently froze up when tracing them to GC roots.
>>> visualvm traced it back to CacheLRU, as in the screenshot I posted in the other thread, which was perfectly uninformative.
>>>
>>> Without any usable tooling, the only workflow I found to narrow down the problem was to iteratively stub out portions of code and re-run the program for several minutes to determine if the leak was active. Obviously, this is incredibly painful, slow, and tedious.
>>>
>>> I'm hoping someone can tell me there's a better way.
>>>
>>> Note that the leak did not appear when exercising subsystems independently, because in that case no references were retained from one subsystem to the other. So, "try it in the repl" was not an effective strategy.
>>>
>>> --
>>> You received this message because you are subscribed to the Google Groups "Clojure" group.
>>> To post to this group, send email to clo...@googlegroups.com
>>> Note that posts from new members are moderated - please be patient with your first post.
>>> To unsubscribe from this group, send email to clojure+u...@googlegroups.com
>>> For more options, visit this group at http://groups.google.com/group/clojure?hl=en
>>> ---
>>> You received this message because you are subscribed to the Google Groups "Clojure" group.
>>> To unsubscribe from this group and stop receiving emails from it, send an email to clojure+u...@googlegroups.com.
>>> For more options, visit https://groups.google.com/groups/opt_out.
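Applied to Brian's demo in the thread above, the (String. s) copy can live in one small helper. The following is a sketch under stated assumptions, not code from the thread. It swaps the regex split for String.indexOf plus subs, since only the first column is needed. The names first-column and first-columns are hypothetical.

```clojure
(ns first-column-sketch
  (:require [clojure.string :as str]))

;; Same generator as the quoted demo: an infinite lazy seq of ~1000-byte,
;; tab-separated lines whose first column is only a few bytes long.
(defn data []
  (for [i (range)]
    (str "row " i "\t" (str/join "" (repeat 1000 "x")))))

;; Hypothetical helper: the text before the first tab (or the whole line if
;; there is no tab), passed through the String constructor per the thread's
;; (String. s) suggestion, so that on JVMs before 7u6 it no longer shares the
;; line's char array. On later JVMs subs already copies, so this is redundant.
(defn first-column [^String line]
  (let [i (.indexOf line "\t")
        ^String head (if (neg? i) line (subs line 0 i))]
    (String. head)))

;; The demo's working variant, restated with first-column.
(defn first-columns [n]
  (vec (take n (map first-column (data)))))
```

In the demo's -Xmx10M REPL, `(def x (first-columns 10000))` would be expected to retain only the short keys, as the second session did.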