I am still somewhat new to Clojure and the JVM. I am querying a database 
and trying to output some XML. For the XML, I am using this library:

https://github.com/clojure/data.xml

Apparently the database queries worked, and I was able to get the data into 
an XML structure, but at some point the data gets lost and nothing is being 
output. I decided, for the sake of debugging, I would add in some println 
statements and output it to the terminal. 

I was storing the XML in an atom called recent-activity. I attempted to 
store this as clojure.data.xml elements in a vector. I got no output, so I 
decided to switch to emit-str and store it as a string. Then suddenly I got 
an OutOfMemory error. I found this very surprising. Are strings that 
expensive on memory?

The code looks like this. The first function does the query against the 
database:

(defn update-recent-discourse-from-this-site [db k]
  (jdbc/with-connection db
    (jdbc/with-query-results database-results
      [(str 
        " SELECT
              d.id, d.description, d.created_at, 
              p.id as profile_id, p.first_name, p.last_name, 
              u.username "
        " FROM discourse as d, sf_guard_user as u, sf_guard_user_profile as p "
        " WHERE d.user_id=p.user_id "
        " AND d.user_id=u.id "
        " AND p.user_id=u.id "
        " AND d.question_id = 0 "
        " AND d.answer_id = 0 "
        " AND d.discourse_id = 0 "
        " AND d.created_at > ? "
        " ORDER BY d.created_at DESC "
        " LIMIT 100 ")
       (das/one-month-ago-as-a-string-for-the-database)]
      (let [feed (transform-posts-into-a-feed database-results k "discourse")]
        (swap! recent-activity concat feed @recent-activity)))))

This function was supposed to put the content into an atom. At first I 
did not use emit-str, but I added that on my last attempt to figure out 
what is going on. 

I was using "conj", but then switched to "concat" when I switched to using 
emit-str. 
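
One detail about the swap! call above that may matter: swap! already passes 
the atom's current value as the first argument to the function, so a call 
shaped like (swap! a concat feed @a) yields old + feed + old. Below is a 
minimal sketch of that behaviour. The atoms demo-activity and 
demo-activity-2, the helper add-feed-like-the-post, and the item strings 
are all made up for illustration; this is not the real feed code.

```clojure
;; Stand-in for recent-activity; the item strings are made up.
(def demo-activity (atom []))

(defn add-feed-like-the-post
  "Same shape as the swap! call in the post: concat receives the old
  value, then feed, then the old value again."
  [feed]
  (swap! demo-activity concat feed @demo-activity))

(add-feed-like-the-post ["<item>a</item>"])
(add-feed-like-the-post ["<item>b</item>"])
(add-feed-like-the-post ["<item>c</item>"])

;; The contents roughly double on each call: 1, then 3, then 7 items.
(println (count @demo-activity))   ; prints 7

;; For comparison, appending only the new items each time:
(def demo-activity-2 (atom []))

(doseq [feed [["<item>a</item>"] ["<item>b</item>"] ["<item>c</item>"]]]
  (swap! demo-activity-2 into feed))

(println (count @demo-activity-2)) ; prints 3
```

If something like this is happening, the stored data would grow much faster 
than the number of rows returned by the queries, whether the items are 
strings or XML elements.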

This is the function that formed the XML:

(defn transform-posts-into-a-feed [database-results k what-type-of-item-is-this]
  (let [site-url (make-site-url k)
        map-of-xml-elements
        (reduce
         (fn [feed db-row]
           (conj feed
                 (xml/emit-str
                  (xml/element :item {}
                    (xml/element :what-type-of-item-is-this {} (str what-type-of-item-is-this))
                    (xml/element :username {} (str (make-user-nice-name db-row)))
                    (xml/element :user-profile-url {} (str (make-profile-url site-url (:profile_id db-row))))
                    (xml/element :in-response-to-url {} (str (make-in-response-to-url site-url what-type-of-item-is-this (:in_response_to_id db-row))))
                    (xml/element :site-url {} (str site-url))
                    (xml/element :title {} (str (:title db-row)))
                    (xml/element :item-url {} (str (make-item-url site-url what-type-of-item-is-this (:id db-row))))
                    (xml/element :description {} (str (:description db-row)))
                    (xml/element :date {} (str (:created_at db-row)))))))
         [] database-results)]
    map-of-xml-elements))

For a while (before I used emit-str), at the end of this function, I had 
this:

(println (apply str   map-of-xml-elements))

and I could see that the data in map-of-xml-elements was what I expected. 
And yet the atom "recent-activity" seemed to remain empty, which I found 
very confusing. 
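
For checking what the atom holds without printing all of it, a small 
summary may be safer than (apply str ...) over the whole collection. This 
is only a sketch: summarize-items is a hypothetical helper, and demo-atom 
with its two strings is a made-up stand-in for recent-activity.

```clojure
;; Hypothetical helper, not part of the original code.
(defn summarize-items
  "Returns the number of items and the total number of characters
  across the items once each one is converted with str."
  [items]
  {:items (count items)
   :chars (reduce + 0 (map #(count (str %)) items))})

;; Stand-in atom holding made-up strings.
(def demo-atom (atom ["<item>a</item>" "<item>bb</item>"]))

(println (summarize-items @demo-atom)) ; prints {:items 2, :chars 29}
```

Calling something like this right after each swap! would show whether the 
atom is really empty and how quickly it grows.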

The function that basically drives this app (called from main) is the one 
that throws an error:

(defn iterate-through-sites-and-output-files
  "2012-11-10 - The TMA server might have 10 or 20 or more websites, each
  with their own database config. We need to update every site.
  fg/database-connections holds a map where the key is the name of the site
  and the value is another map that has all of the info needed to connect to
  that site's database."
  []
  (println "Now we will iterate over the sites again. The time is: "
           (das/current-time-as-string))
  (doseq [[k db] @fg/database-connections]
    (println (apply str " We will now update " (str k)))
    (ura/update-recent-activity-from-this-site db k)
    (println (mem/show-stats-regarding-resources-used-by-this-app))
    (println (apply str (debug/thread-top)))
    (println " We are processing " (str k))
    (println "This is what we currently have stored up as recent activity: "))
  (println "We will now wait 1 hour, then iterate over all of the sites again. The time is: "
           (das/current-time-as-string))
  (. java.lang.Thread sleep 3600000)
  (iterate-through-sites-and-output-files))

The error I get: 

Exception in thread "Thread-0" java.lang.OutOfMemoryError: Java heap space
        at java.util.Arrays.copyOf(Unknown Source)
        at java.lang.AbstractStringBuilder.expandCapacity(Unknown Source)
        at java.lang.AbstractStringBuilder.ensureCapacityInternal(Unknown Source)
        at java.lang.AbstractStringBuilder.append(Unknown Source)
        at java.lang.StringBuilder.append(Unknown Source)
        at clojure.core$str$fn__3501.invoke(core.clj:500)
        at clojure.core$str.doInvoke(core.clj:502)
        at clojure.lang.RestFn.applyTo(RestFn.java:139)
        at clojure.core$apply.invoke(core.clj:600)
        at recent_activity.core$iterate_through_sites_and_output_files.invoke(core.clj:33)
        at clojure.lang.AFn.run(AFn.java:24)
        at java.lang.Thread.run(Unknown Source)

Line 33 is:

  (println "We will now wait 1 hour, then iterate over all of the sites again. The time is: "
           (das/current-time-as-string))

That line looks innocent and it did not cause a problem before. It just 
started causing a problem when I started using emit-str to store the XML as 
a string in the atom recent-activity. 

That leads to two questions: 

1.) Are strings expensive on memory? 

2.) What are the simplest profiling tools I can use to compare the memory 
use of emit-str versus what I was doing previously? 
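
For a rough before/after comparison without any extra tooling, the JVM's 
own java.lang.Runtime reports heap figures. The sketch below assumes 
nothing beyond the standard library; heap-stats-mb is a hypothetical helper 
name, and the numbers are approximate because garbage may not have been 
collected yet.

```clojure
;; Hypothetical helper for illustration, built on java.lang.Runtime.
(defn heap-stats-mb
  "Returns used, total, and max heap in megabytes as reported by the JVM."
  []
  (let [rt    (Runtime/getRuntime)
        mb    (fn [n] (quot n (* 1024 1024)))
        total (.totalMemory rt)
        free  (.freeMemory rt)]
    {:used-mb  (mb (- total free))
     :total-mb (mb total)
     :max-mb   (mb (.maxMemory rt))}))

;; Print once before and once after the step being measured.
(println (heap-stats-mb))
```

Printing this before and after each site is processed would show whether 
heap use climbs steadily. For a per-class breakdown, JDK tools such as 
VisualVM or jmap -histo <pid> are another option.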

I am giving up on the use of emit-str and I'm going to try a different 
approach. But I would be grateful for any insights about why I might have 
gotten the OutOfMemory error. 



