I have a big MySQL database containing telephone call detail records.

The database is on a remote server and it's a bit slow to access. I
want to migrate the data to local SQLite databases on my machine for
further processing. I want to store each month's records in separate
files, located under ~/calls/<source>/<year>/<year>-<month>.db, where
<source> is a string identifying the telephone switch which processed
the call ("lucent" or "c4"), while <year> and <month> correspond to
the start date of the call.

I wrote two functions, one to SELECT the records from the MySQL
database, the other to store them in SQLite. The first function would
generate a lazy seq of records (maps), while the second would take
this seq of maps and INSERT the records to the appropriate SQLite
databases.

Here are the current versions of the two functions:

(defn c4-calls
  [st-prefix]
  (let [conn (datasource/connection datasource/cdr-c4)
        stmt (.prepareStatement
              conn
              (format "SELECT
                         SUBSTRING(a,5) AS a,
                         SUBSTRING(b,3) AS b,
                         SUBSTRING(st,1,14) AS st,
                         duration/1000 AS duration
                       FROM `call`
                       WHERE st LIKE '%s%%'
                       ORDER BY st" st-prefix))]
    (for [rec (resultset-seq (.executeQuery stmt))
          :let [{:keys [a b st duration]} rec]]
      {:a a
       :b b
       :st st
       :duration duration})))

The (datasource/connection datasource/cdr-c4) call returns a
java.sql.Connection to the MySQL db. Each record returned from MySQL
gets packaged into map and is returned to the caller in a lazy seq (or
at least that's what I expect reading the documentation of the 'for'
macro).

Here is the one which stores the records to the corresponding SQLite
db:

(defn store-calls
  [source calls]
  (when (not (#{"lucent" "c4"} source))
    (throw (RuntimeException. "invalid source, must be \"lucent\" or
\"c4\"")))
  (loop [calls calls
         conn nil
         stmt nil
         year nil
         month nil]
    (if-let [c (first calls)]
      (let [y (Integer/parseInt (.substring (:st c) 0 4))
            m (Integer/parseInt (.substring (:st c) 4 6))]
        (if (or (nil? conn) (not= y year) (not= m month))
          (do
            (when conn (.commit conn))
            (let [conn (call-db source y m)
                  stmt (.prepareStatement conn "INSERT INTO `call`
(a,b,st,duration) VALUES (?,?,?,?)")]
              (recur calls conn stmt y m)))
          (do
            (.setString stmt 1 (:a c))
            (.setString stmt 2 (:b c))
            (.setString stmt 3 (:st c))
            (.setDouble stmt 4 (:duration c))
            (.executeUpdate stmt)
            (recur (rest calls) conn stmt year month))))
      (when conn (.commit conn)))))

The idea is to iterate over the calls and if there is a difference
between the year/month of the previously processed record and this
one, switch to the corresponding SQLite connection, before the record
is written via INSERT. The function 'call-db' ensures that the SQLite
database identified by the passed source-year-month triad exists and
returns a java.sql.Connection to it.

My problem is a java heap space overflow error.

The likely reason may be that I keep a reference to 'calls' in the
store-calls function 1. in the function arg, 2. in the loop ([calls
calls]) - although I doubt that the second would be a problem.

I thought about using two signatures, [source calls] on one hand and
[source calls conn stmt year month] on the other, thereby substituting
the loop with a function which I can recur to, but even then, the
[source calls] version would be called first, which would pass the
calls to the other, which means the head was just kept again.

It also occurred to me that I should think in records instead of maps
and write store-calls as a kind of filter (store-call), but then I'd
lose the ability to compare the dates of the previous/current records
or would have to resort to some caching trickery which I'd rather
avoid.

Any ideas?

-- 
You received this message because you are subscribed to the Google
Groups "Clojure" group.
To post to this group, send email to clojure@googlegroups.com
Note that posts from new members are moderated - please be patient with your 
first post.
To unsubscribe from this group, send email to
clojure+unsubscr...@googlegroups.com
For more options, visit this group at
http://groups.google.com/group/clojure?hl=en

Reply via email to