Check out assoc-in, get-in, and update-in. They make working with
nested maps a breeze. Here's a rewrite of your code:

(ns user
  (:require [clojure.string :as str]
            [clojure.java.io :as io]))

(def postcode-trie
  (with-open [r (io/reader "/path/to/data.csv")]
    (reduce
     (fn [trie line]
       (let [[postcode region] (.split line ",")
             postcode (str/replace postcode " " "")]
         (assoc-in trie postcode region)))
     {}
     (line-seq r))))

(get-in postcode-trie "SW1A2")
;; => "20"

This stores keys of the tree (trie) as characters instead of strings,
which lets you use get-in easily.

Using line-seq might help mitigate unnecessary memory usage, though as
Andy mentions, Java objects just carry a lot of baggage.

Justin

On Nov 3, 12:22 pm, Paul Ingles <p...@forward.co.uk> wrote:
> Hi,
>
> I've been playing around with breaking apart a list of postal codes to
> be stored in a tree with leaf nodes containing information about that
> area. I have some code that works with medium-ish size inputs but
> fails with a GC Overhead error with larger input sets (1.5m rows) and
> would really appreciate anyone being able to point me in the right
> direction.
>
> The full code is up as a gist here:https://gist.github.com/661278
>
> My input file contains something like:
>
> SW1A 1,10
> SW1A 2,20
> ...
>
> Which are then mapped to 2 trees:
>
> {"S" {"W" {"1" {"A" {"1" 10}}}}}
> {"S" {"W" {"1" {"A" {"2" 20}}}}}
>
> I then want to continually fold those trees into a master tree. For
> the 2 maps above the merged tree would be:
>
> {"S" {"W" {"1" {"A" {"1" 10 "2" 20}}}}}
>
> I'm sure I'm missing all kinds of awesome core/contrib functions to
> make it more concise and would appreciate anyone pointing out
> alternatives also.
>
> The main problem is that it fails when my input data gets sufficiently
> large. On my MacBook Pro it falls over with an input set of 1.5m
> records (although a lot of these would be branches from the first few
> levels). It reports GC Overhead limit exceeded, although also ran out
> of heap size before I upped that.
>
> I assume this is because during the tree reduction it's still
> retaining references to nodes eventually causing it to build
> continually larger structures?
>
> I've included the reduce function (and how that gets called to produce
> results) inline:
>
> (defn merge-tree
>   [tree other]
>   (if (not (map? other))
>     tree
>     (merge-with (fn [x y] (merge-tree x y))
>                 tree other)))
>
> (def results (reduce merge-tree
>                      {}
>                      (map record-to-tree
>                           postcodes-from-file)))
>
> All help much appreciated,
> Paul

-- 
You received this message because you are subscribed to the Google
Groups "Clojure" group.
To post to this group, send email to clojure@googlegroups.com
Note that posts from new members are moderated - please be patient with your 
first post.
To unsubscribe from this group, send email to
clojure+unsubscr...@googlegroups.com
For more options, visit this group at
http://groups.google.com/group/clojure?hl=en

Reply via email to