Check out assoc-in, get-in, and update-in. They make working with nested maps a breeze. Here's a rewrite of your code:
(ns user
  (:require [clojure.string :as str]
            [clojure.java.io :as io]))

(def postcode-trie
  (with-open [r (io/reader "/path/to/data.csv")]
    ;; reduce is eager, so the whole file is consumed before the reader closes
    (reduce
      (fn [trie line]
        (let [[postcode region] (.split line ",")
              ;; strip the space so "SW1A 2" becomes the key path "SW1A2"
              postcode (str/replace postcode " " "")]
          (assoc-in trie postcode region)))
      {}
      (line-seq r))))

(get-in postcode-trie "SW1A2") ;; => "20"

This stores keys of the tree (trie) as characters instead of strings, which lets you use get-in easily.

Using line-seq might help mitigate unnecessary memory usage, though as Andy mentions, Java objects just carry a lot of baggage.

Justin

On Nov 3, 12:22 pm, Paul Ingles <p...@forward.co.uk> wrote:
> Hi,
>
> I've been playing around with breaking apart a list of postal codes to
> be stored in a tree with leaf nodes containing information about that
> area. I have some code that works with medium-ish size inputs but
> fails with a GC Overhead error with larger input sets (1.5m rows) and
> would really appreciate anyone being able to point me in the right
> direction.
>
> The full code is up as a gist here: https://gist.github.com/661278
>
> My input file contains something like:
>
> SW1A 1,10
> SW1A 2,20
> ...
>
> Which are then mapped to 2 trees:
>
> {"S" {"W" {"1" {"A" {"1" 10}}}}}
> {"S" {"W" {"1" {"A" {"2" 20}}}}}
>
> I then want to continually fold those trees into a master tree. For
> the 2 maps above the merged tree would be:
>
> {"S" {"W" {"1" {"A" {"1" 10 "2" 20}}}}}
>
> I'm sure I'm missing all kinds of awesome core/contrib functions to
> make it more concise and would appreciate anyone pointing out
> alternatives also.
>
> The main problem is that it fails when my input data gets sufficiently
> large. On my MacBook Pro it falls over with an input set of 1.5m
> records (although a lot of these would be branches from the first few
> levels). It reports GC Overhead limit exceeded, although also ran out
> of heap size before I upped that.
>
> I assume this is because during the tree reduction it's still
> retaining references to nodes eventually causing it to build
> continually larger structures?
>
> I've included the reduce function (and how that gets called to produce
> results) inline:
>
> (defn merge-tree
>   [tree other]
>   (if (not (map? other))
>     tree
>     (merge-with (fn [x y] (merge-tree x y))
>                 tree other)))
>
> (def results (reduce merge-tree
>                      {}
>                      (map record-to-tree
>                           postcodes-from-file)))
>
> All help much appreciated,
> Paul
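
To illustrate the point above about the trie using characters as keys, here is a small REPL sketch. It is not part of the rewrite itself: sample-trie and its two entries are made up for illustration, taken from the two sample rows in the quoted message. It relies on assoc-in, get-in, and update-in accepting any sequential key path, and a string is treated as a sequence of characters.

```clojure
;; Sketch only: sample-trie is a hypothetical name, and the two entries
;; mirror the sample rows "SW1A 1,10" and "SW1A 2,20" with the space removed.
(def sample-trie
  (-> {}
      (assoc-in "SW1A1" "10")   ; the string is walked as \S \W \1 \A \1
      (assoc-in "SW1A2" "20")))

sample-trie
;; => {\S {\W {\1 {\A {\1 "10", \2 "20"}}}}}

(get-in sample-trie "SW1A2")
;; => "20"

;; A prefix returns the whole subtree under it.
(get-in sample-trie "SW1A")
;; => {\1 "10", \2 "20"}

;; update-in applies a function at a path; here it parses one leaf to a number.
(get-in (update-in sample-trie "SW1A1" #(Long/parseLong %)) "SW1A1")
;; => 10
```

Note that the keys are characters such as \S, not one-letter strings such as "S", so looking up (get-in sample-trie ["S" "W"]) would return nil.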