Hi,

I (still) consider myself new to Clojure.  I am trying to read a 37 MB
file that will grow by 500k every 2 days. I don't consider this
input large enough to merit using Hadoop, and I'd like to process
it in Clojure in an efficient, speedy, and idiomatic way.

I simply want something akin to a transpose, where the input looks
like this:
( [ a1 b1 c1 d1 ] [ a2 b2 c2 d2 ] [ a3 b3 c3 d3 ])

…and the output looks like this:

[ [ a1 a2 a3 ] [ b1 b2 b3 ] [ c1 c2 c3 ] [ d1 d2 d3 ] ]
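
To make that concrete, here is the same transformation on toy data. This
is only a sketch: keywords stand in for the real fields, and `rows` and
`columns` are made-up names for illustration.

```clojure
;; Toy rows standing in for the parsed lines of the file (hypothetical data).
(def rows '([:a1 :b1 :c1 :d1] [:a2 :b2 :c2 :d2] [:a3 :b3 :c3 :d3]))

;; `map` over several collections walks them in lockstep, so applying
;; `map vector` to the rows builds one vector per column.
(def columns (vec (apply map vector rows)))
;; columns => [[:a1 :a2 :a3] [:b1 :b2 :b3] [:c1 :c2 :c3] [:d1 :d2 :d3]]
```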

Gleaning what I can from various sources and cobbling them together, I
have the code below, which works for small input but not for files of
the intended size (or larger) that I'd like it to be able to handle.

(use 'clojure.contrib.io)
(require 'clojure.string)

(def tabfn "/Users/avram/data/testdata.tab")

(defn is-comment?
  "Checks if argument is a comment (i.e. starts with a '#').
  Returns: boolean."
  [line]
  (= \# (first line)))

(defn data-lines
  "Returns data lines in file (i.e. all lines that do not start with '#').
  Returns: sequence containing data lines."
  [filename]
  (drop-while is-comment? (line-seq (reader filename))))

(defn parsed-data-lines
  [filename]
  (map #(clojure.string/split % #"\t") (data-lines filename)))

(def signals (vec (apply map vector (parsed-data-lines tabfn))))


user=> (def signals (vec (apply map vector (parsed-data-lines tabfn))))
java.lang.OutOfMemoryError: Java heap space (NO_SOURCE_FILE:68)


How can I avoid the OutOfMemoryError?

Is there a Leiningen setting where I can increase the memory or is
there a more efficient way to achieve this?
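
For the first option, my understanding is that Leiningen passes a
`:jvm-opts` vector from project.clj to the JVM it launches, so something
like the sketch below might raise the heap limit. The project name, the
version, and the 1g figure are placeholders I made up, not settings I
have verified for this data.

```clojure
;; project.clj (sketch; project name, version and heap size are placeholders)
(defproject transpose-example "0.1.0-SNAPSHOT"
  ;; Ask the JVM started by Leiningen for a larger maximum heap.
  :jvm-opts ["-Xmx1g"])
```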

Also, I'd prefer to read gzip'd tab-delimited files instead of
uncompressed ones.  What is the idiomatic Clojure way to do this?
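
As far as I can tell, Java's `java.util.zip.GZIPInputStream` can be
wrapped in a reader from `clojure.java.io`, so a minimal sketch of what
I have in mind might look like this. `gzip-reader` and `gzip-lines` are
names I made up, and I have not tried this on the full-size file.

```clojure
(require '[clojure.java.io :as io])
(import 'java.util.zip.GZIPInputStream)

(defn gzip-reader
  "Hypothetical helper: opens a gzip-compressed text file as a
  BufferedReader that yields the decompressed text."
  [filename]
  (-> filename
      io/input-stream      ; raw bytes of the .gz file
      GZIPInputStream.     ; decompress on the fly
      io/reader))          ; decode bytes to characters, buffered

(defn gzip-lines
  "Hypothetical helper: returns all lines of a gzip'd text file as a vector.
  The lines are realized before the reader is closed."
  [filename]
  (with-open [rdr (gzip-reader filename)]
    (vec (line-seq rdr))))
```

Something like `(gzip-lines "/path/to/file.tab.gz")` would then take the
place of the `line-seq` call, at the cost of holding every line in
memory at once.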


Suggestions for improvement and criticism are welcome :)

Thanks,
Avram
