Neat blog by the way.

On Monday, February 10, 2014 8:41:51 PM UTC-5, Rob Buhler wrote:
>
> Hi,
>
> I'm learning Clojure and I wrote a word-frequencies function that relies
> heavily on clojure.core/frequencies (plus a little filtering):
>
> (ns topwords.core
>   (:require [clojure.java.io :as io]
>             [clojure.string :as str]))
>
> (def stop-words #{"other" "still" "again" "where" "could" "there"
>                   "their" "these" "those" "after" "while" "almost"
>                   "before" "through" "every" "being" "never" "should"
>                   "might" "thing" "among" "which" "would" "though"
>                   "about"})
>
> (defn get-words [line]
>   (re-seq #"\p{Alpha}+" line))
>
> (defn min-length [word]
>   (< 4 (count word)))
>
> (defn ignore-words [word]
>   (if-not (contains? stop-words word) word))
>
> (defn word-frequencies [filename]
>   (with-open [rdr (io/reader filename)]
>     (let [lines (line-seq rdr)
>           words (comp get-words str/lower-case)
>           preds (every-pred min-length ignore-words)]
>       ;; words works on one line at a time, so mapcat it over the lines
>       (frequencies (filter preds (mapcat words lines))))))
>
> It works (you can see some output from it on my blog if you want -
> http://robbuhler.blogspot.com/2014/02/word-frequencies-from-file.html)
>
> Anyway, my questions are:
>
> 1) Why do I not need a doall on the line-seq? What is forcing the
>    evaluation here?
>
> 2) I'm assuming this is still reading the entire file into memory at
>    once? If so, how would I count the frequencies of a really large file
>    without consuming so much memory?
>
> I've thought about using doseq and, for each line, updating an atom that
> holds a map, but I'm not sure if I'm on the right track here.
>
> I'm just thinking of something like this (in Python):
>
> d = {}
> for i in xrange(100):
>     key = i % 10
>     if key in d:
>         d[key] += 1
>     else:
>         d[key] = 1
>
> Can I somehow count all of the frequencies line by line and not use an
> atom (or another ref type)?
>
> I'm not looking for the ultimate performance code, just something that
> would be considered idiomatic Clojure.
>
> Thanks,
>
> Rob
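
On the two questions, here is a rough sketch rather than a definitive answer. As far as I know, clojure.core/frequencies is built on reduce, and reduce is eager, so it walks the whole line-seq while the reader is still open. That would explain why no doall is needed. The same idea gives a line-by-line count without an atom: the map of counts is simply the accumulator that reduce threads through. The names below (topwords.sketch, line-words, keep-word?, count-word) are made up for illustration, and the stop-word set is shortened.

```clojure
(ns topwords.sketch
  (:require [clojure.java.io :as io]
            [clojure.string :as str]))

;; Shortened stop-word set, for illustration only.
(def stop-words #{"there" "their" "which" "would" "about"})

(defn line-words
  "Lower-cases one line and returns its alphabetic words (nil if none)."
  [line]
  (re-seq #"\p{Alpha}+" (str/lower-case line)))

(defn keep-word?
  "True for words longer than four characters that are not stop words."
  [word]
  (and (< 4 (count word))
       (not (contains? stop-words word))))

(defn count-word
  "Returns freqs with the count for word incremented; a missing key starts at 0."
  [freqs word]
  (update-in freqs [word] (fnil inc 0)))

(defn word-frequencies
  "Counts words from anything io/reader accepts (a file name, a File, a Reader...)."
  [source]
  (with-open [rdr (io/reader source)]
    ;; reduce is eager, so the lazy seq is fully consumed before
    ;; with-open closes the reader. The map is the accumulator, so no
    ;; atom or other ref type is involved.
    (reduce count-word
            {}
            (filter keep-word? (mapcat line-words (line-seq rdr))))))
```

For a quick check at the REPL without a file, a StringReader can stand in: (word-frequencies (java.io.StringReader. "Hello there hello WORLD\nworld peace")) should count "hello" and "world" twice each and "peace" once. Since nothing here keeps a reference to the start of the line-seq, memory use should grow with the number of distinct words rather than with the size of the file, though I have not measured that on a really large input.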