I am processing a large XML file (13 MB) using clojure.xml/parse and clojure.contrib.zip-filter.xml with Clojure 1.0.0.
The XML file contains information on 13000 Japanese characters, and I'm extracting about 200 of them. At its core, the program extracts a very small subset of elements using:

(xml-> kdic :character [:literal #(contains? kcset (text %))] node)

where kcset is the set of characters I want. My understanding is that this returns a lazy seq, and that calling count on it should return about 200 (not 13000). In practice, however, it throws a StackOverflowError.

At the end of this post is a relatively short version of the program that reproduces the problem. In that version, the (count ...) call is what triggers the stack overflow. In the full program I tried a few variations, such as:

(dorun (for [knode knodes] (print-kinfo knode)))

to print the information for each node, but that also throws a stack overflow before it reaches the end of the list. The top of the stack trace is at the end as well. Thanks!

Here's the short version of the program:

(ns kanji.prkanji
  (:use clojure.xml)
  (:use [clojure.zip :only (xml-zip node)])
  (:use clojure.contrib.zip-filter.xml)
  (:import java.lang.Character$UnicodeBlock)
  (:import java.io.File))

(def CJK Character$UnicodeBlock/CJK_UNIFIED_IDEOGRAPHS)

(defn filter-for-kanji [chars]
  (filter #(= CJK (Character$UnicodeBlock/of %)) chars))

(defn get-unique-kanji [chars]
  (let [kchars (filter-for-kanji chars)]
    (set kchars)))

(defn print-kinfos [knodes]
  (count knodes))
;; this is what I would normally do:
;; (dorun (for [knode knodes] (print-kinfo knode)))

(defn get-kdic-info [kdic kchars]
  (let [kcset (set (map str kchars))]
    (xml-> kdic :character [:literal #(contains? kcset (text %))] node)))

(defn load-kdic [fname]
  (xml-zip (parse (File. fname))))

(defn process-file [file]
  (let [kchars (get-unique-kanji (slurp file))]
    (print-kinfos (get-kdic-info (load-kdic "kanji/kdic-data.xml") kchars))))

(process-file (second *command-line-args*))

And here's the top of the stack trace:

Exception in thread "main" java.lang.StackOverflowError (prkanji.clj:0)
    at clojure.lang.Compiler.eval(Compiler.java:4543)
    at clojure.lang.Compiler.load(Compiler.java:4857)
    at clojure.lang.Compiler.loadFile(Compiler.java:4824)
    at clojure.main$load_script__5833.invoke(main.clj:206)
    at clojure.main$init_opt__5836.invoke(main.clj:211)
    at clojure.main$initialize__5846.invoke(main.clj:239)
    at clojure.main$null_opt__5868.invoke(main.clj:264)
    at clojure.main$legacy_script__5883.invoke(main.clj:295)
    at clojure.lang.Var.invoke(Var.java:346)
    at clojure.main.legacy_script(main.java:34)
    at clojure.lang.Script.main(Script.java:20)
Caused by: java.lang.StackOverflowError
    at clojure.lang.Cons.next(Cons.java:37)
    at clojure.lang.RT.boundedLength(RT.java:1117)
    at clojure.lang.AFn.applyToHelper(AFn.java:168)
    at clojure.lang.RestFn.applyTo(RestFn.java:137)
    at clojure.core$apply__3243.doInvoke(core.clj:390)
    at clojure.lang.RestFn.invoke(RestFn.java:443)
    at clojure.core$mapcat__3842.doInvoke(core.clj:1528)
    at clojure.lang.RestFn.invoke(RestFn.java:428)
    at clojure.contrib.zip_filter$descendants__48$fn__50.invoke(zip_filter.clj:63)
    at clojure.lang.LazySeq.seq(LazySeq.java:41)
    at clojure.lang.RT.seq(RT.java:436)
    at clojure.lang.LazySeq.seq(LazySeq.java:41)
    at clojure.lang.RT.seq(RT.java:436)
    at clojure.core$seq__3133.invoke(core.clj:103)
    at clojure.core$map__3815$fn__3817.invoke(core.clj:1502)
    at clojure.lang.LazySeq.seq(LazySeq.java:41)
    at clojure.lang.Cons.next(Cons.java:37)
    at clojure.lang.RT.boundedLength(RT.java:1117)
    at clojure.lang.RestFn.applyTo(RestFn.java:135)
    at clojure.core$apply__3243.doInvoke(core.clj:390)
    at clojure.lang.RestFn.invoke(RestFn.java:428)
    at clojure.core$mapcat__3842.doInvoke(core.clj:1528)
    at clojure.lang.RestFn.invoke(RestFn.java:428)
    at clojure.contrib.zip_filter$mapcat_chain__65$fn__67.invoke(zip_filter.clj:88)
    at clojure.lang.ArraySeq.reduce(ArraySeq.java:116)
    at clojure.core$reduce__3319.invoke(core.clj:536)
    at clojure.contrib.zip_filter$mapcat_chain__65.invoke(zip_filter.clj:89)
    at clojure.contrib.zip_filter.xml$xml__GT___119.doInvoke(xml.clj:75)
    at clojure.lang.RestFn.invoke(RestFn.java:460)
    at clojure.contrib.zip_filter.xml$text__102.invoke(xml.clj:43)
    at kanji.prkanji$get_kdic_info__147$fn__149.invoke(prkanji.clj:36)
    at clojure.contrib.zip_filter$fixup_apply__60.invoke(zip_filter.clj:76)
    at clojure.contrib.zip_filter$mapcat_chain__65$fn__67$fn__69.invoke(zip_filter.clj:88)
    at clojure.core$map__3815$fn__3817.invoke(core.clj:1503)
    at clojure.lang.LazySeq.seq(LazySeq.java:41)
    at clojure.lang.RT.seq(RT.java:436)
    at clojure.core$seq__3133.invoke(core.clj:103)
    at clojure.core$spread__3240.invoke(core.clj:383)
    at clojure.core$apply__3243.doInvoke(core.clj:390)
    at clojure.lang.RestFn.invoke(RestFn.java:428)
    at clojure.core$mapcat__3842.doInvoke(core.clj:1528)
    at clojure.lang.RestFn.invoke(RestFn.java:428)
    at clojure.contrib.zip_filter$mapcat_chain__65$fn__67.invoke(zip_filter.clj:88)
    at clojure.lang.APersistentVector$Seq.reduce(APersistentVector.java:476)
    at clojure.core$reduce__3319.invoke(core.clj:536)
    at clojure.contrib.zip_filter$mapcat_chain__65.invoke(zip_filter.clj:89)
    at clojure.contrib.zip_filter.xml$xml__GT___119.doInvoke(xml.clj:75)
    at clojure.lang.RestFn.applyTo(RestFn.java:144)
    at clojure.core$apply__3243.doInvoke(core.clj:390)
    at clojure.lang.RestFn.invoke(RestFn.java:443)
    at clojure.contrib.zip_filter.xml$seq_test__111$fn__113.invoke(xml.clj:55)
    at clojure.contrib.zip_filter$fixup_apply__60.invoke(zip_filter.clj:76)
    at clojure.contrib.zip_filter$mapcat_chain__65$fn__67$fn__69.invoke(zip_filter.clj:88)
    at clojure.core$map__3815$fn__3817.invoke(core.clj:1503)
    at clojure.lang.LazySeq.seq(LazySeq.java:41)
    at clojure.lang.Cons.next(Cons.java:37)
    at clojure.lang.RT.next(RT.java:560)
    at clojure.core$next__3117.invoke(core.clj:50)
    at clojure.core$concat__3255$cat__3269$fn__3270.invoke(core.clj:428)
    at clojure.lang.LazySeq.seq(LazySeq.java:41)
    at clojure.lang.RT.seq(RT.java:436)
    at clojure.lang.LazySeq.seq(LazySeq.java:41)
    at clojure.lang.RT.seq(RT.java:436)
    at clojure.lang.LazySeq.seq(LazySeq.java:41)
    at clojure.lang.RT.seq(RT.java:436)
    at clojure.lang.LazySeq.seq(LazySeq.java:41)
    at clojure.lang.RT.seq(RT.java:436)
    at clojure.lang.LazySeq.seq(LazySeq.java:41)
    at clojure.lang.RT.seq(RT.java:436)
    at clojure.lang.LazySeq.seq(LazySeq.java:41)
    at clojure.lang.RT.seq(RT.java:436)
    at clojure.lang.LazySeq.seq(LazySeq.java:41)
    at clojure.lang.RT.seq(RT.java:436)
    at clojure.lang.LazySeq.seq(LazySeq.java:41)
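For comparison, here is a minimal sketch of the same extraction written directly against the {:tag :attrs :content} maps that clojure.xml/parse returns, without zip-filter. It assumes the dictionary is a root element whose direct children are <character> elements, each with a <literal> child. The names sample-xml, parse-string, literal-text and get-kdic-info-plain are hypothetical. This is not offered as a confirmed fix for the overflow above.

```clojure
(ns kanji.sketch
  (:require [clojure.xml :as xml])
  (:import (java.io ByteArrayInputStream)))

;; Hypothetical miniature dictionary. The real file's layout is ASSUMED to be
;; a root element whose direct children are <character> elements, each with a
;; <literal> child holding one kanji.
(def sample-xml
  (str "<kanjidic>"
       "<character><literal>日</literal></character>"
       "<character><literal>本</literal></character>"
       "<character><literal>語</literal></character>"
       "</kanjidic>"))

(defn parse-string
  "Parses an XML string into clojure.xml's {:tag :attrs :content} maps."
  [s]
  (xml/parse (ByteArrayInputStream. (.getBytes s "UTF-8"))))

(defn literal-text
  "Returns the text of the first :literal child of a :character element, or nil.
  Text nodes are plain strings, so (:tag child) is nil for them."
  [character]
  (some (fn [child]
          (when (= :literal (:tag child))
            (apply str (filter string? (:content child)))))
        (:content character)))

(defn get-kdic-info-plain
  "Lazily keeps the root's direct :character children whose literal is in kcset.
  Only the root's children and each character's children are inspected."
  [root kcset]
  (filter (fn [el]
            (and (= :character (:tag el))
                 (contains? kcset (literal-text el))))
          (:content root)))

;; Usage:
(count (get-kdic-info-plain (parse-string sample-xml) #{"日" "語"}))
;; => 2
```

This approach looks only one level below the root and at each character's direct children, so each <character> element is examined once. It has not been tested on the real 13 MB file.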