Hm, I just tested it with a 160 MB file, split into 2 million strings. The memory footprint was substantially worse than with String. With String I get about a 3.2x increase in memory footprint for this file, but with pjstadig.utf8 I get about 5x. It's also slower by orders of magnitude.
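For a rough sense of how much the encoding alone can account for, here is a minimal sketch that compares only the character payload of a string under each representation. It assumes a JVM where String is backed by a UTF-16 char array (2 bytes per char), and it ignores object headers, array headers, and any per-string bookkeeping, so it says nothing about the totals measured above. The function names are made up for illustration and are not part of pjstadig.utf8.

```clojure
;; Payload-only estimates; object and array overhead are NOT included.

(defn utf8-payload-bytes
  "Number of bytes needed to hold s encoded as UTF-8."
  [^String s]
  (alength (.getBytes s "UTF-8")))

(defn utf16-payload-bytes
  "Number of bytes needed to hold s as UTF-16 code units,
  assuming 2 bytes per char."
  [^String s]
  (* 2 (.length s)))

;; Example: sum both estimates over a collection of strings.
(defn payload-totals
  [strings]
  {:utf8  (reduce + (map utf8-payload-bytes strings))
   :utf16 (reduce + (map utf16-payload-bytes strings))})
```

With 2 million short strings, fixed per-object overhead can easily outweigh whatever the payload saves, so a payload estimate like this is only a lower bound on either representation.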

On Tuesday, October 28, 2014 12:49:53 PM UTC-7, Andy Fingerhut wrote:
>
> Sorry, no feedback on your attempt, but a note that you may want to check
> out Paul Stadig's utf8 library to see if it serves your purpose. I believe
> it should store text that fits within the ASCII subset into 1 byte of
> memory per character, only using 2, 3, or 4 bytes for other Unicode
> characters, depending on the code point.
>
> https://github.com/pjstadig/utf8
>
> Andy
>
> On Tue, Oct 28, 2014 at 12:24 PM, Brian Craft <craft...@gmail.com> wrote:
>
>> Following up on the thread about the massive overhead of String, I tried
>> writing a string collection type that stores strings as bytes, converting
>> to String on-demand. It seems to work. Memory footprint and performance are
>> good for the application.
>>
>> The hard part was trying to track down the correct interfaces and
>> invocations. I note that "Clojure Programming" makes the same observation
>> in the section about clojure abstractions: "such things are largely
>> undocumented". I guess this situation hasn't improved? I had to proceed
>> mostly by experimentation, and am still unclear on, for example, why I
>> needed to use an interop call in some places (like cons), but should not in
>> others.
>>
>> Would be happy for any feedback on this attempt:
>>
>> (deftype StringVec [pv]
>>   clojure.lang.IPersistentVector
>>   (seq [self] (map #(String. ^bytes %) pv))
>>   (nth [self i] (String. ^bytes (.nth ^clojure.lang.IPersistentVector pv i)))
>>   (nth [self i notfound] (String. ^bytes (.nth ^clojure.lang.IPersistentVector pv i (.getBytes ^String notfound))))
>>   clojure.lang.ILookup
>>   (valAt [self i] (when-let [res (.valAt ^clojure.lang.IPersistentVector pv i)]
>>                     (String. ^bytes res)))
>>   (valAt [self i notfound] (String. ^bytes (.valAt ^clojure.lang.IPersistentVector pv i (.getBytes ^String notfound))))
>>   clojure.lang.ISeq
>>   (first [self] (String. ^bytes (first pv)))
>>   (next [self] (->StringVec (next pv)))
>>   (more [self] (->StringVec (rest pv)))
>>   (cons [self s] (->StringVec (.cons ^clojure.lang.IPersistentVector pv (.getBytes ^String s))))
>>   (count [self] (count pv))
>>   Object
>>   (toString [self] (str (into [] self))))
>>
>> (defn stringvec [coll]
>>   (into (->StringVec []) coll))
>>
>> (defmethod print-method StringVec [v, ^java.io.Writer w]
>>   (.write w (.toString ^StringVec v)))
>>
>> Speaking of cons, I gather ISeq cons is unrelated to cons, the function, but
>> rather is required for conj?

--
You received this message because you are subscribed to the Google Groups "Clojure" group.
To post to this group, send email to clojure@googlegroups.com
Note that posts from new members are moderated - please be patient with your first post.
To unsubscribe from this group, send email to clojure+unsubscr...@googlegroups.com
For more options, visit this group at http://groups.google.com/group/clojure?hl=en
---
You received this message because you are subscribed to the Google Groups "Clojure" group.
To unsubscribe from this group and stop receiving emails from it, send an email to clojure+unsubscr...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.