Sorry, I have no feedback on your attempt, but you may want to check out Paul Stadig's utf8 library to see whether it serves your purpose. I believe it stores text that fits within the ASCII subset at 1 byte of memory per character, using 2, 3, or 4 bytes only for other Unicode characters, depending on the code point.
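To make the byte counts concrete, here is a small sketch that measures the UTF-8 encoded length of a few strings using only the JDK. It illustrates the encoding itself, not the API of the utf8 library, which I have not checked; utf8-length is a name made up for this example.

```clojure
(import 'java.nio.charset.StandardCharsets)

;; Hypothetical helper for illustration: number of bytes needed to
;; encode s as UTF-8, measured with the JDK's own encoder.
(defn utf8-length [^String s]
  (alength (.getBytes s StandardCharsets/UTF_8)))

;; A code point outside the Basic Multilingual Plane (U+1F600),
;; built from its number so the example does not depend on file encoding.
(def emoji (String. (Character/toChars 0x1F600)))

(utf8-length "a")       ; ASCII, U+0061  -> 1
(utf8-length "\u00e9")  ; e-acute, U+00E9 -> 2
(utf8-length "\u20ac")  ; euro sign, U+20AC -> 3
(utf8-length emoji)     ; U+1F600 -> 4
```

By comparison, a java.lang.String on the JVMs of this era stores every char in 2 bytes, plus object overhead, which is where the savings for mostly-ASCII text would come from.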
https://github.com/pjstadig/utf8

Andy

On Tue, Oct 28, 2014 at 12:24 PM, Brian Craft <craft.br...@gmail.com> wrote:

> Following up on the thread about the massive overhead of String, I tried
> writing a string collection type that stores strings as bytes, converting
> to String on demand. It seems to work. Memory footprint and performance are
> good for the application.
>
> The hard part was trying to track down the correct interfaces and
> invocations. I note that "Clojure Programming" makes the same observation
> in the section about Clojure abstractions: "such things are largely
> undocumented". I guess this situation hasn't improved? I had to proceed
> mostly by experimentation, and am still unclear on, for example, why I
> needed to use an interop call in some places (like cons), but should not in
> others.
>
> Would be happy for any feedback on this attempt:
>
> (deftype StringVec [pv]
>   clojure.lang.IPersistentVector
>   (seq [self] (map #(String. ^bytes %) pv))
>   (nth [self i] (String. ^bytes (.nth ^clojure.lang.IPersistentVector pv i)))
>   (nth [self i notfound] (String. ^bytes (.nth ^clojure.lang.IPersistentVector pv i (.getBytes ^String notfound))))
>   clojure.lang.ILookup
>   (valAt [self i] (when-let [res (.valAt ^clojure.lang.IPersistentVector pv i)]
>                     (String. ^bytes res)))
>   (valAt [self i notfound] (String. ^bytes (.valAt ^clojure.lang.IPersistentVector pv i (.getBytes ^String notfound))))
>   clojure.lang.ISeq
>   (first [self] (String. ^bytes (first pv)))
>   (next [self] (->StringVec (next pv)))
>   (more [self] (->StringVec (rest pv)))
>   (cons [self s] (->StringVec (.cons ^clojure.lang.IPersistentVector pv (.getBytes ^String s))))
>   (count [self] (count pv))
>   Object
>   (toString [self] (str (into [] self))))
>
> (defn stringvec [coll]
>   (into (->StringVec []) coll))
>
> (defmethod print-method StringVec [v, ^java.io.Writer w]
>   (.write w (.toString ^StringVec v)))
>
> Speaking of cons, I gather ISeq cons is unrelated to cons, the function, but
> rather is required for conj?
>
> --
> You received this message because you are subscribed to the Google
> Groups "Clojure" group.
> To post to this group, send email to clojure@googlegroups.com
> Note that posts from new members are moderated - please be patient with
> your first post.
> To unsubscribe from this group, send email to
> clojure+unsubscr...@googlegroups.com
> For more options, visit this group at
> http://groups.google.com/group/clojure?hl=en
> ---
> You received this message because you are subscribed to the Google Groups
> "Clojure" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to clojure+unsubscr...@googlegroups.com.
> For more options, visit https://groups.google.com/d/optout.
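
On the closing question in the quoted message about cons: as far as I understand the implementation, conj calls the collection's own cons method (declared on clojure.lang.IPersistentCollection, which ISeq extends), while the cons function never calls that method and instead puts a new seq cell in front of the collection's seq. The sketch below tries to show the difference. Tracker is a made-up type for this example only and is not part of the StringVec code.

```clojure
;; Hypothetical type for illustration: its cons method tags every
;; element it receives, so we can see which core function reaches it.
(deftype Tracker [items]
  clojure.lang.IPersistentCollection
  (cons [_ x] (Tracker. (conj items [:via-method x])))
  (count [_] (count items))
  (empty [_] (Tracker. []))
  (equiv [_ other] false)
  (seq [_] (seq items)))

;; conj goes through the cons method above, so we get a Tracker back
;; and the new element carries the tag.
(def conjed (conj (Tracker. []) 1))
(class conjed)                 ; the Tracker type
(.-items ^Tracker conjed)      ; [[:via-method 1]]

;; The cons function bypasses the method and returns a plain seq cell.
(def consed (cons 1 (Tracker. [])))
(class consed)                 ; clojure.lang.Cons
```

If that reading is right, it would also explain why into works on StringVec: into is built on conj, so it ends up in the deftype's cons method.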