Sorry, I have no feedback on your attempt, but you may want to check out
Paul Stadig's utf8 library to see whether it serves your purpose.  I believe
it stores text that fits within the ASCII subset using 1 byte of memory per
character, and uses 2, 3, or 4 bytes only for other Unicode characters,
depending on the code point.

https://github.com/pjstadig/utf8
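
To illustrate the byte counts I mean, here is a minimal sketch that uses only
plain Java interop (String.getBytes), not the utf8 library's own API. The
helper name utf8-length is made up for this example.

```clojure
;; Hypothetical helper, not part of pjstadig/utf8: counts the bytes that
;; the UTF-8 encoding of a string occupies.
(defn utf8-length
  [^String s]
  (alength ^bytes (.getBytes s "UTF-8")))

(utf8-length "a")                                  ; ASCII, U+0061 => 1
(utf8-length "\u00e9")                             ; U+00E9        => 2
(utf8-length "\u20ac")                             ; U+20AC        => 3
(utf8-length (String. (Character/toChars 0x1F600))) ; U+1F600       => 4
```

A java.lang.String of that era stores every character as a 2-byte char
internally, so a mostly-ASCII collection should take roughly half the
character storage when kept as UTF-8 bytes.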

Andy

On Tue, Oct 28, 2014 at 12:24 PM, Brian Craft <craft.br...@gmail.com> wrote:

> Following up on the thread about the massive overhead of String, I tried
> writing a string collection type that stores strings as bytes, converting
> to String on demand. It seems to work. Memory footprint and performance are
> good for the application.
>
> The hard part was tracking down the correct interfaces and
> invocations. I note that "Clojure Programming" makes the same observation
> in the section about Clojure abstractions: "such things are largely
> undocumented". I guess this situation hasn't improved? I had to proceed
> mostly by experimentation, and am still unclear on, for example, why I
> needed to use an interop call in some places (like cons), but should not in
> others.
>
> Would be happy for any feedback on this attempt:
>
> (deftype StringVec [pv]
>   clojure.lang.IPersistentVector
>   (seq [self] (map #(String. ^bytes %) pv))
>   (nth [self i]
>     (String. ^bytes (.nth ^clojure.lang.IPersistentVector pv i)))
>   (nth [self i notfound]
>     (String. ^bytes (.nth ^clojure.lang.IPersistentVector pv i
>                           (.getBytes ^String notfound))))
>   clojure.lang.ILookup
>   (valAt [self i]
>     (when-let [res (.valAt ^clojure.lang.IPersistentVector pv i)]
>       (String. ^bytes res)))
>   (valAt [self i notfound]
>     (String. ^bytes (.valAt ^clojure.lang.IPersistentVector pv i
>                             (.getBytes ^String notfound))))
>   clojure.lang.ISeq
>   (first [self] (String. ^bytes (first pv)))
>   (next [self] (->StringVec (next pv)))
>   (more [self] (->StringVec (rest pv)))
>   (cons [self s]
>     (->StringVec (.cons ^clojure.lang.IPersistentVector pv
>                         (.getBytes ^String s))))
>   (count [self] (count pv))
>   Object
>   (toString [self] (str (into [] self))))
>
> (defn stringvec [coll]
>   (into (->StringVec []) coll))
>
> (defmethod print-method StringVec [v, ^java.io.Writer w]
>   (.write w (.toString ^StringVec v)))
>
> Speaking of cons, I gather that ISeq's cons method is unrelated to cons,
> the function, but rather is required for conj?
>
>  --
> You received this message because you are subscribed to the Google
> Groups "Clojure" group.
> To post to this group, send email to clojure@googlegroups.com
> Note that posts from new members are moderated - please be patient with
> your first post.
> To unsubscribe from this group, send email to
> clojure+unsubscr...@googlegroups.com
> For more options, visit this group at
> http://groups.google.com/group/clojure?hl=en
> ---
> You received this message because you are subscribed to the Google Groups
> "Clojure" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to clojure+unsubscr...@googlegroups.com.
> For more options, visit https://groups.google.com/d/optout.
>

