I am not certain whether Paul's intention was a lower memory footprint than
Java's strings, but I can't think of a strong reason to use his library
other than that.  Would you be willing to file an issue on GitHub with your
findings to see if he thinks there is a problem there?
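
For reference, here is a rough sketch of the saving one might expect in
principle. It counts only the encoded character data, ignores object headers
and any other per-string overhead, and assumes a JVM that stores String
contents as 2-byte UTF-16 code units. The function names are made up for
illustration and are not part of pjstadig.utf8.

```clojure
(defn utf8-byte-count
  "Number of bytes needed to encode s as UTF-8."
  [^String s]
  (count (.getBytes s "UTF-8")))

(defn utf16-byte-count
  "Bytes of character data for s when held as UTF-16 code units
  (2 bytes per Java char). This is an assumption about the JVM's
  internal representation, not a measurement."
  [^String s]
  (* 2 (.length s)))

;; ASCII text: 1 byte per character in UTF-8, 2 in UTF-16.
(utf8-byte-count "hello")    ;=> 5
(utf16-byte-count "hello")   ;=> 10

;; Non-ASCII characters take 2 to 4 bytes in UTF-8.
(utf8-byte-count "\u00e9")   ;=> 2  (U+00E9)
(utf8-byte-count "\u20ac")   ;=> 3  (U+20AC)
```

With many short strings, fixed per-object overhead may matter more than the
encoding, so these numbers are only an upper bound on what the encoding alone
can save.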

Also, his code itself may have some answers to your questions about how to
make new data structures satisfy some Clojure abstractions.
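
As a starting point, here is a minimal sketch (my own, not taken from Paul's
library) of a type that stores strings as UTF-8 byte arrays and implements
just enough interfaces for count, nth, seq, and conj to work. The name
Utf8Strings is hypothetical. Note that clojure.core/conj calls the cons
method of IPersistentCollection, which is why a method named cons appears
even though the cons function is never involved.

```clojure
;; Hypothetical type for illustration only.
;; bytes-vec is a persistent vector of UTF-8 encoded byte arrays.
(deftype Utf8Strings [bytes-vec]
  clojure.lang.Seqable
  ;; Decode lazily; return nil for an empty collection, as seq should.
  (seq [_] (seq (map #(String. ^bytes % "UTF-8") bytes-vec)))

  clojure.lang.Indexed
  (nth [_ i] (String. ^bytes (nth bytes-vec i) "UTF-8"))
  (nth [_ i not-found]
    ;; Return not-found as-is instead of encoding and decoding it.
    (if (and (>= i 0) (< i (count bytes-vec)))
      (String. ^bytes (nth bytes-vec i) "UTF-8")
      not-found))

  clojure.lang.IPersistentCollection
  (count [_] (count bytes-vec))
  ;; Called by clojure.core/conj (and therefore by into).
  (cons [_ s] (Utf8Strings. (conj bytes-vec (.getBytes ^String s "UTF-8"))))
  (empty [_] (Utf8Strings. []))
  (equiv [this other]
    (and (instance? Utf8Strings other)
         (= (seq this) (seq other)))))

(def v (into (Utf8Strings. []) ["a" "b"]))

(count v)         ;=> 2
(nth v 1)         ;=> "b"
(nth v 5 :none)   ;=> :none
(seq v)           ;=> ("a" "b")
```

This leaves out hashCode, equals, printing, and the rest of the vector
interfaces, so treat it as a way to see which core function reaches which
method rather than as a finished collection.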

Andy

On Tue, Oct 28, 2014 at 3:57 PM, Brian Craft <craft.br...@gmail.com> wrote:

> Hm, I just tested it with a 160M file, split into 2 million strings. The
> memory footprint was substantially worse than String. With String I get
> about a 3.2X increase in memory footprint for this file, but with
> pjstadig.utf8 I get about 5X. Also, it's slower by orders of magnitude.
>
> On Tuesday, October 28, 2014 12:49:53 PM UTC-7, Andy Fingerhut wrote:
>>
>> Sorry, no feedback on your attempt, but a note that you may want to check
>> out Paul Stadig's utf8 library to see if it serves your purpose.  I believe
>> it should store text that fits within the ASCII subset into 1 byte of
>> memory per character, only using 2, 3, or 4 bytes for other Unicode
>> characters, depending on the code point.
>>
>> https://github.com/pjstadig/utf8
>>
>> Andy
>>
>> On Tue, Oct 28, 2014 at 12:24 PM, Brian Craft <craft...@gmail.com> wrote:
>>
>>> Following up on the thread about the massive overhead of String, I tried
>>> writing a string collection type that stores strings as bytes, converting
>>> to String on demand. It seems to work. Memory footprint and performance are
>>> good for the application.
>>>
>>> The hard part was trying to track down the correct interfaces and
>>> invocations. I note that "Clojure Programming" makes the same observation
>>> in the section about Clojure abstractions: "such things are largely
>>> undocumented". I guess this situation hasn't improved? I had to proceed
>>> mostly by experimentation, and am still unclear on, for example, why I
>>> needed to use an interop call in some places (like cons), but should not in
>>> others.
>>>
>>> Would be happy for any feedback on this attempt:
>>>
>>> (deftype StringVec [pv]
>>>   clojure.lang.IPersistentVector
>>>   (seq [self] (map #(String. ^bytes %) pv))
>>>   (nth [self i]
>>>     (String. ^bytes (.nth ^clojure.lang.IPersistentVector pv i)))
>>>   (nth [self i notfound]
>>>     (String. ^bytes (.nth ^clojure.lang.IPersistentVector pv i
>>>                           (.getBytes ^String notfound))))
>>>   clojure.lang.ILookup
>>>   (valAt [self i]
>>>     (when-let [res (.valAt ^clojure.lang.IPersistentVector pv i)]
>>>       (String. ^bytes res)))
>>>   (valAt [self i notfound]
>>>     (String. ^bytes (.valAt ^clojure.lang.IPersistentVector pv i
>>>                             (.getBytes ^String notfound))))
>>>   clojure.lang.ISeq
>>>   (first [self] (String. ^bytes (first pv)))
>>>   (next [self] (->StringVec (next pv)))
>>>   (more [self] (->StringVec (rest pv)))
>>>   (cons [self s]
>>>     (->StringVec (.cons ^clojure.lang.IPersistentVector pv
>>>                         (.getBytes ^String s))))
>>>   (count [self] (count pv))
>>>   Object
>>>   (toString [self] (str (into [] self))))
>>>
>>> (defn stringvec [coll]
>>>   (into (->StringVec []) coll))
>>>
>>> (defmethod print-method StringVec [v, ^java.io.Writer w]
>>>   (.write w (.toString ^StringVec v)))
>>>
>>> Speaking of cons, I gather that the ISeq cons method is unrelated to
>>> cons, the function, but rather is required for conj?
>>>
>>>  --
>>> You received this message because you are subscribed to the Google
>>> Groups "Clojure" group.
>>> To post to this group, send email to clo...@googlegroups.com
>>> Note that posts from new members are moderated - please be patient with
>>> your first post.
>>> To unsubscribe from this group, send email to
>>> clojure+u...@googlegroups.com
>>> For more options, visit this group at
>>> http://groups.google.com/group/clojure?hl=en
>>> ---
>>> You received this message because you are subscribed to the Google
>>> Groups "Clojure" group.
>>> To unsubscribe from this group and stop receiving emails from it, send
>>> an email to clojure+u...@googlegroups.com.
>>> For more options, visit https://groups.google.com/d/optout.
>>>

