Hm, I just tested it with a 160 MB file, split into 2 million strings. The 
memory footprint was substantially worse than with String. With String I get 
about a 3.2x increase in memory footprint for this file, but with 
pjstadig.utf8 I get about 5x. It's also slower by orders of magnitude.
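
For anyone wanting to sanity-check ratios like these, here is a minimal 
sketch. It is not the test described above, and the function names and 
sample data are hypothetical. It compares raw payload sizes only, assuming 
String stores 2 bytes per char (true of JVMs of that era), and ignores 
object and array headers, which are likely a large part of the real 
footprint for many short strings.

```clojure
;; Payload-only comparison: UTF-8 byte arrays vs. UTF-16 chars.
;; Assumption: java.lang.String uses 2 bytes per char; per-object
;; overhead (headers, references, the vector itself) is not counted.

(defn utf8-payload-bytes
  "Total bytes if each string is stored as a UTF-8 byte array."
  [strings]
  (reduce + (map #(count (.getBytes ^String % "UTF-8")) strings)))

(defn utf16-payload-bytes
  "Total bytes if each string is stored as UTF-16 chars (2 bytes each)."
  [strings]
  (reduce + (map #(* 2 (.length ^String %)) strings)))

;; Hypothetical sample data; \u00e9 takes 2 bytes in UTF-8.
(def sample ["abc" "h\u00e9llo" "x"])

(utf8-payload-bytes sample)   ; 3 + 6 + 1 = 10
(utf16-payload-bytes sample)  ; 2 * (3 + 5 + 1) = 18
```

Comparing these totals against the measured heap growth gives a rough idea 
of how much of the footprint is per-object overhead rather than character 
data.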

On Tuesday, October 28, 2014 12:49:53 PM UTC-7, Andy Fingerhut wrote:
>
> Sorry, no feedback on your attempt, but a note that you may want to check 
> out Paul Stadig's utf8 library to see if it serves your purpose.  I believe 
> it should store text that fits within the ASCII subset into 1 byte of 
> memory per character, only using 2, 3, or 4 bytes for other Unicode 
> characters, depending on the code point.
>
> https://github.com/pjstadig/utf8
>
> Andy
>
> On Tue, Oct 28, 2014 at 12:24 PM, Brian Craft <craft...@gmail.com> wrote:
>
>> Following up on the thread about the massive overhead of String, I tried 
>> writing a string collection type that stores strings as bytes, converting 
>> to String on demand. It seems to work. Memory footprint and performance are 
>> good for the application.
>>
>> The hard part was tracking down the correct interfaces and 
>> invocations. I note that "Clojure Programming" makes the same observation 
>> in the section about Clojure abstractions: "such things are largely 
>> undocumented". I guess this situation hasn't improved? I had to proceed 
>> mostly by experimentation, and am still unclear on, for example, why I 
>> needed to use an interop call in some places (like cons), but not in 
>> others.
>>
>> Would be happy for any feedback on this attempt:
>>
>> (deftype StringVec [pv]
>>   clojure.lang.IPersistentVector
>>   (seq [self] (map #(String. ^bytes %) pv))
>>   (nth [self i]
>>     (String. ^bytes (.nth ^clojure.lang.IPersistentVector pv i)))
>>   (nth [self i notfound]
>>     (String. ^bytes (.nth ^clojure.lang.IPersistentVector pv i
>>                           (.getBytes ^String notfound))))
>>   clojure.lang.ILookup
>>   (valAt [self i]
>>     (when-let [res (.valAt ^clojure.lang.IPersistentVector pv i)]
>>       (String. ^bytes res)))
>>   (valAt [self i notfound]
>>     (String. ^bytes (.valAt ^clojure.lang.IPersistentVector pv i
>>                             (.getBytes ^String notfound))))
>>   clojure.lang.ISeq
>>   (first [self] (String. ^bytes (first pv)))
>>   (next [self] (->StringVec (next pv)))
>>   (more [self] (->StringVec (rest pv)))
>>   (cons [self s]
>>     (->StringVec (.cons ^clojure.lang.IPersistentVector pv
>>                         (.getBytes ^String s))))
>>   (count [self] (count pv))
>>   Object
>>   (toString [self] (str (into [] self))))
>>
>> (defn stringvec [coll]
>>   (into (->StringVec []) coll))
>>
>> (defmethod print-method StringVec [v, ^java.io.Writer w]
>>   (.write w (.toString ^StringVec v)))
>>
>> Speaking of cons, I gather ISeq cons is unrelated to cons, the function, but 
>> rather is required for conj?
>>
>>  -- 
>> You received this message because you are subscribed to the Google
>> Groups "Clojure" group.
>> To post to this group, send email to clo...@googlegroups.com
>> Note that posts from new members are moderated - please be patient with 
>> your first post.
>> To unsubscribe from this group, send email to
>> clojure+u...@googlegroups.com
>> For more options, visit this group at
>> http://groups.google.com/group/clojure?hl=en
>> --- 
>> You received this message because you are subscribed to the Google Groups 
>> "Clojure" group.
>> To unsubscribe from this group and stop receiving emails from it, send an 
>> email to clojure+u...@googlegroups.com.
>> For more options, visit https://groups.google.com/d/optout.
>>
>
>
