Looks like it's based on the vector-of feature. I've seen this described 
elsewhere as a "compact" data structure, but it doesn't seem very compact. 
Maybe I'm using it wrong?

 (apply vector-of :byte (into [] (.getBytes "foo")))
[102 111 111]

Looks like this works. Applying to the same test file:

(def g (mapv #(apply vector-of :byte (into [] (.getBytes ^String %)))
             (line-seq f)))

OutOfMemoryError Java heap space  java.util.Arrays.copyOf (Arrays.java:2894)

:-p   This is a 160M file. Trying again, with more memory, jhat tells me 
this:

 num     #instances         #bytes  class name
----------------------------------------------
   1:       2000961      285940064  [Ljava.lang.Object;
   2:       5726264      257130296  [B
   3:       5721690      137320560  clojure.core.VecNode
   4:       1917764       76710560  clojure.core.Vec
...
Total      15717203      785217096
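
Dividing bytes by instances for the top rows gives a rough per-object cost. A quick sketch of that arithmetic, using only the numbers in the histogram above (the names histo and per-instance are made up for illustration):

```clojure
;; [instances bytes] pairs copied from the histogram rows above.
(def histo
  {"[Ljava.lang.Object;"  [2000961 285940064]
   "[B"                   [5726264 257130296]
   "clojure.core.VecNode" [5721690 137320560]
   "clojure.core.Vec"     [1917764 76710560]})

;; Average bytes per instance. Hypothetical helper, not from any library.
(defn per-instance [[instances nbytes]]
  (double (/ nbytes instances)))

(doseq [[cls row] histo]
  (println cls (per-instance row)))

;; Growth relative to the file size: (785M total - 28M baseline) / 160M.
(println (double (/ (- 785 28) 160)))
```

VecNode comes out at exactly 24 bytes and Vec at 40. Object[] averages about 143 bytes, which would be consistent with mostly 32-slot arrays, assuming a 64-bit JVM with compressed references. So the per-line cost seems to be dominated by the vector's tree structure rather than by the bytes themselves.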

Started with about 28M, so loading a 160M file with 2 million lines 
consumed 757M, or nearly 5X the size of the data. Same as what I see with 
pjstadig.utf8.
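
For comparison, a minimal sketch of the other layout discussed in this thread: one raw byte array per line, all held in a single ordinary vector, which is roughly what the StringVec attempt quoted below does. The sample data and the line-at helper are assumptions for illustration, and the memory use of this version is not measured here.

```clojure
;; Stand-in data. The real input was (line-seq f) over the 160M file.
(def lines ["foo" "bar baz" "quux"])

;; One byte[] per line, no per-line Vec or VecNode objects.
(def raw (mapv (fn [^String s] (.getBytes s "UTF-8")) lines))

;; Decode a single line back to a String on demand.
;; Hypothetical helper name, not part of any library.
(defn line-at [v i]
  (String. ^bytes (nth v i) "UTF-8"))

(println (line-at raw 1))
```

The charset is passed explicitly so that encoding and decoding agree regardless of the platform default.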




On Tuesday, October 28, 2014 4:50:04 PM UTC-7, Andy Fingerhut wrote:
>
> I am not certain whether Paul's intention was a lower memory footprint 
> than Java's strings, but I can't think of a strong reason to use his 
> library other than that.  Would you be willing to file an issue on Github 
> with your findings to see if he thinks there is a problem there?
>
> Also, his code itself may have some answers to your questions about how to 
> make new data structures satisfy some Clojure abstractions.
>
> Andy
>
> On Tue, Oct 28, 2014 at 3:57 PM, Brian Craft <craft...@gmail.com> wrote:
>
>> Hm, I just tested it with a 160M file, split into 2 million strings. The 
>> memory footprint was substantially worse than String. With String I get 
>> about a 3.2X increase in memory footprint for this file, but with 
>> pjstadig.utf8 I get about 5X. Also, it's slower by orders of magnitude.
>>
>> On Tuesday, October 28, 2014 12:49:53 PM UTC-7, Andy Fingerhut wrote:
>>>
>>> Sorry, no feedback on your attempt, but a note that you may want to 
>>> check out Paul Stadig's utf8 library to see if it serves your purpose.  I 
>>> believe it should store text that fits within the ASCII subset into 1 byte 
>>> of memory per character, only using 2, 3, or 4 bytes for other Unicode 
>>> characters, depending on the code point.
>>>
>>> https://github.com/pjstadig/utf8
>>>
>>> Andy
>>>
>>> On Tue, Oct 28, 2014 at 12:24 PM, Brian Craft <craft...@gmail.com> 
>>> wrote:
>>>
>>>> Following up on the thread about the massive overhead of String, I 
>>>> tried writing a string collection type that stores strings as bytes, 
>>>> converting to String on-demand. It seems to work. Memory footprint and 
>>>> performance are good for the application.
>>>>
>>>> The hard part was trying to track down the correct interfaces and 
>>>> invocations. I note that "Clojure Programming" makes the same observation 
>>>> in the section about Clojure abstractions: "such things are largely 
>>>> undocumented". I guess this situation hasn't improved? I had to proceed 
>>>> mostly by experimentation, and am still unclear on, for example, why I 
>>>> needed to use an interop call in some places (like cons), but should not 
>>>> in others.
>>>>
>>>> Would be happy for any feedback on this attempt:
>>>>
>>>> (deftype StringVec [pv]
>>>>   clojure.lang.IPersistentVector
>>>>   (seq [self] (map #(String. ^bytes %) pv))
>>>>   (nth [self i]
>>>>     (String. ^bytes (.nth ^clojure.lang.IPersistentVector pv i)))
>>>>   (nth [self i notfound]
>>>>     (String. ^bytes (.nth ^clojure.lang.IPersistentVector pv i
>>>>                           (.getBytes ^String notfound))))
>>>>   clojure.lang.ILookup
>>>>   (valAt [self i]
>>>>     (when-let [res (.valAt ^clojure.lang.IPersistentVector pv i)]
>>>>       (String. ^bytes res)))
>>>>   (valAt [self i notfound]
>>>>     (String. ^bytes (.valAt ^clojure.lang.IPersistentVector pv i
>>>>                             (.getBytes ^String notfound))))
>>>>   clojure.lang.ISeq
>>>>   (first [self] (String. ^bytes (first pv)))
>>>>   (next [self] (->StringVec (next pv)))
>>>>   (more [self] (->StringVec (rest pv)))
>>>>   (cons [self s]
>>>>     (->StringVec (.cons ^clojure.lang.IPersistentVector pv
>>>>                         (.getBytes ^String s))))
>>>>   (count [self] (count pv))
>>>>   Object
>>>>   (toString [self] (str (into [] self))))
>>>>
>>>> (defn stringvec [coll]
>>>>   (into (->StringVec []) coll))
>>>>
>>>> (defmethod print-method StringVec [v, ^java.io.Writer w]
>>>>   (.write w (.toString ^StringVec v)))
>>>>
>>>> Speaking of cons, I gather ISeq cons is unrelated to cons, the function, 
>>>> but rather is required for conj?
>>>>
>>>>  -- 
>>>> You received this message because you are subscribed to the Google
>>>> Groups "Clojure" group.
>>>> To post to this group, send email to clo...@googlegroups.com
>>>> Note that posts from new members are moderated - please be patient with 
>>>> your first post.
>>>> To unsubscribe from this group, send email to
>>>> clojure+u...@googlegroups.com
>>>> For more options, visit this group at
>>>> http://groups.google.com/group/clojure?hl=en
>>>> --- 
>>>> You received this message because you are subscribed to the Google 
>>>> Groups "Clojure" group.
>>>> To unsubscribe from this group and stop receiving emails from it, send 
>>>> an email to clojure+u...@googlegroups.com.
>>>> For more options, visit https://groups.google.com/d/optout.
>>>>
>>>
>>
>
>
