Looks like it's based on the vector-of feature. I've seen this described elsewhere as a "compact" data structure, but it doesn't seem very compact. Maybe I'm using it wrong?
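For comparison with `vector-of`: it builds the same kind of 32-way tree as `vector`, only with primitive arrays at the leaves, so every vector longer than 32 elements also allocates tree-node objects on top of its byte chunks. Below is a minimal sketch of a flatter layout, assuming each line is kept as one plain `byte[]` in an ordinary vector. This is close to what the StringVec attempt quoted further down does. The function names are hypothetical, and UTF-8 is assumed where the original calls rely on the platform default charset.

```clojure
(import '[java.io BufferedReader]
        '[java.nio.charset StandardCharsets])

;; Hypothetical helper: read every line of `rdr` and keep it as a single
;; UTF-8 byte[] inside an ordinary vector, instead of a (vector-of :byte ...).
;; Each line is then one array object, with no per-line tree nodes.
(defn lines->byte-arrays [^BufferedReader rdr]
  (mapv (fn [^String line] (.getBytes line StandardCharsets/UTF_8))
        (line-seq rdr)))

;; Hypothetical helper: decode one stored line back into a String on demand.
(defn byte-array->line [^bytes b]
  (String. b StandardCharsets/UTF_8))
```

With a real file this would be called as `(with-open [rdr (clojure.java.io/reader "big.txt")] (lines->byte-arrays rdr))`, where `big.txt` is a placeholder. Because `mapv` is eager, every line is read before `with-open` closes the reader.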
(apply vector-of :byte (into [] (.getBytes "foo")))
[102 111 111]

Looks like this works. Applying it to the same test file:

(def g (mapv #(apply vector-of :byte (into [] (.getBytes ^String %))) (line-seq f)))
OutOfMemoryError Java heap space  java.util.Arrays.copyOf (Arrays.java:2894)

:-p

This is a 160M file. Trying again with more memory, jhat tells me this:

 num     #instances         #bytes  class name
----------------------------------------------
   1:       2000961      285940064  [Ljava.lang.Object;
   2:       5726264      257130296  [B
   3:       5721690      137320560  clojure.core.VecNode
   4:       1917764       76710560  clojure.core.Vec
...
Total      15717203      785217096

Started with about 28M, so loading a 160M file with 2 million lines consumed 757M, or nearly 5X the size of the data. Same as what I see with pjstadig.utf8.

On Tuesday, October 28, 2014 4:50:04 PM UTC-7, Andy Fingerhut wrote:
>
> I am not certain whether Paul's intention was a lower memory footprint
> than Java's strings, but I can't think of a strong reason to use his
> library other than that. Would you be willing to file an issue on GitHub
> with your findings to see if he thinks there is a problem there?
>
> Also, his code itself may have some answers to your questions about how to
> make new data structures satisfy some Clojure abstractions.
>
> Andy
>
> On Tue, Oct 28, 2014 at 3:57 PM, Brian Craft <craft...@gmail.com> wrote:
>
>> Hm, I just tested it with a 160M file, split into 2 million strings. The
>> memory footprint was substantially worse than String. With String I get
>> about a 3.2X increase in memory footprint for this file, but with
>> pjstadig.utf8 I get about 5X. Also, it's slower by orders of magnitude.
>>
>> On Tuesday, October 28, 2014 12:49:53 PM UTC-7, Andy Fingerhut wrote:
>>>
>>> Sorry, no feedback on your attempt, but a note that you may want to
>>> check out Paul Stadig's utf8 library to see if it serves your purpose.
>>> I believe it should store text that fits within the ASCII subset in 1 byte
>>> of memory per character, only using 2, 3, or 4 bytes for other Unicode
>>> characters, depending on the code point.
>>>
>>> https://github.com/pjstadig/utf8
>>>
>>> Andy
>>>
>>> On Tue, Oct 28, 2014 at 12:24 PM, Brian Craft <craft...@gmail.com>
>>> wrote:
>>>
>>>> Following up on the thread about the massive overhead of String, I
>>>> tried writing a string collection type that stores strings as bytes,
>>>> converting to String on demand. It seems to work. Memory footprint and
>>>> performance are good for the application.
>>>>
>>>> The hard part was trying to track down the correct interfaces and
>>>> invocations. I note that "Clojure Programming" makes the same observation
>>>> in the section about Clojure abstractions: "such things are largely
>>>> undocumented". I guess this situation hasn't improved? I had to proceed
>>>> mostly by experimentation, and am still unclear on, for example, why I
>>>> needed to use an interop call in some places (like cons), but should not
>>>> in others.
>>>>
>>>> Would be happy for any feedback on this attempt:
>>>>
>>>> (deftype StringVec [pv]
>>>>   clojure.lang.IPersistentVector
>>>>   (seq [self] (map #(String. ^bytes %) pv))
>>>>   (nth [self i] (String. ^bytes (.nth ^clojure.lang.IPersistentVector pv i)))
>>>>   (nth [self i notfound]
>>>>     (String. ^bytes (.nth ^clojure.lang.IPersistentVector
>>>>                           pv i (.getBytes ^String notfound))))
>>>>   clojure.lang.ILookup
>>>>   (valAt [self i]
>>>>     (when-let [res (.valAt ^clojure.lang.IPersistentVector pv i)]
>>>>       (String. ^bytes res)))
>>>>   (valAt [self i notfound]
>>>>     (String. ^bytes (.valAt ^clojure.lang.IPersistentVector
>>>>                             pv i (.getBytes ^String notfound))))
>>>>   clojure.lang.ISeq
>>>>   (first [self] (String. ^bytes (first pv)))
>>>>   (next [self] (->StringVec (next pv)))
>>>>   (more [self] (->StringVec (rest pv)))
>>>>   (cons [self s]
>>>>     (->StringVec (.cons ^clojure.lang.IPersistentVector
>>>>                         pv (.getBytes ^String s))))
>>>>   (count [self] (count pv))
>>>>   Object
>>>>   (toString [self] (str (into [] self))))
>>>>
>>>> (defn stringvec [coll]
>>>>   (into (->StringVec []) coll))
>>>>
>>>> (defmethod print-method StringVec [v, ^java.io.Writer w]
>>>>   (.write w (.toString ^StringVec v)))
>>>>
>>>> Speaking of cons, I gather ISeq cons is unrelated to cons, the function,
>>>> but rather is required for conj?
>>>>
>>>> --
>>>> You received this message because you are subscribed to the Google
>>>> Groups "Clojure" group.
>>>> To post to this group, send email to clo...@googlegroups.com
>>>> Note that posts from new members are moderated - please be patient with
>>>> your first post.
>>>> To unsubscribe from this group, send email to
>>>> clojure+u...@googlegroups.com
>>>> For more options, visit this group at
>>>> http://groups.google.com/group/clojure?hl=en
>>>> ---
>>>> You received this message because you are subscribed to the Google
>>>> Groups "Clojure" group.
>>>> To unsubscribe from this group and stop receiving emails from it, send
>>>> an email to clojure+u...@googlegroups.com.
>>>> For more options, visit https://groups.google.com/d/optout.
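On the `cons` question in the quoted post: `conj` calls the collection's own `.cons` method (declared on `clojure.lang.IPersistentCollection`, which `ISeq` extends), while `clojure.core/cons` always builds a new seq cell and never calls that method. So implementing `.cons` is what makes `conj` and `into` work, and inside a method body `(conj pv x)` can stand in for the `(.cons pv x)` interop call. Below is a minimal sketch of a StringVec along those lines; it is an illustration, not the original author's code. It drops `ISeq`, because the quoted `next` and `more` wrap a seq, or nil, back into a StringVec, so later `nth` calls and end-of-seq checks break. It also makes `seq` return nil when empty, and marks the type `Sequential` so equality with ordinary vectors works in both directions. The helper names, the explicit UTF-8 charset, and the choice of interfaces are assumptions; `assoc`, `peek`, `pop`, `hashCode` and similar methods are left out.

```clojure
(import '[java.nio.charset StandardCharsets])

;; Hypothetical helpers: encode/decode with an explicit charset (UTF-8 is an
;; assumption; the quoted code uses the platform default).
(defn- ->bytes [^String s] (.getBytes s StandardCharsets/UTF_8))
(defn- ->str [^bytes b] (String. b StandardCharsets/UTF_8))

;; Strings are kept as byte arrays in an ordinary persistent vector `pv`
;; and decoded only when an element is read.
(deftype StringVec [pv]
  clojure.lang.Sequential            ; marker: lets (= [...] sv) compare element-wise

  clojure.lang.IPersistentCollection
  ;; seq must return nil (not an empty seq) for an empty collection
  (seq [_] (seq (map ->str pv)))
  (count [_] (count pv))
  ;; conj and into end up here; (conj pv ...) replaces the (.cons pv ...) interop
  (cons [_ s] (StringVec. (conj pv (->bytes s))))
  (empty [_] (StringVec. []))
  (equiv [this other]
    (and (sequential? other) (= (seq this) (seq other))))

  clojure.lang.Indexed
  (nth [_ i] (->str (nth pv i)))     ; throws on a bad index, like vector nth
  (nth [_ i not-found]
    (if (< -1 i (count pv)) (->str (nth pv i)) not-found))

  clojure.lang.ILookup
  (valAt [this k] (.valAt this k nil))
  (valAt [_ k not-found]
    (if (and (integer? k) (< -1 k (count pv)))
      (->str (nth pv k))
      not-found))

  Object
  (toString [this] (str (into [] this))))

(defn stringvec
  "Builds a StringVec from a collection of strings."
  [coll]
  (into (StringVec. []) coll))
```

With this, `(stringvec ["a" "b"])` behaves like a small read-only vector of strings for `count`, `nth`, `get`, `seq`, `conj` and `=`.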